Neorickettsia risticii surface-exposed proteins: proteomics identification, recognition by naturally-infected horses, and strain variations

Neorickettsia risticii is the Gram-negative, obligate intracellular bacterial pathogen responsible for Potomac horse fever (PHF), an important acute systemic disease of horses. N. risticii surface proteins, critical for immune recognition, have not been thoroughly characterized. In this paper, we identified the 51-kDa antigen (P51) as a major surface-exposed outer membrane protein of older and contemporary strains of N. risticii through mass spectrometry of streptavidin-purified biotinylated surface-labeled proteins. Western blot analysis of sera from naturally-infected horses demonstrated universal and strong recognition of recombinant P51 over other Neorickettsia recombinant proteins. Comparisons of amino acid sequences for predicted secondary structures of P51, as well as Neorickettsia surface proteins 2 (Nsp2) and 3 (Nsp3), among N. risticii strains from horses with PHF during a 26-year period throughout the United States revealed that the majority of variations among strains were concentrated in regions predicted to be external loops of their β-barrel structures. Large insertions or deletions occurred within a tandem-repeat region in strain-specific antigen 3 (Ssa3). These data demonstrate patterns of geographical association for P51 and temporal associations for Nsp2, Nsp3, and Ssa3, indicating evolutionary trends for these Neorickettsia surface antigen genes. This study showed N. risticii surface protein population dynamics, providing groundwork for designing immunodiagnostic targets for PHF.

N. risticii was found to share genetic, antigenic, and morphologic characteristics with Neorickettsia helminthoeca [25,26], which were the major reasons it, as well as Neorickettsia (formerly Rickettsia, Ehrlichia) sennetsu, was regrouped into the genus Neorickettsia [27]. In addition, the bacterial parasite known as the Stellantchasmus falcatus (SF) agent, isolated from metacercariae in fish from Japan and Oregon [28][29][30], belongs to this group. N. risticii also consists of a variety of strains, based on PCR and sequencing of 16S rRNA and groEL, Western blot analyses using purified bacteria as antigen, and morphology [20,22,24,31].
Little is known about N. risticii surface-exposed proteins, and this information is crucial for understanding bacterium-host cell interactions. Antigenic and potential surface proteins ranging between 28 and 110 kDa in mass were previously detected by Western blotting, but these proteins were not identified [32]. Immunoprecipitation of 125I-labeled N. risticii with N. risticii immune mouse sera revealed potential surface proteins ranging from 25 to 62 kDa in mass, although these proteins were not identified [33]. Antigenic proteins of 70, 55, 51, and 44 kDa have been demonstrated utilizing recombinant proteins; again, the proteins were not identified [34]. Two highly immunodominant proteins in two N. risticii strains were identified as GroEL and the 51-kDa antigen (P51) [35], but it was not shown whether these proteins were surface exposed. Strain-specific antigen (Ssa) was suggested as a surface immunogenic protein with potential use in vaccine production, although it was not determined to be exposed on the bacterial surface [24,36].
The identification of Neorickettsia proteins is now achievable with the availability of whole genome sequencing data on both the type strain (Miyayama) of N. sennetsu [37] and the type strain (Illinois) of N. risticii [38]. In this paper, we determined 1) major surface proteins by proteomics analysis on N. risticii, 2) horse immune recognition of N. risticii surface proteins, and 3) strain variations in aligned sequences of these major surface proteins with respect to their predicted secondary structures.
Biotinylation and streptavidin-affinity purification of N. risticii surface proteins
Biotinylation of purified N. risticii Illinois and PA-1 from twenty-five 75-cm² flasks using EZ Link Sulfo-NHS-SS-Biotin (Pierce Biotechnology, Rockford, IL, USA) and subsequent bacterial lysis and collection of solubilized bacterial proteins were performed as previously described [39]. Streptavidin purification of Sulfo-NHS-SS-Biotinylated N. risticii proteins was then performed, followed by SDS-polyacrylamide gel electrophoresis (PAGE) and fixation and GelCode blue (Pierce) staining of the gel [39]. Proteins from seven bands from N. risticii Illinois and proteins from four bands or band collections from PA-1 were identified by capillary-liquid chromatography-nanospray tandem mass spectrometry (Nano-LC/MS/MS) as previously described [40].
Polymerase chain reaction, sequencing, and sequence alignment
DNA was purified from buffy coats of PHF-positive horses or cultures of N. risticii in P388D1 cells using the DNeasy Blood and Tissue Kit (QIAGEN, Valencia, CA, USA), according to the manufacturer's instructions. PCR amplification was then performed using either Phusion or Taq DNA polymerase (New England BioLabs, Ipswich, MA, USA) and primers designed for conserved regions through alignment of multiple Neorickettsia spp. and/or N. risticii strains (see Additional file 1). Sequencing was performed by The Ohio State University Plant-Microbe Genomics Facility. Sequences containing whole genes or gene fragments were translated and aligned mainly through the CLUSTAL W (slow/accurate) method in the MegAlign program of DNAStar (DNAStar, Madison, WI, USA); P51 was first aligned by the CLUSTAL V (PAM250) method, and Ssa3 was aligned both by CLUSTAL W and manually. External loops were also aligned separately by CLUSTAL W for both P51 and Nsp3. Amino acid (aa) variations in N. risticii strains and other Neorickettsia spp. for all proteins were determined in relation to N. risticii Illinois. Protein alignments of the same size (including deletions as dashes) were analyzed by PHYLIP (v3.66) to obtain bootstrap values for 1000 replicates (using the programs SeqBoot, Protdist, Neighbor, and Consense) and to create dendrograms (using the programs Protdist, Neighbor, and Drawgram) [42]. Protein properties, including antigenicity profiles and β-sheet predictions, were determined using the Protean program (DNAStar). Gene and protein sequence homologies were also demonstrated using Basic Local Alignment Search Tool (BLAST) algorithms, including blastn, protein-protein blastp, and blastp [43,44].
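The bootstrap step explains why the alignments had to be of the same size, with deletions kept as dashes: a bootstrap replicate is built by resampling alignment columns with replacement, so every sequence must have the same number of columns. The following Python code is a minimal sketch of that resampling idea only. It is not PHYLIP's implementation, and the strain names and sequences in the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, n_replicates, seed=0):
    """Resample alignment columns with replacement to build bootstrap replicates.

    `alignment` maps a sequence name to an aligned string; gaps are dashes.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw `length` column indices; the same column may be drawn repeatedly.
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in cols) for name in names}
        )
    return replicates


# Hypothetical toy alignment (not data from this study).
toy = {"Illinois": "MKT-LAV", "PA-1": "MKTALAV", "081": "MRT-LSV"}
reps = bootstrap_alignment(toy, n_replicates=1000)
```

In a full analysis, each replicate would be passed to a distance calculation and neighbor-joining step, and the fraction of replicate trees containing a given cluster would be reported as its bootstrap value.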

Prediction of secondary structures
Predictions for Nsp2 and Nsp3 were based on a combination of the algorithm of the PRED-TMBB web server [45], hydrophobicity and hydrophobic moment profiles [46], and DNAStar MegAlign (DNAStar, Madison, WI, USA) alignment and analyses of all available N. risticii strain and Neorickettsia spp. sequences.
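A hydrophobic moment profile can be illustrated as follows. In a transmembrane β-strand, side chains alternate between the lipid-facing and lumen-facing sides of the barrel, so hydrophobicity that alternates with a period of about two residues gives a large moment. This Python sketch makes several assumptions that the text does not specify: the Kyte-Doolittle hydropathy scale, a rotation of 170° per residue for β-strands, and an 8-residue window. The function names are illustrative only.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the scale used is not stated).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, delta_deg=170.0):
    """Mean hydrophobic moment of a peptide window.

    Each residue contributes a vector of length H_i at angle i * delta;
    the moment is the length of the vector sum divided by the window size.
    """
    delta = math.radians(delta_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta) for i, aa in enumerate(window))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta) for i, aa in enumerate(window))
    return math.hypot(sin_sum, cos_sum) / len(window)


def moment_profile(seq, width=8, delta_deg=170.0):
    """Sliding-window hydrophobic moment along a sequence."""
    return [
        hydrophobic_moment(seq[i:i + width], delta_deg)
        for i in range(len(seq) - width + 1)
    ]
```

With a rotation of exactly 180°, an alternating peptide such as VSVSVSVS gives a large moment, whereas a uniformly hydrophobic peptide such as VVVVVVVV gives a moment near zero. Such profiles can therefore help separate amphipathic barrel strands from uniformly hydrophobic or hydrophilic segments.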

Nano-LC/MS/MS of streptavidin-affinity purified surface proteins
Given that only the N. risticii Illinois genome (NC_013009) has been sequenced [38], these data were used for proteomic analyses. Four N. risticii proteins in N. risticii Illinois (1984 isolate) and five N. risticii proteins (with conserved peptide sequences in relation to N. risticii Illinois) in PA-1 (2000 isolate) contained two or more peptide queries identified by Nano-LC/MS/MS (Table 3). Proteins identified for N. risticii Illinois were P51, GroEL (NRI_0614), Nsp3, and a conserved hypothetical protein (NRI_0567). The largest protein coverage and the largest number of peptides identified were both from P51. Proteins identified in PA-1 also included P51 and GroEL; the largest number of peptides was from P51. Minor proteins identified in PA-1 strain were DnaK (NRI_0017), ATP synthase F1, alpha subunit (AtpA, NRI_0132), and strain-specific antigen 3 (Ssa3, NRI_0872).
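The two criteria used above, a minimum of two identified peptides per protein and the protein sequence coverage, can be expressed compactly. The Python code below is a sketch under stated assumptions: peptides are matched to the protein by exact substring search, and the protein sequence, peptides, and the name "ProteinX" are hypothetical placeholders rather than data from Table 3.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Look for further (possibly overlapping) occurrences.
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def identified_proteins(hits, min_peptides=2):
    """Names of proteins supported by at least `min_peptides` distinct peptides."""
    return sorted(
        name for name, peps in hits.items() if len(set(peps)) >= min_peptides
    )


# Hypothetical example values (not taken from the study).
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
toy_peptides = ["TAYIAK", "ISFVK"]
coverage = sequence_coverage(toy_protein, toy_peptides)
kept = identified_proteins({"P51": ["TAYIAK", "ISFVK", "SHFSR"], "ProteinX": ["QRQ"]})
```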

Immune recognition of major surface antigens by PHF-positive horse sera
Bacterial surface-exposed proteins are generally major antigens [47]. Although only Nsp3 was detected on the surface of N. risticii Illinois by Nano-LC/MS/MS, rNsp2 was included in the Western blotting studies because both Nsp3 and Nsp2 from N. sennetsu Miyayama are significant surface proteins (Figure 1, Table 4) [39]. All 15 PHF-positive samples demonstrated recognition of rP51, with 11 out of 15 sera having strong recognition. N. sennetsu Miyayama GroEL is 98% identical to N. risticii Illinois GroEL, and antisera to rGroEL of N. sennetsu cross-react with GroEL from multiple species of Rickettsiales, including N. risticii [41]. Six out of 15 PHF-positive serum samples demonstrated strong reactivity to rGroEL, with the rest having weak to no reactivity. Nsp2 and Nsp3 from N. sennetsu Miyayama are 83% and 84% identical to Nsp2 and Nsp3 from N. risticii Illinois, respectively, using protein-protein blastp. Only one serum sample reacted strongly to rNsp2, with the rest having weak to no reactivity. Three sera reacted strongly to rNsp3, with the rest having weak to no reactivity. None of the negative controls recognized any of the recombinant proteins.

Sequence variation in P51
P51 sequences are known to be strain variable [5,30]. Since P51 was found to be the major target of horse immune recognition, we examined in which part of the P51 molecule sequence variations occur. N. sennetsu P51 was predicted to have 18 transmembrane β-strands with nine external loops [39]. N. sennetsu and the SF agent, which are closely related to N. risticii [28,30,48], were included for comparison. P51 alignments of a total of 52 sequences and sequence fragments from N. risticii during a 26-year period throughout the United States revealed high variability within regions corresponding to external loops 2 and 4 (Figure 2). Forty-three P51 sequence fragments (aa 136-176) containing most of external loop 2 (aa 120-176), and 36 P51 sequence fragments (aa 259-286) containing the entire external loop 4, were analyzed using PHYLIP (Figure 3a and 3b). Both loops 2 and 4 created patterns of clustering for sequences from states in the Eastern and Midwestern United States (East/Midwest US) and sequences from Japan, Malaysia, and US states bordering the Pacific Ocean (Pacific coast). The California strain Doc and the Ohio strain 081 did not follow this pattern, both falling in the East/Midwest US cluster for external loop 2 and in the Pacific coast cluster for external loop 4. In external loop 2, N. risticii Illinois was only loosely associated with the other East/Midwest US sequences; in external loop 4, N. risticii Illinois tightly clustered with several East/Midwest US sequences. External loop 4 of 081 clustered with the SF agent strains rather than with other N. risticii strains.
a All samples, except for PA-1 and SF Oregon, are from naturally-infected horses. PA-1 is an isolate from an experimental equine infection utilizing N. risticii-infected insects from Pennsylvania [6]. Both 081 and OV are previously described strains of N. risticii with unique morphologies and sequences [5,20,22]. SF Oregon is a strain of the Stellantchasmus falcatus agent [30].
b The largest fragment size acquired containing the given gene(s) is shown. Multiple fragments may be present for a sample.

Sequence variation in Nsp2
Nsp2 sequences of N. risticii, other than the sequence from N. risticii Illinois, have not been determined. Nsp2 was predicted to have eight transmembrane β-barrel domains with four external loops. A total of 20 Nsp2 proteins and protein fragments were aligned. Amino acid variations were determined in relation to N. risticii Illinois. Variations mainly occurred in external loops, with the most variation occurring within external loop 4 (Figure 4a). Full-length Nsp2 (including the signal peptide), with 11 sequences total, as well as the external loop 4 region (aa 244-297), with 19 sequences total, were analyzed by PHYLIP (Figure 4b and 4c). For full-length Nsp2 and external loop 4, most N. risticii strains obtained after the year 2000 (post-2000 strains, Table 1) were 100% identical, whereas other strains were more diverse (Figure 4b and 4c). Nsp2 sequences of N. risticii Illinois and Herodia (which were 100% identical to each other) were distinct from those of all other N. risticii strains. For full-length Nsp2, 081 clustered with SF Oregon, rather than with other N. risticii strains. Additionally, external loop 2 (also demonstrating high variation) showed patterns of clustering similar to those seen in full-length Nsp2 and external loop 4; the exceptions were MN, which was 100% identical to N. risticii Illinois and Herodia, and OH07-4, which had one amino acid difference in comparison to the majority of post-2000 strains in this region (data not shown).
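The comparison described above, counting amino acid differences relative to N. risticii Illinois and asking whether they fall in predicted external loops, can be sketched in Python as follows. The alignment and the region boundaries are hypothetical toy values, not the Nsp2 data. Regions are given here as 1-based, inclusive alignment columns, and a dash (deletion) is counted as a difference, which is an assumption of this sketch.

```python
def variation_counts(alignment, reference):
    """Per alignment column, count sequences that differ from the reference."""
    ref = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    return [sum(seq[i] != ref[i] for seq in others) for i in range(len(ref))]


def variation_by_region(counts, regions):
    """Sum the per-column differences inside each named region.

    `regions` maps a region name to (start, end), 1-based and inclusive.
    """
    return {
        name: sum(counts[start - 1:end]) for name, (start, end) in regions.items()
    }


# Hypothetical toy alignment and region boundaries (not data from this study).
toy_alignment = {
    "Illinois": "MKTLAVGSPQ",
    "strainA": "MKTLSVGTPQ",
    "strainB": "MKTLAVG-PQ",
}
counts = variation_counts(toy_alignment, reference="Illinois")
per_region = variation_by_region(counts, {"barrel": (1, 4), "loop": (5, 8)})
```

In this toy case all three differences fall in the region labeled "loop", which mirrors the kind of concentration of variation in external loops reported for Nsp2.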

Sequence variation in Nsp3
Nsp3 sequences of N. risticii, except for the sequence from N. risticii Illinois, have also not been determined. Nsp3 was predicted to have eight transmembrane β-strands with four external loops. Alignment of a total of 21 Nsp3 proteins and protein fragments demonstrated the highest variation within predicted external loop 2, whereas there was less variation in the C-terminal region comprising external loops 3 and 4 (Figure 5a). Fourteen full-length Nsp3 sequences (including signal peptides) and 17 external loop 2 regions (aa 102-136) were analyzed by PHYLIP (Figure 5b and 5c). As seen in Nsp2, N. risticii Illinois differed markedly from other sequences, in particular from most post-2000 strains (Table 1). TN02-1 and IL01-1 had the highest similarity to N. risticii Illinois.

Sequence variation in Ssa3
Ssa3 sequences of N. risticii, other than that of N. risticii Illinois, have not been ascertained. Ssa3 was included in the analysis, since unknown Ssas were previously reported as major N. risticii surface antigens in the 1984 Maryland strain 25-D and the 1990 Maryland strain 90-12 [31], and a small amount of Ssa3 was detected both in N. risticii PA-1 in this study and in N. sennetsu Miyayama [39]. There was no signal peptide predicted for Ssa3 [38], and Ssa3 was not predicted to have a β-barrel structure. It was originally shown that ssas contain a wide variety of mainly small repeats of 10-55 bp in size [31]. Tandem repeats ranging in size from 63-156 bp are present in ssa1, ssa2, and ssa3 of N. risticii Illinois [38]. In particular, the N terminus of Ssa3 contains 2.2 copies of a 52-aa (156 bp) tandem repeat in N. risticii Illinois (aa   [38]. Thirteen Ssa3 proteins and protein fragments were aligned and compared (Figure 6a). Within this N-terminal repeated region, Neorickettsia spp. consisted of anywhere from zero to four repeated 52-aa peptides arranged in tandem followed by a terminal 40-aa peptide similar to the 52-aa repeats (for N. risticii Illinois: 50% identical, E-value = 6 × 10^-8, using protein-protein blastp). It appears that the number of 52-aa repeats increases over time; six post-2000 strains (Table 1)
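A fractional copy number such as the 2.2 copies quoted above can be read as the length of the repeat tract divided by the length of the repeat unit. The Python code below is a minimal sketch of that arithmetic, not the method used to annotate the genome. It counts consecutive copies of a known unit from the start of a region, accepts imperfect copies above an assumed identity threshold of 0.8, and counts a trailing partial copy fractionally. The 5-residue unit is a hypothetical stand-in for the 52-aa repeat.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter of two strings."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def tandem_copy_number(region, unit, min_identity=0.8):
    """Count consecutive copies of `unit` starting at the beginning of `region`.

    A trailing partial copy contributes its length divided by the unit length.
    """
    period = len(unit)
    copies = 0.0
    pos = 0
    while pos < len(region):
        chunk = region[pos:pos + period]
        if identity(chunk, unit) < min_identity:
            break  # the tandem tract ends at the first dissimilar chunk
        copies += len(chunk) / period
        pos += period
    return copies


# Hypothetical repeat unit; two full copies plus one residue of a third.
unit = "ACDEF"
n_copies = tandem_copy_number(unit * 2 + "A", unit)

# A 52-aa repeat corresponds to 52 codons, i.e. 156 bp.
repeat_bp = 52 * 3
```

This sketch does not handle insertions or deletions within a copy; a proper analysis would use an alignment-based tandem repeat finder.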

Sequence variation in Ssa1
Ssa1 sequences of N. risticii, other than that of N. risticii Illinois, have not been determined. Given the strongest similarities between ssa1 of N. risticii Illinois and the unknown ssas from N. risticii strains 25-D (isolated in 1984) and 90-12 (isolated in 1990) [38], two ssa1 fragments  Table 4.  (Figure 6b).

Discussion
The genes p51, nsp2, nsp3, and ssa3 are uniquely evolved in Neorickettsia spp. The gene p51 is a single-copy gene, and its product shows only loose associations with other proteins of the family Anaplasmataceae [37,38]. The nsps and ssas are both potential operons, each consisting of three genes tandemly arranged [38]. The nsps belong to pfam01617, and, similar to Ehrlichia chaffeensis omp-1 (p28) genes (also from pfam01617) [49], the proteins encoded by nsp2 and nsp3 were strain variable. As seen in the ssas, other members of the family Anaplasmataceae have genes encoding proteins containing strain-variable tandem repeats (involving amino acid variation and changes in the numbers of tandem repeats), including Trp120 (formerly gp120), Trp47 (formerly gp47), and VLPT (variable-length PCR target) from E. chaffeensis and Trp140 (formerly gp140), Trp36 (formerly gp36), and gp19 from Ehrlichia canis [50][51][52]. Of note, the proteins encoded by the ssas are not homologous to any proteins of the family Anaplasmataceae by blastp. Among p51, the nsps, and the ssas, there have been no signs of intragenomic recombination events such as those seen in the Anaplasma p44/msp2 expression locus [53,54]. Proteomics analysis of two strains of N. risticii established that P51 is a dominant surface-expressed protein. The recognition of recombinant P51 by PHF horse sera, even by sera with an IFA titer of 1:80, suggests that P51 is expressed and highly recognized in present-day naturally-infected horses. Despite P51 amino acid sequence variation among N. risticii strains, this strong universal recognition by horse immune sera suggests that rP51 may serve as a defined serodiagnostic antigen. Furthermore, the study suggests that there are immunodominant conserved peptide sequences within P51 which might serve as even more specific PHF diagnostic antigens.
Sequence comparison of these surface-exposed proteins among N. risticii strains, the majority of which are clinical isolates, with respect to the predicted protein secondary structure indicates that there are hot spots within the genes with greater strain divergence. These include external loops 2 and 4 in P51, external loop 4 in Nsp2, external loop 2 in Nsp3, and the repeated region of Ssa3. P51 showed strong geographical association; and    Table 1.
There are outlier strains which do not fit the geographical and temporal patterns. These include 081 [20,22], the Kentucky strain OV [22], and the Kentucky strain Herodia. Unique sequences in other N. risticii strains, such as TN02-1 (P51, Nsp2, and Nsp3), KY03-3 (Nsp2), IL01-1 (Nsp3), and OH10-1 (Ssa3), suggest that variation contrary to the prevailing geographical and temporal influences may be more widespread. When additional contemporary sequences and sequences from more varied geographic regions become available, these analyses are expected to improve.
Possible explanations for extensive DNA sequence variation within Neorickettsia include the defective DNA repair systems in both N. risticii and N. sennetsu [37,38]. This would result in higher mutation rates for Neorickettsia [56], which would agree with the temporal changes and the production of outlier strains of N. risticii. P51 variation showed substantial geographical association, suggesting these variations were selected under local environmental pressures. It is possible that geographical association of N. risticii sequence variation is due to N. risticii strains being selected within essential reservoir trematode populations. In addition, diverse N. risticii strains may have emerged due to selective pressures inflicted on the infected trematodes and/or on the trematodes' hosts [4][5][6][7][8][9][57][58][59]. Humoral immunity would thus not play any direct role in creating genetic diversity within N. risticii populations. Since Neorickettsia spp. are known (N. risticii and N. helminthoeca) or suspected to be vertically transmitted within their trematode hosts [8,13,60], mammalian infection is not expected to be required for maintaining Neorickettsia in the natural environment.
Regardless of the cause, this genetic variation would result in increased N. risticii survival as a species. The N. risticii surface protein genetic diversity revealed in the present study will help in understanding variations in PHF virulence and clinical signs. It may also be possible to use this new molecular knowledge for vaccine development. This would, however, necessitate taking into account that N. risticii is an obligate intracellular pathogen, indicating that not only humoral immune responses but also cell-mediated immunity would play an active role in preventing bacterial infection [61][62][63].
Genes encoding the two original Ssas, called P85 (90-12) and P50 (25-D), are most related to ssa1 from N. risticii Illinois [24,31,38,55], but they also show similarities to ssa2 and the non-coding region between ssa1 and ssa2 using blastn. Although both are Maryland isolates, the 25-D strain was isolated six years earlier than the 90-12 strain [31], suggesting both temporal variation and the potential development of chimeras of multiple Ssas and non-coding regions in P50, P85, and post-2000 Ssa1 (due to the similarities of PA-1 and OH07-1 Ssa1 fragments to P85). It is possible that the high variability of Ssa1 may have prevented PA-1 Ssa1 from being identified by proteomics. However, there is an obvious lack of large numbers of peptides identified by proteomics for Ssas in N. risticii Illinois using the isogenic Illinois strain sequence data and in N. sennetsu using Miyayama isogenic strain data [39]. It is likely that Ssas are not dominant surface proteins in mammalian cells.
In conclusion, our data demonstrate the variety present within major surface proteins of N. risticii, and they suggest conservation among geographical regions and time periods. In addition, P51 is implicated as the major surface antigen of N. risticii. These data will be valuable in developing better diagnostic methods and may help in the development of more efficacious vaccines.