Different regions of the CSFV genome have been proposed for phylogenetic analysis, namely fragments of the 5´NTR as well as partial E2 and NS5B encoding regions [7, 8, 16, 17, 36]. During the past two decades, sequencing of the 5´NTR and E2 fragments became the worldwide accepted standard for the characterization of CSFV isolates, although this strategy has several limitations, mainly due to the short length of these regions. Today, new technological developments like next-generation sequencing allow rapid determination of full-length genome sequences, but because of limited access and high costs, such techniques will remain restricted to a limited number of institutions and a small number of selected CSFV isolates in the near future. Against this background, rapid and reliable diagnostics in outbreak situations will still rely on the analysis of adequate, shorter genomic regions on the basis of an internationally harmonized standard.
To establish an improved strategy for CSFV phylogeny, the 5´NTR-E2 sequences of 33 CSFV isolates from the virus collection held at the EU and OIE Reference Laboratory for CSF (EURL) were determined in this study and used for comparative sequence analyses. For all isolates, including representatives of the three major genotypes, specific amplicons were generated by RT-PCR using conserved primers. The isolates include frequently requested reference strains, isolates of rare CSFV genotypes and isolates obtained from recent CSF outbreaks (e.g. in Slovakia, Hungary and Lithuania). Isolates of all known subgenotypes could not be included, as some subgenotypes (e.g. 3.1, 3.2 and 3.3) are very difficult to obtain and are not represented in the virus collection of the EURL. For most of the sequenced isolates, only the short 5´NTR (150 nt) and E2 fragment (190 nt) sequences were previously available. The 5´NTR-E2 sequences (3508–3510 nt) reported in the present study therefore add substantial sequence information to this collection of CSFV isolates. The majority of CSF outbreaks that occurred in Europe during the past decades were caused by genotype 2 viruses. Consequently, mainly genotype 2 isolates were sequenced, comprising 19 isolates of subgenotype 2.3 and five isolates each of subgenotypes 2.1 and 2.2. Furthermore, 5´NTR-E2 sequences were determined for the two distinct isolates “Congenital Tremor” (CSF0410, no assigned genotype) and “Kanagawa” (CSF0309, genotype 3.4), the reference strain “Brescia” (CSF0947, genotype 1.1) and one Malaysian isolate (CSF0306) of the rare genotype 1.3.
Based on the complete 5´NTR-E2 sequences determined in this study and 22 additional sequences obtained from GenBank, all CSFV isolates were assigned to established genotypes and subgenotypes (Figure 3). Our analyses revealed that CSFV “strain 39” [GenBank: AF407339], which has previously been described as a natural recombinant of parental subgenotype 1.1 and 2.1 isolates, actually represents a chimera of subgenotype 1.1 and 2.2 isolates (Figure 3, Figure 4). Furthermore, strain The Netherlands/xxxx “Bergen” (CSF0906, subgenotype 2.2) displayed, in parts of its sequence, a higher genetic similarity to some subgenotype 2.1 isolates (e.g. CSF0021) than to other 2.2 isolates (data not shown). This observation may hint at a recombination event between subgenotype 2.1 and 2.2 isolates and is under further investigation. Consequently, strain The Netherlands/xxxx “Bergen” (CSF0906) might blur the separation of 2.1 and 2.2 isolates when further isolates of these subgenotypes are added to phylogenetic analyses.
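A partial switch in closest relatives, as reported for “Bergen”, is the kind of signal that a sliding-window similarity comparison (in the spirit of SimPlot-style recombination screening) is designed to reveal. The Python sketch below is not the method used in this study; it assumes sequences that are already aligned to equal length, and the function names, window settings and toy sequences are hypothetical.

```python
def window_identity(query: str, reference: str, window: int, step: int) -> list[float]:
    """Fraction of identical sites in each window along two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored; a window with
    no comparable columns is scored 0.0 so that all profiles stay aligned.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(query[start:start + window], reference[start:start + window])
            if a != "-" and b != "-"
        ]
        same = sum(a == b for a, b in pairs)
        scores.append(same / len(pairs) if pairs else 0.0)
    return scores


def closest_reference_per_window(
    query: str, references: dict[str, str], window: int, step: int
) -> list[tuple[int, str]]:
    """Return (window start, name of the most similar reference) for each window."""
    if not references:
        raise ValueError("at least one reference sequence is required")
    profiles = {name: window_identity(query, seq, window, step) for name, seq in references.items()}
    starts = range(0, len(query) - window + 1, step)
    return [
        (start, max(profiles, key=lambda name: profiles[name][i]))
        for i, start in enumerate(starts)
    ]


# Toy example: the query resembles reference "A" in its first half and "B" in its second half
toy_query = "AAAAACCCCC"
toy_refs = {"A": "AAAAAGGGGG", "B": "TTTTTCCCCC"}
print(closest_reference_per_window(toy_query, toy_refs, window=5, step=5))
```

A change of the best-matching reference between adjacent windows, as in this toy case, is only a first hint; formal recombination tests would be needed before interpreting it as a recombination event.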
Variability and length of the analyzed sequences are crucial parameters for the reliability of phylogenetic analyses. The overall variability observed for the different genomic regions is remarkably uniform (Table 4). Exceptions are the more conserved 5´NTR fragment and the slightly more variable E2 fragment. Consequently, it is sequence length rather than variability that appears to be crucial for optimizing the resolution and confidence levels of CSFV phylogeny. The low variability of 9% (14 of 150 nucleotide positions) in combination with the short length of 150 nt explains the intrinsic limitation of the 5´NTR fragment for phylogenetic analyses. Owing to its variability, the 190 nt E2 fragment has the greatest intrinsic discriminatory ability among the 5´NTR, E2 and NS5B fragments mentioned above. The E2 fragment encodes the N-terminal part of the E2 protein, which harbours several neutralizing epitopes and is therefore subject to selective pressure [22, 37–39]. When comparing the variability of the sequences encoding the major immunogen E2 with that of sequences encoding other viral proteins such as Npro, E1 or C, which do not elicit a detectable immune response upon infection, it can be concluded that selection pressure mediated by specific immune reactions is not a major cause of E2 divergence, since the overall sequence divergence in other genomic regions reaches similar levels (Table 4). Nevertheless, it can be speculated that a lack of antigenic selection pressure might explain why Npro- and E1-based analyses fail to discriminate subgenotype 1.1 and 1.2 isolates (data not shown). Genotype 1 represents an old and therefore highly variable CSFV genotype. Antigenic selection pressure might have been an important force in the development of subgenotypes 1.1 and 1.2, while sequence divergence is less pronounced in genomic regions encoding less immunogenic proteins like Npro and E1.
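The variability figures quoted here are proportions of variable alignment positions, e.g. 14 of 150 positions (about 9%) for the 5´NTR fragment. A minimal Python sketch of such a count is shown below; it assumes a pre-computed multiple alignment and, as a simplifying assumption, ignores gaps and ambiguous bases when deciding whether a column is variable, which may differ from the criteria applied for Table 4. The function name and toy alignment are hypothetical.

```python
def count_variable_positions(alignment: list[str]) -> tuple[int, int]:
    """Return (number of variable columns, alignment length).

    A column counts as variable if it contains more than one of A, C, G, T;
    gaps and ambiguous bases are ignored.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    variable = sum(
        1
        for column in zip(*alignment)
        if len({base.upper() for base in column} & set("ACGT")) > 1
    )
    return variable, length


toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
v, n = count_variable_positions(toy_alignment)
print(f"{v}/{n} variable positions = {v / n:.0%}")
print(f"5'NTR fragment: 14/150 = {14 / 150:.0%}")
```

The toy alignment has two variable columns out of eight (25%), and the second line reproduces the rounded 9% given for the 5´NTR fragment.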
In the present study, analysis of the genetic variability of the regions encoding the individual viral proteins (46% variable positions overall) did not identify any region of adequate length that is more variable than the 504 nt Npro encoding sequence and the 190 nt E2 fragment (50% variable positions). Taking into account the limitations of the short 5´NTR fragment described above, as well as the limitations of the Npro and E1 encoding sequences for CSFV phylogeny, extending the short E2 fragment to the full-length E2 encoding sequence is an excellent strategy for obtaining data for reliable and detailed phylogenetic analyses (Figure 4).
Calculation and analysis of genetic distances based on full-length E2 encoding sequences revealed that distances of more than 15% separate genotypes, whereas distances of less than 14% are found at the subgenotype and isolate level (Figure 2). These thresholds will probably not remain stable as the number of analyzed sequences increases. Furthermore, it was not possible to define universally valid breakpoints between the isolate and subgenotype levels. Discrimination of the isolate and subgenotype categories based on previously reported ranges for the NS5B fragment (4.5% and 10.5% genetic distance, respectively) is not supported by the analyses of the present study.
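Read as a rule, the ranges observed here could be applied to a pairwise distance as in the following Python sketch. The function and constant names are hypothetical, distances are expressed as fractions (0.15 = 15%), and, in line with the caveats above, the thresholds are provisional: values between 14% and 15% are left unresolved, and no attempt is made to separate the subgenotype from the isolate level.

```python
# Provisional thresholds taken from the full-length E2 distances observed in this
# study; they are expected to shift as more sequences are analyzed.
GENOTYPE_MIN = 0.15  # distances above this value separate genotypes
WITHIN_MAX = 0.14    # distances below this value occur within a genotype


def distance_level(distance: float) -> str:
    """Map a pairwise full-length E2 distance (as a fraction) to a level."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance > GENOTYPE_MIN:
        return "different genotypes"
    if distance < WITHIN_MAX:
        # No universally valid subgenotype/isolate breakpoint could be defined
        return "same genotype (subgenotype or isolate level)"
    return "unresolved (14-15%)"


for d in (0.21, 0.08, 0.145):
    print(f"{d:.1%}: {distance_level(d)}")
```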
For phylogenetic analysis, a standardized method of tree calculation is desirable to improve the comparability of internationally published data. In the present study, genetic distances calculated with the Kimura 2-parameter method and phylogenetic trees generated with the Neighbor Joining method, subsequently rooted at the strain “Congenital Tremor” (CSF0410), which represents the isolate most distinct from all other CSFV isolates known so far, led to appropriate tree topologies and reliable confidence levels (Figure 3, Figure 4). Phylogenetic trees generated with either full-length E2 encoding sequences or 5´NTR-E2 sequences showed the same segregation of CSFV isolates into genotypes and subgenotypes. Compared with full-length E2 sequences, the 5´NTR and E2 fragment sequences currently used for phylogenetic analyses are considerably less suited for differentiating and tracing CSFV isolates. In the case of the 5´NTR fragment, both sequence length and intrinsic variability are too low; in the case of the E2 fragment, the short sequence length markedly limits the information content and consequently diminishes the confidence levels of many groupings. The data presented in Figure 3 and Table 5 demonstrate the limited ability of 5´NTR-based trees to differentiate between isolates within a given subgenotype. In addition, analysis of the 5´NTR fragment fails to segregate isolates into defined subgenotypes, as observed for 1.1 and 1.2. This problem has also been recognized previously for other isolates of genotype 1. Segregation within genotype 1 can be improved by using the E2 fragment, but within a subgenotype such as 2.3, the ability to differentiate closely related isolates (e.g. the Slovakian isolates) is still insufficient (Figure 4).
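For reference, the Kimura 2-parameter distance combines the proportions of transitional (P) and transversional (Q) differences as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes this distance for one pair of aligned sequences; treating gaps and ambiguous bases by pairwise deletion is an assumption, since the handling used for the trees in this study is not stated here, and the function name is hypothetical. Neighbor Joining tree construction and rooting are not shown and would normally be left to established phylogenetics software.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or a non-ACGT character in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(f"{kimura_2p('AAAAAAAAAA', 'GAAAAAAAAA'):.4f}")  # one transition in 10 sites
print(f"{kimura_2p('AAAAAAAAAA', 'CAAAAAAAAA'):.4f}")  # one transversion in 10 sites
```

Because the correction weights transitions and transversions differently, a single transition and a single transversion over the same ten sites give slightly different distances.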
Moreover, the trees generated with the E2 fragment sequences display only very low confidence levels, which allow neither a further division of the established subgenotypes nor a reliable epidemiological interpretation. The high similarity among European isolates, mainly belonging to genotype 2, makes a strategy based on longer sequences an absolute necessity. This is illustrated by the following examples of CSFV isolates that could not be distinguished on the basis of the short 5´NTR sequences (Table 5). Across the analyzed 5´NTR-E2 sequences, isolates CSF0277 (Germany, 1997) and CSF0283 (The Netherlands, 1997) differed at two sites, one of them located in the E2 encoding sequence. These isolates were obtained from a cross-border epidemic and have a direct epidemiological link. Isolates CSF1027 and CSF1032 were obtained from wild boar during the 2007 epidemic in Slovakia and Hungary, respectively, and displayed two nucleotide differences in the E2 encoding sequence. Closely related virus isolates obtained from different German CSF outbreaks in the 1990s (CSF0083 and CSF0600; CSF0485 and CSF0638) were clearly distinguishable on the basis of full-length E2 encoding sequences (Figure 4, Table 5). Furthermore, isolates with a high degree of sequence similarity but no epidemiological link (e.g. isolates “LOM” and “Alfort187”) also illustrate the discriminatory ability of full-length E2 encoding sequences. These examples, together with recent experience from the Lithuanian outbreaks in 2009 and 2011, clearly demonstrate that analysis of full-length E2 encoding sequences allows discrimination even between very closely related virus isolates from the same epidemic and from (nearly) the same geographical origin (Figure 5).
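Comparisons like those in Table 5 amount to listing the positions at which two aligned sequences differ and checking which genomic region each difference falls into. The Python sketch below does this; the function name is hypothetical, the region coordinates in the example are placeholders rather than the actual 5´NTR or E2 positions, and skipping gap columns is a simplifying assumption.

```python
def differing_sites(
    seq1: str, seq2: str, regions: dict[str, tuple[int, int]]
) -> list[tuple[int, str, str, str]]:
    """List (position, base in seq1, base in seq2, region) for each mismatch.

    Positions are 1-based alignment coordinates. `regions` maps a region name
    to an inclusive (start, end) interval; mismatches outside every interval
    are labelled "other". Columns with a gap in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for pos, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()), start=1):
        if a == b or a == "-" or b == "-":
            continue
        region = next(
            (name for name, (start, end) in regions.items() if start <= pos <= end),
            "other",
        )
        diffs.append((pos, a, b, region))
    return diffs


# Hypothetical coordinates for a 10-column toy alignment
toy_regions = {"5'NTR": (1, 4), "E2": (7, 10)}
for site in differing_sites("ACGTACGTAC", "ACGAACGTTC", toy_regions):
    print(site)
```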
Assuming a mutation rate of 3.3 × 10⁻³ to 3.7 × 10⁻³ substitutions/nucleotide/year for the E2 encoding sequence, as estimated for the E2 fragment [7, 15], approximately 0.6–0.7 nucleotide exchanges per year may be expected in the short E2 fragment (190 nt) and 3.7–4.1 in the complete E2 encoding sequence (1119 nt). Although analysis of full-length E2 encoding sequences substantially increases the information obtained, this mutation rate is probably too low for exact determination of infection chains.
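The expected numbers follow from multiplying the rate by sequence length and elapsed time. The Python sketch below reproduces the figures given above; the function and constant names are hypothetical, and it assumes a constant rate over time (a strict molecular clock), which need not hold in real CSF epidemics.

```python
def expected_substitutions(rate: float, length_nt: int, years: float = 1.0) -> float:
    """Expected substitutions = rate (per nucleotide per year) x length x years.

    Assumes a constant rate, i.e. a strict molecular clock.
    """
    return rate * length_nt * years


RATE_LOW, RATE_HIGH = 3.3e-3, 3.7e-3  # substitutions/nucleotide/year [7, 15]

for label, length in [("E2 fragment", 190), ("complete E2", 1119)]:
    low = expected_substitutions(RATE_LOW, length)
    high = expected_substitutions(RATE_HIGH, length)
    print(f"{label} ({length} nt): {low:.1f}-{high:.1f} substitutions per year")
```

This prints 0.6-0.7 substitutions per year for the E2 fragment and 3.7-4.1 for the complete E2 encoding sequence, matching the values in the text.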
To date, both fragments, 5´NTR and E2, are routinely amplified and sequenced for the identification and characterization of novel CSFV isolates. The recent CSF outbreaks in Lithuania demonstrated that sequencing the 5´NTR and E2 fragments could neither differentiate between isolates obtained during the outbreaks in 2009 and 2011 nor detect differences between isolates originating from different outbreak holdings in 2011 (Figure 5). In contrast, phylogenetic analysis of full-length E2 encoding sequences allowed discrimination of the 2009 and 2011 Lithuanian isolates and identified significant differences between the isolates of case no. 4 and those of the four other cases. These results suggest that the index case was the source of virus transmission for outbreaks no. 2, 3 and 5, while it can be speculated that the virus of case no. 4 was introduced either after additional (undetected) transmission steps or from another source. To allow a reliable interpretation of this finding, more full-length E2 encoding sequences from different CSF epidemics need to be analyzed together with the corresponding epidemiological information. Against this background, molecular clock analyses of sequences obtained from well-documented CSF epidemics would be highly desirable and will be the aim of future studies. Such analyses need to take into account that the speed of virus evolution is influenced by many factors, including host immunity, vaccination campaigns, the presence of virus reservoirs, the number of passages in hosts and, not least, socio-economic determinants. Nevertheless, even without detailed knowledge of the speed of molecular evolution in CSF epidemics, analysis of full-length E2 encoding sequences provides valuable information on the origin of virus introduction, as it increases the probability of identifying the ancestral virus isolate.
In the case of the two Lithuanian outbreaks in 2009 and 2011, identical isolates would have indicated an arrested molecular clock, as in infectious material that has been kept frozen (frozen meat, frozen laboratory isolates, etc.). This scenario could be clearly excluded by analysis of the full-length E2 encoding sequences. The Lithuanian example thus illustrates the benefit of phylogenetic analysis of full-length E2 encoding sequences for molecular virus tracing.
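Under the mutation rate assumed above, the chance that two isolates separated by about two years of evolution are still identical can be approximated with a Poisson model, P(no substitution) = exp(-rate × length × years). The Python sketch below is an illustrative calculation, not part of the analyses in this study; the Poisson model, the strict clock, a direct ancestor-descendant relationship and a two-year interval between the 2009 and 2011 isolates are all simplifying assumptions, and the function name is hypothetical.

```python
import math


def prob_identical(rate: float, length_nt: int, years: float) -> float:
    """Probability of zero substitutions under a Poisson model.

    Assumes a constant rate (strict clock) and a direct ancestor-descendant
    relationship, so that only `years` of evolution separate the sequences.
    """
    expected = rate * length_nt * years
    return math.exp(-expected)


RATE = 3.3e-3  # lower bound of the assumed rate, substitutions/nucleotide/year
for label, length in [("E2 fragment", 190), ("complete E2", 1119)]:
    p = prob_identical(RATE, length, years=2)
    print(f"{label}: P(identical after 2 years) = {p:.4f}")
```

With these assumptions, identical E2 fragment sequences would be unremarkable (roughly 0.29), whereas identical complete E2 sequences would be unlikely (about 0.0006), in line with the reasoning that identity of full-length E2 sequences after two years would point to an arrested molecular clock.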
Taken together, the proposed strategy based on complete E2 encoding sequences allows a clear assignment of CSFV isolates to subgenotypes, yields reliable groupings with high bootstrap support and even enables the discrimination of highly similar virus isolates without requiring more time or higher costs.