Host-range shift of H3N8 canine influenza virus: a phylodynamic analysis of its origin and adaptation from equine to canine host

Prior to the emergence of H3N8 canine influenza virus (CIV) and the latest avian-origin H3N2 CIV, there was no evidence of a circulating canine-specific influenza virus. Molecular and epidemiological evidence suggest that H3N8 CIV emerged from H3N8 equine influenza virus (EIV). This host-range shift of EIV from equine to canine hosts and its subsequent establishment as an enzootic CIV is unique because this host-range shift was from one mammalian host to another. To further understand this host-range shift, we conducted a comprehensive phylodynamic analysis using all the available whole-genome sequences of H3N8 CIV. We found that (1) the emergence of H3N8 CIV from H3N8 EIV occurred in approximately 2002; (2) this interspecies transmission was by a reassortant virus of the circulating Florida-1 clade H3N8 EIV; (3) once in the canine species, H3N8 CIV spread efficiently and remained an enzootic virus; (4) H3N8 CIV evolved and diverged into multiple clades or sublineages, with intra and inter-lineage reassortment. Our results provide a framework to understand the molecular basis of host-range shifts of influenza viruses and that dogs are potential “mixing vessels” for the establishment of novel influenza viruses.


Introduction
Influenza A virus (IAV) is a member of Orthomyxoviridae. The viral genome consists of eight negative-sensed RNA gene segments. According to the antigenicity of the two viral surface proteins, haemagglutinin (HA) and neuraminidase (NA), IAV is currently classified into 17 HA and 9 NA subtypes. All combinations of the 17 HA and 9 NA subtypes have been found in waterfowl, its natural host [1]. However, only a few defined combinations of HA and NA subtypes circulate in mammalian hosts, e.g., H3N2 and H1N1 among humans and pigs and H3N8 among horses. The establishment of these mammalian influenza viruses, including human influenza A virus (hIAV), was the result of interspecies transmission of an avian influenza virus (AIV) in the past, either directly or through reassortment with circulating contemporary virus. Current evidence supports that interspecies transmission of influenza viruses occurs frequently; however, these occurrences usually result in "dead-end transmission", possibly due to lack of viral adaptation to the new host species. The roles of the HA, NP, NS, PB1 and PB2 genes in adaption to mammalian hosts have been implicated [2][3][4][5][6][7][8][9]. However, these studies mostly focused on the interspecies transmission of AIV to humans or experimentally inoculated mouse models.
Prior to 2003, there was no evidence of circulation of a canine-specific influenza virus despite close contact with humans and with other mammalian species, such as swine and equines. Most influenza viruses can be cultivated in Madin-Darby Canine Kidney (MDCK) cells, indicating the potential susceptibility of dogs to influenza virus infection, at least at the cellular level. In fact, there were reports of isolation of H3N2 human influenza A virus (hIAV) from dogs during influenza epidemics [26], as well as serological evidence of hIAV infections in dogs [27]. Natural and experimental transmission of pandemic H1N1/2009 influenza virus to dogs has been reported [28]. Notably, there were reports of H3N8 EIV infection in a British foxhound [29] as well as in dogs in the United States [30]. These reports suggest that dogs can be a host for influenza A virus. However, these influenza infections were "dead-end" infections, as there was no further sustained spread among dogs.
The emergence of H3N8 CIV in Florida during 2003 [31,32] was unique. The high genetic homology of the emerging H3N8 CIV strain to the circulating H3N8 EIV of the Florida-1 clade and the geographic coincidence indicated that H3N8 CIV was likely a result of a hostrange shift event. Interestingly, H3N8 EIV had been isolated from pigs [33], indicating the susceptibility of pigs to this virus. Interestingly, there was no seroconversion after the experimental inoculation of pigs with H3N8 EIV [34]. There is an apparent species-barrier for H3N8 EIV in pigs, although pigs have been implicated as a "mixing vessel" for influenza viruses [1,35,36]. Furthermore, there were no phenotypic differences between H3N8 CIV and H3N8 EIV despite their diverged evolution [37]. Currently, H3N8 CIV is expanding in its geographic distribution and canine host breeds. In addition, CIV has evolved and diverged into multiple lineages and clades.
A significant knowledge gap exists on how and why H3N8 EIV was able to cross the species barrier from equine to canine to become an enzootic CIV. In this study, we performed an extensive and in-depth phylodynamic analysis to address these questions.

Sequence information CIV and EIV whole genome dataset compilation
We conducted an exhaustive search for H3N8 CIV complete genome sequences from the Global Initiative on Sharing All Influenza Data (GISAID) database to generate a working dataset (Additional file 1). A total of 119  HA, 206 M1, 89 NA, 73 NP, 203 NS, 54 PA, 56 PB1, 51   PB2 gene sequences were complied. Likewise, for H3N8  EIV, a total of 349 HA, 212 M1, 331 NA, 170 NP, 233  NS, 153 PA, 158 PB1, and 157 PB2 gene sequences were complied. Eight datasets were created by combining each of the eight gene segments from H3N8 EIV and CIV for subsequent phylogenetic analysis. Phylogenetic analysis of the eight complete datasets was performed using the maximum likelihood (ML) method, followed by a second phylogenetic analysis using a refined subset of gene sequences comprising all H3N8 CIVs and their closely related EIV lineages using RAxML [38]. This refined subset was then analysed to identify the origin of H3N8 CIV.

H3N8 CIV datasets for evolutionary adaptation analysis
After excluding partial sequences and non-full-length genome data, a total of forty-four (n = 44) complete genomes of H3N8 CIVs were included. The length of each gene segment was as follows: HA: 1695 nucleotides (nt); MP: 756 nt; NA: 1407 nt; NP: 1494 nt; NS: 690 nt; PA: 2148 nt; PB1: 2271 nt; and PB2: 2277 nt. For simplicity, only M1 and NS1 were included for analysis despite their alternative internal RNA splice sites. In all, a total of 12,738 nt for the eight gene segments were analysed.

Alignment and model selection
Multiple sequence alignments (MSAs) were generated using MUSCLE (version 3.8.31) [39], followed by manual editing using MEGA (version 7.0) [40]. The best-fit model of nucleotide substitution for each sequence dataset was selected using jModelTest [41] according to the Bayesian information criterion (BIC) score. TempEst (version 1.5.1) was used to analyse the temporal signal in the refined dataset sequences.

Phylogenetic and evolutionary dynamics analyses
All of the maximum-likelihood (ML) trees were constructed by RAxML (version 8.2.4) using the general time reversible (GTR) model and a gamma distribution (G) to account for the rate heterogeneity among sites, and 1000 bootstrap replications were performed [42]. The divergence time between EIV and CIV was estimated using the Bayesian MCMC method implemented in the BEAST (version 1.8.4) package. The GTR or Hasegawa-Kishino-Yano (HKY) model plus gamma distributed rate heterogeneity were used in the nucleotide substitution model. The lognormal relaxed molecular clock was used as the molecular clock model [43], and the Bayesian skyline coalescent model was set as the tree prior. Markov Chain Monte Carlo (MCMC) sampling was run for 10 8 generations, with trees and posteriors sampled every 10 4 steps. Each gene dataset was independently run twice and combined using LogCombiner. After a burn-in of 10%, the final maximum clade credibility (MCC) tree was summarized using TreeAnnotator [44] and visualized using Figtree [43]. In addition, the Bayesian inference (BI) tree of each gene segment was built using MrBayes with the HKY + G model, 10 7 generations and a sampling frequency of 10 3 [45]. After removing the reassortment segment, the time of most recent common ancestor (tMRCA) and evolutionary rates were estimated using the Bayesian MCMC method.

Amino acid substitutions and U content analysis
Consensus sequences of different H3N8 CIV clades were aligned, and corresponding mutations (deviations from the consensus) were identified. The mutation position in each clade was confirmed using MESQUITE (a modular, extendible software for evolutionary biology). The number of amino acid changes in each enzootic cluster was counted. The U content of the HA and NA gene segments from multiple influenza A viruses was calculated using BioEdit [46]. Regression and correlation analyses were performed using GraphPad version 7.

Selection analysis
The MCC trees (Figures 1C, D, 2 and 5) were used as the input reference trees in DATAMONKEY [47], which was used to estimate selection pressures. The SLAC (Single Likelihood Ancestry Counting), FEL (Fixed Impact Probability), MEME (Evolutionary Mixed Effects Model) and FUBAR (Fast, Unconstrained Bayesian Approximation) methods were used to identify codons under positive selection. The branch-site REL model was used to determine the selection pressure along the branches [48][49][50]. We considered p values of SLAC, FEL and MEME less than 0.1 and a FUBAR posterior probability > 0.9 as significant, and only sites supported by at least three methods were reported. To reduce the bias due to reassortment, a further refined dataset was produced by removing suspected sequences. The mean ratio of nonsynonymous substitution per site (dN/dS) was then calculated using the SLAC method.

Origin of H3N8 CIV
In agreement with published reports, both temporal-spatial and molecular evidence suggest that H3N8 CIV originated from H3N8 EIV. Figure 1 shows the phylogenies of the gene segments for the surface proteins, HA and NA; Figures 1A, B show the initial ML trees, and Figures 1C, D were generated by selecting subtrees that included all H3N8 CIVs and related lineages to identify the origin. Similarly, the phylogenies of the other six internal genes (M1, NP, NS1, PA, PB1, PB2), as shown in Figure 2, further support the origin of CIV from EIV. In addition, the original ML trees also indicated that each gene segment of H3N8 CIV was closely related to the H3N8 EIV lineage ( Figure 3), except for A/Florida/242/2003, A/Florida/15592.1/2004, and A/Florida/43/2004, which were not clustered with the other CIV segments in the NS1 gene tree ( Figure 3C). These isolates were among the "emerging clade". Therefore, the emerging H3N8 CIV appeared to originate from a reassortant H3N8 EIV. Further evidence is provided below.

Phylogenetic and evolution dynamics of H3N8 CIV
From the dataset of the 44 full viral genome sequences, after splicing for segments M and NS, the segments were concatenated (HA, M1, NA, NP, NS1, PA, PB1, and PB2) for each virus, followed by generation of a maximumlikelihood (ML) tree ( Figure 4). According to the topology of this concatenated ML tree, H3N8 CIV could be divided into six major clades. Furthermore, a regression analysis using the root-to-tip distance of the ML tree of the full-length genome ( Figure 4 insert) showed that the R 2 was 0.61, indicating a somewhat linear relationship between nucleotide divergence and time, hence satisfying the criterion for Bayesian analysis. The MCC trees for each of the eight gene segments were subsequently generated, as shown in Figure  A Bayesian MCMC method was then utilized to estimate the evolutionary rate for each gene segment (Figure 6). Despite the small sample size, the MCC trees of each gene segment exhibited similar topology and maintain five stable clades for M1, NP, PA, PB1, and PB2. However, an additional clade VI exists for HA, NA, and NS. Furthermore, as shown in Figure 5, intrasubtypic reassortment was detected among these six clades. To achieve accurate dating of each gene segment, reassortant sequences were first removed from the dataset. The time of the most recent common ancestor (tMRCA) and the time of divergence were computed as described in the Materials and methods. As shown in Table 1 Table 2. There were variations among the six identified CIV clades (I to VI). Some were unique for specific clades, e.g., L50I was unique for clade II. Some were "fixed" for later clades, e.g., V156I and L185F for clades IV to VI. Some had more than one variation, e.g., P212H for clade II and P212S for clade V. Some of these unique amino acid residues were within identified functional domains, e.g., a nuclear transport signal (137-146): 140G and 140R for clade I and clade II to VI, respectively [51].  Taken together, the emergence of CIV from EIV was not from a whole EIV progenitor virus per se but by a "reassortant" virus involving a unique NS gene segment.
We also calculated the evolutionary rates for different H3N8 CIV gene segments. Overall, after removing the reassortant sequences of each segment, the eight gene segments evolved at similar rates over the 13 years since the first report in dogs. The evolutionary rate of HA was 1.83 × 10 −3 (95% HPD 1.3-2.37 × 10 −3 substitutions/site/ year) and that of NA was 2.09 × 10 −3 (95% HPD 1.54-2.68 × 10 −3 substitutions/site/year). For the other gene segments, the evolutionary rates ranged from 0.85 to 2.58 × 10 −3 substitution evolutionary rates/site/year (Figure 6). In general, the evolutionary rate for each gene segment was higher in H3N8 CIV than in H3N8 EIV [52]. However, these rates were at the low end of the 95% HPD when compared to that of H3N2 hIAV [53].
To reconstruct the population dynamics for the emergence of H3N8 CIV, a skyline plot was constructed. As shown in Figure 7, the effective population size of H3N8 CIV increased sharply between 2005 and 2006 for almost all the gene segments, with the exception of NS1. In Figure 6 Evolutionary rates of canine, equine, and human IAVs. Evolutionary rates of each segment of H3N8 CIV, H3N8 EIV, H3N2 CIV, and H3N2hIAV. The evolutionary rates of each segment were simulated using the BEAST (V1.8.4) program with a GTR plus gamma nucleotide substitution mode, relaxed molecular clock, and a prior coalescent Bayesian skyline tree with 10 8 generations.   addition, this sharp increase coincided with the time of divergence of H3N8 CIV into multiple clades (I to V).

Adaptation of H3N8 CIV in canine species
To investigate the molecular changes that occurred in H3N8 CIV after interspecies transmission to the new host species, changes in the amino acid residues for the eight viral proteins, including HA, M1, NA, NP, NS, PA, PB1 and PB2, were analysed. Variations in the amino acid sequences are tabulated in Additional file 3. Regarding the HA gene of CIV and EIV (the positioning or numbering of amino acid residues in the HA gene is based on the starting codon for H3 HA), as previously described by Wen et al. [54], there was a W222L substitution at the receptor-binding site in all CIV HA genes. However, for the antigenic sites A to D [25], there was a N54K substitution at antigenic site C in the CIV HA gene. Several non-synonymous substitutions in the HA gene were noted: V41I for clade V; G479D for clades II, III, and IV; and G479N for clade V. However, it reverted back to 479G for clade VI. Whether this N495G was a true reversion or a result of reassortment remains to be determined. Additional variations for other viral proteins are shown in Additional file 3. The phenotypic significance of these variations, however, remains to be determined.
In addition, we also analysed the Uracil (U) content of each segment in both H3N8 CIV and H3N8 EIV to assess the degree of adaptation to the corresponding host species [55,56]. For H3N8 EIV, the time frame spans from 1963 to present, whereas for H3N8 CIV, it spans from 2002 onwards. As shown in Figure 8, there was an increasing trend in the U content over time (p < 0.0001) for all gene segments in both viruses, with the exception of the HA segment of H3N8 CIV, in which the U content gradually decreased over time. However, this difference was not statistically significant. Therefore, the HA gene of this emerging H3N8 CIV, in contrast to other gene segments, appeared to be "adaptation stagnant". For NA, NP, PA, PB1 and PB2, the increasing rate of the U content for H3N8 CIV was similar to that of H3N8 EIV. Furthermore, for M1 and NS1, the rate of increase in CIV was significantly greater than that in EIV, suggesting that these two genes were under selective pressure to adapt in the new canine host.
Finally, the mean ratio dN/dS was calculated using SLAC, as shown in Figure 9. Interestingly, the mean dN/ dS values for all gene segments of H3N8 CIV were higher than the corresponding gene segments of H3N8 EIV, indicating that CIV accumulated more non-synonymous substitutions after interspecies transmission to canines. The mean dN/dS values of the major coding regions of CIV ranged from 0.14 to 0.40, except for NS1, for which the dN/dS ratio was 0.6. In particular, there was a significant difference in the M1 gene, as the dN/dS ratio of CIV was almost double that of EIV.

Time of emergence
Overall, our results indicate that the divergence time of H3N8 CIV was between 2002 and 2003. Interestingly, serological evidence of H3N8 EIV infection in British foxhounds was also determined to be approximately 2002 [29]. Furthermore, the H3N8 EIV infecting British foxhounds was closely related to representatives of a sub-lineage of American viruses; this was a first incursion of this lineage into the region [57]. Without viral isolation, it is difficult to determine if the virus infecting the British foxhounds was identical to that from Florida. Why did these two events of interspecies transmission by an apparently similar contemporary EIV result in different clinical outcomes? Our results indicated that the emerging H3N8 CIV was a reassortant virus that had fully adapted to the canine host when it emerged in 2002. However, whether there were multiple "attempts" by H3N8 EIV to establish in a canine host or whether there were "back-n-forth" progenitor viruses between the two host species, of which one successfully established, remains a major knowledge gap. That "successful" progenitor virus appeared to be a reassortant virus, as discussed below.

The emerging strain was a reassortant virus
From the topology of the ML and MCC trees, we concluded that the emerged H3N8 CIV in Florida was a reassortant virus with an NS1 gene segment from an earlier EIV, which was also supported by the tMRCA of the NS1 gene segment to have emerged 1997-2005. The NS1 gene of influenza viruses, including H3N8 EIV, has been shown to counteract host innate immunity [51,58,59]. Furthermore, NS1 of EIV was categorized into a predivergent lineage, Eurasian and American/Kentucky sublineages, and subsequently into Florida-1 and Florida-2 clades [60]. Due to the segmented genome, the viral phenotype depends on the combination of the gene segments or its "gene constellation" [61]. It has also been shown that multiple EIVs have infected canines, and reassortment events have occurred in canines. The host-switch of EIV to CIV by the Florida-1 clade virus was "successful", while the interspecies transmission event in England was not. Furthermore, during the spread of H3N8 EIV in Australia in 2007, dogs in close contact with infected horses developed clinical symptoms, with virus isolation from two index cases, but the virus did not spread further [62]. One possibility is that in A/equine/Sydney/2007 (or "would be" A/canine/Sydney/2007), unlike A/canine/Florida/2003, the H3N8 EIV that crossed species to dogs in Sydney lacked the NS1 gene segment from a putative earlier H3N8 EIV and hence became a "deadend" event, as shown by the phylogeny of the HA1 gene segment [63]. However, other factors, such as improved quarantine and a history of "back and forth" virus transmission, cannot be ruled out. Notably, the "donor" of the NS1 gene segment to the emerging strain of H3N8 CIV is particularly interesting and remains to be elucidated. It suggests that possibly its "evolutionary clock" differs from that of other gene segments, but this is unlikely. An alternative hypothesis is that there was a "frozen" virus strain in circulation. However, taking together, this "successful" interspecies transmission was a "rarity", as a unique combination of gene segments was required, plus a unique amino acid substitution at the receptor-binding site in the HA gene (W222L) [54]. Therefore, this interspecies transmission was not a "wholesale" event, but rather occurred during at least two events, one involving reassortment of a unique combination of gene segments and a mutation or adaptation at the receptor-binding site in the HA. However, whether these events were simultaneous or sequential remains unknown.

Adaptations to canine hosts
According to the tMRCA for each gene segment with regard to the time of virus emergence, the "adaptation" had occurred while in the equine host. This "pre-emergence adaptation" could be a characteristic of EIV in the Florida-1 clade, which might not occur in other sublineages. Phenotypic heterogeneity is common in equine influenza virus. For example, there is heterogeneity in virulence among different strains of EIV in the early American lineage (unpublished results). As discussed above, although the emerging strain was a reassortant strain and the reassortment event likely occurred in the equine host, back-and-forth interspecies transmission between the canine and equine host prior to the emergence of CIV cannot be ruled out.
In addition, the U content of each gene segment was analysed for both H3N8 CIV and H3N8 EIV. An increasing U content in viral gene segments is an indicator of viral adaptation to a new host species [64]. There was an increasing trend in the U content of each gene segment over time for both H3N8 CIV and H3N8 EIV (Figure 8), indicating that both viruses were adapting to their respective hosts. However, the rate of increase was significantly higher in the NS1 and M1 genes from H3N8 CIV. As discussed above, NS1 is involved in counteracting the host innate immune response, specifically downregulating interferon expression and interferon-induced antiviral responses [65]. Our results indicate that the adaptation of H3N8 CIV to canine species involved overcoming host innate immunity both at the pre-emerging and post-enzootic stages. Elucidating the underlying mechanism of this differential rate of increase in the U content between CIV and EIV for NS1 and M1 may shed light on this host-range shift from one mammalian host to another.
Changes in critical amino acid residues may affect the functionality of viral proteins, and the changes may be a result of selection or adaptation. Variations, particularly among different clades, as shown in Additional file 3, in addition to known functions such as nuclear transport signalling or receptor-binding, can only be delineated by experimental approaches. These experiments are ongoing.

Diverged evolution and intrasubtypic reassortment
Our phylogenetic analyses revealed that enzootic H3N8 CIV has evolved into five to six clades (I to VI). This rapid divergence is in contrast to the parental H3N8 EIV, which only diverged in approximately 1990 after a long-term monophyletic evolution since it was first identified in 1963. Whether it was due to a "founder effect" by geographic restriction or a function of selective pressure remains to be determined.
Reassortment among multiple lineages of co-circulating influenza viruses is not uncommon [66][67][68]. Likewise, for the diverged sublineages of H3N8 CIV, different gene segment clustering in different clades was observed in several viruses, indicating intrasubtypic reassortment. Murcia et al. [52] showed that intrasubtypic reassortment is common in H3N8 EIV, and reassortment results in enhanced virulence. Whether these CIV reassortments among these multiple clades have biological significance remains to be determined.
The overall evolutionary rate of H3N8 CIV was lower than that of H3N2 hIAV, higher than that of H3N8 EIV, and similar to that of H3N2 CIV [69]. In addition, the evolutionary rate of each gene segment of H3N8 CIV was similar to each other. In addition, from the skyline plot, the effective population size of H3N8 CIV increased between 2005 and 2006, consistent with the genetic divergence during this period.
Interestingly, current epidemiological data indicate that H3N2 CIV mainly affects pet dogs, while H3N8 CIV affects stray dogs and dogs in shelters. Given the recent report of the susceptibility of dogs to a wide spectrum of influenza viruses, the high frequency of reassortment among these influenza viruses [70] and that stray dogs have a high risk of contact with other species, including birds and humans, the risk of dogs as a "mixing vessel" generating a pandemic influenza virus should not be overlooked.
In conclusion, by conducting a phylodynamic analysis to reconstruct the evolutionary path of CIV, we have not only demonstrated that the interspecies transmission of H3N8 EIV to canine occurred in approximately 2002 but also shown that the progenitor virus was a reassortant virus involving an "ancestral NS1 gene segment". Interspecies transmission of influenza viruses might occur more frequently than previously thought, but to adapt and remain in a new host species, that is, to undergo a host-range shift, unique combinations of gene segments plus critical mutations are required. Furthermore, H3N8 CIV has diverged into multiple sublineages or clades, possibly as a result of founder effect, in which the founder virus evolves within geographic or facility boundaries. Similar to other influenza viruses, frequent reassortment within and among these clades was observed. These findings provide a conceptual framework to understand the mechanism of interspecies transmission and host-range shifts of influenza viruses. Finally, although we utilized a computation-intensive genomics approach to reconstruct this host-range shift and the subsequent evolution and adaptation of CIV, phenotypic studies are still required to validate these results. These experiments are ongoing.