Research Article |
Corresponding author: Steffen U. Pauls ( steffen.pauls@senckenberg.de ) Academic editor: Anna Hundsdoerfer
© 2023 Steffen U. Pauls, Wolfram Graf, Anna E. Hjalmarsson, Alan Lemmon, Emily Moriarty Lemmon, Malte Petersen, Simon Vitecek, Paul B. Frandsen.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
|
Streams represent a special case of directional environmental gradients where ecological opportunity for diversification may be associated with upstream and downstream dispersal into habitats that differ in selective pressures. Temperature, current velocity and variability, sediment erosion dynamics and oxygen saturation are key environmental parameters that change in predictable ways from springs to river mouth. Many aquatic insects occupy specific longitudinal regions along these gradients, indicating a high degree of adaptation to these specific environmental conditions. In caddisflies, the evolution of tracheal gills in larval and pupal stages may be a major driver in oxygen uptake efficiency and ecological diversification. Here we study the evolution of larval gill structure in the Rhyacophila vulgaris species group using phylogenomic methods. Based on anchored hybrid enrichment, we sequenced 97 kbp of data representing 159 independent nuclear protein coding gene regions to infer the phylogeny of the R. vulgaris species group, whose species exhibit both high diversity of gill types and varied longitudinal preferences. We find that the different gill types evolved independently as derived characters in the genus and that gill structure is linked to the longitudinal habitat preference, thereby serving as a possible ecological key innovation in the R. vulgaris group.
Anchored hybrid enrichment, phylogenomics, Trichoptera, Rhyacophilidae, gill evolution
Along strong environmental gradients, ecological opportunity for diversification may be associated with dispersal into habitats that differ in selective pressures (
A prominent feature of stream ecosystems is the differential availability of oxygen in different, hierarchically structured river stretches and (micro)habitats which is linked to current velocity, water temperature, nutrient load (N, P, and C), sediment structures, and aquatic and riparian vegetation (
In contrast to salmonids or stoneflies, caddisflies have evolved to occupy virtually all types freshwater habitat outside of Antarctica, from highly oxygenated headwater streams to stagnant, ephemeral ponds. One hypothesis associates the basis of this ecological diversity with case and pupal building variation among caddisflies (Mackay & Wiggins 1979,
Based on current knowledge, most caseless caddisflies are limited to such well-oxygenated habitats. A few representatives of caseless caddisflies, most notably gilled Hydropsychidae and gillless Polycentropodidae, are, however, exceptional in that they comprise species that occur in poorly oxygenated stretches along the stream continuum, including stagnant pools. Rhyacophilidae are caseless predators that primarily occupy cold, turbulent and well-oxygenated streams. They are Holarctic and Oriental in distribution with major centers of diversity in the Oriental region (~500 species) as well as Nearctic and European mountain ranges (
Known gill configurations in Rhyacophila and Himalopsyche kuldschensis groups, shown as a latero-apical depiction of the gill filaments on the side of larval abdominal segments. Grey box indicates R. vulgaris group species; dashed line segregates the representative of Himalopsyche. — * For the purpose of this manuscript we define gills of single unbranched filaments as “simple” gills. — ** For the purpose of this manuscript we define “complex” gills as those gills that exhibit a branching pattern; + sensu
Diversity in gill structure is not uncommon in caddisflies. For example, final instars of Limnephilidae generally have abdominal gills that can be made up of single filaments, multiple filaments, a mixture of these and/or have varying configurations on different abdominal segments (Waringer & Graf 2011). In Himalopsyche, four distinct gill types are known, all of which are complex (
While species of Rhyacophila are restricted to fast-flowing water in North America and Asia, some European species occur in moderately flowing stream sections. In this context the R. vulgaris group is particularly interesting. This species group currently comprises 70 species in Europe. Among these 70 species we find a striking diversity of gill types (Fig.
In this study, we analyze ecological traits and phylogenetic data to help shed light on how morphological traits evolved and what ecological function they may have. Specifically, we ask 1) if individual gill types in Rhyacophila evolved once or repeatedly, 2) if structurally complex gill types are derived or ancestral in the R. vulgaris group, and 3) if there is a link between gill structure and distribution of Rhyacophila along the stream zonation gradient. To address these questions, we infer phylogenetic relationships in the genus Rhyacophila using an anchored hybrid enrichment-based multi-locus phylogenomic approach, with a particular focus on the R. vulgaris group. We estimate ancestral character states for different gill types along the phylogeny, particularly within the R. vulgaris group where numerous gill types of varying structure are known (Fig.
Specimens were collected with light traps or sweep nets and preserved in either 70% or 95% ethanol. Prior to DNA extraction, wings (in adults), heads and terminal abdominal segments were removed as specimen vouchers. DNA was extracted with hot sodium hydroxide as described in
The probes were designed in collaboration with the Center for Anchored Phylogenomics following
Extracted DNA was sonicated using the Covaris Ultrasonicator to a size distribution of 150–400 bp. Libraries were then prepared following
Raw read pairs passing the CASAVA high-chastity filter were merged following
High-throughput sequencing data have been shown to be susceptible to contamination (e.g.
We used Gblocks (
We then used a relative branch length approach to identify potential contaminants (Fig.
The absolute branch lengths vary between a) and b) because the evolutionary rates of the two loci differ. But the relative branch length of each terminal taxon is constant. There is some variation in the relative branch length depending on the rate of variation regarding each taxon in each locus (compare b) and c)). This „normal“ variation leads to normal distribution of relative terminal branch lengths for each taxon (d)). In case a sequence is affected by contamination, e.g. cross-contamination of DNA from a different organism, or chimera assembly of different organisms, the placement of the contaminated taxon may be different in a new phylogenetic inference (e.g. taxon B in e)). But especially using many short loci, most single loci phylogenetic inferences are poor and not very precise. It is thus difficult to assess if the different topology is the product of limited phylogenetic signal in the locus, a contamination or true gene tree discordance. If, however we constrain the topology of the data set in e), the odd taxon will sit on an exceptionally long terminal branch (f)), that can be identified as an outlier in the distribution of relative terminal branch lengths for taxon B. In the case of true gene tree discordance we can expect the relative branch lengths for several taxa to be flagged as being outliers. The approach should be independent of the constraint tree used, as the distribution of branch lengths for a misplaced taxon will generally be too long, but an outlier can still be detected.
Prior to using this approach on our study data, we examined its utility and reliability on simulated DNA sequence data. First, we used Seq-Gen v.1.3.3 (Rambaut & Cassidy 1997) to simulate DNA sequences for 100 loci (500 bp, HKY model, nucleotide frequencies A: 0.294, T: 0.246, G: 0.241, C: 0.221) based on the characteristics of our own concatenated alignment and the resulting guide tree. To simulate contamination, we manipulated the simulated data set in three ways: 1) by randomly replacing 10 complete loci with others from within the simulated data set to imitate cross contamination within a study; 2) by randomly inserting 10 120-bp-fragments from within the simulated data to other species sequences to imitate chimera production in the original sequence assembly; and 3) by randomly inserting 10 120-bp-long fragments of random data to imitate contamination from an unknown source in the original sequence assembly. Randomization of loci, taxa and insert position were based on Random Thing Picker (http://andrew.hedges.name/experiments/random/pickone.html, accessed on May 12, 2016). We then performed phylogenetic analysis on all 4 simulated data sets (each form of contamination individually and all contaminations combined) under the topological constraint and estimated relative terminal branch lengths for each taxon individually across all loci. We used the approach described above to identify if the integrated contaminants resulted in outlier branch lengths. Under the topological constraint of our guide tree, our approach identified 25 of the 30 contaminations (all random inserts, all cross contaminations, five of the 10 chimeras) from our simulated data set. We also flagged two false positive outliers. These results show that the approach is useful in detecting contaminants, including small random inserts and potentially even chimeras with congeners. However, half of the chimeras were not flagged and it is possible that some false positives could be flagged and uncritically removed from a data set. We view the approach as a tool for flagging potential contaminants that allows us to further scrutinize the flagged sequences by hand, since it is time-consuming and error-prone to scrutinize all sequences of all loci manually.
We applied our contamination detection approach to our original 61 taxon data set. For the input constraint tree, we used a maximum likelihood tree generated in RAxML v. 8 (
Our approach flagged 33 sequences from the 66 loci that had sequence data for all 61 taxa, and 230 sequences from the 385 loci with missing data, resulting in 263 total flagged sequences. We removed these sequences unless the outlier branch length was the result of the sister taxon to the flagged taxon having been removed from the single locus constraint tree due to missing data. In this case, the terminal branch length of the flagged specimen is necessarily longer than usual because the closest taxon is missing from the single locus constraint tree. Using these guidelines we removed 126 potential contaminant sequences.
Additionally, we removed 116 potential cross-contaminated sequences that were identical among non-sister taxa (based on the constraint tree). In total 242 individual sequences were removed (1.03%).
The most important predictor of phylogenetic informativeness of a particular locus is simply sequence length (
This filtering resulted in a final data set comprising 60 taxa, 159 (super) loci ranging from 302-1922 bp in length (Ø 570.5 bp), and a total concatenated alignment length of 90,717 bp. The molecular data sets are available at the Senckenberg (Meta) Data Portal under https://dataportal.senckenberg.de/dataset/cf47cdbb-7384-47e4-a89d-cb1abcf65963.
We followed three general approaches to infer the phylogenetic relationships in our data set. First, we generated a maximum likelihood tree on the concatenated alignment (90,717 bp). We used the PartitionFinder 2.0 relaxed clustering algorithm with default settings to select an optimal partitioning scheme (Lanfear et al. 2017). Then we selected nucleotide substitution models for each subset of the optimal partitioning scheme using ModelFinder as implemented in IQtree v. 1.6 (
Second, we generated a species tree in ASTRAL v. 5.6.1 (
Third, we identified the best partitioning scheme and nucleotide substitution models via ModelFinder. For each of the 159 (super)loci, we used independent, variable substitution rates assuming an average substitution rate of 1 across all (super)loci and independent character state frequencies, transition/transversion rate ratios, gamma parameter shapes and proportions of invariable characters. Based on this model we generated a Bayesian tree with MrBayes v. 3.1 (Ronquist & Huelsenbeck 2003). We ran Bayesian inference for 10 x 106 generations in 2 independent runs with 8 chains each, sampling each 5000th generation. We used average standard deviations of split frequencies as reported by MrBayes and visualizations of MrBayes parameter files in combination with ESS estimates as available in Tracer v. 1.6 (
In total, we differentiated the eleven known gill types in Rhyacophilidae that were included in our phylogeny. Species where larval gill types are not known were coded as “unknown” and treated as missing data. We estimated the ancestral character state of gill type at each supported node along the backbone of our phylogeny as well as all nodes within the R. vulgaris group. Ancestral character estimation (ACE) was performed in BayesTraits V3.0 (available from http://www.evolution.rdg.ac.uk), using the BayesMultistate method (
We performed MCMC ancestral character estimation on 16 nodes with a uniform prior on a sample of trees from the MrBayes output at two levels: first with a binary coding of gills/no gills, and second for all eleven character states (Figs
Strict consensus tree from MrBayes output with BayesTraits ancestral state reconstructions for gill types. Gill types are illustrated on the left and assigned a color, which are assigned to taxa on the right-hand side of the tree. The posterior probability is shown on those nodes with a posterior probability below 1.0. Unlabelled nodes have a posterior probability of 1.0. In nodes with bootstrap support <70 in the IQTree ML inference, the bootstrap values are shown after the posterior probability for the respective node. Pie charts with the probabilities for the various gill types ancestral states are connected to their assigned node via numbered lines.
Abdominal tracheal gills in caddisflies have been associated with osmoregulatory and breathing functions (
Our final data set comprised 60 taxa (Supplementary Material A). Of these, 19 species belonged to the vulgaris-species group sensu
Briefly, all phylogenetic analyses recovered almost congruent tree topologies: only minor changes in branch length were returned and almost all relationships received high support values (Fig.
Of the vulgaris branch species in Clade A, R. oreata, R. dongkyapa and R. petersorum form a monophylum. The remaining vulgaris branch species form a clade (Clade C; pp = 1.0) within Clade B. In Clade C, R. fuscula of the fuscula group is sister to the supported vulgaris species group clade. Within the vulgaris group (Clade D-G; all pp = 1.0) all relationships are resolved. However, some internodes are very short.
Both structurally simple and complex gill types are distributed across the phylogeny, with only a few gill types representing apomorphies for specific branches, veins or species groups: H. kuldschensis, R. alberta, R. bonaparti and R. coloradensis have unique gill types in the phylogeny, while gill types observed in R. alberta, R. coloradensis and H. kuldschensis are also known from other species that we did not include in our study (
In the vulgaris group (Fig.
The vulgaris group then splits into three clades that correspond to different gill configurations. The Pararhyacophila clade (Clade E) comprises R. rectispina, R. intermedia, R. lusitanica and R. munda, whose larvae exhibit four digitate gills on each abdominal segment. The Pararhyacophila clade is sister to the Hyperrhyacophila + sensu stricto clades. The Hyperrhyacophila clade (Clade F) comprises all sampled species with larvae exhibiting comb-like gills (R. torrentium, R. evoluta, R. occidentalis) as well as the sensu stricto species R. moscaryi at its base. All other species with tufted gills in the data set fall into the sensu stricto clade (Clade G). Based on the polarization of the tree with the outgroup, it appears that the clades within the vulgaris group follow an evolutionary trajectory where deeper nodes exhibit ancestors with simple gill configurations and shallow nodes bear ancestors with complex gills.
The IQTree ML analysis resulted in a topology that showed the same topology as the Bayesian Inference. We highlight differences in support values on those nodes with bootstrap supports (bs) lower than 70 (Fig.
The results from the Bayesian multistate analysis showed that acceptance rates of proposed rate changes stabilized around 38% indicating sufficient mixing of the MCMC chain. The resulting probabilities for ancestral character states in the binary coding (no gills/gills) were the same as those for the sum of all individual gill states vs no gills. In accordance with our sampling design and hypotheses we focused on the ancestral character state estimations in the R. vulgaris group (sensu
Of the 10 recognized gill types in Rhyacophila it appears that only single digit gills (R. loxias and R. laevis) evolved more than once. While tufted gills also evolved multiple times, they did so with different configurations in the vulgaris group (single tuft), brunnea group (three tufts) and grandis group (two tufts) (Giersch & Wissemann 2012). Ancestral character estimations indicate convergent evolution in both cases.
The distribution of river zonation occurrences for individual gill types is presented in Figure
Box and whisker plots of longitudinal occurrence of 50 European Rhyacophila species based on the longitudinal zonation index (see Supplementary Material C). The species are grouped by the six gill types known to occur in the European fauna. Groups with significantly different longitudinal distribution ranges are indicated by letters above the box and whisker plots. Boxes represent the 50th percentiles, the whiskers extend to the 100th percentile. Horizontal lines within boxes indicate median values.
Our results show that structurally complex gill types, i.e. the four digitate gills, comb-like gills and single-tufted gills evolved as derived character states in the R. vulgaris group (Fig.
Tracheal gills in aquatic insects have been shown to improve oxygen uptake (e.g., Eriksen & Moeur 1990).
Phenotype-environment correlations and trait evolution in adaptive radiations have been widely studied to gain insight into the dynamics underpinning rapid species diversification (
If this were the case, we would, however, expect to see a similar pattern evolve multiple times across the genus because the transition of physical properties from headwater to midstream to downstream habitat conditions is a ubiquitous feature of running waters. It thus seems more likely that structurally complex gills evolved progressively in the vulgaris group, and subsequently allowed these species to cope better with low oxygen conditions that are more frequently encountered in relatively downstream conditions (
The taxon sampling in our phylogeny is clearly insufficient to draw expansive conclusions on the phylogeny or extensive biogeographic analyses of the family. Yet some observations suggest that such a study would be very worthwhile. Where our sampling comprises more than one relevant taxon, many hypotheses on the relatedness of species, i.e. the composition and distinction of species groups, seem supported. However, regarding the deeper relationships, our analysis did not recover most of the groupings suggested by
SUP and WG designed the study. AL performed the wet lab methodology. PF, MP, SUP, SV and AEH performed the analyses and prepared the figures; AEH and SV illustrated the gill types. SUP and PF drafted the paper. All authors edited the manuscript and approved of the final version.
The authors declare that they do not have any conflicts of interest.
We thank Karl Kjer for important discussions on this study, and Anna Hundsdörfer and three anonymous reviewers for helpful comments on an earlier version of this manuscript. Jürgen Otte (Senckenberg, Frankfurt) is thanked for assisting in DNA extractions. We thank Christine Frandsen for assisting with the figures. This study was supported by the LOEWE-Centre for Translational Biodiversity Genomics funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK).
Supplementary Material A–C
Data type: .pdf
Explanation note: A. Specimens used in the phylogenomic analysis in this study.— B. ASTRAL species tree based on 60 taxa and 159 individual loci of > 302 bp in 8 length. — C. Rhyacophila gill type and longitudinal habitat preference. — Molecular Data is available at the Senckenberg (Meta) Data Portal under https://dataportal.senckenberg.de/dataset/cf47cdbb-7384-47e4-a89d-cb1abcf65963