Research Article |
Corresponding author: Shixiang Zong ( zongshixiang@bjfu.edu.cn ) Academic editor: André Nel
© 2024 Yiming Niu, Fengming Shi, Xinyu Li, Sainan Zhang, Yabei Xu, Jing Tao, Meng Li, Yuxuan Zhao, Shixiang Zong.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
|
Longhorn beetles (Cerambycidae) play a vital role in global ecosystems. Some of them contribute to nutrient cycling and pollination, while others, pose a threat to forestry production. Despite their ecological importance, there has been a lack of comprehensive analyses on the mitochondrial genomes of Cerambycidae beetles. In this study, we have conducted mitochondrial genome sequencing and annotation for four Cerambycidae beetles: Monochamus sutor, Monochamus guerryi, Monochamus galloprovincialis, and Monochamus latefasciatus. Our analysis revealed a high degree of conservation in these mitochondrial genomes, with rare gene rearrangements observed across the Cerambycidae family. Additionally, a notable bias towards AT content was identified, with most genes using ATN as the start codon and TAA as the stop codon. Except for trnS1, all tRNA genes showed typical cloverleaf secondary structures. Phylogenetic analysis using IQ-TREE and Phylobayes consistently produced congruent topologies. At the gene level analyses, our results highlighted the remarkable conservation of the COX1 gene. Furthermore, at the species level, we observed strong adaptability in the Spondylidinae and Lepturinae subfamilies. We also offer our insights into contentious aspects of the phylogeny. Overall, our research contributes to a deeper understanding of the phylogeny and evolution of Cerambycidae, laying the groundwork for future population genetic investigations.
Cerambycidae, Comparsion, Monochamus, Mitochondrial genomic, Phylogeny, Systematics
The order Coleoptera (beetles) is the largest and most widely distributed order in insect with a staggering 380,000 documented species worldwide. Beetles make up approximately a quarter of all known animal species on Earth and many more species yet to be documented (
Longhorn beetles play a crucial role in global ecosystems. They aid in the decomposition of dead branches and woody plants, promoting ecosystem cycling. They serve as pollinators, providing essential assistance in the reproduction of plants (Nie et al. 2020). However, longhorn beetles can also pose a threat to forests, such as Anoplophora glabripennis (Motschulsky), Jebusaea hammerschmidtii Reiche, and Aromia bungii (Faldermann) (
With the continuous advancement of molecular biology technology, mitochondrial genome (mitogenome) has been widely used in species identification, kinship identification, population genetics, phylogeny, Geographic distribution, migration, evolution, and so on (
However, the number of complete mitochondrial genome sequences in longhorn beetles is remarkably limited and there is still a lack of comprehensive comparative analysis of the mitochondrial genomes among them. Therefore, we sequenced the mitochondrial genomes of four species within the Monochamus genus of longhorn beetles and conducted comparative mitochondrial genomic analyses in conjunction with other longhorn beetle species available in GenBank (https://www.ncbi.nlm.nih.gov). Our study enriches the mitochondrial genome and provides robust data support for an in-depth investigation of the phylogeny of the Cerambycidae. It may shed new light on the complex phylogenetic relationships in Cerambycidae.
Adult Monochamus sutor (Linnaeus), Monochamus guerryi Pic, Monochamus galloprovincialis (Olivier), and Monochamus latefasciatus (Breuning) species collected information was shown in Table
Species Name | Host Plant | Date | Locality |
Monochamus sutor | Larix gmelinii (Ruprecht) Kuzeneva | Jul., 2020 | CHINA – Beijing – 39°54′N, 116°25′E |
Monochamus guerryi | Quercus glauca Thunb. | Jun., 2020 | CHINA – Guangxi – 22°02′N, 106°47′E |
Monochamus galloprovincialis | Pinus sylvestris var. mongolica | Jul., 2020 | CHINA – Heilongjiang – 51°02′N, 122°11′E |
Monochamus latefasciatus | unkonwn | Jul., 2019 | CHINA – Guangxi – 22°02′N, 106°47′E |
DNA samples of four longhorn beetles were sequenced by next-generation sequencing at Biomarker Technologies Co., Ltd. (Beijing, China) to obtain complete mitochondrial genes. An Illumina TruSeq library was generated, with average lengths of 350 bp and 250 bp paired-end reads sequenced on Illumina Hiseq 2500 platform (San Diego, CA). After filtering using the NGS QC Toolkit v2.3.3 (
Protein-coding genes (PCGs) and ribosomal RNA (rRNA) genes were annotated by aligning with known homologous Cerambycidae species mitogenome sequences from GenBank. Additionally, the secondary structures of transfer RNA (tRNA) genes were identified using tRNAscan-SE Search Server v2.0 (http://lowelab.ucsc.edu/tRNAscan-SE/) (Lowe et al. 2016). The limits of PCGs and rRNA genes were determined by comparing them with analogous genes found in other Cerambycidae species and by referencing the positions of transfer RNA (tRNA) genes. To affirm the precision of the deduced locations, the nucleotide sequences of 13 PCGs were translated into amino acids through the utilization of MEGA X (
The mitogenomes of 89 species obtained from the NCBI (https://www.ncbi.nlm.nih.gov) and 4 newly sequenced species were used for analysis (Table S1). Subsequently, an online software tool called Interactive Tree Of Life (iTOL) v 6.8 (Letunic et al. 2019) was employed to visualize the gene order of all species with different gene represented in distinct colors for enhanced clarity and comparison. Mitogenome rearrangement events were completed through the manual comparison, with Drosophila yakuba Burla as a reference (Clary et al. 1985). Pairwise comparisons of DNAs between M. sutor and other longhorn beetles mitochondrial genomes were performed using BLASTN searches in the CGView Comparison Tool (CCT) (https://github.com/paulstothard/cgview_comparison_tool). The base composition, amino acid usage, codon usage, and relative synonymous codon usage (RSCU) were analyzed through MEGA11/ PhyloSuite v1.2.3 (
The mitochondrial genome data were obtained from the NCBI nucleotide database. A total of 86 species in 10 groups were selected from the longhorn beetles. Additionally, the dataset incorporated the mitochondrial genomes of the four newly sequenced species.
After filtering the redundant sequence, 13 PCGs and 2 rRNAs were extracted by PhyloSuite v1.2.3 (
Maximum likelihood (ML) phylogenies and Phylobayes MPI 1.5a were used to construct phylogenetic trees mainly about Cerambycidae on PhyloSuite v1.2.3 and CIPRES Science Gateway (https://www.phylo.org). Seven species with complete sequences from other families that similar to Cerambycidae were selected as outgroups (Eucryptorrhynchus scrobiculatus Motschulsky, Eucryptorrhynchus brandti (Harold), Curculio davidi Fairmaire, Sympiezomias velatus (Chevrolat), Naupactus xanthographus Germaer, Aegorhinus superciliosus (Guérin), Sitophilus zeamais Motschulsky). For ML analysis, the IQ-TREE v. 1.6.8 (
Complete mitogenomes of four Lamiinae species were newly sequenced: M. sutor, M. guerryi, M. galloprovincialis and M. latefasciatus, which ranged in size from 15759 (M. guerryi) to 15929 (M. latefasciatus) base pairs and each contained 37 genes (13PCGs, 22 tRNA and 2 rRNA) (Fig.
The mitochondrial genomes of the longhorn beetles display a noticeable AT bias, with an A+T content ranging from 67.9% (Prioninae, Agrianome spinicollis (MacLeay)) to 81.2% (Lepturinae, Sachalinobia koltzei (Heyden)). Interestingly, only four species exhibit an A+T content exceeding 80%. Lepturinae stands out with the highest A+T content, averaging 79.5%, while Prioninae has the lowest, averaging 70.8%. The AT-skew values are predominantly positive, ranging from –0.016 (Lamiinae, Pterolophia sp. ZJY-2019) to 0.125 (Prioninae, Priotyrannus closteroides Thomson), while GC-skew values are mainly negative, spanning from –0.286 (Prioninae, P. closteroides) to –0.136 (Cerambycinae, Xystrocera globose (Oliver)). Notably, the newly sequenced Lamiinae species all have A+T contents higher than the average, with positive AT skew. Although the nucleotide compositions across the 10 groups are generally similar, there are differences in A+T content, AT-skew, and GC-skew (Table S3; Fig.
To explore the base content characteristic in the coding region, we specifically analyzed the PCGs of mitogenomes. The results revealed a significant AT bias in the entire PCGs, with A+T content ranging from 65.42% (Prioninae, A. spinicollis) to 79.73% (Lepturinae, S. koltzei). The Lepturinae again exhibits the highest A+T content, averaging 77.9%, while the Prioninae has the lowest, averaging 67.9%, the same trends observed in the entire mitochondrial genome. The AT-skew values are negative across all PCGs, while most GC-skew values are also negative. This AT bias is particularly prominent at the 3rd codon position, consistent with a high mutational pressure toward AT nucleotides at this position (
Furthermore, there is a clear AT bias in RNA genes, with AT-skew and GC-skew values mainly positive in tRNA but predominantly negative in rRNA. Among the four newly sequenced beetles, the AT content in RNA genes is higher than the average, while the GC content is lower than the average (Table S3).
As depicted in Figure
Graphical map of the BLASTN results showing the nucleotide identity between the M. sutor mitochondrial genome and that of 85 other species listed in Table S1. CCT arranges the BLASTN results such that the sequence most similar to the reference (M. sutor) is placed closer to the outer edge of the map.
After analyzing the PCGs of 86 mitogenomes, we observed that the starting codon usage bias was quite similar at the species level when ATN codons were used as the start codons. Most species predominantly employed ATT or ATG as the start codon, while some species used ATA, ATC, and a few even utilized TTG and GGA. As for stop codons, TAA, TAG, and TTA were used in these mitogenomes, with TAA being the most frequently employed. At the gene level, most PCGs preferred ATT and ATG as start codons and still commonly used TAA as a stop codon. However, there was diversity in the types of start codons for NADH dehydrogenase genes, with some less common ones like TTG. Additionally, 8 genes exhibited incomplete stop codons in PCGs, with only atp6, atp8, nad6, and nad1 being exceptions. In most cases, the incomplete stop codon was “T”, and very rarely “TA” (Table S5). These incomplete stop codons are relatively common in insect mitochondrial genomes (
Relative synonymous codon usage (RSCU) values of all 10 groups and the four new sequenced species were plotted in Fig.
In the 10 groups, the frequency of amino acid usage was relatively consistent. Overall, Leu was the most frequently used amino acid, followed by Ile, Phe, Met, and Ser in that order. However, some variations were observed in Spondylidinae, Prioninae, Cerambycinae and Disteniinae. In these groups, there was a general trend of Ser being more frequently used than Met (Table S6; Fig.
The effective number of codons (ENC) and the codon bias index (CBI) in these species ranged from 37.03 (Lepturinae, S. koltzei) – 51.2 (Disteniinae, Typodryas sp. N143) for ENC, and from 0.275 (Prioninae, P. closteroides) – 0.643 (Lepturinae, S. koltzei) for CBI. Notably, Spondylidinae and Orsodacinade had the lower average ENC value of 43.79 and 43.61, while Prioninae had the highest average ENC value of 50.32. As for CBI, Prioninae had the lowest average value of 0.33, while Spondylidinae, Lepturinae and Orsodacinade had the higher average CBI value of 0.50 (Table S4). The ENC and CBI results showed that Spondylidinae and Orsodacinade species exhibited a higher codon bias, suggesting that synonymous codons were more frequently used in their genomes with a lower codon selection, implying relatively lower genomic complexity. These Spondylidinae and Orsodacinade species appear to possess strong adaptability to specific ecological environments (
The number of intergenic regions in the mitochondrial genomes of the entire 10 groups ranged from 2 (Cerambycinae, Ceresium sinicum ornaticolle Pic) to 14 (Lamiinae, Lamiinae sp. 1 ACP-2013 and Prioninae, P. closteroides). On the other hand, gene overlap regions varied from 5 (Disteniinae, Distenia punctulatoides Hubweber) to 25 (Lamiinae, Oberea diversipes Pic). These gene overlaps were generally small, often spanning from 1 to 7 bp. The largest overlap regions were frequently observed between atp6 and atp8 (
Groups | Intergenic region amount | Gene overlap amount | atp8-atp6 Normal intergenic region (-7bp) amount | atp8-atp6 Special intergenic region | nad4-nad4L Normal intergenic region (-7bp) amount | nad4-nad4L Special intergenic region |
Lamiinae | 4 -14 | 9–25 | 26 | - 4, H. nigromaculata - 4, Batocera rubus (Linné) - 4, Annamanum lunulatum (Pic) - 4, Pterolophia sp. ZJY-2019 - 1, Blepephaeus succinctor (Chevrolat) - 4, Morimospasma tuberculatum Breuning | 17 | - 4, Eutetrapha metallescens (Motschulsky) - 4, Paraglenea fortunei (Saunders) - 4, Thyestilla gebleri (Faldermann) - 4, H. nigromaculata - 4, B. rubus - 4, A. lunulatum - 4, M. latefasciatus - 4, M. guerryi - 4, M. sutor - 4, M. urussovii - 4, Agelasta perplexa Pascoe - 4, A. saltator - 4, Morimospasma sp. - 4, Niphona lateraliplagiata Breuning - 4, Jamesia sp. KM-2017 |
Vesperidae | 8–13 | 6–8 | 6 | / | 6 | / |
Spondylidinae | 6–13 | 10–12 | 2 | - 4, C. oberthueri | 4 | - 4, C. oberthueri |
Lepturinae | 6–9 | 9–13 | 6 | / | 5 | - 4, Anastrangalia sequensi (Reitter) |
Oxypeltinae | 7 | 13 | 1 | / | 0 | - 4, Oxypeltus quadrispinosus Blanchard |
Orsodacnidae | 7–11 | 6–10 | 2 | / | 2 | / |
Dorcasominae | 6–9 | 10–11 | 2 | / | 1 | - 4, Tsivoka simplicicollis (Gahan) |
Prioninae | 4–14 | 9–14 | 7 | - 4, P. closteroides - 4, Phaolus metallicus (Newman) - 4, Agrianome spinicollis (MacLeay) - 4, Analophus parallelus Waterhouse - 4, Archetypus frenchi (Blackburn) - 4, Olethrius laevipennis Vitali | 10 | - 4, P. closteroides - 4, Callipogon relictus Semenov - 4, Aegosoma sinicum White |
Cerambycinae | 2–12 | 8–15 | 12 | - 4, Closteromerus claviger Fairmaire - 4, Demonax pseudonotabilis Gressitt & Rondon - 4, Ceresium sinicum ornaticolle | 12 | - 0, C. claviger - 4, D. pseudonotabilis - 4, Xystrocera globosa (Olivier) |
Disteniinae | 5 -7 | 5–11 | 5 | / | 5 | / |
The genes atp6, atp8, nad4, and nad4L are likely to play crucial roles in essential cellular processes like energy production and oxidative phosphorylation (
As depicted in Fig.
All four newly sequenced species exhibited a complete set of 22 tRNA genes, with entire tRNA lengths 1441 (M. latefasciatus), 1444 (M. guerryi), 1452 (M. sutor), 1452 (M. galloprovincialis). When individually examining these tRNAs, their lengths ranged from 61 to 70 in M. latefasciatus, from 63 to 70 in M. guerryi, from 63 to 69 in M. sutor, and from 63 to 69 in M. galloprovincialis, respectively. Except for trnS1, all tRNAs exhibited the characteristic cloverleaf structure, where the dihydrouridine arm of trnS1 formed as a single loop. This peculiar structure of trnS1 may arise from abnormalities during transcription or post-transcriptional processes, which could subsequently affect the development of its typical cloverleaf structure (
The tRNA structures in all four newly sequenced beetles exhibited a cloverleaf-like secondary structure overall; the phenomenon of tRNA-Ser1 lacking the DHU arm was present in all four newly sequenced beetles. Additionally, some degree of base mismatch occurred, with the most common type being the G-U mismatch. Specifically, there were 25 (21 G-U, 3 U-U, 1 A-G) mismatched base pairs in M. latefasciatus, 24 (20 G-U, 3 U-U, 1 A-G) mismatched base pairs in M. guerryi, 20 (15 G-U, 4 U-U, 1 A-G) mismatched base pairs in M. sutor, 21 (16 G-U, 4 U-U, 1 A-G) mismatched base pairs in M. galloprovincialis (Fig.
The rRNA components in the mitochondrial genomes of the four newly sequenced species encompass both lrRNA and srRNA. Specifically, lrRNA is situated between tRNA-Leu and tRNA-Val, while srRNA is positioned between tRNA-Val and the control region. The length of lrRNA genes ranged from 1329 bp in M. guerryi to 1334 bp in M. latefasciatus, while the sizes of srRNA genes varied from 775 bp in M. latefasciatus to 781 bp in M. guerryi, M. galloprovincialis, and M. sutor (Table S5).
The substitution saturation index (Iss) for the six nucleotide sequence datasets was notably lower than the critical Iss values (Iss.cSym and Iss.cAym) for both symmetrical and asymmetrical trees. This observation strongly indicated the absence of substitution saturation, providing clear evidence of a robust phylogenetic signal (all Iss < Iss.cSym or Iss.cAsym, P < 0.05) (Table
Data partition | Iss | Iss.cSym† | Psym‡ | Iss.cAsym§ | Pasym¶ |
PCG1 | 0.458 | 0.818 | 0.0000 | 0.572 | 0.0000 |
PCG12 | 0.458 | 0.818 | 0.0000 | 0.572 | 0.0000 |
PCG123 | 0.462 | 0.818 | 0.0000 | 0.572 | 0.0000 |
PCG1+rRNA | 0.461 | 0.818 | 0.0000 | 0.572 | 0.0000 |
PCG12+rRNA | 0.455 | 0.818 | 0.0000 | 0.572 | 0.0000 |
PCG123+rRNA | 0.602 | 0.819 | 0.0000 | 0.573 | 0.0000 |
To assess nucleotide divergence heterogeneity, pairwise comparisons were conducted within a multiple sequence alignment. The results revealed that datasets involving the third codon position (PCG123, PCG123R) exhibited lower heterogeneity in sequence composition. In comparison, datasets with the third codon position removed (PCG12, PCG12R) displayed intermediate heterogeneity levels. The AA dataset exhibited the highest heterogeneity. It’s worth noting that some species, such as C. claviger, Dynamostes audax Pascoe, H. nigromaculata, Oberea formosana Pic, and V. conicicollis, exhibited greater heterogeneity due to incomplete sequences caused by missing data (Fig.
Heterogeneity of the sequence composition of the mitochondrial genomes in different datasets. The pairwise Aliscore values are indicated by colored squares. Darker colors indicate full random similarity, and lighter colors indicate the opposite. All taxon names are listed on the left side of the heat map.
We selected a total of 86 species of longhorn beetles and analyzed four datasets using IQ-TREE and Phylobayes two methods. The results indicated that most nodes within the phylogenetic trees displayed strong support values, indicating robust branching patterns (Fig.
In these phylogenetic trees, results obtained from the AA dataset and the PCG123 dataset both constructed using IQ-TREE are notably similar with relatively high node support (Fig.
Through constructing phylogenetic tree and the analysis of mitochondrial genome data presented in this study, we have presented following perspectives on some contentious issues. The consistent clustering of Spondylidinae and Aseminae in the phylogenetic trees supports the idea that these two groups may be treated as a single entity, Spondylidinae, with some cases of synonymous nomenclature (Linsley et al. 1961;
In two methods employed for the analysis, the four newly sequenced beetles consistently grouped within the Lamiinae. This clustering demonstrates their close genetic affinity with other species within the Lamiinae. The proximity of closely related species within the same subfamily is evident in the phylogenetic trees, indicating a shared evolutionary history and genetic similarity among these species (Fig.
This study sequenced four species of Cerambycidae and downloaded 82 sequences of longhorn beetles from platforms such as NCBI, resulting in a dataset comprising 86 sequences. These sequences were employed in a comparative investigation of mitochondrial genomes within the longhorn beetles. Various aspects of genetic codons were analyzed, coupled with the construction of phylogenetic trees using two distinct methodologies and four distinct datasets. The study yielded several noteworthy conclusions: Firstly, at the gene level, multiple analytical approaches consistently showed that cox1 and cox2 exhibited the least variation and were relatively conservative in the overall evolutionary process. On the contrary, genes associated with respiration and energy production, such as atp6, atp8, nad4, displayed comparatively heightened genetic variations, indicative of their adaptability throughout evolutionary history (
This study was carried out at Beijing Key Laboratory for Forest Pest Control, Beijing Forestry University. The work was supported by the National Key Research & Development Program of China (2021YFD1400900). We thank the editor and reviewer for the valuable comments and suggestions. We thank TopEdit (www.topeditsci.com) for linguistic assistance during the preparation of this manuscript.
Tables S1–S4
Data type: .xlsx
Explanation notes: Table S1. Samples and gene information of mitochondrial genomes used in the study. — Table S2. Best partitioning schemes and models based on different datasets calculated by ModelFinder and PartitionFinder for maximum likelihood and Bayesian inference analysis. — Table S3. Base composition and strand bias across mitochondrial genomes used in the study. — Table S4. Evaluation of codon bias across Cerambycidae mitogenomes. ENC, effective number of codons..
Tables S5
Data type: .xlsx
Explanation notes: Sequence characteristics of the mitochondrial genomes of 86 longhorn beetles.
Tables S6
Data type: .xlsx
Explanation notes: The codon usage patterns of PCGs in the mitochondrial genomes of 86 longhorn beetles.
Figures S1, S2
Data type: .zip
Explanation notes: Figure S1. Nucleotide substitution saturation plots of mitochondrial genomes for different datasets. — Figure S2. Phylogenetic tree of Cerambycidae inferred from the AA, PCG123, PCG12RNA, PCG123R dataset using PhyloBayes (BI, A, B, C, E) and PCG12RNA, PCG123R dataset using IQ-TREE (ML, D, F), Branches are labeled with Bayesian posterior probabilities (PPs) more than 0.75 and parsimony bootstrap values (BPs) higher than 50%.