Assembly and annotation of the chloroplast genomes
Assembly yielded a complete cp genome sequence of C. hirtinoda with a length of 139,561 bp (Fig. 1), consisting of an 83,166 bp large single-copy (LSC) region, a 20,811 bp small single-copy (SSC) region, and two 21,792 bp inverted repeat (IR) regions, which together form the quadripartite structure typical of land plants. The cp genome of C. hirtinoda was annotated with 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes (Table 1). Most of the 15 genes in the C. hirtinoda cp genome contain introns. Of these, 13 genes contain one intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rps16, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC), only ycf3 contains two introns, and the clpP intron has been lost (Supplementary Table S1). The rps12 gene is present in two copies, and its three exons are joined by trans-splicing18.
The accD, ycf1, and ycf2 genes were missing from the cp genome of C. hirtinoda, and the introns of the clpP and rpoC1 genes were lost. This pattern is consistent with previous evolutionary studies of genome structure in the Poaceae family19. Gene loss has also been reported in other plants20,21,22,23.
The total GC content of the C. hirtinoda cp genome was 38.90%, and the four bases A, T, G, and C accounted for 30.63%, 30.46%, 19.57%, and 19.33%, respectively (Table 2). The GC content of the LSC region (36.98%) and the SSC region (33.21%) was much lower than that of the IR region (44.23%), indicating a non-uniform base composition across the cp genome, probably because the four rRNA genes located in the IR region raise its GC content. These values are similar to those previously reported for the cp genomes of other Poaceae plants24,25.
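The regional comparison above amounts to counting G and C within each region's coordinates. The following is a minimal sketch of that calculation, not the pipeline used in the study; the function names, the toy sequence, and the region coordinates are hypothetical and chosen only for illustration. Note that the reported G and C percentages (19.57% + 19.33%) add up to the stated total of 38.90%.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def regional_gc(genome, regions):
    """GC percentage per region.

    `regions` maps a region name to 0-based, half-open (start, end)
    coordinates; these coordinates are assumed to be known already.
    """
    return {name: gc_content(genome[start:end])
            for name, (start, end) in regions.items()}


# Toy sequence and made-up coordinates, for illustration only.
toy_genome = "ATATATATGC" + "GCGCGCATAT"
toy_regions = {"LSC": (0, 10), "IR": (10, 20)}
print(regional_gc(toy_genome, toy_regions))
```

On a real genome, the coordinates would come from the annotated LSC, SSC, and IR boundaries.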
Repeat sequences and codon analysis
Simple sequence repeats (SSRs) are tandem repeats of short nucleotide motifs, typically 1–6 bp long, and are widely used in studies of phylogenetic evolution and genetic diversity26,27,28,29.
In total, 48 SSRs were detected in C. hirtinoda. Of these, 27 were mononucleotide repeats (56.25% of all SSRs), primarily consisting of A or T. The remainder comprised 4 dinucleotide repeats (AT / TA and TC / CT), 3 trinucleotide repeats, 13 tetranucleotide repeats, and 1 pentanucleotide repeat (Fig. 2A). Most SSRs (38; 79%) were located in the LSC region, whereas 6 (13%) were found in the IR region and 4 (8%) in the SSC region (Fig. 2B). Previous research suggests that the number of SSRs in each region and the regional differences in GC content are related to the expansion or contraction of the IR boundary30.
The REPuter program identified 61 repeats in the cp genome of C. hirtinoda, including 15 palindromic and 19 forward repeats, with no reverse or complement repeats (Fig. 3). Repeat analyses of the three Chimonobambusa species revealed 61–65 repeats each, with only one reverse repeat, found in C. hejiangensis. Most repeats were between 30 and 100 bp long and were located in either the IR or the LSC region31 (Supplementary Table S2).
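The four repeat categories differ only in how the second copy relates to the first: identical (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindromic). The sketch below illustrates that relationship for exact copies. It is not a reimplementation of REPuter, which also handles mismatches and searches the genome itself, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify an exact repeat pair by how `second` relates to `first`.

    Checks are made in a fixed order, so a sequence that satisfies more
    than one relation (for example a self-complementary one) is reported
    under the first category that matches.
    """
    first, second = first.upper(), second.upper()
    if second == first:
        return "forward"
    if second == first[::-1].translate(COMPLEMENT):
        return "palindromic"
    if second == first[::-1]:
        return "reverse"
    if second == first.translate(COMPLEMENT):
        return "complement"
    return None


print(classify_repeat("AAGCT", "AGCTT"))  # second copy is the reverse complement
```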
We identified 20,180 codons in the coding regions of C. hirtinoda (Fig. 4, Supplementary Table S3). The most frequently used codon was AUU, encoding Ile (817 occurrences), and the least used was the termination codon UAG (19 occurrences). Leu was the most frequently encoded amino acid (2,170), and termination codons (TER) were the least frequent (85). A Relative Synonymous Codon Usage (RSCU) value greater than 1.0 indicates that a codon is used more often than expected if all synonymous codons were used equally32. The RSCU values of 31 codons exceeded 1 in the C. hirtinoda cp genome, and 29 of these (93.55%) ended in A or U at the third position. The codons AUG (Met) and UGG (Trp) showed no bias (RSCU = 1), as each is the only codon for its amino acid.
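RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation using the standard genetic code. The function name and the toy coding sequence are illustrative assumptions, not the software or data used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks termination codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for in-frame coding sequences (DNA or RNA)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy CDS: Met, Leu (TTA x2, TTG x1), Trp, stop.
toy = rscu(["ATGTTATTATTGTGGTAA"])
print(toy["ATG"], toy["TGG"], toy["TTA"])
```

Because Met and Trp each have a single codon, their RSCU is 1 whenever they occur, which is why AUG and UGG show no bias.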
Comparative analysis of genome structure
The nucleotide variability (Pi) values of the three Chimonobambusa cp genomes, calculated with DnaSP 5.10, ranged from 0 to 0.021, with an average of 0.000544. Five peaks were observed in the two single-copy regions, and the highest peak was in the trnT-trnE-trnY region of the LSC (Fig. 5). The Pi values of the LSC and SSC regions were considerably higher than those of the IR region, where no highly divergent sequences were observed, indicating that the IR is highly conserved. Such highly variable regions have been used in other plants for species identification, phylogenetic analysis, and population genetics research33,34,35.
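Pi is the average number of pairwise nucleotide differences per site, computed here in sliding windows along an alignment. The sketch below shows the idea on pre-aligned sequences of equal length. It is not DnaSP's implementation. The window and step sizes are assumed defaults that the text does not state, and columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations


def pi_window(aligned, start, end):
    """Mean pairwise differences per site over columns [start, end)."""
    cols = [i for i in range(start, end)
            if all(seq[i] in "ACGT" for seq in aligned)]
    if not cols:
        return 0.0
    pairs = list(combinations(aligned, 2))
    diffs = sum(a[i] != b[i] for a, b in pairs for i in cols)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(aligned, window=600, step=200):
    """Return (window_start, Pi) tuples; window and step are assumed values."""
    aligned = [seq.upper() for seq in aligned]
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [(start, pi_window(aligned, start, min(start + window, length)))
            for start in range(0, max(length - window, 0) + 1, step)]


# Toy alignment of three sequences, scanned in two non-overlapping windows.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(sliding_pi(toy, window=4, step=4))
```

Windows whose Pi stands well above the genome-wide average correspond to the peaks described in the text.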
Comparison of the complete cp genomes of the three Chimonobambusa species showed that the sequences were conserved in most regions (Fig. 6). The LSC and SSC regions were more variable than the IR region, and non-coding regions were more variable than coding regions. Among non-coding regions, those around 7–9 kb, 28–30 kb, and 36 kb, among others, differed markedly. Among protein-coding regions, rpoC2, rps19, ndhJ, and others showed differences. However, the tRNA and rRNA regions were 100% identical. A similar pattern has been reported by others36.
IR contraction and expansion in the chloroplast genome
Because of the circular structure of the cp genome, there are four junctions between the LSC, IRb, SSC, and IRa regions. During species evolution, the IR regions of the chloroplast genome expand and contract to some degree, which helps maintain the stability of the two IR sequences, and this adjustment is the primary cause of chloroplast genome length variation37,38.
The IR / SC boundary regions of the three Chimonobambusa chloroplast genomes were highly similar in organization, gene content, and gene order. The size of the IR ranged from 21,797 bp (C. tumidissinoda) to 21,835 bp (C. hejiangensis). The ndhH gene spanned the SSC / IRa boundary and extended 181–224 bp into the IRa region in all three Chimonobambusa species. The rps19 gene lay on the IRb side of the LSC / IRb junction, separated from the boundary by a 31–35 bp gap. The rpl12 gene was located in the LSC region of all genomes, 35–36 bp from the LSC / IRb boundary (Fig. 7).
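Boundary measurements of this kind reduce to interval arithmetic on annotated coordinates: how many bases of a gene fall inside a region, and how far a gene lies from a junction. The sketch below shows one way to compute both. The function names and all coordinates are hypothetical and do not correspond to the genomes described here.

```python
def overlap_bp(gene, region):
    """Bases shared by two 1-based inclusive (start, end) intervals."""
    (g_start, g_end), (r_start, r_end) = gene, region
    return max(0, min(g_end, r_end) - max(g_start, r_start) + 1)


def gap_to_junction(gene, junction):
    """Bases between a gene and a junction.

    `junction` is the last base of the left-hand region, so the boundary
    sits between `junction` and `junction + 1`. Returns 0 if the gene
    spans the boundary.
    """
    g_start, g_end = gene
    if g_end <= junction:
        return junction - g_end
    if g_start > junction:
        return g_start - junction - 1
    return 0


# Hypothetical layout: SSC = 1001..2000, IRa = 2001..3000.
ira = (2001, 3000)
spanning_gene = (1500, 2200)   # crosses the SSC / IRa boundary
print(overlap_bp(spanning_gene, ira))          # bases extending into IRa
print(gap_to_junction((2036, 2300), 2000))     # gene fully inside IRa
```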
The three Chimonobambusa chloroplast genomes were compared using Mauve alignment. All sequences showed perfect synteny, with no inversions or rearrangements (Fig. 8).
We performed a phylogenetic analysis using the complete chloroplast genomes and the queen gene to determine the phylogenetic position of C. hirtinoda. The maximum likelihood (ML) analysis based on the complete chloroplast genomes recovered seven nodes with full branch support (100% bootstrap value). However, relationships among the three Chimonobambusa species were only moderately supported, possibly because few samples were included; C. hirtinoda was more closely related to C. tumidissinoda than to C. hejiangensis, with a 62% bootstrap value. The phylogenetic tree based on the queen gene also placed the Chimonobambusa species in a single clade, consistent with the tree constructed from the complete cp genomes (Fig. 9). These results show that the complete chloroplast genome discriminated closely related species better than the queen gene alone, consistent with a previous study39.