Characterization of the Complete Chloroplast Genome Sequence of Juniperus polycarpos K. Koch (Cupressaceae), from Ziarat, Pakistan

Objectives: To broaden the genetic information base of Juniperus and resolve phylogeny of Juniperus polycarpos through sequencing and characterization of its chloroplast genome. Methods: The chloroplast (cp) genome of J. polycarpos was sequenced and assembled using the Next-Generation Sequencing paired-end reads platform of BGISEQ-500 and annotated using CpGAVAS. The phylogenetic analysis was performed in MEGA7. Findings : Here, we report the complete cp genome sequence of J. polycarpos . The cp genome size is 127,825 bp with a typical circular structure and lack canonical inverted repeats having a total of 119 genes comprised of 82 protein-coding genes, 33 tRNA genes and four rRNA genes. The cp genome encodes 105 single copy genes and ﬁve duplicated genes ( ndhK, ccsA, rps12, trnE-TTC and trnQ-TTG ), and one tetraplicated gene ( trnM-CAT ). In these genes, 9 genes ( rpl2, ycf2, trnA-TGC, trnE-TTC, rpoC, rpoB, ndhB, ndhA and atpF ) harboring a single intron, three genes ( accD, rrn23s and ycf3 ) having two introns and one gene ( ycf1 ) harboring three introns. The overall GC content of J. polycarpos chloroplast DNA was 35%. Phylogenetic analysis among 14 species of order Coniferales based on cp genomes indicated a close relationship between J. polycarpos , J. cedrus and J. communis . Novelty and application: This is the ﬁrst report on the cp genome of J. polycarpos . The current study is expected to add to the already available genomic resources needed for more comprehensive population genetics studies and resolving phylogenetic relationships of order Coniferales. Besides, it will provide baseline data for future research on Juniperus of Pakistan in particular.


Introduction
The genus Juniperus (Cupressaceae), commonly known as "cedar", is among the most diverse genera of conifers: lacking consensus, however, on number of species. Farjon (1) , for example, reported 52 species of Juniperus while Adams (2) documented 67 species. In Pakistan, five species of Juniperus are reported. The Juniperus L. are divided into two sections (Juniperus and Sabina) and three subsections; (Juniperus, Oxycedrus and Caryocedrus), though some studies treat Caryocedrus as a complete section rather than subsection (3,4) . Juniperus L. genus are geographically widely distributed occurring across the Northern hemisphere, Arctic, Central American mountains, East Africa, Central Asia, and South Asia (2,5,6) .
The Juniper forest (Juniperus polycarpos) of Ziarat area in the province of Balochistan, Pakistan, are stretched on an area of 110 000 ha and is said to be the second largest of its kind in the world. The trees of juniper here are believed to be among the oldest living trees in the world and are, therefore, referred to as "living fossils" (7) making the species of immense importance for climate change and ecological studies.
The forest lies in the dry temperate woodland region where J. polycarpos is disbursed between 20 • 9´N and 30 • 37´N and between 67 • 1´E and 68 • 3´E, with elevation ranging from 1200 m to 3000 meters above sea level (8) . Besides supporting diverse plant species, the juniper ecosystem provides habitat for endangered wildlife species; including Black bear and the Sulaiman Markhor. The Juniper forest of Ziarat, in view of its importance for biodiversity, climate change and ecology, was declared the Biosphere Reserve in 2013 by UNESCO. Similarly, owing to old growing trees and large area, the ecosystem (as part of the global forest vegetation) has attained enormous importance as carbon stock (2) J. polycarpos K. Koch is a part of the J. excelsa complex, one of the most complicated taxonomic groups of Juniperus (9) . This complex consists of four morphologically cryptic taxa, and when recognized at the specific level are: J. excelsa M. Bieb., J. seravschanica Kom, J. polycarpos K. Koch, and J. turcomanica B. Fedtsch (10) . The phylogenies indicated that the J. excelsa complex is composed of three distinct clades at the species level: J. excelsa, J. polycarpos and J. seravschanica and two varieties of J. polycarpos: J. polycarpos var. polycarpos and J. polycarpos var. turcomanica. (11) . In many studies, J. excelsa and J. polycarpos are grouped as a unit (12,13) . Differences, however, exist between the two. In a study conducted in Iran, J. excelsa is termed unisexual (dioecious) while J. polycarpos as ambisexual (monoecious) (14) . Contrary to that a recent publication suggests J. excelsa as both monoecious and dioecious while J. polycarpos as dioecious only (15,16) . The Juniper of Ziarat in many studies is still listed under the name of Juniperus excels subsp. polycarpos, although RAPD and essential oil analysis indicated that the taxon should be treated as J. polycarpos (17) .
The current work is an attempt to broaden the genetic information base of the Juniperus. Advancement in sequencing technologies has opened avenues for revealing complete chloroplast genomic information in abundance of species. In this study, we characterized complete cp genome of J. polycarpos based on Next-Generation Sequencing approach using BGISEQ-500 followed by a de novo and reference guided assembly. We analyzed the genome features of J. polycarpos and compared them with cp genomes from other Gymnosperm species. We used cp genome and 14 shared cp genes to perform phylogenomic analysis to study the phylogeny of order Coniferales and resolve the phylogenetic position of J. polycarpos.

Chloroplast genome sequencing, assembly and annotation
Total genomic DNA from the silica-dried leaves of J. polycarpos was extracted using DNeasy Plant Mini Kit (QIAGEN Inc., Valencia, CA, USA). The whole genome of J. polycarpos was sequenced and assembled using the Next-Generation Sequencing paired-end reads platform of BGISEQ-500 (BGI, Shenzhen). We respectively used SOAPfilter_v2.2 and seed-extension-based de novo assembler NOVOPlasty v2.6.1 (18) to filter low quality raw reads and carrying out de novo assemblies of whole genome. To conduct the assemblies, complete rbcL gene sequence of J. excelsa, accession number HM024303, downloaded from National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm. nih.gov/) was used as the seed. The J. cedrus, was used, as a reference for further assembly with the help of 'mitochondrial baiting and iterative mapping' approach assembler MITObim v1.8 (19) to recover complete plastid genomes of J. polycarpos. To retrieve the missing sequence, imputation technique was utilized. To carry out imputation, eight closely related complete genome sequences of Juniperus sp. (Table 1) to J. polycarpos were downloaded from NCBI via BLASTn. Pairwise and multiple alignments were carried out using ClustalW (BioEdit). After the sequences were aligned, consensus sequence was generated in BioEdit 7 (20) . The consensus sequence was considered as a reference sequence and was aligned with the sample sequence via ClustalW. A huge gap was observed in sample sequence after alignment, the sequence was copied from reference and was attached in sample sequence where huge gap was generated. Eventually, the complete chloroplast genome was annotated using CpGAVAS (http://www.herbalgenomics. org) and the primary annotated results were corrected using Dual Organellar GenoMe Annotator -DOGMA -and Basic Local Alignment Search Tool -BLAST. tRNAscanSE was used to identify the tRNAs.

Phylogenetic analysis
We downloaded 14 cp genomes of closely related species of order Coniferales from GenBank to study the phylogenetic relationships among Juniperus sp. (Table 1). Callitropsis vietnamensis, Cupressus jiangeensis, Thuja standishii, Hesperocyparis benthami, and Taxus fauna were used as out-group. Initially, multiple sequence alignments of complete cp genomes, based on the conserved structure and gene order, were carried out by using BioEdit (20) with default parameters. Neighbor-Joining tree were constructed (21) in MEGA 7 (22) with 1000 bootstrap replicates and Kimura 2-parameter (23) . https://www.indjst.org/

Genome assembly and genome features of J polycarpos
The complete genome of J. polycarpos was sequenced using Next-Generation Sequencing approach (BGISEQ-500) utilizing Denovo and reference-based assembly. The sequencing produced raw data of about 65 Gb for the whole genome with an average read length of 100 bp. Using J. cedrus rbcL region as seed, we extracted and assembled Chloroplast genome of J. polycarpos from the whole genome by means of the 'mitochondrial baiting and iterative mapping' approach assembler MITObim v1.8 (19) . We recovered the missing junk of approximately 441bp by Imputation technique after comparing cp genome of J. polycarpos with cp genome of J. cedrus and annotating by web-based program CpGAVAS (24) . DOGMA was used to check and curate the annotations for missing annotations. The cp genome of J. polycarpos was 127,825bp in length. The assembled cp genome was a typical circular molecule that lack canonical inverted repeats which separate the small and large single copy regions (Figure 2). The chloroplast genome map of J. polycarpos is given in Figure 2. The overall GC content of the cp genome was 35%.

Cp Genome Annotation
The cp genome of J. polycarpos contained 119 genes divided into the following three categories: 82 protein-coding genes, 4 ribosomal RNA genes and 33 transfer RNA genes ( Table 2).

Genes of unknown function.
ycf2 is a split gene with a single intron and unknown function.

Transfer RNA gene
Thirty-three (33) transfer RNA were identified by CPGAVAS and validated by DOGMA and tRNAscan-SE (Table 2). tRNAscan-SE identifies 99-100% of transfer RNA genes in DNA sequence whilst giving much less than one false superb per 15 gigabases. The Av. tRNA size is 89bp and it anticipated 29 tRNAs ensuing 28 tRNAs.

Split Genes
There are 13 split genes were identified in Chloroplast genome of J. polycarpos, in which 9 genes (rpl2, ycf2, trnA-TGC, trnE-TTC, rpoC, rpoB, ndhB, ndhA and atpF) having single intron, three genes (accD, rrn23s and ycf3) having two introns and one gene (ycf1) harboring three introns. DOGMA validated all genes annotated by CPGAVAS and tRNAscan besides detecting three more genes viz., a conserved open reading frame gene of unknown function ycf68 having three copies and two open reading frame genes orf42 and orf188.

Comparative Analyses of the Chloroplast Genome with Other Coniferales
When compared to different Juniperus species the chloroplast genome has more or less similar genes except that there are five duplicated genes (ndhK, ccsA, rps12, trnE-TTC and trnQ-TTG), and one tetraplicated gene(trnM-CAT) ( Table 3) and 13 split genes (Table 3).

Phylogenetic analysis
In the present study, 14 complete chloroplast genomes of six genera of order Coniferales were utilized to depict the phylogenetic relationships. The history of evolution was inferred using the Neighbor-Joining tree. The bootstrap consensus tree formed from 1000 replicates by using the Kimura 2-parameter method. The results showed that all Juniperus species involved were clustered into two supported monophyletic groups that belong to the Juniperus sect. Juniper and Juniperus sect. Sabina, respectively. Within the former J. polycarpos is more closely related to J. cedrus and J. communis (Figure 3). https://www.indjst.org/

Discussion
Gymnosperm chloroplast (cp) genomes, especially in coniferous species, have distinctive characteristics compared to those of angiosperms, including paternal inheritance (25)(26)(27) , relatively high levels of intra-specific variation (28,29) , and a different pattern of RNA editing (30) . The complete cp genome of Juniperus polycarpos was assembled in the present work, using BGISEQ-500 (BGI, Shenzhen) paired-end reads derived from the whole genome. Several other studies have adopted similar strategy of obtaining the cp genome without prior isolation of the cpDNA (31)(32)(33)(34) .
The size of the complete cp genome of J. polycarpos (127,825bp) was consistent with cp genomes from the other sequenced Juniperus species (i.e., ranging from 127,126 bp in J. cedrus to 127,792 bp in J. squamata ( Table 3). The absence of inverted repeat (IR) sequence in conifers largely accounts for relatively small cp genome size, as observed for J. polycarpos in the present report and for other published works on cp genomes of Cupressaceae species (35)(36)(37)(38)(39) . Conifers like Douglas-fir and radiata pine also lack the large (20-25 kb) inverted repeat that characterizes most land plants (40) . Similarly, a single group of land plants consisting of several allied tribes in the subfamily Papilonoideae of the legume family (Fabaceae) lack major inverted repeat of roughly 10-76 kb that contains the rRNA genes and adjacent DNA which results in extensive genome sequence rearrangements (40,41) . The angiosperm cp genomes, on the other hand, range in size from 130 to 160 kb, and contain two identical inverted repeats (IRs) splitting the genomes into Large (LSC) and small single-copy (SSC) regions (31) .
The overall GC content of the cp genome, which was 35% in our work, corresponded with the GC contents calculated for other Juniper species (Table 3). The cp genome of J. polycarpos has typical Coniferales circular structure ( Figure 2) with similar genes resembling other Juniperus species (Table 3).
Many phylogenetic studies in land plants have used genome sequences of chloroplast to analyze relatedness and identify the species (42)(43)(44) . Still in other similar studies some Coding Sequences (CDSs) e.g., matK, rbcL, rpoB and intergenic regions https://www.indjst.org/ were used as barcode markers. However, low variability of plant barcode markers (45,46) , particularly for closely related species, necessitates genome-based analysis. The present report is a step forward in this direction. The availability and comparison of cp genomes will facilitate designing better barcode markers discriminating juniperus species.
Chloroplast genome based phylogenetic analyses in the current report provided strong support for the monophyly of J. polycarpos in the Coniferales; endorsing previous studies on the phylogeny of the Coniferales (47) . Our findings have further elucidated the phylogenetic relationships among Coniferales species. However, availability of additional complete chloroplast genome sequences will add to resolution of the comprehensive phylogenies of this order.

Conclusion
Our study maidenly reported the complete chloroplast genome of J. polycarpos. The cp genome organization and gene content were found similar to that of congeneric species. The comparative analysis of the genome structure of six Coniferales plants showed several variation hotspots, which could be used to develop more specific DNA barcodes for the authentication of Coniferales species. These highly variable regions also presented a resource for phylogenetic studies in the family Cupressaceae. We depicted the phylogenetic relationships of some species of order Coniferales and confirmed the phylogenetic relationship between J. cedrus and J. communis .