Approval of the law opened the window for the scientific community to conduct research on hemp and to cultivate it. Since then, 33 US states and more than 47 countries around the world have been growing hemp for research and industrial use. In contrast, marijuana research and legalization have expanded at a comparatively slower rate, and to date only 16 countries have legalized medicinal cannabis. Furthermore, detailed studies are needed to understand gene function, genetic composition, and the underlying mechanisms regulating cannabinoid diversity in both major varieties. Regeneration protocols and transformation systems, once available, could be used in expression studies to unravel these mechanisms, especially in trichomes. Glandular trichomes are the primary site of cannabinoid biosynthesis and accumulation in C. sativa. Cannabinoid biosynthesis starts from two routes: the plastid-localized methylerythritol 4-phosphate (MEP) pathway, which yields geranyl pyrophosphate (GPP), and the fatty acid pathway, which leads to olivetolic acid (OA) with the involvement of olivetolic acid cyclase. An aromatic prenyltransferase then catalyzes the condensation of GPP and OA to form cannabigerolic acid (CBGA), the central precursor of cannabinoid biosynthesis. van Bakel et al. (2011) analyzed transcriptomic and genomic data and described the exclusive presence of THCA synthase (THCAS) and CBDA synthase (CBDAS) in drug-type and hemp-type plants, respectively. It has been suggested that the activity of the respective enzyme on the central precursor CBGA regulates the THC and CBD concentrations of each chemotype. However, the precise regulatory mechanism is still unknown. Besides biosynthesis, understanding trichome physiology is also vital to elucidate the trafficking and localization of metabolites.
Cannabinoid biosynthesis involves three major reactions: biosynthesis of the monoterpene precursor via the MEP pathway and of the fatty acid-derived intermediate via the polyketide pathway, prenylation of these precursors, and cyclization. The MEP pathway operates in plastids, and prenylation is localized in the chloroplast membrane, where the C-prenylating CBGA synthase is membrane-bound.
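To make the branch-point logic of this pathway concrete, the steps described above can be sketched as a small dependency graph. This is a minimal Python sketch under a deliberately coarse, illustrative model: the multi-step MEP and fatty acid/polyketide routes are collapsed into single nodes, the catalyst labels are descriptive strings rather than a curated enzyme list, and the names `PATHWAY`, `steps_to`, and `shared_steps` are hypothetical, not part of any published tool or database.

```python
"""Toy model of the simplified cannabinoid pathway (illustrative only)."""

# Each product maps to (substrates, catalyst). Multi-step upstream pathways
# (MEP, fatty acid/polyketide) are collapsed into single nodes; this is a
# hypothetical simplification, not a complete biochemical model.
PATHWAY: dict[str, tuple[list[str], str]] = {
    "GPP": (["MEP pathway"], "plastidial MEP pathway (multi-step)"),
    "OA": (["fatty acid pathway"], "polyketide steps incl. olivetolic acid cyclase"),
    "CBGA": (["GPP", "OA"], "aromatic prenyltransferase (CBGA synthase)"),
    "THCA": (["CBGA"], "THCA synthase (THCAS)"),
    "CBDA": (["CBGA"], "CBDA synthase (CBDAS)"),
}


def steps_to(product: str) -> list[tuple[str, list[str], str]]:
    """Return the steps leading to `product`, ordered from primary inputs onward.

    Depth-first post-order traversal: each substrate's own steps are listed
    before the step that consumes it. Names absent from PATHWAY are treated
    as primary inputs (leaves) and produce no step.
    """
    ordered: list[tuple[str, list[str], str]] = []
    seen: set[str] = set()

    def visit(node: str) -> None:
        if node in seen or node not in PATHWAY:
            return
        seen.add(node)
        substrates, catalyst = PATHWAY[node]
        for substrate in substrates:
            visit(substrate)
        ordered.append((node, substrates, catalyst))

    visit(product)
    return ordered


def shared_steps(a: str, b: str) -> list[str]:
    """Products lying on the route to both `a` and `b` (the common trunk)."""
    names_b = {name for name, _, _ in steps_to(b)}
    return [name for name, _, _ in steps_to(a) if name in names_b]


if __name__ == "__main__":
    for name, substrates, catalyst in steps_to("THCA"):
        print(f"{' + '.join(substrates)} -> {name}  [{catalyst}]")
    # The THCA and CBDA routes share every step up to CBGA, the branch point.
    print("shared:", shared_steps("THCA", "CBDA"))
```

In this toy model, THCAS and CBDAS both consume CBGA, so the two routes diverge only at the final enzyme-specific step, which mirrors the suggestion that the respective synthase acting on CBGA governs the THC:CBD outcome.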
The integration of the enzyme into the membrane appears essential, and its folding pattern is crucial for its function. Therefore, simple cloning and functional expression of this enzyme in a heterologous host such as yeast to generate the desired cannabinoids is challenging. Terpenoid cyclization reactions are among the most complex reactions found in nature, and the conversion of CBGA to THCA by THCA synthase is assumed to occur in the cytosol. This model is under ongoing debate, and biocatalysis may instead occur in the extracellular oil container in a non-aqueous environment. In 1992, Mahlberg and Kim postulated that THCA synthase is located in the outer membrane of the head cells or even attached to the outer membrane surface, extending into the essential oil. In recent studies, LC-MS/MS was used to detect functionally active THCA synthase and CBGA synthase in the exudates from glandular trichomes of cannabis. Zirpel et al. described the need for a thorough understanding of the protein chemistry and folding of these enzymes to produce cannabinoids in a heterologous host. Detailed knowledge of the genetic regulatory mechanisms underlying cannabinoid biosynthesis remains a future challenge. Identifying regulatory elements such as transcription factors and microRNAs could provide mechanistic insight into trichome initiation, development, and density. Such an in-depth understanding could be applied to breeding genetically improved cannabis varieties with enhanced cannabinoid concentrations in trichomes.
Drug- and fiber-type plants differ in the biosynthesis, concentration, and composition of metabolites. To determine the genetic variations underlying these plant-specific differences, it is essential to compare their genomes. Advanced sequencing technologies combined with continuously improving bioinformatics tools have allowed rapid sequencing and analysis of multiple genomes and transcriptomes.
The first draft genome of C. sativa was released in 2011 by van Bakel et al. They sequenced the drug-type cultivar Purple Kush using Illumina short reads and assembled them in combination with 454 reads. They also sequenced the fiber-type hemp cultivar Finola for a genome-level comparison. In addition to the nuclear genome, the first complete mitochondrial reference genome was obtained in 2016 from the hemp variety Carmagnola. Later, in July 2016, two complete chloroplast genomes, from the African marijuana variety Yoruba Nigerian and the Korean non-drug hemp variety Cheungsam, were sequenced and used to determine the phylogenetic position of C. sativa relative to other members of the order Rosales.
Furthermore, in September 2016, the complete chloroplast genomes of two Cannabis hemp varieties, Carmagnola and Dagestani, were released to determine their genetic distance from the closest Cannabaceae chloroplast genome, that of Humulus lupulus variety Saazer.
Growing industry support for open-data policies is improving transparency in cannabis agriculture. In 2016, the industry leaders in cannabis research Courtagen Life Sciences and Phylos Bioscience independently generated the genomes of the hybrid marijuana strain Chemdog91 and the marijuana strain Cannatonic, respectively. Phylos Bioscience also released genomic data for 850 Cannabis strains as part of the “Open Cannabis Project” for plant breeding programs. To explore Cannabis population genetics, Phylos Bioscience developed a three-dimensional interactive map of nearly 1000 cannabis strains. In 2017, the genome of the hybrid marijuana cultivar Pineapple Banana Bubba Kush was released as part of the Cannabis Genomic Research Initiative. In 2018, Grassa et al. generated the first chromosome-level genome assembly of CBDRx, a high-CBD cultivar of C. sativa, using long-read Oxford Nanopore Technology and PacBio Single-Molecule Real-Time sequencing. In 2019, Laverty et al. improved the initial draft assemblies of drug-type Purple Kush and hemp-type Finola to chromosome level using ultra-long PacBio reads. In addition to the genomes of high-CBD and high-THC marijuana and hemp cultivars, a medicinal Cannabis strain with a balanced THC/CBD ratio was sequenced by Braich et al. Until 2020, nearly all Cannabis genomes had been obtained from hemp and marijuana cultivars selectively bred for generations. However, cultivars lose genetic diversity owing to domestication and successive breeding for selected traits.
In contrast, wild-type genomes exhibit relatively high heterozygosity and genetic diversity, which might provide unique evolutionary insights into the cannabis genome. Therefore, in 2020, Gao et al. sequenced the first samples of the wild-type C. sativa “Jamaican Lion” variety growing in the geographically isolated Himalayan region of Tibet. Because these wild-type plants retained the ancestral genetic make-up, the data generated in this study were used to determine inheritance patterns and to make evolutionary inferences about cannabis. The published genomes of high-THC and high-CBD marijuana cultivars and hemp varieties exhibited inconsistent chromosome nomenclature and arrangement and varying degrees of gaps. Therefore, by the end of 2020, Braich et al. generated a relatively complete draft genome assembly for Cannbio-2, the medicinal cannabis strain with a balanced THC/CBD ratio.
To date, only 13 Cannabis genomes are publicly available at the National Center for Biotechnology Information (NCBI), of which 3 assemblies are at chromosome level, 7 at contig level, and one at scaffold level. However, by March 2021, the 1000 Cannabis Genomes Project comprised genomic data from nearly 1000 samples of multiple cannabis strains. These datasets were the first genome data released on the Google Cloud BigQuery database. The continuously expanding list of cannabis genomes requires collaborative efforts to curate the information. Therefore, academic and industry experts in diverse fields formed the International Cannabis Research Consortium (ICRC) during the annual Plant and Animal Genome (PAG) conference in 2020. Despite several cannabis genome assemblies, selecting a single standard reference genome is still a major challenge for the scientific community, especially for plant breeders. Therefore, the ICRC proposed the CBDRx cs10 assembly as the most complete reference for cannabis genome research. Additionally, NRGene, a member genomics company, generated an integrated Cannabis and Hemp Genomic Database optimized and curated for genomics-based breeding of cannabis varieties. Finally, in 2021, the first open-access, comprehensive cannabis genome database, CannabisGDB, was released with integrated bioinformatic tools for dataset analysis. Overall, the genomic data of diverse cannabis genotypes are untapped reservoirs of genetic information that could be applied toward a pan-genomic understanding of cannabis evolution and toward determining the effects of genetic variation on the pathways, development, and concentration of cannabis derivatives. A detailed genetic atlas would facilitate the design and further breeding of cannabis varieties for preferred metabolite yields.
The availability of several high-quality cannabis genomes has made it easier to apply transcriptome sequencing to elucidate detailed expression dynamics in a time-, tissue-, stage-, and chemotype-dependent manner. Furthermore, differential expression analysis provides in-depth insight into correlated gene networks. In 2011, van Bakel et al. sequenced and compared the transcriptomes of the marijuana variety Purple Kush (PK) and the hemp cultivars Finola (FN) and USO-31. Gene expression analysis revealed preferential expression of cannabinoid and precursor pathway-associated genes in marijuana. Expression of THCA synthase in PK and of cannabidiolic acid synthase in FN was consistent with the exclusive production of psychoactive THC in marijuana. In a recent study, the transcriptomes of hemp-type plants were analyzed to determine the expression profiles of the fiber-type plant at various developmental stages. Similarly, the transcriptomes of marijuana flowers at different stages were sequenced, and the gene expression pattern was found to be consistent with the cannabinoid contents. As glandular trichomes are the central reservoir of cannabinoids, the trichome transcriptome could yield valuable insight into the variation in cannabinoid biosynthesis, composition, and concentration between drug- and fiber-type plants. Importantly, identifying differentially expressed genes could unravel the molecular mechanisms underlying natural genetic and metabolic variation. Gene expression in trichomes of female Cannbio-2 plants was compared with genome-wide transcriptomes of female floral tissues at different developmental stages and of other tissues, including female and male flowers, leaves, and roots.
The resulting extensive expression atlas was used to identify genes expressed preferentially in various tissues at different developmental stages. Interestingly, the majority of genes involved in terpenoid and cannabinoid synthesis were significantly over-expressed in trichomes. In 2021, Grassa et al. used genomic and expression data to associate the expression of THCAS and CBDAS with the THC:CBD ratio through quantitative trait locus (QTL) analysis of Cannabis cultivars. Datasets from similar genomics, transcriptomics, microbiome, and metagenomics studies of various cannabis strains are currently accessible from the Sequence Read Archive (SRA) repository at NCBI. In the past 3 years, there has been unprecedented growth in Cannabis genome and transcriptome studies and in the corresponding SRA entries. To date, there are over 4,571 BioSamples from multiple studies related to Cannabis, of which 2,871 public BioSamples are exclusively for C. sativa, comprising 2,546 DNA and 325 RNA-Seq datasets in the SRA. The SRA transcriptomic and metagenomic data were reportedly obtained from various tissues, including seeds, flowers, leaves, shoot stems, roots, and trichomes, whereas the genomic data lack tissue-specific information. In-depth transcriptomic studies will be required to improve the understanding of regulatory genetic networks.
One of the fundamental purposes of patents, especially in medical science and biotechnology, is to encourage industrial partners to invest in research and development. Cannabis-related patents have been issued by the US patent office since 1942. More than 1,500 applications have been filed in the US patent office alone. Of these, approximately 500 applications were granted patent protection, most of them within the last decade. Here, we analyzed the patents spatiotemporally and categorized them into four main classes: patents related to cannabinoids as constituents, pharmaceutical applications, endocannabinoid pharmacology, and genome- and gene-related patents.
Among the four proposed categories, patents related to pharmaceutical applications formed the largest category, with 73 patents registered. These are further subgrouped into drug preparation, treatment, delivery technology, and detection methods, with 14, 33, 13, and 13 patents, respectively. Endocannabinoid-related patents cover the CB1/2 receptors, TRPV1, and GPR119, reviewed in . The cannabinoid category consists of patents on cannabinoid isolation, extraction, and synthesis or biosynthesis, with 6, 6, and 12 patents granted, respectively. In the genome- and gene-related category, 15 patents relate to enzyme inhibition, followed by genes and proteins with two patents each. Most of the patents are from the US, followed by GB and other European countries (Figure 2). In addition, 25 patents for fiber/textile, 10 for foodstuff, 5 for the paper industry, 3 for architecture, 1 for biofuel, and 3 for plant breeding have been registered. Four patents each have also been filed in the oil, extracts, and cosmetics categories.
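The subcategory counts reported above can be cross-checked with a short tally. This is a minimal Python sketch, assuming the counts exactly as stated in the text; the dictionary names `REGISTERED` and `FILED`, the function `category_totals`, and the paraphrased category labels are hypothetical, and the endocannabinoid category is omitted because no counts are given for it.

```python
"""Tally of cannabis patent counts as reported in the text (illustrative)."""

# Counts the text describes as registered/granted; keys are paraphrased labels.
REGISTERED: dict[str, dict[str, int]] = {
    "pharmaceutical applications": {
        "drug preparation": 14,
        "treatment": 33,
        "delivery technology": 13,
        "detection method": 13,
    },
    "cannabinoids": {"isolation": 6, "extraction": 6, "synthesis/biosynthesis": 12},
    "genome and gene related": {"enzyme inhibition": 15, "gene": 2, "protein": 2},
    "industrial and other uses": {
        "fiber/textile": 25,
        "foodstuff": 10,
        "paper": 5,
        "architecture": 3,
        "biofuel": 1,
        "plant breeding": 3,
    },
}

# The text reports these as filed (not necessarily granted), so they are kept apart.
FILED: dict[str, int] = {"oil": 4, "extracts": 4, "cosmetics": 4}


def category_totals(groups: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the subcategory counts within each category."""
    return {name: sum(sub.values()) for name, sub in groups.items()}


if __name__ == "__main__":
    totals = category_totals(REGISTERED)
    # Print categories from largest to smallest.
    for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {total}")
    print("filed (oil/extracts/cosmetics):", sum(FILED.values()))
```

With these inputs, the four pharmaceutical subgroups sum to the 73 patents stated for that category, while the cannabinoid and genome/gene subgroups sum to 24 and 19, respectively.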