A detailed genetic atlas would facilitate the design and further breeding of cannabis varieties for preferred metabolic yields

Cannabinoid biosynthesis involves three major steps: biosynthesis of the monoterpene precursor via the MEP pathway and of the fatty acid intermediate via the polyketide pathway, prenylation of the precursors, and cyclization. The MEP pathway operates in the plastid, and prenylation is localized at the chloroplast membrane, where the C-prenylating CBGA synthase is membrane-bound. Integration of the enzyme in the membrane seems essential, and its folding pattern is crucial for its function. Therefore, simple cloning and functional expression of this enzyme in a heterologous host such as yeast to generate the desired cannabinoids is challenging. Terpenoid cyclization reactions are among the most complex reactions found in nature, and the biotransformation of CBGA to THCA by THCA synthase is assumed to occur in the cytosol. This hypothetical model is under ongoing debate, and biocatalysis may instead occur in the extracellular oil container under a non-aqueous environment. In 1992, Mahlberg and Kim postulated that THCA synthase is located in the outer membrane of the head cells or even attached to the outer membrane surface, extending into the essential oil. In recent studies, LC-MS/MS was used to detect functionally active THCA and CBGA synthases in the exudates from glandular trichomes of cannabis. Zirpel et al. described the need for an excellent understanding of the protein chemistry and folding of these enzymes in order to produce cannabinoids in a heterologous host. Detailed knowledge of the genetic regulatory mechanisms underlying cannabinoid biosynthesis remains a future challenge. Identification of regulatory elements such as transcription factors and microRNAs could provide mechanistic insight into trichome initiation, development, and density. An in-depth understanding could be applied toward breeding genetically improved cannabis varieties with enhanced cannabinoid concentrations in trichomes.

Drug- and fiber-type plants differ in the biosynthesis, concentration, and composition of metabolites. To determine the genetic variations underlying these plant-specific differences, it is essential to compare the genomes. Advanced sequencing technologies combined with continuously improving bioinformatics tools have allowed rapid sequencing and analysis of multiple genomes and transcriptomes. The very first draft genome of C. sativa was released in 2011 by Bakel et al. They sequenced the marijuana cultivar Purple Kush using Illumina short reads and assembled them in combination with 454 reads. They also sequenced the fiber-type hemp cultivar Finola for a genome-level comparison. In addition to the whole genome, the first complete mitochondrial reference genome was obtained in 2016 from the hemp variety Carmagnola. Later, in July 2016, two complete chloroplast genomes, of the African marijuana variety Yoruba Nigerian and the Korean non-drug hemp variety Cheungsam, were sequenced and used to determine the phylogenetic position of C. sativa relative to other members of the order Rosales. Furthermore, in September 2016, complete chloroplast genomes of two Cannabis hemp varieties, Carmagnola and Dagestani, were released to determine their genetic distance from the closest Cannabaceae chloroplast, that of Humulus lupulus variety Saazer. Growing support for open-data policies across the industry is improving transparency in cannabis agriculture. In 2016, two industry leaders in cannabis research, Courtagen Life Sciences and Phylos Bioscience, independently generated the genomes of the hybrid marijuana strain Chemdog91 and the marijuana strain Cannatonic, respectively. Phylos Bioscience also released genomic data of 850 Cannabis strains as part of the "Open Cannabis Project" for plant breeding programs.

To explore Cannabis population genetics, Phylos Bioscience developed a three-dimensional interactive map of nearly 1000 cannabis strains. In 2017, the genome of the hybrid marijuana cultivar Pineapple Banana Bubba Kush was released as part of the Cannabis Genomic Research Initiative. In 2018, Grassa et al. generated the first chromosome-level assembly for the genome of CBDRx, a high-CBD cultivar of C. sativa, using long-read Oxford Nanopore Technologies and PacBio Single-Molecule Real-Time sequencing. Later, in 2019, Laverty et al. improved the initial draft assemblies of drug-type Purple Kush and hemp-type Finola to chromosome level using ultra-long PacBio reads. In addition to the genomes of high-CBD and high-THC marijuana and hemp cultivars, a medicinal Cannabis strain with a balanced THC/CBD ratio was sequenced by Shivraj Braich et al. Until 2020, nearly all Cannabis genomes had been obtained from hemp and marijuana cultivars that had been selectively bred for generations. However, cultivars lose genetic diversity owing to domestication and successive breeding for selected traits. In contrast, wild-type genomes exhibit relatively high heterozygosity and genetic diversity, which might provide unique evolutionary insights into the cannabis genome. Therefore, in 2020, Gao et al. sequenced the first samples of the C. sativa wild-type "Jamaican Lion" variety growing in the geographically isolated Himalayan region in Tibet. Because these wild-type plants retained the ancestral genetic make-up, the data generated in this study were used to determine inheritance patterns and draw evolutionary inferences for cannabis. The published genomes of high-THC and high-CBD marijuana cultivars and of hemp varieties exhibited inconsistent chromosomal nomenclature and arrangement and varying degrees of gaps. Therefore, by the end of 2020, Shivraj Braich et al. generated a relatively complete draft genome assembly for Cannbio-2, the medicinal cannabis strain with a balanced THC/CBD ratio. To date, only 13 Cannabis genomes are publicly available at the National Center for Biotechnology Information. Of these, 3 assemblies are at chromosome level, 7 at contig level, and one at scaffold level. However, by March 2021, the 1000 Cannabis Genomes Project comprised genomic data of nearly 1000 samples from multiple cannabis strains. These datasets were the first genome data released on the Google Cloud BigQuery database.

The continuously expanding list of cannabis genomes calls for collaborative efforts to curate the information. Therefore, academic and industry experts in diverse fields formed the International Cannabis Research Consortium (ICRC) during the annual PAG meeting in 2020. Despite the availability of several cannabis genome assemblies, the selection of a single standard reference genome is still a major challenge for the scientific community, especially plant breeders. Therefore, the ICRC proposed the CBDRx Cs10 assembly as the most complete reference for use in cannabis genome research. Additionally, a member genomics company, NRGene, generated an integrated Cannabis and Hemp Genomic Database, optimized and curated for the genomics-based breeding of cannabis varieties. Finally, in 2021, the first open-access and comprehensive database of cannabis genomes, CannabisGDB, was released with integrated bioinformatic tools for the analysis of datasets. Overall, the genomic data of diverse cannabis genotypes are untapped reservoirs of genetic information that could be applied toward a pan-genomic understanding of cannabis evolution and toward determining the effect of genetic variations on the pathways, development, and concentration of cannabis derivatives.

The availability of several high-quality cannabis genomes has made it easier to apply transcriptome sequencing to elucidate detailed expression dynamics in a time-, tissue-, stage-, and chemotype-dependent manner. Furthermore, differential expression analysis provides in-depth insight into correlated gene networks. In 2011, Bakel et al. sequenced and compared the transcriptomes of the marijuana variety Purple Kush and the hemp cultivars Finola and USO-31. Gene expression analysis revealed preferential expression of cannabinoid and precursor pathway-associated genes in marijuana. Expression of THCA synthase in Purple Kush and of cannabidiolic acid synthase in Finola was found to be consistent with the exclusive production of psychoactive THC in marijuana.
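
As a rough illustration of the comparison that differential expression analysis rests on, the sketch below computes a log2 fold change between a drug-type and a fiber-type sample. The expression values, the pseudocount, and the threshold are hypothetical placeholders chosen for illustration; they are not data from the studies cited above, and real analyses rely on replicated samples and dedicated statistical models.

```python
import math

# Hypothetical normalized expression values (e.g. TPM). These numbers are
# illustrative placeholders, NOT measurements from any cited study.
expression = {
    "THCAS": {"drug_type": 120.0, "fiber_type": 0.0},
    "CBDAS": {"drug_type": 0.0, "fiber_type": 80.0},
    "housekeeping": {"drug_type": 50.0, "fiber_type": 50.0},
}


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of a to b; the pseudocount avoids division by zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def classify(values, threshold=1.0):
    """Label a gene by the direction of its drug-type vs fiber-type change."""
    lfc = log2_fold_change(values["drug_type"], values["fiber_type"])
    if lfc >= threshold:
        return "higher in drug type"
    if lfc <= -threshold:
        return "higher in fiber type"
    return "no clear difference"


calls = {gene: classify(values) for gene, values in expression.items()}

for gene, call in calls.items():
    print(f"{gene}: {call}")
```

A threshold of 1.0 on the log2 scale corresponds to a two-fold difference, which is a common but arbitrary starting point.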

In a recent study, the transcriptome of hemp-type plants was analyzed to determine the expression profile of the fiber-type plant at various developmental stages. Similarly, the transcriptome of marijuana flowers at different stages was captured and sequenced, and the gene expression pattern was found to be consistent with the cannabinoid content. Because glandular trichomes are the central reservoir for cannabinoids, the trichome transcriptome could yield valuable insight into the variation in cannabinoid biosynthesis, composition, and concentration between drug- and fiber-type plants. Importantly, the identification of differentially expressed genes could unravel the molecular mechanisms underlying natural genetic and metabolic variation. Gene expression in trichomes of the female plant strain Cannbio-2 was compared with genome-wide transcriptomics of female floral tissues at different stages of development as well as other tissues, including female and male flowers, leaves, and roots. This extensive expression atlas was applied toward the identification of genes expressed preferentially in various tissues at different developmental stages. Interestingly, the majority of genes involved in terpenoid and cannabinoid synthesis were significantly over-expressed in trichomes. In 2021, Grassa et al. used genomic and expression data to associate the expression of THCAS and CBDAS with the THC:CBD ratio through quantitative trait locus (QTL) analysis of Cannabis cultivars. Datasets from similar genomics, transcriptomics, microbiome, and metagenomics studies of various cannabis strains are currently accessible from the Sequence Read Archive (SRA) repository at NCBI. In the past 3 years, there has been unprecedented growth in Cannabis genome and transcriptome studies and in the corresponding SRA entries. To date, there are over 4571 BioSamples from multiple studies related to Cannabis, of which 2871 public BioSamples are exclusively for C. sativa, with 2546 DNA and 325 RNA-Seq datasets in SRA. The SRA data for transcriptomics and metagenomics were reportedly obtained from various tissues, including seeds, flowers, leaves, shoot stem, root, and trichomes, while the genomic data lack tissue-specific information.
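
The C. sativa dataset counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# C. sativa SRA dataset counts as quoted in the text.
sra_counts = {"DNA": 2546, "RNA-Seq": 325}
c_sativa_biosamples = 2871  # public BioSamples exclusive to C. sativa

# The two dataset types should add up to the C. sativa BioSample total.
datasets_total = sum(sra_counts.values())

# Share of each dataset type, as a percentage rounded to one decimal place.
shares = {
    kind: round(100 * count / c_sativa_biosamples, 1)
    for kind, count in sra_counts.items()
}

print(datasets_total == c_sativa_biosamples)
print(shares)
```

Roughly one in nine of these datasets is RNA-Seq, which is consistent with the closing point that transcriptomic coverage still lags behind genomic coverage.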

In-depth transcriptomic studies will be required in the future to improve the understanding of regulatory genetic networks.

One of the fundamental purposes of patents, especially in medical science and biotechnology, is to encourage industrial partners to invest in research and development. Cannabis-related patents have been issued by the US patent office since 1942. More than 1,500 applications have been filed in the US patent office alone. Among them, approximately 500 applications were granted patent protection, most of them in the last decade. The exponential increase in the number of patents shows the future potential of the growing cannabis industry. Here, we analyzed the patents spatiotemporally and categorized them into four main classes: patents related to cannabinoids as constituents, pharmaceutical applications, endocannabinoid pharmacology, and genomes and genes. Among these four categories, patents related to pharmaceutical applications formed the largest category, with 73 patents registered. These are further sub-grouped into drug preparation, treatment, delivery technology, and detection method, with 14, 33, 13, and 13 patents, respectively. Endocannabinoid-related patents comprised those on the CB1/2 receptors, TRPV1, and GPR119, reviewed in . The cannabinoid category consists of patents related to cannabinoid isolation, extraction, and synthesis or biosynthesis, with 6, 6, and 12 patents granted, respectively. In the sequence category, 15 patents concern enzyme inhibition, followed by genes and proteins with two patents each. Most of the patents are from the US, followed by GB and other European countries (Figure 2). In addition, 25 patents for fiber/textile, 10 for foodstuff, 5 for the paper industry, 3 for architecture, 1 for biofuel, and 3 for plant breeding have been registered.
Also, four patents each have been filed in the categories of oil, extracts, and cosmetics. However, it should be kept in mind that a given cannabinoid invention may fall into more than one patent category.
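
The category breakdown above can be tallied as follows. This is a minimal sketch that uses only the counts stated in the text; the dictionary keys are shorthand labels, and the endocannabinoid category is left out because its per-target counts are not given here.

```python
# Patent counts per sub-group, as stated in the text.
patent_counts = {
    "pharmaceutical applications": {
        "drug preparation": 14,
        "treatment": 33,
        "delivery technology": 13,
        "detection method": 13,
    },
    "cannabinoids as constituents": {
        "isolation": 6,
        "extraction": 6,
        "synthesis or biosynthesis": 12,
    },
    "genome and gene related": {
        "enzyme inhibition": 15,
        "gene": 2,
        "protein": 2,
    },
}

# Sum the sub-groups within each category.
totals = {
    category: sum(subgroups.values())
    for category, subgroups in patent_counts.items()
}

# Category with the most patents among those tallied.
largest = max(totals, key=totals.get)

for category, total in totals.items():
    print(f"{category}: {total}")
print(f"largest category: {largest}")
```

The pharmaceutical sub-groups add up to the 73 patents reported for that category. Because one invention can fall into several categories, summing across categories may count some patents more than once.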

For instance, cannabinoids are highly hydrophobic by nature and thus have low bioavailability in the human body. As a result, a new class of cannabinoid glycosides has been created, whose representatives are produced through enzymatic glycosylation. This novel strategy increased the aqueous solubility of the target cannabinoids and resulted in four patents. Recently, a new method of producing one or more cannabosides by feeding an insect a cannabinoid was patented. These new classes of cannabinoid glycosides have generated vast structural diversity and greatly improved water solubility, enabling new pharmaceutical formulations and multiple administration routes. The discovery of the genes encoding glycosyltransferases may belong to different categories of the cannabinoid patent family, that is, genes, enzymes, delivery technology, etc. The exponential increase in the number of patents in recent years across diverse areas of cannabinoid application indicates the increased commercial interest in this class of natural compounds. Pharmaceutical applications will continue to be the primary force shaping future cannabinoid inventions.

C. sativa has been well known for its anti-inflammatory properties, reviewed in .