Cannabis Genome Mapping Beats Human Genome Project Benchmark
In 2011, Medicinal Genomics became the first company to sequence the cannabis genome. At the time, Medicinal Genomics’ CEO, Kevin McKernan, said that he was inspired by several friends who had been diagnosed with cancer and were following developments in medicinal cannabis research. McKernan felt that sequencing the cannabis genome in full and publishing it open-access could enable researchers to study the drug even in countries where the plant material itself is illegal, and that this wider pool of knowledge could expedite the development of novel cannabis-based therapeutics.
Genome mapping records
While McKernan’s company did indeed make history by sequencing the genome of Cannabis sativa, the sequencing equipment available to the researchers at the time meant that the cannabis genome fell short of the standards set by the Human Genome Project (HGP) a decade earlier. These standards are usually reported as the N50 of a genome, a length-weighted measure of how contiguous the assembled sequence is. When the sequence fragments are sorted from longest to shortest, the N50 is the length of the fragment at which the running total first reaches half of the total assembled length. For example, the HGP reported an N50 of 500kb, meaning that half of the assembled sequence lay in fragments at least 500,000 base pairs long.
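The N50 calculation can be sketched in a few lines of Python. This is a minimal illustration of the standard definition, not the pipeline used by Medicinal Genomics or the HGP; the function name `n50` and the toy fragment lengths are hypothetical and chosen only so that the result lands on 640kb.

```python
def n50(lengths):
    """Return the N50 of a collection of fragment lengths (in base pairs).

    Fragments are sorted from longest to shortest; the N50 is the length of
    the fragment at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one fragment length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (illustrative numbers only, not real data).
toy_lengths = [800_000, 640_000, 300_000, 150_000, 60_000, 50_000]
print(n50(toy_lengths))  # 640000
```

Note that the plain average of these toy fragments is only about 333kb, which is why N50 should not be read as a simple mean: the statistic is weighted toward the long fragments that hold most of the sequence.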
Until now, cannabis researchers had been unable to match this 500kb N50 standard in their attempts to sequence the cannabis genome to the same level of detail. In early August 2018, Medicinal Genomics and its financial backer, Dash, announced that they had successfully sequenced the cannabis genome at an N50 of 640kb, a 28% improvement on the HGP standard. Using technology pioneered at Pacific Biosciences, the Medicinal Genomics team predicts that an N50 of 1Mb could be achievable within a year.
Why does genome fragment length matter?
Longer contiguous DNA sequences make it much easier to accurately analyze the information stored within the genome. In an interview with Dash’s public news service, Dash Force News, McKernan illustrated the importance of contiguity for accurate interpretation.
“Imagine trying to read a book where the chapters are all shuffled. You can make a lot of false narratives. Take this example: “The”, “Theirs”, “The IRS”. Three very different meanings. Expand this to paragraphs and chapters and it’s a big deal. We used to have the genome continuous enough to make 2-3000 letter words, but now we can have it in full 640,000 chapters.”
Being able to read the cannabis genome in these long ‘chapters’ is extremely useful when it comes to dealing with polymorphism in genomes. With older sequencing techniques, which couldn’t achieve as large an N50, the amount of polymorphism in a genome was a major contributor to how difficult that genome was to sequence. In humans, the polymorphism rate is about 1 in 1,000 bases, but in the cannabis genome it is an order of magnitude higher, at about 1 in 100 bases, which underlies the huge gap in difficulty when it comes to sequencing cannabis. Polymorphic bases disrupt the sequence expected from the maternal and paternal copies of the genome, and so affect how easily a computer system can recognize the proper patterns for reconstruction. The creation of a cannabis sequence with a far larger N50 has minimized the statistical effect of the polymorphic bases and made it far easier for computers to parse and reconstruct the cannabis genome.
Impact outside of the laboratory
As the accuracy of cannabis sequencing increases, so too does the number of potential applications. What started as a mission to make cannabis research more accessible may now also reform the way in which cannabis strains are patented. The precision achieved by this improved sequencing may allow cannabis breeders to prove more easily that they have produced a unique strain when applying for a patent on it. There may also be scope for using this more accurate analysis to fight unjustly issued patents, by sequencing strains cultivated by small-scale cannabis breeders that pre-date an established patent. Genomic sequencing could provide a way of showing definitively whether two strains are similar enough to challenge the patent.
This entrepreneurial support is a key objective for Dash, which aims to revolutionize the current scientific publishing market by using its cryptocurrency platform to create a crypto-incentivized and crypto-recorded peer review system. It is hoped that, under a crypto-incentivized model, reviewers with a good reputation will be more fairly compensated for their work than they are in the current science publishing model. The crypto-recording system would ensure that research data associated with the cannabis genome would remain in the public archive for years to come with no risk of data decay.