
Machine Learning Could “Fill the Cannabis Knowledge Gap,” Say Researchers

By Alexander Beadle

Published: Sep 30, 2020   

With such an intense focus placed on THC and CBD by the wider cannabis industry, it can be easy to overlook the fact that cannabis produces hundreds of other unique and interesting cannabinoids, plus a wealth of other biologically active compounds such as terpenes.

These compounds are not normally produced at levels as high as THC or CBD, and their concentrations vary widely from strain to strain, but that doesn’t mean their effects can’t be felt. One growing school of thought holds that interactions between these minor cannabinoids – the “entourage effect” – can meaningfully influence the effects felt by the user.

But despite the quiet power of these minor cannabinoids, scientists from the University of Colorado Boulder have revealed that the cannabis industry has only a limited amount of data on these compounds. In new research, published in PLOS ONE, they detail the extent of the unknowns and demonstrate how advanced machine learning techniques could bridge the gaps.

Understanding the unknowns

Dispensaries will often talk about cannabis strains belonging to one of two categories: sativa or indica, or possibly a hybrid of the two. Sativa-type cannabis, they may say, gives users an energetic and uplifting high, whereas indica-type strains have a more mellowing effect.

These sativa/indica labels are largely rooted in the geographic origin and lineage of each uniquely named cannabis strain, but scientific research has since established that, after generations of cultivation and hybridization, they cannot be considered a proper chemotaxonomic classification. Yet the naming practice persists.

Similarly, strain names often have little bearing on the cannabinoid content of a product or its reported effects. Any difference in effects or flavor is likely down to the concentration of these minor cannabinoids, terpenes, and flavonoids, and their interactions as per the entourage effect. The problem for cannabis scientists and users is that these minor compounds are not commonly analyzed and reported, which can make it tough to predict the effects of a given product.

“But because regulations only require reporting on a few compounds like THC and CBD, there’s very little data being collected on these other compounds or how they interact,” Daniela Vergara, a research associate at the University of Colorado Boulder and the lead author of this new study, said in a statement. “We’re not getting the whole picture.”

How much data is missing?

In the new CU Boulder study, Vergara and her team analyzed a dataset covering more than 17,600 cannabis flower cultivars. The data, provided by Steep Hill Inc., one of the nation’s largest specialist cannabis testing facilities, spanned eight years’ worth of cannabinoid testing results.

The researchers focused on seven well-known cannabinoids. Four of them are part of the same biochemical pathway: THC, CBD, cannabigerol (CBG), and cannabichromene (CBC). Cannabinol (CBN) was also selected because it is a breakdown product of THC, while tetrahydrocannabivarin (THCV) and cannabidivarin (CBDV) were included for their supposed medicinal properties, even though their biochemical pathways are unknown.

The team found that only 1.4 percent of cultivars were missing information about THC content, an unsurprising finding given that regulatory jurisdictions almost always require testing for THC and its acid form, THCA. Likewise, reflecting growing interest in the therapeutic effects of CBD, the share of missing data for CBD and CBDA was comparatively modest, at around 38 percent.

More dramatic amounts of missing data were seen in the cases of CBN (60.3 percent missing), CBC (63.1 percent), THCV (81.2 percent), and CBDV (96.6 percent). Across the entire 17,600-sample dataset, only 153 cultivars included data on all seven cannabinoids of interest.
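Figures like these come down to simple counting. As a minimal sketch, assuming each lab result is stored as a Python dictionary that maps a cannabinoid name to a measured percentage (or `None` when the compound was not reported), the per-compound missing share and the number of fully reported cultivars could be computed as below. The record layout, the function names and the three sample records are hypothetical and are not drawn from the Steep Hill data.

```python
from typing import Dict, List, Optional

# The seven cannabinoids the study focused on.
CANNABINOIDS = ["THC", "CBD", "CBG", "CBC", "CBN", "THCV", "CBDV"]

# Hypothetical record layout: cannabinoid name -> measured percentage,
# or None when the lab report did not include that compound.
Record = Dict[str, Optional[float]]


def missing_fraction(records: List[Record]) -> Dict[str, float]:
    """Share of records that report no value for each cannabinoid."""
    n = len(records)
    return {
        c: sum(1 for r in records if r.get(c) is None) / n
        for c in CANNABINOIDS
    }


def complete_records(records: List[Record]) -> List[Record]:
    """Records that report a value for every cannabinoid of interest."""
    return [r for r in records if all(r.get(c) is not None for c in CANNABINOIDS)]


# Invented example records, for illustration only.
samples: List[Record] = [
    {"THC": 21.0, "CBD": 0.1, "CBG": 0.8, "CBC": None, "CBN": None, "THCV": None, "CBDV": None},
    {"THC": 18.5, "CBD": None, "CBG": None, "CBC": None, "CBN": 0.2, "THCV": None, "CBDV": None},
    {"THC": 0.4, "CBD": 12.3, "CBG": 0.5, "CBC": 0.6, "CBN": 0.1, "THCV": 0.05, "CBDV": 0.2},
]

print(missing_fraction(samples)["THC"])  # → 0.0
print(len(complete_records(samples)))    # → 1
```

In this toy set only the third record reports all seven compounds, which mirrors, on a tiny scale, how few of the study’s cultivars had complete profiles.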

Machine learning for cannabis science

How are scientists to overcome the hurdles presented by these missing data? The CU Boulder scientists believe that data science and machine learning could be key.

“We thought that data science methods could help with what is fundamentally a missing data problem,” said Brian Keegan, an assistant professor of information science at CU Boulder and a co-author of the study. “Could we use the data we have about the chemical profiles of some strains to impute, or guess, the values of those where we have no data?”
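The idea of imputation can be illustrated with one common, simple technique, k-nearest-neighbor imputation: a missing value is estimated from the strains whose reported chemical profiles look most similar. This is only an illustration under stated assumptions. The article does not describe which imputation methods the researchers used, and the function `knn_impute`, the use of Euclidean distance, the default `k` and the sample values are all hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical record layout: compound name -> percentage, or None if missing.
Record = Dict[str, Optional[float]]


def knn_impute(records: List[Record], target: str, k: int = 2) -> List[Record]:
    """Return copies of `records` in which a missing `target` value is replaced
    by the mean `target` value of the k most similar records that report it.

    Similarity is Euclidean distance over the other compounds that both
    records report; donors sharing no such compounds are skipped.
    """
    donors = [r for r in records if r.get(target) is not None]
    filled: List[Record] = []
    for r in records:
        new = dict(r)  # copy so the input records are left untouched
        if r.get(target) is None:
            scored = []
            for d in donors:
                shared = [c for c in r
                          if c != target and r[c] is not None and d.get(c) is not None]
                if shared:
                    dist = math.sqrt(sum((r[c] - d[c]) ** 2 for c in shared))
                    scored.append((dist, d[target]))
            scored.sort(key=lambda pair: pair[0])
            nearest = scored[:k]
            if nearest:  # leave the gap if no donor overlaps at all
                new[target] = sum(value for _, value in nearest) / len(nearest)
        filled.append(new)
    return filled


# Invented profiles; the fourth sample has no CBN measurement.
samples: List[Record] = [
    {"THC": 20.0, "CBD": 0.2, "CBN": 0.3},
    {"THC": 19.0, "CBD": 0.3, "CBN": 0.5},
    {"THC": 1.0, "CBD": 14.0, "CBN": 0.1},
    {"THC": 19.5, "CBD": 0.25, "CBN": None},
]

result = knn_impute(samples, "CBN", k=2)
print(round(result[3]["CBN"], 3))  # → 0.4
```

Here the fourth sample’s missing CBN value is estimated from the two THC-dominant samples it most resembles, while the CBD-dominant sample is ignored. A real analysis would also need to scale the features and check imputed values against held-out measurements.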

By applying machine learning techniques to impute the missing data, the CU Boulder team set out to uncover hidden patterns and clusters in the cultivar database. Early on, however, they realized that one of their key assumptions had been wrong.

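Because THC and CBD are formed through the same biochemical pathway, with their acid forms (THCA and CBDA) competing for the same precursor molecule, cannabigerolic acid (CBGA), it had been assumed that strains high in THC would be low in CBD, and vice versa.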

“It didn’t turn out that way,” said Keegan, noting that some strains were high in both. “This suggests we don’t know as much about these chemical pathways as we thought we did.”
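One simple way to probe the assumed trade-off is to measure the correlation between THC and CBD and to look directly for samples that are high in both. In this minimal sketch, the six THC/CBD percentages and the 5 percent cut-off for “high” are invented for illustration, and the helper names are hypothetical.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def high_in_both(thc: List[float], cbd: List[float], threshold: float) -> List[int]:
    """Indices of samples where both THC and CBD exceed `threshold` percent."""
    return [i for i, (t, c) in enumerate(zip(thc, cbd)) if t > threshold and c > threshold]


# Invented THC and CBD percentages for six samples.
thc = [22.0, 20.0, 0.5, 0.8, 8.0, 9.0]
cbd = [0.1, 0.2, 12.0, 13.0, 8.0, 7.5]

print(pearson(thc, cbd) < 0)        # → True
print(high_in_both(thc, cbd, 5.0))  # → [4, 5]
```

With these made-up numbers the overall correlation is negative, yet two samples still exceed the cut-off for both compounds. A summary statistic alone therefore cannot confirm a strict either/or relationship.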

The analysts then grouped the strains into four clusters with distinct chemical profiles, each corresponding to a different intended use: recreational, medicinal, combined, or industrial.
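The article does not say which clustering algorithm the researchers used. As a stand-in, the sketch below applies plain k-means, one widely used clustering method, to invented two-dimensional (THC, CBD) profiles with k = 4. The function name, the deterministic farthest-point seeding and the data are assumptions for illustration; attaching a use such as “medicinal” or “industrial” to each numbered cluster would be a separate interpretive step that the code does not perform.

```python
import math
from typing import List, Tuple

# A chemical profile as a tuple of percentages, here (THC, CBD).
Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns a cluster label (0..k-1) for each point."""
    # Deterministic farthest-point seeding: start from the first point, then
    # repeatedly add the point farthest from its nearest existing centroid.
    centroids: List[List[float]] = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


# Invented (THC, CBD) profiles, for illustration only.
profiles: List[Point] = [
    (22.0, 0.1), (20.0, 0.2),   # THC-dominant
    (0.5, 12.0), (0.8, 13.0),   # CBD-dominant
    (8.0, 8.0), (9.0, 7.5),     # high in both
    (0.3, 0.4), (0.2, 0.6),     # low in both
]

print(kmeans(profiles, k=4))  # → [0, 0, 1, 1, 3, 3, 2, 2]
```

The cluster numbers themselves are arbitrary; what matters is that samples with similar chemical profiles end up sharing a label, regardless of the strain names attached to them.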

“This study reaffirms the misnaming of cannabis varieties by the industry, since strain identity cannot be predicted according to the clustering groups, even though the clusters are reflective of the chemotype,” the authors wrote. “Strain name is not indicative of potency or overall chemical makeup.”

The CU Boulder scientists believe that machine learning and advanced data science techniques could fill the current knowledge gap and improve researchers’ understanding of the plant, but this would require more data sharing and collaboration within the cannabis industry. Keegan says he can even envision a day when custom strains could be developed for medicinal use based on accurate predictions of how the cannabinoids in a sample might interact, and when strain names could be verified and standardized.

“Machine learning has played a huge role in shaping other industries, from Facebook and Twitter to Target,” said Vergara. “It can help fill in the blanks for the cannabis industry as well.”

