Cannabis has been used for thousands of years medicinally and spiritually to provide relief for its patients. Now, as cannabis legalization is slowly sweeping over the country, and a new slew of potential users turn to the green plant for a solution to their ailments that may not have been alleviated by pharmaceuticals, we have to ask ourselves.

How do we prescribe a particular strain to a patient that will fulfill the patients needs the first time.

That is what I intend to find out using Natural Language Processing and a dataset composed of over 2,500 different strains, descriptions, ailments, types, flavors, and effects.

What my trial yielded were models that will return the most relevant strains based on what the patient inputs and the corresponding summary statistics

For Example:

  • A user would input their desired effects and ailments to treat, and the model would return the five closest strains that matched their input.

  • The model would also return a probability value of what type of strain they were looking for (Hybrid, Sativa, Indica, Hybrid-Indica, Hybrid-Sativa)

  • The model would also return the top 3 Flavors, Effects, and Ailments associated with the user’s input.

Word Embeddings

Here are some of the most common flavors, effects, and ailments associated with cannabis found in the data. The visuals use word embeddings to associate a number with each word using SpaCy’s pre-trained model to apply the appropriate weights to help accurately map the distances; each word was from one another.

The visuals showed a common association of words that we would expect to see — seeing these associations indicate that the model is associating the right words together correctly.

Effects of Cannabis

Flavors of Cannabis

Treated Ailments by Cannabis


The data is a merger of 2 datasets that contained the name of the strain, strain type, flavors, effects, ailments, and descriptions for most strains. Strains that did not have one of the features present instead contained an empty space.

Sample Row:

strain name type flavors effects ailments desciption
My Example Sativa Sweet, apple, honey, … Euphoric,Uplifted, Talkative, … Pain, Stress, Insomnia, … Example Desciption


The models are loaded into a simple flask API for use by anyone willing to send a JSON object to the destination addresses. Upon receipt of the input, the API would return a JSON output correlated to which endpoint of the API was being used.

Sample Input From Patient:

  "patient_input": "i want to feel happy and uplifted"

Resulting Output

        "ailments": "",
        "description": "Arabian Gold is a heavy sativa strain of mysterious origins. It may leave most consumers feeling mentally checked out, so you could feel foggy-headed and, for example, lose your train of thought when conversing with friends. ",
        "effects": "Giggly, Euphoric, Creative, Tingly, Sleepy",
        "flavor": "Tea",
        "strain": "Arabian Gold",
        "type": "Sativa"
        "ailments": "",
        "description": "Skunky Diesel is a nice indica-dominant strain that is a cross of Sensi Skunk with NYC Diesel.  A nice relaxing high that you will feel in your face pretty quickly.  She definitely carries the diesel taste and lovely diesel effects.",
        "effects": "Happy, Relaxed, Energetic, Uplifted, Sleepy",
        "flavor": "Skunk, Earthy, Diesel",
        "strain": "Skunky Diesel",
        "type": "Hybrid"
        "ailments": "",
        "description": "Purple Princess is not the girl from your average fairy tale. Thought to be the daughter of Cinderella 99 and Ice Princess, this hybrid has a habit of creeping up on you. Fruity and skunky, Purple Princess produces a medium-level effect. While it won’t leave you stuck on the couch, you will feel medicated. Purple Princess is characterized by dense, small, purple buds and typically flowers around 5-6 weeks.",
        "effects": "Happy, Euphoric, Relaxed, Arouse, Uplifted",
        "flavor": "Earthy, Pungent, Sweet",
        "strain": "Purple Princess",
        "type": "Hybrid"
        "ailments": "",
        "description": "While normally difficult to ignore, this Walrus is one sneaky gal. Upon first taste, many users may not feel the Walrus’s effects; however, give it a few minutes and this strain will surprise you in ways you never thought possible. Perhaps slightly indica-dominant, Walrus Kush may not be the best medication for getting things done. Fairly well-balanced, it’s a giggly and sociable strain while providing deep, body-relaxing effects at the same time. Great for users who suffer from stress or migraines, Walrus just might be worth a weekend trip to the dispensary.",
        "effects": "Happy, Giggly, Hungry, Creative, Uplifted",
        "flavor": "",
        "strain": "Walrus Kush",
        "type": "Hybrid"
        "ailments": "",
        "description": "A British Columbia native, Killer Queen is the outcome of an imaginative cross between G13 and Cinderella 99. Uplifting and thought-provoking, this hybrid is great for the workaholic who would like some daytime relief. The effects of this strain are felt most heavily in the face, eyes, and forehead, evidencing the sativa aspects of this hybrid. Upon first taste, Killer Queen takes up the fruity characteristics of Cinderella 99. The tropical flavor, however, is quickly followed by an earthy, herbal tone. If you are searching for an energizing strain that allows you to focus, Killer Queen may be just the perfect match.",
        "effects": "Happy, Energetic, Creative, Uplifted, Talkative",
        "flavor": "Tropical, Sweet, Citrus",
        "strain": "Killer Queen",
        "type": "Hybrid"

2nd Output

    "probability": "There is a 95.68% chance that your looking for a Hybrid",
    "top_ailments": ", Depression, Stress",
    "top_effects": "Happy, Uplifted, Relaxed",
    "top_flavors": "Earthy, Sweet, Pungent"


The first output returns the five closest strains based on the users’ input. If a patient wants something to fall asleep, then strains that serve that need should be shown first. The more descriptive the patients’ input, the higher the accuracy will be.

The second output returns a probability statistic of what type of strain most closely identifies with their description. It also takes the top 30 strains associated with the patients’ input and returns the top 3 flavors, effects, and ailments related to their description

Data Collection

The inputs received by the API are saved along with their corresponding results from both endpoints using a PostgreSQL database. The newly collected data will be used for topic modeling and model training with the hope of creating a better description of each strain.

The collected data contained the following features obtained from the 30 closest strains of each input received:

  • Patient Input
  • Index values
  • Names
  • Types
  • Effects
  • Ailments
  • Descriptions

Business Implementation

A dispensary could now use the model to recommend strains to new patients and give their consumer-facing employees a new tool to help find patients the right medicine for their needs. A business could also upload their data and recommend strains from their inventory instead. If a company has multiple locations, then recommended strains based on location could also be implemented. This model can also be used to recommend strains to patients based on user profiles also.


Given more time, I would source additional clinical data associated with each strain. The proof of concept displayed here serves a more recreational audience. Medicinal patients would need to know other features like THC, CBD, Terpenes, etc. to get a clear understanding of the medication. Talking with a domain expert would also be useful to examine the results; the model is outputting and making sure they are accurate recommendations for patients compared against alternate sources of data describing the same strains.

How to use

The API currently doesn’t have a frontend and is only accepting GET and POST request to these addresses:

Endpoint 1

  • This returns the 5 closest strains to the patients description.

Endpoint 2

  • This returns the summary statistics

Both endpoints are expecting the following input:

  "USER_INPUT_STRING": " <patients input> "

There is also a feature-limited patient-friendly way to use the application here. It only shows the closest strain to your description but gives a general idea of how the model works.