Taking a drug from research to market is estimated to cost an average of 2.6 billion dollars and take more than fifteen years. Currently, researchers face the challenge of manually examining and testing thousands of different compounds to arrive at five or fewer that make it through to clinical trials and demonstrate anti-symptom, drug-like properties. For each compound to be tested, the researcher must first procure it, which can be incredibly difficult if the compound is novel. In that case, researchers need to manually devise a retrosynthesis pathway for the compound.
The Solution: Synbiolic
In response to the major inefficiencies in the drug discovery process, we developed Synbiolic, a platform that leverages machine learning to accelerate rational drug design by generating novel molecules with user-specified properties and proposing a retrosynthesis pathway for each of them.
Overview of Synbiolic’s Pipeline
Specifically, we are deploying a variational autoencoder, a type of machine learning model that generates new data, trained on two separate datasets: a MUV (maximum unbiased validation) dataset of 74,000 molecules, used to validate the generalizability of our model, and a dataset of 1 million molecules from ChEMBL, used to generate variations of molecular compounds that exhibit drug-like characteristics. The generated compounds are filtered by computing their quantitative estimate of drug-likeness (QED) and keeping only molecules with a high score (>0.5).
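As a rough illustration of the filtering step, the sketch below keeps only molecules whose QED score exceeds 0.5. It assumes the generated molecules are available as SMILES strings and that RDKit's QED implementation is used for scoring. The helper names `rdkit_qed` and `filter_drug_like` are hypothetical and are not taken from the Synbiolic codebase.

```python
from typing import Callable, Iterable, List, Optional


def rdkit_qed(smiles: str) -> Optional[float]:
    """Score one SMILES string with RDKit's QED; None if it cannot be parsed."""
    # Imported lazily so the filter below has no hard dependency on RDKit.
    from rdkit import Chem
    from rdkit.Chem import QED

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else QED.qed(mol)


def filter_drug_like(
    smiles_list: Iterable[str],
    score: Callable[[str], Optional[float]] = rdkit_qed,
    threshold: float = 0.5,
) -> List[str]:
    """Keep molecules whose score is strictly above the threshold (>0.5 in the text)."""
    kept = []
    for smiles in smiles_list:
        value = score(smiles)
        # Unparseable molecules (score None) are dropped along with low scorers.
        if value is not None and value > threshold:
            kept.append(smiles)
    return kept
```

With RDKit installed, `filter_drug_like(generated_smiles)` applies the QED cutoff directly. The `score` argument is there so a different drug-likeness measure, or a stub for testing, can be swapped in.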
To generate molecules with the desired property, Synbiolic leverages a type of reinforcement learning approach called policy approximation to train the generative model to create molecules whose property falls within a specified range. Synbiolic accomplishes this by designing effective reward functions that entice the model to generate molecules within the desired property range.
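One way such a reward function could look is sketched below: the reward is at its maximum when the predicted property lies inside the target range and decays smoothly with distance outside it. This is an illustrative assumption, not Synbiolic's actual reward. The function name `range_reward`, the exponential decay and the `falloff` parameter are all hypothetical.

```python
import math
from typing import Optional


def range_reward(
    value: Optional[float],
    low: float,
    high: float,
    max_reward: float = 1.0,
    falloff: float = 1.0,
) -> float:
    """Reward a predicted property value for landing in [low, high].

    Inside the range the reward is max_reward. Outside it, the reward decays
    exponentially with the distance to the nearest bound, so the model still
    receives a signal pointing toward the range. Invalid molecules, for which
    no property could be predicted (value is None), get zero reward.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if falloff <= 0:
        raise ValueError("falloff must be positive")
    if value is None:
        return 0.0
    if low <= value <= high:
        return max_reward
    distance = low - value if value < low else value - high
    return max_reward * math.exp(-distance / falloff)
```

For example, with a target range of 1.0 to 3.0, a molecule predicted at 2.0 receives the full reward, while one predicted at 4.0 receives a reduced but non-zero reward. A smooth decay of this kind is often easier to learn from than a hard 0/1 reward, because near misses are still distinguished from far misses.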
To create retrosynthesis pathways that help researchers synthesize the generated compounds, Synbiolic employs the Monte Carlo tree search algorithm. The model is trained on over 1.2 million reactions gathered from a dataset of US chemical reaction patents.
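To make the search idea concrete, here is a minimal Monte Carlo tree search sketch on a toy retrosynthesis problem. It is a simplification under stated assumptions, not Synbiolic's implementation: the disconnection rules are a hand-written dictionary rather than a model learned from patent reactions, rollouts are random, and the names `Node`, `expand_state`, `rollout`, `uct_child` and `mcts` are hypothetical. A state is the tuple of molecules that still have to be made, and the empty tuple means everything is purchasable.

```python
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state      # tuple of molecules still to be synthesized
        self.parent = parent
        self.children = []
        self.untried = None     # successor states not yet added to the tree
        self.visits = 0
        self.value = 0.0        # sum of rollout rewards seen through this node


def expand_state(state, rules, stock):
    """All states reachable by disconnecting the first unsolved molecule."""
    if not state:
        return []
    head, rest = state[0], state[1:]
    result = []
    for precursors in rules.get(head, []):
        unsolved = tuple(m for m in precursors if m not in stock)
        result.append(unsolved + rest)
    return result


def rollout(state, rules, stock, rng, max_depth=10):
    """Random playout: 1.0 if every molecule becomes purchasable, else 0.0."""
    for _ in range(max_depth):
        if not state:
            return 1.0
        options = expand_state(state, rules, stock)
        if not options:
            return 0.0
        state = rng.choice(options)
    return 1.0 if not state else 0.0


def uct_child(node, c=1.4):
    """Pick the child maximizing mean reward plus the UCT exploration bonus."""
    return max(
        node.children,
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(target, rules, stock, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(() if target in stock else (target,))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while True:
            if node.untried is None:
                node.untried = expand_state(node.state, rules, stock)
            if node.untried or not node.children:
                break
            node = uct_child(node)
        # 2. Expansion: add one new child if any successor is untried.
        if node.untried:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node.
        reward = rollout(node.state, rules, stock, rng)
        # 4. Backpropagation up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path as the proposed route.
    route, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        route.append(node.state)
    return route


# Toy data: "target" can be made from A + B or from C; only B and D can be bought.
rules = {"target": [("A", "B"), ("C",)], "A": [("D",)]}
stock = {"B", "D"}
route = mcts("target", rules, stock)
```

In this toy setup the search concentrates its visits on the A + B disconnection, because the C branch is a dead end (C is neither purchasable nor covered by a rule). In a realistic system, the random rollout and the hand-written rules would typically be replaced by learned models that suggest and score disconnections.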
Our software is developed using TensorFlow (an open-source machine learning library), RDKit (open-source cheminformatics software), and OpenChem. Using Synbiolic, researchers can greatly reduce the inefficiencies associated with drug discovery and create medicine faster and more cheaply. Our goal with Synbiolic is to give everyone in the world access to medicine, reducing poverty and disease and creating a better future for humanity.
Unique Value Proposition
The use of artificial intelligence in drug discovery is extremely new and has yet to be adopted by most research institutions. Currently, only a few companies are researching and developing such services, including Cyclica (which we've reached out to) and Insilico Medicine. While other companies do use AI, Synbiolic's approach is quite different and gives it an edge over competitors. The generative models we use are unique to our project and can go toe-to-toe with other state-of-the-art models, often even outperforming them. Other approaches to generating molecules, such as those relying only on an encoding-decoding method, are unable to control the properties of the generated molecules. Another approach, based on recurrent neural networks and used by some of Synbiolic's competitors in the AI and drug discovery space, also fails to control the properties of molecules well. Synbiolic uses a novel approach that combines machine learning technologies such as reinforcement learning, variational autoencoders, and memory-augmented recurrent neural networks to construct a generative model capable of effectively controlling the properties of generated molecules, which is what allows it to generate molecules with the desired effect.
For a more in-depth technical explanation of our project, see our two Medium articles and our code repositories below:
https://medium.com/datadriveninvestor/drug-design-made-fun-using-reinforcement-learning-212a4f867f33 --> Explains how we can leverage reinforcement learning to design novel molecules with the desired effect.
https://towardsdatascience.com/unlocking-drug-discovery-through-machine-learning-part-1-8b2a64333e07 --> Explains how we can use AI to generate novel compounds.
https://github.com/joeym-09/Leveraging-VAE-to-generate-molecules --> Starter Code for Part 1 of Synbiolic's pipeline, generating novel molecules.
https://github.com/aryanmisra/synbiolic --> Code for Part 2 of Synbiolic's pipeline, using reinforcement learning to output retrosynthesis pathways for synthesizing molecules.
Synbiolic’s team may be young, but we believe that age is “merely a number” and we are committed to not being defined solely by it. Our team consists of three core members: William Law, Aryan Misra, and Sigil Wen. The three of us are currently part of The Knowledge Society, the #1 human accelerator providing Olympic-level training for youth with a mission of cultivating the next Elon Musks. We are collectively obsessed with and passionate about leveraging exponential technologies to solve meaningful problems and drive a positive impact in the world.
William Law is a 17-year-old software and machine learning developer for Seattle startups Collective Brains and WORKPLACE21. Over the past year, he has founded two startups: one in retail, making BPA-free phone cases, which has gone on to achieve international sales, and a FinTech startup that aims to automate parts of the accounting stack for early-stage startups. On the side, he is also creating a data platform to universally store and access medical images and records, and is currently seeking funding from ventures like 1517 Fund. At the moment, William is working on probabilistic deep learning projects with Holy Grail AI to automate the R&D process using machine learning.
Aryan Misra is a 16-year-old machine learning developer and synthetic biology enthusiast, revolutionizing the way personalized health treatments are created. This summer, he was heavily involved in the startup community at the DMZ, growing Autophase, an AI startup working on redefining the way graphic design is done. During this time, Aryan was also interning at Skintelligent, a Singapore-based AI company, where he developed deep learning models for advanced skin analysis. Aryan enjoys building interesting projects, developing his personal philosophy, and learning from smart people and about the world around him. He has worked on numerous projects, including a telehomecare skin cancer detection algorithm and an analysis solution with ChenMed.
Sigil Wen is a 16-year-old machine learning developer and Adobe Creative Cloud polymath who is on his way to becoming a world-class innovator. He is the cofounder of DeepDev, a no-code machine learning platform, and salutem.ca, an NLP physician chatbot with the goal of making healthcare accessible to everyone. In the past, he has built solutions to big problems using technologies such as blockchain, computer vision, augmented reality, and NLP. Combining his passion for machine learning with videography, Sigil placed in the top 30 out of 12,000 contestants from 200 countries in the 2019 Breakthrough Junior Challenge with his submission on Deep Learning with Neural Networks. Over the summer, Sigil attended LaunchX, an entrepreneurship program from MIT, where he began his journey in the startup world. Currently, he is a video editor for Pioneer.app, the world's first fully remote startup accelerator, and hopes to intern there this coming summer. (sigilwen.ca, https://www.linkedin.com/in/sigil-wen-081774163/)