FIRST LOOK: Leveraging Machine Learning for Personalized Cancer Treatments

It is not obvious how decades of hard-won biological knowledge can be incorporated into machine learning algorithms that predict how a drug will affect a patient. We study this problem from theoretical, computational, and practical angles.

Click here to watch Dr. Craft’s First Look presentation.

Our computational studies provide strong evidence that incorporating system knowledge, rather than relying on purely data-driven machine learning, substantially improves predictive accuracy. We describe open-source machine learning datasets that we created for studying methods of incorporating prior knowledge and for assessing the value of doing so. Boolean networks, graphs of interconnected logical (AND, OR) switches, represent complex dynamical systems and are used by systems biologists to model cellular signaling. We produce large random Boolean networks and compare machine learning methods that use knowledge of the underlying network connectivity with those that do not. We also generate datasets from simulation models of biological processes, some taken from the literature (e.g., a flowering time prediction problem) and others created by our group (e.g., cellular response to DNA damage from radiation). In all cases, we demonstrate that incorporating prior knowledge significantly enhances predictive capacity.

Turning to real datasets, we present ideas and results for applying machine learning to a cell line radiation sensitivity experiment. Prior knowledge here takes the form of expert gene selection, automated PubMed searches, Watson for Drug Discovery, and/or available hierarchical biological information. We argue that such knowledge incorporation is critical given the “large p, small n” regime we are in: p, the number of (genetic) parameters, is in the hundreds of thousands, while n, the number of samples we have, is usually in the hundreds.

The personalized cancer medicine problem is in its infancy due to the complexity of human cancers. This talk will reflect on what makes this problem so difficult and will give evidence that multidisciplinary efforts to include existing biological knowledge are vitally important to developing a high-quality clinical prediction tool.
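To make the random Boolean network setup more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the group's actual data generator: the two-parent wiring, the synchronous update rule, the network size, and every function name below are hypothetical choices made for this example. It builds a random network of AND/OR switches, labels random initial states by simulating a few update steps, and shows how knowledge of the connectivity identifies the small set of inputs that can influence the output at all.

```python
import random

# Assumption: each node is a two-input AND or OR switch.
GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}

def make_random_boolean_network(n_nodes, rng):
    """Give every node two randomly chosen parent nodes and a random gate."""
    network = []
    for _ in range(n_nodes):
        parent_a, parent_b = rng.sample(range(n_nodes), 2)
        gate = rng.choice(["AND", "OR"])
        network.append((parent_a, parent_b, gate))
    return network

def step(network, state):
    """Synchronously update all nodes from the current state."""
    return [GATES[gate](state[a], state[b]) for a, b, gate in network]

def simulate(network, initial_state, n_steps):
    """Run the network forward for n_steps and return the final state."""
    state = list(initial_state)
    for _ in range(n_steps):
        state = step(network, state)
    return state

def make_dataset(network, n_samples, n_steps, output_node, rng):
    """Features are random initial states; the label is one node's final value."""
    features, labels = [], []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(len(network))]
        features.append(x)
        labels.append(simulate(network, x, n_steps)[output_node])
    return features, labels

def relevant_inputs(network, node, n_steps):
    """Prior knowledge: initial-state nodes that can affect `node` after n_steps."""
    frontier = {node}
    for _ in range(n_steps):
        frontier = {parent for n in frontier for parent in network[n][:2]}
    return frontier

rng = random.Random(0)
net = make_random_boolean_network(n_nodes=30, rng=rng)
X, y = make_dataset(net, n_samples=50, n_steps=3, output_node=0, rng=rng)
known = relevant_inputs(net, node=0, n_steps=3)
print(f"{len(known)} of {len(net)} features can influence the label")
```

A learner that is told the connectivity can restrict itself to the features in `known`, while a purely data-driven learner has to discover them from only 50 samples. That contrast is one simple way to read the comparison described in the abstract.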

For more information about Dr. Craft’s research, please contact Partners HealthCare Innovation by clicking here.

Example technique to build in prior knowledge via detailed simulations. ML=machine learning.

Cellular response to radiation model, used to generate data for machine learning algorithm testing.

Results demonstrating the superiority of machine learning with prior knowledge (here called SimKern, for simulation-based kernel learning). NN=nearest neighbor, SVM=support vector machine, RF=random forest, RBF=radial basis function. Accuracy is classification accuracy; R² is the coefficient of determination.
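The text here does not spell out how a simulation-based kernel is constructed, so the following Python sketch shows only one plausible reading of the idea and should not be taken as the SimKern implementation. The assumption is that an imperfect simulator with uncertain parameters is run for every sample under many parameter draws, and that two samples count as similar when their simulated outcomes agree across those draws. The toy simulator, its parameter ranges, and all function names are hypothetical.

```python
import random

def toy_simulator(sample, params):
    """Hypothetical mechanistic model: only features 0 and 1 matter."""
    weight, threshold = params
    return 1 if sample[0] + weight * sample[1] > threshold else 0

def draw_parameter_sets(n_sets, rng):
    """Sample uncertain simulator parameters (ranges are assumptions)."""
    return [(rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)) for _ in range(n_sets)]

def simulation_signature(sample, parameter_sets):
    """Simulated outcome of one sample under every parameter draw."""
    return [toy_simulator(sample, params) for params in parameter_sets]

def simulation_kernel(sig_a, sig_b):
    """Similarity = fraction of parameter draws with matching outcomes."""
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)

def predict_nearest_neighbor(train_sigs, train_labels, query_sig):
    """Return the label of the most similar training sample under the kernel."""
    best = max(range(len(train_sigs)),
               key=lambda i: simulation_kernel(train_sigs[i], query_sig))
    return train_labels[best]

rng = random.Random(0)
parameter_sets = draw_parameter_sets(25, rng)
true_params = (1.0, 1.0)  # the "real biology", unknown to the learner

def make_samples(n, n_features=10):
    """Ten features per sample; eight of them are pure noise."""
    return [[rng.uniform(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]

train, test = make_samples(30), make_samples(20)
train_labels = [toy_simulator(s, true_params) for s in train]
test_labels = [toy_simulator(s, true_params) for s in test]

train_sigs = [simulation_signature(s, parameter_sets) for s in train]
predictions = [
    predict_nearest_neighbor(train_sigs, train_labels,
                             simulation_signature(s, parameter_sets))
    for s in test
]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test)
print(f"nearest-neighbor accuracy with simulation-based similarity: {accuracy:.2f}")
```

Because the similarity is computed from simulator outputs rather than raw features, the noise features never enter the comparison. The same similarity matrix could in principle be passed to any kernel method, such as an SVM that accepts a precomputed kernel.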
