FIRST LOOK: Leveraging Machine Learning for Personalized Cancer Treatments


It is not obvious how decades of hard-won biological knowledge can be incorporated into machine learning algorithms that predict how a drug will affect a patient. We study this problem from theoretical, computational, and practical angles.

Click here to watch Dr. Craft’s First Look presentation.

Our computational studies provide strong evidence that incorporating system knowledge, rather than relying on purely data-driven machine learning, substantially improves predictive accuracy. We describe open source machine learning datasets that we created to study methods for incorporating prior knowledge and to assess the value of doing so. Boolean networks, which are graphs of interconnected logical (AND, OR) switches, represent complex dynamical systems and are used by systems biologists to model cellular signaling. We generate large random Boolean networks and compare machine learning methods that use knowledge of the underlying network connectivity with those that do not. We also generate datasets from simulation models of biological processes, some taken from the literature (e.g., a flowering time prediction problem) and others created by our group (e.g., cellular response to DNA damage from radiation). In all cases, we demonstrate that incorporating prior knowledge significantly enhances predictive capacity.

Turning to real datasets, we present ideas and results for applying machine learning to a cell line radiation sensitivity experiment. Prior knowledge here takes the form of expert gene selection, automated PubMed searches, Watson for Drug Discovery, and/or the incorporation of available hierarchical biological information. We argue that such knowledge incorporation is critical given the “large p, small n” regime we are in: p, the number of (genetic) parameters, is in the hundreds of thousands, while n, the number of samples, is usually in the hundreds.

Personalized cancer medicine is still in its infancy because of the complexity of human cancers. This talk will reflect on what makes the problem so difficult and will give evidence that multidisciplinary efforts to include existing biological knowledge are vital to developing a high-quality clinical prediction tool.
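The random Boolean network comparison described above can be illustrated with a small sketch. This is not the code or the learning methods used in the talk. It assumes two inputs per node, only AND/OR gates, synchronous updates, and a hand-written 1-nearest-neighbor classifier, and all function names are hypothetical. "Knowing the connectivity" is reduced here to keeping only the nodes that can influence the target after a fixed number of steps.

```python
import numpy as np

def random_boolean_network(n_nodes, rng):
    """Assumed setup: each node has two random input nodes and a random AND/OR gate."""
    inputs = rng.integers(0, n_nodes, size=(n_nodes, 2))
    is_and = rng.integers(0, 2, size=n_nodes).astype(bool)
    return inputs, is_and

def simulate(state, inputs, is_and, n_steps):
    """Synchronously update all nodes n_steps times (state: samples x nodes, bool)."""
    for _ in range(n_steps):
        a = state[:, inputs[:, 0]]
        b = state[:, inputs[:, 1]]
        state = np.where(is_and, a & b, a | b)
    return state

def ancestors(target, inputs, n_steps):
    """Nodes whose initial value can influence `target` after n_steps updates."""
    frontier = {target}
    for _ in range(n_steps):
        frontier = {int(j) for node in frontier for j in inputs[node]}
    return sorted(frontier)

def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """1-nearest-neighbor classification accuracy using Hamming distance."""
    dists = (X_test[:, None, :] != X_train[None, :, :]).sum(axis=2)
    pred = y_train[dists.argmin(axis=1)]
    return float((pred == y_test).mean())

rng = np.random.default_rng(0)
n_nodes, n_steps, n_samples, target = 200, 3, 400, 0

inputs, is_and = random_boolean_network(n_nodes, rng)
X = rng.integers(0, 2, size=(n_samples, n_nodes)).astype(bool)  # initial states
y = simulate(X, inputs, is_and, n_steps)[:, target]             # label: target node state

relevant = ancestors(target, inputs, n_steps)  # prior knowledge from the wiring
half = n_samples // 2

# Pure data-driven: use every node as a feature.
acc_all = one_nn_accuracy(X[:half], y[:half], X[half:], y[half:])
# Knowledge-informed: use only nodes that can reach the target.
acc_known = one_nn_accuracy(X[:half][:, relevant], y[:half],
                            X[half:][:, relevant], y[half:])

print(f"all {n_nodes} features: {acc_all:.2f}")
print(f"{len(relevant)} connectivity-selected features: {acc_known:.2f}")
```

The sketch also mirrors the “large p, small n” situation in miniature: most of the 200 features are irrelevant to the label, and the wiring diagram is what identifies the few that matter.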

For more information about Dr. Craft’s research, please contact Partners HealthCare Innovation by clicking here.

Example technique to build in prior knowledge via detailed simulations. ML=machine learning.

Cellular response to radiation model, used to generate data for machine learning algorithm testing.

Results demonstrating the superiority of machine learning that uses prior knowledge (here called SimKern, for simulation-based kernel learning). NN = nearest neighbor, SVM = support vector machine, RF = random forest, RBF = radial basis function. Accuracy is classification accuracy; R² is the coefficient of determination.
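For readers unfamiliar with the two metrics named in the caption, the following is a minimal sketch of their standard definitions. The function names are illustrative, and the example values are made up for demonstration; they are not results from the study.

```python
import numpy as np

def classification_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Made-up illustrative values, not data from the study.
print(classification_accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

An R² of 1 means the predictions match the true values exactly, while a model that always predicts the mean of the true values scores 0.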
