FIRST LOOK: Leveraging Machine Learning for Personalized Cancer Treatments


It is not obvious how decades of hard-won biological knowledge can be incorporated into machine learning algorithms that predict how a drug will affect a patient. We study this problem from theoretical, computational, and practical angles.

Click here to watch Dr. Craft’s First Look presentation.

Our computational studies provide strong evidence that incorporating system knowledge, rather than relying on purely data-driven machine learning, substantially improves predictive accuracy. We describe open-source machine learning datasets that we created for studying methods of incorporating prior knowledge and for assessing the value of doing so.

Boolean networks, graphs of interconnected logical (AND, OR) switches, represent complex dynamical systems and are used by systems biologists to model cellular signaling. We generate large random Boolean networks and compare machine learning methods that use knowledge of the underlying network connectivity with those that do not. We also use simulation models of biological processes to generate datasets, some taken from the literature (e.g., a flowering time prediction problem) and others created by our group (e.g., cellular response to DNA damage from radiation). In all cases, we demonstrate that incorporating prior knowledge significantly enhances predictive capacity.

Turning to real datasets, we present ideas and results for applying machine learning to a cell line radiation sensitivity experiment. Prior knowledge here takes the form of expert gene selection, automated PubMed searches, Watson for Drug Discovery, and/or available hierarchical biological information. We argue that such knowledge incorporation is critical given the “large p, small n” regime we are in: p, the number of (genetic) parameters, is in the hundreds of thousands, while n, the number of samples, is usually in the hundreds.

The personalized cancer medicine problem is in its infancy due to the complexity of human cancers. This talk will reflect on what makes the problem so difficult and will give evidence that multidisciplinary efforts to include existing biological knowledge are vital to developing a high-quality clinical prediction tool.
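To make the random Boolean network experiments more concrete, here is a minimal sketch of how such a dataset might be generated. It is not the group's actual code: the function names, the network size, the two-input wiring, the synchronous update rule, and the choice of a single output node are all assumptions made for illustration. The helper `upstream_nodes` shows one simple way that connectivity knowledge could be handed to a learner, by identifying which initial-state nodes can influence the output at all.

```python
import random

def make_random_network(n_nodes, k_inputs, rng):
    """Give each node k_inputs randomly chosen input nodes and a random AND/OR gate."""
    network = []
    for _ in range(n_nodes):
        inputs = rng.sample(range(n_nodes), k_inputs)
        gate = rng.choice(["AND", "OR"])
        network.append((inputs, gate))
    return network

def step(state, network):
    """Synchronously update every node from the current state (assumed update rule)."""
    new_state = []
    for inputs, gate in network:
        values = [state[i] for i in inputs]
        new_state.append(int(all(values)) if gate == "AND" else int(any(values)))
    return new_state

def simulate(initial_state, network, n_steps):
    """Run the network for n_steps updates and return the final state."""
    state = list(initial_state)
    for _ in range(n_steps):
        state = step(state, network)
    return state

def make_dataset(network, n_samples, n_steps, output_node, rng):
    """Features are random initial states; the label is one node's final value."""
    X, y = [], []
    for _ in range(n_samples):
        x0 = [rng.randint(0, 1) for _ in range(len(network))]
        X.append(x0)
        y.append(simulate(x0, network, n_steps)[output_node])
    return X, y

def upstream_nodes(network, output_node, n_steps):
    """Initial-state nodes that can influence output_node after n_steps updates,
    derived from the wiring alone (a simple form of prior knowledge)."""
    frontier = {output_node}
    for _ in range(n_steps):
        frontier = {i for node in frontier for i in network[node][0]}
    return sorted(frontier)

rng = random.Random(0)  # fixed seed so the example is reproducible
network = make_random_network(n_nodes=20, k_inputs=2, rng=rng)
X, y = make_dataset(network, n_samples=100, n_steps=5, output_node=0, rng=rng)
relevant = upstream_nodes(network, output_node=0, n_steps=5)
```

A knowledge-free learner would be trained on all 20 features in `X`, while a knowledge-aware learner could be restricted to the columns listed in `relevant`. Comparing the two on held-out samples is one way to measure the value of knowing the wiring.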

For more information about Dr. Craft’s research, please contact Partners HealthCare Innovation by clicking here.

Example technique to build in prior knowledge via detailed simulations. ML=machine learning.

Cellular response to radiation model, used to generate data for machine learning algorithm testing.

Results demonstrating the superiority of prior-knowledge machine learning (here called SimKern, for simulation-based kernel learning). NN = nearest neighbor, SVM = support vector machine, RF = random forest, RBF = radial basis function. Accuracy is classification accuracy; R² is the coefficient of determination.
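This page does not spell out how a simulation-based kernel is constructed, so the following is only a sketch of one plausible reading. It assumes that two samples count as similar when a simulation model, run under many randomly drawn parameter settings, tends to give them the same outcome. The simulator, its parameter ranges, the labels, and every function name below are hypothetical stand-ins, not the SimKern implementation.

```python
import random

def toy_simulator(features, params):
    """Hypothetical stand-in for a detailed biological simulation:
    maps a sample's features and one parameter set to a binary outcome."""
    score = sum(w * f for w, f in zip(params["weights"], features))
    return int(score > params["threshold"])

def sample_params(n_features, rng):
    """Draw one parameter set; the ranges are illustrative only."""
    return {
        "weights": [rng.uniform(0.0, 1.0) for _ in range(n_features)],
        "threshold": rng.uniform(0.5, 1.5),
    }

def simulation_kernel(samples, n_sims, rng):
    """Similarity of two samples = fraction of simulations that give
    both samples the same simulated outcome (assumed definition)."""
    n_features = len(samples[0])
    param_sets = [sample_params(n_features, rng) for _ in range(n_sims)]
    outcomes = [[toy_simulator(s, p) for p in param_sets] for s in samples]
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            agree = sum(a == b for a, b in zip(outcomes[i], outcomes[j]))
            kernel[i][j] = agree / n_sims
    return kernel

def nearest_neighbor_predict(kernel, train_idx, train_labels, test_idx):
    """Return the label of the training sample most similar to the test sample."""
    best = max(range(len(train_idx)), key=lambda k: kernel[test_idx][train_idx[k]])
    return train_labels[best]

rng = random.Random(0)  # fixed seed so the example is reproducible
samples = [[rng.random() for _ in range(3)] for _ in range(6)]
K = simulation_kernel(samples, n_sims=200, rng=rng)

train_idx = [0, 1, 2, 3, 4]
train_labels = [int(sum(samples[i]) > 1.5) for i in train_idx]  # made-up labels
prediction = nearest_neighbor_predict(K, train_idx, train_labels, test_idx=5)
```

The matrix `K` plays the role that an RBF kernel would play in a purely data-driven method: the same nearest neighbor or kernel-based learner can be run with either similarity measure, which is what makes a side-by-side comparison possible.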
