Machine Learning for the Future of Structural Biology

Dr. Kevin M. Dalton, Harvard University
Leidy 109

Abstract: As exemplified by DNA replication, transcription, translation, cell signaling, and enzyme catalysis, every facet of life requires proteins and nucleic acids to dynamically sample multiple conformations in a complex, non-equilibrium ballet. The major challenge facing structural biology in the 21st century is to understand these dynamics by animating the static structures of the previous century. The reward will be a rich model of macromolecular physics that enables the design of proteins and nucleic acids with bespoke dynamics, which would have a transformative impact on energy, environmental remediation, and health. Realizing this aim will require a mixture of new experimental methods, data analysis algorithms, and first-principles simulations, which together will be used to assemble a library of structural dynamics observations: a 4-dimensional Protein Data Bank. Whereas today’s PDB enabled predictive models of static structure like AlphaFold, a dynamics database will lead to models that can predict and design macromolecular dynamics. Several experimental strategies have been developed to record molecular motions. However, biophysical measurement of dynamics with atomic resolution is still quite challenging.

During my postdoc, I focused on developing new algorithms to alleviate this difficulty. I will present a theoretical breakthrough by which motions can be inferred from X-ray crystallography experiments with higher resolution and signal-to-noise than previous methods. This method relies on recent advances in statistical inference and deep learning. I will present applications to biological systems including an essential metabolic enzyme (dihydrofolate reductase), a Parkinson’s-related protein (DJ-1), and a light-sensitive protein (sensory rhodopsin II). I will close by contextualizing my results within the future of structural biology. I speculate that the current paradigm, in which predictive models of protein structure accelerate structure determination, will be extended to address dynamics. My approach, along with others, will be integrated with predictive models to deliver routine observations of dynamics conditioned on arbitrary biophysical measurements.