The State of Computational Protein Design

Last month, I had the privilege of attending the Keystone Symposium on Computational Design and Modeling of Biomolecules in beautiful Banff, Canada. The conference gave incredible insight into the current state of the protein design field, which stands on the precipice of advances catalyzed by deep learning.

Here are my key takeaways from the conference:

Advances

  • Deep learning is poised to have an enormous impact on protein design. This has been an active field for many years now, but RFDiffusion (David Baker lab) represents a notable breakthrough given the level of experimental validation and unprecedented success rates in designing proteins for a range of applications [1]. (And the code has been released!) The conference also showed how existing models are beginning to trickle through the community and become widely adopted, such as ProteinMPNN (also Baker lab) for inverse folding [2]. That being said, it also remained clear that physics/energy-based methods, such as Rosetta, and rational design [3] are still relevant and widely used.
     
  • There were many success stories of designing binders against proteins and small molecules [e.g. 4-7] (from the groups of Bruno Correia, Po-Ssu Huang, Nick Polizzi, Bill DeGrado and Birte Höcker).
     
  • Conformational flexibility is increasingly being considered in design. It is no secret that proteins move, and that this movement is important to their function; however, flexibility has often been poorly understood and overlooked. Research groups are now considering conformational ensembles and allostery in enzyme design (Roberto Chica) [8], incorporating flexibility into model training (Po-Ssu Huang) [5] and designing proteins with multiple low-energy states (Philip Leung, David Baker) [9]. Tanja Kortemme and Anum Glasgow also showed beautiful work on understanding flexibility and allostery in protein systems [10].

Unsolved challenges and next steps

  • Data availability! As we move into a machine learning era, there will be an even greater need for large amounts of high-quality data to train models on. This was highlighted by speakers such as Timothy Whitehead and Gabriel Rocklin, and I hope to continue to see the community come together to produce and open-source high-throughput data, as the Rocklin group has done [11].
     
  • While advances are being made with respect to conformational flexibility, as discussed above, there is much left to do. It feels as though we (and deep learning models) are still far from being able to understand, or even measure and quantify, protein flexibility at a systems (rather than protein-specific) level. This will be an essential first step towards training models to predict and design flexibility.
     
  • Water molecules are also often overlooked in design, deep learning and other protein research, as highlighted by Stephanie Wankowicz, yet they can play essential roles in protein function, as shown by Huong Kratochvil [12].
     
  • Although deep learning models are enabling exciting advances in protein design, they are still limited in their ability to capture the underlying physics. This will hinder the accurate design of, for example, polar interfaces, conformational flexibility and water-mediated interactions.

This is just a snapshot – there were so many excellent talks and posters, as well as unpublished work, that I couldn’t name them all here.

Thank you very much to the conference organizers for putting together such a fantastic conference, and for supporting my travel with a Keystone Symposia Future of Science Fund scholarship.

References

  1. Watson et al., bioRxiv, 2022; https://www.biorxiv.org/content/10.1101/2022.12.09.519842v2
  2. Dauparas et al., Science, 2022; https://www.science.org/doi/10.1126/science.add2187
  3. Dawson et al., Nat. Commun., 2023; https://www.nature.com/articles/s41467-023-36024-y
  4. Gainza et al., bioRxiv, 2023; https://www.biorxiv.org/content/10.1101/2022.06.16.496402v2
  5. Eguchi et al., bioRxiv, 2022; https://www.biorxiv.org/content/10.1101/2022.12.22.521698v1
  6. Polizzi and DeGrado, Science, 2020; https://www.science.org/doi/full/10.1126/science.abb8330
  7. Kröger et al., J. Biol. Chem., 2021; https://www.jbc.org/article/S0021-9258(21)01228-X/
  8. Broom et al., Nat. Commun., 2020; https://www.nature.com/articles/s41467-020-18619-x
  9. Praetorius et al., bioRxiv, 2023; https://www.biorxiv.org/content/10.1101/2023.01.27.525968v1.full
  10. Glasgow et al., Nat. Commun., 2023; https://www.nature.com/articles/s41467-023-36798-1
  11. Tsuboyama et al., bioRxiv, 2022; https://www.biorxiv.org/content/10.1101/2022.12.06.519132v3
  12. Kratochvil et al., bioRxiv, 2022; https://www.biorxiv.org/content/10.1101/2022.03.28.485852v2