Held annually in December, the Neural Information Processing Systems meetings aim to encourage researchers using machine learning techniques in their work – whether it be in economics, physics, or any number of fields – to get together to discuss their findings, hear from world-leading experts, and in many years past, ski. The virtual nature of this year’s conference had an enormously negative impact on attendees’ skiing experiences, but it nevertheless was a pleasure to attend – the machine learning in structural biology workshop, in particular, provided a useful overview of the hottest topics in the field, and of the methods that people are using to tackle them.
This year’s NeurIPS highlighted the growing interest in applying the newest Natural Language Processing (NLP) algorithms on proteins. This includes antibodies, as seen by two presentations in the MLSB workshop, which focused on using these algorithms for the discovery and design of antibodies. Ruffolo et al. presented their version of a BERT-inspired language model for antibodies. The purpose of such a model is to create representations that encapsulate all information of an antibody sequence, which can then be used to predict antibody properties. In their work, they showed how the representations could be used to predict high-redundancy sequences (a proxy for strong binders) and how continuous trajectories consistent with the number of mutations could be observed when using umap on the representations. While such representations can be used to predict properties of antibodies, another work by Shuai et al. instead focused on training a generative language model for antibodies, able to generate a region in an antibody based on the rest of the antibody. This can then potentially be used to generate new viable CDR regions of variable length, better than randomly mutating them.
Continue reading