Lucy Vost | Oxford Protein Informatics Group

As a reasonably new RDKit user, I was relieved to find that its built-in functionality for generating basic images from molecules is quite easy to use. However, over time I have picked up some additional tricks to make the generated images slightly more pleasing to the eye!

The first of these (which I definitely stole from another blog post at some point…) is to ask it to produce SVG images rather than PNG:

#ensure the molecule visualisation uses SVG rather than PNG format
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.ipython_useSVG = True

Now for something slightly more interesting: as a fragment elaborator, I often need to look at a long list of elaborations that have been made to a starting fragment. Because these have usually been docked, they carry 3D coordinates and don’t look particularly nice when loaded straight into RDKit and drawn:

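from rdkit import Chem
from rdkit.Chem import Draw

#load several mols from a single sdf file using SDMolSupplier
#add these to a list, skipping any entries RDKit fails to parse (returned as None)
elabs = [mol for mol in Chem.SDMolSupplier('frag2/elabsTestNoRefine_Docked_0.sdf') if mol is not None]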

#get list of ligand efficiencies so these can be displayed alongside the molecules
LEs = [(float(mol.GetProp('Gold.PLP.Fitness'))/mol.GetNumHeavyAtoms()) for mol in elabs]

Draw.MolsToGridImage(elabs, legends = [str(LE) for LE in LEs])

Fig. 1: Images generated without doing any tinkering

Two quick changes that will immediately make this image more useful are aligning the elaborations by a supplied substructure (here I supplied the original fragment so that it’s always in the same place) and calculating the 2D coordinates of the molecules so we don’t see the twisty business happening in the bottom right of Fig. 1:
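The full post's code for this step is not part of this excerpt, so what follows is only a minimal sketch of one way to do it. It assumes the starting fragment is available as an RDKit molecule; the `fragment` and `elab_smiles` values are hypothetical stand-ins for the real fragment and the docked elaborations. `AllChem.GenerateDepictionMatching2DStructure` computes 2D coordinates for each molecule while pinning the atoms that match the fragment to the fragment's own 2D layout.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical stand-ins: in practice `fragment` would be the original
# fragment and `elabs` the molecules loaded from the docked SDF file.
fragment = Chem.MolFromSmiles("c1ccccc1O")
elab_smiles = ["c1ccccc1OC", "c1ccc(CN)cc1O", "c1ccccc1OCC(=O)N"]
elabs = [Chem.MolFromSmiles(smi) for smi in elab_smiles]

# The fragment needs 2D coordinates of its own to act as the template
AllChem.Compute2DCoords(fragment)

for mol in elabs:
    if mol.HasSubstructMatch(fragment):
        # Replace the existing conformer with a 2D depiction whose matching
        # atoms sit exactly where they do in the fragment template
        AllChem.GenerateDepictionMatching2DStructure(mol, fragment)
    else:
        # No match: fall back to a plain, unaligned 2D layout
        AllChem.Compute2DCoords(mol)
```

The aligned list can then be passed to `Draw.MolsToGridImage` exactly as before; because every molecule now has a 2D conformer with the fragment in the same position, the grid should be much easier to compare by eye.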

Continue reading →

Held annually in December, the Neural Information Processing Systems (NeurIPS) meetings aim to bring together researchers who use machine learning in their work, whether in economics, physics, or any number of other fields, to discuss their findings, hear from world-leading experts and, in many past years, ski. The virtual format of this year’s conference had an enormously negative impact on attendees’ skiing experiences, but it was nevertheless a pleasure to attend. The Machine Learning in Structural Biology (MLSB) workshop, in particular, provided a useful overview of the hottest topics in the field and of the methods that people are using to tackle them.

This year’s NeurIPS highlighted the growing interest in applying the newest Natural Language Processing (NLP) algorithms to proteins. This includes antibodies, as shown by two presentations in the MLSB workshop that focused on using these algorithms for antibody discovery and design. Ruffolo et al. presented their BERT-inspired language model for antibodies. The purpose of such a model is to create representations that capture the information in an antibody sequence, which can then be used to predict antibody properties. In their work, they showed that the representations could be used to predict high-redundancy sequences (a proxy for strong binders), and that projecting the representations with UMAP revealed continuous trajectories consistent with the number of mutations. While such representations can be used to predict properties of antibodies, another work, by Shuai et al., instead focused on training a generative language model for antibodies that can generate one region of an antibody conditioned on the rest of the sequence. This could potentially be used to generate new, viable CDR regions of variable length, which should work better than mutating them at random.

Continue reading →

Author Archives: Lucy Vost

Viewing fragment elaborations in RDKit

NeurIPS 2021 Conference Feedback