Monthly Archives: June 2021

One of my other hats – Covid-19 Response Director for UK Research and Innovation

The group asked me if I would tell them a little bit about one of my other hats at our regular Tuesday meeting, and this blog is about that.

In October 2019 I was seconded part-time to UKRI as the Deputy Executive Chair of the Engineering and Physical Sciences Research Council (EPSRC). What is UKRI (UK Research and Innovation)? It’s a non-departmental public body that funds research and innovation. It is made up of the seven disciplinary research councils (acronyms to please Tom – AHRC, BBSRC, EPSRC, ESRC, NERC, STFC and MRC), Research England, and the UK’s innovation agency, Innovate UK.

As Deputy Executive Chair of EPSRC I was helping with UKRI strategy, learning how a spending review round works, visiting universities to talk about how they could work better with UKRI – pretty much everything I was expecting to be doing. But like everyone, my world changed in early 2020.

Continue reading

How fast can a protein fold?

A protein’s folding time is the time required for it to reach its unique folded state starting from its unfolded ensemble. Globular, cytosolic proteins can only attain their intended biological function once they have folded. This means that protein folding times, which typically exceed the timescales of enzymatic reactions that proteins carry out by several orders of magnitude, are critical to determining when proteins become functional. Many scientists have worked tirelessly over the years to measure protein folding times, determine their theoretical bounds, and understand how they fit into biology. Here, I focus on one of the more interesting questions to fall out of this field over the years: how fast can a protein fold? Note that this is a very different question than asking “how fast do proteins fold?”

Continue reading

Out-of-distribution generalisation and scaffold splitting in molecular property prediction

The ability to successfully apply previously acquired knowledge to novel and unfamiliar situations is one of the main hallmarks of successful learning and general intelligence. This capability to effectively generalise is amongst the most desirable properties a prediction model (or a mind, for that matter) can have.

In supervised machine learning, the standard way to evaluate the generalisation power of a prediction model for a given task is to randomly split the whole available data set X into two disjoint sets: a training set X_{\text{train}} and a test set X_{\text{test}}. The model is trained on the examples in the training set X_{\text{train}}, and its predictive ability is then measured on the untouched examples in the test set X_{\text{test}} via a suitable performance metric.

Since in this scenario the model has never seen any of the examples in X_{\text{test}} during training, its performance on X_{\text{test}} must be indicative of its performance on novel data X_{\text{new}} which it will encounter in the future. Right?
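The random split described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `random_split`, the 20% test fraction and the fixed seed are assumptions chosen for the example, not details taken from the post.

```python
import random


def random_split(examples, test_fraction=0.2, seed=0):
    """Randomly partition `examples` into a training list and a test list."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(examples)              # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled examples form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy data set standing in for X (hypothetical: 100 integer "examples").
X = list(range(100))
X_train, X_test = random_split(X, test_fraction=0.2, seed=42)
print(len(X_train), len(X_test))
```

Every example lands in exactly one of the two sets, so the model never sees a test example during training. What the split cannot guarantee is that the test examples differ in any systematic way from the training examples.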

Continue reading

Automated intermolecular interaction detection using the ODDT Python Module

Detecting intermolecular interactions is often one of the first steps when assessing the binding mode of a ligand. This usually involves the human researcher opening up a molecular viewer and checking the orientations of the ligand and protein functional groups, sometimes aided by the viewer’s own interaction-detection functionality. For looking at single-digit numbers of structures, this approach works fairly well, especially as more experienced researchers can spot cases where the automated interaction detection has failed. When analysing tens or hundreds of binding sites, however, an automated way of detecting and recording interaction information for downstream processing is needed. When I had to do this recently, I used an open-source Python module called ODDT (Open Drug Discovery Toolkit; its full documentation can be found here).

My use case was fairly standard: starting with a list of holo protein structures as .pdb files and their corresponding ligands in .sdf format, I wanted to detect any hydrogen bonds between a ligand and its native protein crystal structure. Specifically, I needed the number and name of the interacting residue, its chain ID, and the name of the protein atom involved in the interaction. A general example of how to do this can be found in the ODDT documentation. Below, I show how I have used the code on PDB structure 1a9u.

Continue reading

The Smallest Allosteric System

Allostery is still a poorly understood but very general mechanism in the protein world. In principle, an allosteric event occurs when a ligand (small or big) binds to a certain site of a protein and something (activity or function) changes at a different, distant site. A well-known example would be G-protein-coupled receptors, which transmit such an allosteric signal even across a membrane. But the two sites do not have to be that far apart. As part of the Protein Folding and Dynamics series, I recently watched a talk by Peter Hamm (Zurich), who presented work on an allosteric system that I thought was very interesting because it was small and, most importantly, controllable.

PDZ domains are peptide-binding domains, often part of multi-domain proteins. For the work presented, the researchers used the PDZ3 domain, which is a bit special in having an additional (third) C-terminal α-helix (the α3-helix) that packs against the side of the domain opposite the binding pocket. Previous work (Petit et al. 2009) had shown that removing the α3-helix changed ligand affinity but not PDZ structure; the major changes were of an entropic nature instead. Peter Hamm’s group linked an azobenzene-derived photoswitch to that α3-helix; the switch stabilises the α3-helix in its cis configuration and destabilises it in trans (see Figure 1).

Figure 1: PDZ3 domain (purple) and photoswitch (red) have different affinities for the peptide ligand (green), depending on the photoswitch’s isomerisation state (and temperature). From Bozovic, O., Jankovic, B. & Hamm, P. Sensing the allosteric force. Nat Commun 11, 5841 (2020). https://doi.org/10.1038/s41467-020-19689-7
Continue reading

How do I do regression when my predictors have multicollinearity?

A quick summary of the key idea of principal components regression (PCR), its advantages and extensions.

Sometimes we find ourselves in a dire situation. We have measured some response y and a set of predictors W. Unfortunately, W is a wide but short matrix, say 10×100 or, worse, 10×100000. We’ve made only 10 observations. Standard regression is simply not going to work, because the matrix W^T W that it needs to invert is singular: it has rank at most 10. Some would say p is bigger than n.

So what can we do? Many of us would jump to LASSO or ridge regression. However, there is another way that is often overlooked.
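For readers who want the key idea of PCR in symbols, here is a minimal sketch. It assumes the columns of W and the response y have been centred, and that k is no larger than the rank of W. The symbols U, D, V, k, Z_k, \hat{\gamma} and \hat{\beta}_{\mathrm{PCR}} are labels introduced here for illustration, not notation taken from the post.

```latex
\begin{align*}
W &= U D V^{\top} && \text{(thin SVD of the centred } n \times p \text{ predictor matrix)} \\
Z_k &= W V_k = U_k D_k && \text{(scores on the first } k \text{ principal components)} \\
\hat{\gamma} &= (Z_k^{\top} Z_k)^{-1} Z_k^{\top} y = D_k^{-1} U_k^{\top} y && \text{(ordinary least squares of } y \text{ on } Z_k\text{)} \\
\hat{\beta}_{\mathrm{PCR}} &= V_k \hat{\gamma} = V_k D_k^{-1} U_k^{\top} y && \text{(coefficients mapped back to the original predictors)}
\end{align*}
```

Here U_k, D_k and V_k keep only the first k columns (and singular values) of the decomposition. Because Z_k has only k columns, Z_k^{\top} Z_k = D_k^2 is a small diagonal matrix that is invertible whenever the retained singular values are nonzero, which sidesteps the singularity that breaks standard regression. With 10 centred observations, k can be at most 9.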

Continue reading

Safety and sexism: the heroic stubbornness of Frances Oldham Kelsey

With covid-19 vaccine rollouts well underway the world over, the subject of clinical trials has been a focal point of discussion lately. Of course clinical trials are applicable to every drug, not just vaccines, and the class of molecules on which my own work focuses includes perhaps one of the most famous case studies of why clinical trials are necessary: thalidomide.

The teratogenic effects in unborn infants of this seemingly innocuous small molecule are well documented and infamous. But at the time of its initial use as a treatment for morning sickness in the mid twentieth century, little was known about its mechanism of action. Only within the last 20 years has the molecular glue-type nature of thalidomide and its analogues (collectively known as immunomodulatory imide drugs, or IMIDs) become apparent. Armed with this knowledge, we now understand not only how thalidomide works in useful situations (such as curing cancer), but also how it exhibits its less desirable effects (recruiting SALL4 to the E3 ligase cereblon, leading to SALL4’s degradation and subsequent embryogenesis havoc).

Continue reading