Author Archives: Susan Leung

When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode?

When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode?

For my journal club in June, I chose to present a Journal of Medicinal Chemistry article entitled “When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode?” by Malhotra and Karanicolas. This article uses a large scale collection of ligand pairs to investigate the circumstances in which elaborations of a ligand change the original binding mode.

One of the primary goals in medicinal chemistry is the optimisation of biological activity by chemical elaboration of a hit compound. This hit-to-lead optimisation often assumes that addition of functional groups to a given hit scaffold will not change the original binding mode.

In order to investigate the circumstances in which this assumption holds true and how often it holds true, they built up a large-scale collection of 297 related ligand pairs solved in complex with the same protein partner. Each pair consisted of a larger and smaller ligand; the larger ligand could have arisen from elaboration of the smaller ligand. They found that for 41 out of the 297 pairs (14%), the binding mode changed upon elaboration of the smaller ligand.

They investigated many physicochemical properties of the ligand, the protein-ligand complex and the protein binding pocket. They summarise the statistical significance and predictive power of the investigated properties with the table shown below.

They found that the property with the lowest p-value was the “rmsd after minimisation of the aligned complex” (RMAC). They developed this metric to probe whether the larger ligand could be accommodated in the protein without changing binding mode. They did so by aligning the shared substructure of the larger ligand onto the smaller ligand’s complex and then carrying out an energy minimisation. By monitoring the RMSD difference of the larger ligand relative to the initial pose (RMAC), they can gauge how compatible the larger ligand is with the protein. Larger RMAC values indicate greater incompatibility, hence a greater likelihood for the binding mode to not be preserved.

The authors generated receiver operating characteristic (ROC) plots to compare the predictive power of the properties considered. ROC curves are made by plotting the true positive rate (TPR) against the false positive rate (FPR). A random classifier would yield the dotted line from the bottom left to the top right, shown in the plots below. The best predictors would give a point in the top left corner of the plot. The properties that do well include RMAC, pocket volume, molecular weight, lipophilicity and potency.

They also combined properties to enhance predictive power and conclude that RMAC and molecular weight together offers good predictivity.Finally, the authors look at the pairs that have low RMAC values (i.e. the elaboration should be compatible with the protein pocket), yet show a change in binding mode. For these cases, a specific substitution may enable formation of a new, stronger interaction or for pseudosymmetric ligands, the alternate pose can mimic many of the interactions of the original pose.

Using RDKit to load ligand SDFs into Pandas DataFrames

If you have downloaded lots of ligand SDF files from the PDB, then a good way of viewing/comparing all their properties would be to load it into a Pandas DataFrame.

RDKit has a very handy function just for this – it’s found under the PandasTool module.

I show an example below within Jupypter-notebook, in which I load in the SDF file, view the table of molecules and perform other RDKit functions to the molecules.

First import the PandasTools module:

from rdkit.Chem import PandasTools

Read in the SDF file:

SDFFile = "./Ligands_noHydrogens_noMissing_59_Instances.sdf"
BRDLigs = PandasTools.LoadSDF(SDFFile)

You can see the whole table by calling the dataframe:


The ligand properties in the SDF file are stored as columns. You can view what these properties are, and in my case I have loaded 59 ligands each having up to 26 properties:

It is also very easy to perform other RDKit functions on the dataframe. For instance, I noticed there is no heavy atom column, so I added my own called ‘NumHeavyAtoms’:

BRDLigs['NumHeavyAtoms']=BRDLigs.apply(lambda x: x['ROMol'].GetNumHeavyAtoms(), axis=1)

Here is the column added to the table, alongside columns containing the molecules’ SMILES and RDKit molecule: