Category Archives: Small Molecules

Sort and Slice Tutorial – An alternative to extended connectivity fingerprints

Background

Sort and Slice (SNS) was developed by a former OPIGlet, Markus, as a method for improving Extended Connectivity Fingerprints (ECFPs) by overcoming bit collisions. ECFPs are a form of topological fingerprint that denotes the presence or absence of circular substructures in a molecule. The steps for deriving an ECFP from a molecule are as follows:

Identifier assignment:

Each atom in the molecule is assigned an initial numerical identifier; this is typically generated by hashing a tuple of atomic properties, called Daylight atomic invariants, into a 32-bit integer. These properties are:
1. Number of non-hydrogen neighbours.
2. Valence minus the number of neighbouring hydrogens.
3. Atomic number.
4. Atomic mass.
5. Atomic charge.
6. Number of hydrogen neighbours.
7. Ring membership.*
*Ring membership is an additional property that is often used but is not one of the original Daylight atomic invariants.
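The identifier assignment described above can be sketched in plain Python. This is a minimal illustration, not the hashing scheme used by RDKit or any other toolkit: the `AtomInvariants` tuple, the SHA-256-based `initial_identifier` function, the hand-written invariants for ethanol and the 2048-bit `fold` helper are all assumptions made for the example. The last helper shows why folding identifiers into a fixed-length vector can produce the bit collisions that Sort and Slice aims to avoid.

```python
import hashlib
import struct
from typing import NamedTuple


class AtomInvariants(NamedTuple):
    """The seven per-atom properties listed above (hypothetical container)."""
    heavy_neighbours: int   # 1. number of non-hydrogen neighbours
    valence_minus_h: int    # 2. valence minus number of neighbouring hydrogens
    atomic_number: int      # 3. atomic number
    atomic_mass: int        # 4. atomic mass (rounded to an integer here)
    charge: int             # 5. atomic charge
    hydrogens: int          # 6. number of hydrogen neighbours
    in_ring: int            # 7. ring membership (1 if in a ring, else 0)


def initial_identifier(inv: AtomInvariants) -> int:
    """Hash the invariants into an unsigned 32-bit integer.

    Assumption: SHA-256 truncated to 4 bytes stands in for whatever
    hash function a real toolkit uses.
    """
    packed = struct.pack("<7i", *inv)
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:4], "little")


def fold(identifier: int, n_bits: int = 2048) -> int:
    """Map an identifier to a bit position in a fixed-length fingerprint."""
    return identifier % n_bits


# Hand-written invariants for the heavy atoms of ethanol (CH3-CH2-OH).
ethanol = {
    "C_methyl": AtomInvariants(1, 1, 6, 12, 0, 3, 0),
    "C_methylene": AtomInvariants(2, 2, 6, 12, 0, 2, 0),
    "O_hydroxyl": AtomInvariants(1, 1, 8, 16, 0, 1, 0),
}

identifiers = {name: initial_identifier(inv) for name, inv in ethanol.items()}
for name, ident in identifiers.items():
    print(f"{name:12s} identifier={ident:>10d} bit={fold(ident)}")

# Two different identifiers can land on the same bit after folding.
print(fold(5) == fold(5 + 2048))
```

Because the fold is a simple modulo, any two identifiers that differ by a multiple of the fingerprint length share a bit, which is the collision problem in miniature.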

Continue reading →

Interactive visualization of protein–ligand complexes with Py3Dmol

I recently had a problem where I wanted to provide an interactive visualization of multiple different protein–ligand complexes, requiring minimal setup by the user, allowing them to zoom in and out and change the visualization style, without just providing multiple PDB files or a PyMOL session.

Continue reading →

Comparing pose and affinity prediction methods for follow-up designs from fragments

Virtual screening tasks typically apply many filters to a dataset of ligands to downselect the ‘best’ ones on a number of parameters and reduce the set to a manageable size. One popular filter is whether a compound has a physically plausible pose and good affinity, as predicted by tools such as docking or energy minimisation. In my pipeline for downselecting elaborations of compounds proposed as fragment follow-ups, I calculate the pose and ΔΔG by energy minimizing the ligand with atom restraints to the matching atoms in the fragment inspiration. I use either RDKit with its MMFF94 force field or PyRosetta with its ref2015 score function, both made possible by the lovely tool Fragmenstein.

With RDKit as the minimizer, the protein neighborhood around the ligand is fixed and placements take 21 s on average, whereas PyRosetta placements take 238 s on average (luckily, I can run placements in parallel). Ideally I would like to use RDKit as the placement method, since it is so fast and I would like to perform 500K placements within a few days, but I wanted to confirm that RDKit is ‘good enough’ compared to the slightly more rigorous PyRosetta (which, I think, allows residues to relax and samples more conformations over its longer runtime).
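To put those timings in perspective, here is a back-of-the-envelope calculation of the compute needed for 500K placements. The per-placement averages (21 s and 238 s) come from the text; the 3-day deadline standing in for "a few days" and the assumption of perfect parallel scaling are mine, as are the helper names.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def serial_days(n_placements: int, seconds_each: float) -> float:
    """Wall-clock days if every placement ran one after another."""
    return n_placements * seconds_each / SECONDS_PER_DAY


def workers_needed(n_placements: int, seconds_each: float,
                   deadline_days: float) -> int:
    """Parallel workers needed to finish by the deadline (ideal scaling)."""
    total_seconds = n_placements * seconds_each
    return math.ceil(total_seconds / (deadline_days * SECONDS_PER_DAY))


N_PLACEMENTS = 500_000
DEADLINE_DAYS = 3  # assumption: "a few days" taken as three

for method, seconds in (("RDKit", 21), ("PyRosetta", 238)):
    days = serial_days(N_PLACEMENTS, seconds)
    workers = workers_needed(N_PLACEMENTS, seconds, DEADLINE_DAYS)
    print(f"{method:10s} {days:8.1f} serial days, "
          f"{workers} workers for a {DEADLINE_DAYS}-day deadline")
```

Under these assumptions the roughly elevenfold difference in per-placement time carries straight through to the number of workers required.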

Continue reading →

Fine-tune generated molecular poses with a force field

Some molecular pose generation methods benefit from an energy relaxation post-processing step.

Figure: Example of a small molecule pose before and after energy minimization. The pose before minimization is shown in white, the optimized prediction is shown in pink, and a crystal pose is shown as reference in light blue. Note how the aromatic rings are flattened and the leftmost bond is shortened by the optimization.

Here is a quick way to do this using OpenMM via a short script I prepared:

Continue reading →

RSC Fragments 2024

I attended RSC Fragments 2024 (Hinxton, 4–5 March 2024), a conference dedicated to fragment-based drug discovery. The various talks were really good, because they gave overviews of projects involving teams across long stretches of time. As a result there were no slides discussing wet lab protocol optimisations and not a single Western blot was seen. The focus was primarily on either illustrating a discovery platform or recounting a declassified campaign. The latter were interesting, although I’ll admit I wish there had been more talk of organic chemistry: there was not a single moan/gloat about a yield. This top-down focus was nice as topics kept overlapping, namely:

target choice,
covalents,
molecular glues,
whether to escape Flatland,
thermodynamics, and
cryptic pockets

Continue reading →

Taking Equivariance in deep learning for a spin?

I recently went to Sheh Zaidi’s brilliant introduction to Equivariance and Spherical Harmonics and I thought it would be useful to cement my understanding of it with a practical example. In this blog post I’m going to start with serotonin in two coordinate frames, and build a small equivariant neural network that featurises it.
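As a minimal sketch of what "two coordinate frames" means in practice, the snippet below checks invariance and equivariance numerically without any neural network. Everything here is an assumption for illustration: the four toy coordinates are not serotonin, the weights are arbitrary, and the two features (sorted pairwise distances, and a weighted sum of centroid-relative position vectors) are hand-picked stand-ins for learned ones.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotate_z(p: Vec, angle: float) -> Vec:
    """Rotate a point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def change_frame(coords: Sequence[Vec], angle: float, shift: Vec) -> List[Vec]:
    """Express the same molecule in a rotated and translated frame."""
    moved = []
    for p in coords:
        x, y, z = rotate_z(p, angle)
        moved.append((x + shift[0], y + shift[1], z + shift[2]))
    return moved


def invariant_features(coords: Sequence[Vec]) -> List[float]:
    """Sorted pairwise distances: unchanged by rotation and translation."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def equivariant_feature(coords: Sequence[Vec], weights: Sequence[float]) -> Vec:
    """Weighted sum of centroid-relative vectors: rotates with the frame."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return tuple(
        sum(w * (p[i] - centroid[i]) for p, w in zip(coords, weights))
        for i in range(3)
    )


# Toy coordinates and weights (hypothetical, not a real molecule).
frame_a = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.3), (-0.7, 1.1, -0.5)]
weights = [6.0, 6.0, 7.0, 8.0]
ANGLE, SHIFT = 0.8, (3.0, -2.0, 5.0)
frame_b = change_frame(frame_a, ANGLE, SHIFT)

# Invariance: f(Rx + t) == f(x)
same_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(invariant_features(frame_a), invariant_features(frame_b))
)

# Equivariance: g(Rx + t) == R g(x)
rotated_then_featurised = equivariant_feature(frame_b, weights)
featurised_then_rotated = rotate_z(equivariant_feature(frame_a, weights), ANGLE)
commutes = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(rotated_then_featurised, featurised_then_rotated)
)

print("invariant:", same_distances, "equivariant:", commutes)
```

An equivariant network is built so that this second check holds layer by layer, rather than for a single hand-written feature.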

Continue reading →

Finding and testing a reaction SMARTS pattern for any reaction

Have you ever needed to find a reaction SMARTS pattern for a certain reaction but didn’t have it already written out? Do you have a reaction SMARTS pattern but need to test it on a set of reactants and products to make sure it transforms them correctly and does not accept odd reactants? I recently did and I spent some time developing functions that can:

Generate a reaction SMARTS for a reaction given two reactants, a product, and a reaction name.
Check the reaction SMARTS on a list of reactants and products that have the same reaction name.

Continue reading →

Online tools for drawing and visualizing molecules

I recently came across a nice tool for depicting multiple molecules called CDK Depict (thanks to Ruben for sending it to me), so I decided to explore what other web-based molecule visualization and drawing tools are available.

Continue reading →

The workings of Fragmenstein’s RDKit neighbour-aware minimisation

Fragmenstein is a Python module that combines hits or positions a derivative following given templates, obeying them very strictly. This is done by creating a “monster”, a compound that has the atomic positions of the templates, which is then reanimated by very strict energy minimisation. The minimisation is done in two steps, first in RDKit with an extracted frozen neighbourhood and then in PyRosetta within a flexible protein. The mapping for both combinations and placements is complicated, but I will focus here on one particular step, the minimisation, primarily in answer to an enquiry, namely how the RDKit minimisation works.

Continue reading →

Demystifying the thermodynamics of ligand binding

Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistics, which is arguably at its most jarring in machine learning. In datasets one often sees pIC50, pEC50, pKi and pKD; in discussion sections a medchemist may talk casually of entropy; whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.
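As one concrete link between these vocabularies, a dissociation constant can be converted to a standard binding free energy via ΔG° = RT ln(KD / 1 M) = −RT ln(10) · pKD. The sketch below assumes a 1 M standard state and a temperature of 298.15 K; the function names are hypothetical. Note that this relation holds for equilibrium constants such as KD and Ki; pIC50 and pEC50 depend on assay conditions and should not be converted this way without further assumptions.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def pkd_from_kd(kd_molar: float) -> float:
    """pKD is the negative base-10 logarithm of KD expressed in mol/L."""
    return -math.log10(kd_molar)


def delta_g_from_pkd(pkd: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol (1 M standard state)."""
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(10) * pkd


for label, kd in (("1 mM", 1e-3), ("1 uM", 1e-6), ("1 nM", 1e-9)):
    pkd = pkd_from_kd(kd)
    print(f"KD = {label:5s} pKD = {pkd:4.1f}  dG = {delta_g_from_pkd(pkd):7.2f} kcal/mol")
```

At room temperature each log unit of pKD is worth roughly 1.4 kcal/mol, which is a handy rule of thumb when comparing affinity data against computed energies.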

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends
