Category Archives: Protein-Ligand Docking

Analyzing AlphaFold 3’s Diffusion Trajectory

A useful way to understand AlphaFold 3’s sampling behavior is to look not only at the final predicted structure, but at what happens along the reverse diffusion trajectory itself. If we track quantities such as the physical energy of samples, noise scale, and update magnitude over time, a very clear pattern emerges: structures remain physically imperfect for most of sampling, and only take proper global shape in the final low-noise steps.

This behavior is a result of the diffusion procedure implemented in Algorithm 18, Sample Diffusion, which follows an EDM-style sampler with churn. Rather than marching monotonically from noise to structure, the sampler repeatedly perturbs the current coordinates, denoises them, and then takes an Euler-like update step. Because of the churn mechanism, AlphaFold 3 deliberately injects additional noise during part of the trajectory, which encourages exploration but also delays local geometric convergence. This mechanism is shown in steps 4-7 of the Sample Diffusion algorithm in the AlphaFold 3 Supplementary Information.
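To make the perturb, denoise, step loop concrete, here is a minimal sketch of a generic EDM-style stochastic sampler in plain Python. It is not a reproduction of Algorithm 18: the function name `sample_with_churn`, the toy denoiser (which always returns a fixed `target`), and the values of `gamma0`, `gamma_min`, `noise_scale` and `step_scale` are placeholders chosen for illustration, not values taken from the paper.

```python
import math
import random


def sample_with_churn(target, sigmas, gamma0=0.5, gamma_min=1.0,
                      noise_scale=1.0, step_scale=1.0, seed=0):
    """Generic EDM-style sampler with churn on a flat list of coordinates.

    `target` stands in for a trained denoiser: the toy denoiser below always
    predicts `target`, which is only exact for a one-point data distribution.
    Returns the final coordinates and the size of each update.
    """
    rng = random.Random(seed)

    def denoise(x_noisy, sigma):
        return list(target)  # placeholder for a learned network

    # Start from pure noise at the largest noise level.
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in target]
    update_sizes = []
    for sigma_prev, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Churn: temporarily raise the noise level while it is still high.
        gamma = gamma0 if sigma_next > gamma_min else 0.0
        t_hat = sigma_prev * (1.0 + gamma)
        extra = noise_scale * math.sqrt(max(t_hat ** 2 - sigma_prev ** 2, 0.0))
        x_noisy = [xi + extra * rng.gauss(0.0, 1.0) for xi in x]

        # Denoise, then take an Euler-like step from t_hat down to sigma_next.
        x_denoised = denoise(x_noisy, t_hat)
        direction = [(a - b) / t_hat for a, b in zip(x_noisy, x_denoised)]
        dt = sigma_next - t_hat
        x_new = [a + step_scale * dt * d for a, d in zip(x_noisy, direction)]

        update_sizes.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x))))
        x = x_new
    return x, update_sizes


if __name__ == "__main__":
    noise_levels = [100.0 * 0.5 ** k for k in range(12)] + [0.0]
    final, sizes = sample_with_churn([1.0, -2.0, 0.5], noise_levels)
    print(final)
    print([round(s, 3) for s in sizes])
```

Logging `update_sizes` alongside the noise level at each step is one simple way to collect the kind of per-step quantities discussed above; with a real denoiser, the energy of each intermediate sample could be recorded in the same loop.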

Continue reading

SigmaDock: untwisting molecular docking with fragment-based SE(3) diffusion

Alvaro Prat, Leo Zhang, Charlotte Deane, Yee Whye Teh, & Garrett M. Morris
International Conference On Learning Representations (ICLR 2026)

Molecular docking sits at the heart of structure-based drug discovery. If we can reliably predict how a small molecule binds in a protein pocket, we can prioritize compounds faster, reason about interactions more clearly, and build better pipelines for hit discovery and lead optimization. But in practice, docking is still a difficult problem: classical methods are often robust but imperfect, while recent deep learning approaches have sometimes looked promising on headline metrics without consistently producing chemically plausible poses.

SigmaDock was built to address exactly that gap. Instead of treating docking as a problem of directly diffusing on torsion angles or unconstrained atomic coordinates, SigmaDock represents ligands as collections of rigid fragments and learns how to reassemble them inside the binding pocket using diffusion on SE(3). In plain English: rather than trying to “wiggle” every flexible degree of freedom in a tangled way, SigmaDock breaks the ligand into chemically meaningful rigid pieces and learns where those pieces should go, and how they should reorient, to recover a valid bound pose.
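As a rough illustration of what moving a rigid fragment means, the sketch below applies an independent random rotation and translation to each fragment of a toy ligand. It is not SigmaDock's noise model: the function names, the Gaussian rotation angle and the Gaussian translation are simplifying assumptions made here, and the coordinates are invented.

```python
import math
import random


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about the unit vector `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def perturb_fragment(coords, sigma_rot, sigma_trans, rng):
    """Rotate a fragment about its centroid, then translate it as one rigid body."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    axis = [a / norm for a in axis]
    rot = rotation_matrix(axis, rng.gauss(0.0, sigma_rot))  # crude angle noise
    shift = [rng.gauss(0.0, sigma_trans) for _ in range(3)]
    moved = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]
        r = [sum(rot[i][j] * d[j] for j in range(3)) for i in range(3)]
        moved.append([r[i] + centroid[i] + shift[i] for i in range(3)])
    return moved


def perturb_ligand(fragments, sigma_rot, sigma_trans, seed=0):
    """Apply an independent roto-translation to each of the m fragments."""
    rng = random.Random(seed)
    return [perturb_fragment(f, sigma_rot, sigma_trans, rng) for f in fragments]


if __name__ == "__main__":
    toy_fragments = [
        [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.1, 1.2, 0.0]],
        [[3.5, 1.2, 0.0], [4.2, 2.4, 0.0]],
    ]
    print(perturb_ligand(toy_fragments, sigma_rot=0.5, sigma_trans=1.0))
```

Because each fragment moves as a rigid body, its internal distances are unchanged however large the perturbation is; only the relative placement of the fragments is scrambled.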

Figure 1: Illustration of SigmaDock using PDB 1V4S and ligand MRK. We create an initial conformation of a query ligand where we define our m rigid body fragments (colour coded). The corresponding forward diffusion process operates in SE(3)^m via independent roto-translations.
Continue reading

Can we make Boltz predict allosteric binding?

Orthosteric vs Allosteric binding (Nano Banana generated)

(While this post is meant to shed light on the problem of making AI structure prediction models like Boltz become better for allosteric binding, it is also an open call for collaborating on this problem.)

I recently took part in a Boltz hackathon organised by the MIT Jameel Clinic. I worked on improving Boltz 2 predictions for allosteric binders. The validation dataset provided was from a recent paper, Co-folding, the future of docking – prediction of allosteric and orthosteric ligands, which benchmarks some of the recent state-of-the-art AI structure prediction models on a curated set of allosteric and orthosteric binders. Generally, all AI structure prediction models are trained mostly on orthosteric binding cases, which means that their performance on allosteric binding is significantly worse.

Continue reading

How reliable are affinity datasets in practice?

The Data Bottleneck in AI-Powered Drug Discovery

The pharmaceutical industry is undergoing a profound transformation, driven by the promise of Artificial Intelligence (AI) and Machine Learning (ML). These technologies offer the potential to escape the industry’s persistent challenges of high costs, protracted development timelines, and staggering failure rates. From accelerating the identification of novel biological targets to optimizing the properties of lead compounds, AI is poised to enhance the precision and efficiency of drug discovery at nearly every stage.

Yet, this revolutionary potential is constrained by a fundamental dependency. The power of modern AI, particularly the deep learning (DL) models that excel at complex pattern recognition, is directly proportional to the volume, diversity, and quality of the data they are trained on. This creates a critical bottleneck: the high-quality experimental data required to train these models—specifically, the protein-ligand binding affinity values that quantify the strength of an interaction—are notoriously scarce, expensive to generate, and often of inconsistent quality or locked within proprietary databases.

Continue reading

A more robust way to split data for protein-ligand tasks?

As I was recently reading through the paper on the PLINDER dataset while preparing for my next project, one aspect that caught my attention was how the dataset splits were constructed to minimise leakage for the various protein-ligand tasks that PLINDER could be used for. The authors created task-specific splits because the notion of data leakage differs from task to task. For instance, in rigid body docking, having a similar protein in the train and test sets may not be considered leakage if the binding pocket location, conformation, or pocket interactions with a ligand are significantly different. On the other hand, in the case of co-folding, having similar proteins in the train and test sets would be considered data leakage, as predicted protein structures play a significant role in accuracy scoring. The effort that went into creating task-specific splits resonates strongly with OPIG’s view on ensuring minimal data leakage when validating the generalisability of protein-ligand models. However, creating task-specific dataset splits for every protein-ligand task may become tedious when dealing with a large suite of such tasks. This got me thinking about potential avenues to streamline the dataset split process across tasks, and one way to do this is by using protein-ligand interaction fingerprints, or PLIFs.
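One way to picture a PLIF-based leakage check is sketched below. This is an illustrative assumption, not the procedure used by PLINDER or proposed in the full post: each fingerprint is represented as a set of hypothetical (residue, interaction type) pairs, similarity is measured with the Tanimoto coefficient, and the 0.5 threshold is arbitrary.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of interactions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leaky_test_items(train, test, threshold=0.5):
    """Names of test complexes whose PLIF is too similar to any training PLIF."""
    return sorted(
        name
        for name, fingerprint in test.items()
        if any(tanimoto(fingerprint, seen) >= threshold for seen in train.values())
    )


if __name__ == "__main__":
    # Invented toy fingerprints, for illustration only.
    train = {
        "complex_A": {("ASP86", "hbond"), ("PHE80", "pi_stack"), ("LYS33", "salt_bridge")},
    }
    test = {
        "complex_B": {("ASP86", "hbond"), ("PHE80", "pi_stack")},
        "complex_C": {("GLU12", "hbond")},
    }
    print(leaky_test_items(train, test))
```

Here `complex_B` shares two of three interactions with `complex_A` and would be flagged, while `complex_C` shares none. The appeal of such a check is that the same similarity measure could be applied regardless of the downstream task.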

Continue reading

Pose Prediction: Does Your Model Generalize? The Role of Data Similarity

In our recent work with the PoseBusters benchmark, we made a deliberate choice: to include both receptors seen during training and completely novel ones. Why? To explore an often-overlooked question: how much does receptor similarity to training data influence model performance?

Continue reading

Featurisation is Key: One Version Change that Halved DiffDock’s Performance

1. Introduction 

Molecular docking with graph neural networks works by representing the molecules as featurized graphs. In DiffDock, each ligand becomes a graph of atoms (nodes) and bonds (edges), with features assigned to every atom using chemical properties such as atom type, implicit valence and formal charge. 
 
We recently discovered that a change in RDKit versions significantly reduces performance on the PoseBusters benchmark, due to changes in the “implicit valence” feature. This post walks through: 

  • How DiffDock featurises ligands 
  • What happened when we upgraded RDKit 2022.03.3 → 2025.03.1 
  • Why training with zero-only features and testing on non-zero features is so bad 

TL;DR: Use the dependencies listed in the environment.yml file, especially in the case of DiffDock, or your performance could halve!
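The third point can be illustrated without RDKit or DiffDock. The sketch below uses a hypothetical one-hot vocabulary for the implicit valence feature (the list `ALLOWED_VALENCES` and the toy atom values are assumptions, not DiffDock's actual featuriser) and shows which one-hot slots a model would meet for the first time at test time.

```python
# Hypothetical vocabulary; the last slot catches any value not listed.
ALLOWED_VALENCES = [0, 1, 2, 3, 4, 5, 6, "misc"]


def one_hot(value, allowed=ALLOWED_VALENCES):
    """One-hot encode `value`, falling back to the final 'misc' slot."""
    index = allowed.index(value) if value in allowed else len(allowed) - 1
    return [1.0 if i == index else 0.0 for i in range(len(allowed))]


def active_indices(values):
    """Set of one-hot slots that are ever switched on for these values."""
    return {one_hot(v).index(1.0) for v in values}


# Toy data: the feature is always zero at training time...
train_valences = [0, 0, 0, 0]
# ...but the same atoms report non-zero values at test time.
test_valences = [0, 1, 3, 2]

seen = active_indices(train_valences)
unseen = active_indices(test_valences) - seen

if __name__ == "__main__":
    print("slots seen in training:", sorted(seen))
    print("slots first seen at test time:", sorted(unseen))
```

Any weights attached to the unseen slots never received a gradient during training, so at test time they contribute whatever their initial values happen to be, while the one slot the model did learn to rely on is switched off.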

Continue reading

Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data

I’m delighted to report that our collaboration (Ísak Valsson, Matthew Warren, Aniket Magarkar, Phil Biggin, & Charlotte Deane) on “Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data” has been published in Nature’s Communications Chemistry (https://doi.org/10.1038/s42004-025-01428-y).


During his MSc dissertation project in the Department of Statistics, University of Oxford, OPIG member Ísak Valsson developed “AEV-PLIG”, an attention-based GNN that predicts protein-ligand binding affinity. It featurizes a ligand’s atoms using Atomic Environment Vectors to describe the Protein-Ligand Interactions found in a 3D protein-ligand complex. AEV-PLIG is free and open source (BSD 3-Clause), available from GitHub at https://github.com/oxpig/AEV-PLIG, and forked at https://github.com/bigginlab/AEV-PLIG.

Continue reading

Comparing pose and affinity prediction methods for follow-up designs from fragments

In any virtual screening task, many filters must be applied to a dataset of ligands to downselect the ‘best’ ones across a number of parameters and arrive at a set of manageable size. One popular filter is whether a compound has a physically plausible pose and good affinity, as predicted by tools such as docking or energy minimisation. In my pipeline for downselecting elaborations of compounds proposed as fragment follow-ups, I calculate the pose and ΔΔG by energy minimizing the ligand with atom restraints to matching atoms in the fragment inspiration. I use either RDKit with its MMFF94 force field or PyRosetta with its ref2015 score function, all made possible by the lovely tool Fragmenstein.

With RDKit as the minimizer, the protein neighborhood around the ligand is fixed and placements take 21 s on average, whereas PyRosetta placements take 238 s on average (luckily, I can run placements in parallel). Ideally, I would use RDKit as the placement method, since it is so fast and I would like to perform 500K placements within a few days, but I wanted to confirm that RDKit is ‘good enough’ compared to the slightly more rigorous PyRosetta (which, I think, allows residues to relax and samples more conformations over its longer runtime).
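To put those timings in perspective, here is a quick back-of-the-envelope calculation. It assumes that “a few days” means three days and that placements parallelise perfectly with no overhead; both are assumptions made for illustration.

```python
import math

N_PLACEMENTS = 500_000
SECONDS_PER_DAY = 86_400
DEADLINE_DAYS = 3  # assumption: "a few days" taken as three


def serial_days(seconds_per_placement, n=N_PLACEMENTS):
    """Days needed to run all placements one after another."""
    return n * seconds_per_placement / SECONDS_PER_DAY


def workers_needed(seconds_per_placement, deadline_days=DEADLINE_DAYS, n=N_PLACEMENTS):
    """Parallel workers needed to finish in time, assuming perfect scaling."""
    return math.ceil(serial_days(seconds_per_placement, n) / deadline_days)


if __name__ == "__main__":
    for name, seconds in [("RDKit", 21), ("PyRosetta", 238)]:
        print(f"{name}: {serial_days(seconds):.0f} serial days, "
              f"{workers_needed(seconds)} workers for {DEADLINE_DAYS} days")
```

Under these assumptions, RDKit needs roughly 122 serial days (about 41 parallel workers), while PyRosetta needs roughly 1,377 serial days (about 460 workers), which is why the faster method is so attractive at this scale.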

Continue reading

Fine-tune generated molecular poses with a force field

Some molecular pose generation methods benefit from an energy relaxation post-processing step.

Example of a small molecule pose before and after energy minimization. The pose before minimization is shown in white, the optimized prediction is shown in pink, and a crystal pose is shown as reference in light blue. Note how the aromatic rings are flattened and the leftmost bond is shortened by the optimization.

Here is a quick way to do this using OpenMM via a short script I prepared:

Continue reading