Category Archives: Small Molecules

SigmaDock: untwisting molecular docking with fragment-based SE(3) diffusion

Alvaro Prat, Leo Zhang, Charlotte Deane, Yee Whye Teh, & Garrett M. Morris
International Conference On Learning Representations (ICLR 2026)

Molecular docking sits at the heart of structure-based drug discovery. If we can reliably predict how a small molecule binds in a protein pocket, we can prioritize compounds faster, reason about interactions more clearly, and build better pipelines for hit discovery and lead optimization. But in practice, docking is still a difficult problem: classical methods are often robust but imperfect, while recent deep learning approaches have sometimes looked promising on headline metrics without consistently producing chemically plausible poses.

SigmaDock was built to address exactly that gap. Instead of treating docking as a problem of directly diffusing on torsion angles or unconstrained atomic coordinates, SigmaDock represents ligands as collections of rigid fragments and learns how to reassemble them inside the binding pocket using diffusion on $\text{SE}(3)$ . In plain English: rather than trying to “wiggle” every flexible degree of freedom in a tangled way, SigmaDock breaks the ligand into chemically meaningful rigid pieces and learns where those pieces should go, and how they should reorient, to recover a valid bound pose.

**Figure 1:** Illustration of SigmaDock using PDB 1V4S and ligand MRK. We create an initial conformation of a query ligand where we define our $m$ rigid body fragments (colour coded). The corresponding forward diffusion process operates in $\text{SE}(3)^m$ via independent roto-translations.

Continue reading →

Fragment-to-Lead Successes in 2024 – 10th Anniversary Edition

In what I have to admit is now becoming an annual tradition ([2023] [2019]), I’d like to highlight the 2024 edition of the fragment-to-lead success stories, published in J. Med. Chem. at the end of 2025 [Paper].

Continue reading →

New DPhil/PhD Programme in Pharmaceutical Science Joint with GSK!

Many OPIGlets found their way into a DPhil in Protein Informatics through our Systems Approaches to Biomedical Sciences Industrial Doctoral Landscape Award, which was open to applicants 2009-2024. This innovative course, based at the MPLS Doctoral Training Centre (DTC), offered six months of intensive taught modules prior to starting PhD-level research, allowing students to upskill across a diverse range of subjects (coding, mathematics, structural biology, etc.) and to go on to do research in areas significantly distinct from their formal Undergraduate training. All projects also benefited from direct co-supervision from researchers working in the Pharmaceutical industry, ensuring DPhil projects in areas with drug discovery translation potential. Regrettably, having twice successfully applied for renewal of funding, we were unsuccessful in our bid to refund SABS in 2024.

Happily though, we can now formally announce that our bid for a direct successor to SABS, the Transformative Technologies in Pharmaceutical Sciences IDLA, has been backed by the BBSRC, and we will shortly be opening for applications for entry this October [2026]. As someone who benefited from the interdisciplinary training and industry-adjacency of SABS, I’m thrilled to be a co-director of this new Programme and to help deliver this course to a new generation of talented students.

Continue reading →

What Molecular ML Can Learn from the Vision Community’s Representation Revolution

Something remarkable happened in computer vision in 2025: the fields of generative modeling and representation learning, which had developed largely independently, suddenly converged. Diffusion models started leveraging pretrained vision encoders like DINOv2 to dramatically accelerate training. Researchers discovered that aligning generative models to pretrained representations doesn’t just speed things up—it often produces better results.

As someone who works on generative models for (among other things) molecules and proteins, I’ve been watching this unfold with great interest. Could we do the same thing for molecular ML? We now have foundation models like MACE that learn powerful atomic representations. Could aligning molecular generative models to these representations provide similar benefits?

In this post, I’ll summarize what happened in vision (organized into four “phases”), and then discuss what I think are the key lessons for molecular machine learning. The punchline: many of these ideas are already starting to appear in our field, but we’re still in the early stages compared to vision.

For a more detailed treatment of the vision developments with full references and figures, see the extended blog post on my website.

Continue reading →

Chemical Languages in Machine Learning

For more than a century, chemists have been trying to squeeze the beautifully messy, quantum-smeared reality of molecules into tidy digital boxes, “formats” such as line notations, connection tables, coordinate files, or even the vaguely hieroglyphic Wiswesser Line Notation. These formats weren’t designed for machine learning; some weren’t even designed for computers. And yet, they’ve become the wedged into the backbones of modern drug discovery, materials design and computational chemistry.

The emergent use of large language models and natural language processing in chemistry posits the immediate question: What does it mean for a molecule to have a “language,” and how should machines speak it?

if molecules are akin to words and sentences, what alphabet and grammatical rules should they follow?

What follows is a tour through the evolving world of chemical languages, why we use them, why our old representations keep breaking our shiny new models, and what might replace them.

Continue reading →

Controlling the Diffusion Denoising Process: A Molecular Show

This blog post is supporting my poster at Young Modellers Forum and makes things way easier to see and understand. Underneath each GIF, is the explanation of what you should look for as things denoise throughout the diffusion trajectory. Click the GIFs for higher quality viewing!

Continue reading →

Design your very own drug: An introduction to structure-based small molecule drug design

Are you curious about how scientists design small molecules to treat disease using computational tools, but the words RDKit, docking, and QED mean nothing to you? Look no further than these tutorials for learning the fundamentals of computational small molecule drug design through interactive tutorials that introduce the key tools, concepts, and workflows. From generating compounds to evaluating their drug-likeness and binding potential, by the end you’ll be ready to explore how computational methods can result in the discovery of your very own (virtual) drug candidates to cure Zika!

Find the materials here: https://github.com/oxpig/dtc-struc-bio-smolecules/tree/main.

Continue reading →

Is the molecule in the computer?

The Molecular Graphics and Modelling Society began life as the Molecular Graphics Society. It’s hard to imagine a time without computer graphics, but yes, it existed. The MGS was formed by the pioneers who made molecular graphics commonplace.

In 1994, the MGS organized an Art and Video Show (Goodsell et al., 1995), and I submitted some of my own work. One of the other images — inspired by Magritte‘s “Ceci n’est pas une pipe”, depicts a molecule with a remarkable similarity to a pipe — and to a molecule… It was submitted by Mike Hann (of GSK):

“Ceci n’est pas une molecule”, image by Mike Hann, 1994.

Continue reading →

Fragment-to-Lead Successes in 2023

Back in 2021, I highlighted the annual fragment-to-lead (F2L) success stories from 2019 [Blog post] [Paper]. This is one of my favourite annual publications, and I’m delighted to see that it’s still going strong. In this post, I’ll discuss the 2023 edition that was published in at the start of 2025 [Paper].

Continue reading →

Extracting 3D Pharmacophore Points with RDKit

Pharmacophores are simplified representations of the key interactions ligands make with proteins, such as hydrogen bonds, charge interactions, and aromatic contacts. Think of them as the essential “bumps and grooves” on a key that allow it to fit its lock (the protein). These maps can be derived from ligands or protein–ligand complexes and are powerful tools for virtual screening and generative models. Here, we’ll see how to extract 3D pharmacophore points from a ligand using RDKit.
(Code adapted from Dr. Ruben Sanchez.)

Why pharmacophore “points”?

RDKit represents each pharmacophore feature (donor, acceptor, aromatic, etc.) as a point in 3D space, located at the feature center. These points capture the essential interaction motifs of a ligand without requiring the full atomic detail.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Small Molecules

SigmaDock: untwisting molecular docking with fragment-based SE(3) diffusion

Fragment-to-Lead Successes in 2024 – 10th Anniversary Edition

New DPhil/PhD Programme in Pharmaceutical Science Joint with GSK!

What Molecular ML Can Learn from the Vision Community’s Representation Revolution

Chemical Languages in Machine Learning

Controlling the Diffusion Denoising Process: A Molecular Show

Design your very own drug: An introduction to structure-based small molecule drug design

Is the molecule in the computer?

Fragment-to-Lead Successes in 2023

Extracting 3D Pharmacophore Points with RDKit

Why pharmacophore “points”?