Fragment-to-Lead Successes in 2023

Back in 2021, I highlighted the annual fragment-to-lead (F2L) success stories from 2019 [Blog post] [Paper]. This is one of my favourite annual publications, and I’m delighted to see that it’s still going strong. In this post, I’ll discuss the 2023 edition, which was published at the start of 2025 [Paper].

Continue reading

Are you addicted to dopamine? 

Ever since the pandemic my attachment to screens and media has slowly crept up on me, and I suspect that’s the case for many of us. It hit me when I panicked after leaving my flat without headphones, thinking “how could I ever walk around with just my thoughts?” I decided to significantly reduce my technology usage, and I keep getting the sense that I’m experiencing some kind of withdrawal from the constant media and dopamine hits. But I was curious about just what’s going on, and how bad it is.

What does dopamine actually do and is “dopamine addiction” scientifically accurate?

Continue reading

Human Learning in the age of Machine Learning

Source: Venus Krier

Oxford University has recently announced that its students will receive free access to a professional-level subscription of ChatGPT Education. This decision is more than just a perk; it’s a signal. One of the world’s leading universities is openly acknowledging that generative AI will be central to the academic experience of its students. But what does this mean for learning? For education? For scholarship itself?

To frame this question, it is worth beginning with a macro view: Mary Meeker’s AI Trends Report (2025) argues that AI is accelerating the transformation of knowledge work, pushing tasks once reserved for experts into more automated or semi-automated regimes. In her framing, AI is less a standalone innovation than a “meta-technology” that amplifies other domains.

Continue reading

Bye Bye Lucy Vost! (Lucy Gone-st but not forgotten)

This month we said goodbye to a few OG members of OPIG 🙁. Among them was one of my favourites, Lucy! (Should I apologise to the others?)

Lucy did some amazing work on improving the output of generative models during her time in OPIG. One of her recent projects involved increasing the plausibility of 3D molecular diffusion models by using distorted training data. Check it out here.

Early in her PhD she worked on PointVS with Jack Scantlebury. PointVS is a machine learning scoring function that predicts protein-small molecule binding affinity by learning actual binding physics rather than dataset biases.

Word on the street is she also has some secret works in the making…

Continue reading

Getting In the Flow – How to Flow (Match)

Introduction

In the world of computational structural biology you might have heard of diffusion models as the current big thing in generative modelling. Diffusion models are great primarily because they look cool when you visualise the denoising process generating a protein structure (check out the RFdiffusion Colab notebook), but also because they are state of the art at diverse and designable protein backbone structure generation.

Originally emerging from computer vision, a lot of work has been built up around their application to macromolecules – especially exciting is their harmonious union with geometric deep learning in the case of SE(3) equivariance (see FrameDiff). I don’t know about you, but I get particularly excited about geometric deep learning, mostly because it involves objectively dope words like “manifold” and “Riemannian”, better yet “Riemannian manifolds” – woah! (See Bronstein’s Geometric Deep Learning for more fun vocabulary to add to your vernacular, like “geodesic”.)

But we’re getting side tracked. Diffusion relates to score-based generative models as a square does to a rectangle: diffusion refers specifically to a time-dependent score function that is typically learned via a denoising process. Check out Jakub Tomczak’s blog for more on diffusion and score-based generative models. Flow matching, although technically distinct from score-based generative models, also makes use of transformations to a Gaussian, but is generally faster and not constrained to discrete time steps (or even Gaussian priors). So the big question is: how does one flow match?
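As a taste before the full post, here is a minimal numpy sketch of the flow matching training target under the common linear (straight-line) interpolation path between a Gaussian prior and the data. This is an illustrative sketch of the general technique, not the implementation the post goes on to describe; the function and variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_targets(x0, x1, t):
    """Conditional flow matching targets for a linear interpolation path.

    x0: samples from the Gaussian prior, x1: data samples,
    t: per-sample times in [0, 1].
    Returns the point on the path x_t and the target velocity u_t,
    which a network v_theta(x_t, t) would be regressed onto.
    """
    t = t[:, None]                   # broadcast time over feature dimensions
    x_t = (1.0 - t) * x0 + t * x1    # straight-line path from noise to data
    u_t = x1 - x0                    # constant target velocity along that path
    return x_t, u_t

# Toy usage: 2D "data" and Gaussian prior samples
x1 = rng.normal(loc=3.0, size=(128, 2))   # stand-in for real data
x0 = rng.standard_normal((128, 2))        # Gaussian prior samples
t = rng.uniform(size=128)
x_t, u_t = cfm_targets(x0, x1, t)

# A model would then minimise: mean || v_theta(x_t, t) - u_t ||^2
```

Note how there is no discrete-time noising schedule anywhere: the path is defined for continuous t, which is part of why flow matching is faster and more flexible than diffusion.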

Continue reading

Is attention all you need for protein folding?

Researchers from Apple have released SimpleFold, a protein structure prediction model which uses exclusively standard Transformer layers. The results seem to show that SimpleFold is a little less accurate than methods such as AlphaFold2, but much faster and easier to integrate into standard LLM-like workflows. SimpleFold also shows very good scaling performance, in line with other Transformer models like ESM2. So what is powering this seemingly simple development?

Continue reading

Extracting 3D Pharmacophore Points with RDKit

Pharmacophores are simplified representations of the key interactions ligands make with proteins, such as hydrogen bonds, charge interactions, and aromatic contacts. Think of them as the essential “bumps and grooves” on a key that allow it to fit its lock (the protein). These maps can be derived from ligands or protein–ligand complexes and are powerful tools for virtual screening and generative models. Here, we’ll see how to extract 3D pharmacophore points from a ligand using RDKit.
(Code adapted from Dr. Ruben Sanchez.)

Why pharmacophore “points”?

RDKit represents each pharmacophore feature (donor, acceptor, aromatic, etc.) as a point in 3D space, located at the feature center. These points capture the essential interaction motifs of a ligand without requiring the full atomic detail.
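As a taste of what that looks like in practice, here is a minimal sketch using RDKit’s built-in feature definitions (BaseFeatures.fdef) on an embedded 3D conformer. The example ligand (phenol) is my own choice; the post’s actual code, adapted from Dr. Ruben Sanchez, may differ.

```python
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import AllChem, ChemicalFeatures

# Build the default feature factory shipped with RDKit
fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

# Example ligand: phenol, embedded in 3D so features get coordinates
mol = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1O"))
AllChem.EmbedMolecule(mol, randomSeed=42)

# Each pharmacophore feature is a (family, 3D point) pair at the feature centre
points = []
for feat in factory.GetFeaturesForMol(mol):
    pos = feat.GetPos()
    points.append((feat.GetFamily(), (pos.x, pos.y, pos.z)))

for family, xyz in points:
    print(family, xyz)
```

For phenol you should see, among others, a Donor (the hydroxyl) and an Aromatic feature located at the ring centroid rather than on any single atom, which is exactly the point-based abstraction described above.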

Continue reading

Reflections on GRC CADD 2025: A Week of Insight, Innovation, and Baseball

Henry

Back in July, some very lucky OPIGlets ventured across the pond to discover life in Southern Maine (and Boston!). For someone visiting Boston for the first time, no trip would be complete without a Red Sox game—a thoroughly enjoyable highlight (see Figure 1). While we were there, we also went to the Gordon Research Conference (GRC) on Computer Aided Drug Design (CADD).

Figure 1: A flock of OPIGlets taking in the Fenway Park experience at a Red Sox game.
Continue reading

Exploring the Protein Data Bank programmatically

The Worldwide Protein Data Bank (wwPDB, or just the PDB to its friends) is a key resource for structural biology, providing a single central repository of protein and nucleic acid structure data. Most researchers interact with the PDB either by downloading and parsing individual entries as mmCIF files (or as legacy PDB files), or by downloading aggregated data, such as the RCSB‘s single FASTA file of all polymer entity sequences. All too often, researchers end up laboriously writing their own parsers to digest these files. In recent years, though, more sophisticated tools have become available that make it much easier to access only the data that you need.

Continue reading

Accelerating AlphaFold 3 for high-throughput structure prediction

Introduction

Recently, I have been working on a project in which I need to predict the structures of a dataset comprising a few thousand protein sequences using AlphaFold 3. Taking a naive approach, it was taking an hour or two per entry to get a predicted structure; at that rate, the full dataset would have taken months to run…

In this blog post, I will go through some tips I found to help accelerate the structure predictions and make all of the predictions I needed in under a week. In general, following the tips in the AlphaFold 3 performance documentation is a useful starting place. Most of the tips I provide are related to accelerating the MSA generation portion of the predictions because this was the biggest bottleneck in my case.

Continue reading