Category Archives: Protein Structure

How Unusual Is Your Generated Molecule? Let The CCDC Tell You

In this post I’ll walk through how to set up the CCDC Python API and use the CSD Geometry Analyser to evaluate the geometric quality of molecules from three representative structure-based de novo design models. I’ve put together a small GitHub repo with the full analysis code where we look at bond lengths, angles, torsions, and ring conformations across the three methods, and compare these against their PoseBusters validity scores to see what each metric is really capturing.

A Golden Age of Nanomedicine

As someone who spent their entire academic career, from B.Sc. to M.Sc. to Ph.D., within a Kavli Institute for Nanoscience Discovery (first in Delft and now in Oxford), I’ve had the privilege of seeing firsthand just how beautifully intricate the nanoscale world can be. Now, as my research focuses on lipid nanoparticles for genetic therapeutics and vaccines, I would like to use this platform to advocate for what I believe is one of the most transformative frontiers in modern medicine: the rational design of nanomaterials for therapeutic delivery.

Analyzing AlphaFold 3’s Diffusion Trajectory

A useful way to understand AlphaFold 3’s sampling behavior is to look not only at the final predicted structure, but at what happens along the reverse diffusion trajectory itself. If we track quantities such as the physical energy of samples, noise scale, and update magnitude over time, a very clear pattern emerges: structures remain physically imperfect for most of sampling, and only take proper global shape in the final low-noise steps.

This behavior is a result of the diffusion procedure implemented in Algorithm 18, Sample Diffusion, which follows an EDM-style sampler with churn. Rather than simply marching monotonically from noise to structure, the sampler repeatedly perturbs the current coordinates, denoises them, and then takes an Euler-like update step. Because of the churn mechanism, AlphaFold 3 deliberately injects additional noise during part of the trajectory, which encourages exploration but also delays local geometric convergence. This mechanism is shown in steps 4–7 of the Sample Diffusion algorithm in the AlphaFold 3 Supplementary Information.
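To make the perturb, denoise, step loop concrete, here is a minimal toy sketch of a generic EDM-style stochastic sampler with churn. It is not AlphaFold 3's implementation: the noise schedule, the default values of `gamma0`, `gamma_min`, `noise_scale` and `step_scale`, and the `toy_denoiser` (which simply returns a fixed target, standing in for the real diffusion module) are all illustrative assumptions, and the random recentring and augmentation of coordinates is left out.

```python
import math
import random


def noise_schedule(n_steps, sigma_max=160.0, sigma_min=0.01, rho=7.0):
    """Karras-style schedule decreasing from sigma_max to sigma_min (illustrative values)."""
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    return [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]


def toy_denoiser(x_noisy, sigma, target):
    """Hypothetical stand-in for the diffusion module: always predicts the target."""
    return list(target)


def sample_with_churn(target, n_steps=50, gamma0=0.8, gamma_min=1.0,
                      noise_scale=1.0, step_scale=1.0, seed=0):
    rng = random.Random(seed)
    sigmas = noise_schedule(n_steps)
    # Start from pure noise at the highest noise level.
    x = [sigmas[0] * rng.gauss(0, 1) for _ in target]
    trace = []
    for sigma_prev, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Churn: only inflate the noise level while it is above gamma_min.
        gamma = gamma0 if sigma_prev > gamma_min else 0.0
        sigma_hat = sigma_prev * (1 + gamma)
        # Inject the extra noise needed to move from sigma_prev up to sigma_hat.
        extra = noise_scale * math.sqrt(sigma_hat ** 2 - sigma_prev ** 2)
        x_noisy = [xi + extra * rng.gauss(0, 1) for xi in x]
        # Denoise, then take an Euler-like step down to sigma_next.
        denoised = toy_denoiser(x_noisy, sigma_hat, target)
        delta = [(xn - d) / sigma_hat for xn, d in zip(x_noisy, denoised)]
        dt = sigma_next - sigma_hat
        x = [xn + step_scale * dt * dl for xn, dl in zip(x_noisy, delta)]
        # Record noise level, whether churn was active, and distance to target.
        trace.append((sigma_next, gamma > 0, math.dist(x, target)))
    return x, trace


target = [1.0, 2.0, 3.0]
final, trace = sample_with_churn(target)
```

In this toy setting the distance to the target tracks the current noise level, so the sample only becomes accurate in the last low-noise steps, after churn has switched off. That mirrors the qualitative pattern described above, although a real denoiser and a physical energy function would of course behave in a more complicated way.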

Misconduct, Bias or Benign? A Case of Missing Ångströms

An Ångström

An Ångström (Å) is a unit of length equal to 10⁻¹⁰ metres, or one ten-billionth of a metre. It sits at a comfortable scale for the atomic world: the diameter of a hydrogen atom and the length of a chemical bond are both measured in Ångströms.

It is not an SI unit (Système International d’Unités, the International System of Units). In fact, it has been formally deprecated in favour of the nanometre (1 Å = 0.1 nm), and standards bodies such as NIST and the BIPM discourage its use. Yet in structural biology, chemistry, crystallography, and materials science, the Ångström persists, I would say partly out of stubbornness but mostly out of convenience. Saying a protein structure was solved at 2.1 Å feels natural in a way that 0.21 nm does not.
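For anyone who does need to move between the two units, the conversion is a single factor of ten. A minimal sketch follows; the function names are my own, chosen for illustration.

```python
ANGSTROM_IN_METRES = 1e-10  # 1 Å = 10^-10 m
ANGSTROMS_PER_NM = 10       # 1 nm = 10 Å, so 1 Å = 0.1 nm


def angstrom_to_nm(value):
    """Convert a length in Ångströms to nanometres."""
    return value / ANGSTROMS_PER_NM


def nm_to_angstrom(value):
    """Convert a length in nanometres to Ångströms."""
    return value * ANGSTROMS_PER_NM


# A 2.1 Å resolution expressed in nanometres.
resolution_nm = angstrom_to_nm(2.1)
```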

So we keep using it. And because we keep using it, we inherit its quirks and history.

Can we make Boltz predict allosteric binding?

Orthosteric vs Allosteric binding (Nano Banana generated)

(While this post is meant to shed light on the problem of making AI structure prediction models like Boltz better at predicting allosteric binding, it is also an open call for collaboration on this problem.)

I recently took part in a Boltz hackathon organised by the MIT Jameel Clinic, where I worked on improving Boltz 2 predictions for allosteric binders. The validation dataset provided was from a recent paper, “Co-folding, the future of docking – prediction of allosteric and orthosteric ligands”, which benchmarks some of the recent state-of-the-art AI structure prediction models on a curated set of allosteric and orthosteric binders. Generally, AI structure prediction models are trained mostly on orthosteric binding cases, so their performance on allosteric binding is significantly worse.

Controlling the Diffusion Denoising Process: A Molecular Show

This blog post supports my poster at the Young Modellers Forum and makes things way easier to see and understand. Underneath each GIF is an explanation of what to look for as things denoise along the diffusion trajectory. Click the GIFs for higher-quality viewing!

Is attention all you need for protein folding?

Researchers from Apple have released SimpleFold, a protein structure prediction model which uses exclusively standard Transformer layers. The results seem to show that SimpleFold is a little less accurate than methods such as AlphaFold2, but much faster and easier to integrate into standard LLM-like workflows. SimpleFold also shows very good scaling performance, in line with other Transformer models like ESM2. So what is powering this seemingly simple development?

Exploring the Protein Data Bank programmatically

The Worldwide Protein Data Bank (wwPDB, or just the PDB to its friends) is a key resource for structural biology, providing a single central repository of protein and nucleic acid structure data. Most researchers interact with the PDB either by downloading and parsing individual entries as mmCIF files (or as legacy PDB files), or by downloading aggregated data, such as the RCSB’s collection of all polymer entity sequences in a single FASTA file. All too often, researchers end up laboriously writing their own file parsers to digest these files. In recent years, though, more sophisticated tools have become available that make it much easier to access only the data that you need.
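As one illustration of fetching only what you need, the sketch below queries the RCSB Data API's REST endpoint for a single entry and pulls out a few fields from the JSON it returns. Using this particular endpoint is my assumption about the kind of tool meant here, not necessarily what the post goes on to describe, and the helper names (`entry_url`, `fetch_entry`, `summarise`) are hypothetical.

```python
import json
import urllib.request

# RCSB Data API endpoint for core entry records.
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id):
    """Build the entry URL for a PDB ID, normalising case and whitespace."""
    return RCSB_ENTRY_URL.format(pdb_id=pdb_id.strip().upper())


def fetch_entry(pdb_id, timeout=30):
    """Download the JSON record for one entry (requires network access)."""
    with urllib.request.urlopen(entry_url(pdb_id), timeout=timeout) as response:
        return json.load(response)


def summarise(entry):
    """Pick a few fields out of an entry record, tolerating missing keys."""
    return {
        "id": entry.get("rcsb_id"),
        "title": entry.get("struct", {}).get("title"),
        "methods": [m.get("method") for m in entry.get("exptl", [])],
    }
```

Calling `summarise(fetch_entry("4hhb"))` would then give a small dictionary for that entry without any mmCIF parsing on your side.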

Accelerating AlphaFold 3 for high-throughput structure prediction

Introduction

Recently, I have been conducting a project in which I need to predict the structures of a dataset comprising a few thousand protein sequences using AlphaFold 3. Taking a naive approach, it was taking an hour or two per entry to get a predicted structure. With a few thousand structures, it seemed that it would take months to be able to run…

In this blog post, I will go through some tips I found to help accelerate the structure predictions and make all of the predictions I needed in under a week. In general, following the tips in the AlphaFold 3 performance documentation is a useful starting place. Most of the tips I provide are related to accelerating the MSA generation portion of the predictions because this was the biggest bottleneck in my case.
