Category Archives: AI

Controlling the Diffusion Denoising Process: A Molecular Show

This blog post supports my poster at the Young Modellers Forum and makes things way easier to see and understand. Underneath each GIF is an explanation of what you should look for as things denoise along the diffusion trajectory. Click the GIFs for higher-quality viewing!


Is the molecule in the computer?

The Molecular Graphics and Modelling Society began life as the Molecular Graphics Society. It’s hard to imagine a time without computer graphics, but yes, it existed. The MGS was formed by the pioneers who made molecular graphics commonplace.

In 1994, the MGS organized an Art and Video Show (Goodsell et al., 1995), and I submitted some of my own work. One of the other images, inspired by Magritte’s “Ceci n’est pas une pipe”, depicts a molecule with a remarkable similarity to a pipe (and to a molecule…). It was submitted by Mike Hann (of GSK):

“Ceci n’est pas une molecule”, image by Mike Hann, 1994.


I Prompt, Therefore I Am: Is Artificial Intelligence the End of Human Thought?

Welcome to a slightly different blog post from usual. Today I am sharing an insight into my life at Keble College, Oxford. I am the Chair of Cheese and Why?, a talk series we host in our common room during term. The format is simple: I provide cheese and wine, and a guest speaker provides the “why”, a short, thought-provoking talk to spark discussion for the evening.

To kick off the series, I opened with the question of artificial intelligence replacing human thought. I am sharing my spoken essay below. The aim of a Cheese and Why? talk is to generate questions rather than deliver answers, so I hope you’ll forgive me if what follows doesn’t quite adhere to the rigorous structure of a traditional Oxford humanities essay. For best reading, I recommend a glass of claret and a wedge of Stilton, to recreate the full Oxford common-room experience.


Human Learning in the age of Machine Learning

Oxford University has recently announced that its students will receive free access to a professional-level subscription to ChatGPT Education. This decision is more than just a perk; it’s a signal. One of the world’s leading universities is openly acknowledging that generative AI will be central to the academic experience of its students. But what does this mean for learning? For education? For scholarship itself?

To frame this question, it is worth beginning with a macro view: Mary Meeker’s AI Trends Report (2025) argues that AI is accelerating the transformation of knowledge work, pushing tasks once reserved for experts into more automated or semi-automated regimes. In her framing, AI is less a standalone innovation than a “meta-technology” that amplifies other domains.


Getting In the Flow – How to Flow (Match)

Introduction

In the world of computational structural biology, you might have heard of diffusion models as the current big thing in generative modelling. Diffusion models are great primarily because they look cool when you visualise the denoising process that generates a protein structure (check out the RFdiffusion Colab notebook). They are also great because they are state of the art at generating diverse and designable protein backbone structures.
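To make the “denoising trajectory” a little more concrete, here is a minimal NumPy sketch of the forward (noising) half of a standard DDPM-style diffusion process, applied to a toy chain of 3D coordinates. It assumes a linear β schedule over 1000 steps and treats coordinates as plain points. It is a generic illustration, not RFdiffusion's implementation, which also has to handle residue orientations. A trained model learns to run this process in reverse, and visualisations of the denoising process are snapshots of that reverse walk.

```python
import numpy as np


def linear_beta_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t, increasing linearly from beta_start to beta_end."""
    return np.linspace(beta_start, beta_end, num_steps)


def noise_coordinates(x0: np.ndarray, t: int, betas: np.ndarray,
                      rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I),
    where alpha_bar_t is the cumulative product of (1 - beta) up to step t.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


# Toy "backbone": 10 C-alpha positions on a straight line, 3.8 Å apart,
# centred and rescaled so the coordinates have roughly unit spread.
rng = np.random.default_rng(0)
x0 = np.zeros((10, 3))
x0[:, 0] = np.arange(10) * 3.8
x0 -= x0.mean(axis=0)
x0 /= x0.std()

betas = linear_beta_schedule()
for t in (0, 250, 500, 999):
    xt = noise_coordinates(x0, t, betas, rng)
    rmsd = np.sqrt(np.mean(np.sum((xt - x0) ** 2, axis=1)))
    print(f"step {t:4d}: RMSD from the clean structure = {rmsd:.2f}")
```

You should see the RMSD grow with the step index as the structure dissolves into noise. Generation walks the same path in the opposite direction.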

Diffusion models originally emerged from computer vision, and a lot of work has since been built up around their application to macromolecules – especially exciting is their harmonious union with geometric deep learning in the case of SE(3) equivariance (see FrameDiff). I don’t know about you, but I get particularly excited about geometric deep learning, mostly because it involves objectively dope words like “manifold” and “Riemannian”, better yet “Riemannian manifolds” – woah! (See Bronstein’s Geometric Deep Learning for more fun vocabulary to add to your vernacular, like “geodesic”.)

But we’re getting sidetracked. Diffusion is to score-based generative models what a square is to a rectangle: a special case, with the clause that diffusion refers explicitly to learning a time-dependent score function, typically via a denoising process. Check out Jakub Tomczak’s blog for more on diffusion and score-based generative models. Flow matching is technically different from score-based generative models. It also learns to transform Gaussian noise into data, but it is generally faster and is not constrained to discrete time steps (or even to Gaussian priors). So the big question is: how does one flow match?
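For a flavour of the answer, here is a minimal NumPy sketch of the simplest flow-matching recipe. It assumes the straight-line (linear interpolation) probability path between a Gaussian prior sample x0 and a data sample x1. Along that path x_t = (1 - t) x0 + t x1, so the regression target for the velocity network is simply x1 - x0. At generation time, one integrates the learned velocity field from t = 0 to t = 1 with an ODE solver (plain Euler here). The `exact_velocity` function is a hypothetical stand-in for a trained network: it is the closed-form field for a toy "dataset" containing a single point.

```python
import numpy as np
from typing import Callable

# A velocity field v(x, t): x has shape (batch, dim), t has shape (batch, 1).
VelocityField = Callable[[np.ndarray, np.ndarray], np.ndarray]


def flow_matching_pair(x0: np.ndarray, x1: np.ndarray, t):
    """Point on the straight path from noise x0 to data x1, and the target velocity there."""
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0  # d(x_t)/dt for the linear path; constant along the path
    return xt, ut


def flow_matching_loss(model: VelocityField, x1: np.ndarray,
                       rng: np.random.Generator) -> float:
    """Monte Carlo estimate of the conditional flow-matching loss on one batch of data x1."""
    x0 = rng.standard_normal(x1.shape)          # samples from the Gaussian prior
    t = rng.uniform(size=(x1.shape[0], 1))      # one random time per sample, t ~ U[0, 1)
    xt, ut = flow_matching_pair(x0, x1, t)
    return float(np.mean((model(xt, t) - ut) ** 2))


def euler_sample(model: VelocityField, x0: np.ndarray, num_steps: int = 100) -> np.ndarray:
    """Generate data by integrating dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * model(x, t)
    return x


# Toy check: if the "dataset" is a single point mu, the exact velocity field of
# the linear path is (mu - x) / (1 - t). A real model would be a neural network
# trained to minimise flow_matching_loss.
mu = np.array([[2.0, -1.0, 0.5]])


def exact_velocity(x: np.ndarray, t: np.ndarray) -> np.ndarray:
    return (mu - x) / (1.0 - t)


rng = np.random.default_rng(0)
print(flow_matching_loss(exact_velocity, np.repeat(mu, 64, axis=0), rng))  # close to 0
print(euler_sample(exact_velocity, rng.standard_normal((4, 3))))           # rows close to mu
```

Training never simulates the ODE. It only evaluates points on straight paths, so each training step is cheap, and the integration cost is paid only at sampling time.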


Is attention all you need for protein folding?

Researchers from Apple have released SimpleFold, a protein structure prediction model which uses exclusively standard Transformer layers. The results seem to show that SimpleFold is a little less accurate than methods such as AlphaFold2, but much faster and easier to integrate into standard LLM-like workflows. SimpleFold also shows very good scaling performance, in line with other Transformer models like ESM2. So what is powering this seemingly simple development?
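Since the question hinges on attention, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation inside a standard Transformer layer. It is applied to a toy sequence of per-residue embeddings. The sizes and random weights are illustrative assumptions, and this is not SimpleFold's code. A full Transformer layer adds multiple heads, a feed-forward sublayer, residual connections and normalisation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention.

    h: (L, d_model) residue embeddings; w_q, w_k, w_v: (d_model, d_head) projections.
    Returns the updated embeddings (L, d_head) and the (L, L) attention map.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # similarity of every pair of residues
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
L, d_model, d_head = 8, 16, 4  # toy sizes: 8 residues
h = rng.standard_normal((L, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out, attn = self_attention(h, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (8, 4) (8, 8)
```

Without positional information, this operation is permutation-equivariant: shuffling the residues just shuffles the outputs. This is why sequence Transformers add positional encodings.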


Accelerating AlphaFold 3 for high-throughput structure prediction

Introduction

Recently, I have been working on a project in which I need to predict the structures of a dataset of a few thousand protein sequences using AlphaFold 3. With a naive approach, each entry was taking an hour or two to produce a predicted structure. With a few thousand structures, it seemed that it would take months to be able to run…

In this blog post, I will go through some tips I found to help accelerate the structure predictions and make all of the predictions I needed in under a week. In general, following the tips in the AlphaFold 3 performance documentation is a useful starting place. Most of the tips I provide are related to accelerating the MSA generation portion of the predictions because this was the biggest bottleneck in my case.
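A quick back-of-the-envelope calculation shows why the naive approach was a non-starter, and how far the per-entry time has to fall to fit in a week. The figures are illustrative assumptions: 3,000 entries stand in for "a few thousand", 1.5 hours per entry stands in for "an hour or two", and predictions run one at a time unless `n_parallel` says otherwise.

```python
def wall_clock_days(n_entries: int, hours_per_entry: float, n_parallel: int = 1) -> float:
    """Days needed to process n_entries when n_parallel jobs run concurrently."""
    return n_entries * hours_per_entry / n_parallel / 24.0


def max_minutes_per_entry(n_entries: int, budget_days: float, n_parallel: int = 1) -> float:
    """Average per-entry time (in minutes) that fits the whole dataset into budget_days."""
    return budget_days * 24.0 * 60.0 * n_parallel / n_entries


n = 3000  # assumption: "a few thousand" sequences
print(f"naive: {wall_clock_days(n, 1.5):.0f} days")           # 188 days, about six months
print(f"target: {max_minutes_per_entry(n, 7):.2f} min/entry")  # 3.36 minutes per entry
```

Under these assumptions, the average cost per entry has to drop roughly 27-fold, from 90 minutes to under 3.5 minutes. That is why attacking the dominant cost (here, MSA generation) matters more than shaving time off anything else.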


Understand Large Codebases Faster Using GitIngest

Often as researchers we have to deal with large and ugly codebases – this is not new, I know. But fear not: now we have large language models (LLMs) like ChatGPT and friends, which make things a little faster! In this blog post I will show you how to use GitIngest to do this even faster with your favourite LLM.

No more copy-pasting files individually or writing a paragraph explaining the directory structure, or, even worse, relying on an LLM to use web search to find the codebase. As the codebase grows, so does the unreliability of these methods. GitIngest makes any “whole” codebase prompt-friendly – one prompt will be all you need!
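The idea behind this is simple enough to sketch: walk the repository, list its files, then concatenate every text file under a clear header, so the whole thing can be pasted into a single prompt. The stdlib-only function below is a hypothetical toy illustration of that idea, not GitIngest's actual implementation or output format. The name `digest_repo`, the skipped directories and the size cap are all assumptions. For real use, GitIngest itself is the tool to reach for.

```python
from pathlib import Path

# Assumption: directories that are usually noise rather than source code.
SKIP_DIRS = {".git", "__pycache__", "node_modules", ".venv"}


def digest_repo(root: str, max_bytes: int = 100_000) -> str:
    """Return one prompt-friendly string: a file listing followed by every readable text file."""
    root_path = Path(root)
    files = sorted(
        p for p in root_path.rglob("*")
        if p.is_file()
        and not SKIP_DIRS.intersection(p.relative_to(root_path).parts)
    )
    rel = [p.relative_to(root_path).as_posix() for p in files]

    lines = ["Files:"] + [f"  {name}" for name in rel]
    for path, name in zip(files, rel):
        if path.stat().st_size > max_bytes:
            continue  # skip very large files (likely data or build artefacts)
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text (e.g. binaries)
        lines += ["", "=" * 40, f"FILE: {name}", "=" * 40, text]
    return "\n".join(lines)
```

Usage: `print(digest_repo("path/to/repo"))`, then paste the output into your LLM of choice. The file listing up front gives the model the directory structure you would otherwise have to describe by hand.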


How reliable are affinity datasets in practice?

The Data Bottleneck in AI-Powered Drug Discovery

The pharmaceutical industry is undergoing a profound transformation, driven by the promise of Artificial Intelligence (AI) and Machine Learning (ML). These technologies offer the potential to escape the industry’s persistent challenges of high costs, protracted development timelines, and staggering failure rates. From accelerating the identification of novel biological targets to optimizing the properties of lead compounds, AI is poised to enhance the precision and efficiency of drug discovery at nearly every stage.

Yet, this revolutionary potential is constrained by a fundamental dependency. The power of modern AI, particularly the deep learning (DL) models that excel at complex pattern recognition, depends heavily on the volume, diversity, and quality of the data they are trained on. This creates a critical bottleneck. The high-quality experimental data required to train these models (specifically, the protein-ligand binding affinity values that quantify the strength of an interaction) are notoriously scarce and expensive to generate. They are also often of inconsistent quality or locked within proprietary databases.
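One concrete face of that inconsistency is that the same protein-ligand pair can appear several times, measured in different assays. A common first step is to put every value on a log scale, p = -log10(value in mol/L), and check how much replicates disagree. The sketch below assumes affinities reported in nM. The records and the 1-log-unit flag threshold are made-up illustrations, not taken from any real database.

```python
import math
from collections import defaultdict


def nm_to_p(value_nm: float) -> float:
    """Convert an affinity in nanomolar to a p-value: p = -log10(value in M) = 9 - log10(value in nM)."""
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    return 9.0 - math.log10(value_nm)


def replicate_spread(records):
    """Group (protein, ligand, value_nM) records by pair; return the max-min spread in p units per pair."""
    by_pair = defaultdict(list)
    for protein, ligand, value_nm in records:
        by_pair[(protein, ligand)].append(nm_to_p(value_nm))
    return {pair: max(ps) - min(ps) for pair, ps in by_pair.items()}


# Hypothetical records: one pair measured three times, another measured once.
records = [
    ("P1", "L1", 10.0),
    ("P1", "L1", 12.0),
    ("P1", "L1", 1000.0),
    ("P2", "L7", 50.0),
]
spread = replicate_spread(records)
flagged = {pair: s for pair, s in spread.items() if s > 1.0}  # more than 10-fold disagreement
print(flagged)
```

In a real dataset you would also keep Kd, Ki and IC50 values separate, since they are not directly comparable measurements.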


GUI Slop

Previously, I wrote about writing GUIs for controlling and monitoring experiments. For ML, this might be useful for tracking model training (e.g. the popular Weights & Biases platform), while in the wet lab it is great for making experiments simpler and more reliable to run, monitor and record.

And as it turns out, AI is quite good at this!

I have been using VS Code Copilot in agent mode with Gemini 2.5 Pro to create simple GUIs that can control my experiments, which has proved pretty effective. There is clearly a concern when interfacing AI-generated code with real hardware, especially if you “vibe code” (that is, just run whatever it generates). In practice, though, it has allowed me to quickly generate tools for testing purposes, cutting the time required to get a project started from hours to minutes.

As an example, I recently needed to hook up a Helmholtz coil to some custom electronics, centred around a Teensy micro-controller and designed to output a precisely controlled current.
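Here is why a precisely controlled current is the thing to aim for. At the centre of an ideal Helmholtz pair (two coaxial coils of radius R, separated by R, each with n turns carrying current I), the magnetic field is B = (4/5)^(3/2) μ0 n I / R, so the field is linear in the supplied current. The helpers below are a hypothetical sketch for turning a target field into a current setpoint. The coil parameters are made-up examples, not the hardware described here, and a real coil would still need calibrating against a magnetometer.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability in T·m/A (accurate to well below 1 part in 10^9)


def helmholtz_field(current_a: float, turns: int, radius_m: float) -> float:
    """Magnetic field (tesla) at the centre of an ideal Helmholtz pair."""
    return (4 / 5) ** 1.5 * MU_0 * turns * current_a / radius_m


def current_for_field(field_t: float, turns: int, radius_m: float) -> float:
    """Inverse of helmholtz_field: current (amps) needed for a target central field."""
    return field_t * radius_m / ((4 / 5) ** 1.5 * MU_0 * turns)


# Hypothetical coil: 100 turns per coil, 10 cm radius.
b = helmholtz_field(1.0, turns=100, radius_m=0.10)
print(f"{b * 1e3:.3f} mT per amp")
print(f"{current_for_field(1e-3, 100, 0.10):.3f} A for 1 mT")
```

A GUI built on top of something like this can let the user type a field in millitesla and send the corresponding current setpoint to the electronics, rather than working in raw current units.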


Oxford Protein Informatics Group

or "OPIG" to friends
