Category Archives: Machine Learning

Peering Inside the Black Box: A Beginner’s Introduction to Mechanistic Interpretability

Over the last few years, large language models (LLMs) have gone from being curiosities tucked away in research labs to something most of us interact with on a daily basis; whether for drafting emails, debugging code, or simply pondering the meaning of life at 2am. And yet, for all our reliance on these systems, a rather inconvenient truth lingers in the background: nobody, not even the people who built them, can fully explain what is going on inside.

This is where mechanistic interpretability comes in.

In essence, mechanistic interpretability is the approach of explaining complex machine learning systems through the behaviour of their functional units (Kästner and Crook, 2024) by reverse-engineering them into their more elementary computations (Rai et al., 2025). The aim is not simply to know that a model gives the right answer, but to pull apart the underlying machinery and uncover the causal relationships between input and output. Think of it as neuroscience for neural networks, except we can read every neuron at any moment, rewind, replay, and intervene mid-thought.

Continue reading

Will TurboQuant save us from the RAM apocalypse?

The LLM boom is causing a global shortage of the very same computer memory it needs to sustain itself. Reports suggest OpenAI’s Stargate project alone could consume up to 40% of global DRAM output. Frontier labs like Google DeepMind need to make their models more memory-efficient.


One such technique is TurboQuant, released by Google. TurboQuant is an example of an online “quantisation” method. LLMs represent information using large tensors of numerical values, where each number typically uses 64 or 32 bits. However, many values do not require full numerical precision, so we can “round” them using fewer bits and less memory. We can see this in the example below:

The rounded value now requires 4x less memory. Source

Some quantisation methods are applied offline before inference begins. TurboQuant is ‘online’ because it compresses the KV cache dynamically during inference.

Continue reading

A Golden Age of Nanomedicine

As someone who spent their entire academic career, from B.Sc. to M.Sc. to Ph.D., within a Kavli Institute for Nanoscience Discovery (first in Delft and now in Oxford), I’ve had the privilege of seeing firsthand just how beautifully intricate the nanoscale world can be. Now, as my research focuses on lipid nanoparticles for genetic therapeutics and vaccines, I would like to use this platform to advocate for what I believe is one of the most transformative frontiers in modern medicine: the rational design of nanomaterials for therapeutic delivery.

Continue reading

New DPhil/PhD Programme in Pharmaceutical Science Joint with GSK!

Many OPIGlets found their way into a DPhil in Protein Informatics through our Systems Approaches to Biomedical Sciences Industrial Doctoral Landscape Award, which was open to applicants 2009-2024. This innovative course, based at the MPLS Doctoral Training Centre (DTC), offered six months of intensive taught modules prior to starting PhD-level research, allowing students to upskill across a diverse range of subjects (coding, mathematics, structural biology, etc.) and to go on to do research in areas significantly distinct from their formal Undergraduate training. All projects also benefited from direct co-supervision from researchers working in the Pharmaceutical industry, ensuring DPhil projects in areas with drug discovery translation potential. Regrettably, having twice successfully applied for renewal of funding, we were unsuccessful in our bid to refund SABS in 2024.

Happily though, we can now formally announce that our bid for a direct successor to SABS, the Transformative Technologies in Pharmaceutical Sciences IDLA, has been backed by the BBSRC, and we will shortly be opening for applications for entry this October [2026]. As someone who benefited from the interdisciplinary training and industry-adjacency of SABS, I’m thrilled to be a co-director of this new Programme and to help deliver this course to a new generation of talented students.

Continue reading

Democratising the Dark Arts: Writing Triton Kernels with Claude

Why would you ever want to leave the warm, fuzzy embrace of torch.nn? It works, it’s differentiable, and it rarely causes your entire Python session to segfault without a stack trace. The answer usually comes down to the “Memory Wall.” Modern deep learning is often less bound by how fast your GPU can do math (FLOPS) and more bound by how fast it can move data around (Memory Bandwidth). When you write a sequence of simple PyTorch operations, something like x = x * 2 + y the GPU often reads x from memory, multiplies it, writes it back, reads it again to add y, and writes it back again. It’s the computational equivalent of making five separate trips to the grocery store because you forgot the eggs, then the milk, then the bread. Writing a custom kernel lets you “fuse” these operations. You load the data once, perform a dozen mathematical operations on it while it sits in the ultra-fast chip registers, and write it back once. The performance gains can be massive (often 2x-10x for specific layers).But traditionally, the “cost” of accessing those gains, learning C++, understanding warp divergence, and manual memory management, was just too high for most researchers. That equation is finally changing.

Continue reading

What Molecular ML Can Learn from the Vision Community’s Representation Revolution

Something remarkable happened in computer vision in 2025: the fields of generative modeling and representation learning, which had developed largely independently, suddenly converged. Diffusion models started leveraging pretrained vision encoders like DINOv2 to dramatically accelerate training. Researchers discovered that aligning generative models to pretrained representations doesn’t just speed things up—it often produces better results.

As someone who works on generative models for (among other things) molecules and proteins, I’ve been watching this unfold with great interest. Could we do the same thing for molecular ML? We now have foundation models like MACE that learn powerful atomic representations. Could aligning molecular generative models to these representations provide similar benefits?

In this post, I’ll summarize what happened in vision (organized into four “phases”), and then discuss what I think are the key lessons for molecular machine learning. The punchline: many of these ideas are already starting to appear in our field, but we’re still in the early stages compared to vision.

For a more detailed treatment of the vision developments with full references and figures, see the extended blog post on my website.

Continue reading

Chemical Languages in Machine Learning

For more than a century, chemists have been trying to squeeze the beautifully messy, quantum-smeared reality of molecules into tidy digital boxes, “formats” such as line notations, connection tables, coordinate files, or even the vaguely hieroglyphic Wiswesser Line Notation. These formats weren’t designed for machine learning; some weren’t even designed for computers. And yet, they’ve become the wedged into the backbones of modern drug discovery, materials design and computational chemistry.

The emergent use of large language models and natural language processing in chemistry posits the immediate question: What does it mean for a molecule to have a “language,” and how should machines speak it?

if molecules are akin to words and sentences, what alphabet and grammatical rules should they follow?

What follows is a tour through the evolving world of chemical languages, why we use them, why our old representations keep breaking our shiny new models, and what might replace them.

Continue reading

An Introduction to the Basics of Reinforcement Learning

Reinforcement learning (RL) is pretty simple in theory – “take actions, get rewards, increase likelihood of high reward actions”. However, we can quickly runs into subtle problems that don’t show up in standard supervised learning. The aim of this post is to give a gentle, concrete introduction to what RL actually is, why we might want to use it instead of (or alongside) supervised learning, and some of the headaches (figure 1) that come with it: sparse rewards, credit assignment, and reward shaping.

Figure 1: I’d like to help take you from confusion/headache 🙁 (left) to having a least some clarity 🙂 (right) with regard to what reinforcement learning is and where its useful

Rather than starting with Atari or robot arms, we’ll work through a small toy environment: a paddle catching falling balls. It’s simple enough to understand visually, but rich enough to show how different reward designs can lead to completely different behaviours, even when the underlying environment and objective are the same. Along the way, we’ll connect the code to the standard RL formalism (MDPs, returns, policy gradients), so you can see how the equations map onto something you can actually run.

Continue reading

Is the molecule in the computer?

The Molecular Graphics and Modelling Society began life as the Molecular Graphics Society. It’s hard to imagine a time without computer graphics, but yes, it existed. The MGS was formed by the pioneers who made molecular graphics commonplace.

In 1994, the MGS organized an Art and Video Show (Goodsell et al., 1995), and I submitted some of my own work. One of the other images — inspired by Magritte‘s “Ceci n’est pas une pipe”, depicts a molecule with a remarkable similarity to a pipe — and to a molecule… It was submitted by Mike Hann (of GSK):

“Ceci n’est pas une molecule”, image by Mike Hann, 1994.
Continue reading

I Prompt, Therefore I Am: Is Artificial Intelligence the End of Human Thought? 

Welcome to a slightly different blog post than usual. Today I am sharing an insight into my life at Keble College, Oxford. I am the Chair of Cheese and Why?, which is a talk series we host in our common room during term. The format is simple: I provide cheese and wine, and a guest speaker provides the “why”—a short, thought-provoking talk to spark discussion for the evening.

To kick off the series, I opened with the question of artificial intelligence replacing human thought. I am sharing my spoken essay below. The aim of a Cheese and Why? talk is to generate questions rather than deliver answers, so I hope you’ll forgive me if what follows doesn’t quite adhere to the rigorous structure of a traditional Oxford humanities essay. For best reading, I recommend a glass of claret and a wedge of Stilton, to recreate the full Oxford common-room experience.

Continue reading