Category Archives: AI

Building a “Second Brain” – A Functional Knowledge Stack with Obsidian

Whilst I always enjoy the acquisition of knowledge, I’ve always struggled with depositing it usefully. From pen and paper notes with a 20 colour theme which lost value with each additional colour, to OneNote or iPad GoodNotes based emulations of pen and paper, it’s been a constant quest for the optimal note taking schema. Personally there are 3 key objectives I need my note taking to achieve:

  1. It must be digitally compatible and accessible from any device.
  2. It must comfortably handle math and images.
  3. It must be something I look forward to – the software needs to be aesthetically clean, lightweight with none of the chunkiness of Microsoft apps, and highly customisable.

For me the solution to this was Obsidian, the perhaps more cultified sibling to Notion. Obsidian is a note taking application that uses markdown with a surprising amount of flexibility, including the ability to partner it with an LLM, which I’ll explore in this blog alongside my vault organisation do or dies and favourite customisations.

Continue reading

New DPhil/PhD Programme in Pharmaceutical Science Joint with GSK!

Many OPIGlets found their way into a DPhil in Protein Informatics through our Systems Approaches to Biomedical Sciences Industrial Doctoral Landscape Award, which was open to applicants 2009-2024. This innovative course, based at the MPLS Doctoral Training Centre (DTC), offered six months of intensive taught modules prior to starting PhD-level research, allowing students to upskill across a diverse range of subjects (coding, mathematics, structural biology, etc.) and to go on to do research in areas significantly distinct from their formal Undergraduate training. All projects also benefited from direct co-supervision from researchers working in the Pharmaceutical industry, ensuring that DPhil projects sat in areas with potential for translation to drug discovery. Regrettably, having twice successfully applied for renewal of funding, we were unsuccessful in our bid to re-fund SABS in 2024.

Happily though, we can now formally announce that our bid for a direct successor to SABS, the Transformative Technologies in Pharmaceutical Sciences IDLA, has been backed by the BBSRC, and we will shortly be opening for applications for entry this October [2026]. As someone who benefited from the interdisciplinary training and industry-adjacency of SABS, I’m thrilled to be a co-director of this new Programme and to help deliver this course to a new generation of talented students.

Continue reading

Democratising the Dark Arts: Writing Triton Kernels with Claude

Why would you ever want to leave the warm, fuzzy embrace of torch.nn? It works, it’s differentiable, and it rarely causes your entire Python session to segfault without a stack trace.

The answer usually comes down to the “Memory Wall.” Modern deep learning is often less bound by how fast your GPU can do math (FLOPS) and more bound by how fast it can move data around (Memory Bandwidth). When you write a sequence of simple PyTorch operations, something like x = x * 2 + y, the GPU often reads x from memory, multiplies it, writes it back, reads it again to add y, and writes it back again. It’s the computational equivalent of making five separate trips to the grocery store because you forgot the eggs, then the milk, then the bread. Writing a custom kernel lets you “fuse” these operations. You load the data once, perform a dozen mathematical operations on it while it sits in the ultra-fast chip registers, and write it back once. The performance gains can be massive (often 2x-10x for specific layers).

But traditionally, the “cost” of accessing those gains (learning C++, understanding warp divergence, and manual memory management) was just too high for most researchers. That equation is finally changing.
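To make the memory-traffic argument concrete, here is a minimal pure-Python sketch that counts element loads and stores for x = x * 2 + y run as two separate passes versus one fused pass. It is only a toy model: the CountingMemory class and the unfused, fused and run helpers are hypothetical names invented for this illustration, not PyTorch or Triton APIs, and a real GPU adds caching and vectorised loads that this ignores.

```python
class CountingMemory:
    """Toy stand-in for GPU global memory that counts every access."""

    def __init__(self, **arrays):
        self.arrays = {name: list(values) for name, values in arrays.items()}
        self.reads = 0
        self.writes = 0

    def load(self, name, i):
        self.reads += 1
        return self.arrays[name][i]

    def store(self, name, i, value):
        self.writes += 1
        self.arrays[name][i] = value


def unfused(mem, n):
    # "Kernel" 1: x = x * 2 (read x, write x)
    for i in range(n):
        mem.store("x", i, mem.load("x", i) * 2)
    # "Kernel" 2: x = x + y (read x again, read y, write x again)
    for i in range(n):
        mem.store("x", i, mem.load("x", i) + mem.load("y", i))


def fused(mem, n):
    # One pass: each input is loaded once and the intermediate value
    # lives in a local variable, playing the role of a register.
    for i in range(n):
        xi = mem.load("x", i)
        yi = mem.load("y", i)
        mem.store("x", i, xi * 2 + yi)


def run(kernel, x, y):
    """Run a kernel on fresh memory; return the result and total traffic."""
    mem = CountingMemory(x=x, y=y)
    kernel(mem, len(x))
    return mem.arrays["x"], mem.reads + mem.writes


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [10, 20, 30, 40]
    print(run(unfused, x, y))  # five memory accesses per element
    print(run(fused, x, y))    # three memory accesses per element
```

Both versions compute the same values, but the unfused one touches memory five times per element and the fused one three times. In a bandwidth-bound regime that ratio, rather than the arithmetic, is what drives the speed-up.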

Continue reading

What Molecular ML Can Learn from the Vision Community’s Representation Revolution

Something remarkable happened in computer vision in 2025: the fields of generative modeling and representation learning, which had developed largely independently, suddenly converged. Diffusion models started leveraging pretrained vision encoders like DINOv2 to dramatically accelerate training. Researchers discovered that aligning generative models to pretrained representations doesn’t just speed things up—it often produces better results.

As someone who works on generative models for (among other things) molecules and proteins, I’ve been watching this unfold with great interest. Could we do the same thing for molecular ML? We now have foundation models like MACE that learn powerful atomic representations. Could aligning molecular generative models to these representations provide similar benefits?

In this post, I’ll summarize what happened in vision (organized into four “phases”), and then discuss what I think are the key lessons for molecular machine learning. The punchline: many of these ideas are already starting to appear in our field, but we’re still in the early stages compared to vision.

For a more detailed treatment of the vision developments with full references and figures, see the extended blog post on my website.

Continue reading

Scientific Acceleration with Agentic Coding in 2026

In the past month we have surpassed a critical threshold with the capabilities of agentic coding models. What previously sounded like science fiction has now become reality, and I don’t believe any of us are ready for what is to come. In this blog post I share a summary of the breakthrough I am referring to, I give an insight into how I use agents to accelerate my research, and I make some predictions for the year. With pride, I can say this entire blog post was 100% written by me without any support from ChatGPT (except spell checking and the image below).

Continue reading

Can we make Boltz predict allosteric binding?

Orthosteric vs Allosteric binding (Nano Banana generated)

(While this post is meant to shed light on the problem of making AI structure prediction models like Boltz become better for allosteric binding, it is also an open call for collaborating on this problem.)

I recently took part in a Boltz hackathon organised by the MIT Jameel Clinic. I worked on improving Boltz 2 predictions for allosteric binders. The validation dataset provided was from a recent paper, Co-folding, the future of docking – prediction of allosteric and orthosteric ligands, which benchmarks some of the recent state-of-the-art AI structure prediction models on a curated set of allosteric and orthosteric binders. Generally, all AI structure prediction models are trained mostly on orthosteric binding cases, which means that their performance on allosteric binding is significantly worse.

Continue reading

Agentic AI

Agents have burst onto the scene in the last year. Agentic AI refers to AI systems that can pursue a goal, make decisions, take actions, and then adapt based on the results. 

Unlike traditional AI models that mostly answer questions or classify information, an agentic system can: 

Continue reading

Chemical Languages in Machine Learning

For more than a century, chemists have been trying to squeeze the beautifully messy, quantum-smeared reality of molecules into tidy digital boxes, “formats” such as line notations, connection tables, coordinate files, or even the vaguely hieroglyphic Wiswesser Line Notation. These formats weren’t designed for machine learning; some weren’t even designed for computers. And yet, they’ve become wedged into the backbones of modern drug discovery, materials design and computational chemistry.

The emergent use of large language models and natural language processing in chemistry poses an immediate question: What does it mean for a molecule to have a “language,” and how should machines speak it?

If molecules are akin to words and sentences, what alphabet and grammatical rules should they follow?

What follows is a tour through the evolving world of chemical languages, why we use them, why our old representations keep breaking our shiny new models, and what might replace them.

Continue reading

An Introduction to the Basics of Reinforcement Learning

Reinforcement learning (RL) is pretty simple in theory – “take actions, get rewards, increase likelihood of high reward actions”. However, we can quickly run into subtle problems that don’t show up in standard supervised learning. The aim of this post is to give a gentle, concrete introduction to what RL actually is, why we might want to use it instead of (or alongside) supervised learning, and some of the headaches (figure 1) that come with it: sparse rewards, credit assignment, and reward shaping.

Figure 1: I’d like to help take you from confusion/headache 🙁 (left) to having at least some clarity 🙂 (right) with regard to what reinforcement learning is and where it’s useful

Rather than starting with Atari or robot arms, we’ll work through a small toy environment: a paddle catching falling balls. It’s simple enough to understand visually, but rich enough to show how different reward designs can lead to completely different behaviours, even when the underlying environment and objective are the same. Along the way, we’ll connect the code to the standard RL formalism (MDPs, returns, policy gradients), so you can see how the equations map onto something you can actually run.
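As a small taste of how the formalism maps onto code, the “return” from time step t is the discounted sum of future rewards, G_t = r_t + γ·G_{t+1}. The sketch below computes it for a made-up sparse reward sequence in which only the final step is rewarded (think of the ball being caught at the very end). The function name, the reward values and the choice of γ are assumptions for illustration, not code from the post’s paddle environment.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical episode: no reward until the final step.
    sparse_rewards = [0.0, 0.0, 0.0, 1.0]
    print(discounted_returns(sparse_rewards, gamma=0.5))
    # [0.125, 0.25, 0.5, 1.0]
```

Even though only the last action was rewarded, every earlier step receives some (discounted) credit. This is the simplest answer to the credit assignment problem, and it hints at why sparse rewards make learning slow: the signal reaching early actions is small.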

Continue reading

Dispatches from Lisbon

Tiles, tiles, as far as the eye can see. Conquerors on horseback storming into the breach; proud merchant ships cresting ocean waves; pious monks and shepherds tending to their flocks; Christ bearing the cross to Calvary—in intricate tones of blue and white on tin-glazed ceramic tilework. Vedi Napoli e poi muori the Sage of Weimar once wrote—to see Naples and die. But had he been to Lisbon?

The azulejos of the city’s numerous magnificent monasteries are far from the only thing for the weary PhD student to admire. Lisbon has no shortage of imposing bridges and striking towers, historically fraught monuments and charming art galleries. Crumbling old castles and revitalised industrial quarters butt up against the Airbnbs-and-expats district, somewhere between property speculation and the sea. An endearing flock of Magellanic penguins paddles away an afternoon in their enclosure at the local aquarium (which is excellent), and an alarming proliferation of custard-based pastries invites one to indulge.

Continue reading