SAbDab2: The structural antibody database in the age of machine learning

Henriette L. Capel, Odysseas Vavourakis, Benjamin H. Williams, Christopher R. Taylor, and Charlotte M. Deane

The Structural Antibody Database

The Structural Antibody Database (SAbDab) [1] is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for single-domain antibodies was added in 2021, with SAbDab-nano [2]. Detailed annotations and consistent maintenance have made SAbDab a central resource supporting important advances in the field. SAbDab has been used to study antibody-antigen interactions, including those involving SARS-CoV-2; to predict antibody structure; to design antibodies de novo; and to investigate antibody flexibility.

Continue reading →

Building an Agent – Practical Notes for Beginners

For the last few months, I’ve been building an agent around OPIG’s antibody analysis and design tools, and I thought I’d share some practical notes from the process.

An agent is a language model that doesn’t just answer questions but can also decide what to do, call tools, and follow workflows. I’m using Claude in these notes, but most of the ideas apply equally well to other agent frameworks.

Rather than building an agent from scratch, we’re starting with one that already comes with useful capabilities out of the box. For example, Claude Code can search files, edit code, execute commands, and run scripts. Everything below is really about adapting that behaviour to a specific domain and workflow.

How to start?

Start with the `CLAUDE.md` file. It’s a special file Claude reads at the start of every conversation, and it’s where you define the behaviour of the agent (other agents have their own equivalent — for example `AGENTS.md`). In this file, include things like bash commands, code style preferences, and workflow rules. This gives Claude a persistent context that it can’t infer from the codebase alone. Since it’s loaded every session, it sets the baseline for how the agent behaves.

Start simple – especially if it’s your first time. Define clear tools, write lightweight instructions in the markdown (md) file, and create realistic evaluations before adding complexity.

Then run a loop where the agent gathers context, takes actions, and verifies the outputs. Think about how you’ll verify them first: if you can’t tell whether a run was good, you can’t tell whether your changes helped.
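To make the "verify first" point concrete, here is a minimal sketch of an evaluation harness in Python. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, and `fake_agent` are hypothetical names, the sequences are toy fragments, and a real setup would call the actual agent instead of the stand-in function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One realistic request plus a check that decides whether the run was good."""
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and return the fraction that passed."""
    passed = 0
    for case in cases:
        output = agent(case.prompt)
        ok = case.check(output)
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case.prompt}")
    return passed / len(cases)


# Hypothetical stand-in for a real agent call, used only so the sketch runs.
def fake_agent(prompt: str) -> str:
    return "is_antibody: true" if "EVQLVESGGG" in prompt else "is_antibody: false"


cases = [
    EvalCase("Is EVQLVESGGG an antibody fragment?", lambda out: "true" in out),
    EvalCase("Is MKTAYIAKQR an antibody fragment?", lambda out: "false" in out),
]
score = run_evals(fake_agent, cases)
```

Re-running the same cases after each change to the markdown files or tools gives a rough before-and-after number, which is the point of writing the checks first.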

In research, you don’t always know how a project will evolve, so you’ll often end up making many changes along the way. But for projects that are relatively well-defined, I’ve found it’s worth spending some time upfront with pen and paper, specifying what you want the agent to do before writing it all out.

From there, most development becomes an iterative process of improving the md files and adjusting tools when needed.

What is a tool?

A tool gives the agent a capability. It executes an action and returns a result — calling an API, running code, querying a database, and so on.

The key idea is that tools are deterministic: given the same input, they produce the same output. So if I ask, “Can you check whether this is an antibody?”, the agent can reach for the same tool each time, `execute_anarci_number()`, and get the same result for the same sequence.

A tool can be an MCP server or simply a Python function. Both work; what matters is that it gives the agent a reliable way to perform a specific action.

For example, I implemented `execute_anarci_number()` as a Python function, a thin wrapper around ANARCI, that returns structured JSON output with the results and the execution status. All the tools follow the same general structure, which makes them easier for the agent to use consistently.

The signature and docstring are really all the agent needs to decide when to reach for it:

    def execute_anarci_number(sequence: str, chain_name: str = "Chain") -> dict:
        """Identify and number an antibody/TCR sequence using ANARCI.

        Returns chain type, species, numbering, and whether it's a valid antibody.

        Chain types: H=Heavy, K=Kappa light, L=Lambda light,
                     A=TCR-alpha, B=TCR-beta
        """

The function itself is simple: it runs ANARCI, parses the numbering, extracts the CDRs, and checks whether the input looks like a real, complete variable domain. Instead of returning a bare error when numbering fails, the tool returns a structured verdict the agent can reason about:

    # numbering failed → the sequence just isn't an antibody (not a tool error)
    return {
        "success": True,
        "chain_name": chain_name,
        "is_antibody": False,
        "is_tcr": False,
        "chain_type": None,
        "species": None,
        "message": "ANARCI could not number this sequence. "
                   "It is likely not an antibody or TCR variable domain.",
        "sequence_length": len(sequence),
    }

One thing I found useful is having tools return an explicit verdict, not just output, so the agent knows whether it received an answer, encountered an error, or was given an invalid input.
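As a rough illustration of the three outcomes mentioned above (an answer, an error, or invalid input), here is a minimal sketch of a wrapper that always returns an explicit verdict. It is not the actual tool: `run_tool`, the `status` field, and the `numberer` callable are hypothetical names, and the stub functions stand in for the real numbering backend.

```python
from typing import Callable, Optional

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def run_tool(sequence: str, numberer: Callable[[str], Optional[dict]]) -> dict:
    """Wrap a numbering backend so every outcome is an explicit verdict.

    `numberer` is a hypothetical callable standing in for the real backend:
    it returns a dict of results, or None when the sequence cannot be numbered.
    """
    seq = sequence.strip().upper()
    # Case 1: invalid input, so the backend is never called.
    if not seq or set(seq) - VALID_RESIDUES:
        return {"success": False, "status": "invalid_input",
                "message": "Sequence is empty or has non-amino-acid characters."}
    try:
        result = numberer(seq)
    except Exception as exc:
        # Case 2: the tool itself failed.
        return {"success": False, "status": "tool_error", "message": str(exc)}
    if result is None:
        # Case 3: the tool ran fine and the answer is "not an antibody".
        return {"success": True, "status": "ok", "is_antibody": False,
                "message": "Could not be numbered; likely not an antibody."}
    # Case 4: a real answer.
    return {"success": True, "status": "ok", "is_antibody": True, **result}


# Stubs used only to exercise the wrapper.
def stub_numberer(seq: str) -> Optional[dict]:
    return {"chain_type": "H"} if seq.startswith("EVQ") else None


def broken_numberer(seq: str) -> Optional[dict]:
    raise RuntimeError("backend not installed")


print(run_tool("EVQLVESGG", stub_numberer)["is_antibody"])
print(run_tool("EVQ123", stub_numberer)["status"])
print(run_tool("EVQLV", broken_numberer)["status"])
```

The design choice is that "not an antibody" is a successful run with a negative answer, so the agent does not retry or report a crash when the tool has in fact done its job.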

A few things that helped:

- Use the agent itself to help write the tools. It’s good at it, especially if you give Claude documentation for any software libraries, APIs, or SDKs you’re wrapping.
- Don’t forget to document the tool in the markdown workflow file so the agent knows it exists and when to use it.
- Open a fresh session and check the agent can actually call the tools correctly before building on top of them.

What is a skill?

Skills extend Claude with procedural knowledge. They teach the agent how to perform a task, not just what tools are available.

I think of tools as capabilities and skills as workflows. Tools let the agent do something; skills tell it how to approach a task. A tool might let Claude number an antibody sequence. A skill tells it how to carry out an antibody analysis workflow: which tools to use, in what order, what outputs to expect, and how to interpret the results.

Without skills, the model has to rediscover that workflow from scratch each time. Skills package it once and make it reusable.

A skill is just a folder containing a SKILL.md file (instructions plus metadata) and optional scripts or reference material. One nice advantage is portability: you can write a skill once and reuse it across different projects, environments, and even different agent frameworks.

To make it concrete, here’s one of mine: `ab-diversity-select`. After an optimization run, I’m left with dozens of candidate antibodies and need to select a small, maximally diverse subset where the retained mutations remain structurally safe. Rather than re-explaining that workflow every time, I captured it as a skill:

    ab-diversity-select/
    ├── SKILL.md                  # when to use it + the procedure
    ├── structural_pipeline.py
    ├── pipeline.py
    └── config_template.py

The SKILL.md header tells Claude when the skill is relevant:

    name: ab-diversity-select
    description: >-
      Select a structurally-validated, maximally-diverse subset of antibody
      candidates from a results CSV…

The rest of the file describes the procedure, while the accompanying scripts do the heavy lifting. When Claude encounters a task like “pick 20 diverse antibody candidates,” it can automatically apply my workflow instead of inventing a new selection strategy from scratch.

Practices that worked for me

There’s already a lot of useful information out there, for example:

anthropic.com/engineering

Claude Code best practices

A few things I’d highlight:

Keep the markdown files organized. `CLAUDE.md` is loaded every session, so only put things in it that apply broadly. For domain-specific knowledge or workflows that are only relevant sometimes, use skills instead. There’s no required format for `CLAUDE.md`; just keep it short and human-readable. Mine roughly covers: setup & environment, architecture & code map, and failure handling.

Use subagents to protect the context. Once the basic agent is working, most improvements come from managing context effectively. Subagents run in their own context with their own set of allowed tools. They’re useful for subtasks that require a lot of context, such as summarizing a paper. In practice, though, I mostly used them for tools that generate large outputs, where it becomes difficult for a single agent to process everything cleanly within one context window.

I defined small operator agents that return only compact summaries. The main agent stays focused on planning and interpretation, large tool outputs stay outside its context, and cheaper, faster models handle parsing and batch work.
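A minimal sketch of the "compact summary" idea, written as a plain Python function rather than a real subagent. The record fields (`name`, `success`, `chain_type`, `score`) and the function name are hypothetical, and the records are synthetic; the only point is that the main agent sees a few hundred characters instead of the full output.

```python
import json
from collections import Counter


def summarise_tool_output(records: list[dict], top_k: int = 3) -> str:
    """Condense a large list of per-candidate results into a short JSON summary."""
    ok = [r for r in records if r.get("success", False)]
    chain_counts = Counter(r.get("chain_type") for r in ok)
    # Keep only the top_k highest-scoring successful records.
    scored = [r for r in ok if "score" in r]
    best = sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]
    summary = {
        "n_records": len(records),
        "n_failed": len(records) - len(ok),
        "chain_types": dict(chain_counts),
        "top_candidates": [{"name": r["name"], "score": r["score"]} for r in best],
    }
    return json.dumps(summary)


# Synthetic records standing in for a large tool output.
records = [
    {"name": f"cand_{i}", "success": i % 50 != 0,
     "chain_type": "H" if i % 2 else "K", "score": i / 500}
    for i in range(500)
]
summary_text = summarise_tool_output(records)
summary = json.loads(summary_text)
print(len(json.dumps(records)), "characters in,", len(summary_text), "characters out")
```

In a real setup the operator would be a subagent with its own context and tools; the function only shows the shape of what it might hand back.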

Prompts matter — a lot. Performance changes significantly depending on the prompt. From my experience, when building longer workflows, improving the prompt often helps more than editing the markdown files.

For example, explicitly defining the expected output format and level of detail can reduce lazy behaviour and make the agent more consistent across runs.

One approach I like is building a skill that uses the built-in `AskUserQuestion` tool to interview the user up front about the information you care about, and then generates the prompt from the user’s answers in a structured way.

Use the agent to explain its own failures. The agent is actually pretty good at explaining where it failed and why. Use it to help debug and improve itself. Ask it what went wrong, have it suggest edits to the markdown files, or ask what it learned during the session. Some of my best improvements came from just asking the agent why a run failed.

A few bio-specific lessons

First, watch the jargon and define your terms. “Diverse” might mean sequence distance, V-gene spread, or structural diversity. Say exactly what you mean, or define it explicitly in your workflow files.
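As one example of pinning the term down, the sketch below defines "diverse" as Hamming distance between aligned, equal-length sequences and picks a subset with a greedy max-min heuristic. This is an assumption for illustration only, not the selection procedure used in my skill; the function names and toy sequences are hypothetical, and the greedy approach does not guarantee the globally most diverse subset.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def select_diverse(seqs: list[str], k: int) -> list[str]:
    """Greedy max-min selection: keep adding the sequence farthest from the picks so far."""
    if k <= 0 or not seqs:
        return []
    chosen = [seqs[0]]  # seed with the first sequence (an arbitrary choice)
    remaining = list(seqs[1:])
    while remaining and len(chosen) < k:
        # A candidate's distance to the chosen set is its distance
        # to the nearest already-chosen sequence.
        best = max(remaining, key=lambda s: min(hamming(s, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


candidates = ["AAAA", "AAAT", "TTTT", "AATT"]  # toy sequences
picked = select_diverse(candidates, 3)
print(picked)  # ['AAAA', 'TTTT', 'AATT']
```

Swapping `hamming` for a V-gene or structural distance would change which candidates come out, which is why the definition needs to be written down rather than left to the agent.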

Second, the agent will always give you an answer, so make sure it is grounded in tools rather than invented. A language model can easily produce a confident, plausible-looking sequence or numbering out of thin air. If you do not explicitly tell the agent to use the available tools, it may continue without them, even when they exist.

Finally, keep a human in the loop. Read the logs yourself, understand what happened, and do not trust a clean-looking summary on its own. Ask the agent to explain each step and justify its decisions — that is often the fastest way to catch a wrong assumption before it ends up in your results.

Agents are surprisingly capable, but I still found it challenging to get them to reliably execute long workflows without intervention. In practice, I had the most success when treating the agent as a collaborator rather than a fully autonomous system, giving it clear tools, workflows, and checkpoints along the way.

Building agents is still a fast-moving area, and there are many ways to approach it. It can feel confusing at first, but once you start experimenting and building real projects, things become much clearer. My advice would be to start simple, build something useful, and learn by doing.

References:
1. https://code.claude.com/
2. https://code.claude.com/docs/en/agent-sdk/modifying-system-prompts
3. https://youtu.be/TqC1qOfiVcQ?si=K24t3oxuHgYWs375
4. https://www.aiwithamitay.com/p/skills

Networks beyond proteins: a Lake Como summer school

My DPhil uses network representations of protein complexes to predict drug targets, so when a summer school on complex networks came up, I wanted to see what tools and ideas from the broader field I might be missing. The Lake Como School on Complex Networks brought together students and postdocs from universities around the world to discuss recent applications and future possibilities using networks. This was the school’s 10-year anniversary, so we were honoured to have many of the lectures given by founding members of the society.

Continue reading →

How Unusual Is Your Generated Molecule? Let The CCDC Tell You

In this post I’ll walk through how to set up the CCDC Python API and use the CSD Geometry Analyser to evaluate the geometric quality of molecules from three representative structure-based de novo design models. I’ve put together a small GitHub repo with the full analysis code where we look at bond lengths, angles, torsions, and ring conformations across the three methods, and compare these against their PoseBusters validity scores to see what each metric is really capturing.

Continue reading →

Peering Inside the Black Box: A Beginner’s Introduction to Mechanistic Interpretability

Over the last few years, large language models (LLMs) have gone from being curiosities tucked away in research labs to something most of us interact with on a daily basis, whether for drafting emails, debugging code, or simply pondering the meaning of life at 2am. And yet, for all our reliance on these systems, a rather inconvenient truth lingers in the background: nobody, not even the people who built them, can fully explain what is going on inside.

This is where mechanistic interpretability comes in.

In essence, mechanistic interpretability is the approach of explaining complex machine learning systems through the behaviour of their functional units (Kästner and Crook, 2024) by reverse-engineering them into their more elementary computations (Rai et al., 2025). The aim is not simply to know that a model gives the right answer, but to pull apart the underlying machinery and uncover the causal relationships between input and output. Think of it as neuroscience for neural networks, except we can read every neuron at any moment, rewind, replay, and intervene mid-thought.

Continue reading →

A timeline of sampling methods of diffusion models

When approaching the methods used in de novo protein design, one is quickly confronted with a plethora of overlapping formulations of what looks superficially like “the same thing”. One paper trains an $\boldsymbol{\epsilon}$-prediction network with a simple MSE loss; another trains a score network with a stochastic-differential-equation justification; a third trains a clean-data predictor under yet another schedule. Each formulation carries its own notation, its own variance schedule, and its own sampler. Qualitatively, this zoo of formulations is doing the same thing: it starts from some unstructured noise and iteratively refines it to eventually produce a protein structure similar (but different!) to other proteins we have experimentally determined in the past. What is not immediately obvious to a newcomer is that all of these formulations are historical descendants of a small number of foundational ideas, and that essentially every architectural and algorithmic decision in a modern protein-design diffusion model has a specific paper of origin and a specific motivation for being there.

This post is my attempt to put these formulations onto a single timeline. I trace the trajectory of the field through four foundational works: DDPM (Ho et al., 2020), DDIM (Song et al., 2021a), the score-based SDE unification (Song et al., 2021b), and EDM (Karras et al., 2022), explaining at each step what specific problem with the previous formulation the next paper was attacking and how the new formulation generalises or simplifies the old one. The goal is coherent motivation rather than exhaustive coverage; the reader interested in implementation details is referred to the original papers and the references at the end.

Continue reading →

Spin Lattices and Proteins – How state-based discretisations have enabled modern protein modelling

I got into protein modelling not long before AlphaFold2 was first released. At that time, some of the prevailing methods for protein structure prediction came from highly interpretable energy functionals that arose from a particularly beautiful intersection of statistical mechanics and biology. These “Potts” models are going to be the centre of a larger discussion in this blog on state-based discretisations of proteins, how they’ve shaped modern deep learning methods, and whether there is still more to learn from them.

In the age of black box deep learning, does the Potts model still have a place?

The Potts/Ising Model

The Ising model is a well-established and popular theoretical physics model of ferromagnetism. Simply put, given a lattice of atoms each capable of adopting one of two spins (up or down), ferromagnetism arises when their spins align and their associated magnetic moments point in the same direction. The Ising model tries to parameterise the local and non-local relationships between atoms and their spin states such that we can learn the Hamiltonian of the system and its different configurations under the magnetic field. For a system of $N$ atoms, the Hamiltonian takes the following form:

$$
E = -\sum_{i}^{N} h_i x_i - \sum_{i<j}^{N} J_{ij} x_i x_j,
$$

where $J_{ij}$ is the “coupling energy” between any two atoms with spins $x_i$ and $x_j$, and $h_i$ represents the magnetic field or, more appropriately for our purposes, a single-site field dictating how an individual atom acts independently within the model. You might recognise the form this binary spin model takes, as it arises naturally across the sciences, including in Hopfield networks and graphical models.
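To make the Hamiltonian concrete, here is a small Python sketch that evaluates it for a given spin configuration. It assumes the common encoding of up and down spins as $+1$ and $-1$ and uses made-up fields and couplings for a three-atom toy system; the function name is hypothetical.

```python
def ising_energy(x: list[int], h: list[float], J: list[list[float]]) -> float:
    """E = -sum_i h_i x_i - sum_{i<j} J_ij x_i x_j, with spins x_i in {-1, +1}."""
    n = len(x)
    field_term = sum(h[i] * x[i] for i in range(n))
    # Only pairs with i < j are counted, so each coupling appears once.
    coupling_term = sum(
        J[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
    )
    return -field_term - coupling_term


# Toy system: no single-site field, uniform positive (ferromagnetic) couplings.
h = [0.0, 0.0, 0.0]
J = [[0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]

aligned = ising_energy([1, 1, 1], h, J)   # all spins up
mixed = ising_energy([1, -1, 1], h, J)    # one spin flipped
print(aligned, mixed)  # -3.0 1.0
```

With positive couplings the aligned configuration has the lower energy, which matches the picture of ferromagnetism arising when spins align.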

Everything is an Ising-like model if you’re brave enough

Continue reading →

Will TurboQuant save us from the RAM apocalypse?

The LLM boom is causing a global shortage of the very same computer memory it needs to sustain itself. Reports suggest OpenAI’s Stargate project alone could consume up to 40% of global DRAM output. Frontier labs like Google DeepMind need to make their models more memory-efficient.

One such technique is TurboQuant, released by Google. TurboQuant is an example of an online “quantisation” method. LLMs represent information using large tensors of numerical values, where each number typically uses 32 or 16 bits. However, many values do not require full numerical precision, so we can “round” them using fewer bits and less memory. We can see this in the example below:

Figure: The rounded value now requires 4x less memory.

Some quantisation methods are applied offline before inference begins. TurboQuant is ‘online’ because it compresses the KV cache dynamically during inference.
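The basic rounding idea can be sketched in a few lines of Python. This is plain absmax rounding to 8-bit integers, chosen here only for illustration; it is not TurboQuant's algorithm, and the function names and example values are made up.

```python
from array import array


def quantise_int8(values: list[float]) -> tuple[array, float]:
    """Absmax quantisation: map floats onto the integer range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = array("b", (round(v / scale) for v in values))  # "b" = signed 8-bit
    return codes, scale


def dequantise(codes: array, scale: float) -> list[float]:
    """Recover approximate floats from the 8-bit codes."""
    return [code * scale for code in codes]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # made-up example values
full = array("f", weights)                 # "f" = 32-bit float
codes, scale = quantise_int8(weights)
approx = dequantise(codes, scale)

bytes_full = full.itemsize * len(full)
bytes_q = codes.itemsize * len(codes)
max_error = max(abs(a - b) for a, b in zip(weights, approx))
print(bytes_full, "bytes ->", bytes_q, "bytes; max rounding error", max_error)
```

Going from 32-bit floats to 8-bit integers gives the 4x saving, at the cost of a rounding error of at most half a quantisation step per value. An online method applies this kind of compression to values as they are produced during inference rather than once beforehand.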

Continue reading →

The Open Immune Window: Notes on Sweaty Workouts and Vanishing Immune Cells

Here is a question for you: is an intense, sweaty workout in the gym building up your immune health, or is it just opening a window of opportunity for a pathogen to ruin your week? To understand this, we first have to look at energy. The immune system is incredibly energy-hungry, constantly patrolling and repairing the body. When you exercise hard, your body is forced into a rapid game of resource allocation, diverting precious energy away from baseline functions to fuel your contracting muscles.

This brings us to a rather scary observation in sports science that I stumbled on one day while reading random headlines. If you draw blood one to two hours after a hard run or heavy exertion, your immune cell count (specifically lymphocytes) absolutely plummets. Apparently, for decades scientists looked at this massive drop in the blood and concluded that our immune system temporarily crashed after exercise, leaving an “open window” of 3 to 72 hours where we were highly vulnerable to infections. Which leads us back to the main question – is a hard workout actually making you sick?

Thankfully, no. It turns out those missing immune cells didn’t just die off. Driven by the acute spike in adrenaline from your workout, those cells rapidly exit your bloodstream and migrate directly into peripheral tissues, specifically mucosal barriers like your lungs and gut. Think about it: during a hard workout, you are hyperventilating and exposing your airway to massive amounts of external air. Your body isn’t suppressing its defenses; it’s actively deploying its best troops exactly where a pathogen is most likely to enter. It is a state of heightened immune surveillance, not suppression.

So why do athletes often get the sniffles after a big race? Often, it is just non-infectious airway inflammation from heavy breathing, combined with the psychological stress and lack of sleep that accompany big events. Your workout actually acts as a natural immune adjuvant, making you more resilient. If you want to dive deeper into this topic, I highly recommend checking out the paper Debunking the Myth of Exercise-Induced Immune Suppression by Campbell and Turner (Frontiers in Immunology, 2018).

Revealing Nature’s Quantum Compass – Kickoff Day

Yesterday marked the kickoff for “Revealing Nature’s Quantum Compass”¹, a project funded under the BBSRC’s Strategic Longer and Larger (sLoLa) scheme. The sLoLa grants are a laudable endeavor by the UK government to fund “ambitious research projects that will deepen our understanding of life’s most fundamental processes”. It is wonderful to see the UK government taking seriously the importance of blue-sky basic research and appreciating that asking deep questions is what drives scientific progress, often leading to unexpected breakthroughs with applications down the line.

At the kickoff event, principal investigators presented what their research can bring to the table. Much like entering a bakery² where everything smells delicious and it seems impossible to choose, the day offered an overwhelming range of experimental and computational techniques, each bringing its own approach to the outstanding problem: mechanistically, how is it that birds (and other animals) can navigate distances of up to thousands of kilometers using the Earth’s magnetic field? Alongside this, my own group is interested in how we can develop biotechnologies that take advantage of magnetic-field-sensitive biochemistry, which has a host of applications in the near and long term.

The challenge of linking the biochemistry of a single protein known to be magnetic field sensitive to a behavioral phenotype will require a highly interdisciplinary approach, and excitingly for this community, machine learning is being involved from the start. Prof. Degiacomi, a member of the core team, presented how his lab is developing ML techniques to reduce the computational burden of linking experimental results to protein dynamics informed by molecular dynamics simulation. On the flip side, I hope such techniques will develop into methods we can use for design. Similar to enzymes, the proteins we are interested in have a function that depends on mechanisms far more complex than structure and binding alone (not to trivialize either of these!). Magnetic field sensing in this context depends on creating an environment in which quantum entanglement can exist, and on being able to transduce the state of this quantum entanglement into a biological signal; thus far, this second step in particular has remained highly elusive.

Ultimately, the day concluded with much enthusiasm and excitement for all that is to come. Watch this space!

¹ https://www.ox.ac.uk/news/2025-11-19-new-project-aims-reveal-nature-s-quantum-compass
² Yes, I just returned from a symposium in Germany

Oxford Protein Informatics Group

or "OPIG" to friends
