GPT-5 achieves state-of-the-art chemical intelligence

I have run ChemIQ (our chemical reasoning benchmark) on GPT-5. The model achieves state-of-the-art performance with substantial improvements in the ability to interpret SMILES strings. Read my analysis and initial findings below. Scroll to the end for some cool demos.

Figure 1: Success rates for each model on the ChemIQ reasoning benchmark. Horizontal brackets between adjacent bars indicate the result of a two-tailed McNemar’s test comparing paired outcomes for the same questions. Significance levels are shown as: n.s. (not significant, p ≥ 0.05), * (p < 0.05), ** (p < 0.01), and *** (p < 0.001).
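For readers who have not met McNemar’s test before, here is a minimal sketch of how such a paired comparison could be computed. It assumes the exact (binomial) form of the test; the caption does not say which variant was used, and the function names `mcnemar_exact` and `significance_label` are made up for illustration.

```python
from math import comb


def mcnemar_exact(model_a, model_b):
    """Two-tailed exact McNemar test on paired pass/fail outcomes.

    model_a and model_b are equal-length sequences of booleans, one entry
    per benchmark question (True = answered correctly).
    Returns (b, c, p_value), where b and c are the discordant counts.
    """
    if len(model_a) != len(model_b):
        raise ValueError("Outcomes must be paired: one entry per question.")
    # Only questions where the two models disagree carry information.
    b = sum(1 for x, y in zip(model_a, model_b) if x and not y)
    c = sum(1 for x, y in zip(model_a, model_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, each disagreement is a fair coin flip.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def significance_label(p):
    """Map a p-value to the labels used in the figure caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


# Toy data (invented): 5 questions both models solve, 8 only A, 1 only B.
a = [True] * 5 + [True] * 8 + [False] * 1
b = [True] * 5 + [False] * 8 + [True] * 1
print(mcnemar_exact(a, b), significance_label(mcnemar_exact(a, b)[2]))
```

The test ignores questions that both models get right or both get wrong, which is why it suits paired outcomes on the same question set.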

Taming the Trajectory Beast: A Simpler Way to Sample Your MD Simulations

If you’ve ever run a molecular dynamics (MD) simulation, you know the feeling. You spend days, weeks, or even months of precious compute time watching your favourite molecule wiggle and jiggle. The result? A trajectory file bursting with thousands, or even millions, of frames. It’s a treasure trove of data, but it’s also a monster…

Analyzing every single frame is often impossible and, let’s be honest, usually pointless. Many adjacent frames are nearly identical. What we really want are the key representative structures that capture the important shapes, or conformations, your molecule adopted. So, how do we find them?
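To make the idea of "representative structures" concrete, here is one possible sketch: a greedy selection that keeps a frame only if it differs from every frame kept so far by more than an RMSD cutoff. This is not necessarily the approach the full post takes. It assumes the frames are already aligned to a common reference and are given as plain lists of (x, y, z) tuples; `rmsd` and `pick_representatives` are hypothetical helper names.

```python
from math import sqrt


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two pre-aligned frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("Frames must contain the same number of atoms.")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for a, b in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(frame_a))


def pick_representatives(frames, cutoff):
    """Return indices of frames that are mutually more than `cutoff` apart."""
    representatives = []
    for index, frame in enumerate(frames):
        # Keep the frame only if no existing representative is close to it.
        if all(rmsd(frame, frames[r]) > cutoff for r in representatives):
            representatives.append(index)
    return representatives


# Toy one-atom "trajectory" (invented coordinates) visiting two states.
toy_frames = [
    [(0.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0)],
    [(5.0, 0.0, 0.0)],
    [(5.05, 0.0, 0.0)],
    [(0.0, 0.0, 0.0)],
]
print(pick_representatives(toy_frames, cutoff=1.0))
```

The cost grows with the number of representatives times the number of frames, so for a real trajectory you would normally lean on a dedicated MD analysis library instead.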

Antibody developability datasets

Besides binding the antigen with high affinity, antibodies for therapeutic purposes need to be developable. Developability properties include high expression, high stability, low aggregation, low immunogenicity, and low non-specificity [1]. These properties are often linked, so optimising for one property might come at the expense of another. Machine learning methods have been built to guide the optimisation of one or multiple developability properties.

Performance of these methods is often limited by the amount and type of data available for training. These datasets contain experimentally determined scores from biophysical assays related to developability. Some common experimental assays are described in a previous blog post by Matthew Raybould [2]. Here I will discuss some commonly used and some new datasets related to antibody developability. This list is not exhaustive but might help you start understanding more about antibody developability.

Publishing 101

Scientists pride themselves on clear, logical and concise communication. So naturally, the process for publishing our research involves an absurd number of formalities, like coming up with 700 slightly different ways to ‘thank the reviewer for their insightful comment’. Nevertheless, I’m told this is all a necessary part of spreading your beautiful researcher butterfly wings, and frankly, I’m enough years into my DPhil to stop questioning every quirk of academia. However, the current protocol for new researchers wanting to learn the moves to this bizarre dance seems to be begging postdocs and old-timers for examples of cover letters, marked-up manuscripts, and reviewer responses. To attempt to save everyone some time, I thought I’d provide some guidance and templates here.

A more robust way to split data for protein-ligand tasks?

While reading the PLINDER dataset paper in preparation for my next project, one aspect that caught my attention was how the dataset splits were designed to minimise leakage across the various protein-ligand tasks PLINDER could be used for. The splits are task-specific because the notion of data leakage differs from task to task. For instance, in rigid body docking, having a similar protein in the train and test sets may not be considered leakage if the binding pocket location, conformation, or pocket interactions with a ligand are significantly different. In co-folding, on the other hand, having similar proteins in the train and test sets would be considered data leakage, as predicted protein structures play a significant role in accuracy scoring. The effort that went into creating task-specific splits resonates strongly with OPIG’s view on ensuring minimal data leakage when validating the generalisability of protein-ligand models. However, creating task-specific dataset splits for every task may become tedious when dealing with a large suite of protein-ligand tasks. This had me thinking about ways to streamline the dataset split process across tasks, and one way to do this is by using protein-ligand interaction fingerprints, or PLIFs.
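As a rough illustration of what a PLIF-based leakage check could look like, the sketch below flags test complexes whose interaction fingerprint is too similar to any training complex. Everything here is an assumption for illustration: each PLIF is represented as a set of (residue, interaction type) pairs in a shared residue numbering, similarity is measured with Tanimoto, and the 0.5 threshold, the identifiers, and the function names are invented. It is not PLINDER’s splitting procedure.

```python
def tanimoto(plif_a, plif_b):
    """Tanimoto similarity between two PLIFs stored as sets."""
    if not plif_a and not plif_b:
        return 1.0
    return len(plif_a & plif_b) / len(plif_a | plif_b)


def flag_leaky_test_complexes(train, test, threshold=0.5):
    """Return ids of test complexes too similar to any training complex.

    train and test map a complex id to its PLIF (a set of
    (residue, interaction_type) tuples). The threshold is an arbitrary
    illustrative value, not a recommended setting.
    """
    leaky = []
    for test_id, test_plif in test.items():
        best = max(
            (tanimoto(test_plif, train_plif) for train_plif in train.values()),
            default=0.0,
        )
        if best >= threshold:
            leaky.append(test_id)
    return leaky


# Invented toy fingerprints.
train_set = {"train_1": {("ASP86", "hbond"), ("PHE80", "pi_stacking")}}
test_set = {
    "test_1": {("ASP86", "hbond"), ("PHE80", "pi_stacking"), ("LYS33", "ionic")},
    "test_2": {("GLU12", "hbond")},
}
print(flag_leaky_test_complexes(train_set, test_set))
```

Comparing residue labels across different proteins only makes sense after mapping them to a common numbering, which is the hard part this sketch leaves out.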

GUI Slop

Previously, I wrote about writing GUIs for controlling and monitoring experiments. For ML this might be useful for tracking model learning (e.g. the popular Weights & Biases platform), while in the wet lab it is great for making experiments simpler and more reliable to run, monitor and record.

And as it turns out, AI is quite good at this!

I have been using VS Code Copilot in agent mode with Gemini 2.5 Pro to create simple GUIs that can control my experiments, which has proved pretty effective. Although there is clearly a concern when interfacing AI-generated code with real hardware (especially if you “vibe code”, that is, just run whatever it generates), in practice it has allowed me to quickly generate tools for testing purposes, cutting the time required to get a project started from hours to minutes.

As an example, I recently needed to hook up a Helmholtz coil to some custom electronics, centred around a Teensy micro-controller and designed to output a precisely controlled current.

Can AI help us design better viruses?

Viruses are the most abundant biological entities on the planet. They infect virtually every kind of life form, including (sort of) other viruses. Viruses are intensely efficient – some contain as few as 4 genes. Their strategy is typically simple: infect a cell, use its machinery to produce more viruses, and spread to other cells.

Pathogenic human viruses are terrible, but there are many other viruses which are useful for humans. For instance, many modern vaccines use viral vectors to produce antigens of other pathogenic entities. There is also growing interest in using viruses to fight off bacterial infections.

Le Tour de Farce v12.0

Bikes and pints across 5 pubs – what could be better (and what could go wrong)? The year is 2025 and the date 06.06.25. Starting from the Stats department, the customary picture was taken before the horn was blown and a flood of ~30 structural biologists was unleashed onto the streets of Oxford to raid and plunder. Despite being the new kid, I think Fergus will be proud to see my accurate version controlling, unlike past, more experienced members of the group, and I hope this reference doesn’t seem too much like copying his homework.

Of course, even though it was June, the weather was tumultuous. Having to make an educated guess on the probability of experiencing rain, I took a Bayesian approach and calculated the posterior of rain occurring given the data of the entirety of British history, which suggested that seeing sun on the BBC weather report did not in any way lower the likelihood of rain later. In light of this, everyone came aptly dressed in waterproofs, which turned out to be a smart choice after a later event of spontaneous beer spillage in which a certain individual knocked his entire pint over Sophie and proceeded to say “at least you were wearing a raincoat”. This was a fantastic play by the newest member of the group, who destroyed what little dignity (if any) he had so far amassed and simultaneously saddled himself with the responsibility of this blog post. So to Charlotte, who I know will be reading this (as I was warned!), perhaps this blog post will be an adequate first step to redemption.

And so the convoy departed towards our first stop, the Up in Arms (thanks Charlotte for the round). The inaugural table tennis tournament was held, and it was great to see a real-world application of the group’s protein folding experience with Odysseus’s portable bike.

Next stop, the Victoria (thanks Matt for the round), before the 3.5 mile cycle to The Plough (I recommend going to the toilet before this after drinking units in the metric of pints).

Being far removed from our hunter-gatherer past, we settled down on the crisp summer grass with Oxford’s famous White Rabbit pizza delivered directly to the local meadow. I hadn’t grounded myself and connected to the earth like that in months (preferring to spend my days with my quadruple-monitor workstation setup in the department), which, combined with the beautiful setting of Port Meadow, was making the trees look huggable. After scavenging 4 more pieces of pizza for a profit of 50% on my original contribution despite my intolerance to onions – whilst arguing that tolerance is a mental game aided by alcoholic bravery – we walked down the field to the river to reach our final destination – the idyllic Medley, looking over the Thames.

Reaching our last stop, it dawned on me that despite proclaiming an ambitious target of 2 pints per pub, I was sitting well below that at 3 pints total. It was clear that desperate actions were needed to raise my average to stand up to any later scrutiny. Perhaps it was this subconscious desire to complete my self-assigned quest that led me, at this last point of interest, to execute the “swill Sophie” manoeuvre. Yet, despite my insistence that by getting through 2 pint glasses this was “technically” equivalent to my 2 pints per pub target, this did not stand up to the scrutiny of Charlotte.

After a month of wrangling with HPC molecular dynamics, I’ve been getting more contact with the Slurm e-mail notification service than with real human beings, so it was refreshing to escape the GROMACS simulation that my brain has become and get to know the group better. Yet by the end of the night some of us (myself) couldn’t resist entering a tirade about how fractals and symmetry are the underlying representation of consciousness, with the source being a strong “trust me bro”, and so it seemed like a fitting time to put myself to bed.

Thanks to Eoin for organising!