Monthly Archives: August 2025

ISMB/ECCB conference feedback 

The ISMB/ECCB conference took place in Liverpool this year, so a couple of OPIGlets took the train up north to attend this biennial joint conference. Here we give some general feedback on the conference and highlight some interesting talks and posters.

General feedback 

ISMB/ECCB is a 4.5-day conference starting on the Sunday evening and running until Thursday evening. It is attended by around 2500 people, mostly from academic groups around the world. With more than 20 different tracks, it is a broad conference with many sessions running in parallel. As always, it is worth looking at the schedule beforehand so as not to get overwhelmed. Each day there is one keynote, two poster sessions, and three blocks of talks. These talks are often given by PIs, but postdocs and PhD students also get the opportunity to present. There are also some shorter slots for highlighting posters that are presented that day.

This year there was a very interesting line-up of Distinguished Keynote speakers. The conference was kicked off by John Jumper talking about AlphaFold2, with a focus on how the team tackled the various problems they met in going from the initial AlphaFold model to AlphaFold2. On Monday Prof. Amos Bairoch talked about biocuration and the importance and challenges of public databases. He discussed the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for data management [1]. The next keynote was by Prof. James Zou about computational biology in the age of AI agents (more on this later). On Wednesday we had our own Prof. Charlotte Deane (woo!) talking about structure-based drug discovery with a focus on the importance of baselines and benchmarking. The conference closed with a short interview with Prof. David Baker, followed by a talk from Prof. Fabian Theis on decoding cellular systems. He discussed Cellflow [2], an AI tool that predicts how perturbations such as drugs affect the cellular phenotype.

Continue reading

How reliable are affinity datasets in practice?

The Data Bottleneck in AI-Powered Drug Discovery

The pharmaceutical industry is undergoing a profound transformation, driven by the promise of Artificial Intelligence (AI) and Machine Learning (ML). These technologies offer the potential to overcome the industry’s persistent challenges of high costs, protracted development timelines, and staggering failure rates. From accelerating the identification of novel biological targets to optimizing the properties of lead compounds, AI is poised to enhance the precision and efficiency of drug discovery at nearly every stage.

Yet this revolutionary potential is constrained by a fundamental dependency. The power of modern AI, particularly the deep learning (DL) models that excel at complex pattern recognition, depends heavily on the volume, diversity, and quality of the data they are trained on. This creates a critical bottleneck: the high-quality experimental data required to train these models (specifically, the protein-ligand binding affinity values that quantify the strength of an interaction) are notoriously scarce, expensive to generate, and often of inconsistent quality or locked within proprietary databases.

Continue reading

Conference feedback: Protein Society Annual Symposium

Recently, a couple of OPIG members had the opportunity to attend and present at the 39th Annual Symposium of the Protein Society—a not-for-profit scholarly society founded in 1985 that focuses on protein structure, function, and design—held in San Francisco.

The PS39 schedule was well designed, offering a balance between plenary talks, themed parallel sessions, and networking opportunities. A wide range of topics was covered, including transient protein states, supramolecular assemblies, proteostasis, and circadian clocks. This allowed us to follow areas of personal interest, both related and unrelated to our research, while exploring unfamiliar fields. Although many talks were biology-heavy, they were generally pitched at an accessible level for those from other disciplines (i.e. the small molecules side of OPIG). Presentations almost always included results from both in silico and experimental approaches, with relatively few focusing exclusively on one or the other; a very nifty thing to see as people who mostly just dream of experimental validation! In contrast to our focus on generalisable models, many of the researchers presenting had dedicated years to studying a single protein or system, uncovering its nuances in a way that made for some neat storytelling.

Continue reading

GPT-5 achieves state-of-the-art chemical intelligence

I have run ChemIQ (our chemical reasoning benchmark) on GPT-5. The model achieves state-of-the-art performance with substantial improvements in the ability to interpret SMILES strings. Read my analysis and initial findings below. Scroll to the end for some cool demos.

Figure 1: Success rates for each model on the ChemIQ reasoning benchmark. Horizontal brackets between adjacent bars indicate the result of a two-tailed McNemar’s test comparing paired outcomes for the same questions. Significance levels are shown as: n.s. (not significant, p ≥ 0.05), * (p < 0.05), ** (p < 0.01), and *** (p < 0.001).
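
For readers unfamiliar with McNemar's test, here is a minimal sketch of how a two-tailed comparison of paired per-question outcomes could be computed. It assumes the exact (binomial) variant of the test; the post does not say which variant was used, and the function names `mcnemar_exact` and `significance_label` are hypothetical, chosen only for illustration. The labels follow the thresholds given in the caption.

```python
from math import comb


def mcnemar_exact(outcomes_a, outcomes_b):
    """Two-tailed exact McNemar's test on paired boolean outcomes.

    outcomes_a[i] and outcomes_b[i] say whether model A and model B
    answered question i correctly. Only discordant pairs (one model
    right, the other wrong) carry information for the test.
    """
    if len(outcomes_a) != len(outcomes_b):
        raise ValueError("paired outcomes must have equal length")

    # b: A correct, B wrong; c: A wrong, B correct.
    b = sum(1 for x, y in zip(outcomes_a, outcomes_b) if x and not y)
    c = sum(1 for x, y in zip(outcomes_a, outcomes_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference

    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5). Double the tail
    # probability for a two-tailed test and cap it at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def significance_label(p):
    """Map a p-value to the labels used in the figure caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."
```

Because the test only looks at questions where the two models disagree, it suits benchmark comparisons where both models are run on the same question set.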

Continue reading

Taming the Trajectory Beast: A Simpler Way to Sample Your MD Simulations

If you’ve ever run a molecular dynamics (MD) simulation, you know the feeling. You spend days, weeks, or even months of precious compute time watching your favourite molecule wiggle and jiggle. The result? A trajectory file bursting with thousands, or even millions, of frames. It’s a treasure trove of data, but it’s also a monster…

Analysing every single frame is often impossible and, let’s be honest, usually pointless. Many adjacent frames are nearly identical. What we really want are the key representative structures that capture the important shapes, or conformations, your molecule adopted. So, how do we find them?

Continue reading