
Spin Lattices and Proteins – How state-based discretisations have enabled modern protein modelling

I got into protein modelling not long before AlphaFold2 was first released. At that time, some of the prevailing methods for protein structure prediction came from highly interpretable energy functionals that arose from a particularly beautiful intersection of statistical mechanics and biology. These “Potts” models are going to be the centre of a larger discussion in this blog on state-based discretisations of proteins, how they’ve shaped modern deep learning methods and whether there is still more to learn from them.

In the age of black box deep learning, does the Potts model still have a place?

The Potts/Ising Model

The Ising model is a well-established and popular theoretical physics model of ferromagnetism. Simply put, given a lattice of atoms, each capable of adopting one of two spins (up or down), ferromagnetism arises when their spins align and their associated magnetic moments point in the same direction. The Ising model tries to parameterise the local and non-local relationships between atoms and their spin states such that we can learn the Hamiltonian of the system, and hence the energy of its different configurations under the magnetic field. The Hamiltonian takes the following form for a system of N atoms:


$$
E = -\sum_{i}^{N} h_i x_i - \sum_{i<j}^{N} J_{ij} x_i x_j,
$$

where x_i is the spin state of atom i, J_{ij} is the “coupling energy” between any two atoms i and j, and h_i represents the magnetic field, or, more appropriately for our purposes, a single-site field dictating how an individual atom acts independently within the model. You might recognise the form this binary spin model takes, as it arises naturally across the sciences, including in Hopfield networks and graphical models.
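To make the energy function concrete, here is a minimal sketch that evaluates it for a tiny system. The function name `ising_energy`, the choice of encoding spins as +1/-1, and the three-atom chain with its field and coupling values are all illustrative assumptions, not something taken from a particular library or dataset.

```python
def ising_energy(spins, h, J):
    """Energy E = -sum_i h_i x_i - sum_{i<j} J_ij x_i x_j.

    spins: list of +1/-1 spin states (assumed encoding of up/down)
    h:     list of single-site fields, one per atom
    J:     square matrix (list of lists) of couplings; only i < j is read
    """
    n = len(spins)
    # Single-site term: how each atom behaves independently.
    field_term = sum(h[i] * spins[i] for i in range(n))
    # Pairwise term: each unordered pair (i, j) is counted once.
    coupling_term = sum(
        J[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return -field_term - coupling_term


# Hypothetical 3-atom chain: neighbours are coupled, the two ends are not.
h = [0.5, 0.5, 0.5]
J = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]

aligned = ising_energy([1, 1, 1], h, J)   # all spins up
mixed = ising_energy([1, -1, 1], h, J)    # middle spin flipped
print(aligned, mixed)
```

With positive couplings, the aligned configuration comes out lower in energy than the one with a flipped spin, which is the ferromagnetic behaviour described above.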

Everything is an Ising-like model if you’re brave enough


The Open Immune Window: Notes on Sweaty Workouts and Vanishing Immune Cells

Here is a question for you: is an intense, sweaty workout in the gym building up your immune health, or is it just opening a window of opportunity for a pathogen to ruin your week? To understand this, we first have to look at energy. The immune system is incredibly energy-hungry, constantly patrolling and repairing the body. When you exercise hard, your body is forced into a rapid game of resource allocation, diverting precious energy away from baseline functions to fuel your contracting muscles.

This brings us to a rather scary observation in sports science that I stumbled on one day reading random headlines. If you draw blood one to two hours after a hard run or heavy exertion, your immune cell count (specifically lymphocytes) absolutely plummets. Apparently for decades, scientists looked at this massive drop in the blood and concluded that our immune system temporarily crashed after exercise, leaving an “open window” of 3 to 72 hours where we were highly vulnerable to infections. Which leads us back to the main question – is a hard workout actually making you sick?

Thankfully, no. It turns out those missing immune cells didn’t just die off. Driven by the acute spike in adrenaline from your workout, those cells rapidly exit your bloodstream and migrate directly into peripheral tissues, specifically mucosal barriers like your lungs and gut. Think about it: during a hard workout, you are hyperventilating and exposing your airway to massive amounts of external air. Your body isn’t suppressing its defenses; it’s actively deploying its best troops exactly where a pathogen is most likely to enter. It is a state of heightened immune surveillance, not suppression.

So why do athletes often get the sniffles after a big race? Often, it is just non-infectious airway inflammation from heavy breathing, combined with the psychological stress and lack of sleep that accompany big events. Your workout actually acts as a natural immune adjuvant, making you more resilient. If you want to dive deeper into this topic, I highly recommend checking out the paper “Debunking the Myth of Exercise-Induced Immune Suppression” by Campbell and Turner (Frontiers in Immunology, 2018).

Revealing Nature’s Quantum Compass – Kickoff Day

Yesterday marked the kickoff for the BBSRC-funded Strategic Longer and Larger (sLoLa) project “Revealing Nature’s Quantum Compass” [1]. The sLoLa grants are a laudable endeavor by the UK government to fund “ambitious research projects that will deepen our understanding of life’s most fundamental processes”. It is wonderful to see the UK government taking seriously the importance of blue sky basic research, appreciating that asking deep questions is what drives scientific progress, often leading to unexpected breakthroughs with applications down the line.

At the kickoff event, principal investigators presented on what their research can bring to the table. Much like entering a bakery [2] where everything smells delicious and it seems impossible to choose, an overwhelming range of experimental and computational techniques was presented, each bringing its own unique approach to bear on the outstanding problem: mechanistically, how is it that birds (and other animals) can navigate distances of up to thousands of kilometers using the Earth’s magnetic field? Alongside this, my own group is interested in how we can develop biotechnologies that take advantage of magnetic field sensitive biochemistry, which has a host of near- and long-term applications.

The challenge of linking the biochemistry of a single protein known to be magnetic field sensitive to a behavioral phenotype will require a highly interdisciplinary approach, and excitingly for this community, machine learning is being involved from the start. Prof. Degiacomi, a member of the core team, presented how his lab is developing ML techniques to reduce the computational burden of linking experimental results to protein dynamics informed by molecular dynamics simulation. On the flip side, I hope such techniques will develop into methods we can use for design. Similar to enzymes, the proteins we are interested in have a function that depends on mechanisms far more complex than structure and binding alone (not to trivialize either of these!). Magnetic field sensing in this context depends on creating an environment in which quantum entanglement can exist, and on being able to transduce the state of this quantum entanglement into a biological signal. Thus far, this second step in particular has remained highly elusive.

Ultimately, the day concluded with much enthusiasm and excitement for all that is to come. Watch this space!

  1. https://www.ox.ac.uk/news/2025-11-19-new-project-aims-reveal-nature-s-quantum-compass
  2. Yes, I just returned from a symposium in Germany

Testing Python (or any!) command line applications

Through our work in OPIG, many of our projects come in the form of code bases written in Python. These can be many different things like databases, machine learning models, and other software tools. Often, the user interface for these tools is developed as both a web app and a command line application. Here, I will discuss one of my favourite tools for testing command-line applications: prysk!


Tracking the change in ML performance for popular small molecule benchmarks

The power of machine learning (ML) techniques has captivated the field of small molecule drug discovery. Increasingly, researchers and organisations have employed ML to create more accurate algorithms to improve the efficiency of the discovery process.

To be published, methods have to prove they have improved upon others. Often, methods are tested against the same benchmarks within a field, allowing us to track progress over time. To explore the rate of improvement, I curated the performance on three popular benchmarks. The first benchmark is CASF 2016, used to test the accuracy of methods that predict the binding affinity of experimentally determined protein-ligand complexes. Accuracy was measured using the Pearson’s R value between predicted and experimental affinity values.
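For readers who have not computed it by hand, the sketch below shows how a Pearson’s R value between predicted and experimental affinities can be obtained. The function name `pearson_r` and the affinity values are made up for illustration; they are not taken from CASF 2016 or from any published method.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (e - mean_e)
              for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical affinities (e.g. in pK units) for five complexes.
experimental = [4.2, 5.1, 6.3, 7.0, 8.4]
predicted = [4.8, 5.0, 6.0, 7.5, 7.9]

print(f"Pearson's R = {pearson_r(predicted, experimental):.3f}")
```

A value close to 1 means the predictions rank and scale with the experimental affinities; note that R is insensitive to a constant offset or rescaling of the predictions, so it does not on its own measure absolute error.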


KAUST Computational Advances in Structural Biology

Last month, I had the privilege of being invited to the KAUST Research Conference on Computational Advances in Structural Biology, held from May 1-3, 2023. This gave me the opportunity to present some of the latest OPIG work on small molecules while visiting an exceptional campus with state-of-the-art facilities in one of those corners of the world that are not widely known. Beyond the impressive surroundings, I had the chance to attend a highly engaging conference and meet many scientists from different backgrounds.

KAUST Library (left) and Dining Hall (right)

The conference brought together experts in the field to explore cutting-edge developments in computational structural biology. It had a primary focus on advancements in protein structure prediction, multi-scale simulations, and integrative structural biology. Cryo-electron microscopy (cryo-EM) was the most popular experimental technique, with more than a third of the talks dedicated to its applications. These talks showcased impressive examples where structure prediction, simulations, and mid-resolution cryo-EM maps were combined to construct atomic models of large macromolecular complexes.

Notable examples of integrative works were presented by Jan Kosinski and Thomas Miller, among others. Jan Kosinski shared insights into the model of the human nuclear pore complex, highlighting the integration of cryo-electron tomography (cryo-ET), prior experimental knowledge, and AlphaFold predictions. Thomas Miller, on the other hand, presented his work on EM-based visual biochemistry, which combines single-particle cryo-EM and time-resolved experiments as a tool to study the molecular mechanisms of eukaryotic DNA replication.

There were also several talks about novel algorithms. Nazim Bouatta presented some lesser-known details about OpenFold and introduced some of their approaches to tackling the problem of multimer modelling. He also announced the future release of folding methods for predicting protein-ligand complexes. Jianlin Cheng presented MULTICOM, their new protein structure predictor based on consensus predictions from AlphaFold. Sergei Grudinin showed deep-learning tools able to predict protein dynamics as well as some integrative modelling tools driven by low-resolution experimental observations, such as small-angle scattering.

On the cryo-EM methods side, Mikhail Kudryashev presented TomoBEAR and SUSAN, cryo-EM tools developed to automate the analysis of tomographic data. Johannes Schwab presented DynaMight, a deep learning-based approach for heterogeneity analysis in single-particle cryo-EM. On the computational chemistry side, Haribabu Arthanari showed their ultra-large virtual screening platform, and Jean-Louis Reymond talked about tools to enumerate, visualize and search the vast chemical space of drug-like molecules.

Overall, the conference provided quite a diverse set of talks that facilitated multidisciplinary views and discussions. From protein structure prediction to integrative approaches combining experimental and computational methods, the talks showed the transformative potential of computational analysis in unravelling the complexities of biological macromolecules.

Better histograms with Python

Histograms are frequently used to visualize the distribution of a data set or to compare between multiple distributions. Python, via matplotlib.pyplot, contains convenient functions for plotting histograms; the default plots it generates, however, leave much to be desired in terms of visual appeal and clarity.

The two code blocks below generate histograms of two normally distributed sets using default matplotlib.pyplot.hist settings and then, in the second block, I add some lines to improve the data presentation. See the comments to determine what each individual line is doing.
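The post’s own code is not included in this excerpt. As a rough stand-in, the sketch below shows the kind of small tweaks that tend to help when overlaying two histograms: shared bin edges, transparency, bar outlines, axis labels and a legend. The function name `plot_histograms`, the sample data and every styling choice here are my assumptions, not the author’s settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def plot_histograms(a, b, n_bins=30, out_path=None):
    """Overlay two histograms on shared bins with light styling."""
    # Shared bin edges so the two distributions are directly comparable.
    bins = np.histogram_bin_edges(np.concatenate([a, b]), bins=n_bins)

    fig, ax = plt.subplots(figsize=(6, 4))
    # alpha lets the overlap show through; white edges separate the bars.
    counts_a, _, _ = ax.hist(a, bins=bins, alpha=0.6,
                             edgecolor="white", label="Set A")
    counts_b, _, _ = ax.hist(b, bins=bins, alpha=0.6,
                             edgecolor="white", label="Set B")

    ax.set_xlabel("Value")
    ax.set_ylabel("Count")
    # Drop the top and right frame lines to reduce visual clutter.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.legend(frameon=False)
    fig.tight_layout()

    if out_path is not None:
        fig.savefig(out_path, dpi=150)
    return fig, bins, counts_a, counts_b


# Two normally distributed sets (illustrative parameters).
rng = np.random.default_rng(0)
set_a = rng.normal(loc=0.0, scale=1.0, size=1000)
set_b = rng.normal(loc=1.5, scale=1.0, size=1000)

fig, bins, counts_a, counts_b = plot_histograms(set_a, set_b)
```

Passing `out_path="histograms.png"` (a hypothetical filename) writes the figure to disk; leaving it as `None` just returns the figure object.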


COSTNET19 Conference

Last month, I attended the COSTNET19 Conference in Bilbao (Spain). This conference is organised by COSTNET, a COST Action which aims to foster international European collaboration in the emerging field of statistics of network data science. COSTNET facilitates interaction and collaboration between diverse groups of statistical network modellers, establishing a large, vibrant, interconnected and inclusive community of network scientists.


Why you should care about startups as a researcher

I was recently awarded the EIT Health Translational Fellowship, which aims to fund DPhil projects with the goal of commercializing the research and addressing the funding gap between research and seed funding. In order to win, I had to deliver a short 5-minute startup pitch in front of a panel of investors and scientific experts to convince them that my DPhil project has impact as well as commercial viability. Besides the £5000 prize, the fellowship included a week-long training course on how to improve your pitch, address pain points in your business strategy, etc. I found the whole experience incredibly rewarding and the skills I picked up very important, even as a researcher. In summary, this is why I think you should care about the startup world as a researcher.
