Spin Lattices and Proteins – How state-based discretisations have enabled modern protein modelling

I got into protein modelling not long before AlphaFold2 was first released. At the time, some of the prevailing methods for protein structure prediction were highly interpretable energy functionals born of a particularly beautiful intersection of statistical mechanics and biology. These “Potts” models are the centre of this blog’s larger discussion of state-based discretisations of proteins, how they’ve shaped modern deep learning methods, and whether there is still more to learn from them.

In the age of black-box deep learning, does the Potts model still have a place?

The Potts/Ising Model

The Ising model is a well-established theoretical physics model of ferromagnetism. Simply put, given a lattice of atoms each capable of adopting one of two spins (up or down), ferromagnetism arises when the spins align and their associated magnetic moments point in the same direction. The Ising model parameterises the local and non-local relationships between atoms and their spin states, giving us the Hamiltonian of the system and the energies of its different configurations under a magnetic field. For a system of $N$ atoms, the Hamiltonian takes the form


$$
E = -\sum_{i}^{N} h_i x_i - \sum_{i<j}^{N} J_{ij} x_i x_j,
$$

where $J_{ij}$ is the “coupling energy” between any two atoms $x_i$ and $x_j$, and $h_i$ represents the magnetic field or, more appropriately for our purposes, a single-site field dictating how an individual atom behaves independently within the model. You might recognise the form this binary spin model takes: it arises naturally across the sciences, including in Hopfield networks and graphical models.
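To make this concrete, here is a minimal NumPy sketch (my own toy illustration, not any standard library routine) that evaluates the Hamiltonian for a given spin configuration:

```python
import numpy as np

def ising_energy(spins, h, J):
    """Energy of a spin configuration under the Ising Hamiltonian.

    spins : (N,) array of +1/-1 spin values
    h     : (N,) array of single-site fields
    J     : (N, N) matrix of pairwise couplings (only i < j entries used)
    """
    field_term = -np.dot(h, spins)
    # sum over i < j of J_ij * x_i * x_j
    coupling_term = -np.sum(np.triu(J, k=1) * np.outer(spins, spins))
    return field_term + coupling_term

# Toy example: four aligned spins in a uniform field with uniform couplings
spins = np.array([1, 1, 1, 1])
h = np.full(4, 0.5)
J = np.full((4, 4), 1.0)
print(ising_energy(spins, h, J))  # -8.0: aligned spins give a low energy
```

With positive couplings, flipping any single spin raises the energy, which is exactly the alignment-favouring behaviour that produces ferromagnetism.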

Everything is an Ising-like model if you’re brave enough

One of the reasons modelling is so fun is that you get to be creative. Analogously to the famous “everything is a harmonic oscillator if you’re brave enough”, here are some examples of Ising-like state-based discretisations and their applications. For example, researchers built a simplified voting model as an Ising-like ferromagnet, finding naturally arising spatial segregation ([Budrikis, 2024](https://www.nature.com/articles/s42254-024-00753-w)). Ising-like models have also been used in financial markets, typically modelling traders as spins, and in plant ecosystem organisation.

Another cool application is error correction, particularly in image restoration. The Ising model has been used for binary image denoising (as discussed in Pritam Chanda’s blog), and there are emerging links between diffusion on lattices and Ising models (Mei & Wu, 2023; Causer et al., 2024). In biology, Schneidman et al. proposed that neural populations have error-correcting properties emergent from pairwise maximum-entropy correlations, equivalent to Ising models (Schneidman et al., 2006).
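To give a flavour of the denoising use case, here is a hedged sketch of iterated conditional modes (ICM) on an Ising prior, a classic greedy approach to binary denoising; the parameter values are purely illustrative:

```python
import numpy as np

def icm_denoise(noisy, beta=2.0, eta=1.5, sweeps=5):
    """Iterated conditional modes on an Ising prior for binary denoising.

    noisy : 2D array of +1/-1 pixels (the observed, corrupted image)
    beta  : coupling to the 4 neighbouring pixels (smoothness prior)
    eta   : coupling to the observed pixel (data fidelity)
    """
    x = noisy.copy()
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # local field: neighbours plus the observed value
                s = eta * noisy[i, j]
                if i > 0:     s += beta * x[i - 1, j]
                if i < H - 1: s += beta * x[i + 1, j]
                if j > 0:     s += beta * x[i, j - 1]
                if j < W - 1: s += beta * x[i, j + 1]
                # greedily pick the spin that lowers the local energy
                x[i, j] = 1 if s > 0 else -1
    return x
```

Clearly the Ising model has surprising modelling potential across applications, and it also extends naturally to proteins.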

Turning Proteins into Spin Glass Lattices

So, we have a formulation that models a system as a number of discrete particles, each taking one of two states, and that tells us about the frustration of these particles under local and non-local relationships through the single-site fields $h$ and the couplings $J$ respectively. If we instead let each residue of a protein take one of 21 states (the 20 amino acids plus a gap character for sequence alignments), we find ourselves with a Potts model for proteins.

More formally, we define the Potts model with the above Hamiltonian but over $q = 21$ states, giving a probability distribution for a biological sequence $\mathbf{x}$ of $N$ residues under the model


$$
P(\mathbf{x}) = \frac{1}{Z}\exp\left(\sum_{i}^Nh_i(x_i) + \sum_{i<j}^N J_{ij}(x_i,x_j) \right),
$$

where $Z$ is the partition function, summing over all possible configurations. This form arises from statistical mechanics, which translates a system’s Hamiltonian into the Boltzmann distribution. We parameterise the model by “solving the inverse Ising problem”, using various statistical approximations over multiple sequence alignments.
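In code, the exponent of that distribution (the negative statistical energy) is straightforward to evaluate. A minimal sketch, assuming the fields and couplings are stored as dense arrays `h[i, a]` and `J[i, j, a, b]`:

```python
import numpy as np

def potts_log_prob_unnormalised(seq, h, J):
    """log P(x) + log Z for a sequence under the Potts model.

    seq : (N,) integer array of states in [0, q)
    h   : (N, q) single-site fields
    J   : (N, N, q, q) pairwise couplings (only i < j entries used)
    """
    N = len(seq)
    field_term = sum(h[i, seq[i]] for i in range(N))
    coupling_term = sum(J[i, j, seq[i], seq[j]]
                        for i in range(N)
                        for j in range(i + 1, N))
    return field_term + coupling_term
```

The catch is $Z$: normalising requires summing over all $q^N$ possible sequences, which is exactly why the inverse problem needs approximations.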

For example, we can learn a Potts model for a protein from a multiple sequence alignment (MSA) of its homologs, where pairwise correlations between residues that mutate in sync across evolutionary history are used with a maximum-entropy approach to parameterise the $J$ terms (Morcos et al., 2011). These evolutionarily correlated residues mutate together when they are proximal in 3D space: if a positively charged residue mutates to a negative one, the stability of the fold is only maintained if contacting residues also mutate accordingly to preserve the electrostatic relationship (see the figure below).

Because mutations can be correlated transitively through chains of other correlations (highlighted green in the figure above), the maximum-entropy approach gives a way to disentangle those transitive contacts from real direct contacts (pink). This analysis has historically been termed Direct Coupling Analysis (DCA).
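A compressed sketch of the mean-field flavour of DCA: one-hot encode the MSA, estimate the covariance between columns, and take the couplings as (minus) its inverse, which is the step that strips out transitive correlations. Real implementations add sequence reweighting, pseudocount-corrected frequencies, and average-product correction, all omitted here:

```python
import numpy as np

def mean_field_dca(msa, q=21, lam=0.5):
    """Naive mean-field DCA (in the spirit of Morcos et al., 2011).

    msa : (M, N) integer array of M aligned sequences over N columns,
          with states in [0, q)
    Returns couplings J with shape (N, N, q-1, q-1).
    """
    M, N = msa.shape
    # one-hot encode, dropping the last state per column to fix the gauge
    X = np.zeros((M, N, q - 1))
    for a in range(q - 1):
        X[:, :, a] = (msa == a)
    X = X.reshape(M, N * (q - 1))
    # regularised empirical covariance of the encoded columns
    C = np.cov(X, rowvar=False) + lam * np.eye(N * (q - 1))
    # mean-field approximation: couplings ~ minus the inverse covariance
    J = -np.linalg.inv(C)
    return J.reshape(N, q - 1, N, q - 1).transpose(0, 2, 1, 3)

def contact_scores(J):
    # Frobenius norm of each coupling block scores pair (i, j) as a contact
    return np.linalg.norm(J, axis=(2, 3))
```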

Whilst the model is deceptively simple, its parameter count scales as $\mathcal{O}(q^2N^2)$, since every pair of positions carries a full $q \times q$ coupling matrix: a modest protein of $N = 200$ residues already has roughly $\binom{200}{2} \times 21^2 \approx 8.8$ million coupling parameters. For any reasonably sized system the model becomes computationally difficult to solve without some approximation, of which a rich history of solutions exists.

In the table below I give a brief background of the Potts model in protein modelling:

| Year | Contribution | Reference |
|------|--------------|-----------|
| 1952 | The Potts model as a $q$-state generalisation of the Ising model. | Potts, Some generalized order-disorder transformations |
| 1987 | Spin-glass theory applied to protein folding. | Bryngelson & Wolynes, Spin glasses and the statistical mechanics of protein folding |
| 1994 | Correlated mutations in MSAs linked to residue contacts in 3D structures. | Göbel, Sander, Schneider & Valencia, Correlated mutations and residue contacts in proteins |
| 1999 | Maximum-entropy framing of correlated mutations; precursor to Potts/DCA contact inference. | Lapedes et al., Correlated mutations in models of protein sequences |
| 2009 | DCA using message passing to infer direct residue couplings. | Weigt et al., Identification of direct residue contacts in protein–protein interaction by message passing |
| 2011 | Mean-field DCA made coevolutionary contact prediction scalable across many families. | Morcos et al., Direct-coupling analysis of residue coevolution captures native contacts across many protein families |
| 2011 | 3D protein structures computed from evolutionary couplings. | Marks et al., Protein 3D structure computed from evolutionary sequence variation |
| 2013 | Pseudolikelihood inference for 21-state protein Potts models improved DCA contact prediction. | Ekeberg et al., Using pseudolikelihoods to infer Potts models |
| 2016 | Pairwise maximum-entropy/DCA methods used to infer protein interaction partners from sequence. | Bitbol et al., Inferring interaction partners from protein sequences |
| 2017 | EVmutation used Potts-style evolutionary statistical energy to predict mutational effects. | Hopf et al., Mutation effects predicted from sequence co-variation |
| 2021 | AlphaFold2’s Evoformer jointly processed MSA and pair representations for structure prediction. | Jumper et al., Highly accurate protein structure prediction with AlphaFold |

For anyone familiar with AlphaFold2’s internals, you might notice a conceptual link with its Evoformer module. Indeed, whether Google DeepMind took direct inspiration from the Potts model or not, the principles are the same: a state-based discretisation of proteins and a statistical model of inter-residue relationships constructed from sequence alignments, just inside a latent neural network.

We can actually draw more explicit links between attention and DCA: researchers have recovered Potts models from simplified attention layers (Bhattacharya et al., 2022), going so far as to show that training a single layer of factored self-attention is equivalent to solving the inverse Potts problem by the pseudolikelihood method (Rende et al., 2023).
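Schematically, that correspondence is simple bookkeeping: in a factored attention layer, the positional attention maps and the per-head value matrices over amino-acid states assemble into exactly the $J_{ij}(x_i, x_j)$ tensor of a Potts Hamiltonian. A toy sketch of that assembly (shapes and names are my own, not from either paper’s code):

```python
import numpy as np

def factored_attention_couplings(A, V):
    """Assemble Potts-style couplings from a factored attention layer.

    A : (H, N, N) per-head attention weights over sequence positions
    V : (H, q, q) per-head value matrices over amino-acid states
    Returns J : (N, N, q, q), where
        J[i, j, a, b] = sum_h A[h, i, j] * V[h, a, b]
    """
    return np.einsum('hij,hab->ijab', A, V)
```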

That said, it is well established that a transformer’s MSA row attention outperforms Potts models for unsupervised contact prediction, due in part to capturing higher-order relationships beyond pairwise couplings (Lupo et al., 2022). So if transformers made coevolution differentiable and scalable, and enabled the incorporation of other features like secondary structure, is there still space for the humble Potts model in protein modelling?

Adopt an Ising-like Model!

The direct recovery of generalised Potts models with attention might suggest the Potts model is now redundant. However, active research still advocates new use cases. For example, Potts models serve as baselines in the ProteinGym mutation-effect prediction benchmark, having shown good results in single-site mutation prediction (Hopf et al., 2017).
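The scoring itself is appealingly simple: a single-site mutation only touches one field term and one row of couplings, so its effect on the statistical energy is cheap to evaluate. A sketch of that EVmutation-style delta (assuming couplings are stored symmetrically for all pairs):

```python
def mutation_delta_energy(seq, i, b, h, J):
    """Change in Potts statistical energy for mutating position i to state b.

    seq : (N,) wild-type sequence as integer states
    h   : (N, q) fields
    J   : (N, N, q, q) couplings, assumed stored for all ordered pairs
          with J[i, j, a, b] == J[j, i, b, a]
    """
    a = seq[i]
    delta = h[i, b] - h[i, a]
    for j in range(len(seq)):
        if j != i:
            delta += J[i, j, b, seq[j]] - J[i, j, a, seq[j]]
    return delta  # positive: the model favours the mutant over the wild type
```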

Contacts, mutation effects, and pairwise residue couplings are all key to understanding allostery. Perhaps one could even go so far as to explicitly Ising(ify) allostery, treating protein conformational states as a spin system. For example, it could enable analysis of how ligand binding perturbs local fields and how that signal propagates through the protein via coupled residues.
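Purely as a thought experiment, that idea is testable with a toy Gibbs sampler: nudge the field at one “binding” site, re-estimate the per-site marginals, and watch whether the shift reaches distant but coupled sites. A minimal sketch, with everything about it hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_marginals(h, J, n_sweeps=3000, burn_in=500):
    """Estimate per-site state marginals of a small Potts system by Gibbs sampling.

    h : (N, q) fields; J : (N, N, q, q) couplings stored for all ordered pairs.
    """
    N, q = h.shape
    x = rng.integers(q, size=N)
    counts = np.zeros((N, q))
    for sweep in range(n_sweeps):
        for i in range(N):
            # conditional distribution of site i given all other sites
            logits = h[i].copy()
            for j in range(N):
                if j != i:
                    logits += J[i, j, :, x[j]]
            p = np.exp(logits - logits.max())
            x[i] = rng.choice(q, p=p / p.sum())
        if sweep >= burn_in:
            counts[np.arange(N), x] += 1
    return counts / (n_sweeps - burn_in)

# "Ligand binding" as a perturbation of the local field at site 0:
#   h_bound = h.copy(); h_bound[0] += perturbation
# Comparing gibbs_marginals(h, J) against gibbs_marginals(h_bound, J)
# shows how far the shift propagates through the couplings.
```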

Also, the general case of spin-like state discretisations of biological systems still shows potential. One example I’m particularly interested in is the WSME model, which was developed for modelling protein folding as an energy landscape (Ooka et al., 2022). Later versions such as WSME-L show genuinely promising results even for larger, more complex proteins (Ooka & Arai, 2023). Markov State Models are a central method in molecular dynamics analysis, and their discrete states have been used in VAMPnets for modelling protein kinetics (Mardt et al., 2018).

In my opinion it’s less a case of redundancy and more a question of how these models can be married with modern deep learning, whether as energy-based interpretations, tokenisations, statistical coarse-grained priors, or even neural energy functions. For example, Caredda and Pagnani showed how a hybrid transformer/attention DCA leads to a simpler, lower-parameter-count, interpretable energy function for contact maps, and even used it as a basis for a generative model (Caredda & Pagnani, 2025).

This hybrid approach is perhaps the most exciting frontier. For example, TERMinator is a neural network that outputs a Potts-like Hamiltonian over discrete states (Li et al., 2022). Personally, I will always advocate for more ways to embed physics into deep learning models and to improve their interpretability.
