ISMB/ECCB conference feedback

The ISMB/ECCB conference took place in Liverpool this year, so a couple of OPIGlets took the train up north to attend this biennial joint conference. Here we give some general feedback on the conference and highlight some interesting talks and posters.

General feedback

ISMB/ECCB is a 4.5-day conference, starting on Sunday evening and running until Thursday evening. It is attended by around 2500 people, mostly from academic groups around the world. With more than 20 different tracks, it is a broad conference with many sessions running in parallel. As always, it is therefore worth looking at the schedule beforehand to avoid getting overwhelmed. Each day there is one keynote, two poster sessions, and three blocks of talks. The talks are often given by PIs, but postdocs and PhD students also get the opportunity to present. There are also some shorter slots highlighting posters presented that day.

This year there was a very interesting line-up of Distinguished Keynote speakers. The conference was kicked off by John Jumper talking about AlphaFold2, with a focus on how the team tackled the various problems encountered in going from the initial AlphaFold model to AlphaFold2. On Monday, Prof. Amos Bairoch talked about biocuration and the importance and challenges of public databases. He discussed the FAIR principles for data management: Findable, Accessible, Interoperable, and Reusable [1]. The next keynote was by Prof. James Zou, on computational biology in the age of AI agents (more on this later). On Wednesday we had our own Prof. Charlotte Deane (woo!) talking about structure-based drug discovery, with a focus on the importance of baselines and benchmarking. The conference closed with a short interview with Prof. David Baker, followed by a talk from Prof. Fabian Theis on decoding cellular systems. He discussed CellFlow [2], an AI tool that predicts how perturbations such as drugs affect the cellular phenotype.

Although numerous topics were discussed throughout the conference, a main focus was the use of protein, RNA, DNA and single-cell language models as ‘foundation models’. There was also a lot of chat about agents, and a fair bit of flow matching. Below we highlight some specific talks and posters that we particularly enjoyed!

PPI Networks

Sara Rescalli from Sorbonne University presented an interesting poster on a computational approach to reconstructing PPI networks in experimentally challenging biological systems. As it is unpublished work, I don’t want to go into detail, but I will discuss SENSE-PPI [3], the tool from their lab that forms the basis of the workflow.

Protein-protein interaction (PPI) networks are important for interpreting protein function and understanding cellular systems. SENSE-PPI is a sequence-based deep learning model able to identify PPIs among tens of thousands of proteins. The model takes ESM2 embeddings as input to a Siamese architecture. Siamese models consist of two identical modules that share weights but take different inputs; the two outputs are then combined into a single value. This architecture is chosen here for its commutative property: the prediction for a pair of proteins should not depend on the order in which they are given. Trained on various datasets, the model is shown to be accurate and generalisable.
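To make the Siamese idea concrete, here is a minimal sketch in plain Python. It is not the SENSE-PPI architecture: the dimensions, the random weights, the tanh layer and the element-wise product used to combine the two branches are all assumptions chosen for illustration, and the "embeddings" are random vectors standing in for ESM2 output. It only shows how sharing one set of weights and combining the branch outputs symmetrically gives the same score whichever order the two proteins are passed in.

```python
import math
import random

random.seed(0)
EMB_DIM, HID_DIM = 8, 4  # toy sizes, far smaller than real embeddings

# One set of weights, reused for both inputs (the "Siamese" part).
W_SHARED = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(HID_DIM)]
W_OUT = [random.gauss(0, 1) for _ in range(HID_DIM)]


def encode(embedding):
    """Shared branch: a linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, embedding))) for row in W_SHARED]


def interaction_score(emb_a, emb_b):
    """Combine the two branch outputs symmetrically and squash to (0, 1)."""
    h_a, h_b = encode(emb_a), encode(emb_b)
    # The element-wise product does not depend on argument order.
    combined = [a * b for a, b in zip(h_a, h_b)]
    logit = sum(w * c for w, c in zip(W_OUT, combined))
    return 1.0 / (1.0 + math.exp(-logit))


# Random stand-ins for the embeddings of two proteins (hypothetical data).
protein_a = [random.gauss(0, 1) for _ in range(EMB_DIM)]
protein_b = [random.gauss(0, 1) for _ in range(EMB_DIM)]

print(interaction_score(protein_a, protein_b))
print(interaction_score(protein_b, protein_a))  # same value as the line above
```

If the two branch outputs were instead concatenated before the output layer, swapping the inputs would generally change the score, which is why the choice of combination step matters for the commutative property.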

In the work presented at the conference, the model was applied to a specific biological system. Using SENSE-PPI followed by further filtering, such as functional enrichment analysis and mutational analysis, they identified a potential target and interface, which was then confirmed experimentally.

A (Bio)-computational perspective on protein folding, function and evolution

Diego U. Ferreiro from the University of Buenos Aires was an invited speaker, presenting on the theoretical biophysics of biopolymers and how knowledge of folding can help us in the search for life across the universe. The presentation was a culmination of his recent work on folding and functionality from a biophysical perspective. Diego is a very engaging speaker and managed to cram a lot into a short 20-minute slot.

Natural proteins are a class of biopolymers that make up all life on Earth. Proteins are essentially machines made of jelly; the functionality of these machines is dictated by their 3D structure, which is in turn coded for by a 1D sequence of amino acids. Legacy views on this topic suggest that information flows in one direction, from gene to peptide sequence. However, more recent perspectives suggest that the complete picture is much more complex, with information flowing in both directions. This is because the number of interaction-types scales with the square root of the alphabet size: with an alphabet of 20 (√20 ≈ 4.5), standard amino acid sequences have just 4 interaction-types, and so only barely contain enough information to code for the correct structure in addition to functionality.

The sequence-structure coding assumption is central to modern protein workflows, which often incorporate evolutionary history. The use of evolutionary history is what has enabled transformative models such as AlphaFold, and it leverages the principle that the mutational landscape over time optimises for fitness. The problem is that evolutionary fitness is not something that is isolated to the protein itself. Proteins exist as part of a concoction of small molecules, lipids and other biomolecules; selection pressure is an information constraint that flows backwards towards the gene sequence itself. This can be quantified by measuring the evolutionary stability against the physical stability of a given mutation. When this analysis is performed on systems that have been characterised by deep mutational scanning experiments, Diego and his team found that a large proportion of the folding energy is missing from the evolutionary history, hence the name “dark energy”. This dark energy is essentially an energy-scaled measure of the functional constraints acting on protein systems during evolution.

Although it rests on a theoretical basis, dark energy in protein folding landscapes is a real quantity, and we can use this information to translate protein biophysics to other conceptually similar biopolymers. Given the vast potential for molecular complexity across the universe, we need a new definition of possible life-coding biopolymers: uber-proteins are biopolymers that use an alphabet of building blocks with ~4 interaction-types.

While uber-proteins could have a different alphabet of amino acids, this is dictated by the local solution environment itself. In the search for life across the universe, sufficient and adequate solvent is a primary indicator. The question Diego therefore raises is: what is the possibility for life and uber-proteins to exist in other abundant solvents? To answer this, Diego leverages the concept of dark energy in protein folding to split evolutionary pressure into two components: folding and functional. Combined with a base evolutionary rate, these quantities can be compared across the liquid range of each solvent. Plotting this for solvents known to astrochemistry, Diego shows that multiple solvents could plausibly support life. Initially some of these solvents score similarly to water; once other terms such as complexity or abundance are added, water quickly regains the top ranking, although many solvents still emerge as plausible options.

This talk underscores the idea that solvents and other interactions are an integral part of protein biophysics and cannot be ignored in the search for new protein designs as well as the search for life across the universe.

General small molecules content

Overall, the focus of the conference felt pretty far from small molecule drugs, but there were still a few highlight talks. Dongmin Bang from Seoul National University presented two cool pieces of work: the first, MixingDTA [4], aimed to confront the issue of data sparsity in drug-target affinity (DTA) prediction. This framework introduces a data augmentation strategy named GBA-Mixup, which interpolates the embeddings of neighbouring molecules and proteins based on the “Guilt-By-Association” principle. This technique effectively generates biologically plausible data points to improve model performance in sparse regions of the DTA chemical space. His second presentation, on ADME-Drug-Likeness [5], described a method to enrich Molecular Foundation Models (MFMs) for better classification of viable drug candidates. By employing sequential multi-task learning that enforces the natural A→D→M→E pharmacokinetic cascade, the model learns to encode essential pharmacokinetic properties directly into the molecular embeddings.
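For readers unfamiliar with mixup-style augmentation, the interpolation step can be sketched in a few lines of Python. This is generic mixup, not the MixingDTA implementation: how GBA-Mixup picks neighbours and exactly what it interpolates are not covered here, so the function name `mixup_pair`, the Beta parameter `alpha`, the toy embeddings and the choice to interpolate the affinity labels alongside the embeddings are all assumptions made for illustration.

```python
import random


def mixup_pair(emb_i, affinity_i, emb_j, affinity_j, alpha=0.4, rng=None):
    """Interpolate two embedded samples and their labels with one shared weight."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed_emb = [lam * a + (1.0 - lam) * b for a, b in zip(emb_i, emb_j)]
    mixed_affinity = lam * affinity_i + (1.0 - lam) * affinity_j
    return mixed_emb, mixed_affinity, lam


rng = random.Random(0)
# Hypothetical embeddings of two neighbouring drug-target pairs and their affinities.
pair_i = ([0.2, 0.8, -0.1], 7.1)
pair_j = ([0.3, 0.6, 0.0], 6.5)

mixed_emb, mixed_affinity, lam = mixup_pair(*pair_i, *pair_j, rng=rng)
print(lam, mixed_emb, mixed_affinity)
```

The synthetic point lies on the line segment between the two originals, so restricting the partner to a close neighbour (as the “Guilt-By-Association” idea suggests) keeps the new sample near regions where real data exists.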

Focusing on research infrastructure, Johannes Kersting of the Technical University of Munich detailed a Nextflow pipeline [6] designed to systematise disease module identification and drug repurposing. This workflow automates the execution of six different module detection algorithms, manages their software dependencies, standardises outputs into formats like BioPAX, and integrates tools for the topological and biological validation of the identified network modules, thereby enhancing the rigour and reproducibility of systems medicine research. Finally, Ankit from the Indian Institute of Technology Palakkad addressed a long-standing bottleneck in virtual screening: the computational complexity of 3D graph kernels. His work introduced an efficient 3D kernel (3DGHK) [7] that integrates critical 3D structural information—such as bond lengths, bond angles, and torsion angles—while achieving a substantial reduction in computational cost compared to state-of-the-art methods. This aims to offer a scalable solution for large-scale molecular property prediction.

Written by Henriette, Alexi and Lucy

References

[1] Jacobsen, A., de Miranda Azevedo, R., Juty, N., Batista, D., Coles, S., Cornet, R., … & Schultes, E. (2020). FAIR principles: interpretations and implementation considerations. Data Intelligence, 2(1-2), 10-29.

[2] Klein, D., Fleck, J. S., Bobrovskiy, D., Zimmermann, L., Becker, S., Palma, A., … & Theis, F. J. (2025). CellFlow enables generative single-cell phenotype modeling with flow matching. bioRxiv, 2025-04.

[3] Volzhenin, K., Bittner, L., & Carbone, A. (2024). SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience, 27(7).

[4] Bang, D., Lee, S., Lee, D., & Kim, S. (2024). MixingDTA: Improved Drug-Target Affinity Prediction by Extending Mixup with Guilt-By-Association. Bioinformatics.

[5] Bang, D., Seo, J., Lee, D., & Kim, S. (2024). ADME-Drug-Likeness: Enriching Molecular Foundation Models via Pharmacokinetics-Guided Multi-Task Learning for Drug-likeness Prediction. Bioinformatics.

[6] Kersting, J., Manz, Q., Aguirre-Plans, J., Bûcheron, C., Spindler, L. M., Pock, T., Delgado-Chaves, F. M., Guney, E., & List, M. (2024). A Nextflow Pipeline for Network-Based Disease Module Identification and Validation. In RExPO24 Conference. REPO4EU.

[7] Ankit, Singh, R., & Raman, S. (2024). Efficient 3D kernels for molecular property prediction. Bioinformatics.

Oxford Protein Informatics Group

or "OPIG" to friends
