Author Archives: Henriette Capel

Biologic Summit 2026

This year we (Fabian and Henriette) were invited to speak at the Biologic Summit. The conference took place January 20-22 in San Diego. Fabian presented his work on conformational ensembles of antibodies [1] in the “Data Strategies and the Future of AI Models” track. Henriette presented her work on LICHEN [2], a tool to generate an antibody light sequence for a specific heavy sequence, in the “ML/AI for Biologics Developability, Optimization and de novo Design” track. Below we give some general highlights of the conference, and some talks we enjoyed. We would like to thank the organisers for the opportunity to discuss our research and hear about the latest developments in harnessing ML for design and optimisation of antibodies.  

General feedback

  • Industry-focused conference. Biologic Summit is well attended by industry, making this conference an excellent opportunity to promote your tools/databases and to connect with companies. The conference is attended by both start-ups and big pharma companies.
  • Medium-sized conference. With approximately 250 attendees, the Biologic Summit provides a good opportunity to connect with researchers from a wide range of disciplines. Because it was held concurrently with Protein Science and Production Week (PepTalk) and shared the same venue, the event further benefited from a diverse mix of scientific backgrounds and expertise.
  • Panel and table discussions. Throughout the three days there were various table and panel discussions. These are good places to learn about general interests and challenges in the field.
  • Well-organised conference. The conference is well-organised with a clear schedule and enough breaks to recharge and connect. Most talks are scheduled for 30 minutes with around 4 talks per block.
Continue reading

Estimating uncertainty in MD observables using block averaging

When running molecular dynamics (MD) simulations, we are usually interested in measuring an ensemble average of some metric (e.g., RMSD, RMSF, radius of gyration, …) and using this to draw conclusions about the investigated system. While calculating the average value of a metric is straightforward (we can simply measure the metric in each frame and average it), calculating a statistical uncertainty is a little more tricky and often forgotten. The main challenge when trying to calculate an uncertainty of MD observables is that individual frames of the simulation are not sampled independently but are time correlated (i.e., frame N depends on frame N-1). In this blog post, I will briefly introduce block averaging, a statistical technique to estimate uncertainty in correlated data.
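To give a flavour of the idea, here is a minimal sketch in plain Python. It is an illustration under assumptions rather than the exact recipe from the full post: the function name `block_average`, the choice of 10 blocks and the synthetic correlated series are all made up for this example. The series is split into contiguous blocks, each block is averaged, and the standard error is taken from the spread of the block means.

```python
import math
import random
import statistics


def block_average(values, n_blocks):
    """Return (mean, standard error) estimated from contiguous blocks.

    Hypothetical helper for illustration; trailing frames that do not
    fill a whole block are dropped.
    """
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_len = len(values) // n_blocks
    if block_len < 1:
        raise ValueError("more blocks than data points")
    # Average the observable within each contiguous block.
    block_means = [
        statistics.fmean(values[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    mean = statistics.fmean(block_means)
    # Standard error of the mean, treating block means as independent samples.
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return mean, sem


# Synthetic, strongly time-correlated series standing in for an MD observable
# (each value depends on the previous one, like consecutive frames).
random.seed(0)
series = [0.0]
for _ in range(9999):
    series.append(0.95 * series[-1] + random.gauss(0.0, 1.0))

# Naive standard error that wrongly treats every frame as independent.
naive_sem = statistics.stdev(series) / math.sqrt(len(series))
mean, block_sem = block_average(series, n_blocks=10)
print(f"naive SEM: {naive_sem:.3f}, block-averaged SEM: {block_sem:.3f}")
```

The estimate is only trustworthy when the blocks are longer than the correlation time of the observable, so in practice it is worth repeating the calculation for several block sizes and checking that the uncertainty levels off.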

Continue reading

MDAnalysis: Work with dynamics trajectories of proteins

For a long time crystallographers and subsequently the authors of AlphaFold2 had you believe that proteins are a static group of atoms written to a .pdb file. Turns out this was a HOAX. If you don’t want to miss out on the latest trend of working with dynamic structural ensembles of proteins, this blog post is exactly right for you. MDAnalysis is a Python package which, as the name says, was designed to analyse molecular dynamics simulations and lets you work with trajectories of protein structures easily.

Continue reading

Making your figures more accessible

You might have created the most aesthetic figures for your last presentation with a beautiful colour scheme, but have you considered how these might look to someone with colourblindness? Around 5% of the general population have some kind of colour vision deficiency, so making your figures more accessible is actually quite important! There is a range of online tools that can help you create figures that look great to everyone.

Continue reading

Some useful pandas functions

Pandas is one of the most used packages for data analysis in Python. The library provides functionality that allows you to perform complex data manipulation operations in a few lines of code. However, as the number of functions provided is huge, it is impossible to keep track of all of them. More often than we’d like to admit, we end up writing lines and lines of code only to discover later that the same operation can be performed with a single pandas function.

To help avoid this problem in the future, I will run through some of my favourite pandas functions and demonstrate their use on an example data set containing information on crystal structures in the PDB.
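As a small taste of what this looks like, here is a sketch on a toy table. The column names, identifiers and values below are invented for illustration and are not the data set used in the full post. It shows two single calls, `groupby(...).agg(...)` and `query(...)`, that each replace what could otherwise become a hand-written loop.

```python
import pandas as pd

# Hypothetical toy data; IDs and values are made up for illustration.
structures = pd.DataFrame(
    {
        "pdb_id": ["AAAA", "BBBB", "CCCC", "DDDD"],
        "organism": ["human", "mouse", "human", "mouse"],
        "resolution": [1.8, 2.5, 3.1, 2.0],
    }
)

# Mean resolution and number of structures per organism in one call.
summary = structures.groupby("organism")["resolution"].agg(["mean", "count"])

# Keep only high-resolution structures using a readable expression.
high_res = structures.query("resolution <= 2.0")

print(summary)
print(high_res)
```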

Continue reading

Current strategies to predict structures of multiple protein conformational states

Since the release of AlphaFold2 (AF2), the problem of protein structure prediction is widely believed to be solved. Current structure prediction tools, such as AF2, are able to model most proteins with high accuracy. These methods, however, have a major limitation as they have been trained to predict a single structure for a given protein. Proteins are highly dynamic molecules, and their function often depends on transitions between several conformational states. Despite research focusing on the task of predicting the structures of multiple conformations of a protein, currently, no accurate and reliable method is available. In this blog post, I will provide a short overview of the strategies developed for predicting protein conformations. I have grouped these into three sets of related approaches. To conclude, I will also demonstrate how to run one of these strategies on your own.

Continue reading

An Overview of Clustering Algorithms

During the first 6 months of my DPhil, I worked on clustering antibodies and I thought I would share what I learned about these algorithms. Clustering is an unsupervised data analysis technique that groups a data set into subsets of similar data points. The main uses of clustering are in exploratory data analysis to find hidden patterns or data compression, e.g. when data points in a cluster can be treated as a group. Clustering algorithms have many applications in computational biology, such as clustering antibodies by structural similarity. Actually, this is objectively the most important application and I don’t see why anyone would use it for anything else.

There are several types of clustering algorithms that offer different advantages.

Continue reading