Author Archives: Bernhard Knapp

Do we need the constant regions of Antibodies and T-cell receptors in molecular simulations?

At this week’s journal club I presented my latest results on the effect of the constant regions of antibodies (ABs) and T-cell receptors (TCRs) on the dynamics of the overall system. Not including constant regions in such simulations is a commonly used simplification that is found throughout the literature. This is mainly due a massive saving in computational runtime as illustrated below: cutConstRegions

The constant regions contain about 210 residues but an additional speed up comes from the much smaller solvation box. If a cubic solvation box is used then the effect is even more severe:

waterbathBut the question is: “Is is OK to remove the constant regions of an AB or TCR and simulate without them?”.

Using replica simulations we found that simulations with and without constant regions lead to (on average) significantly different results. The detail of our analysis will soon be submitted to a scientific journal. The current working title is “Why constant regions are essential in antibody and T-cell receptor Molecular Dynamics simulations”.

ERC starting grant mock interview

Yesterday’s group meeting got transformed into a mock interview for the final evaluation step of my ERC starting grant application. To be successful with an ERC starting grant application one has to pass three evaluation steps.

The first step consists of a short research proposal (max 5 pages), CV, successful previous grant writing, early achievements track record, and a publication list. If a panel of reviewers (usually around 4-6) decides that this is “excellent” (this word is the main evaluation criterion) then the application is transferred to step two.

In step two a full scientific proposal is evaluated. The unfair procedure is that if step one is not successful then the full proposal is not even read (although it had to be submitted together with step one).

Fortunately, my proposal passed step one and step two. The final hurdle will be a 10 minutes interview + 15 minutes questions in Brussels where the final decision will take place.

I already had one mock interview with some of the 2020 research fellows (thanks to Konrad, Remi, and Laurel), one with David Gavaghan, and the third one took place yesterday with our whole research group.

After those three mock interviews I hope to be properly prepared for the real interview!

Hierachical Natural Move Monte Carlo using MOSAICS

After having recently published a large scale Molecular Dynamics simulations project of TCRpMHC [1,2] interaction I have extended my research to another technique of spatial sampling. At this week’s group meeting I presented the first results of my first MOSAICS [3] project.

The MOSAICS package is a software that allows for so called hierarchical natural Monte Carlo moves. That means that the user can specify regions in the protein of interest. These regions are indented to reflect “natural” sets of atoms and are expected to move together. An example would be a stable alpha-helix. “Hierarchical” means that region can be grouped together to super-regions. For example a helix that is broken by a kink [4] in its middle could have a region for the helix parts on both sides of the kink as well as for the overall helix. An example for peptide/MHC is illustrated below.


MOSAICS uses Monte Carlo moves to rearrange these region with respect to each other. A stochastic chain closure algorithm ensures that no chain breaks occur. An example of such movements in comparison to classical all-atom Molecular Dynamics is shown below.


In this study we used MOSAICS to simulate the detachment of peptides from MHCs for experimentally known binder and non-binder. An example of such a detaching peptide is shown below


Our results show that experimentally known non-binding peptides detach significantly faster from MHC than experimentally known binding peptides (results to be reported soon).

As a first conclusion of this project:
After having worked with both MOSAICS and Molecular Dynamics simulations, I think that both techniques have their advantages and disadvantages. They are summarized below:


Which technique should be chosen for which project depends mainly on what the aims of these projects are. If large moves of well defined segments are expected then MOSAICS might be the method of choice. If the aim is to investigate fine changes and detailed dynamics Molecular Dynamics simulations might be the better choice.


1.    Knapp B, Demharter S, Esmaielbeiki R, Deane CM (2015) Current Status and Future Challenges in T-cell receptor / peptide / MHC Molecular Dynamics Simulations. Brief Bioinform accepted.
2.    Knapp B, Dunbar J, Deane CM (2014) Large Scale Characterization of the LC13 TCR and HLA-B8 Structural Landscape in Reaction to 172 Altered Peptide Ligands: A Molecular Dynamics Simulation Study. PLoS Comput Biol 10: e1003748.
3.    Sim AY, Levitt M, Minary P (2012) Modeling and design by hierarchical natural moves. Proc Natl Acad Sci U S A 109: 2890-2895.
4.    Wilman HR, Ebejer JP, Shi J, Deane CM, Knapp B (2014) Crowdsourcing yields a new standard for kinks in protein helices. J Chem Inf Model 54: 2585-2593.

How to (not) perform a Molecular Dynamics simulation study

At this week’s group meeting I presented my recently published paper:

Knapp B, Dunbar J, Deane CM. Large scale characterization of the LC13 TCR and HLA-B8 structural landscape in reaction to 172 altered peptide ligands: a molecular dynamics simulation study. PLoS Comput Biol. 2014 Aug 7;10(8):e1003748. doi: 10.1371/journal.pcbi.1003748. 2014.

This paper was presented on the front page of PLoS Computational Biology:


The paper describes Molecular Dynamics simulations (MD) of T-cell receptor (TCR) / peptide / MHC complexes.

MD simulation calculate the movement of atoms of a given system over time. The zoomed-in movements of the TCR regions (CDRs) recognizing a peptide/MHC are shown here:

Often scientists perform MD simulations of two highly similar complexes with different immunological outcome. An example is the Epstein-Barr Virus peptide FLRGRAYGL which leads to induction of an immune response when presented by HLA-B*08:01 to the LC 13 TCR. If position 7 of the peptide is mutated to A then the peptide does not induce an immune reaction any-more.
In both simulations the TCR and MHC are identical only the peptide differs in one amino acid. Once the simulations of those two complexes are done one could for example investigate the root mean square fluctuation of both peptides to investigate if one of them is more flexible. And indeed this is the case:


Another type of analysis would be the solvent accessible surface area of the peptide or, as shown below, the CDR regions of the TCR beta chain:


… and again it is obvious that those two simulations strongly differ from each other.

Is it then a good idea to publish an article about these findings and state something like “Our results indicate that peptide flexibility and CDR solvent accessible surface area play an important in the activation of a T cell” ?

As we know from our statistics courses a 1 vs 1 comparison is an anecdotic example but those often do not hold true for larger sample sizes.

So, let’s try it with many more peptides and simulations (performing these simulations took almost 6 months on the Oxford Advanced Research Computing facility). We split the simulations in two categories, those which are more immunogenic and those which are less immunogenic:


Let’s see what happens to the difference in RMSF that we found before:


So much for “More immunogenic peptides are more/less flexible as less immunogenic ones” …

How about the SASA that we “found” before:


… again almost perfectly identical for larger sample sizes.

And yes, we have found some minor differences between more and less immunogenic peptides that hold true for larger sample sizes but non of them was striking enough to predict peptide immunogenicity based on MD simulations.

What one really learns from this study is that you should not compare 1 MD simulation against 1 other MD simulation as this will almost certainly lead to false positive findings. Exactly the same applies for experimental data such as x-ray structures because this is a statistical problem rather than a prediction based on. If you want to make sustainable claims about something that you found then you will need a large sample size and a proper statistical sample size estimation and power analysis (as done in clinical studies) before you run any simulations. Comparing structures is always massive multiple testing and therefore high sample sizes are needed to draw valid conclusions.

link pdf

Journal club (Bernhard Knapp): MMPBSA Binding Free Energy Calculations

This week’s topic of the Journalclub was about Molecular Mechanics Poisson−Boltzmann Surface Area (MMPBSA) binding free energy calculations between ligand and receptor using Molecular Dynamics simultions (MD). As an example I selected:

David W. Wright, Benjamin A. Hall, Owain A. Kenway, Shantenu Jha, and Peter V. Coveney. Computing Clinically Relevant Binding Free Energies of HIV-1 Protease Inhibitors. J Chem Theory Comput. Mar 11, 2014; 10(3): 1228–1241

The first question is: Why do we need such rather complex and computationally expensive approaches if other (e.g. empirical) scoring functions can do similar things? The main challenges thereby is that simple scoring functions often do not work very well for systems where they were not calibrated on (e.g. Knapp et al. 2009 ( The reasons for that are manifold. MD-based approaches can improve two major limitations of classical docking/scoring functions:

1) Proteins are not static. Ligand as well as receptor can undergo various slightly different configurations even for one binding site. Therefore the view of scoring one ligand configuration against one receptor configuration is not the whole picture. The first improvement is to consider a lot of different configurations for one position score of the ligand:


2) A more physics based scoring function can be more reliable than a simple and run-time efficient scoring function. On the basis of the MD simulations a variety of different terms can be deduced. These include:


– MM stands for Molecular Mechanics. It’s internal energy includes bond stretch, bend, and torsion. The electrostatic part is calculated using a Coulomb potential while the Van der Waals term is calculated using a Lennard-Jones potential.
– PB stands for Poisson−Boltzmann. It covers the polar solvation part i.e. the electrostatic free energy of solvation.
– SA stands for Surface Area. It covers the non-polar solvation part via a surface tension weighted solvent accessible surface area calculation.
– TS stands for the entropy loss of the system. This term is necessary because the non-polar solvation incorporates an estimate of the entropy changes implicitly but does not account for an entropy change upon receptor/ligand formation in vacuo. This term is calculated on the basis of a normal mode analysis.

If all these terms are calculated for each single frame of the MD simulations and those single values are averaged an estimate of the binding free energy of the complex can be obtained. However, this estimate might not represent the actual mean of the spatial distribution. Therefore at least 50 replica MD simulations are needed per investigated complex. In this aspect replica means an identically parameterized simultion of the same complex where only the inital forces are assinged randomly.

On the basis of the described MMPBSA-TS approach in combination with 50 replicas the authors achieve a reasonable correlation (0.63) for the 9 FDA-approved HIV-1
protease inhibitors with know experimental binding affinities. If the two largest complexes are excluded the correlation improves to an excellent value (0.93).

In a current study we are using the same methodology for peptide/MHC interactions. This system is completely different from the protease inhibitor study of Wright et al.: The ligands are peptides and the binding site is a groove consisting of two alpha-helices. The methods was applied as it is (without calibration or any kind of training). Prelimiary data still shows a high correlation with experimental values for the peptide/MHC system. This indicates that this MMPBSA approach can yield reliable predictions for very different systems without further modification.


Journal club (Bernhard Knapp): NetMHCIIpan-3.0

This week’s topic of the Journalclub was about the prediction of potential T cell epitopes (=prediction of the binding affinity between peptides and MHC). In this context I presented the latest paper in the NetMHCpan series:

Karosiene E, Rasmussen M, Blicher T, Lund O, Buus S, Nielsen M. NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics. 2013 Oct;65(10):711-24

The reliable prediction of the peptide/MHC binding affinity is already a scientific aim for several years: Early approaches were based on the concept of anchor residues i.e. those peptide side-chain which are pointing in the direction of the MHC. The next “generation” of methods was matrix based i.e. it was simply counted how likely it is that a specific residue is at peptide position 1, 2, 3, … for binding and non-binding peptides of a specific MHC allele. In the next step methods started to incorporate higher order approaches i.e. positions within the peptide influence each other (e.g. SVM, ANN, …). While such methods are already quite reliable their major limitation is that they can only be trained on the basis of sufficient experimental binding data per allele. Therefore a prediction for alleles with only few experimental peptide binding affinities is not possible or rather poor in quality. This is especially import because there are several thousand HLA alleles: The latest version of the IPD – IMGT/HLA lists for example 1740 HLA Class I A and 2329 HLA Class I B proteins ( Some of them are very frequent and others are not but this the aim would be a 100% coverage.

Pan specific approaches go one step further and allow the binding prediction between peptide and an arbitrary MHC allele for which the sequence is known. The NetMHCpan approach is one of them and is based on the extraction of a pseudo sequence (those MHC residues which are in close proximity to the peptide) per MHC allele. Subsequently this MHC pseudo sequence as well as the peptide sequences are encoded using some algorithm e.g. a BLOSUM matrix. Finally the encoded sequences and the experimental binding affinities are fed into an Artificial Neural Network (ANN). For an illustration of the workflow see:

NetMHCpan workflow

This approach turned out to be quite successful and (not too much) depending on the allele the area under ROC reaches in average a healthy 0.8x number which is already quite competitive with experimental approaches.

The netMHCpan series has progressed over the last years starting to cover MHC class I in humans (2007) and beyond (2009). Then MHC class II DR (2010) and in the latest version also DP and DQ (2013). With the next paper expected to be about non human MHC class II alleles the prediction of the binding affinity between peptides and pretty much every MHC should be possible.

Progress report: Molecular Dynamics simulations of TCRpMHC

Last time I gave a presentation about different papers describing Molecular Dynamics simulations of TCRpMHC ( This time I extended this to more of my own work. The focus of this talk was about the motivation why to choose a certain system and which requirements if must fulfíl for reliable conclusions. In short they are:

1) A larger number (>100) of experimental values (e.g. some kind of binding, immunogenicity, etc) of a certain TCRpMHC system. These values should be provided by:
– the same group
– the same HLA/MHC batches
– the same binding conditions
– in the ideal case in the same manuscript

The values should NOT be from a database since different experimental groups come to extremely different results for the same complexes as several examples show e.g.:
The binding IC50 affinity values of PKYVKQNTLKLAT to DRB1*01:01 range from 1nM and 600nM (IEDB entries).

2) Structural data exactly matching the systematic experimental data mentioned above. If multiple structures are involved they should be published by:

– the same group (in the ideal case even published together with the above mentioned data by the same group)
– be contemporary


Data which fulfil the above mentioned criteria is quite hard to find since most biologists are mainly interested in some fancy, however, rather anecdotal examples and rarely in systematic data production.

Once one has found such a system a large number of Molecular Dynamics simulations can be performed which will yield systematic differences not just differences originating from random events or data bias.



Journalclub: Molecular Dynamics simulations of TCRpMHC


T cells recognize fragments of pathogen (peptides) presented by the Major Histocompatibility Complex (MHC) via their T-cell receptor (TCR). This interaction process is commonly considered as one of the most important events taken place in the adaptive immune reaction.


Molecular Dynamics simulations are a computational technique to simulate the movement of atoms over time. For this purpose the interaction energies (bond and non-bond) between the single atoms are calculated and the spatial position are adjusted during each iteration. Such simulations are very resource and time consuming but provide insights into interaction processes which can not be obtained by any currently available experimental technique.

In this journal club we discussed 3 different papers dealing with MD simulations of the TCRpMHC complex:

A typical story

Epitope Flexibility and Dynamic Footprint Revealed by Molecular Dynamics of a pMHC-TCR Complex
Reboul et al., Plos Comp. Biol. 2012

Like similarly done by many other authors before Reboul et al. performed MD simulations of two different (however very similar MHCs) in complex with the same viral peptide. While no immune reaction is caused if the peptide is presented by HLA-B*3501 there is an reaction induced if presented in the context of HLA-B*3508.

In their MD simulations the authors find minor differences in the RMSF and claim this to be systematic and the cause for the different behaviour.

An innovative story

Toward an atomistic understanding of the immune synapse: Large-scale molecular dynamics simulation of a membrane embedded TCR–pMHC–CD4 complex
Wan et al., Molecular Immunology 2008

While several PDB structures of parts of the core of the immunological synapse are available (see image below). On overall structure was not published before this paper. This is addressed by the authors by means of superimposition, modelling of linking and trans-membrane regions, and subsequent MD simulation. The resulting structure seems to be in good agreement with experimental electron microscopy data.


My story

Early relaxation dynamics in the LC 13 T cell receptor in reaction to 172 altered peptide ligands: A molecular dynamics simulation study
Knapp et al., Plos One 2013

In most studies authors compare the same MHC but with two or three different peptides or the same peptide bound to 2 MHCs. In some cases also the same peptide and MHC are simulated in interaction with 2 different TCRs. Given the fact that the TCRpMHC consists of roughly 800 AAs one will almost certainly find some differences between those two or three simulations (multiple testing). Differences would also be present if one simulates the same complex twice with different starting velocities or more extreme even if one parametrizes the same velocities but different hardware is used. Yes, also in this case this may lead to slightly different results. On this basis such studies (if published without further experimental data to undermine the findings) are at best anecdotal stories.

Therefore we indented to address this challenge in a more systematic way: We simulated the LC 13 TCR / HLA-B*08:01 system in complex with all possible single point mutations in the EBV peptide FLRGRAYGL. This leads to a total of 172 highly related MD simulations where for each of them the experimental immunogenicity is known. Based on their immunogencity we assigned each simulations to either the more immunogenic (moreI) or less immunogenic (lessI) group. This was repeated for several thresholds.  Further analysis on the basis of RMSD maps and permutation tests showed that moreI and lessI groups were significantly different in their initial relaxation dynamics from the (perturbed) x-ray structure.


They were not only significantly different but they also showed a quite interesting pattern in their most frequently different regions (highlighted in green):