In what is becoming the highlight of the year and a regular occurrence for the OPIGlets, Le Tour de Farce – The annual OPIG bike ride, took place on the 4th of June. Now in its 2.0 revision but maintaining a route similar to last year, 9.5 miles and several pints later, approximately 20 of us took in some distinctly pretty Oxfordshire scenery, not to mention The White Hart, The Trout, Jacobs Inn and for some, The One and The Punter too.
Protein modeling is one of the most challenging problems in bioinformatics. We still lack a clear theoretical framework which would allow us to link linear protein sequence to its native 3D coordinates. Given that we only have the structures for about a promile of the known seqs, homology modeling is still one of the most successful methods to obtain a structure from a sequence. Currently, using homology modeling and the 1393 known folds we can produce models for more than half known domains. In many cases this is good enough to get an overall idea of the fold but for actual therapeutic applications, there is still a need for high-resolution modeling.
There is one group of molecules whose properties can be readily exploited via computational approaches for therapeutic applications: antibodies. With blockbuster drugs such as Humira, Avastin or Remicade, they are the leading class of biopharmaceuticals. Antibodies share a great degree of similarity with one another (<50-60% sequence identity) and there are at least 1865 antibody structures in the PDB. Therefore, homology modeling of these structures at high resolution becomes tractable, as exemplified by WAM and PIGS. Here, we will review the antibody modeling paradigm using one of the most successful antibody modeling tools, RosettaAntibody, concluding with the most recent progress from AMA II (antibody CASP).
General Antibody-antigen modeling
Modeling of antibody structures can be divided into the following steps:
- Identification of the Framework template
- Optimizing Vh/Vl orientation of the template
- Modeling of the non-H3 CDRs
- Modeling of H3
Most of the diversity of antibodies can be found in the CDRs. Therefore, the bulk of the protein can be readily copied from the framework region. This however needs to undergo an optimization of the Vh/Vl orientation. Prediction of the CDRs is more complicated since they are much more variable than the rest of the protein. Non-H3 CDRs can be modeled using canonical structure paradigms. Prediction of H3 is much more difficult since it does not appear to follow the canonical rules.
When the entire structure is assembled, it is recommended to perform refinement using some sort of relaxation of the structure, coupled with an energy function which should guide it.
RosettaAntibody protocol roughly follows this described above. In the first instance, an appropriate template is identified by highest BLAST bit scores. The best heavy and light chains aligned to the best-BLAST-scoring Fv region. The knowledge-base here is a set of 569 antibody structures form SACS with resolutions 3.5A and better. The Vh/Vl orientation is subsequently refined using local relaxation, guided by Charmm.
Non-H3 CDRs are modeled using the highest-scoring BLAST hit of the same length. Canonical information is not taken into account. Loops are grafted on the framework using the residues overlapping with the anchors.
H3 loops are modeled using a fragment based approach. The fragment library is Rosetta+H3 from the knowledge base of antibody structures created for the purpose of this study. The low-resolution search consists of Monte Carlo attempts to fit 3-residue fragments followed by Cyclic Coordinate Descent loop closure. This is followed by high resolution search when the H3 loop and Vh/Vl are repacked using a variety of moves.
Each decoy coming from the repacking is scored using Rosetta function. The lower the Rosetta score the better the decoy (according to Rosetta).
RosettaAntibody can produce high-quality models (1.4A) on its 54 structure benchmark test. The major limitation of the method (just like any other antibody modeling method) is the H3 loop modeling. It is believed that H3 is the most important loop and therefore getting this loop right is a major challenge.
Right framework and the correct orientation of Vh/Vl have a great effect on the quality of H3 predictions. When the H3 was modeled on using the correct framework, the predictions are order of magnitude better than by using the homology model. This was demonstrated using the native recovery in RosettaAntibody study as well as during ‘Step II’ of the Antibody Modeling assessment where participants were asked to model H3 using the correct framework.
This week’s topic of the Journalclub was about Molecular Mechanics Poisson−Boltzmann Surface Area (MMPBSA) binding free energy calculations between ligand and receptor using Molecular Dynamics simultions (MD). As an example I selected:
David W. Wright, Benjamin A. Hall, Owain A. Kenway, Shantenu Jha, and Peter V. Coveney. Computing Clinically Relevant Binding Free Energies of HIV-1 Protease Inhibitors. J Chem Theory Comput. Mar 11, 2014; 10(3): 1228–1241
The first question is: Why do we need such rather complex and computationally expensive approaches if other (e.g. empirical) scoring functions can do similar things? The main challenges thereby is that simple scoring functions often do not work very well for systems where they were not calibrated on (e.g. Knapp et al. 2009 (http://www.ncbi.nlm.nih.gov/pubmed/19194661)). The reasons for that are manifold. MD-based approaches can improve two major limitations of classical docking/scoring functions:
1) Proteins are not static. Ligand as well as receptor can undergo various slightly different configurations even for one binding site. Therefore the view of scoring one ligand configuration against one receptor configuration is not the whole picture. The first improvement is to consider a lot of different configurations for one position score of the ligand:
2) A more physics based scoring function can be more reliable than a simple and run-time efficient scoring function. On the basis of the MD simulations a variety of different terms can be deduced. These include:
– MM stands for Molecular Mechanics. It’s internal energy includes bond stretch, bend, and torsion. The electrostatic part is calculated using a Coulomb potential while the Van der Waals term is calculated using a Lennard-Jones potential.
– PB stands for Poisson−Boltzmann. It covers the polar solvation part i.e. the electrostatic free energy of solvation.
– SA stands for Surface Area. It covers the non-polar solvation part via a surface tension weighted solvent accessible surface area calculation.
– TS stands for the entropy loss of the system. This term is necessary because the non-polar solvation incorporates an estimate of the entropy changes implicitly but does not account for an entropy change upon receptor/ligand formation in vacuo. This term is calculated on the basis of a normal mode analysis.
If all these terms are calculated for each single frame of the MD simulations and those single values are averaged an estimate of the binding free energy of the complex can be obtained. However, this estimate might not represent the actual mean of the spatial distribution. Therefore at least 50 replica MD simulations are needed per investigated complex. In this aspect replica means an identically parameterized simultion of the same complex where only the inital forces are assinged randomly.
On the basis of the described MMPBSA-TS approach in combination with 50 replicas the authors achieve a reasonable correlation (0.63) for the 9 FDA-approved HIV-1
protease inhibitors with know experimental binding affinities. If the two largest complexes are excluded the correlation improves to an excellent value (0.93).
In a current study we are using the same methodology for peptide/MHC interactions. This system is completely different from the protease inhibitor study of Wright et al.: The ligands are peptides and the binding site is a groove consisting of two alpha-helices. The methods was applied as it is (without calibration or any kind of training). Prelimiary data still shows a high correlation with experimental values for the peptide/MHC system. This indicates that this MMPBSA approach can yield reliable predictions for very different systems without further modification.
Protein and RNA structures are built up in a hierarchical fashion: from linear chains and random coils (primary) to local substructures (secondary) that make up a subunit’s 3D geometry (tertiary) which in turn can interact with additional subunits to form homomeric or heteromeric multimers (quaternary). The metastable nature of the folded polymer enables it to carry out its function repeatedly while avoiding aggregation and degradation. These functions often rely on structural motions that involve multiple scales of conformational changes by moving residues, secondary structure elements, protein domains or even whole subunits collectively around a small set of degrees of freedom.
The modular architecture of antibodies, makes them amenable to act as an example for this phenomenon. Using MD simulations and fluorescence anisotropy experiments Kortkhonjia et al. observed that Ig domain motions in their antibody of interest were shown to correlate on two levels: 1) with laterally neighbouring Ig domains (i.e. VH with VL and CH1 with CL) and 2) with their respective Fab and Fc regions.
This begs the question: Can we exploit these molecular properties to reduce dimensionality and overcome energy barriers when sampling the functional motions of metastable proteins?
In 2012 Sim et al. have published an approach that allows for the incorporation of these collective motions (they call them “Natural Moves”) into simulation. Using simple RNA model structures they have shown that explicitly sampling large structural moves can significantly accelerate the sampling process in their Monte Carlo simulation. By gradually introducing DOFs that propagate increasingly large substructures of the molecule they managed to reduce the convergence time by several orders of magnitude. This can be ascribed to the resulting reduction of the search space that narrows down the sampling window. Instead of sampling all possible conformations that a given polynucleotide chain may take, structural states that differ from the native state predominantly in tertiary structure are explored.
It is important to stress, however, that in addition to these rigid body moves local flexibility is maintained by preserving residue level flexibility. Consequently, the authors argue, high energy barriers resulting from large structural rearrangements are reduced and the resulting energy landscape is smoothened. Therefore, entrapment in local energy minima becomes less likely and the acceptance rate of the Monte Carlo simulation is improved.
Although benchmarking of this method has mostly relied on case studies involving model RNA structures with near perfect symmetry, this method has a natural link to near-native protein structure sampling. Similarly to RNA, proteins can be decomposed into local substructures that may be responsible for the main functional motions in a given protein. However, due to the complexity of protein motion and limited experimental data we have a limited understanding of protein dynamics. This makes it a challenging task to identify suitable decompositions. As more dynamic data emerges from biophysical methods such as NMR spectroscopy and databases such as www.dynameomics.org are extended we will be able to better approximate protein motions with Natural Moves.
In conclusion, when applied to suitable systems and when used with care, there is an opportunity to breathe life into the static macromolecules of the pdb, which may help to improve our understanding of the heterogeneous structural landscape and the functional motions of metastable proteins and nanomachines.