Antibody modeling via AMA II and RosettaAntibody

Intro

Protein modeling is one of the most challenging problems in bioinformatics. We still lack a clear theoretical framework that would allow us to link a linear protein sequence to its native 3D coordinates. Given that we only have structures for about one per mille (0.1%) of the known sequences, homology modeling is still one of the most successful methods to obtain a structure from a sequence. Currently, using homology modeling and the 1393 known folds, we can produce models for more than half of the known domains. In many cases this is good enough to get an overall idea of the fold, but for actual therapeutic applications there is still a need for high-resolution modeling.

There is one group of molecules whose properties can be readily exploited via computational approaches for therapeutic applications: antibodies. With blockbuster drugs such as Humira, Avastin or Remicade, they are the leading class of biopharmaceuticals. Antibodies share a great degree of similarity with one another (>50-60% sequence identity) and there are at least 1865 antibody structures in the PDB. Therefore, homology modeling of these structures at high resolution becomes tractable, as exemplified by WAM and PIGS. Here, we will review the antibody modeling paradigm using one of the most successful antibody modeling tools, RosettaAntibody, and conclude with the most recent progress from AMA II (the second Antibody Modeling Assessment, the antibody counterpart of CASP).

General Antibody-antigen modeling

Modeling of antibody structures can be divided into the following steps:

  1. Identification of the Framework template
  2. Optimization of the Vh/Vl orientation of the template
  3. Modeling of the non-H3 CDRs
  4. Modeling of H3

Most of the diversity of antibodies is found in the CDRs. Therefore, the bulk of the protein can be readily copied from the framework template. The copied framework, however, still needs its Vh/Vl orientation optimized. Prediction of the CDRs is more complicated since they are much more variable than the rest of the protein. Non-H3 CDRs can be modeled using the canonical structure paradigm. Prediction of H3 is much more difficult since it does not appear to follow the canonical rules.

Once the entire structure is assembled, it is recommended to refine it by relaxing the structure under the guidance of an energy function.

RosettaAntibody

The RosettaAntibody protocol roughly follows the steps described above. First, appropriate templates are identified by the highest BLAST bit scores. The best-scoring heavy and light chains are then aligned to the best-BLAST-scoring Fv region. The knowledge base here is a set of 569 antibody structures from SACS with resolutions of 3.5 Å or better. The Vh/Vl orientation is subsequently refined using local relaxation, guided by CHARMM.
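The template selection rule can be illustrated with a minimal Python sketch. It assumes the BLAST hits have already been parsed into records; the `BlastHit` class, its field names and the template identifiers are hypothetical and are not part of RosettaAntibody. The sketch only shows the idea of keeping structures at 3.5 Å or better and taking the highest bit score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One parsed BLAST hit (hypothetical record, for illustration only)."""
    template_id: str    # made-up identifier, e.g. structure code plus chain
    bit_score: float    # BLAST bit score of the alignment
    resolution: float   # resolution of the template structure, in angstroms


def pick_framework_template(hits, max_resolution=3.5):
    """Return the hit with the highest bit score among sufficiently resolved templates."""
    eligible = [h for h in hits if h.resolution <= max_resolution]
    if not eligible:
        raise ValueError("no template passes the resolution cutoff")
    return max(eligible, key=lambda h: h.bit_score)


# Made-up hits for a heavy chain query.
hits = [
    BlastHit("tmplA_H", 182.0, 2.1),
    BlastHit("tmplB_H", 197.5, 4.0),  # best score, but resolution worse than 3.5 A
    BlastHit("tmplC_H", 190.3, 2.8),
]
best = pick_framework_template(hits)
print(best.template_id)
```

In this toy input the top-scoring hit is rejected by the resolution filter, so the second-best score is selected.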

Non-H3 CDRs are modeled using the highest-scoring BLAST hit of the same length. Canonical structure information is not taken into account. Loops are grafted onto the framework using the residues that overlap with the anchors.
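The same-length rule for non-H3 CDRs can be sketched in the same spirit. The function name, the tuple layout and the example loops below are assumptions made for illustration; the sketch covers only the selection step, not the grafting.

```python
def pick_cdr_template(query_loop, hits):
    """Pick the best-scoring hit whose loop has the same length as the query loop.

    hits: iterable of (template_id, loop_sequence, bit_score) tuples
    (hypothetical layout). Returns None if no hit has a matching length.
    """
    same_length = [h for h in hits if len(h[1]) == len(query_loop)]
    if not same_length:
        return None
    return max(same_length, key=lambda h: h[2])


# Made-up loop sequences and scores.
l1_hits = [
    ("tmplA_L", "RASQDISNYLN", 24.3),        # 11 residues
    ("tmplB_L", "RSSQSLVHSNGNTYLH", 27.9),   # 16 residues, wrong length
    ("tmplC_L", "RASQSISSYLN", 25.8),        # 11 residues
]
chosen = pick_cdr_template("RASQGISSWLA", l1_hits)
print(chosen[0])
```

Note that the hit with the highest overall score is skipped because its loop length differs from that of the query.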

H3 loops are modeled using a fragment-based approach. The fragment library is the Rosetta library supplemented with H3 fragments from the knowledge base of antibody structures created for the RosettaAntibody study. The low-resolution search consists of Monte Carlo attempts to fit 3-residue fragments, followed by Cyclic Coordinate Descent (CCD) loop closure. This is followed by a high-resolution search in which the H3 loop and Vh/Vl are repacked using a variety of moves.
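To make the low-resolution search more concrete, here is a toy Monte Carlo fragment insertion loop in Python. It is a sketch under several assumptions: the loop is reduced to a list of (phi, psi) torsion pairs, the score is a made-up function rather than the Rosetta one, the acceptance rule is the standard Metropolis criterion (the text above does not say which rule is used), and CCD loop closure is left out entirely.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """Metropolis rule: always accept downhill moves, uphill with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def fragment_insertion_search(torsions, fragments, score, steps=1000,
                              temperature=1.0, seed=0):
    """Repeatedly overwrite a random window with a random fragment, keeping the best state seen."""
    rng = random.Random(seed)
    current = list(torsions)
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        fragment = rng.choice(fragments)
        start = rng.randrange(len(current) - len(fragment) + 1)
        trial = current[:start] + list(fragment) + current[start + len(fragment):]
        trial_score = score(trial)
        if metropolis_accept(trial_score - current_score, temperature, rng):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Toy set-up: the score is the squared torsion deviation from a made-up target loop.
target = [(-60.0, -45.0)] * 9


def toy_score(torsions):
    return sum((phi - t_phi) ** 2 + (psi - t_psi) ** 2
               for (phi, psi), (t_phi, t_psi) in zip(torsions, target))


fragments = [                       # three made-up 3-residue fragments
    [(-60.0, -45.0)] * 3,
    [(-120.0, 130.0)] * 3,
    [(60.0, 45.0)] * 3,
]
start = [(-120.0, 130.0)] * 9
best, best_score = fragment_insertion_search(start, fragments, toy_score,
                                             steps=500, temperature=50.0)
print(best_score <= toy_score(start))
```

Because the search keeps track of the best state it has visited, the returned score can never be worse than that of the starting conformation, whatever the temperature.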

Each decoy coming from the repacking is scored using the Rosetta scoring function. The lower the Rosetta score, the better the decoy (according to Rosetta).
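Selecting the final model then amounts to taking the decoy with the minimum score. A minimal sketch, assuming the scores have been collected into a dictionary (the decoy names and score values below are made up):

```python
def best_decoy(scores):
    """Return (name, score) of the lowest-scoring decoy; lower is better."""
    if not scores:
        raise ValueError("no decoys to rank")
    return min(scores.items(), key=lambda item: item[1])


# Made-up decoy scores.
decoys = {"decoy_0001": -312.4, "decoy_0002": -298.7, "decoy_0003": -325.1}
name, score = best_decoy(decoys)
print(name, score)
```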

Results

RosettaAntibody can produce high-quality models (1.4 Å) on its 54-structure benchmark set. The major limitation of the method (as with any other antibody modeling method) is H3 loop modeling. H3 is believed to be the most important loop, which makes getting it right the central challenge.

Having the right framework and the correct Vh/Vl orientation has a great effect on the quality of H3 predictions. When H3 is modeled using the correct framework, the predictions are an order of magnitude better than when using the homology model. This was demonstrated by the native recovery experiments in the RosettaAntibody study, as well as during ‘Step II’ of the Antibody Modeling Assessment, where participants were asked to model H3 using the correct framework.
