Author Archives: James Dunbar

Journal Club: AbDesign. An algorithm for combinatorial backbone design guided by natural conformations and sequences

Computational protein design methods often use a known molecule with a well-characterised structure as a template or scaffold. The chosen scaffold is modified so that its function (e.g. what it binds) is repurposed. Ideally, one wants to be confident that the expressed protein’s structure is going to be the same as the designed conformation. Therefore, successfully designed proteins tend to be rigid, formed of collections of regular secondary structure (e.g. α-helices and β-sheets) and have active site shapes that do not deviate far from the scaffold’s backbone conformation (see this review).

A recent paper (Lapidoth et al 2015) from the Fleishman group proposes a new protocol to incorporate backbone variation (read: loop conformations) into computational protein design (Figure 1). Using an antibody as the chosen scaffold, their approach aims to design a molecule that binds a specific patch (epitope) on a target molecule (antigen).


Figure 1 from Lapidoth et al 2015 shows an overview of the AbDesign protocol

Protein design works in the opposite direction to structure prediction: given a structure, find a sequence that will adopt that shape and bind a particular patch in the chosen way. To do this, one first needs to select a shape that could feasibly be achieved in vivo. We would hope that any backbone conformation previously seen in the Protein Data Bank (PDB) belongs to this set of feasible shapes.

Lapidoth et al sample conformations by constructing a backbone torsion angle database derived from known antibody structures in the PDB. From the work of North et al and others we also know that certain loop shapes can be achieved with multiple different sequences (see KK’s recent post). The authors therefore reduce the number of possible backbone conformations by clustering them by structural similarity. Each conformational cluster is described by a representative structure and a position-specific scoring matrix (PSSM). The PSSM captures how the sequence can vary whilst maintaining the shape.
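As a rough illustration of what a PSSM encodes, the sketch below builds a log-odds matrix from the aligned sequences of a single conformational cluster. The loop sequences, the uniform background and the pseudocount are all assumptions made for this example; the PSSMs used in the paper may well be computed differently.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sequences, pseudocount=1.0):
    """Return one dict per position mapping amino acid -> log2-odds score.

    Scores are relative to a uniform background (an assumption of this sketch).
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences in a cluster must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)

    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


# Made-up loop sequences standing in for the members of one cluster.
cluster = ["ARDYW", "ARDFW", "AKDYW", "ARGYW"]
pssm = build_pssm(cluster)
```

Positions where every member of the cluster shares a residue get a strongly positive score for that residue, whereas variable positions spread their scores over several amino acids.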

The Rosetta design pipeline that follows uses the pre-computed torsion database to make a scaffold antibody structure (1x9q) adopt different backbone conformations. Proposed sequence mutations are sampled from the PSSM corresponding to each conformation. Shapes, and the sequences that can adopt them, are ranked with respect to a docked pose with the antigen using several structure-based filters and Rosetta energy scores. A trade-off is made between predicted binding and stability energies using a ‘fuzzy logic’ scheme.
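One way to picture such a trade-off is to squash each energy into the range 0 to 1 with a sigmoid and multiply the results, which behaves like a fuzzy ‘AND’. The sketch below does exactly that. The function names, midpoints and steepness values are hypothetical and are not taken from the AbDesign protocol.

```python
import math


def sigmoid_score(energy, midpoint, steepness):
    """Map an energy to (0, 1); energies well below the midpoint score near 1."""
    return 1.0 / (1.0 + math.exp(steepness * (energy - midpoint)))


def fuzzy_and(binding_energy, stability_energy,
              binding_midpoint=-20.0, stability_midpoint=-400.0,
              binding_steepness=0.5, stability_steepness=0.2):
    """Combine two criteria so that a design must satisfy both to score well.

    All default parameter values are illustrative assumptions.
    """
    binding = sigmoid_score(binding_energy, binding_midpoint, binding_steepness)
    stability = sigmoid_score(stability_energy, stability_midpoint, stability_steepness)
    return binding * stability


# A design that is reasonable on both criteria...
balanced = fuzzy_and(binding_energy=-22.0, stability_energy=-405.0)
# ...versus one that binds very well but is predicted to be unstable.
lopsided = fuzzy_and(binding_energy=-40.0, stability_energy=-380.0)
```

Because the scores are multiplied, an excellent binding energy cannot compensate for a poor stability energy, which is the behaviour one wants from a trade-off of this kind.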

After several rounds of optimisation the pipeline produces a predicted structure and sequence that should bind the chosen epitope patch and fold to form a stable protein when expressed. The benchmark results show promise in terms of structural similarity to known molecules that bind the same site (polar interactions, buried surface area). Sequence similarity between the predicted and known binders is perhaps lower than expected. However, as different natural antibody molecules can bind the same antigen, convergence between a ‘correct’ design and the known binder may not be guaranteed anyway.

In conclusion, my take-home message from this paper is that, to sample backbone conformations sensibly for protein design, one should use the variation seen in known structures. The method presented demonstrates a way of predicting more structurally diverse designs and sampling the sequences that will allow the protein to adopt these shapes. Although, as the authors highlight, it is difficult to assess the performance of the protocol without experimental validation, important lessons can be learned for computational design of both antibodies and general proteins.

Journal Club: Large-scale analysis of somatic hypermutations

This week I presented a paper by Burkovitz et al from Bar-Ilan University in Israel. The study investigates the mutations that occur in B-cell maturation and how the propensity for a change to be selected is affected by where in the antibody structure it is located. It nicely combines analysis of both DNA and amino-acid sequence with structural considerations to inform conclusions about how in vivo affinity maturation occurs.

Before being exposed to an antigen, an antibody has a sequence determined by a combination of genes (V and J for the light chain; V, D and J for the heavy chain). Once exposed, B-cells (the cells that produce antibodies) undergo somatic hypermutation (SHM) to optimise the antibody-antigen (ab-ag) interaction. These mutations are commonly thought to be promoted at activation-induced cytidine deaminase (AID) hotspots.

The authors’ first finding is that the locations of SHMs do not correlate well with the positions of AID hotspots, and that the distribution of their distances to a hotspot is not much different from the background distribution. They conclude that, although AID hotspots may be a mechanism to promote mutation, they are not a strong indicator of whether a mutation will fix.
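To make the hotspot comparison concrete, here is a minimal sketch that locates hotspot motifs in a DNA sequence and measures how far a mutated position lies from the nearest one. It assumes the commonly quoted WRCY motif and its reverse complement RGYW (W = A/T, R = A/G, Y = C/T), and it measures distance to the start of the motif; the paper may define hotspots and distances differently. The sequence is made up.

```python
import re

# Lookahead so that overlapping motifs are all reported.
# WRCY -> [AT][AG]C[CT]; reverse complement RGYW -> [AG]G[CT][AT]
HOTSPOT = re.compile(r"(?=([AT][AG]C[CT]|[AG]G[CT][AT]))")


def hotspot_positions(dna):
    """Return the start index of every WRCY/RGYW motif in the sequence."""
    return [match.start() for match in HOTSPOT.finditer(dna.upper())]


def distance_to_nearest_hotspot(position, hotspots):
    """Nucleotide distance from a mutated position to the closest motif start."""
    if not hotspots:
        return None
    return min(abs(position - start) for start in hotspots)


dna = "GGGGAACTGGGG"  # made-up sequence containing a single AACT (WRCY) motif
hotspots = hotspot_positions(dna)
distance = distance_to_nearest_hotspot(10, hotspots)
```

Collecting these distances for observed SHMs and for randomly chosen positions would give the two distributions being compared.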

Motivated to find other determinants for SHM preferences, the study turns to examining structural features and energetics of the molecules. SHMs are found to be more prevalent on the VH domain of an Fv than the VL. However, when present, the energetic importance of an SHM is not related to the domain it is on. In contrast, the contribution an SHM makes to the binding energy is related to its structural location. As one might expect, those SHMs in positions that can make contact with the antigen have more effect than those that do not. Consideration of their propensity instead of raw frequency also shows that SHMs are more prevalent in antibody-antigen interfaces than in the rest of the molecule. However, they are also likely to occur in the VH-VL interface, suggesting an importance for this region in fine-tuning the geometry and flexibility of the binding site.
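The difference between raw frequency and propensity can be shown with a small calculation. The sketch below assumes propensity is the fraction of SHMs falling in a region divided by the fraction of residues that region contains, so that values above 1 indicate enrichment. The region names and counts are invented for illustration and are not the paper’s data.

```python
def propensity(shm_counts, region_sizes):
    """Fraction of SHMs in each region divided by the region's share of residues."""
    total_shm = sum(shm_counts.values())
    total_size = sum(region_sizes.values())
    return {
        region: (shm_counts[region] / total_shm) / (region_sizes[region] / total_size)
        for region in shm_counts
    }


# Hypothetical numbers: most SHMs fall outside the interfaces in absolute
# terms, but the interfaces are much smaller regions.
shm_counts = {"antigen_interface": 30, "vh_vl_interface": 20, "rest": 50}
region_sizes = {"antigen_interface": 20, "vh_vl_interface": 20, "rest": 160}

result = propensity(shm_counts, region_sizes)
```

With these made-up numbers the ‘rest’ region holds half of all mutations yet has a propensity below 1, while both interface regions come out enriched.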


Figure taken from Burkovitz et al shows a) the location of different structural regions on the Fv b) the energetic contribution of the SHMs in each region c) the fraction of SHMs in the regions and their relative size d) the propensity for an SHM to occur in each of the five structural regions.

Perhaps the most interesting result of this study is the authors’ conclusions about the propensity of SHMs to mutate germline residues to particular amino acids. It is found that whilst germline amino-acid usage in binding sites is distinct from that of other protein-protein interfaces, the residue profiles of SHMs are less diverged. They therefore act to bring the properties of the ab-ag interaction towards those seen in normal interactions. This may suggest, as proposed by other studies, that the somatic hypermutation process is similar to the mutation process observed in evolution. In addition, it is found that five amino acids (asparagine, arginine, serine, threonine and aspartic acid) are the most common substitutions made in SHM. Finally, positions where SHMs most often have an important effect on binding energy are presented. These positions, and the amino-acid preferences, provide promising targets for use in rational antibody design procedures.

[Database] SAbDab – the Structural Antibody Database

An increasing proportion of our research at OPIG is about the structure and function of antibodies. Compared to other types of proteins, there is a large number of antibody structures publicly available in the PDB (approximately 1.8% of structures contain an antibody chain). For those of us working in the fields of antibody structure prediction, antibody-antigen docking and structure-based methods for therapeutic antibody design, this is great news!

However, we find that these data are not in a standard format with respect to antibody nomenclature. For instance, which chains are “heavy” chains and which are “light”? Which heavy and light chains pair? Is there an antigen present? If so, to which H-L pair does it bind? Which numbering system is used? And so on.

To address this problem, we have developed SAbDab: the Structural Antibody Database. Its primary aim is to make it easy to create datasets of antibody structures and antibody-antigen complexes for further analysis by researchers such as ourselves. These sets can be selected using a number of criteria (e.g. experimental method, species, presence of constant domains…) and redundancy filters can be applied over the sequences of both the antibody and antigen. Thanks to Jin, SAbDab now also includes associated curated affinity (Kd) values for around 190 antibody-antigen complexes. We hope this will serve as a benchmarking tool for antibody-antigen docking prediction algorithms.
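Once a dataset has been downloaded, the same kind of selection can be repeated locally. The sketch below filters a tab-separated summary table for antibody-antigen complexes below a resolution cut-off. The column names (pdb, Hchain, Lchain, antigen_chain, method, resolution), the use of ‘NA’ for missing values and the three rows are assumptions made for this example, so check them against the file you actually download.

```python
import csv
import io

# Placeholder rows; the identifiers are not real PDB entries.
SUMMARY_TSV = (
    "pdb\tHchain\tLchain\tantigen_chain\tmethod\tresolution\n"
    "ex01\tH\tL\tA\tX-RAY DIFFRACTION\t1.9\n"
    "ex02\tH\tL\tNA\tX-RAY DIFFRACTION\t3.4\n"
    "ex03\tB\tA\tC\tSOLUTION NMR\tNA\n"
)


def select_complexes(tsv_text, max_resolution=3.0):
    """Return identifiers of entries with an antigen and a good enough resolution."""
    selected = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["antigen_chain"] == "NA":
            continue  # no antigen recorded for this H-L pair
        try:
            resolution = float(row["resolution"])
        except ValueError:
            continue  # e.g. NMR entries with no resolution value
        if resolution <= max_resolution:
            selected.append(row["pdb"])
    return selected


complexes = select_complexes(SUMMARY_TSV)
```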


Alternatively, the database can be used to inspect and compare properties of individual structures. For instance, we have recently published a method to characterise the orientation between the two antibody variable domains, VH and VL. Using the ABangle tool, users can select structures with a particular VH-VL orientation, visualise and quantify conformational changes (e.g. between bound and unbound forms) and inspect the pose of structures with certain amino acids at specific positions. Similarly, the CDR (complementarity determining region) search and clustering tools allow the antibody hyper-variable loops to be selected by length, type and canonical class, and their structures visualised or downloaded.



SAbDab also contains features such as the template search. This allows a user to submit the sequence of either an antibody heavy or light chain (or both) and to find structures in the database that may offer good templates to use in a homology modelling protocol. Specific regions of the antibody can be isolated so that structures with a high sequence identity over, for example, the CDR H3 loop can be found. SAbDab’s weekly automatic updates ensure that it contains the latest available data. Using each method of selection, the structure, a standardised and re-numbered version of the structure, and a summary file containing information about the antibody can be downloaded either individually or en masse as a dataset. SAbDab will continue to develop with new tools and features and is freely available at:

Journal Club: Protein structure model refinement using fragment-guided MD

For this week’s journal club I presented this paper by Jian et al from the Zhang Lab. The paper tackles the problem of refining protein structure models using molecular dynamics (MD).

The most successful protocols for building protein structure models have been template-based methods. These involve selecting all or part of known protein structures and assembling them to form an initial model of the target sequence of interest. This initial model can then be altered or refined to (hopefully) achieve higher accuracy. One method to make these adjustments is to use molecular dynamics simulations to sample different conformations of the structure. A “refined” model can then be taken as a low-energy state that the simulation converges to. However, whilst physics-based potentials are effective for certain aspects of refinement (e.g. relieving clashes between side chain atoms), the task of actually improving overall model quality has so far proved to be too ambitious.

The Method

In this paper entitled “Atomic-Level Protein Structure Refinement Using Fragment-Guided Molecular Dynamics Conformation Sampling,” Jian et al demonstrate that current MD refinement methods have little guarantee of making any improvement to the accuracy of the model. They therefore introduce a technique of supplementing physics-based potentials with knowledge about fragments of structures that are similar to the protein of interest.

The method works by using the initial model to search for similar structures within the PDB. These are found using two regimes. The first is to search for global templates by assessing the TM-score of structures to the whole initial model. The second is to search for fragment templates by dividing the initial model into fragments of three consecutive secondary structure elements. From these sets of templates and the initial model, the authors generate a bespoke potential for the model based on the distances between Cα atoms. By doing this, additional information about the likely global topology of the protein can be incorporated into a molecular dynamics simulation. The authors claim that the MD energy landscape is thereby reshaped from being “golf-course-like” to being “funnel-like”. Essentially, the MD simulations are guided to sample conformations which are likely (as informed by the fragments) to be close to the target protein structure.
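The idea of a distance-based potential can be sketched in a few lines. The example below takes Cα–Cα distances from a template and penalises a model for deviating from them with a harmonic term. The harmonic form, the force constant and the minimum sequence separation are assumptions chosen for simplicity and are not the functional form used in FG-MD; the coordinates are made up.

```python
import math


def template_distances(template_ca, min_separation=3):
    """Collect reference distances between Calpha atoms at least
    `min_separation` residues apart in the template."""
    restraints = {}
    n_residues = len(template_ca)
    for i in range(n_residues):
        for j in range(i + min_separation, n_residues):
            restraints[(i, j)] = math.dist(template_ca[i], template_ca[j])
    return restraints


def restraint_energy(model_ca, restraints, force_constant=1.0):
    """Sum of harmonic penalties for deviating from the reference distances."""
    energy = 0.0
    for (i, j), reference in restraints.items():
        current = math.dist(model_ca[i], model_ca[j])
        energy += force_constant * (current - reference) ** 2
    return energy


# Made-up Calpha coordinates (in angstroms) for a five-residue template.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
            (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
# The same chain with its last residue displaced.
distorted = template[:4] + [(15.2, 4.0, 0.0)]

restraints = template_distances(template)
native_like = restraint_energy(template, restraints)
perturbed = restraint_energy(distorted, restraints)
```

Adding a term like this to a physics-based energy function means that conformations which preserve the template-derived distances are favoured, which is one way to picture the landscape becoming more funnel-like.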


A schematic of the FG-MD refinement procedure

Does it work?

As a full solution to the problem of protein structure model refinement, the results are far from convincing. Quality measures show improvement in only the second or third decimal place from the initial model to the refined model. Also, as might be expected, the degree to which the model quality is improved is dependent on the accuracy of the initial model.
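For a sense of scale, recall how the TM-score used earlier for the template search is defined: each aligned residue contributes 1 / (1 + (d / d0)^2), where d is its distance from the native position after superposition and d0 = 1.24 (L − 15)^(1/3) − 1.8 depends on the target length L. The sketch below evaluates this for a given list of distances. It assumes the superposition has already been done (the real score maximises over superpositions) and simply refuses very short chains, which need special handling.

```python
def tm_d0(target_length):
    """Length-dependent distance scale of the TM-score."""
    if target_length < 20:
        raise ValueError("short chains need special handling, omitted in this sketch")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    `distances` holds one model-to-native distance (angstroms) per aligned
    residue; unaligned residues simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)
halfway = tm_score([tm_d0(100)] * 100, 100)
```

Since a residue sitting exactly d0 away from its native position still contributes one half, a change in the third decimal place corresponds to a very small shift in the coordinates.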

However, what is important about this paper is that, although small, the improvements made are systematic. Previously, attempts to refine a model using MD not only failed to improve its accuracy but were likely to reduce its quality. Fragment-guided MD (FG-MD), together with the explicit inclusion of a hydrogen bonding potential, is able not only to improve the conformations of side chains but also to improve (or at least not destroy) the global backbone topology of a model.

Dependence of the energy of a model on its TM-score to the native structure. In black is the energy as measured using the AMBER99 energy function. In grey is the corresponding funnel-like shape of the FG-MD energy function.

This paper therefore lays the groundwork for the development of further refinement methods that incorporate the knowledge from protein structure fragments with atomically detailed energy functions. Given that the success of the method is related to the accuracy of the initial model, there may be scope for developing similar techniques to refine models of specific proteins where modelling quality is already good, e.g. antibodies.