Monthly Archives: March 2014

Structural Biology Module @ the DTC

As part of the DTC Structural Biology module (Feb 2014), first-year PhD students were given 3 days to answer one of several questions from fields within structural biology. The format had to be an automated presentation, and it had to be ENTERTAINING.

Video 1: Is Your Ligand Really There?

The pilot episode of the award-winning series “Protein Hour”…

Video 2: Protein-Protein Docking

Do not attempt to spoof “The Matrix” – That is impossible…

Video 3: Are Membrane Proteins Special?

An appeal from “Protein Relief 2014”…

Video 4: Structure-based and fragment-based drug design – do they really work?

Is stop-motion animation the next blockbuster in drug design?

Journal Club: Native contacts in protein folding

Like your good old headphone cables, strings of amino acids have the potential to fold into a vast number of different conformations given the appropriate conditions. By a conservative estimate, the time it would take a 100-residue protein to explore all theoretically possible conformations would exceed the age of the Universe several times over. Such an exhaustive search is obviously not feasible, as Levinthal pointed out when he published his “How To Fold Graciously” in 1969.
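To get a feel for the numbers, here is a back-of-the-envelope version of that estimate. The three inputs (conformations per residue, sampling rate, age of the Universe) are assumed illustrative values, not figures taken from Levinthal's paper, so treat the result as an order-of-magnitude sketch only.

```python
# Back-of-the-envelope Levinthal estimate.
# All inputs below are assumed, illustrative values.
RESIDUES = 100
STATES_PER_RESIDUE = 3        # assumed number of conformations per residue
SAMPLES_PER_SECOND = 1e13     # assumed conformations explored per second
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # roughly 4.4e17 seconds


def levinthal_search_time(residues, states_per_residue, samples_per_second):
    """Seconds needed to visit every conformation once, one after another."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second


seconds = levinthal_search_time(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = seconds / AGE_OF_UNIVERSE_S
print(f"search time: {seconds:.2e} s, i.e. {ratio:.2e} ages of the Universe")
```

Even with only three states per residue, the exhaustive search overshoots the age of the Universe by many orders of magnitude, which is the heart of the paradox.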

The so-called protein-folding problem has been under intense study ever since, which has inevitably led to a number of theories and models about its nature. Because appropriate wet-lab methods for studying this phenomenon are lacking, theoretical and computational approaches have been key to devising impactful frameworks for formally describing protein folding. One of these is the principle of minimal frustration, introduced by Bryngelson and Wolynes in the late 80s (1). It states that evolution has enriched proteins for sequences with the propensity to fold into low-energy structures, while actively selecting against traps. By avoiding mis-folding and non-native contacts, the theory says, a smooth funnel-like energy landscape with native-state minima is created that ensures robust and fast folding.

This implies that native contacts, i.e. contacts between residues that interact in the fully folded protein, play a major role in the folding process. Gō models (2), named after Nobuhiro Gō who first proposed this method, are built around this assumption, with the energetic contributions of native interactions acting as the sole driving forces in the folding process. While this approach has yielded promising results, many of which agreed with experiments, its underlying principles had never been validated in a statistically meaningful way.

A schematic for native-contact-driven protein folding

In 2013, a study by Best, Hummer and Eaton (3) formally addressed this question. By devising a set of statistical quantities aimed at weighting the importance of native and non-native interactions for folding, and applying these to the analysis of several long MD folding simulations, they were able to show a “native-centric mechanism” for small fast-folding proteins.

In a first step, the authors assessed whether the fraction of native contacts provided a suitable reaction coordinate for the simulated folding events. From their equilibrium simulations, two thresholds of the native-contact fraction were chosen to define the folded and unfolded states (a two-state model is assumed). Comparing the most frequently visited native-contact fractions with these thresholds revealed a strong correlation between the two maxima of the equilibrium probability density and the protein’s fold state. In addition, they showed that native-contact fractions lying between the unfolded and folded thresholds were indicative of being on a transition path (defined as the “.. regions of the trajectories that cross directly from the unfolded well to the folded well ..”).
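As a rough illustration of the reaction coordinate, the sketch below computes a fraction of native contacts with a simple hard distance cut-off. The cut-off, the sequence-separation rule, the two thresholds and all function names are assumptions made for this example; the paper's own definition of a contact may well differ.

```python
import math


def native_contact_pairs(native, cutoff=8.0, min_separation=3):
    """Index pairs (i, j) closer than `cutoff` in the native structure,
    skipping residues that are near neighbours in sequence."""
    pairs = []
    for i in range(len(native)):
        for j in range(i + min_separation, len(native)):
            if math.dist(native[i], native[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def fraction_native_contacts(frame, pairs, cutoff=8.0):
    """Fraction of the native contact pairs that are also formed in `frame`."""
    if not pairs:
        raise ValueError("the native structure has no contacts")
    formed = sum(1 for i, j in pairs if math.dist(frame[i], frame[j]) < cutoff)
    return formed / len(pairs)


def classify(q, unfolded_below=0.3, folded_above=0.8):
    """Two-state labelling with assumed (hypothetical) thresholds."""
    if q <= unfolded_below:
        return "unfolded"
    if q >= folded_above:
        return "folded"
    return "intermediate"


# Toy six-residue hairpin (one point per residue) and a fully extended chain.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
extended = [(3.8 * k, 0.0, 0.0) for k in range(6)]

pairs = native_contact_pairs(native)
for name, frame in (("native", native), ("extended", extended)):
    q = fraction_native_contacts(frame, pairs)
    print(name, q, classify(q))
```

Note that a value between the two thresholds only makes a frame a candidate for a transition path; by the definition quoted above, the trajectory must also cross directly from one well to the other.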

A further measure was introduced with the contact lifetime test. The log-ratio of the time a contact spent on a transition path vs the time it existed in the unfolded state was calculated and compared in a heat-map to the native contact map coloured by the number of contacts between residues.


Contact lifetime test for a selected protein.
Adapted from (3).

Among other things, this result revealed a clear connection between contacts with longer transition-path lifetimes and the number of contacts they made in the native structure.
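One plausible reading of the contact lifetime test can be sketched as follows. Here the "lifetime" of a contact is taken to be the mean length of unbroken runs of frames in which it is formed, and the state labels "U", "TP" and "F" are hypothetical; the exact definition used in (3) may differ, so this is only an assumption-laden illustration of the log-ratio idea.

```python
import math


def mean_lifetime(formed, labels, state):
    """Mean length, in frames, of unbroken runs where the contact is formed
    while the trajectory is in `state`."""
    runs = []
    current = 0
    for is_formed, label in zip(formed, labels):
        if is_formed and label == state:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


def log_lifetime_ratio(formed, labels):
    """log(lifetime on transition paths / lifetime in the unfolded state)."""
    on_tp = mean_lifetime(formed, labels, "TP")
    unfolded = mean_lifetime(formed, labels, "U")
    if on_tp == 0.0 or unfolded == 0.0:
        return None  # undefined if the contact never forms in one of the states
    return math.log(on_tp / unfolded)


# Toy trajectory: 6 unfolded frames, 4 transition-path frames, 2 folded frames.
labels = ["U"] * 6 + ["TP"] * 4 + ["F"] * 2
formed = [True, False, True, False, False, True,
          True, True, True, True,
          True, True]
print(log_lifetime_ratio(formed, labels))
```

A positive value means the contact persists longer on transition paths than it does in the unfolded state.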

So what about non-native interactions?

One of the measures addressing this question was a Bayesian measure for non-native contacts on transition paths: the probability of being on a transition path given that a particular non-native contact is formed. In the examples used in this paper, non-native contacts showed no obvious link to being on a transition path unless they were close to native contacts. Further criteria, such as the complementary quantity (the probability of being on a transition path when a contact is not made), led to a similar conclusion.
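The conditional probability behind this measure is just Bayes' theorem, P(TP | contact) = P(contact | TP) P(TP) / P(contact). The sketch below estimates it from per-frame booleans; the function name and the toy data are hypothetical and are not the authors' code or results.

```python
def p_tp_given_contact(formed, on_tp, contact_made=True):
    """Estimate P(TP | contact state) via Bayes' theorem.

    formed[k] says whether the contact exists in frame k, on_tp[k] whether
    frame k lies on a transition path. With contact_made=False the function
    returns the complementary quantity, P(TP | contact not made).
    """
    n = len(formed)
    matches = [f == contact_made for f in formed]
    n_tp = sum(on_tp)
    n_match = sum(matches)
    if n == 0 or n_tp == 0 or n_match == 0:
        return None  # not enough data to form the ratio
    p_tp = n_tp / n
    p_match = n_match / n
    p_match_given_tp = sum(m and t for m, t in zip(matches, on_tp)) / n_tp
    return p_match_given_tp * p_tp / p_match


# Toy data: 10 frames, 3 of them on a transition path.
on_tp = [False] * 3 + [True] * 3 + [False] * 4
formed = [False, True, False, True, True, False, False, False, False, False]

print(p_tp_given_contact(formed, on_tp))                      # contact made
print(p_tp_given_contact(formed, on_tp, contact_made=False))  # contact not made
```

If the first value is no larger than the overall fraction of frames on transition paths, forming the contact tells you nothing about being on a transition path.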

Interestingly, the one protein found to be influenced by non-native contacts was the designed α3D. Best et al. reasoned that additional frustration, introduced when building a protein with artificially introduced stability, led to a shift in helix register, giving rise to this outlier.

When taken together, these results lay a robust foundation for further studies along the same lines. It is too early to accept or reject the presented findings as universal truth, but strong arguments for the native-centric mechanism being a reasonable model in small fast-folding proteins have been made. It would not be far-fetched to think that larger proteins would adhere to similar principles with non-native contacts modulating the landscape, especially when considering individual downhill folding modules.


(1) Bryngelson, J.D. et al., 1995. Funnels, pathways, and the energy landscape of protein folding: a synthesis. Proteins, 21(3), pp.167–95.

(2) Taketomi, H., Ueda, Y. & Gō, N., 1975. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. International journal of peptide and protein research, 7(6), pp.445–59.

(3) Best, R.B., Hummer, G. & Eaton, W.A., 2013. Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences of the United States of America, 110(44), pp.17874–9.

Journal Club: Random Coordinate Descent

The paper I chose to present at last week’s group meeting was “Random Coordinate Descent with Spinor-Matrices and Geometric Filters for Efficient Loop Closure”, by Pieter Chys and Pablo Chacón.

Loop closure is an important step in the ab initio modelling of protein loops. After a loop is initially built, normally by randomly choosing φ/ψ (phi/psi) dihedral angles from a distribution (Step 1 in the figure below), it is probably not ‘closed’ – i.e. the end of the loop does not meet the rest of the protein structure on the other side of the gap. Waiting for the algorithm to produce closed initial conformations would be horribly inefficient, so it’s much better to have some method of closing the initial loop structures computationally.

The main steps in the ab initio prediction of protein loops.

Loop closure methods can be classified into three different types:

  1. Analytical methods: the exact solution to the loop closure problem is calculated. The difficulty with this approach is that it becomes increasingly complicated the more degrees of freedom (i.e. dihedral angles) you have.
  2. Build-up methods: the loop is built residue-by-residue to construct an approximately closed loop which can then be refined. Basically, the loop is guided to the closed position as it is being built.
  3. Iterative methods: do just what they say on the tin – the loop is closed gradually through a series of iterations.

Of course, science is never simple, and loop closure algorithms often cannot be classified into just one of the above categories. Cyclic coordinate descent (CCD), the method on which the random coordinate descent algorithm introduced in this paper is based, is a mix of analytical and iterative methods. The loop is initialised starting from one of the anchor residues (the residues on either side of the loop). The anchor residue from the other side is then added to the end of the ‘open’ loop structure. This residue is therefore present twice: as the ‘fixed’ anchor residue (the true structure) and as the ‘mobile’ anchor residue (the one added to the loop structure). Then, starting from the end of the loop that is attached to the rest of the protein, the dihedral angles are changed sequentially to try to minimise the distance between the fixed and mobile anchor residues. The angle change that would minimise this distance is calculated analytically. Once the distance is within a particular cut-off value, the loop is considered closed, and this is the final structure.
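The idea behind CCD is easiest to see in a toy setting. The sketch below closes a planar chain of rigid links onto a target point, which is a deliberate simplification: real loop closure rotates backbone dihedrals in 3D and matches several anchor atoms at once. All names, the link length and the tolerance are assumptions made for the illustration.

```python
import math


def chain_positions(angles, link_length=1.0):
    """Joint positions of a planar chain; angles[i] is the turn at joint i (radians)."""
    x = y = heading = 0.0
    points = [(x, y)]
    for turn in angles:
        heading += turn
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_close(angles, target, tol=1e-3, max_sweeps=500):
    """Cyclic coordinate descent: update the joints one by one, in order,
    until the free end of the chain lies within `tol` of `target`."""
    angles = list(angles)
    for sweep in range(max_sweeps):
        for i in range(len(angles)):
            points = chain_positions(angles)
            pivot, end = points[i], points[-1]
            # Analytical step: the rotation about the pivot that minimises the
            # end-to-target distance lines the end up with the target.
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            angles[i] += to_target - to_end
        if math.dist(chain_positions(angles)[-1], target) < tol:
            return angles, sweep + 1
    return angles, None  # not closed within max_sweeps


start = [0.3] * 6      # an arbitrary 'open' starting conformation
target = (2.0, 1.5)    # stand-in for the fixed anchor position
closed, sweeps = ccd_close(start, target)
start_gap = math.dist(chain_positions(start)[-1], target)
final_gap = math.dist(chain_positions(closed)[-1], target)
print(f"gap before: {start_gap:.3f}, gap after: {final_gap:.5f}, sweeps: {sweeps}")
```

Because each single-angle update is the exact minimiser for that angle, the gap can never grow from one step to the next, which is what makes the method a blend of analytical and iterative ideas.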

Random coordinate descent (RCD) is based upon CCD, but with a number of alterations and additions:

  1. Instead of iterating through each dihedral angle sequentially along the loop backbone, angles are chosen randomly.
  2. A spinor-matrix approach is used, which reduces loop closure times.
  3. Geometric filters are added at various points in the algorithm: before, during or after loop closure.
  4. ‘Switching’: if loop building fails, the direction of loop building is reversed. For example, if the structure is being grown from the N-anchor but doesn’t pass through the filters, then the loop is discarded and the next loop will be grown from the C-anchor. This should mean that the favoured loop closure direction naturally dominates.
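Only the first of these changes, the random choice of angle, is easy to show in a few lines. The sketch below applies it to a toy planar chain of rigid links closing onto a target point; it is not the authors' implementation, and it leaves out spinor matrices, filters and switching entirely. Function names, the seed and the step limit are assumptions.

```python
import math
import random


def chain_positions(angles, link_length=1.0):
    """Joint positions of a planar chain; angles[i] is the turn at joint i (radians)."""
    x = y = heading = 0.0
    points = [(x, y)]
    for turn in angles:
        heading += turn
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
        points.append((x, y))
    return points


def rcd_close(angles, target, rng, tol=1e-3, max_steps=5000):
    """Random coordinate descent: at every step pick one joint at random and
    apply the rotation that best aligns the chain end with the target."""
    angles = list(angles)
    for step in range(1, max_steps + 1):
        i = rng.randrange(len(angles))  # random joint instead of the next in line
        points = chain_positions(angles)
        pivot, end = points[i], points[-1]
        to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
        to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
        angles[i] += to_target - to_end
        if math.dist(chain_positions(angles)[-1], target) < tol:
            return angles, step
    return angles, None  # not closed within max_steps


start = [0.3] * 6
target = (2.0, 1.5)
closed, steps = rcd_close(start, target, random.Random(0))
start_gap = math.dist(chain_positions(start)[-1], target)
final_gap = math.dist(chain_positions(closed)[-1], target)
print(f"gap before: {start_gap:.3f}, gap after: {final_gap:.5f}, steps: {steps}")
```

Running the function with different seeds yields different closed conformations from the same starting loop, since the order in which angles are updated changes the path taken.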

The different geometric filters are as follows:

  1. A grid clash filter, which checks for clashes between the loop residues and the rest of the protein structure
  2. A loop clash filter, which checks for internal clashes between loop residues
  3. An adaptive Ramachandran filter, which restrains the dihedral angles to the allowed regions of the Ramachandran plot.

The Ramachandran filter is a good idea, since loop closure can change the dihedral angles of a structure significantly, moving them into disallowed regions. φ (phi) angles are restricted to the range between -175° and -40°, and ψ (psi) angles are restricted to the range between -60° and 175°; this is basically the top left part of the Ramachandran plot. There are two exceptions: the φ angle of proline is fixed, and the dihedral angles of glycine residues are not restricted at all. When placed inside the loop closure routine, the filter is ‘adaptive’: if the calculated optimum angle is outside of the allowed region, the filter calculates the maximum possible rotation that would still be allowed. When these angle changes become too small, however, the restriction is removed entirely and the angle is allowed to change freely.
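The adaptive behaviour can be sketched as a small function. The angle ranges and the proline and glycine exceptions come from the description above; the `min_step` threshold, the function name and the decision to ignore wrap-around at ±180° are assumptions made to keep the example short.

```python
PHI_RANGE = (-175.0, -40.0)  # allowed phi range in degrees
PSI_RANGE = (-60.0, 175.0)   # allowed psi range in degrees


def adaptive_rama_step(current, optimal, residue, angle_type, min_step=1.0):
    """Return the new dihedral angle (degrees) after filtering.

    `optimal` is the angle the closure step would like to set. `min_step` is
    a hypothetical threshold below which the restriction is lifted. Angle
    periodicity (wrap-around at +/-180 degrees) is ignored for simplicity.
    """
    if residue == "PRO" and angle_type == "phi":
        return current   # proline phi is fixed
    if residue == "GLY":
        return optimal   # glycine is not restricted at all
    low, high = PHI_RANGE if angle_type == "phi" else PSI_RANGE
    clamped = min(max(optimal, low), high)  # largest rotation still allowed
    if clamped != optimal and abs(clamped - current) < min_step:
        return optimal   # allowed change too small: remove the restriction
    return clamped


print(adaptive_rama_step(-60.0, -20.0, "ALA", "phi"))  # stopped at the boundary
print(adaptive_rama_step(-40.5, -20.0, "ALA", "phi"))  # restriction lifted
print(adaptive_rama_step(-65.0, -20.0, "PRO", "phi"))  # proline phi unchanged
print(adaptive_rama_step(-60.0, 90.0, "GLY", "phi"))   # glycine moves freely
```

Lifting the restriction when the permitted move becomes tiny keeps the closure from stalling at the edge of the allowed region, at the cost of occasionally accepting a disallowed angle.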

By testing different combinations of filters in different places, the authors decided upon a final RCD algorithm. This version includes the grid clash filter during loop closure, and the Ramachandran filter applied both before and during loop closure. They then compared their method to some other loop closure algorithms. Their method produces good results, outperforming all except a method called ‘direct tweak’, the only other method tested that includes clash detection during loop closure. From this, the authors conclude that clash detection during loop closure is a key factor in generating accurate loop conformations. They also report that RCD is 6 to 17 times faster than direct tweak.

Overall, then, the authors of this paper have introduced an accurate and fast loop closure algorithm which outperforms most other methods. Currently, my research is focussed upon developing a new antibody-specific ab initio loop modelling method, and some of the concepts used in this paper would definitely be worth investigating further. Watch this space!