Tag Archives: structural dynamics

Learning dynamical information from static protein and sequencing data

I would like to advertise the research from Pearce et al. (https://doi.org/10.1101/401067), whose talk I attended at ISMB 2019. The talk was titled ‘Learning dynamical information from static protein and sequencing data’. I got interested in it because my field of research is structural biology, which deals with dynamical systems such as proteins, yet the data are often static, e.g. structures from X-ray crystallography. They presented a general protocol to infer transition rates between states in a dynamical system that can be represented with an energy landscape.


Network Representations of Allostery

Allostery is the process by which action at one site, such as the binding of an effector molecule, causes a functional effect at a distant site. Allosteric mechanisms are important for the regulation of cellular processes, altering the activity of a single protein or of a whole biosynthetic pathway. Triggers for allosteric action include binding of small molecules, protein–protein interactions, phosphorylation events and modification of disulphide bonds. These triggers can change the accessibility of the active site through large or small motions, such as a hinge motion between two domains or the motion of a single side chain.


Figure 1 from [1]: Rearrangement of a residue–residue interaction in phosphofructokinase. Left panel: interaction between E241 and H160 of chain A in the inactive state; right: this interaction in the active state. Red circles mark six atoms unique to the residue–residue interface in the I state, green circles mark four atoms unique to the A state, and yellow circles mark three atoms present in both states. In these two residues, there are a total of 19 atoms, so the rearrangement factor R(i,j) = max(6, 4)/19 ≈ 0.32.

One way to consider allostery is as signal propagation from one site to another, carried by changes in residue–residue contacts. Networks provide a way to represent these changes. Daily et al. [1] introduce the idea of contact rearrangement networks, constructed from a local comparison of the protein structure with and without molecules bound to the allosteric site. These are referred to as the active and inactive structures respectively. To measure whether a residue–residue contact changes between the active and inactive states, the authors use a rearrangement factor, R(i,j). This is the ratio of the number of atoms that are within a threshold distance (5 Å) in only one of the active or inactive states (whichever count is greater) to the total number of atoms in the two residues. The rearrangement factor is distributed such that the large majority of residue pairs have low values, as their contacts do not change between the active and inactive states. To decide when a rearrangement is significant, the authors use a benchmark set of non-allosteric proteins to set a threshold for the rearrangement factor. The residue pairs above this threshold form the contact rearrangement network, which can be analysed to assess whether the allosteric and functional sites are linked by residue–residue contacts. In the paper, 5 of the 15 proteins analysed are found to have linked functional and allosteric sites.
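To make the definition concrete, here is a minimal sketch of how a rearrangement factor could be computed for one residue pair. It reflects my reading of the definition above rather than the authors' code: the residue representation (a dict of atom name to coordinates) and the function names `contact_atoms` and `rearrangement_factor` are assumptions made for illustration.

```python
from math import dist

def contact_atoms(res_i, res_j, cutoff=5.0):
    """Atoms of either residue that lie within `cutoff` (in Å) of the other residue.

    Each residue is a dict mapping atom name -> (x, y, z); this layout is hypothetical.
    """
    in_contact = set()
    for name_i, xyz_i in res_i.items():
        for name_j, xyz_j in res_j.items():
            if dist(xyz_i, xyz_j) <= cutoff:
                in_contact.add(("i", name_i))
                in_contact.add(("j", name_j))
    return in_contact

def rearrangement_factor(active, inactive, cutoff=5.0):
    """R(i,j) for one residue pair; `active` and `inactive` are (res_i, res_j) tuples."""
    contacts_active = contact_atoms(*active, cutoff)
    contacts_inactive = contact_atoms(*inactive, cutoff)
    only_active = contacts_active - contacts_inactive      # interface atoms unique to the A state
    only_inactive = contacts_inactive - contacts_active    # interface atoms unique to the I state
    n_atoms = len(active[0]) + len(active[1])               # all atoms in the two residues
    return max(len(only_active), len(only_inactive)) / n_atoms
```

With the counts from Figure 1 (six atoms unique to the I state, four unique to the A state, 19 atoms in total) the last line evaluates max(6, 4)/19.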

Contact rearrangement network

Adapted from Figure 2 of [1]. Contact rearrangement network for phosphofructokinase. Circles in each graph represent protein residues, and red and green squares represent substrate and effector molecules, respectively. Lines connect pairs of residues with R(i,j) ≥ 0.3 and residues in the graph with any ligands which are adjacent (within 5.0 Å) in either structure. All connected components which include at least one substrate or effector molecule are shown.
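The graph construction described in the caption can be sketched in a few lines. This is an illustration under assumptions, not the published implementation: the input layout, the function names and all residue labels other than E241 and H160 are hypothetical, and the R values in the usage note are toy numbers.

```python
from collections import defaultdict

def build_network(r_factors, ligand_contacts, threshold=0.3):
    """r_factors: {(res_a, res_b): R}; ligand_contacts: iterable of (ligand, residue) pairs."""
    graph = defaultdict(set)
    for (res_a, res_b), r in r_factors.items():
        if r >= threshold:                     # keep only significantly rearranged contacts
            graph[res_a].add(res_b)
            graph[res_b].add(res_a)
    residues = set(graph)
    ligands = set()
    for ligand, residue in ligand_contacts:
        if residue in residues:                # ligands attach only to residues already in the graph
            graph[ligand].add(residue)
            graph[residue].add(ligand)
            ligands.add(ligand)
    return dict(graph), ligands

def ligand_components(graph, ligands):
    """Connected components that contain at least one ligand node."""
    seen, kept = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                           # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if component & ligands:
            kept.append(component)
    return kept
```

For example, calling `build_network({("E241", "H160"): 0.32, ("H160", "res_X"): 0.35}, [("substrate", "E241"), ("effector", "res_X")])` and passing the result to `ligand_components` yields a single component holding both ligands, which is the situation the authors describe as linked functional and allosteric sites.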

Collective rigid-body domain motion was not initially analysed by these contact rearrangement networks. However, a later paper [2] discusses how considering these motions alongside the contact rearrangement networks can lead to the detection of allosteric activity in a greater number of the proteins analysed. These contact rearrangement networks provide a way to assess the residues that are likely to be involved in allosteric signal propagation. However, the approach requires a classification of proteins into allosteric and non-allosteric, in order to set the significance threshold for the change in contacts, as well as multiple structures with and without an allosteric effector molecule bound.

CONTACT

Figure 1 from [3]. (a) X-ray electron density map contoured at 1σ (blue mesh) and 0.3σ (cyan mesh) of cyclophilin A (CYPA) fit with discrete alternative conformations using qFit. Alternative conformations are colored red, orange or yellow, with hydrogen atoms added in green. (b) Visualizing a pathway in CYPA: atoms involved in clashes are shown in spheres scaled to van der Waals radii, and clashes between atoms are highlighted by dotted lines. This pathway originates with the OG atom of Ser99 conformation A (99A) and the CE1 atom of Phe113 conformation B (113B), which clash to 0.8 of their summed van der Waals radii. The pathway progresses from Phe113 to Gln63, and after the movement of Met61 to conformation B introduces no new clashes, the pathway is terminated. A 90° rotation of the final panel is shown to highlight how the final move of Met61 relieves the clash with Gln63. (c) Networks identified by CONTACT are displayed as nodes connected by edges representing contacts that clash and are relieved by alternative conformations. The node number represents the sequence number of the residue. Line thickness between a pair of nodes represents the number of pathways that the corresponding residues are part of. The pathway in b forms part of the red contact network in CYPA. (d) The six contact networks comprising 29% of residues are mapped on the three-dimensional structure of CYPA.

Alternatively, van den Bedem et al. [3] define contact networks of conformationally coupled residues, in which a move of one residue to an alternative conformation likely influences the conformations of all other residues in the contact network. They utilise qFit, a tool for exploring conformational heterogeneity in a single electron density map of a protein, by fitting alternative conformations to the electron density. For each conformation of a residue, their method assesses whether steric clashes with another residue can be reduced by changing conformations. If a switch in conformation reduces steric clashes, then a pathway is extended to the neighbours of the residue that is moved. This continues until no new clashes are introduced. Pathways that share common members are considered conformationally coupled and are grouped into a single contact network. As this technique works on a single structure, it is possible to estimate which residues may be involved in allosteric signalling without prior knowledge of the allosteric binding region.
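The final grouping step, in which pathways that share members are merged into one contact network, is easy to sketch. The function below only covers that grouping and says nothing about how CONTACT detects clashes or extends pathways; the function name and the input format (pathways as lists of residue numbers) are assumptions.

```python
def merge_pathways(pathways):
    """Group pathways (iterables of residue numbers) that share at least one residue."""
    networks = []                          # invariant: the sets in this list are pairwise disjoint
    for pathway in pathways:
        merged = set(pathway)
        remaining = []
        for network in networks:
            if network & merged:           # shares a residue: absorb it into the new network
                merged |= network
            else:
                remaining.append(network)
        remaining.append(merged)
        networks = remaining
    return networks
```

Given the pathway from Figure 1b, `[99, 113, 63, 61]`, and a second, made-up pathway `[113, 98]`, the two end up in the same network because they share residue 113.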

These techniques show two different ways to locate and annotate local conformational changes in a protein, and to determine how they may be linked to one another. Establishing whether these and similar techniques highlight the same allosteric networks within proteins will be important for integrating many data types and sources to inform the detection of allostery. Furthermore, the ability to compare networks, for example by finding common motifs, will become important as techniques such as fragment-based drug discovery produce crystal structures with many differently bound fragments.

[1] Daily, M. D., Upadhyaya, T. J., & Gray, J. J. (2008). Contact rearrangements form coupled networks from local motions in allosteric proteins. Proteins: Structure, Function and Genetics. http://doi.org/10.1002/prot.21800

[2] Daily, M. D., & Gray, J. J. (2009). Allosteric communication occurs via networks of tertiary and quaternary motions in proteins. PLoS Computational Biology. http://doi.org/10.1371/journal.pcbi.1000293

[3] van den Bedem, H., Bhabha, G., Yang, K., Wright, P. E., & Fraser, J. S. (2013). Automated identification of functional dynamic contact networks from X-ray crystallography. Nature Methods, 10(9), 896–902. http://doi.org/10.1038/nmeth.2592

Augmented Modelling with Natural Move Monte Carlo Simulations

In the last group meeting I reported on the progress that I have made regarding the development of a protocol for the systematic use of Natural Move Monte Carlo simulations.

Natural Move Monte Carlo simulations
Natural Moves are degrees of freedom that describe the collective motion of groups of residues. In DNA this might be the concerted motion of a double helix; in proteins this could be the movement of a stable secondary structure element such as a beta-sheet. These segments are joined by so-called melting areas. At each simulation step the segments are propagated independently in an MC fashion. The resulting chain breaks are resolved by a chain closure algorithm that acts on the melting areas. This reduces the number of degrees of freedom by several orders of magnitude, so large complexes and conformational changes can be sampled more effectively.

In order to get sensible results, however, the initial decomposition of the system is important. The challenge is to accurately represent the plasticity of the system, while keeping the number of degrees of freedom as small as possible. Detailed insight into the flexibility of the system might be gained from experimental sources such as NMR or computational methods such as MD simulations and Normal Mode Analysis. This can help with defining segments and melting areas. However, there are many systems for which this data is not available. Even if it is, there is no guarantee that the segmentation is correct.

Therefore, I am developing a protocol that allows for the evaluation of a range of different test cases that each reflect a unique set of segments and melting areas.

Augmented Modelling Protocol
This protocol is aimed at the systematic evaluation of NMMC segmentations. It allows researchers to feed experimental information, biological knowledge and educated guesses into molecular simulations and so provides a framework for testing competing hypotheses. The protocol has four steps.

Step 1: Segmentation of the system into low-level segments
The initial segmentation contains all possible areas of flexibility that may play a role in conformational changes in the system of interest. This decision may be influenced by many sources. For now, however, we only consider secondary structure information. Helices and beta strands are treated as potential segments. Unstructured regions such as kinks, loops and random coils are treated as melting areas. For a small fold with four helices we get the segmentation shown in figure 1a.
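As a rough illustration of step 1, the sketch below splits a per-residue secondary structure string into segments and melting areas. It assumes DSSP-style one-letter codes in which `H` is a helix and `E` a strand, with every other character treated as unstructured; the function name and the output format are hypothetical and not part of any NMMC software.

```python
from itertools import groupby

SEGMENT_CODES = {"H", "E"}   # assumption: DSSP-style helix and strand codes

def decompose(secondary_structure):
    """Return (kind, first_index, last_index) tuples, using 0-based residue indices."""
    pieces, start = [], 0
    # Unstructured codes are all mapped to "-" so that loops, kinks and coils form one run.
    for code, run in groupby(secondary_structure,
                             key=lambda c: c if c in SEGMENT_CODES else "-"):
        length = len(list(run))
        kind = "segment" if code in SEGMENT_CODES else "melting area"
        pieces.append((kind, start, start + length - 1))
        start += length
    return pieces
```

For a toy string such as `"HHHHCCHHHHCCHHHHCCHHHH"` this gives four segments separated by three melting areas, matching the layout in figure 1a. Unstructured termini would also be reported as melting areas, so they may need separate handling.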

Step 2: Formulate test cases
Generate multiple test cases that reflect hypotheses about the mechanism of interest. In this step we try to narrow down the degrees of freedom as much as possible in order to retain sampling efficiency. This is done by selectively deactivating some melting areas that were defined in step 1. For a system with three melting areas that can either be on or off, 2^3 = 8 different test cases may be generated (example shown in figure 1b).
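Enumerating the on/off combinations is a one-liner with `itertools.product`. The sketch below assumes the melting areas are simply labelled I, II and III as in figure 1; the function name is made up for this example.

```python
from itertools import product

def enumerate_test_cases(melting_areas):
    """All on/off (1/0) assignments for the given melting areas."""
    return [dict(zip(melting_areas, states))
            for states in product((1, 0), repeat=len(melting_areas))]

cases = enumerate_test_cases(["I", "II", "III"])
print(len(cases))   # 2**3 combinations
```

The test case in figure 1b corresponds to the entry `{"I": 0, "II": 1, "III": 1}`.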

Segmentation of a small α-fold.

Figure 1 a) Segmentation of a small α-fold. The blue rectangles represent α-helices. The dashed lines indicate the presence of melting areas I, II and III. Each melting area can be switched on or off (1/0). b) Example of a test case in which the first of three melting areas is switched off. c) The six degrees of freedom along which a segment is propagated.

Step 3: Perform simulations
Sample the conformational space of all test cases that were generated in step 2. We generally use Parallel Tempering or Simulated Tempering algorithms to accelerate the sampling process. These methods rely on the modulation of temperature to overcome energy barriers.
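For readers unfamiliar with Parallel Tempering, the core of the method is the exchange step between replicas at neighbouring temperatures. The sketch below uses the standard textbook acceptance rule, min(1, exp[(β_i − β_j)(E_i − E_j)]), and is not taken from our simulation code; the function names and the reduced units (k_B = 1 by default) are assumptions.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Probability of exchanging the configurations held at temp_i and temp_j."""
    beta_i, beta_j = 1.0 / (k_b * temp_i), 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)   # avoids overflow for large positive delta

def attempt_swaps(energies, temperatures, rng=random):
    """One sweep of neighbour exchanges; order[k] is the replica now at temperatures[k]."""
    order = list(range(len(temperatures)))
    for k in range(len(temperatures) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order
```

A swap that moves the lower-energy configuration to the colder temperature is always accepted; the reverse swap is accepted with a Boltzmann-like probability, which is what lets configurations trapped at low temperature escape via the hot replicas.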

Step 4: Evaluate results
Score the results against a given control and rank the test cases accordingly. The scoring might be done by comparing experimental distributions of observables with those generated by simulations (e.g. Kullback-Leibler divergence). A test case that reproduces desired expectation values of observables might then be considered as a candidate hypothesis for a certain structural mechanism.
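As an example of such a score, the sketch below computes a Kullback-Leibler divergence between a histogram of an experimental observable and the histograms produced by each test case, then ranks the test cases by it. The pseudocount used to avoid empty bins, the function names and the idea of passing pre-binned counts are all assumptions for illustration.

```python
import math

def kl_divergence(reference_counts, model_counts, pseudocount=1.0):
    """D_KL(P || Q) for two histograms defined over the same bins."""
    if len(reference_counts) != len(model_counts):
        raise ValueError("histograms must share the same bins")
    p = [c + pseudocount for c in reference_counts]   # pseudocount keeps every bin non-zero
    q = [c + pseudocount for c in model_counts]
    p_total, q_total = sum(p), sum(q)
    return sum((pi / p_total) * math.log((pi / p_total) / (qi / q_total))
               for pi, qi in zip(p, q))

def rank_test_cases(reference_counts, simulated):
    """simulated: {test case name: counts}; returns (name, score) pairs, best match first."""
    scores = {name: kl_divergence(reference_counts, counts)
              for name, counts in simulated.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

A lower score means the simulated distribution is closer to the reference. Note that the divergence is not symmetric, so which distribution is treated as the reference matters.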

What’s next?
I am currently working on example uses for this protocol. These include questions regarding aspects of protein folding and the stability of the empty MHC II binding groove.

Natural Move Monte Carlo: Sampling Collective Motions in Proteins

Protein and RNA structures are built up in a hierarchical fashion: from linear chains and random coils (primary) to local substructures (secondary) that make up a subunit’s 3D geometry (tertiary) which in turn can interact with additional subunits to form homomeric or heteromeric multimers (quaternary). The metastable nature of the folded polymer enables it to carry out its function repeatedly while avoiding aggregation and degradation. These functions often rely on structural motions that involve multiple scales of conformational changes by moving residues, secondary structure elements, protein domains or even whole subunits collectively around a small set of degrees of freedom.

The modular architecture of antibodies makes them a good example of this phenomenon. Using MD simulations and fluorescence anisotropy experiments, Kortkhonjia et al. observed that Ig domain motions in their antibody of interest correlate on two levels: 1) with laterally neighbouring Ig domains (i.e. VH with VL and CH1 with CL) and 2) with their respective Fab and Fc regions.

Correlated Motion

Correlated motion between all residue pairs of an antibody during an MD simulation. The axes identify the residues whereas the colours light up as the correlation in motion increases. The individual Ig domains as well as the two Fabs and the Fc can be easily identified. ref: Kortkhonjia, et al., MAbs. Vol. 5. No. 2. Landes Bioscience, 2013.
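A map like this is commonly computed as a normalised cross-correlation of residue displacements over the trajectory. Whether the cited study used exactly this measure is not stated here, so treat the sketch below as one plausible recipe. It assumes the frames have already been superposed, that each residue is represented by a single position (for example its Cα atom), and that every residue moves at least a little; the function name is hypothetical.

```python
import math

def cross_correlation(trajectory):
    """trajectory[t][r] is the (x, y, z) position of residue r in frame t."""
    n_frames, n_res = len(trajectory), len(trajectory[0])
    # Mean position of each residue over the trajectory.
    mean = [[sum(frame[r][d] for frame in trajectory) / n_frames for d in range(3)]
            for r in range(n_res)]
    # Displacement of each residue from its mean position, per frame.
    disp = [[[frame[r][d] - mean[r][d] for d in range(3)] for r in range(n_res)]
            for frame in trajectory]

    def avg_dot(i, j):
        return sum(sum(a * b for a, b in zip(f[i], f[j])) for f in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]   # zero for a static residue (division fails)
    return [[avg_dot(i, j) / math.sqrt(var[i] * var[j]) for j in range(n_res)]
            for i in range(n_res)]
```

Values near 1 indicate residues moving together, values near -1 indicate opposite motion, and values near 0 indicate uncorrelated motion along the sampled directions.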

This raises the question: can we exploit these molecular properties to reduce dimensionality and overcome energy barriers when sampling the functional motions of metastable proteins?

In 2012, Sim et al. published an approach that allows these collective motions (they call them “Natural Moves”) to be incorporated into simulations. Using simple RNA model structures, they showed that explicitly sampling large structural moves can significantly accelerate the sampling process in their Monte Carlo simulations. By gradually introducing DOFs that propagate increasingly large substructures of the molecule, they managed to reduce the convergence time by several orders of magnitude. This can be ascribed to the resulting reduction of the search space, which narrows down the sampling window. Instead of sampling all possible conformations that a given polynucleotide chain may take, the simulation explores structural states that differ from the native state predominantly in tertiary structure.

Reduced Dimensionality

Reducing the conformational search space by introducing Natural Moves. A) Ω1 (residue-level flexibility) represents the cube, Ω2 (collective motions of helices) spans the plane and Ω3 (collective motions of Ω2 bodies) is shown as a line. B) By integrating multiple layers of Natural Moves the dimensionality is reduced. ref: Sim et al. (2012). PNAS 109(8), 2890–5. doi:10.1073/pnas.1119918109

It is important to stress, however, that in addition to these rigid-body moves, local flexibility is maintained by preserving residue-level flexibility. Consequently, the authors argue, high energy barriers resulting from large structural rearrangements are reduced and the resulting energy landscape is smoothed. Therefore, entrapment in local energy minima becomes less likely and the acceptance rate of the Monte Carlo simulation is improved.

Although benchmarking of this method has mostly relied on case studies involving model RNA structures with near perfect symmetry, this method has a natural link to near-native protein structure sampling. Similarly to RNA, proteins can be decomposed into local substructures that may be responsible for the main functional motions in a given protein. However, due to the complexity of protein motion and limited experimental data we have a limited understanding of protein dynamics. This makes it a challenging task to identify suitable decompositions. As more dynamic data emerges from biophysical methods such as NMR spectroscopy and databases such as www.dynameomics.org are extended we will be able to better approximate protein motions with Natural Moves.

In conclusion, when applied to suitable systems and used with care, this method offers an opportunity to breathe life into the static macromolecules of the PDB, which may help to improve our understanding of the heterogeneous structural landscape and the functional motions of metastable proteins and nanomachines.