Tag Archives: Molecular Dynamics

Writing “vector trajectories” with cpptraj

The program cpptraj, written by Daniel Roe (https://github.com/Amber-MD/cpptraj) and distributed Open Source with the AmberTools package (https://ambermd.org/AmberTools.php), is a powerful tool for analysis of molecular dynamics simulations. In addition to all of the expected analyses like Root Mean Square Deviation and native contacts, cpptraj also includes a suite of vector algebra functions.

While this vector algebra functionality is fairly well known and easy to find in the documentation, I think it is less well known that cpptraj can write trajectories of the computed vectors. These trajectories can then be loaded into Visual Molecular Dynamics (VMD) alongside the analysed trajectory and played as a movie. This functionality is a valuable tool for debugging your vector calculations to make sure they are doing precisely what you intend. It may also prove useful for generating visualizations of vectors alongside molecular structures for publications.

The cpptraj script below reads in an Amber parameter file and coordinate file and then calculates the angle between two planes.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
vectormath vec1 v1 vec2 v2 crossproduct name n1
vectormath vec1 v3 vec2 v4 crossproduct name n2
vectormath vec1 n1 vec2 n2 dotangle out 7bbg_ref_plane_angle.dat

The first plane is defined by two vectors in the plane of the guanidino group of a R65 residue (v1 and v2); the second plane is defined by two vectors between CA atoms of amino acids in the alpha helix containing R65 (v3 and v4). The first two vectormath calls determine the normal vectors to the planes and the final vectormath line computes the angle between the normal vectors. Taken together, these commands compute the angle between the arginine side chain and a plane passing through the CA atoms of the alpha helix. Let’s check that the vectors {v1, v2, v3, v4} are being computed correctly.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
run
writedata vectors.mol2 v1 v2 v3 v4 vectraj trajfmt mol2

The resulting vector trajectory vectors.mol2 can be loaded directly into VMD without a topology. Note that in this case we only analyzed a single frame, but you can run this same procedure on DCD files, too. This is what I get when I load the vectors into VMD alongside the structure:

The vectors are shown as red/pink line segments. The right structure is identical to the left but with the alpha helix cartoon model removed. The blue spheres indicate the locations of the CA atoms used to define the plane of the helix.

I hope this vector trajectory functionality will be helpful to a few people who like to neurotically check their analyses like I do. You can download the example prmtop and rst7 files below. Note that you should rename them to remove the extra “.txt” file extension before attempting to use them for anything.

The information in this blog post is adapted from an Amber Archive post from Daniel Roe, dated 30-Oct-2018: http://archive.ambermd.org/201811/0058.html

Files for the example:

Do you have cis peptide bonds in your simulation inputs?

People who run molecular simulations quickly become familiar with all of the things about a PDB file – missing residues, missing heavy atoms in residues, missing hydrogens, non-standard amino acids, multiple conformations, crystallization ligands, etc. – that might need to be fixed before setting up a simulation. This blog post is a reminder to check, after you have “fixed” your PDB, if you have accidentally introduced aberrant cis peptide bonds into your structure during rebuilding.

Continue reading

MM(PB/GB)SA – a quick start guide

The MMPBSA.py program distributed Open Source in the AmberTools21 package is a powerful tool for end-point free energy calculations on molecular dynamics simulations. In its most simple application, MMPBSA.py is used to calculate the free energy difference between the bound and unbound states of a protein-ligand complex. In order to use it, however, you need to have an Amber-compliant trajectory file, which means you need to setup and run your simulation fairly carefully.

While the Amber Manual and the MMPBSA tutorial provide lots of helpful information, putting everything together into a full pipeline taking you from structure to a free energy is another story. The goal for this guide is to provide a schematic you can follow to get started. This guide assumes you are familiar with molecular dynamics simulations and the theory of MMPBSA.

The easiest way I have found to do this, using only Open Source software, is:

(1) Download your raw PDB file. If you are lucky and it contains a complete set of heavy atoms (excepting perhaps a terminal OXT here and there, which tleap will add for you in step 3) you are good to go.

(2) Use the H++ webserver to determine the protonation states of each residue and add hydrogens as needed. This webserver is particularly convenient because it will allow you to directly download a PQR file that you can use to generate your starting topology and coordinates. Note that you have various options to choose the pH and internal/external dielectric constants for the calculation.

(3) Use tleap to generate your topology (prmtop) and coordinate (mdcor) files for your simulations. Do not forget that you will need not only the prmtop for the solvated complex, but also a dry prmtop for each of the complex, receptor, and ligand. Load the PQR file from H++ and do not forget to set PBRadii *to the same value for all prmtops*. A typical tleap script for setting up your solvated complex would look something like:

Continue reading

OpenMM Setup: Start Simulating Proteins in 5 Minutes

Molecular dynamics (MD) simulations are a good way to explore the dynamical behaviour of a protein you might be interested in. One common problem is that they often have a relatively steep learning curve when using most MD engines.

What if you just want to run a simple, one-off simulation with no fancy enhanced sampling methods? OpenMM Setup is a useful tool for exactly this. It is built on the open-source OpenMM engine and provides an easy to install (via conda) GUI that can have you running a simulation in less than 5 minutes. Of course, running a simulation requires careful setting of parameters and being familiar with best practices and while this is beyond the scope of this post, there are many guides out there that can easily be found. Now on to the good stuff: using OpenMM Setup!

When you first run OpenMM Setup, you’ll be greeted by a browser window asking you to choose a structure to use. This can be a crystal structure or a model. Remember, sometimes these will have problems that need fixing like missing density or charged, non-physiological termini that would lead to artefacts, so visual inspection of the input is key! You can then choose the force field and water model you want to use, and tell OpenMM to do some cleaning up of the structure. Here I am running the simulation on hen egg-white lysozyme:

Continue reading

Unraveling the role of entanglement in protein misfolding

Proteins that fail to fold correctly may populate misfolded conformations with disparate structure and function. Misfolding is the focus of intense research interest due to its putative and confirmed role in various diseases, including neurodegenerative diseases such as Parkinson’s and Alzheimer’s Diseases as well as cystic fibrosis (PMID: 16689923).

Many open questions about protein misfolding remain to be answered. For example, how do misfolded proteins evade cellular quality control mechanisms like chaperones to remain soluble but non-functional for long timescales? How long do misfolded states persist on average? How widespread is misfolding? Experiments indicate that misfolding can even be caused by synonymous mutations that alter the speed of protein translation but not the sequence of the protein produced (PMID: 23417067), introducing the additional puzzle of how the protein maintains a “memory” of its translation kinetics after synthesis is complete.

A series of four recent preprints (Preprints 1, 2, 3, and 4, see below) suggests that these questions can be answered by the partitioning of proteins into long-lived self-entangled conformations that are structurally similar to the native state but with perturbed function. Simulation of the synthesis, termination, and post-translational dynamics of a large dataset of E. coli proteins suggests that misfolding and entanglement are widespread, with two thirds of proteins misfolding some of the time (Preprint 1). Many misfolded conformations may bypass proteostasis machinery to remain soluble but non-functional due to their structural similarity to the native state. Critically, entanglement is associated with particularly long-lived misfolded states based on simulated folding kinetics.

Coarse-grain and all-atom simulation results indicate that these misfolded conformations interact with chaperones like GroEL and HtpG to a similar extent as does the native state (Preprint 2). These results suggest an explanation for why some protein always fails to refold while remaining soluble, even in the presence of multiple folding chaperones – it remains trapped in entangled conformations that resemble the native state and thus fail to recruit chaperones.

Finally, simulations indicate that changes to the translation kinetics of oligoribonuclease introduced by synonymous mutations cause a large change in its probability of entanglement at the dimerization interface (Preprint 3). These entanglements localized at the interface alter its ability to dimerize even after synthesis is complete. These simulations provide a structural explanation for how translation kinetics can have a long-timescale influence on protein behavior.

Together, these preprints suggest that misfolding into entangled conformations is a widespread phenomenon that may provide a consistent explanation for many unanswered question in molecular biology. It should be noted that entanglement is not exclusive to other types of misfolding, such as domain swapping, that may contribute to misfolding in cells. Experimental validation of the existence of entangled conformations is a critical aspect of testing this hypothesis; for comparisons between simulation and experiment, see Preprint 4.

Preprint 1: https://www.biorxiv.org/content/10.1101/2021.08.18.456613v1

Preprint 2: https://www.biorxiv.org/content/10.1101/2021.08.18.456736v1

Preprint 3: https://www.biorxiv.org/content/10.1101/2021.10.26.465867v1

Preprint 4: https://www.biorxiv.org/content/10.1101/2021.08.18.456802v1

OpenMM – easy to learn, highly flexible molecular dynamics in Python

When I came to OPIG this past March I realized I had a novel opportunity – there was no one to tell me which molecular dynamics (MD) program I had to use! Usually, researchers do not have much choice in the matter due to a number of practical concerns. Conflicts between input and output file formats, forces, velocities, and basically everything else between MD suites make having multiple programs flying around tenuous at best if you want group members to be able to help one another. After weighing my options, I settled on OpenMM – and so far I am very happy with the decision.

Continue reading

Do we need the constant regions of Antibodies and T-cell receptors in molecular simulations?

At this week’s journal club I presented my latest results on the effect of the constant regions of antibodies (ABs) and T-cell receptors (TCRs) on the dynamics of the overall system. Not including constant regions in such simulations is a commonly used simplification that is found throughout the literature. This is mainly due a massive saving in computational runtime as illustrated below: cutConstRegions

The constant regions contain about 210 residues but an additional speed up comes from the much smaller solvation box. If a cubic solvation box is used then the effect is even more severe:

waterbathBut the question is: “Is is OK to remove the constant regions of an AB or TCR and simulate without them?”.

Using replica simulations we found that simulations with and without constant regions lead to (on average) significantly different results. The detail of our analysis will soon be submitted to a scientific journal. The current working title is “Why constant regions are essential in antibody and T-cell receptor Molecular Dynamics simulations”.