CABS-flex

One of the still open questions in bioinformatics is that of the flexibility of proteins, and it is one in which I am quite interested. Our main source of structural information is X-ray diffraction experiments, in which the crystalline protein is intrinsically rigid. While there is an ever increasing body of NMR data, and Molecular Dynamics simulations are becoming faster and cheaper, complete information about the dynamics of every PDB structure is a long way off. All atom MD simulations for a complete protein still take either days, or a powerful computer. So, naturally, any papers that talk about ways to get flexibility information catch my attention.

The paper that caught my attention was CABS-flex: Server for fast simulation of protein structure fluctuations.. There also is a connected paper – Consistent View of Protein Fluctuations from All-Atom Molecular Dynamics and Coarse-Grained Dynamics with Knowledge-Based Force-Field – which benchmarks their method against MD simulations.

The CABS model pops up in a number of other places – there is a CABS-fold server, and the method was first described in the paper Consistent View of Protein Fluctuations from All-Atom Molecular Dynamics and Coarse-Grained Dynamics with Knowledge-Based Force-Field from 2004.

What does the webserver do?

You give it a coordinate file, wait some hours, and it gives you back a series of (clustered) structures, a C&alpha trajectory, a residue flexibility profile, and a nice little video. Which is nice. It is, however, a little limited; you can only do a single chain (so no modelling of small molecule or peptide interactions), there is no way of fixing part of the model so it does not flex, and it can be picky about your PDB files – no chain gaps, no strange amino acids. You can tell it to analyse a PDB chain by just entering the code, but these are frequently rejected for having unacceptable coordinate files.

How does it work?

The CABS model makes a reduced representation of the protein, and explores its conformational space by making moves – which are accepted or rejected based on a (comparatively complex) force field. Many moves are attempted, and over a large number of iterations, this builds up an ensemble of structures, which show the possible confirmations of the protein. So, its a bit like MD, but rather than moving atoms based on calculated forces, you make random moves, and accept the sensible ones.

hello

The model is shown above. The protein structure is reduced to 4 atoms per residue. There are 5 types of move, and in each step, a random move of each type is tried ~n times (n being the number of amino acids in the protein). These are accepted or rejected based on their force field, and the process is repeated several hundred times. The force field used to accept/reject moves is quite complex – there are short range (sequence dependent and sequence independent), hydrogen bond, and long range (again, sequence dependent and sequence independent) interactions. There are sub-categories within these, and the relative contributions of the interactions can be changed. There is no solvent, so the long range interactions have to provide the forces to keep the protein together, and the hydrogen bond interactions act to maintain the secondary structure elements. More detail can be found in the 2004 paper.

Is it true?

The authors justify the method by comparing it to MD simulations taken from a database, and conclude that CABS-flex gives answers as similar to MD simulations as MD simulations using different force fields are to each other. Simply put, the same residues move in their simulations as move in MD simulations. They move more, which backs up their claim that they demonstrate flexibility over a longer time-frame than MD simulations. They do admit that they get the best agreement when they heavily restrict their secondary structure – raising the question of how much of the agreement between all of the methods is down to all the simulations having the same secondary structure.

To conclude, this could be a useful method, particularly in that it can give long time-frame flexibility – providing you treat it as any other measure of flexibility, with a pinch of salt. It is a bit of a shame that the interface is (perhaps frustratingly) simple, and you need to pre-process your coordinate files, as many PDB coordinate files will not be accepted.

Author