My talk this week focused on secondary structure (SS) assignment. What do I mean by this? It is assigning SS types (principally α-helices and β-sheets) to protein structures. It can be found hiding in many of the things we do – e.g. alignment and modelling. There are many available methods to do this, of which DSSP (despite being published in 1983) is the most popular.
How does it work?
The algorithm identifies hydrogen bonds between mainchain carbonyl and amide groups. Partial charges are applied to the amide and carbonyl bonds, and the the C, O, N, and H atoms are assumed to be point charges (hence C has charge +ρ1, O -ρ1, N -ρ2, and H +ρ2. The the electrostatic energy between these 4 atoms is calculated, and if it is < -0.5 kcal/mol, a hydrogen bond exists. This is a relatively relaxed threshold, as a normal hydrogen bond in an $alpha;-helix is in the region of -3 kcal/mol, so it means that a given residue could have i+3, i+4, and i+5 hydrogen bonds.
Helices and sheets are then identified where there are characteristic hydrogen bond patterns. For example, two consecutive i to i+4 backbone hydrogen bonds indicates an α-helix turn. The algorithm identifies each turn, and each β-bridge, and where several of these overlap, they are combined into single elements.
DSSP has 8 different SS assignments:
- G – 310 helix
- H – α-helix
- I – π-helix
- E – β-sheet
- B – β-bridge
- T – helix turn
- S – bend (high curvature)
- C – coil (none of the above)
These are assigned in an order of preference – HBEGITSC.
Many (but by no means all) SS assignment programs still use this notation.
DSSP is one of the more simple SS assignment programs. Its hydrogen bond energy calculation is distinctly simplistic. It does not (fully) take the angles of the hydrogen bond into account, and provides only a binary classification for each hydrogen bond. However, perhaps surprisingly, DSSP is still the most used method. Why? Probably something to do with them giving it away for free, which resulted in many software suits incorporating it (e.g. JOY, PROMOTIF). As a general rule, if something does not say what it uses for SS assignment, it probably uses DSSP.
Given the simplicity of DSSP, it is not surprising that there are a large number of other available methods. Indeed, you may notice that different programs will give different assignments (e.g. comparing Pymol to VMD or the PDB annotation).
There have been a vast number of other secondary structure (SS) annotation methods published, including: STRIDE, DEFINE, PROMOTIF, KAKSI, SST, PSSC, P-SEA, SECSTR, XLLSSTR, PALSSE, and STICK. The other two you are likely to come across are STRIDE, and the PDB annotation.
SS assignment in general
All of the SS assignment methods rely on a combination of three features of SS. These are:
- Mainchain hydrogen bonds
- Φ and Ψ angles
- Inter Cα distances
For all three of these, there are values characteristic of both helices and β-sheets. Every method takes some combination of these three features for each residue, and if they are within the chosen limits, classifies the residue accordingly. The limits are generally chosen to give good agreement with the PDB annotation.
(It should be noted that the hydrogen-bond containing methods use the position of the hydrogen atom, which is rarely present in the crystal structure, and thus must be inferred.)
STRIDE can be described as an updated version of DSSP. It has two main differences – it uses a much fuller description of hydrogen bond geometry, and combines this with a knowledge based φ/ψ angle potential to assign the residue state. It has many more parameters that DSSP, and these are trained based on the PDB annotation. So where does that come from?
This PDB annotation comes from the depositors own annotation. The current guidance (from here) is to use the generated annotation, from PROMOTIF. PROMOTIF uses DSSP, with a slight change – it annotates an extra residue at the end of each structure element. I am in no position to say how well this guidance is adhered to by the depositors, or what their historical behaviour was, but the vast majority of annotations are reasonable.
I guess you are now wondering how different these methods are. Generally they agree in the obvious cases, and disagreement is normally at the ends of SS elements. Other examples (particularly pertinent to my research) occur when one method identifies a single long element, while another method identifies two elements seperated by a coil section. Ultimately there is no ‘right’ answer, so saying one method is right and another wrong is impossible.
To sum up, DSSP is the de facto standard. Ignoring my previous comment, it is probably not the best algorithm, as it is a gross simplification. STRIDE improves on the algorithm (although using more parameters), whilst for specific tasks, one method may be better than all of the others. It is hard to say if one is the best, and if it is important to you, then you should think about which method to use. If you do not think it is, then you should reconsider, and if it really is not important, then just use DSSP like everyone else. This is perhaps an example where willing, free, provision your code to the community results in your method (DSSP) becoming the de facto standard.