CAPRI, pyDock and the state of protein-protein docking.

Intro

We have discussed a paper highlighting the latest progress in the field of protein-protein docking. It covers the results of the participation of the pyDock group in CAPRI. The latest round of the experiment is particularly interesting as it challenged the participants with affinity prediction which is going one step beyond modeling protein-protein complexes.

Role of the group in this CAPRI edition

For those that are not very familiar with CAPRI, one can participate in any of the following roles:

  • Predictors: Participants are given the unbound structures or sequences of one or both interacting partners. Predictors are then supposed to submit a ranked list of modeled complexes which can be achieved by any docking/prediction tools and some human intervention. This tests the current ad hoc capacity to model protein-protein complexes.
  • Servers: Participants are given the unbound structures or sequences of one or both interacting partners that are supposed to be submitted to an automatic server which needs to return the results (ideally) within 24 hours. This tests the automated capacity to model protein-protein models.
  • Scorers: The community contributes sets of poses (unranked) for each target. The scorers are then supposed to re-rank the poses. This is a form of cross-validation of methods since one could expect that the group that samples the poses also performs better scoring. This exercise is supposed to quantify this.

The pyDock group participates in the capacity of Predictors and Scorers. Below we first outline the general docking methodology, followed by the particular implementation of the pyDock group.

General Docking Methodology

Protein-protein docking starts with two interacting structures at input: (ligand (L) and receptor (R)) (if the structures are unavailable a model is created). The docking techniques can be broadly divided into two types: template based and template-free. In template-based docking, one needs either a complex of homologs or one that is sufficiently structurally similar. However since it is estimated that the number of complexes in the PDB only covers ~4% of the presumed total, this approach is not applicable in a great number of cases.

Template-free docking does not rely on pre-solved complexes and it is the main focus of this post. Standard modus-operandi of a template-free protein-protein docker is the following:

  1. Sample N poses.
  2. Re-score top D sampled poses.
  3. Refine the top K poses.

In the first step, possible poses of ligand with respect to the receptor. This is very often achieved by FFT (ZDOCK) or more elaborate methods such as geometric hashing/pose clustering (PatchDock). The possible orientations are ranked by a simplified energy function or a statistical potential. In most cases this step is performed in rigid-body mode (intra-molecular distances are not allowed to change) for the computational efficiency. The number of sampled poses is usually in the thousands (N~ 2k-10k).

The re-scoring set uses a more elaborate energy function/statistical potential to identify more close-to native poses. Notable examples include pyDock and ZRANK. Since such functions are usually more expensive computationally (for instance pyDock calculates the LJ potential, Coulombic electrostatics and desolvation terms), it is more tractable to apply them to a smaller set of (D<<N) of top poses as returned by the rigid-body sampler. Additionally, the final poses might be pruned for redundancy by removing structures which are very similar to each other. One method in particular (ClusPro) owes its success to scoring the rankings based on the numbers of similar decoys per cluster out of a number of top predictions.

The final step constitutes the refinement of the select top K poses (K<<D). Here, flexibility might be introduced so as to account for conformational changes upon binding. Tools used to achieve this are very computationally expensive energy fields such as AMBER or CHARMM. The coordinates of side-chains or even backbones are distorted, for instance using normal mode calculations, until the energy function reaches an energetic minimum via a flavor of gradient descent.

Fig. 1: Breakdown of results of the pyDock group at the latest CAPRI round. The results are presented for each target, in the predictor as well as scorer capacity.

Fig. 1: Breakdown of results of the pyDock group at the latest CAPRI round. The results are presented for each target, in the predictor as well as scorer capacity.

The pyDock group Methodology and Results

The pyDock group follows the pattern outlined above. As a sampler they employ ZDOCK and FTDOCK, both fast rigid-body Fast Fourier Transform-based methods. They use their pyDock function to score the best models. Additionally, they remove redundancy from the final lists of decoys by removing these entries which are too similar to the higher-scoring ones according to ligand rmsd. The final refining step is carried out using TINKER.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

The pipeline outlined above is carried out in most of targets, however there are some exceptions. For some targets, additional docking poses were generated using SwarmDock (T53, T54, T57 and T58) and RotBUS (T46 and T47). In some cases rather than using TINKER at the refinement stage, CHARMM or AMBER were used.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Such ad hoc approaches were employed for almost every target. Available information was employed to the fullest in order to achieve best models. For instance, in cases where good homology structures were available (T47), or it was known which residues appear of importance to the complex, appropriate distance constraints were imposed. The pyDock group achieves better accuracy as predictors rather than scorers when it comes to docking (67% correct models submitted against 57%). They follow a trend wherein predictors usually do better than scorers (See Fig. 1). This trend is however broken at the stage of predicting the water molecules at the interface of T47. Using DOWSER In the predictor capacity they correctly identify 20% contact-mediating water molecules and 43% as scorers.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Overall, the performance of docking is indicating that the field is steadily moving forward towards achieving the goal of modeling complexes using sequence alone. There were some cases in this round where predictor groups started with sequence only and still produced reasonable models of complexes (T47, T50 and T53). In each of these cases one of the partners was an unbound structure and the other was a sequence. The only case where both partners were sequences did not produce any reasonable models. In this case only two groups out of 40 managed to present meaningful solutions.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Furthermore, this CAPRI round was the first to introduce affinity prediction – in targets T55 – T56. The aim was to predict the binding energy changes upon mutations. The predictor designed by the pyDock group achieved a good results on this test case with more in-depth details on the approach found in a related community-wide experiment.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

Receptor is shown in white. Best ligand as predictors in red, as scorers in blue.

 

Author