loop prediction | Oxford Protein Informatics Group

Today I gave a talk on my research project when I joined the group. My research focuses on modeling of membrane proteins (MPs). Membrane proteins are the main class of drug targets and their mechanism of function is determined by their 3D structure. Almost 30% of the proteins in the sequenced genomes are membrane proteins. But only ~2% of the experimentally determined structures in the PDB are membranes. Therefore, computational methods have been introduced to deal with this limitation.

Homology modeling is one of the best performing computational methods which gives “accurate” models of proteins. Many homology modeling methods have been developed, with Modeller being one of the best known ones. But these methods have been tested and customised primarily on the soluble proteins. As we know there are main physical difference between the MPS and water soluble proteins. Therefor, to build a homology modeling pipeline for membrane proteins, we need a pipeline which in all its steps the unique environment of the membrane protein is taken into account.

Memoir is a tool for homology-based prediction of membrane protein structure (Figure below). As input memoir takes a target sequence and a template. First, using imembrane the lipid bilayer boundaries are detected on the template. Using this information MP-T, with its membrane specific substitution matrices, aligns the target and template. Then, Medeller is used to build the core model and finally FREAD, a fragment-based loop modeling, is used to fill in the missing loops.

Memoir Pipeline

Memoir methodology builds accurate models but potentially incomplete. Homology modeling often entails a trade-off between the level of accuracy and the level of coverage that can be achieved in predicted models. Therefore we aim to build Memoir 2.0, in which we increase coverage by modelling the missing structural information only if such prediction is sensible. Therefore, to complete the models in the best way we aim at:

1-Examine the best ways to maximise FREAD coverage, maintaining accuracy

2-Examine the best ab initio loop predictor for membrane proteins

Fread has two main parameters which contribute to its accuracy and coverage. The nature of the chosen database to look for a loop (i.e. membrane or soluble (mem/sol)) and the choice of the sequence identity (ESSS) cut-off:

ESSS >= 25: more accurate loop models are built (Hiacc)

ESSS > 0: more coverage is met but not necessary accurate models (Hicov)

To test the effect of these parameters on the prediction accuracy and coverage we chose to test set:

Mem_DS: 280 loops taken from MP X-ray structures.

Model_DS: 156 loops from homology models of MPs. The loop length in both test ranges from 4 to 17 residues

The comparisons on both dataset confirm that to achieve the highest accuracy and coverage the FREAD Pipeline should be performed as:

1. Hiacc-mem

2. Hicov-mem

3. Hiacc-sol

4. Hicov-sol

Memoir with the new FREAD Pipeline, called Memoir 2.0, achieves higher coverage in comparison to the original Memoir 1.0.

But there are still loops which are not modeled by FREAD Pipeline. These loops should be modeled using an ab initio method. To test the performance of soulable ab initio loop predictors on the membrane proteins we predicted the loops of the above testset sing six ab initio methods available for download: Loopy, LoopBuilder, Mechano, Rapper, Modeller and Plop.

Comparison between ab initio methods on membrane proteins

Comparisons in the image above shows that:

FREAD is more accurate but, doesn’t achieve complete coverage.

Greater coverage is achieved using ab initio predictors.

Mechano, LoopBuilder and Loopy are the best ab initio predictors.

We have selected Mechano for Memoir 2.0 because it:

provides higher coverage than Loopy whilst retaining a similar accuracy.

is faster than LoopBuilder (Mechano is ~30 min faster on loop length of 12)

is able to model terminals.

In memoir 2.0 the C and N terminals of up to 8 residues are built using Mechano. Then, Mechano decoy’s are ranked by their Dfire score , and accepted only if they have exited the membrane. This check improves the average RMSD up to 4Å on DS_280 terminals.

In conclusion, Memoir 2.0 provides higher coverage models while maintaining a reasonable accuracy level. Our comparison results showed that FREAD is significantly more accurate than the ab initio methods. But, greater coverage is achieved using ab initio predictors.Comparison oshows that the top ab initio predictors in terms of accuracy are Mechano, LoopBuilder and Loopy. Similar patterns were also present in the model dataset. We have selected Mechano as it provides higher coverage than Loopy whilst retaining a similar accuracy and is also much faster than LoopBuilder. Mechano also has the advantage that it is able to model terminals. Only loops smaller than 17 residues were considered for modelling since above this threshold the accuracy of predicted loops drops significantly.

Oxford Protein Informatics Group

or "OPIG" to friends

Tag Archives: loop prediction

Building accurate models of membrane protein structures