ISMB/ECCB Conference 2013 (Berlin)

It’s that time of the year again… when an intrepid group of OPIGlets trundle back tired but happy from another successful conference (this time it was ISMB/ECCB and its satellite conference 3Dsig in Berlin) armed with their favourite titbits from the presented work. This blog post is a mashup of some of our highlights as presented at the last group meeting.

Post-schnitzel and out and about in Berlin!

Definitely one of the best things for me was getting the chance to hear Sir Tom Blundell (our very own academic grandfather… Charlotte’s supervisor) give the keynote at 3Dsig, talking about everything from the structure of insulin to his deep, underlying love of jazz. Here are some more of our favourite things…

Empirical contact potentials derived from binding free energy changes upon mutation
(poster presentation by Iain H. Moal and Juan Fernández Racio)

Chosen by Jinwoo Leem

I was impressed by Moal (et al.)’s poster on predicting protein-protein binding affinities (in fact, it won the poster prize at 3D-Sig!). The poster describes a statistical potential that considers the number of mutations in a protein, and the type of interatomic contacts. Two variants of the potential were made; one for considering all atoms (atomic potential), and one considering residue side chains, represented as a centroid atom (residue potential). Ultimately, the energy change is represented as:

where N is the matrix of interatomic contacts between atoms i,j and P is a vector of contact types. Using weighted least-squares to minimise the residuals, r, the equation was used to predict affinity (ΔG) and affinity changes following mutations (ΔΔG).

As we can see in the top two graphs, the model shows decent performance for predicting ΔΔG of enzyme-inhibitor interactions, i.e. the model can indicate how a mutation affects binding affinities. Having said this, the ΔΔG predictions for Ab-Ag interactions were poor (Pearson’s correlation = 0.5-0.6).

Moreover, when the same potential was used to predict ΔG (bottom two graphs), the correlations were even worse. In fact, for flexible protein pairs, i.e. receptor-ligand pairs whose interface RMSD > 1.0Å, the correlation has gone to as low as 0.02.

Although the results are disappointing with respect to ΔG prediction, the model raises two interesting points. First, this is one of the few scoring functions that are specifically designed to predict affinity, rather than giving an arbitrary score for low RMSD. In addition, this model re-iterates the challenges in predicting Ab-Ag interactions. The solution for the latter point is not yet clear, but it may be of interest to re-train the model specifically with Ab-Ag complexes, and see if the model’s performance improves!

Predicting protein contact map using evolutionary and physical constraints by integer programming
(paper presentation by Zhiyong Wang and Jinbo Xu)

Chosen by Saulo de Oliveira

Last week, I decided to present a quick overview of a Paper Presentation I attended during the ISMB 2013.

The title of the presentation was “Predicting protein contact map using evolutionary and physical constraints by integer programming.” based on a paper by the same name.

Contact prediction (or evolutionary constraint prediction, a term I am much more fond of) was a trendy topic both at the 3DSig (2013) and at the ISMB (2013), with several presentations and posters on the subject.

In this particular presentation, Zhiyong Wang and Jinbo Xu described a new method to identify evolutionary constraints. The big differential of their talk and their work was approaching the problem in a different angle: their aim was to predict contacts when you have a low number of sequences in the multiple sequence alignment (refer to previous posts in the blog for an introduction to contact prediction).

They proposed a combination of machine learning and integer programming (similar to linear programming, again a topic we discussed previously here) to perform their predictions.

The features of the machine learning did not present any innovation. They were quite standard in the field such as mutation rates on PSIBLAST profiles and the Mutual Information (MI). The results of the Random Forest algorithm was employed to formulate constraints in a linear problem. These constraints were used to enforce physical properties of proteins, based mostly on our understanding of secondary structure.

Results seemed positive in both a random test set (CASP10) and 2 other test sets. By positive, I mean there was an improvement on the current state-of-the-art, especially for proteins with 10-1000 sequences in the MSA. Still, their precision was around 30, 40% for the top L/10 predictions (where L is the protein length). Further improvements are still necessary before we can apply these evolutionary constraints to improve protein structure prediction.

Evolution of drug resistance: structural models
(presentation by Maria Safi)

Chosen by Hannah Edwards

I found this talk by Maria Safi (which won the prize for best non-keynote presentation at 3Dsig) to be a really interesting method, despite my complete lack of background knowledge in the area (what are conferences for but to expose you to new ideas, right?).

Their idea was to produce a computationally viable method for identifying drug resistance in a protein’s mutant space. Drugs work by binding to their target protein in such a way as to inhibit its native function. If the protein mutates so as to maintain its native function but impair its binding to the drug it acquires resistance. The problem is, even within a reasonable distance of the native sequence, a proteins’ mutant space is huge, and it’s by no means trivial to test for maintenance of function and binding energy.

The groups’ solution was to recognise that the vast majority of mutant space would not be of interest. As such they send their candidate mutants through a 2-pass search: the first, a quick and easy algorithm to swiftly eliminate the dead end mutants… those that either are not resistant to drug binding or do not maintain their native function, and the second, a more biochemically accurate yet computationally expensive algorithm to be applied to the shortlist identified during the first pass.

The first algorithm is based on restricted dead-end elimination which aims to minimise a simple energy potential based on the protein’s structural stability and it’s binding energy to the drug. The algorithm keeps the backbone structure constant but by differing the side chain conformations, the mutants result in different energy potentials. A mutation at residue r can then be eliminated if an alternative mutation at r will always result in a lower energy potential.

The second algorithm is based on the more sophisticated methodology of MM-PBSA, combining molecular mechanics with the Poisson-Boltzman Surface Area calculations to estimate the free energy of the compound. This final run identifies the candidate mutants.

A significant strength of their method is that it requires only the crystal structures of the drug and target protein. As a purely structural model it eliminates the need for large amounts of training data, which, for newly emerging diseases and drugs, is often impossible to have access to.

The main focus of Maria’s talk however was using these energy potentials to predict evolutionary pathways from a wild-type protein to a resistant candidate. By treating evolution as a random walk through mutant space, weighted by the energy potentials, and assuming selection pressure of resistance, they were able to computationally simulate evolutionary scenarios.

For example, Maria focussed on the ritonavir-HIV protease complex to illustrate this method. The majority of the mutants with resistance to ritonavir which have been observed in nature were predicted by the method. For the candidates that were predicted but have not been seen, further elucidation could be found from the simulated evolutionary pathways: around 60% of these candidates were not accessible under the evolutionary model.

Sequence comes to the Structural Rescue: Identifying Relevant Protein Interfaces in Crystal Structures
(presentation by Jose M. Duarte)

Chosen by Henry Wilman

Jose Duarte presented a tool, EPPIC, which identifies and classifies protein interfaces from pdb structures. The talk was titled ‘Sequence comes to the Structural Rescue: Identifying Relevant Protein Interfaces in Crystal Structures’, and follows from their publication Protein interface classification by evolutionary analysis, Duarte JM, Srebniak A, Schärer MA, Capitani G. BMC Bioinformatics. 2012 Dec 22..

As the title suggests, this uses both structural and sequence information to classify protein contacts as biological or crystal. There is a webserver, and a downloadable version. There are a number of methods that exist to classify interfaces, and this differs in a few ways.

The previous methods typically rely on the area of the interface. As you see in the paper, even the benchmark sets used to test the other methods are biased such that biological contacts have much greater areas than crystal contacts. When the authors constructed a set where the contact area was similar, they found the previous methods performed generally poorly. However, there are a number of ways that you can define the interface or contact area, and specifically what people call ‘core residues’ of the interface. They found one study performed much better on their training set than the others. This defined core residues as ones that lost the majority of their solvent accessible surface area on binding to the interface. A simple cut off of >= 6 core residues at an interface produced a good classification.

In addition to this, they used sequence information. We know that interfaces are important, and often mutations at interface residues are bad. So, for a biological interface, we would expect residues to be better conserved than non-interacting surface residues. The authors used sequence entropy as a measure of the conservation. They calculated this by collecting homologous sequences with PSI-Blast and aligned them using Clustal-Omega. For each position in the alignment, if x is the occupancy frequency for a given amino acid, the sequence entropy is given by the sum over all amino acids of xlog(x). (They actually use a reduced alphabet for this, to avoid penalising mutations to similar amino acids). They then compare the entropy of the ‘core’ residues in the interface to those on the surface of the protein, and those on the periphery of the interface. If the core residues have lower entropy, then the contact is classed as biological. There are simple thresholds for both of these comparisons.

They have three metrics – one structural (number of core residues), and two sequence (entropy of core residues vs. peripheral residues, and entropy of core residues vs. surface residues). They classify based on a majority vote of the three methods. If there are an insufficient number of homologous sequences (i.e. fewer than 8), then they ignore the sequence scores, and classify using the structure only.

So why do we care about protein interfaces? Lots of us work with experimental protein structures. Many of these come from X-ray crystallography experiments. This means that when the structural information is captured, the protein is not isolated – instead it is packed against many other copies of itself. A bit like a brick in a wall – a wall many rows of bricks deep. So our protein is in contact with many others. Some of these contacts occur within the natural environment of the cell, others are a result of the crystal packing structure.
Now, protein interfaces are important. ‘Why?’, I hear you ask. Interfaces are how proteins interact with each other, and with other molecules. Some interfaces are large, some small, some are involved in transient interactions, others in permanent ones. Many diseases occur due to amino acid mutations at these interfaces. These change how the protein functions, which can cause bad things to happen. Similarly, if we can learn how to interact with an interface, we are (theoretically) able to alter the function with some sort of drug, and cause good things (whilst avoiding bad things).

So, this raises a few questions for people who study protein crystal structures. Things like, which bits of my protein interact? Do these interactions happen in a cell? Is my structure somehow distorted by the crystal packing? This EPPIC tool gives us one way to answer these.