Monthly Archives: May 2013

Protein kinases, the PIM story

Last week I presented my DPhil work. In one of my projects I address the reasons for inhibitor selectivity in the PIM protein kinase family. PIM kinases play key roles in signalling pathways and were identified as oncogenes a long time ago. Their ATP-binding sites, which are slightly unusual for protein kinases, and their roles in cancer have prompted the investigation of potential PIM-selective inhibitors for anticancer therapy. Because the functions of the three PIM isoforms overlap, efficacious inhibitors should bind to all three isozymes. However, most reported inhibitors show considerable selectivity for PIM1 and PIM3 over PIM2, and the mechanisms leading to this selectivity remain unclear.

Figure 1. Workflow of the sequence and structure analysis of PIM kinases

To establish the sequence determinants of inhibitor selectivity we investigated the phylogenetic relationships of PIM kinases and their structural conformations upon ligand binding (Figure 1). Together with my OPIG supervisor Charlotte Deane we predicted a set of candidates for site-directed mutagenesis as illustrated in Figure 2. The mutants were designed to convert PIM1 residues into analogous PIM2 residues at the same positions.

I then moved to the wetlab to test the hypotheses experimentally. Under the guidance of Oleg Fedorov, I screened the SGC library of kinase inhibitors using differential scanning fluorimetry (DSF). After comparing melting temperature shift values across the PIM kinases and mutants, I selected a set of potent inhibitors with different chemical scaffolds for quantitative binding analysis. I worked with Peter Drueker’s team at Novartis on PIM enzymology, where I measured activities, Km values for ATP and IC50s using a mobility shift assay. For my final set of measurements I performed isothermal titration calorimetry (ITC) experiments back at the SGC and determined binding constants and the enthalpic and entropic contributions to the total free energy of ligand binding.
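For readers less familiar with ITC, here is a minimal sketch of how a binding constant and a measured enthalpy split the free energy into its enthalpic and entropic parts, using the standard relations dG = RT ln(Kd) (with Kd relative to a 1 M standard state) and dG = dH - T*dS. The function name and the input values are made up for illustration; they are not measurements from this project.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def binding_thermodynamics(kd_molar, delta_h_kj_per_mol, temperature_k=298.15):
    """Return (dG, -T*dS) in kJ/mol from a dissociation constant and an enthalpy.

    Hypothetical helper: assumes a simple 1:1 binding model and a 1 M
    standard state, so Kd can be used directly inside the logarithm.
    """
    # dG = RT ln(Kd); tighter binding (smaller Kd) gives a more negative dG.
    delta_g = R * temperature_k * math.log(kd_molar) / 1000.0
    # dG = dH - T*dS, so the entropic term -T*dS is whatever dH does not explain.
    minus_t_delta_s = delta_g - delta_h_kj_per_mol
    return delta_g, minus_t_delta_s


# Made-up example: a 100 nM binder with a binding enthalpy of -50 kJ/mol.
dg, mtds = binding_thermodynamics(kd_molar=1e-7, delta_h_kj_per_mol=-50.0)
print(f"dG = {dg:.1f} kJ/mol, -TdS = {mtds:.1f} kJ/mol")
```

In this made-up case the enthalpy is more favourable than the total free energy, so the entropic term is positive, meaning that binding is opposed by entropy.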

Figure 2. An overlay of PIM1 and PIM2 structures (P-loop and hinge regions), the mutated residues are shown as sticks

The data are yet to be published, so I only briefly state the results here. The hinge mutant E124L demonstrated reduced thermal stability, probably due to the removal of the E124-R122 salt bridge. The P-loop mutants had Km values for ATP intermediate between those of PIM1 and PIM2, indicating that those residues could be responsible for stronger ATP binding in PIM2. As shown in Figure 2, the residues are located at the tip of the P-loop and might be involved in P-loop movement. Importantly, three mutants showed reduced affinity for inhibitors, validating my initial hypotheses.

Ideally, having PIM1 and PIM2 co-crystal structures with the same inhibitors would allow a direct comparison of the binding modes. So far I have been able to solve an apo-PIM2 structure, which will be deposited shortly, in addition to the single PIM2 structure in the PDB.

I will update you soon on my second project, which involves more mutants, type II inhibitors, equilibrium shifts and speculations about conformational transitions. Keep visiting us!

Inside Memoir: MP-T aligns membrane proteins

Although Memoir has received a lot of air-time on this blog, we haven’t gone into a great deal of detail about how it models membrane proteins. Memoir is a pipeline involving a series of programs iMembrane -> MP-T -> Medeller -> Fread, and in this post I’ll explain the MP-T step (I’ll briefly touch on Medeller too).

Let’s first look at the big picture. There are several ways of modelling a protein’s 3D structure. In an ideal world we could specify an extended polypeptide, teach a computer some physics, set it off simulating, and watch the exact folding pathway of a protein. This doesn’t work. A second method would be to build up a protein from lots of fragments of unrelated proteins… this is usually what is meant by ‘ab initio’ modelling. The most accurate (and least sophisticated) approach is to find a protein of known structure with similar sequence, align the sequences, and copy over the coordinates of the aligned residues to make a model for the query protein. This is the approach taken by Memoir and is called homology modelling or comparative modelling.

The diagram below shows an example of how homology modelling might work. Four membrane protein sequences are aligned (left) and the alignment specifies a structural superposition (right). Assume now that the red structure is unknown: we could make a good model for it just by copying over the aligned parts of the blue, green and yellow structures.

[Figure: four aligned membrane protein sequences (left) and the corresponding structural superposition (right)]

The greatest difficulty in the modelling described above is making an accurate alignment. As sequences become more distantly related they share less and less sequence identity, and working out the optimum alignment becomes challenging. This problem is especially acute for membrane protein modelling: there are so few structures from which to copy coordinates that a randomly chosen query protein has a good chance of having <30% sequence identity to the nearest related structure.

Although alignment is the most important facet of homology modelling it is not the only consideration. In the above diagram the centres of the proteins are structurally very conserved (so copying coordinates will lead to a good model in this region), but the top of the proteins differ (the stringy loops don’t sit on top of one another). It is the role of coordinate generation software to distinguish which coordinates to copy. It turns out that the pattern of a conserved centre and varying top/bottom is generally true for membrane proteins, and Memoir uses our Medeller coordinate generation software to take advantage of this pattern.

Back then to alignment. The aim of alignment is to work out which amino acids in one protein are related to amino acids in another. All alignment methods have at their heart a set of scores which encode the propensity for one amino acid to mutate to another, and for that mutation to become fixed in a population. These scores form a substitution table (here mutation + fixation = substitution). More sophisticated alignment methods augment these scores in different ways — for example by adding in scoring based on secondary structure, smoothing scores over a window, or estimating a statistical supplement to the score determined from a related set of pre-aligned sequences — but at some level a substitution table is always present. Using a substitution table, the most likely evolutionary relationship between two sequences can be detected and this is reported in the form of an alignment.
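To make the role of the substitution table concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the classic Needleman-Wunsch scheme). It is not the MP-T algorithm. The `toy_score` table, the hydrophobic residue set and the linear gap penalty are all assumptions invented for illustration; a real method would use a full 20x20 substitution table and more careful gap handling.

```python
HYDROPHOBIC = set("AVILMFWC")  # toy grouping, for illustration only


def toy_score(a, b):
    """Hypothetical substitution table: identity > conservative swap > other."""
    if a == b:
        return 2
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 1
    return -1


def global_align(seq_a, seq_b, score, gap=-2):
    """Return (best score, aligned seq_a, aligned seq_b) for a global alignment."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # substitution
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("LIVE", "LVE", toy_score)
print(best)
print(top)
print(bottom)
```

Changing the numbers in the table changes which alignment wins, which is why the choice of substitution scores matters so much.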

So that’s general alignment, now to apply this to membrane proteins. The cell membrane is composed of a lipid bilayer: a sandwich with a hydrophobic filling and hydrophilic crusts. The part of a membrane protein that touches the filling will have different preferences for amino acids (and, more importantly, substitutions between these amino acids) than the part of a membrane protein that touches the crust. Similarly there are systematic preferences for amino acid substitutions depending on whether part of a protein is buried or exposed, and on which type of secondary structure it assumes. The figure below shows a membrane protein with different regions of the membrane and different types of secondary structure annotated.

[Figure: a membrane protein annotated with membrane regions and secondary structure types]


It is possible to make separate substitution tables for each environment within a membrane protein, where an environment specifies where the protein sits in the membrane, what secondary structure it has, and whether it is accessible or buried. Below is a principal components analysis of the resulting set of tables: each table is represented by a single point and the axes show the direction of the greatest variation between the tables. The plot on the right shows a separation of the points based on whether they are buried (more hydrophobic) or accessible (more hydrophilic). The hydrophobic centre (red circles) and hydrophilic edges (green circles) of the membrane fall into this general pattern. The table on the left shows that the tables further divide by secondary structure type. In summary there are systematic substitution preferences in practice as well as theory, and for membrane proteins it is most important to consider hydrophobicity when aligning two protein sequences.

[Figure: principal components analysis of the environment-specific substitution tables]

On then to modelling. The conventional approach to aligning a pair of sequences for homology modelling is to take a set of pre-aligned sequences (a sequence profile), and use them to estimate a supplement to the standard substitution score for aligning two sequences. This is termed profile-profile alignment. Memoir takes a different approach by using the MP-T program to construct a multiple sequence alignment scored with environment-specific substitution tables. The alignment includes a set of homologous sequences to the pair of interest.
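As a rough illustration of what scoring with environment-specific substitution tables means, the sketch below picks a different table for each template position depending on its environment. This is not MP-T's code (which is written in Haskell). The environment labels, table entries and function names are hypothetical, and the scores are made-up numbers chosen only to show that the same substitution can score differently in different environments.

```python
# An environment is (membrane layer, secondary structure, accessibility).
CORE_HELIX_BURIED = ("core", "helix", "buried")
HEAD_COIL_EXPOSED = ("headgroup", "coil", "accessible")

# Hypothetical tables keyed by (template residue, query residue).
TOY_TABLES = {
    CORE_HELIX_BURIED: {("L", "I"): 2, ("L", "L"): 3, ("K", "R"): 0},
    HEAD_COIL_EXPOSED: {("L", "I"): -1, ("L", "L"): 2, ("K", "R"): 2},
}


def substitution_score(tables, env, template_res, query_res, fallback=-1):
    """Look up a substitution in the table that belongs to this environment."""
    return tables.get(env, {}).get((template_res, query_res), fallback)


def score_ungapped(tables, template_seq, template_envs, query_seq):
    """Sum environment-specific scores over an ungapped pairing of two sequences."""
    if not (len(template_seq) == len(template_envs) == len(query_seq)):
        raise ValueError("sequences and environment annotation must have equal length")
    return sum(
        substitution_score(tables, env, t, q)
        for t, env, q in zip(template_seq, template_envs, query_seq)
    )


# The same residue pairing scored under two different environment annotations.
mixed = score_ungapped(
    TOY_TABLES, "LLK",
    [CORE_HELIX_BURIED, CORE_HELIX_BURIED, HEAD_COIL_EXPOSED], "IIR",
)
all_exposed = score_ungapped(TOY_TABLES, "LLK", [HEAD_COIL_EXPOSED] * 3, "IIR")
print(mixed, all_exposed)
```

The environments come from the protein of known structure, so only the template side needs annotating; the query sequence is scored against whatever environment it is aligned to.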

Profile-profile alignment methods and MP-T are very different. It is unclear whether the substitution preferences at a position are best estimated by MP-T’s tables or the supplements derived from sequence profiles, and the answer probably depends on how well the profiles are made — garbage in, garbage out. Similarly the MP-T algorithm only determines the upper limit of alignment accuracy, and the actual accuracy depends on how the homologous sequences in the alignment are chosen.

In general we find little difference between the fraction of an alignment that MP-T and either HHsearch or Promals (profile-profile alignment methods) get right. However, we do find a difference in the fraction of the alignment that these methods get wrong (part of an alignment can be right, wrong or simply not aligned, so it’s possible to get a lower fraction wrong whilst getting the same fraction right). It turns out that on average MP-T gets less of an alignment wrong for simple reasons of combinatorics: for a pair of proteins, the number of possible multiple sequence alignments is much greater than the number of possible profile-profile alignments. This means that, just by chance, the number of incorrectly aligned positions between the two sequences of interest will be lower for MP-T than for a conventional profile-profile alignment method.
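The right / wrong / not aligned distinction can be sketched in a few lines. Here an alignment is represented as a mapping from query positions to template positions, and each position of a reference alignment is classified by what the predicted alignment does with it. This is one plausible way to define the three fractions, not necessarily the exact definition used in our benchmarks, and the example alignments are invented.

```python
def alignment_fractions(reference, predicted):
    """Return (fraction right, fraction wrong, fraction unaligned).

    Both arguments map a query position to the template position it is
    aligned with; positions missing from `predicted` count as unaligned.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    right = wrong = unaligned = 0
    for query_pos, ref_partner in reference.items():
        pred_partner = predicted.get(query_pos)
        if pred_partner is None:
            unaligned += 1  # the method left this position unaligned
        elif pred_partner == ref_partner:
            right += 1      # same partner as in the reference
        else:
            wrong += 1      # aligned, but to a different partner
    total = len(reference)
    return right / total, wrong / total, unaligned / total


reference = {0: 0, 1: 1, 2: 2, 3: 3}
method_a = {0: 0, 1: 1, 2: 3, 3: 4}  # aligns everything, half of it incorrectly
method_b = {0: 0, 1: 1}              # aligns only what it gets right

print(alignment_fractions(reference, method_a))
print(alignment_fractions(reference, method_b))
```

Both invented methods get the same fraction right, but only the first gets anything wrong, which is the kind of difference described above.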

Now for a little sales-pitch. The source code for MP-T is freely available and easy to expand (if you have a passing familiarity with Haskell). Only two or three lines of code need to be changed to define a new set of protein environments, and to feed it a substitution table for each environment. I’d be happy to help anyone who wants to try it out.

[Publication] Memoir: template-based structure prediction for membrane proteins

Congratulations to all involved in the Memoir publication in Nucleic Acids Research! 

Memoir is a web server which builds homology models for membrane proteins. It is a web-enabled workflow combining some of OPIG’s software: MP-T, iMembrane, Medeller and Fread. The inputs are the sequence of the membrane protein you wish to model (the target) and a PDB file to use as a template.

Memoir may be found here and there is also a video tutorial narrated by Jamie.  There is even a funny blooper of him practising, which I kept to celebrate this moment.

Happy modelling!


Free food!

Yesterday I walked into Group Meeting not having read Bernhard‘s paper (shameful, I know), and I was immediately asked “Where is the Daleks post on the blog?”.  To which I mumbled something unconstrued, because I am not sure what a Dalek is and because I didn’t know we were doing post requests.

Anyway, at every group meeting one of us is responsible for organising the talk and another for supplying food. The only current rule (since the well-received demise of the “No alcohol” one) is: “No tomatoes”. We’ve had a number of original and tasty contributions: Dominos pizza, Ben’s and Millies cookies, truckloads of Haribos, Krispy Kreme Doughnuts, Sushi, Nutella baguettes and home-baked delights.

But Eoin‘s contribution takes the prize this round (a small trophy in Lab Room #1).


Eoin’s Dr. Who Daleks sugar rush inducing cakes.

So, a small pointer to OPIG prospective students – “baking” and “creative thinking” skills are really well appreciated and look good on your CV!


Journalclub: Molecular Dynamics simulations of TCRpMHC


T cells recognize fragments of pathogens (peptides) presented by the Major Histocompatibility Complex (MHC) via their T-cell receptor (TCR). This interaction is commonly considered one of the most important events taking place in the adaptive immune reaction.


Molecular Dynamics simulations are a computational technique to simulate the movement of atoms over time. For this purpose the interaction energies (bonded and non-bonded) between the individual atoms are calculated and the spatial positions are adjusted during each iteration. Such simulations are very resource and time consuming but provide insights into interaction processes which cannot be obtained by any currently available experimental technique.
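The "calculate forces, then adjust positions" loop can be sketched with the velocity Verlet scheme, a commonly used MD integrator. The toy below moves a single particle on a one-dimensional spring. The function names, the force and all parameters are assumptions for illustration; a real MD engine evaluates a full force field over many thousands of atoms.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns final position, velocity and path."""
    f = force(x)
    positions = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update with old force
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # recompute the force at the new position
        v = v_half + 0.5 * dt * f / mass  # finish the velocity update with new force
        positions.append(x)
    return x, v, positions


def spring_force(x, k=1.0):
    """Toy harmonic 'bond': force pulling the particle back towards x = 0."""
    return -k * x


# Start displaced by 1.0 with zero velocity and simulate 10 time units.
final_x, final_v, xs = velocity_verlet(
    x=1.0, v=0.0, force=spring_force, mass=1.0, dt=0.01, n_steps=1000
)
print(final_x, math.cos(10.0))  # numerical result next to the analytic solution
```

For this toy system the exact answer is known (a cosine), which makes it easy to check that the integrator behaves; for a protein complex no such reference exists.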

In this journal club we discussed 3 different papers dealing with MD simulations of the TCRpMHC complex:

A typical story

Epitope Flexibility and Dynamic Footprint Revealed by Molecular Dynamics of a pMHC-TCR Complex
Reboul et al., Plos Comp. Biol. 2012

As many other authors have done before, Reboul et al. performed MD simulations of two different (but very similar) MHCs in complex with the same viral peptide. While no immune reaction is caused if the peptide is presented by HLA-B*3501, a reaction is induced if it is presented in the context of HLA-B*3508.

In their MD simulations the authors find minor differences in the RMSF and claim this to be systematic and the cause for the different behaviour.
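For reference, the RMSF (root mean square fluctuation) of an atom is the square root of its mean squared deviation from its own average position over the trajectory. A minimal sketch, assuming the frames have already been superimposed onto a common reference and using invented coordinates:

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames of (x, y, z) tuples.

    Assumes every frame lists the same atoms in the same order and that
    the frames were superimposed beforehand.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        # Average position of this atom over all frames.
        mean = [
            sum(frame[atom][d] for frame in trajectory) / n_frames
            for d in range(3)
        ]
        # Mean squared distance from that average position.
        msd = sum(
            sum((frame[atom][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Invented two-frame trajectory: atom 0 does not move, atom 1 swings along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
print(fluctuations)
```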

An innovative story

Toward an atomistic understanding of the immune synapse: Large-scale molecular dynamics simulation of a membrane embedded TCR–pMHC–CD4 complex
Wan et al., Molecular Immunology 2008

While several PDB structures of parts of the core of the immunological synapse are available (see image below), an overall structure had not been published before this paper. The authors address this by means of superimposition, modelling of the linking and trans-membrane regions, and subsequent MD simulation. The resulting structure seems to be in good agreement with experimental electron microscopy data.


My story

Early relaxation dynamics in the LC 13 T cell receptor in reaction to 172 altered peptide ligands: A molecular dynamics simulation study
Knapp et al., Plos One 2013

In most studies authors compare the same MHC with two or three different peptides, or the same peptide bound to two MHCs. In some cases the same peptide and MHC are also simulated in interaction with two different TCRs. Given that the TCRpMHC complex consists of roughly 800 amino acids, one will almost certainly find some differences between those two or three simulations (multiple testing). Differences would also be present if one simulated the same complex twice with different starting velocities or, more extreme still, with the same starting velocities but on different hardware. Yes, even this may lead to slightly different results. On this basis such studies (if published without further experimental data to underpin the findings) are at best anecdotal stories.

Therefore we intended to address this challenge in a more systematic way: we simulated the LC 13 TCR / HLA-B*08:01 system in complex with all possible single point mutations in the EBV peptide FLRGRAYGL. This leads to a total of 172 highly related MD simulations, for each of which the experimental immunogenicity is known. Based on their immunogenicity we assigned each simulation to either the more immunogenic (moreI) or less immunogenic (lessI) group. This was repeated for several thresholds. Further analysis on the basis of RMSD maps and permutation tests showed that the moreI and lessI groups were significantly different in their initial relaxation dynamics from the (perturbed) x-ray structure.
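The idea behind a permutation test can be sketched as follows: if the group labels carried no information, shuffling them should produce differences as large as the observed one reasonably often. The sketch below is a generic two-group test on a single number per simulation. It is not the RMSD-map analysis from the paper, and the function name, the choice of statistic (difference of means) and the example values are assumptions made up for illustration.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the group labels at random
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling itself so the p-value can never be zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented per-simulation summary values for two hypothetical groups.
more_i = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1]
less_i = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1]
p_value = permutation_test(more_i, less_i)
print(p_value)
```

Because the test makes no distributional assumptions, it suits quantities derived from MD trajectories, whose distributions are rarely known in advance.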


The moreI and lessI groups were not only significantly different, they also showed quite an interesting pattern in their most frequently different regions (highlighted in green):