Monthly Archives: June 2013

Evolutionary fold space preferences

At group meeting last week I focussed, alongside some metaphysical speculation, on a project which has occupied the first half of my DPhil: namely exploring the preferences of both very old and very young protein structures. This work is currently in preparation for publication so I will give only a brief overview and hopefully update the juicy details later. Feel free to contact me for more information.

Proteins are the molecular machinery of the cell. Their evolution is one of the most fundamental processes behind the diversity and complexity of life that we see around us today. Despite this diversity, protein domains (independent folding units) of known structure fall into just over 1,000 unique SCOP folds.

This project has sought to identify how populations of proteins at different stages of evolution explore their possible structure space.

Superfamily ages

Structural domains are clustered at different levels of similarity within the SCOP classification. At the superfamily level this classification attempts to capture evolutionary relationships through structural and functional similarities, even if sequence divergence has occurred.

Evolutionary ages for these superfamilies are then estimated from their phylogenetic profiles across the tree of life. Each age is an estimate of the age of the structural ancestor of a superfamily.




The phylogenetic occurrence profiles are constructed from HMM-based predictions of superfamilies on completely sequenced genomes, taken from the SUPERFAMILY database. Given an occurrence profile and a phylogenetic tree (for robustness we consider several possible reconstructions of the tree of life), we use a maximum parsimony algorithm (proposed by Mirkin et al.) to estimate the simplest scenario of loss events (domain loss on a genome) and gain events (domain gain) at internal nodes of the tree that explains the occurrence profile. The age estimate is the height of the first gain event, normalised between 0 (at the leaves of the tree) and 1 (at the root).
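To make the age estimate concrete, here is a minimal sketch of the idea in Python. It is not our actual pipeline and not the algorithm exactly as published by Mirkin et al.: it is a generic two-state (absent/present) parsimony on a toy tree, and it assumes unit branch lengths, equal gain and loss costs by default, and ties broken in favour of "no event". The names Node, height and estimate_age are made up for this illustration.

```python
from dataclasses import dataclass, field

INF = float("inf")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def height(node):
    """Number of edges on the longest path from this node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def estimate_age(root, present, gain_cost=1.0, loss_cost=1.0):
    """Normalised height of the most ancient gain event, or None if no gain.

    `present` is the set of leaf names (genomes) where the superfamily occurs.
    """
    def step(parent_state, state):
        # Cost of the event on the branch leading to a node.
        if parent_state == 0 and state == 1:
            return gain_cost
        if parent_state == 1 and state == 0:
            return loss_cost
        return 0.0

    cost = {}  # id(node) -> (cheapest subtree cost if absent, if present)

    def fill(node):
        if not node.children:
            observed = 1 if node.name in present else 0
            cost[id(node)] = tuple(0.0 if s == observed else INF for s in (0, 1))
            return
        for child in node.children:
            fill(child)
        cost[id(node)] = tuple(
            sum(min(cost[id(c)][t] + step(s, t) for t in (0, 1))
                for c in node.children)
            for s in (0, 1)
        )

    gain_heights = []

    def trace(node, parent_state):
        # Parent's state is listed first, so ties resolve to "no event".
        options = (parent_state, 1 - parent_state)
        state = min(options, key=lambda t: cost[id(node)][t] + step(parent_state, t))
        if parent_state == 0 and state == 1:
            gain_heights.append(height(node))
        for child in node.children:
            trace(child, state)

    root_height = height(root)
    if root_height == 0:
        raise ValueError("the tree needs at least one internal node")
    fill(root)
    trace(root, 0)  # assumption: the superfamily is absent above the root
    if not gain_heights:
        return None
    return max(gain_heights) / root_height


# Toy tree ((A,B),(C,D)) with four genomes at the leaves.
tree = Node("root", [Node("AB", [Node("A"), Node("B")]),
                     Node("CD", [Node("C"), Node("D")])])
print(estimate_age(tree, {"A", "B"}))            # 0.5: gained on the AB branch
print(estimate_age(tree, {"A", "B", "C", "D"}))  # 1.0: gained at the root
```

Changing the ratio of gain_cost to loss_cost shifts the balance between "one old gain plus several losses" and "several recent gains", which is why the choice of costs (and of tree) matters for the resulting ages.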

We estimated ages for 1,962 SCOP superfamilies and compared several properties relating to their primary, secondary and tertiary structures, as well as their functions. In particular, we compared two populations of superfamilies: ancients, with an age of 1, and new-borns, with an age < 0.4. Full details of our results will hopefully be published shortly so watch this space!

Antimicrobial Drug Discovery Conference (Madrid)

I am a big fan of taking something, either a poster or a talk, to a conference, and getting something back – other than a €6 box of airport chocolates.  This blog post is in that spirit.

On the plane to the “Antimicrobial Drug Discovery” conference in Madrid I was reading the Cassandra Project (a novel on smallpox, how apt) instead of the stack overflow of scientific papers I planned to read.  Classic JP.

The conference had a mix of experienced, invited speakers and early stage researchers.  It was very “biological” for a computational scientist, so quite removed from what I normally do – but an opportunity to learn nonetheless.

The keynote lecture was by Julian Davies, a fantastic speaker who gave a general overview of antibiotics and antibiotic resistance.  Antibiotic resistance is a real concern (even those politicians in the G8 noticed a few hours ago!) and there is a fear we might return to a pre-antibiotic era when you could not cure common diseases like bacterial pneumonia.  Pharmaceutical companies all got out of antibiotic research years ago, and there have been no new antibiotic scaffolds for more than a decade.  I found this surprising as you would think that there was a truckload of money to be made from finding the new penicillin.  Apparently, there is little return in anti-infectives because of rapid mutation of the pathogen and because the drugs are only used short-term (curing the infection, as opposed to having to take your medication for life, such as beta-blockers for hypertension).  Bacteria should not only be considered at an individual cell level but also as a population with complex signalling between the individuals (which may offer a way to stop bacterial infection).  In order to combat infections and increasing resistance, sick patients are now given combinations of drugs – this is still dangerous due to the possible (toxic) drug-drug interactions.

Natural products, e.g. some toxins, are good antibiotics but it is very hard to optimize such compounds to improve their drug profile (chemical synthesis of natural products is difficult).  Also a lot of people at the conference were talking of how antimicrobial peptides will save the day.  The attendees with drug discovery experience raised an eyebrow about this, knowing how hard it will be to make a 30 residue peptide into a drug.

Some antibiotics work by having a hydrophilic part (e.g. carboxyl) and a hydrophobic part (e.g. an alkane chain).  The hydrophobic part sits in the membrane and disrupts it, creating a “leak” in the bacterium which eventually kills the pathogen.  There are other mechanisms of action, such as blocking transporter or signalling channels.

There was a brilliant, energetic talk by Bruno Gonzalez-Zorn with the audience paying rapt attention.  He showed how bacteria have these multiple, small plasmids offering antibiotic resistance.  He discovered there was a common two-part theme to antibiotic resistance, where a particular gene is always present.

Paul Finn gave a much needed talk on why drug discovery is hard (e.g. target selection, difficulty of getting drugs in the therapeutic area, potency, toxicity, having to optimize for different variables, etc.).  Unknowingly proving this point, an earlier talk described a whole optimization series which took a small molecule inhibitor of a viral infection from 150 µM down to 1 µM (IC50) – a great result in itself, but when the investigators tested this ligand in vivo rather than in vitro it simply did not have any effect on the virus.

Cele Abad Zapatero, one of the main investigators of AtlasCBS, made the point that, today, we do not know where we are in drug discovery.  He argued we need to move to chemical-biology space instead of simply chemical space and recommended the use of ligand efficiency indices (e.g. BEI, SEI).
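For readers who have not met these indices, here is a small sketch of how BEI and SEI are commonly defined: potency (pIC50, pKi or pKd) divided by molecular weight in kDa for BEI, and by polar surface area in units of 100 Å² for SEI. The function names and the example compound (350 Da, 80 Å² PSA) are made up for illustration and were not taken from the talk.

```python
import math


def pic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50 (negative log10)."""
    return -math.log10(ic50_molar)


def bei(ic50_molar, mol_weight_da):
    """Binding efficiency index: pIC50 per kDa of molecular weight."""
    return pic50(ic50_molar) / (mol_weight_da / 1000.0)


def sei(ic50_molar, psa_angstrom2):
    """Surface efficiency index: pIC50 per 100 square angstroms of PSA."""
    return pic50(ic50_molar) / (psa_angstrom2 / 100.0)


# Hypothetical compound: 350 Da, 80 A^2 polar surface area,
# before (150 uM) and after (1 uM) an optimization series.
for ic50 in (150e-6, 1e-6):
    print(f"IC50 = {ic50:.0e} M  BEI = {bei(ic50, 350.0):.1f}  SEI = {sei(ic50, 80.0):.1f}")
```

Because both indices normalise potency by a size or polarity term, a series that gains potency only by adding mass or polar surface will show little improvement in BEI or SEI.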

Having fun in Madrid

Madrid was way too much fun.  Zidane (and a few thousand others) kissed this Champions League Cup in exactly the same place. Talking about microbes.

And what did I take to the conference?  I took a poster, the design of which is based on Dunbar’s stylish template.  Marta, Ana and I won a “highly commendable” poster prize, with the best poster going to Laura (Synthetic inhibitors of bacterial cell division targeting the GTP binding site of FtsZ, since you asked).  There were 24 posters in all, and mine was the only computational study in a room otherwise filled with phages, bacteria and plasmids (literally as well as metaphorically).  There is a sinister heart-warming joy in winning a bottle of wine, instead of a cheque or a certificate.  James deserves a sip or two.


Poster Prize Presentation

Cheekily asking for a corkscrew during the poster prize award


[Publication] Effect of Single Amino Acid Substitution Observed in Cancer on Pim-1 Kinase Thermodynamic Stability and Structure

In this study we selected point mutations resulting in Pim-1 variants that are expressed in cancer tissues and reported in SNP databases, such as FastSNP and COSMIC. These Pim-1 variants have been comprehensively characterized to investigate the effect of single amino acid substitutions on Pim-1 thermal and thermodynamic stability and structure in solution. Our results indicate that the mutations observed in cancer tissues cause local changes of tertiary structure, but do not affect binding to type I kinase inhibitors.

This work was pioneered by researchers at the Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, and served as an inspiration for one of my thesis chapters.

Standardize a PDB

In many applications you need to constrain PDB files to certain chains. You can do it using this program.

A. What does it do?

Given a PDB file, it writes out the ATOM and HETATM entries for the supplied chain(s) and puts them on a single chain with the name provided by the user.

PDB_standardize needs four arguments:

  1. PDB file to constrain.
  2. Chains from the pdb file to constrain.
  3. Output file.
  4. Name of the new chain

B. Requirements:

Biopython – should be installed on your machines, but in case you want to use it locally, download the latest version into the program’s directory (you don’t need to build it).

C. Example use:

C.1 Constrain 1A2Y.pdb to chains A and B, placed on a single chain C – write results in const.pdb

python -f 1A2Y.pdb -c AB -o const.pdb -s C


C.2 Constrain 1ACY to chain L, with a new chain name F, write results in const.pdb – this example shows that the constrainer works well with ‘insertion’ residue numbering as in antibodies where you have 27A, 27B etc.

python -f 1ACY.pdb -c L -o const.pdb -s F
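If you are curious what the constraining step boils down to, below is a rough, dependency-free sketch. It is not the PDB_standardize program itself (which uses Biopython): it simply relies on the fixed-column PDB format, where the chain identifier sits in column 22, and the function names standardize_lines and standardize_file are invented for this illustration. It does not renumber residues, so merging several chains can leave duplicate residue numbers on the new chain.

```python
def standardize_lines(lines, chains, new_chain):
    """Keep ATOM/HETATM records from the given chains, relabelled as new_chain.

    `chains` is a string of one-letter chain identifiers, e.g. "AB".
    Everything except the chain column is copied unchanged, so insertion
    codes such as 27A and 27B survive as they are.
    """
    if len(new_chain) != 1:
        raise ValueError("a PDB chain identifier is a single character")
    kept = []
    for line in lines:
        is_coordinate = line.startswith(("ATOM  ", "HETATM"))
        # Column 22 of a PDB record (index 21) holds the chain identifier.
        if is_coordinate and len(line) > 21 and line[21] in chains:
            kept.append(line[:21] + new_chain + line[22:])
    return kept


def standardize_file(pdb_path, chains, out_path, new_chain):
    """File-based wrapper: read pdb_path, write the constrained records."""
    with open(pdb_path) as handle:
        kept = standardize_lines(handle, chains, new_chain)
    with open(out_path, "w") as out:
        out.writelines(kept)
        out.write("END\n")
```

With this sketch, the equivalent of example C.2 would be standardize_file("1ACY.pdb", "L", "const.pdb", "F").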


And the Oscar goes to …


I have a bet I can write a blog post in under 20 mins.  Last week I asked Leila Tamara where her blog post was (we aim to have one up every week) and she replied that “It was ready but hadn’t been proof-read yet“.  I guess it goes to show the diligence of the students here at Oxford.  (Some of them, anyway).

So as a Marie Curie Stars ITN Fellow, my time is coming to an end (in mid-September).  Part of my D.Phil was about finding novel anti-malarial inhibitors using computational methods (virtual screening).  No, we haven’t cured it yet.

During a conference in Riga we made a video about the project – I find it funny to see myself on screen (and to hear my heavily accented commentary).  Pity the Oscars have already been dished out this year!

And a “making of” still

JP make-up


Warning: this post hasn’t been proof-read yet!