# Seventh Joint Sheffield Conference on Cheminformatics Part 1 (#ShefChem16)

In early July I attended the the Seventh Joint Sheffield Conference on Cheminformatics. There was a variety of talks with speakers at all stages of their career. I was lucky enough to be invited to speak at the conference, and gave my first conference talk! I have written two blog posts about the conference: part 1 briefly describes a talk that I found interesting and part 2 describes the work I spoke about at the conference.

One of the most interesting parts of the conference was the active twitter presence. #ShefChem16. All of the talks were live tweeted which provided a summary of each talk and also included links to software or references. It also allowed speakers to gain insight and feedback on their talk instantly.

One of the talks I found most interesting presented the Protein-Ligand Interaction Profiler (PLIP). It is a method for the detection of protein-ligand interactions. PLIP is open-source and has a web-based online tool and a command-line tool. Unlike PyMol which only calculates polar contacts, and not the type of interaction, PLIP calculates 8 different types of interactions: hydrogen bonding, hydrophobic, π-π stacking, π-cation interactions, salt bridges, water bridges, halogen bonds, metal complexes. For a given pdb file the interactions are calculated and shown in a publication quality figure shown here.

The display can also be downloaded as a PyMol session so the display can be modified.

This tool is an extremely useful way to calculate protein-ligand interactions and can be used to find the types of interactions formed by the protein-ligand complex.

PLIP can be found here: https://projects.biotec.tu-dresden.de/plip-web/plip/

# Tracked changes in LaTeX

Maybe people keep telling you Word is great but you are just too emotionally attached to LaTeX to consider using anything else. It just looks so beautiful. Besides, you would have to leave your beloved linux environment (maybe that’s just me), so you stick with what you know. You work for many weeks long and hard, finally producing a draft of a paper that gets the all clear from your supervisor to submit to journal X. Eventually you hear back and the reviewers have responded with some good ideas and a few pedantic points. Apparently this time the journal wants a tracked changes version to go with your revised manuscript.

Highlighting every change sounds like a lot of bother, and besides, you’d have to process the highlighted version to generate the clean version they want you to submit alongside it. There must be a better way, and one that doesn’t involve converting your document to Word.

Thankfully, the internet has an answer! Check out this little package changes which will do just what you need. As long as you annotate using \deleted{}, \replaced{} and \added{} along the way, you will have to change just one word of your tex source file in order to produce the highlighted and final versions. It even comes with a handy bash script to get rid of the resulting mess when you’re happy with the result, leaving you with a clean final tex source file.

The die-hard Word fans won’t be impressed, but you will be very satisfied that you have found a nice little solution that does just the job you want it to. It’s actually capable of much more, including comments by multiple authors, customisation of colours and styles, and an automatically generated summary of changes. I have heard good things about ShareLaTeX for collaboration, but this simple package will get you a long way if you are not keen to start paying money yet.

# Rational Design of Antibody Protease Inhibitors

On the theme of my research area and my last presentation I talked at group meeting about a another success story in structurally designing an antibody by replicating a general protein binding site using grafted fragments involved in the original complex. The paper by Liu et al is important to me for two major reasons . Firstly they used an unconventional antibody for protein design, namely a bovine antibody which is known to have an extended CDR H3. Secondly the fragment was not grafted at the anchor points of the CDR loop.

SFTI-1 is a cyclic peptide and a known trypsin inhibitor. It’s structure is stabilised by a disulphide bridge. The bovine antibody is known to have an extended H3 loop which is essentially a long beta strand stalk with a knob domain at the end. Liu et al removed the knob domain and a portion of the B strand and grafted the acyclic version of the SFTI-1 to it. As I said above this result is very important because it shows we can graft a fragment/loop at places different then the anchor points of the CDR. This opens up the possibility for more diverse fragments to be grafted because of new anchor points,  and also because the fragment will sit further away from the other CDRs and the framework allowing more conformational space. To see how the designed antibody compares to the original peptide they measured the Kd and found a 4 fold increase (9.57 vs 13.3). They hypothesise that this is probably due to the fact that the extended beta strand on the antibody keeps the acyclic SFTI-1 peptide in a more stable conformation.

The problem with the bovine antibody is that if inserted in a human subject it would probably elicit an immune response from the native immune system. To humanise this antibody they found the human framework which shares the greatest sequence identity to the bovine antibody and then grafted the fragment on it. The human antibody does not have an extended CDR H3 and to decide what is the best place of grafting they tried various combinations again showing again that the fragments do not need to grafted exactly at the anchor points. Some of the resulting  human antibodies showed even more impressive Kds.

The best designed human antibody had a 0.79nM Kd, another 10-fold gain . Liu et al hypothesised that this might be due to the fact that the cognate protein forms contacts with residues on the other CDRs even though there is no crystal structure to show this. In order to test this hypothesis they mutated surface residues on the H2 and L1 loop to Alanine which resulted in a 6.7 fold decrease in affinity. The only comment I would have to this is that the mutations to the other CDRs might have destabilized the other CDRs on the antibody which could be the reason for the decrease in affinity.

# Quantifying dispersion under varying instrument precision

Experimental errors are common at the moment of generating new data. Often this type of errors are simply due to the inability of the instrument to make precise measurements. In addition, different instruments can have different levels of precision, even-thought they are used to perform the same measurement. Take for example two balances and an object with a mass of 1kg. The first balance, when measuring this object different times might record values of 1.0083 and 1.0091, and the second balance might give values of 1.1074 and 0.9828. In this case the first balance has a higher precision as the difference between its measurements is smaller than the difference between the measurements of balance two.

In order to have some control over this error introduced by the level of precision of the different instruments, they are labelled with a measure of their precision $latex 1/\sigma_i^2$ or equivalently with their dispersion $latex \sigma_i^2$ .

Let’s assume that the type of information these instruments record is of the form $latex X_i=C + \sigma_i Z$,  where $latex Z \sim N(0,1)$ is an error term, $latex X_i$ its the value recorded by instrument $latex i$ and where $latex C$ is the fixed true quantity of interest the instrument  is trying to measure. But, what if $latex C$ is not a fixed quantity? or what if the underlying phenomenon that is being measured is also stochastic like the measurement $latex X_i$. For example if we are measuring the weight of cattle at different times, or the length of a bacterial cell, or concentration of a given drug in an organism, in addition to the error that arises from the instruments; there is also some noise introduced by dynamical changes of the object that is being measured. In this scenario, the phenomenon of interest, can be given  by a random variable $latex Y \sim N(\mu,S^2)$. Therefore the instruments would record quantities of the form $latex X_i=Y + \sigma_i Z$.

Under this case, estimating the value of $latex \mu$, the expected state of the phenomenon of interest is not a big challenge. Assume that there are $latex x_1,x_2,…,x_n$ values observed from realisations of the variables $latex X_i \sim N(\mu, \sigma_i^2 + S^2)$, which came from $latex n$ different instruments. Here $latex \sum x_i /n$ is still a good estimation of $latex \mu$ as $latex E(\sum X_i /n)=\mu$.  Now, a more challenging problem is to infer what is the underlying variability of the phenomenon of interest $latex Y$. Under our previous setup, the problem is reduced to estimating $latex S^2$ as we are assuming $latex Y \sim N(\mu,S^2)$ and that the instruments record values of the from $latex X_i=Y + \sigma_i Z$.

To estimate $latex S^2$ a standard maximum likelihood approach could be used, by considering the likelihood function:

$latex f(x_1,x_2,..,x_n)= \prod e^{-1/2 \times (x_i-\mu)^2 /(\sigma_i^2+S^2)} \times 1/\sqrt{2 \pi (\sigma_i^2+S^2) }$,

from which the maximum likelihood estimator of $latex S^2$ is given by the solution to

$latex \sum [(X_i- \mu)^2 – (\sigma_i^2 + S^2)] /(\sigma_i^2 + S^2)^2 = 0$.

Another more naive approach could use the following result

$latex E[\sum (X_i-\sum X_i/n)^2] = (1-1/n) \sum \sigma_i^2 + (n-1) S^2$

from which $latex \hat{S^2}= (\sum (X_i-\sum X_i/n)^2 – ( (1-1/n ) \sum(\sigma_i^2) ) ) / (n-1)$.

Here are three simulation scenarios where 200 $latex X_i$ values are taken from instruments of varying precision or variance $latex \sigma_i^2, i=1,2,…,200$ and where the variance of the phenomenon of interest $latex S^2=1500$. In the first scenario $latex \sigma_i^2$ are drawn from $latex [10,1500^2]$, in the second from $latex [10,1500^2 \times 3]$ and in the third from $latex [10,1500^2 \times 5]$. In each scenario the value of $latex S_2$ is estimated 1000 times taking each time another 200 realisations of $latex X_i$. The values estimated via the maximum likelihood approach are plotted in blue, and the values obtained by the alternative method are plotted in red. The true value of the $latex S^2$ is given by the red dashed line across all plots.

First simulation scenario where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2]$. The values of  $latex \sigma_i^2$ plotted in the histogram to the right. The 1000 estimations of $latex S$ are shown by the blue (maximum likelihood) and red (alternative) histograms.

First simulation scenario where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2 \times 3]$. The values of $latex \sigma_i^2$ plotted in the histogram to the right. The 1000 estimations of $latex S$ are shown by the blue (maximum likelihood) and red (alternative) histograms.

First simulation scenario where $latex \sigma_i^2, i=1,2,…,200$ in $latex [10,1500^2 \times 5]$. The values of $latex \sigma_i^2$ plotted in the histogram to the right. The 1000 estimations of $latex S$ are shown by the blue (maximum likelihood) and red (alternative) histograms.

For recent advances in methods that deal with this kind of problems, you can look at:

Delaigle, A. and Hall, P. (2016), Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78: 231–252. doi: 10.1111/rssb.12109