OPIGTREAT

On the 19th of March OPIG set off on our group retreat – henceforth referred to as the OPIGTREAT.

We kicked off a little late, as apparently Saulo and check-in times are not a good combination (though he is an expert at reversing on an icy road).

Jin and Flo gave the first talk on web programming, specifically Flask and D3. If I understood correctly, Flask is a web development framework for Python that runs everything on the server side, whereas D3 stands for Data-Driven Documents, which appears to be a way of making very pretty things.

Garrett then gave us an impressive overview of the area of docking, asking whether docking had improved in the last 10 years. He discussed how docking can be used to predict both the binding mode (the orientation and conformation) and the binding affinity. The state of the art appears to be that if we are docking a small molecule into approximately the correct binding site, a native-like pose can be identified, but binding affinity prediction remains challenging in all cases.

Mark then attempted the impossible: he tried to give a talk explaining how to give a good talk, in this case in the context of public engagement and taking our work out to schools. I am now versed in the 4 Ms: Manageable, Measurable, Made first and Most Important. I am also weirdly aware that my head shouldn’t move when I am teaching.

Elliot then took us through how we should judge a PDB structure, a really useful skill for everyone in the group. He described measures such as resolution, B factors, Rfree, clash score, Ramachandran outliers, sidechain outliers and RSRZ outliers. Interesting facts that I collected: the average resolution of an X-ray structure in the PDB is ~2 Å and the average Rfree is 0.25. I also learnt of the existence of PDB-REDO, a service that re-refines datasets in the PDB.

Saulo and the Fergi were up next, and they treated us all to a short talk and then a Jupyter notebook practical on machine learning. They discussed supervised, unsupervised and reinforcement learning, giving examples of each and how and when they should/could be used. Claire and I then learnt a great deal about Jupyter notebooks, the most important thing being to press shift enter. Useful fact: “out-of-bag” error is a method for measuring the error of random forests, where each tree is scored using all the data points apart from those used to make that tree.
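For anyone who wants to see the out-of-bag idea in action, here is a minimal sketch. It is not the notebook from the practical, and it is an assumption-laden toy: instead of a real random forest it uses bagged one-dimensional threshold "stumps" on made-up data, and the names `fit_stump`, `oob_error` and `toy_data` are invented for illustration. The point it shows is the one above: each point is only scored by the trees whose bootstrap sample did not include it.

```python
import random


def fit_stump(points):
    """Pick the threshold t (taken from the sample) that minimises the
    errors of the rule 'predict class 1 if x > t'."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def oob_error(data, n_trees=50, seed=0):
    """Out-of-bag error estimate for a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[0, 0] for _ in range(n)]  # per point: votes for class 0 and class 1
    for _ in range(n_trees):
        # Bootstrap sample: draw n indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        threshold = fit_stump([data[i] for i in in_bag])
        # Only points that this tree never saw get a vote from it.
        for i in set(range(n)) - set(in_bag):
            x, _ = data[i]
            votes[i][int(x > threshold)] += 1
    # Majority vote per point, skipping any point that was never out of bag.
    scored = [(v, data[i][1]) for i, v in enumerate(votes) if sum(v) > 0]
    wrong = sum((v[1] > v[0]) != bool(y) for v, y in scored)
    return wrong / len(scored)


# Hypothetical toy data: x in [0, 1), labelled 1 when x >= 0.5.
toy_data = [(i / 40, int(i >= 20)) for i in range(40)]
error = oob_error(toy_data)
print(f"OOB error estimate: {error:.3f}")
```

In practice you would not hand-roll this: scikit-learn's `RandomForestClassifier` accepts `oob_score=True` and then exposes the estimate as `oob_score_` after fitting.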

The evening finished with a film about the evil iniquities of smoking (very highbrow stuff!?!).

The second day began with Bernhard (a visitor from the far-off land of Barcelona these days) talking to us about his latest research project. As this is his story, there are no details in the blog.

Claire then gave an update of the talk she gave at the last OPIGTREAT: how to make “stuff” pretty. Obviously a popular topic, as we all wish to display our data and findings in a way that is easily interpretable as well as visually appealing. Claire took us through some of the tools to use, like ggplot and PyMOL, showed us where to find the lists of useful commands, and then showed us the types of images you could make if you really put some thought into it.

Anne was up next. She discussed the challenges and opportunities of integrating heterogeneous data sources, and she came up with a lot of data sources to think about: protein structures, protein interactions, small molecule structures, drug safety, drug targets, functional annotation and pathways. One thing to remember: probably don’t tell your boss when she should or shouldn’t be taking notes...

It was then the turn of Team Networks (Javi, James and Lyuba), who walked us through the basics of networks and expanded on their uses across multiple data types in biology. They mentioned areas from simple motifs to protein structure, MD simulations, ontologies, disease prediction, drug target identification... We then had a practical to check we had understood the power of networks! The networks under consideration were dolphins, myoglobin structure, Facebook data and the mystery voter network (where we discovered that Fergus the first in no way tried to rig the vote for what film to watch).

That afternoon I visited the bird sanctuary just down the road, others went to a gin distillery or on a walk. Top quote of the afternoon was from James “I want the birds to eat from my pants”. I believe he is from one of those countries that has the misguided belief that pants means trousers. Actually I could have a different top quote from Alex about somebody being a cheap ride in his dreams but I think I should pass over that one.

That evening we were treated to a fragment-based drug discovery extravaganza headed up by Hannah, Susan and Joe. They took us through the use of fragments for drug discovery and then we attempted a practical. I seem to remember that Claire and I once again excelled at shift enter on the Jupyter notebook.

That evening we had a pub quiz, which apparently ended in a draw between all the teams playing. I feel that Claire and Flo as quizmasters might have made a minor miscalculation. I was happy though as I ended up with the minions bowl and cup. I also managed to persuade several grown men to jump and smash chocolate eggs on their heads on the ceiling.

Next morning Alex and Matt were up first. In their talk they demonstrated not only their knowledge of the future immunotherapy repertoire but also their ability to finish each other’s sentences. They gave a really excellent overview of current immunotherapies, where the field is moving and what the future might be. Facts to store in the head: the first ever approved antibody (AB) therapeutic was Muromonab (1986). Currently the most successful is Humira (adalimumab) from AbbVie, worth 18.4 billion dollars in 2017; this is a fully human AB for autoimmune diseases that binds to a mediator of inflammation (TNF-alpha).

Next up were Catherine and Lucian, who discussed distributed computing in PySpark. They started by explaining why distributed computing is going to become so important. Basic info: by 2025, 100 million to 2 billion human genomes will have been sequenced, which is 2 to 40 exabytes of data. They discussed distributed vs centralised computing, and PySpark compared to Hadoop. There was a practical, but Mark had to solo perform for the audience, leading to one of the top photos of the whole OPIGTREAT.
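As a quick sanity check on those numbers, here is a small sketch of the arithmetic. It assumes decimal (SI) units, so one exabyte is 10^18 bytes, and it assumes the low genome count pairs with the low storage figure and the high with the high; the function name `bytes_per_genome` is made up for illustration.

```python
EXABYTE = 10**18   # bytes, decimal (SI) definition (an assumption)
GIGABYTE = 10**9   # bytes, decimal (SI) definition


def bytes_per_genome(total_exabytes, n_genomes):
    """Storage implied per genome, given a total in exabytes."""
    return total_exabytes * EXABYTE / n_genomes


low = bytes_per_genome(2, 100_000_000)       # 2 EB across 100 million genomes
high = bytes_per_genome(40, 2_000_000_000)   # 40 EB across 2 billion genomes

print(f"low estimate:  {low / GIGABYTE:.0f} GB per genome")
print(f"high estimate: {high / GIGABYTE:.0f} GB per genome")
```

Under those assumptions both ends of the range work out to the same per-genome figure, so the spread in the total comes from how many genomes get sequenced rather than how much data each one takes.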

As a punishment for being in charge, I gave the final talk, where I discussed future research directions and how you decide what those might be.

So, with thanks to all of the group, that concludes the OPIGTREAT report.
