Category Archives: Code

SAbBox – the easy way to obtain our antibody tools

A significant part of the work we do here in OPIG revolves around antibodies, the proteins of the immune system that bind to and help remove any foreign entities that find their way into the body. Since antibodies can be developed that target basically anything, they have become extremely useful as therapeutics. In our research, we develop computational tools that can be incorporated into various points along the antibody discovery pipeline. These tools include our database of antibody structures, SAbDab, and a series of predictive tools (e.g. structural modelling algorithms like ABodyBuilder) which are known collectively as SAbPred.

Continue reading →

A Gentle Introduction to the GPyOpt Module

Manually tuning hyperparameters in a neural network is slow and boring. Using Bayesian Optimisation to do it for you is slightly less slow, and you can go do other things whilst it’s running. Susan recently highlighted some of the resources available to get to grips with GPyOpt. Below is a copy of a Jupyter Notebook where we walk through a couple of simple examples and hopefully shed a little bit of light on how the algorithm works.

Continue reading →

Three things to help you get started on Bayesian Optimisation

In this blog post I will share with you the materials that I found most useful when I started doing some Bayesian Optimisation in my research. Bear in mind, I am a Chemist by training, so I approached this topic from a non-mathematical background (my eyes have to be persuaded to look at mathematical equations). Out of all the materials I have come across, I found these to be the most accessible.

Continue reading →

Combining Inset Plots with Facets using ggplot2

I recently spent some time working out how to include mini inset plots within ggplot2 facets, and I thought I would share my code in case anyone else wants to achieve a similar thing. The resulting plot looks something like this:

Continue reading →

How to Iterate in PyMOL

Sometimes pointing-and-clicking just doesn’t cut it. With PyMOL’s built-in Python interpreter, repetitive actions are made simple.

Continue reading →

Trying out some code from the Eighth Joint Sheffield Conference on Chemoinformatics: finding the most common functional groups present in the DSPL library

Last month a bunch of us attended the Sheffield Chemoinformatics Conference. We heard many great presentations and there were many invitations to check out one’s GitHub page. I decided now is the perfect time to try out some code that was shown by one presenter.

Peter Ertl from Novartis presented his work on The encyclopedia of functional groups. He presented a method that automatically detects functional groups without using a pre-defined list (which is what most other methods rely on). His method recursively searches through the molecule to identify groups of atoms that meet certain criteria. He used it to answer questions such as: how many functional groups are there, and what are the most common functional groups found in common synthetic molecules versus bioactive molecules versus natural products? Since I, like many others in the group, am interested in fragment libraries (possibly due to a supervisor in common), I thought I could try it out on one of these.

Continue reading →

Searching through large databases with bloom filter

Searching through large databases is often a linear time problem. Here I compare the performance of a Bloom filter with that of the regular std::find algorithm in C++. The code is from: https://codereview.stackexchange.com/questions/179135/bloom-filter-implementation-in-c
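For readers who have not met a Bloom filter before, here is a minimal Python sketch of the idea. It is not the C++ code from the link above: the class name, the default sizes, and the use of seeded SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical helper, not the linked C++ code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


records = [f"record-{i}" for i in range(100)]
bloom = BloomFilter()
for record in records:
    bloom.add(record)

# Every inserted item is reported as possibly present (no false negatives).
all_found = all(record in bloom for record in records)
```

A membership check costs a fixed number of hash evaluations regardless of how many items were inserted, whereas a linear scan grows with the size of the database. The trade-off is that the filter can report false positives, so a positive answer usually still needs confirming against the real data.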

Continue reading →

Constrained Embedding with RDKit

This blog post explores the RDKit function ConstrainedEmbed.

Continue reading →

A Brief Introduction to ggpairs

In this blog post I will introduce a fun R plotting function, ggpairs, that’s useful for exploring distributions and correlations.

Continue reading →

Should scientists learn C++?

Conventional wisdom dictates that compiled languages are slow to develop, can be slow to compile, but are fast to run. Interpreted languages are easy to use and do not require compilation but have sluggish performance. Like most people in scientific computing, the first two languages I learned were C++ and Python; I use Python every day but when, if ever, would I use C++?

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends
