A few months back I moved from the Oxford BRC to OPIG, both within the University of Oxford, but like many in academia I have moved across a few universities. As this is my first post here I wanted to do something neat: a JS tool that swaps colours in university logos!
It was a rather laborious task requiring a lot of coding, and once I got it working, I tripped up at the last metre. So, for technical reasons, I have resorted to hosting it on my own blog (see post), but the path towards it is nevertheless worth discussing.
Category Archives: Code
Does ChatGPT know how to translate images?
Yesterday I spent a couple of hours playing with ChatGPT. I know, we have some other recent posts about it. It’s so amazing that I couldn’t resist writing another. Apologies for that.
The goal of this post is to determine if I can effectively use ChatGPT as a programmer/mathematician assistant. OK. It was not my original intention, but let’s pretend it was, just to make this post more interesting.
So, I started by asking a few very simple programming questions like the following:
Can you implement a function to compute the factorial of a number using a cache? Use python.
And this is what I got.

A clear and efficient implementation of the factorial. This is the kind of answer you would expect from a first-year CS student.
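A cached factorial can be sketched as follows. This is a minimal illustration using `functools.lru_cache` from the standard library, not a transcript of ChatGPT's answer; the function name and the error handling are my own choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n, caching every computed value."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    if n < 2:
        return 1
    # The recursive call hits the cache whenever n - 1 was computed before.
    return n * factorial(n - 1)
```

After calling `factorial(5)`, a later call to `factorial(6)` only needs one multiplication, because the result for 5 is already cached.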
Cleaning outliers in conductance timeseries from molecular dynamics
Have you ever had an annoying dataset that looks something like this?

or even worse, just several of them

In this blog post, I will introduce basic techniques you can use and implement with Python to identify and clean outliers. The objective will be to get something more eye-pleasing (and mostly less troublesome for further data analysis) like this
CodeQL analyses your code to find common errors
This post is pretty much an ad for a very useful tool developed by GitHub that helps you find errors or vulnerabilities in your code by querying it as if it were data. I have personally found it very useful for finding small errors in my code and would recommend it to everyone. If you want to check it out, this is their webpage.
How to turn a SMILES string into an extended-connectivity fingerprint using RDKit
After my posts on how to turn a SMILES string into a molecular graph and how to turn a SMILES string into a vector of molecular descriptors, I now complete this series by illustrating how to turn the SMILES string of a molecular compound into an extended-connectivity fingerprint (ECFP).
ECFPs were originally described in a 2010 article by Rogers and Hahn [1] and remain among the most popular and efficient methods to turn a molecule into an informative vectorial representation for downstream machine learning tasks. The ECFP algorithm depends on two predefined hyperparameters: the fingerprint length L and the maximum radius R. An ECFP of length L takes the form of an L-dimensional bitvector containing only 0s and 1s. Each component of an ECFP indicates the presence or absence of a particular circular substructure in the input compound. Each circular substructure has a center atom and a radius that determines its size. The hyperparameter R defines the maximum radius of any circular substructure whose presence or absence is indicated in the ECFP. Circular substructures for a central nitrogen atom in an example compound are depicted in the image below.
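To make the role of L concrete, here is a simplified sketch of only the last step: folding integer substructure identifiers into an L-dimensional bitvector. It assumes that folding is done by taking each identifier modulo L, and the identifiers below are made up for illustration. In practice RDKit computes them by hashing the atom environments up to radius R, which this sketch omits entirely.

```python
from typing import Iterable, List


def fold_into_bitvector(substructure_ids: Iterable[int], length: int) -> List[int]:
    """Set bit (identifier mod length) for every substructure identifier."""
    bits = [0] * length
    for identifier in substructure_ids:
        bits[identifier % length] = 1
    return bits


# Hypothetical identifiers of circular substructures found in one compound.
example_ids = [98513984, 2246728737, 3218693969, 864662311]

# A deliberately tiny L = 16 so that a collision is easy to see.
fingerprint = fold_into_bitvector(example_ids, length=16)
```

Two of the four made-up identifiers land on the same position here, which shows why a small L loses information: distinct substructures can end up sharing a bit.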

Unreasonably faster notes, with command-line fuzzy search
A good note system should act like a second brain:
- Accessible in seconds
- Adding information should be frictionless
- Searching should be exhaustive – if it’s there, you must find it
The benefits of such a note system are immense – never forget anything again! Search, perform the magic ritual of Copy Paste, and rejoice in the wisdom of your tried and tested past.
But how? Through the unreasonable effectiveness of interactive fuzzy search. This is how I have used Fuz, a terminal-based file fuzzy finder, for about 4 years.
Briefly, Fuz extracts all text within a directory using ripgrep, enables interactive fuzzy search with FZF, and returns you the selected item. As you type, the search results get narrowed down to a few matches. Files are opened at the exact line you found. And it’s FAST – 100,000 lines in half a second fast.
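The idea of narrowing down every line in a directory as you type can be sketched in a few lines of Python. This is only an illustration of the concept and not how Fuz is implemented: Fuz relies on ripgrep and FZF, whereas the sketch below uses a plain directory walk and a simple "characters appear in order" match. All function names here are hypothetical.

```python
import os
from typing import Iterator, List, Tuple

Match = Tuple[str, int, str]  # (file path, 1-based line number, line text)


def iter_lines(root: str) -> Iterator[Match]:
    """Yield every line of every readable text file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        yield path, number, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files


def is_fuzzy_match(query: str, text: str) -> bool:
    """True if the characters of query occur in text in order (case-insensitive)."""
    remaining = iter(text.lower())
    # 'char in remaining' consumes the iterator, which enforces the ordering.
    return all(char in remaining for char in query.lower())


def fuzzy_search(root: str, query: str) -> List[Match]:
    """Return all lines below root that fuzzily match query."""
    return [match for match in iter_lines(root) if is_fuzzy_match(query, match[2])]
```

Because each result carries the file path and line number, an editor can be opened at the exact line that was found, which is the behaviour described above.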

How to build a Python dictionary of residues for each molecule in PyMOL
Sometimes it can be handy to work with multiple structures in PyMOL using Python.
Here’s a snippet of code you might find useful: we iterate over all the α-carbon atoms in a protein and append tuples such as (‘GLY’, 1) to a list. The dictionary, ‘reslist’, returns a list of residue names and indices for each molecule, where the key is a string containing the name of the molecule.
```python
from pymol import cmd

# Create a list of all the objects, called 'mols':
mols = cmd.get_object_list('*')

# Create a dictionary that will return a list of residues given the
# name of the molecule object, starting with an empty list for each:
reslist = {m: [] for m in mols}

# Use PyMOL's iterate command to go over every α-carbon and append
# a tuple consisting of each residue's name ('resn') and
# residue index ('resi'). Passing 'space' makes 'reslist' visible to
# the expression regardless of how the script is run.
for m in mols:
    cmd.iterate('%s and n. ca' % m,
                'reslist["%s"].append((resn, int(resi)))' % m,
                space={'reslist': reslist})
```
This script assumes you only have protein molecules loaded, and ignores things like chain ID and insertion codes.
Once you have your list of residues, you can use it with the cmd.align command, e.g., to align a particular residue to a reference structure.
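As a rough sketch of that last step, the helper below turns one (resn, resi) tuple into a PyMOL selection string that could then be handed to cmd.align. The helper name, the object names 'ref' and 'mobile', and the example residues are all hypothetical, and the sketch assumes positive residue indices and ignores chain IDs, just like the script above.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # (residue name, residue index)


def residue_selection(molecule: str, residue: Residue) -> str:
    """Build a PyMOL selection string for one residue of one molecule."""
    resn, resi = residue
    return f"{molecule} and resn {resn} and resi {resi}"


# Hypothetical content of 'reslist' for two loaded objects.
reslist: Dict[str, List[Residue]] = {
    "ref": [("GLY", 1), ("ALA", 2)],
    "mobile": [("GLY", 1), ("SER", 2)],
}

mobile_sel = residue_selection("mobile", reslist["mobile"][0])
target_sel = residue_selection("ref", reslist["ref"][0])

# Inside a PyMOL session the two selections could then be aligned with:
#   from pymol import cmd
#   cmd.align(mobile_sel, target_sel)
```

Keeping the selection building separate from the PyMOL call makes it easy to check the strings before running the alignment.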
Running code that fails with style
We have all been there, working on code that continuously fails while staring at a dull and colorless command-line. However, we are in luck, as there is a way to make the constant error messages look less depressing. By changing our shell to one which enables a colorful themed command-line and fancy features like automatic text completion and web search, your code won’t just fail with ease, but also with style!
A shell is your command-line interpreter, meaning you use it to process commands and output results of the command-line. The shell therefore also holds the power to add a little zest to the command-line. The most well-known shell is bash, which comes pre-installed on most UNIX systems. However, there exist many different shells, all with different pros and cons. The one we will focus on is called Z Shell or zsh for short.
Zsh was initially only for UNIX and UNIX-like systems, but its popularity has made it available on most systems now. Like bash, zsh is extremely customizable, and the two syntaxes are so similar that most bash commands will work in zsh. The benefit of zsh is that it comes with additional features, plugins and options, and open-source frameworks with large communities. The framework which we will look into is called Oh My Zsh.
How to make your own singularity container zero fuss!
In this blog post, I’ll show you guys how to make your own shiny container for your tool! Zero fuss(*) and in FOUR simple steps.
As an example, I will show how to make a singularity container for one of our public tools, ANARCI, the antibody numbering tool that everyone in OPIG and many external users are familiar with. If not, check the web app and the GitHub repo here and here.
(*) Provided you have your own Linux machine with sudo permissions, otherwise, you can’t do it – sorry. Same if you have a Mac or Windows – sorry again.
BUT, there are workarounds for these cases such as using the remote singularity builder here, for which you only need to sign up and create an account, and the use of Virtual Machines (VMs), as described here.