Running code that fails with style

We have all been there, working on code that continuously fails while staring at a dull and colorless command-line. However, we are in luck, as there is a way to make the constant error messages look less depressing. By changing our shell to one which enables a colorful themed command-line and fancy features like automatic text completion and web search your code won’t just fail with ease, but also with style!

A shell is your command-line interpreter, meaning you use it to process commands and output results of the command-line. The shell therefore also holds the power to add a little zest to the command-line. The most well-known shell is bash, which comes pre-installed on most UNIX systems. However, there exist many different shells, all with different pros and cons. The one we will focus on is called Z Shell or zsh for short.

Zsh was initially only for UNIX and UNIX-Like systems, but its popularity has made it accessible on most systems now. Like bash, zsh is extremely customizable and their syntax so similar that most bash commands will work in zsh. The benefit of zsh is that it comes with additional features, plugins and options, and open-source frameworks with large communities. The framework which we will look into is called Oh My Zsh.

Continue reading →

Universal graph pooling for GNNs

Graph neural networks (GNNs) have quickly become one of the most important tools in computational chemistry and molecular machine learning. GNNs are a type of deep learning architecture designed for the adaptive extraction of vectorial features directly from graph-shaped input data, such as low-level molecular graphs. The feature-extraction mechanism of most modern GNNs can be decomposed into two phases:

Message-passing: In this phase the node feature vectors of the graph are iteratively updated following a trainable local neighbourhood-aggregation scheme often referred to as message-passing. Each iteration delivers a set of updated node feature vectors which is then imagined to form a new “layer” on top of all the previous sets of node feature vectors.
Global graph pooling: After a sufficient number of layers has been computed, the updated node feature vectors are used to generate a single vectorial representation of the entire graph. This step is known as global graph readout or global graph pooling. Usually only the top layer (i.e. the final set of updated node feature vectors) is used for global graph pooling, but variations of this are possible that involve all computed graph layers and even the set of initial node feature vectors. Commonly employed global graph pooling strategies include taking the sum or the average of the node features in the top graph layer.

While a lot of research attention has been focused on designing novel and more powerful message-passing schemes for GNNs, the global graph pooling step has often been treated with relative neglect. As mentioned in my previous post on the issues of GNNs, I believe this to be problematic. Naive global pooling methods (such as simply summing up all final node feature vectors) can potentially form dangerous information bottlenecks within the neural graph learning pipeline. In the worst case, such information bottlenecks pose the risk of largely cancelling out the information signal delivered by the message-passing step, no matter how sophisticated the message-passing scheme.

Continue reading →

Musings on Digital Nomaddery from Seoul

The languorous, muggy heat of the Korean afternoon sun was what greeted me after 13 hour cattle-class flight from a cool, sensible Helsinki night. The goings-on in Ukraine, and associated political turmoil, meant taking the scenic route – avoiding Russia and instead passing over Turkey, Kazakstan and Mongolia – with legs contorted into unnatural positions and sleep an unattainable dream. Tired and disoriented, I relied less on Anna’s expert knowledge of the Korean language than her patience for my jet-lag-induced bad mood and brain fog. We waited an hour for a bus to take us from Incheon airport to Yongsan central station in the heart of the capital. It was 35 °C.

I’ve been here for a month. Anna has found work, starting in November; I have found the need to modify my working habits. Gone are the comfortable, temperate offices on St Giles’, replaced by an ever-changing diorama of cafés, hotel rooms and libraries. Lugging around my enormous HP Pavilion, known affectionately by some as ‘The Dominator’, proved to be unsustainable.

It’s thesis-writing time for me, so any programming I do is just tinkering and tweaking and fixing the litany of bugs that Lucy Vost has so diligently exposed. I had planned to run Ubuntu on Parallels using my MacBook Air; I discovered to my dismay that a multitude of Conda packages, including PyTorch, are not supported on Apple’s M1 chip. It has been replaced by a combination of Anna’s old Intel MacBook Pro and rewriting my codebase to install and run without a GPU – adversity is the great innovator, as the saying goes.

Continue reading →

5th AI in Chemistry meeting

A few weeks ago we attended the RSC’s 5th AI in Chemistry conference at Churchill College, Cambridge. It featured research and discussions on a broad range of topics and was attended by a diverse set of more than 200 researchers from academia and industry, including six Opiglets.

Continue reading →

Vaccines and vino

Recently, I was fortunate enough to attend and present at GSK’s PhD and Postdoc workshop in Siena, Italy. The workshop spanned two days and I had a brilliant time there – Siena itself is beautiful, I ate fantastic food, and I learnt a huge amount about all stages of vaccine production.

Unfortunately, due to confidentiality, I can’t go into great detail about others’ current research, however I have provided a short overview of the five main areas the workshop focused on below.

Continue reading →

An evolutionary lens for understanding cancer and its treatment

I recently found myself in the Oxford Blackwells’ Norrington Room browsing the shelves for some holiday reading. One book in particular caught my eye, a blend of evolution — a topic that has long interested me — and cancer biology, a topic I’m increasingly exposed to in immune repertoire analysis collaborations but on which I am assuredly “non-expert”!

Paperback cover of “The Cheating Cell” by Athene Aktipis.

The Cheating Cell by Athene Aktipis provides a theoretical framework for understanding cancer by considering it as a logical sequitor of the advent of successful multicellular life.

Continue reading →

How to make your own singularity container zero fuss!

In this blog post, I’ll show you guys how to make your own shiny container for your tool! Zero fuss(*) and in FOUR simple steps.

As an example, I will show how to make a singularity container for one of our public tools, ANARCI, the antibody numbering tool everyone in OPIG and external users are familiar with – If not, check the web app and the GitHub repo here and here.

(*) Provided you have your own Linux machine with sudo permissions, otherwise, you can’t do it – sorry. Same if you have a Mac or Windows – sorry again.
BUT, there are workarounds for these cases such as using the remote singularity builder here, for which you only need to sign up and create an account, and the use of Virtual Machines (VMs), as described here.

Continue reading →

Identify Silly Molecules

Automatic argument parsers for python

One of the recurrent problems I used to have when writing argument parsers is that after refactoring code, I also had to change the argument parser options which generally led to inconsistency between the arguments of the function and some of the options of the argument parser. The following example can illustrate the problem:

def main(a,b):
  """
  This function adds together two numbers a and b
  param a: first number
  param b: second number
  """
  print(a+b)

if __name__ == "__main__":
  import argparse
  parser = argparse.ArgumentParser()
  parser.add_argument("--a", type=int, required=True, help="first number")
  parser.add_argument("--b", type=int, required=True, help="second number")
  args = parser.parse_args()
  main(**vars(args))

This code is nothing but a simple function that prints a+b and the argument parser asks for a and b. The perhaps not so obvious part is the invocation of the function in which we have ** and vars. vars converts the named tuple args to a dictionary of the form {“a":1, "b":2}, and ** expands the dictionary to be used as arguments for the function. So if you have main(**{"a":1, "b":2}) it is equivalent to main(a=1, b=2).

Let’s refactor the function so that we change the name of the argument a to num.

Continue reading →

Do you have cis peptide bonds in your simulation inputs?

People who run molecular simulations quickly become familiar with all of the things about a PDB file – missing residues, missing heavy atoms in residues, missing hydrogens, non-standard amino acids, multiple conformations, crystallization ligands, etc. – that might need to be fixed before setting up a simulation. This blog post is a reminder to check, after you have “fixed” your PDB, if you have accidentally introduced aberrant cis peptide bonds into your structure during rebuilding.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Running code that fails with style

Universal graph pooling for GNNs

Musings on Digital Nomaddery from Seoul

5th AI in Chemistry meeting

Vaccines and vino

An evolutionary lens for understanding cancer and its treatment

How to make your own singularity container zero fuss!

Identify Silly Molecules

Automatic argument parsers for python

Do you have cis peptide bonds in your simulation inputs?