Category Archives: How To

How to do things. doh.

Building a “Second Brain” – A Functional Knowledge Stack with Obsidian

Whilst I always enjoy the acquisition of knowledge, I’ve always struggled with depositing it usefully. From pen and paper notes with a 20 colour theme which lost value with each additional colour, to OneNote or iPad GoodNotes based emulations of pen and paper, it’s been a constant quest for the optimal note taking schema. Personally there are 3 key objectives I need my note taking to achieve:

It must be digitally compatible and accessible from any device.
It must comfortably handle math and images.
It must be something I look forward to – the software needs to be aesthetically clean, lightweight with none of the chunkiness of Microsoft apps, and highly customisable.

For me the solution to this was Obsidian, the perhaps more cultified sibling to Notion. Obsidian is a note taking application that uses markdown with a surprising amount flexibility, including the ability to partner it with an LLM which I’ll explore in this blog, alongside my vault organisation do or dies, and favourite customisations.

Continue reading →

Using Node-RED as a front-end to your software

Node-RED is an, open-source, visual programming tool that lets you wire together hardware (such as sensors), APIs (such as REST/POST) and custom functions. However, its custom functions aren’t simply the JavaScript you write, they can also be containers!

This can provide an intuitive front-end to otherwise difficult software. For example, you’ve written your magnum opus, you’ve even documented it (though no-one will ever read it) and to ensure maximum compatibility for the widest possible audience, you’ve containerised it. But it’s still a command-line driven application. Using node-RED you can make this accessible to an inexperienced audience.

Out of the box, node-RED’s quite pretty, you can string together nodes to perform functions that are useful. In this case, it’s for monitoring a log file, if the log doesn’t grow, something’s gone wrong, so email me to take a look at it.

Continue reading →

Extracting 3D Pharmacophore Points with RDKit

Pharmacophores are simplified representations of the key interactions ligands make with proteins, such as hydrogen bonds, charge interactions, and aromatic contacts. Think of them as the essential “bumps and grooves” on a key that allow it to fit its lock (the protein). These maps can be derived from ligands or protein–ligand complexes and are powerful tools for virtual screening and generative models. Here, we’ll see how to extract 3D pharmacophore points from a ligand using RDKit.
(Code adapted from Dr. Ruben Sanchez.)

Why pharmacophore “points”?

RDKit represents each pharmacophore feature (donor, acceptor, aromatic, etc.) as a point in 3D space, located at the feature center. These points capture the essential interaction motifs of a ligand without requiring the full atomic detail.

Continue reading →

Exploring the Protein Data Bank programmatically

The Worldwide Protein Data Bank (wwPDB or just the PDB to its friends) is a key resource for structural biology, providing a single central repository of protein and nucleic acid structure data. Most researchers interact with the PDB either by downloading and parsing individual entries as mmCIF files (or as legacy PDB files), or by downloading aggregated data, such as the RCSB‘s collection in a single FASTA file of all polymer entity sequences. All too often, researchers end up laboriously writing their own file parsers to digest these files. In recent years though, more sophisticated tools have been made available that make it much easier to access only the data that you need.

Continue reading →

Accelerating AlphaFold 3 for high-throughput structure prediction

Introduction

Recently, I have been conducting a project in which I need to predict the structures of a dataset comprising a few thousand protein sequences using AlphaFold 3. Taking a naive approach, it was taking an hour or two per entry to get a predicted structure. With a few thousand structures, it seemed that it would take months to be able to run…

In this blog post, I will go through some tips I found to help accelerate the structure predictions and make all of the predictions I needed in under a week. In general, following the tips in the AlphaFold 3 performance documentation is a useful starting place. Most of the tips I provide are related to accelerating the MSA generation portion of the predictions because this was the biggest bottleneck in my case.

Continue reading →

Handling OAS Scale Datasets Without The Drama

Working with Observed Antibody Space (OAS) dataset sometimes feels a bit like trying to cook dinner with the contents of the whole fridge emptied into the pan. There are countless CSVs, all of different sizes (some might not even fit onto your RAM), and you just want a clean, fast pipeline so you can get back to modelling. The trick is to stop treating the data like a giant spreadsheet you fully load into memory and start treating it like a columnar, on-disk database you stream through. That’s exactly what the 🤗 Datasets library gives you.

At the heart of 🤗 Datasets is Apache Arrow, which stores columns in a memory-mapped format (if you are curious about what that means there is a great explanation in another blog post here. In plain terms: the data mostly lives on disk, and you pull in just the slices you need. It feels interactive even when the dataset is huge. Instead of a single monolithic script that does everything (and takes forever), you layer small, composable steps—standardize a few columns, filter out junk, compute a couple of derived fields—and each step is cached automatically. Change one piece, and only that piece recomputes. Sounds great, right? But of course, the key question now is how to get OAS data into Datasets to begin with.

Continue reading →

On the Joys of vim-like Browsing

Reflections on Pointlessness

One of the great delights in this life is pointless optimisation. Point-ful optimisation has its place of course; it is right and proper and sensible, and, well, useful, and it also does, when first achieved, yield considerable satisfaction. But I have found I soon adjust to the newly more efficient (and equally drab) normality, and so the spell fades quickly.

Not so with pointless optimisation. Pointless optimisation, once attained, is a preternaturally persistent source of joy that keeps on giving indefinitely. Particularly if it involves acquiring a skill of some description; if the task optimised is frequent; and if the time so saved could not possibly compensate for the time and effort sunk into the optimisation process. Words cannot convey the triumph of completing a common task with hard-earned skill and effortless efficiency, knowing full-well it makes no difference whatsoever in the grand scheme of things.

Continue reading →

AI generated linkers™: a tutorial

In molecular biology cutting and tweaking a protein construct is an often under-appreciated essential operation. Some protein have unwanted extra bits. Some protein may require a partner to be in the correct state, which would be ideally expressed as a fusion protein. Some protein need parts replacing. Some proteins disfavour a desired state. Half a decade ago, toolkits exists to attempt to tackle these problems, and now with the advent of de novo protein generation new, powerful, precise and way less painful methods are here. Therefore, herein I will discuss how to generate de novo inserts and more with RFdiffusion and other tools in order to quickly launch a project into the right orbit.
Furthermore, even when new methods will have come out, these design principles will still apply —so ignore the name of the de novo tool used.

Continue reading →

Debugging code for science: Fantastic Bugs and Where to Find Them.

The simulation results make no sense … My proteins are moving through walls and this dihedral angle is negative; my neural network won’t learn anything, I’ve tried for days to install this software and I still get an error.

Feel familiar? Welcome to scientific programming. Bugs aren’t just annoying roadblocks – they’re mysterious phenomena that make you question your understanding of reality itself. If you’ve ever found yourself debugging scientific code, you know it’s a different beast compared to traditional software engineering. In the commercial software world, a bug might mean a button doesn’t work or data isn’t saved correctly. In scientific computing, a bug might mean your climate model predicts an ice age next Tuesday, or your protein folding algorithm creates molecular structures that couldn’t possibly exist in our universe (cough).

Continue reading →

Making Pretty Pictures in PyMOL v2

Throughout my PhD I’ve needed nice PyMOL visualizations, but struggled to quickly and easily make the pictures I wanted. I’ve used Claire Marks‘ blopig post, Making Pretty Pictures in PyMOL, many times and wanted to expand it with what I’ve learned to make satisfying visualizations quickly!

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: How To

Building a “Second Brain” – A Functional Knowledge Stack with Obsidian

Using Node-RED as a front-end to your software

Extracting 3D Pharmacophore Points with RDKit

Why pharmacophore “points”?

Exploring the Protein Data Bank programmatically

Accelerating AlphaFold 3 for high-throughput structure prediction

Introduction

Handling OAS Scale Datasets Without The Drama

On the Joys of vim-like Browsing

Reflections on Pointlessness

AI generated linkers™: a tutorial

Debugging code for science: Fantastic Bugs and Where to Find Them.

Making Pretty Pictures in PyMOL v2