Monthly Archives: April 2025

Cheating at Spelling Bee using the command line


The New York Times Spelling Bee is a free online word game, where players must construct as many words as possible using the letters provided in the Bee’s grid, always including the middle letter. Bonus points for using all the letters and creating a pangram.


However, this is the kind of thing which computers are very good at. If you’ve become frustrated trying to decipher the abstruse ways of the bee, let the command line help.

tl;dr:

grep -iP "^[adlokec]{6,}$" /usr/share/dict/words |grep a | awk '{ print length, $0 }' |sort -n |cut -f2 -d" "
Continue reading

Featurisation is Key: One Version Change that Halved DiffDock’s Performance

1. Introduction 

Molecular docking with graph neural networks works by representing the molecules as featurized graphs. In DiffDock, each ligand becomes a graph of atoms (nodes) and bonds (edges), with features assigned to every atom using chemical properties such as atom type, implicit valence and formal charge. 
 
We recently discovered that a change in RDKit versions significantly reduces performance on the PoseBusters benchmark, due to changes in the “implicit valence” feauture. This post walks through: 

  • How DiffDock featurises ligands 
  • What happened when we upgraded RDKit 2022.03.3 → 2025.03.1 
  • Why training with zero-only features and testing on non-zero features is so bad 

TL:DR: Use the dependencies listed in the environment.yml file, especially in the case of DiffDock, or your performance could half!  

Continue reading

Slurm and Snakemake: a match made in HPC heaven

Snakemake is an incredibly useful workflow management tool that allows you to run pipelines in an automated way. Simply put, it allows you to define inputs and outputs for different steps that depend on each other, Snakemake will then run jobs only when the required inputs have been generated by previous steps. A previous blog post by Tobias is a good introduction to it – https://www.blopig.com/blog/2021/12/snakemake-better-workflows-with-your-code/

However, often pipelines are computationally intense and we would like to run them on our HPC. Snakemake allows us to do this on slurm using an extension package called snakemake-executor-plugin-slurm.

Continue reading

A Masterclass in Basic & Translational Immunology with Prof. Abul Abbas

On Thursday 17th April, a group of us made the journey ‘up the hill’ to the Richard Doll building to attend an immunology masterclass from Professor Abul Abbas. Prof. Abbas is an emeritus professor in Pathology at UCSF and author of numerous core textbooks including Basic Immunology: Functions and Disorders of the Immune System.

The whole-day course consisted of a series of lectures covering core topics in immunology, from innate immunity and antigen presentation through to B/T cell subsets, autoimmunity, and immunotherapy.

Continue reading

AI generated linkers™: a tutorial

In molecular biology cutting and tweaking a protein construct is an often under-appreciated essential operation. Some protein have unwanted extra bits. Some protein may require a partner to be in the correct state, which would be ideally expressed as a fusion protein. Some protein need parts replacing. Some proteins disfavour a desired state. Half a decade ago, toolkits exists to attempt to tackle these problems, and now with the advent of de novo protein generation new, powerful, precise and way less painful methods are here. Therefore, herein I will discuss how to generate de novo inserts and more with RFdiffusion and other tools in order to quickly launch a project into the right orbit.
Furthermore, even when new methods will have come out, these design principles will still apply —so ignore the name of the de novo tool used.

Continue reading

NVIDIA Reimagines CUDA for Python Developers

According to GitHub’s Open Source Survey, Python has officially become the world’s most popular programming language in 2024 – ultimately surpassing JavaScript. Due to its exceptional popularity, NVIDIA announced Python support for its CUDA toolkit at last year’s GTC conference, marking a major leap in the accessibility of GPU computing. With the latest update (https://nvidia.github.io/cuda-python/latest/) and for the first time, developers can write Python code that runs directly on NVIDIA GPUs without the need for intermediate C or C++ code.

Historically tied to C and C++, CUDA has found its way into Python code through third-party wrappers and libraries. Now, the arrival of native support means a smoother, more intuitive experience.

This paradigm shift opens the door for millions of Python programmers – including our scientific community – to build powerful AI and scientific tools without having to switch languages or learn legacy syntax.

Continue reading