How to replace bike ball bearings when your steering sounds crunchy

Over the last few months my bicycle’s steering axle started freezing up, to the point where the first thing I did before getting on my bike in the morning was jerk the handlebars from side to side aggressively to loosen it up. It made atrocious guttural sounds and bangs when I did, and navigating Oxford by bike was becoming more treacherous by the day as I swerved from left to right, trying to wrestle my front wheel’s fork in the right direction. It was time to undertake some DIY…

Continue reading

The workings of Fragmenstein’s RDKit neighbour-aware minimisation

Fragmenstein is a Python module that combines hits, or places a derivative, following given templates, and is very strict in obeying them. It does this by creating a “monster”, a compound that has the atomic positions of the templates, which is then reanimated by very strict energy minimisation. This happens in two steps: first in RDKit with an extracted frozen neighbourhood, and then in PyRosetta within a flexible protein. The mapping for both combinations and placements is complicated, but here I will focus on one particular step, the minimisation, primarily in answer to an enquiry: how does the RDKit minimisation work?
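
As a taste of the mechanism, here is a minimal sketch, not Fragmenstein’s actual code, of how a frozen region can be enforced during an RDKit minimisation: atoms added as fixed points on the force field simply do not move while everything else relaxes around them. In Fragmenstein the fixed atoms would be the extracted protein neighbourhood; here a few atoms of a toy molecule stand in for it.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# toy molecule standing in for the "monster"
mol = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1CCO"))
AllChem.EmbedMolecule(mol, randomSeed=21)

props = AllChem.MMFFGetMoleculeProperties(mol)
ff = AllChem.MMFFGetMoleculeForceField(mol, props)

# freeze the aromatic ring (a stand-in for the frozen neighbourhood):
# fixed points keep their coordinates during minimisation
for atom in mol.GetAtoms():
    if atom.GetIsAromatic():
        ff.AddFixedPoint(atom.GetIdx())

ff.Minimize()
```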

Continue reading

Converting pandas DataFrames into Publication-Ready Tables

Analysing, comparing and communicating the predictive performance of machine learning models is a crucial component of any empirical research effort. Pandas, a staple in the Python data analysis stack, not only helps with the data wrangling itself, but also provides efficient solutions for data presentation. Two of its lesser-known yet incredibly useful features are df.to_markdown() and df.to_latex(), which allow for a seamless transition from DataFrames to publication-ready tables. Here’s how you can use them!
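
For instance (the numbers below are made up purely for illustration; note that to_markdown() requires the tabulate package):

```python
import pandas as pd

# hypothetical benchmark results for three models
df = pd.DataFrame(
    {"Model": ["Random forest", "XGBoost", "MLP"],
     "RMSE": [0.92, 0.81, 0.87],
     "R2": [0.61, 0.70, 0.66]}
).set_index("Model")

print(df.to_markdown())                  # pipe-style markdown table
print(df.to_latex(float_format="%.2f"))  # LaTeX tabular, ready for a manuscript
```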

Continue reading

Demystifying the thermodynamics of ligand binding

Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistics, a jumble which is, it could be argued, at its most jarring in machine learning. In datasets one often sees pIC50, pEC50, pKi and pKD; in discussion sections a medicinal chemist may talk casually of entropy, whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.
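
To give a flavour of the unification (these are standard textbook relations, not taken from the post itself): the p-quantities are logarithmic transformations of equilibrium constants, which connect directly to a binding free energy, with KD measured relative to the standard concentration c° = 1 M:

```latex
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln\!\left(\frac{K_{\mathrm{D}}}{c^{\circ}}\right),
\qquad
\mathrm{p}K_{\mathrm{D}} = -\log_{10}\!\left(\frac{K_{\mathrm{D}}}{c^{\circ}}\right)
```

At 298 K this works out to ΔG° ≈ −1.364 × pKD kcal/mol, so a 1 nM binder (pKD = 9) corresponds to roughly −12.3 kcal/mol.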

Continue reading

What the heck are TPUs?

I recently became curious about TPUs, specialised hardware for training Machine- and Deep-Learning models, where TPU stands for Tensor Processing Unit. This fancy chip can provide very large gains for anyone aiming to perform really massive parallelisation of AI tasks such as training, fine-tuning, and inference.

In this blog post, I will touch on what a TPU is and why it could be useful for AI applications compared to GPUs, and briefly discuss the associated opportunity costs.
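
As a minimal sanity check, and a sketch assuming JAX is installed on the machine in question, you can ask JAX which accelerator it actually sees:

```python
import jax

# on a Cloud TPU VM this lists TpuDevice entries (typically eight cores);
# on an ordinary machine it falls back to CPU (or GPU) devices
print(jax.devices())
print(jax.default_backend())  # 'tpu', 'gpu' or 'cpu'
```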

What’s a TPU?

Continue reading

OPIG: A decade of Scientific Shenanigans. What’s changed?

2013 was a big year: Andy Murray clinched the Wimbledon title, NASA’s Curiosity Rover discovered water-bearing minerals on Mars, and ‘twerk’ and ‘selfie’ made their way into the dictionary. Something equally significant also happened: the birth of BLOPIG.com. Intrigued by how the group has changed over the last decade, I set out on a journey to unearth some of the group’s publications from then till now, questioning the focus, methods, and evolution of its research over the past decade. This blog post is what I found.

While delving into each publication of the past decade genuinely seemed like an interesting idea, the imminent threat to my PhD progress forced me to adopt the most 2023-appropriate approach: outsourcing the task to AI. After collecting abstracts from all the group’s papers, I enlisted the help of everyone’s favourite hallucinator to summarise the works and (hopefully) highlight the shifts in the group’s research.

So, after a relatively long sequence of prompts, this is (apparently) what we do?

Continue reading

Deploying a Flask app part II: using an Apache reverse proxy

I recently wrote about serving a Flask web application on localhost using gunicorn. This is sufficient to get an app up and running locally using a production-ready WSGI server, but we still need to add an HTTP proxy server in front to securely handle HTTP requests coming from external clients. Here we’ll cover configuring a simple reverse proxy using the Apache web server, though of course you could do the same with another HTTP server such as nginx.
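
As a preview, the reverse proxy boils down to a few mod_proxy directives. A minimal sketch, assuming gunicorn is listening on 127.0.0.1:8000 and that mod_proxy and mod_proxy_http are enabled; the file name and domain are placeholders:

```apache
# /etc/apache2/sites-available/flaskapp.conf  (hypothetical)
<VirtualHost *:80>
    ServerName example.com

    # pass requests through to gunicorn and rewrite response headers back
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:8000/
    ProxyPassReverse / http://127.0.0.1:8000/
</VirtualHost>
```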

Continue reading

Understanding GPU parallelization in deep learning

Deep learning has proven to be the season’s favourite for biology: every other week, an interesting biological problem is solved by clever application of neural networks. Yet, as more challenges get cracked, modern research shifts more and more towards larger models, meaning that ever greater computational resources are required for training. Unsurprisingly, NVIDIA, the main manufacturer of GPUs, experienced a significant jump in its stock price earlier this year.

Access to compute is not enough to train good neural networks, however. As soon as multiple cards come into play, researchers need a completely different paradigm, in which data and model weights are distributed across different devices, and sometimes even different computers. Though these tools are becoming crucial for successful computational biology research, they are generally unknown to researchers. Hence, in this blog post, I would like to provide a really brief introduction to multi-GPU training.
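
To make the paradigm shift concrete, here is a minimal sketch of single-machine data parallelism with PyTorch’s DistributedDataParallel; it is not taken from the post itself, and the model, port and data are placeholders:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    # one process per GPU; gradients are synchronised across processes
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(128, 10).to(rank), device_ids=[rank])
    optimiser = torch.optim.SGD(model.parameters(), lr=1e-3)

    # in a real run each rank would load a different shard of the data
    x = torch.randn(32, 128, device=rank)
    y = torch.randint(0, 10, (32,), device=rank)

    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()   # DDP all-reduces the gradients across ranks here
    optimiser.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```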

Continue reading