Author Archives: Alexi Hussain

Memory Efficient Clustering of Large Protein Trajectory Ensembles

Molecular dynamics simulations have grown increasingly ambitious, with researchers routinely generating trajectories containing hundreds of thousands or even millions of frames. While this wealth of data offers unprecedented insights into protein dynamics, it also presents a formidable computational challenge: how do you extract meaningful conformational clusters from datasets that can easily exceed available system memory?

Traditional approaches to trajectory clustering often stumble when faced with large ensembles. Loading all pairwise distances into memory simultaneously can quickly consume tens or hundreds of gigabytes of RAM, while conventional PCA implementations require the entire dataset to fit in memory before decomposition can begin. For many researchers, this means either downsampling their precious simulation data or investing in expensive high-memory computing resources.

The solution lies in recognizing that we don’t actually need to hold all our data in memory simultaneously. By leveraging incremental algorithms and smart memory management, we can perform sophisticated dimensionality reduction and clustering on arbitrarily large trajectory datasets using modest computational resources. Let’s explore how three key strategies—incremental PCA, mini-batch clustering, and intelligent memory management—can transform your approach to analyzing large protein ensembles.
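
To make this concrete, here is a minimal two-pass sketch using scikit-learn's IncrementalPCA and MiniBatchKMeans. The chunk size, component count, and the random data generator standing in for a trajectory reader are placeholder assumptions for illustration, not the exact pipeline from the post.

import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.cluster import MiniBatchKMeans

N_FRAMES, N_FEATURES = 100_000, 300  # placeholder trajectory size (frames x flattened coordinates)
CHUNK = 5_000                        # number of frames held in memory at any one time

def iter_chunks():
    # Stand-in for streaming blocks of frames from disk with a trajectory reader;
    # random data is used here purely so the sketch runs on its own.
    rng = np.random.default_rng(0)
    for _ in range(N_FRAMES // CHUNK):
        yield rng.normal(size=(CHUNK, N_FEATURES)).astype(np.float32)

# Pass 1: learn a low-dimensional basis one chunk at a time, never loading the full dataset.
ipca = IncrementalPCA(n_components=10)
for chunk in iter_chunks():
    ipca.partial_fit(chunk)

# Pass 2: project each chunk into the reduced space and update the clusterer in mini-batches.
kmeans = MiniBatchKMeans(n_clusters=8, batch_size=CHUNK, random_state=0)
for chunk in iter_chunks():
    kmeans.partial_fit(ipca.transform(chunk))

print(kmeans.cluster_centers_.shape)  # (8, 10): cluster centres in the reduced space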

Continue reading

LOADING: an art and science collaborative project

For the past few months, OPIGlets Gemma, Charlie and Alexi have been engaged in a collaboration between scientists from Oxford and artists connected to Central St Martins art college in London. This culminated in February with the publication of a zine detailing our work, and a final symposium where we presented our projects to the wider community.

This collaboration was led by organisers Barney Hill and Nina Gonzalez-Park and comprised a series of workshops in various locations across Oxford and London, where the focus was to discuss commonalities between contemporary artistic and scientific research and the concept of transdisciplinary work. Additionally, scientists and artists were paired up to explore shared interests, with the goal of creating a final piece to exhibit.

Continue reading

Big Compute doesn't want you to know this! Maximising GPU Usage with CUDA MPS

Accelerating Simulations with CUDA MPS: An OpenMM Implementation Guide

Introduction

High-performance molecular dynamics workflows often require running many concurrent simulations on GPUs. However, traditional GPU resource allocation can lead to inefficient utilisation when running multiple processes, with users often resorting to multiple GPUs to achieve this. While parallelising across nodes can improve time to solution, many processes require coordination and hence communication, which quickly becomes a bottleneck. This is exacerbated on more powerful hardware, where internal node communication for a single simulation on a single GPU can also become a bottleneck. This problem has been addressed for CPU parallelism with multiprocessing and multithreading, but until recently it was challenging to do the same efficiently on GPUs.

NVIDIA’s Multi-Process Service (MPS) offers a solution by enabling efficient and easy sharing of GPU resources among multiple processes with just a few commands. In this blog post, we’ll explore how to implement CUDA MPS with Python multiprocessing and OpenMM to accelerate molecular dynamics simulations.
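
As a rough illustration of the pattern, the sketch below launches several independent OpenMM replicas from Python multiprocessing; with the MPS control daemon started beforehand (nvidia-cuda-mps-control -d), the processes share a single GPU. The input file name, force field choice, and replica count are assumptions for the example, not settings from the post.

import multiprocessing as mp
import openmm as mm
import openmm.app as app
import openmm.unit as unit

def run_replica(replica_id: int, n_steps: int):
    # Each process builds its own CUDA context; with the MPS daemon running,
    # the contexts of all replicas are multiplexed onto the same physical GPU.
    pdb = app.PDBFile("input.pdb")  # hypothetical input structure
    ff = app.ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
    system = ff.createSystem(pdb.topology, nonbondedMethod=app.PME,
                             nonbondedCutoff=1.0 * unit.nanometer,
                             constraints=app.HBonds)
    integrator = mm.LangevinMiddleIntegrator(300 * unit.kelvin,
                                             1.0 / unit.picosecond,
                                             0.002 * unit.picoseconds)
    platform = mm.Platform.getPlatformByName("CUDA")
    sim = app.Simulation(pdb.topology, system, integrator, platform,
                         {"DeviceIndex": "0"})  # all replicas target the same device
    sim.context.setPositions(pdb.positions)
    sim.minimizeEnergy()
    sim.step(n_steps)
    print(f"replica {replica_id} finished")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # avoid forking an already-initialised CUDA context
    procs = [mp.Process(target=run_replica, args=(i, 10_000)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()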

Continue reading

Aider and Cheap, Free, and Local LLMs

Aider and the Future of Coding: Open-Source, Affordable, and Local LLMs

The landscape of AI coding is rapidly evolving, with tools like Cursor gaining popularity for multi-file editing and GitHub Copilot for AI-assisted autocomplete. However, these solutions are both closed-source and require a subscription.

This blog post will explore Aider, an open-source AI coding tool that offers flexibility, cost-effectiveness, and impressive performance, especially when paired with affordable or free models such as DeepSeek and Google Gemini, or local models served through Ollama.

Continue reading

An Open-Source CUDA for AMD GPUs – ZLUDA

Lots of work has been put into making AMD-designed GPUs work nicely with GPU-accelerated frameworks like PyTorch. Despite this, getting performant code on non-NVIDIA graphics cards can be challenging for both users and developers. Even when the developer has appropriately optimised for each platform, there are often gaps in performance where, at the driver level, instructions to the GPU may not be fully optimised. This is because software developed using CUDA can benefit from optimisations like operation fusing without the developer having to specify them in many cases.

This may not be much of a concern for most researchers, as we simply use what is available to us. Most of the time that means NVIDIA GPUs, and there is rarely much choice in the matter. NVIDIA is aware of this and prices its products accordingly. Part of the problem is that system designers simply don't have an incentive to build AMD platforms other than for highly specialised machines.

Continue reading

PHinally PHunctionalising my PHigures with PHATE feat. Plotly Express.

After a friend recommended it, I really wanted to try Plotly Express, but I never had the inclination to read more documentation when matplotlib already gives me enough grief. While experimenting with ChatGPT I finally decided to functionalise my figure-making scripts. With these scripts I managed to produce figures that made people question what I had actually been doing with my time – but I promise this will be worth your time.

I have been working with dimensionality reduction techniques recently, and I came across this paper by Moon et al. PHATE is a technique that represents high-dimensional (i.e. biological) data in a way that aims to preserve connections rather than distances, and I knew I wanted to try it as soon as I saw it. Why should you care? PHATE in 3D is faster than t-SNE in 2D. It would almost be rude not to try it out.

PHATE

In my opinion, PHATE (or potential of heat diffusion for affinity-based transition embedding) does have a lot going on, but the choices at each stage feel quite sensible. It might not come as a surprise that it was primarily designed to make visual inspection of data easier on the eyes.
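
For flavour, here is a minimal sketch of what this looks like in code, assuming the phate and plotly packages are installed; the random matrix stands in for real (biological) data and the parameter choices are illustrative only.

import numpy as np
import phate
import plotly.express as px

# Placeholder data: 1,000 observations with 50 features each.
X = np.random.default_rng(0).normal(size=(1000, 50))

# Embed into three dimensions with PHATE; knn is one of the main knobs to tune.
phate_op = phate.PHATE(n_components=3, knn=5, random_state=0)
embedding = phate_op.fit_transform(X)

# Interactive 3D scatter of the embedding with Plotly Express.
fig = px.scatter_3d(x=embedding[:, 0], y=embedding[:, 1], z=embedding[:, 2],
                    opacity=0.6, title="PHATE embedding")
fig.show()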

Continue reading