Monthly Archives: January 2026

Advanced PyMOL Visualization for Weighted Structural Ensembles (Part 1): Ensemble Comparison

When working with structural ensembles from molecular dynamics, AlphaFold2 subsampling, or ensemble reweighting against experimental data, you quickly run into visualization problems that standard PyMOL tutorials don’t address: what do you do when there is no single reference structure?

In this two-part series, I’ll share the PyMOL techniques I’ve developed for visualizing weighted ensembles where multiple conformational states coexist. Part 1 covers reference state handling, RMSD-based coloring, and cluster visualization. Part 2 will tackle efficient SASA surface generation for large ensembles. To the best of my knowledge, this is the most advanced PyMOL guide EVER.

The code snippets here are extracted from full scripts attached at the end of this post. All examples use two systems: TeaA (a membrane transporter with distinct open/closed states) and MoPrP (mouse Prion Protein with partially unfolded forms).
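The reference-state problem has a simple numerical core: given per-frame weights, a natural reference is the population-weighted mean structure, and each frame's RMSD to it can then drive coloring. Here is a minimal, PyMOL-independent sketch of that idea (the function names are mine, not from the attached scripts, and frames are assumed to be pre-aligned):

```python
import math

def weighted_mean_coords(frames, weights):
    """Population-weighted average of per-atom coordinates.

    frames: list of frames, each a list of (x, y, z) atom coordinates.
    weights: per-frame populations, assumed to sum to 1.
    """
    n_atoms = len(frames[0])
    mean = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for frame, w in zip(frames, weights):
        for i, (x, y, z) in enumerate(frame):
            mean[i][0] += w * x
            mean[i][1] += w * y
            mean[i][2] += w * z
    return mean

def rmsd(frame, ref):
    """Plain (unfitted) RMSD between one frame and the reference."""
    sq = sum((a - b) ** 2
             for atom, ratom in zip(frame, ref)
             for a, b in zip(atom, ratom))
    return math.sqrt(sq / len(frame))
```

In PyMOL itself, the per-frame RMSD values can be written into the B-factor column and colored with `spectrum b`, which is the pattern the attached scripts follow.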

Continue reading

Advanced PyMOL Visualization for Weighted Structural Ensembles (Part 2): Efficient Weighted SASA Surfaces

In Part 1, we covered reference state handling, RMSD-based coloring, and cluster visualization for weighted structural ensembles. Now we tackle a more ambitious goal: generating solvent-accessible surface area (SASA) surfaces that reflect the weighted conformational distribution of your ensemble.

Why surfaces? Because they show the accessible conformational space—where your protein can actually be found, weighted by population. This is particularly powerful when comparing different fitting methods or showing how experimental constraints reshape the ensemble.

The challenge? A typical ensemble might have 500+ frames, each generating thousands of surface points. Naive approaches choke on the computational and memory demands. This post shares the optimizations that make weighted SASA visualization practical.
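The weighting step itself is cheap; the expensive part is computing per-frame SASA at all. Assuming per-frame, per-residue SASA values are already in hand (e.g. from PyMOL's `get_area` with `dot_solvent` enabled), the population-weighted aggregation reduces to a few lines (the function name is mine, and the aggregation is tool-agnostic):

```python
def weighted_sasa(per_frame_sasa, weights):
    """Population-weighted per-residue SASA across an ensemble.

    per_frame_sasa: list of per-frame lists, one SASA value per residue.
    weights: per-frame ensemble weights, assumed normalized.
    """
    n_res = len(per_frame_sasa[0])
    out = [0.0] * n_res
    for frame_sasa, w in zip(per_frame_sasa, weights):
        for i, s in enumerate(frame_sasa):
            out[i] += w * s
    return out
```

This is where the ensemble weights from reweighting against experimental data enter: two fitting methods can share the same frames yet produce visibly different weighted surfaces.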

Continue reading

Democratising the Dark Arts: Writing Triton Kernels with Claude

Why would you ever want to leave the warm, fuzzy embrace of torch.nn? It works, it’s differentiable, and it rarely causes your entire Python session to segfault without a stack trace. The answer usually comes down to the “Memory Wall”: modern deep learning is often less bound by how fast your GPU can do math (FLOPS) and more bound by how fast it can move data around (memory bandwidth).

When you write a sequence of simple PyTorch operations, something like x = x * 2 + y, the GPU often reads x from memory, multiplies it, writes it back, reads it again to add y, and writes it back again. It’s the computational equivalent of making five separate trips to the grocery store because you forgot the eggs, then the milk, then the bread.

Writing a custom kernel lets you “fuse” these operations: you load the data once, perform a dozen mathematical operations on it while it sits in the ultra-fast chip registers, and write it back once. The performance gains can be massive (often 2x-10x for specific layers). But traditionally, the “cost” of accessing those gains (learning C++, understanding warp divergence, and manual memory management) was just too high for most researchers. That equation is finally changing.
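The grocery-store analogy can be made quantitative with a back-of-envelope memory-traffic model (a toy estimate of mine, not a Triton benchmark; the pass counts are illustrative):

```python
def traffic_bytes(n_elems, dtype_bytes=4, fused=False):
    """Rough global-memory traffic for the elementwise x = x * 2 + y example.

    Unfused (two kernels): read x, write tmp, read tmp, read y, write x -> 5 passes.
    Fused (one kernel):    read x, read y, write x                      -> 3 passes.
    """
    passes = 3 if fused else 5
    return passes * n_elems * dtype_bytes
```

For a memory-bound op, runtime tracks traffic, so fusing this particular two-op chain bounds the speedup at 5/3; longer chains of fused elementwise ops are where the 2x-10x gains quoted above come from.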

Continue reading

What Molecular ML Can Learn from the Vision Community’s Representation Revolution

Something remarkable happened in computer vision in 2025: the fields of generative modeling and representation learning, which had developed largely independently, suddenly converged. Diffusion models started leveraging pretrained vision encoders like DINOv2 to dramatically accelerate training. Researchers discovered that aligning generative models to pretrained representations doesn’t just speed things up—it often produces better results.

As someone who works on generative models for (among other things) molecules and proteins, I’ve been watching this unfold with great interest. Could we do the same thing for molecular ML? We now have foundation models like MACE that learn powerful atomic representations. Could aligning molecular generative models to these representations provide similar benefits?
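The alignment idea is concrete: during generative training, an auxiliary loss pulls the model's intermediate features toward those of a frozen pretrained encoder. A minimal sketch of such a term (plain Python for clarity; real implementations use a learned projection head and batched tensors, and the function name is my own):

```python
import math

def cosine_alignment_loss(gen_feats, enc_feats):
    """1 minus the mean cosine similarity between generator hidden
    features and frozen-encoder features (paired lists of vectors).
    Zero when the features point in the same direction."""
    total = 0.0
    for g, e in zip(gen_feats, enc_feats):
        dot = sum(a * b for a, b in zip(g, e))
        ng = math.sqrt(sum(a * a for a in g))
        ne = math.sqrt(sum(b * b for b in e))
        total += dot / (ng * ne)
    return 1.0 - total / len(gen_feats)
```

In the molecular setting, the frozen encoder would be something like a pretrained MACE model, with the loss added alongside the usual diffusion objective.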

In this post, I’ll summarize what happened in vision (organized into four “phases”), and then discuss what I think are the key lessons for molecular machine learning. The punchline: many of these ideas are already starting to appear in our field, but we’re still in the early stages compared to vision.

For a more detailed treatment of the vision developments with full references and figures, see the extended blog post on my website.

Continue reading

Scientific Acceleration with Agentic Coding in 2026

In the past month we have surpassed a critical threshold in the capabilities of agentic coding models. What previously sounded like science fiction has now become reality, and I don’t believe any of us are ready for what is to come. In this blog post I share a summary of the breakthrough I am referring to, give some insight into how I use agents to accelerate my research, and make some predictions for the year. With pride, I can say this entire blog post was 100% written by me without any support from ChatGPT (except spell checking and the image below).

Continue reading

Can we make Boltz predict allosteric binding?

Orthosteric vs Allosteric binding (Nano Banana generated)

(While this post is meant to shed light on the problem of making AI structure prediction models like Boltz better at allosteric binding, it is also an open call for collaboration on this problem.)

I recently took part in a Boltz hackathon organised by the MIT Jameel Clinic. I worked on improving Boltz 2 predictions for allosteric binders. The validation dataset provided was from a recent paper, Co-folding, the future of docking – prediction of allosteric and orthosteric ligands, which benchmarks some of the recent state-of-the-art AI structure prediction models on a curated set of allosteric and orthosteric binders. Generally, all AI structure prediction models are trained mostly on orthosteric binding cases, which means that their performance on allosteric binding is significantly worse.

Continue reading

Agentic AI

Agents have burst onto the scene in the last year. Agentic AI refers to AI systems that can pursue a goal, make decisions, take actions, and then adapt based on the results. 

Unlike traditional AI models that mostly answer questions or classify information, an agentic system can: 

Continue reading

Finding 250GB of Missing Storage On My Mac: A Warning For Large Dataset Users

I recently faced a puzzling issue: my 1TB MacBook Pro showed only 150GB free, but disk analyzers could only account for about 500GB of used space. After hours of troubleshooting, I discovered that Spotlight’s search index had ballooned to 233GB, hundreds of times larger than normal.

The Problem

Standard disk analyzers showed that my Mac had 330GB of “Inaccessible Disk Space” and 66GB of “Purgeable Disk Space”, but no clear explanation for where my storage went. Removing the purgeable space was easy enough with sudo purge, but none of the fixes recommended by ChatGPT, like clearing Time Machine snapshots, clearing pip and conda caches with pip cache purge and conda clean --all, or restarting the computer, had any effect on the inaccessible disk space.
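If you would rather account for the space yourself than trust a disk analyzer, a small script that walks a directory tree and sums file sizes is more honest (a sketch of mine; system paths like the Spotlight index need sudo or Full Disk Access to traverse):

```python
import os

def dir_size_bytes(path):
    """Total size of all regular files under path, skipping unreadable
    entries and not following symlinks."""
    total = 0
    for root, dirs, files in os.walk(path, onerror=lambda e: None):
        for name in files:
            fp = os.path.join(root, name)
            try:
                total += os.lstat(fp).st_size
            except OSError:
                pass
    return total
```

On modern macOS the Spotlight index typically lives under /System/Volumes/Data/.Spotlight-V100, and `sudo mdutil -E /` forces it to be erased and rebuilt.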

Continue reading