Author Archives: Marius Urbonas

Democratising the Dark Arts: Writing Triton Kernels with Claude

Why would you ever want to leave the warm, fuzzy embrace of torch.nn? It works, it’s differentiable, and it rarely causes your entire Python session to segfault without a stack trace. The answer usually comes down to the “Memory Wall”: modern deep learning is often less bound by how fast your GPU can do math (FLOPS) and more bound by how fast it can move data around (memory bandwidth).

When you write a sequence of simple PyTorch operations, something like x = x * 2 + y, the GPU often reads x from memory, multiplies it, writes the result back, reads it again to add y, and writes it back again. It’s the computational equivalent of making five separate trips to the grocery store because you forgot the eggs, then the milk, then the bread. Writing a custom kernel lets you “fuse” these operations: you load the data once, perform a dozen mathematical operations on it while it sits in the ultra-fast on-chip registers, and write it back once. The performance gains can be massive (often 2x-10x for specific layers).

But traditionally, the “cost” of accessing those gains (learning C++, understanding warp divergence, managing memory by hand) was just too high for most researchers. That equation is finally changing.
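To make the trip-counting concrete, here is a toy sketch in plain NumPy (not a real kernel; the `CountingArray` class and its `load`/`store` methods are invented for illustration) that tallies element transfers for the eager and fused versions of x * 2 + y:

```python
import numpy as np

class CountingArray:
    """Wraps a NumPy array and counts element reads and writes,
    a toy stand-in for global-memory traffic on a GPU."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)
        self.reads = 0
        self.writes = 0

    def load(self):
        self.reads += self.data.size
        return self.data.copy()

    def store(self, values):
        self.writes += values.size
        self.data = np.asarray(values, dtype=np.float32)

def unfused(x, y):
    """Op-by-op execution: the intermediate makes a full round trip
    through memory, like eager `x * 2` followed by `+ y`."""
    tmp = CountingArray(np.empty(x.data.size))
    tmp.store(x.load() * 2)            # trip 1-2: read x, write tmp
    out = CountingArray(np.empty(x.data.size))
    out.store(tmp.load() + y.load())   # trip 3-5: read tmp, read y, write out
    traffic = x.reads + y.reads + tmp.reads + tmp.writes + out.writes
    return out, traffic                # 5 memory trips per element

def fused(x, y):
    """Fused execution: load each input once, do all the math while
    the values sit 'in registers', store the result once."""
    out = CountingArray(np.empty(x.data.size))
    out.store(x.load() * 2 + y.load())
    traffic = x.reads + y.reads + out.writes
    return out, traffic                # 3 memory trips per element

n = 1024
a, t_unfused = unfused(CountingArray(np.arange(n)), CountingArray(np.ones(n)))
b, t_fused = fused(CountingArray(np.arange(n)), CountingArray(np.ones(n)))
assert np.allclose(a.data, b.data)  # same answer...
assert t_fused < t_unfused          # ...with 3n instead of 5n transfers
```

A real Triton kernel does the same accounting in hardware: every intermediate you keep out of global memory is a trip you never make.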

Continue reading

Handling OAS Scale Datasets Without The Drama

Working with the Observed Antibody Space (OAS) dataset sometimes feels a bit like trying to cook dinner with the contents of the whole fridge emptied into the pan. There are countless CSVs, all of different sizes (some might not even fit in your RAM), and you just want a clean, fast pipeline so you can get back to modelling. The trick is to stop treating the data like a giant spreadsheet you fully load into memory and start treating it like a columnar, on-disk database you stream through. That’s exactly what the 🤗 Datasets library gives you.

At the heart of 🤗 Datasets is Apache Arrow, which stores columns in a memory-mapped format (if you are curious what that means, there is a great explanation in another blog post here). In plain terms: the data mostly lives on disk, and you pull in just the slices you need. It feels interactive even when the dataset is huge. Instead of a single monolithic script that does everything (and takes forever), you layer small, composable steps (standardize a few columns, filter out junk, compute a couple of derived fields), and each step is cached automatically. Change one piece, and only that piece recomputes. Sounds great, right? But of course, the key question now is how to get OAS data into Datasets to begin with.

Continue reading

Cream, Compression, and Complexity: Notes from a Coffee-Induced Rabbit Hole

I recently stumbled upon this paper which, quite unexpectedly, sent me down a rabbit hole: reading about compression, generalisation, and algorithmic information theory, and looking at GIFs of milk mixing with coffee on the internet. Here are some half-processed takeaways from this weird journey.

Complexity of a Cup of Coffee

First, check out this cool video.

Continue reading