Tag Archives: OAS

Handling OAS Scale Datasets Without The Drama

Working with Observed Antibody Space (OAS) dataset sometimes feels a bit like trying to cook dinner with the contents of the whole fridge emptied into the pan. There are countless CSVs, all of different sizes (some might not even fit onto your RAM), and you just want a clean, fast pipeline so you can get back to modelling. The trick is to stop treating the data like a giant spreadsheet you fully load into memory and start treating it like a columnar, on-disk database you stream through. That’s exactly what the 🤗 Datasets library gives you.

At the heart of 🤗 Datasets is Apache Arrow, which stores columns in a memory-mapped format (if you are curious about what that means there is a great explanation in another blog post here. In plain terms: the data mostly lives on disk, and you pull in just the slices you need. It feels interactive even when the dataset is huge. Instead of a single monolithic script that does everything (and takes forever), you layer small, composable steps—standardize a few columns, filter out junk, compute a couple of derived fields—and each step is cached automatically. Change one piece, and only that piece recomputes. Sounds great, right? But of course, the key question now is how to get OAS data into Datasets to begin with.

Continue reading →

Exploring the Observed Antibody Space (OAS)

The Observed Antibody Space (OAS) [1,2] is an amazing resource for investigating observed antibodies or as a resource for training antibody specific models, however; its size (over 2.4 billion unpaired and 1.5 million paired antibody sequences as of June 2023) can make it painful to work with. Additionally, OAS is extremely information rich, having nearly 100 columns for each antibody heavy or light chain, further complicating how to handle the data.

From spending a lot of time working with OAS, I wanted to share a few tricks and insights, which I hope will reduce the pain and increase the joy of working with OAS!

Continue reading →

Adding paired BCR data to OAS

Hello,

Today is the day for my final blog post before I enter a thesis writing mode. Using this given opportunity, I would like to present to you our recent update to the Observed Antibody Space (OAS) resource where we included paired antibody data (http://opig.stats.ox.ac.uk/webapps/oas).

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Tag Archives: OAS

Handling OAS Scale Datasets Without The Drama

Exploring the Observed Antibody Space (OAS)

Adding paired BCR data to OAS