Author Archives: Tobias Olsen

Exploring the Observed Antibody Space (OAS)

The Observed Antibody Space (OAS) [1,2] is an amazing resource for investigating observed antibodies or for training antibody-specific models. However, its size (over 2.4 billion unpaired and 1.5 million paired antibody sequences as of June 2023) can make it painful to work with. Additionally, OAS is extremely information rich, with nearly 100 columns for each antibody heavy or light chain, which further complicates handling the data.

Having spent a lot of time working with OAS, I want to share a few tricks and insights, which I hope will reduce the pain and increase the joy of working with it!


The exotic zoo of antibodies

When I think of antibodies, I usually think of the standard human Y-shaped IgG. It is easy to forget that the world of antibodies is extremely diverse, both in the constant domain, with many different isotypes (e.g. IgA, IgD, IgE, and IgM), and in the variable domain (e.g. with or without a light chain, and with varying CDR lengths). This is before we even start looking at engineered antibodies, like the ones illustrated in a previous blog post by Alissa.

In this blog post, I want to highlight some of the exotic, naturally occurring antibodies which might not have gotten much attention yet, but which each have interesting features.

The standard antibody (e.g. human, mouse)

This is the standard antibody against which we will compare the others: a protein complex of two paired heavy and light chains forming the well-known Y shape. At each tip sits a binding site that consists mainly of the three CDRs on each chain. Nice and simple.

Interesting facts:


Festival of Biologics 2022 – November 2-4 Basel, Switzerland

In November I attended the Festival of Biologics (FoB) 2022 conference in Basel, Switzerland. Originally a set of different conferences (now called agendas) that have merged into a single conference, FoB focuses on anything related to biologics. One of the agendas is an antibody-specific agenda, derived from the former European Antibody Congress. This year the antibodies agenda had more than 100 talks across multiple tracks, covering many different aspects of using antibodies as therapeutics, making it an exciting conference for an antibody enthusiast. However, while FoB does include talks on machine learning and bioinformatics, most talks are focused solely on experimental work. Another drawback is that the majority of the talks are given by industry, with the few academic speakers almost all also representing a company. This meant that, of the few talks about computational methods and tools for protein design, most felt more like a commercial than a research presentation. Nonetheless, FoB is still an interesting conference to attend when you are working on applied research for antibody therapeutics. It is an amazing opportunity to hear which antibody-specific problems companies are trying to overcome, which are deemed solved and which are the future problems to solve.


Running code that fails with style

We have all been there, working on code that continuously fails while staring at a dull and colorless command line. However, we are in luck, as there is a way to make the constant error messages look less depressing. By changing our shell to one that enables a colorful themed command line and fancy features like automatic text completion and web search, your code won’t just fail with ease, but also with style!

A shell is your command-line interpreter, meaning you use it to process commands and output their results on the command line. The shell therefore also holds the power to add a little zest to the command line. The most well-known shell is bash, which comes pre-installed on most UNIX systems. However, many different shells exist, all with different pros and cons. The one we will focus on is called Z Shell, or zsh for short.

Zsh was initially only for UNIX and UNIX-like systems, but its popularity has made it available on most systems now. Like bash, zsh is extremely customizable, and their syntax is so similar that most bash commands will work in zsh. The benefit of zsh is that it comes with additional features, plugins and options, and open-source frameworks with large communities. The framework which we will look into is called Oh My Zsh.


ISMB 2022 – July 10-14 Madison, Wisconsin

Madison, Wisconsin, a place known for its superb selection of craft beverages, for having Wisconsin’s Best Cheese Curds, and, most importantly, for hosting the 2022 annual international conference on Intelligent Systems for Molecular Biology (ISMB). Fortunately, we (Lewis and Tobias) got to attend this year’s ISMB and get a taste of Madison. The 2022 conference is the 30th ISMB conference and has grown to become the world’s largest bioinformatics/computational biology conference with nearly 600 presented talks. We therefore got to hear a wide range of different and interesting talks.


Python’s Data Classes

When writing code, you have inevitably needed to store data throughout your pipeline. In these cases you store your value, list or data frame as a variable to easily use it elsewhere in your code. However, sometimes your data has an awkward form, consisting of a number of lists of different lengths or data of different types and sizes. While it is still doable to work with, and using tuples or dictionaries can help, accessing different elements in your data quickly becomes messy and it becomes less intuitive what your code is actually doing.

To solve the above-stated problem, data classes were introduced as a new feature in Python 3.7. A data class is a regular Python class, but with certain methods already implemented for you. This makes them easy to create and removes a lot of boilerplate (repeated code), making them simpler, more intuitive and pretty. Further, as data classes are part of the standard library, you can import them directly without needing to install any external dependencies (noice).

With the sales pitch out of the way, let us look at how we can use data classes.

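from dataclasses import dataclass
from typing import Any

# The decorator generates __init__, __repr__ and __eq__ from the annotated fields.
@dataclass
class Antibody:
    vgene: str
    jgene: None
    sequence: Any = 'EVQ'  # a field with a default value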
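
To get a feel for what "already implemented for you" means, here is a minimal sketch that exercises the generated methods on the class above. The gene names passed in are only illustrative values, and the variable names `ab`, `same` and `as_dictionary` are made up for this example.

```python
from dataclasses import dataclass, asdict
from typing import Any


@dataclass
class Antibody:
    vgene: str
    jgene: None            # annotations are not checked at runtime
    sequence: Any = 'EVQ'  # fields with defaults must come after those without


# __init__ is generated from the annotated fields, in the order they are declared.
ab = Antibody(vgene='IGHV1-2', jgene='IGHJ4')

# __repr__ is generated as well, so printing shows every field and its value.
print(ab)

# __eq__ compares two instances field by field.
same = ab == Antibody('IGHV1-2', 'IGHJ4', 'EVQ')

# asdict turns the instance into a plain dictionary.
as_dictionary = asdict(ab)
```

Note that `jgene` happily accepts a string even though it is annotated with `None`: the annotations tell the decorator which fields exist, but they are not enforced.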

snakeMAKE better workflows with your code

When developing your pipeline for processing, annotating and/or analyzing data, you will probably find yourself needing to continuously re-run it as you play around with your code. This can become a problem when working with long pipelines, large datasets and CPUs begging you not to run some pieces of code again.

Luckily, you are not the first one to have been annoyed by this and other related struggles. Some people were actually so annoyed that they created Snakemake. Snakemake can be used to create workflows and help solve problems, such as the one mentioned above. This is done using a Snakefile, which helps you split your pipeline into “rules”. To illustrate how this helps you create a better workflow, we will be looking at the example below.
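
To see why splitting a pipeline into steps with declared inputs and outputs helps with the re-running problem, here is a rough plain-Python sketch of a make-style freshness check. It is not a Snakefile and not how Snakemake is implemented; it assumes file modification times are the only criterion, and `needs_rerun`, `run_rule` and the file names are hypothetical names chosen for this illustration.

```python
import os
import tempfile
from pathlib import Path


def needs_rerun(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    if not all(Path(o).exists() for o in outputs):
        return True
    newest_input = max(os.path.getmtime(i) for i in inputs)
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return oldest_output < newest_input


def run_rule(name, inputs, outputs, action, log):
    """Run `action` only when its outputs are out of date, and log the decision."""
    if needs_rerun(inputs, outputs):
        action()
        log.append(f"ran {name}")
    else:
        log.append(f"skipped {name}")


log = []
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp, "raw.txt")
    clean = Path(tmp, "clean.txt")
    raw.write_text("evq\n")

    def clean_step():
        # Stand-in for an expensive processing step.
        clean.write_text(raw.read_text().upper())

    run_rule("clean", [raw], [clean], clean_step, log)  # output missing: runs
    run_rule("clean", [raw], [clean], clean_step, log)  # up to date: skipped

    # Pretend the input was edited after the output was produced.
    newer = os.path.getmtime(clean) + 10
    os.utime(raw, (newer, newer))
    run_rule("clean", [raw], [clean], clean_step, log)  # input newer: runs again

print(log)
```

The expensive step only runs when something it depends on has changed, which is the kind of bookkeeping a workflow tool takes off your hands.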


Packaging with Conda

If you are as fond of the big snake as I am, you have probably wondered how you can create a Conda package with your amazing code. Fear not, in the following text you will learn how to make others go:

conda install -c coolperson amazingcode

Roughly, the only thing needed to create a Conda package is a ‘meta.yaml’ file specific to your code. This file contains all the metadata needed to create your package and is highly customizable. While this means the meta.yaml can be written to allow your Conda package to work on any operating system and with any dependencies (it doesn’t have to be Python), it can be annoying to write from scratch (here is a guide for manually writing this file). Since we just want to create a simple Conda package, we will in this guide avoid fiddling around with the meta.yaml file and instead create the file based on a PyPI package. This will also give you a nice template if you later need to adapt your meta.yaml file.
Note: Conda packages can also be made from GitHub repositories, which is likely favorable in most cases, but it also requires some manual work on the meta.yaml.

1. Create a PyPI package of your code


Bioinformatics Hackathon Reflection

A week ago I participated in Copenhagen Bioinformatics Hackathon 2021, a hackathon focusing on machine learning and proteins, as a mentor for a challenge proposed by our group. The whole experience was fun, but I am also sitting here contemplating a lot of things I wish I had done differently. For this blog post, I therefore want to highlight two changes which I believe would have greatly improved my challenge and which can hopefully also serve as inspiration for others presenting a hackathon challenge.

Going into this event I had some experience from a few hackathons I had previously attended. Based on this, I wanted to create a challenge containing two parts. First, a simple task which everyone would be able to create a solution for, and second, a more challenging addition to the first task for more experienced participants. I decided to go with the challenge of predicting which heavy and light chains can form a pair, where the additional challenge was to try to visualize which residues were relevant for this interaction. With OAS containing a really nice positive dataset of paired chains, I thought this was going to be an amazing challenge, but as soon as the event began I started seeing the flaws of the challenge.


Better understanding of correlation

Although correlation is often used to mean the linear relationship between two sets of points, I will in the following text use it more broadly to mean any relationship between two sets of points.

You have tasked yourself with finding the correlation between the different features in your dataset. Your purpose could be to remove highly correlated features or just to improve your understanding of your data. Either way, calculating and using the Pearson Correlation Coefficient (PCC) or Spearman’s rank Correlation Coefficient (SCC) to get an overview of the correlations might be the first thing that comes to your mind.

Unfortunately, both of these are limited to linear (PCC) or monotonic (SCC) relationships. In datasets with many complex features, many of them will be highly correlated, just not linearly (or monotonically). Instead these correlations can be non-linear which, as seen in the third row of the figure below, are not detected by PCC.

Figure: PCC of different sets of x and y points. https://en.wikipedia.org/wiki/Correlation_and_dependence
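
The limits of PCC and SCC can also be checked with a few lines of code. The following is a minimal sketch in plain Python, with both coefficients written out by hand; the function names and the three toy relationships are my own choices for illustration, not taken from any particular library.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = list(range(-10, 11))
relationships = {
    "linear (y = 2x + 1)": [2 * v + 1 for v in x],
    "monotonic (y = x^3)": [v ** 3 for v in x],
    "non-monotonic (y = x^2)": [v ** 2 for v in x],
}
for name, y in relationships.items():
    print(f"{name}: PCC = {pearson(x, y):.2f}, SCC = {spearman(x, y):.2f}")
```

For the linear case both coefficients are 1. For the cubic case SCC is still 1 while PCC drops below 1, and for the parabola both are essentially zero, even though y is completely determined by x.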