Category Archives: Immunoinformatics

Can few-shot language models perform bioinformatics tasks?

In 2019, I tried my hand at using large language models, specifically GPT-2, for text generation. In that blogpost, I used Hansard files to fine-tune the public release of GPT-2 to generate speeches by several speakers in the House of Commons (link).

In 2020, OpenAI released GPT-3, their new and improved text generation model (paper), which uses a whopping 175 billion parameters (as opposed to its predecessor’s 1.5 billion) and not only proved to be capable of state of the art performance on common text prediction benchmarks, but also generated a considerable amount of interest in the news media:

Continue reading →

Do antibodies care about sex?

In a recent OPIG antibody meeting, the topic of immune system differences between men and women came up. I thought this was cool and something I hadn’t read about, so what a brilliant topic for a blog most. This post is a high-level overview – I’ve listed the papers I’ve used at the bottom of this post so please consult them for more details!

Differences between males and females can lead to pretty big disparities in disease prevalence and outcomes. For example, non-reproductive cancers occur predominantly in males, whilst the majority of autoimmune disease occurs in females. Many factors may be impacting this, including environmental, genetic and hormonal influences, and much more research is required to fully understand these processes. Here I focus on sex-based biology, rather than gender, though both can influence the immune response.

Continue reading →

Antibody Binding is Mediated by a Compact Vocabulary of Paratope-Epitope Interactions

While my own research focuses mainly on what happens in an antibody before it binds its antigen, I recently came across a paper by Akbar et al [1] that examines antibody-antigen interactions using an elegant approach to identify a set of structural motifs that antibodies use to interact with their epitopes. Since I am interested in emergent properties that arise when a sequence is mapped onto an antibody structure, this paper was very exciting. I will also shamelessly admit that I’m a sucker for a pretty figure and this paper has many! Regardless, on to the findings!

Example of identified interaction motifs. Figure from Akbar et al, 2021

Continue reading →

The Coronavirus Antibody Database: 10 months on, 10x the data!

Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…

10 months later (2nd March 2021), we now have tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from 100s of independent studies reported in either the academic literature or patent filings.

The entire contents CoV-AbDab database as of 2nd March 2021.

Continue reading →

BioDataScience101: a fantastic initiative to learn bioinformatics and data science

Last Wednesday, I was fortunate enough to be invited as a guest lecturer to the 3rd BioDataScience101 workshop, an initiative spearheaded by Paolo Marcatili, Professor of Bioinformatics at the Technical University of Denmark (DTU). This session, on amino acid sequence analysis applied to both proteomics and antibody drug discovery, was designed and organised by OPIG’s very own Tobias Olsen.

Continue reading →

Speaking about Sequence and Structure at a Summit

A couple of weeks ago I was lucky enough to be asked to speak at the 5th Computational Drug Discovery & Development for Biologics Summit. This was my first virtual conference – it was a shame I didn’t get to visit Boston, and presenting to my empty room was slightly bizarre, but it was great to hear what people have been working on, and there’s definitely something to be said for attending a conference in fluffy socks…

A, antibody structure. An antibody is made up of four chains: two light (orange) and two heavy (blue). Each chain is made up of a series of domains—the variable domains of the light and heavy chains together are known as the Fv region (shown on the right; PDB entry 12E8). The Fv features six loops known as complementarity determining regions or CDRs (shown in dark blue); these are mainly responsible for antigen binding. B, example sequences for the VH and VL, highlighting the CDR regions and the genetic composition. It is estimated that the human antibody repertoire contains up to 1013 unique sequences, enabling the immune system to respond to almost any antigen. This is possible through the recombination of V, D and J gene segments, junctional diversification, and somatic hypermutation.

Continue reading →

It’s been here all along: Analysis of the antibody DE loop

In my work, I mainly look at antigen-bound antibodies and this means a lot of analysing interfaces. Specifically, I spend a lot of my time examining the contributions of complementarity-determining regions (CDRs) to antigen binding, but what about antibodies where the framework (FW) region also contributes to binding? Such structures do exist, and these interactions are rarely trivial. As such, a recent preprint I came across where the authors examined the DE loops of antibodies was a great motivator to broaden my horizons!

Continue reading →

Epitope mapping with structural data for SARS-CoV-2 RBD and 10 known binders

In the past few months we have seen a lot of papers reporting antibodies that they found to bind to SARS-CoV-2 (a database can be found here: http://opig.stats.ox.ac.uk/webapps/covabdab/). Some of them were from the analysis of a patient’s immune system. Some of them come with crystal structures to show where they bind. Some don’t have structures, but they have the sequences and some competition assay data to show approximately where on the spike protein they bind. The main focus is around an area called the Receptor Binding Domain (RBD) which is where the spike protein engages the human ACE2 receptor and causes the downstream problems. In this paper, the authors ran a complete mutagenesis on the RBD of the SARS-CoV-2 spike protein.

Continue reading →

What can you do with the OPIG Antibody Suite? v2.0

Since my last blogpost on this topic back in 2018, OPIG has expanded its range of tools for antibody/BCR analysis. Here is an updated summary of the OPIG antibody databases and immunoinformatics tools.

Continue reading →

Adding paired BCR data to OAS

Hello,

Today is the day for my final blog post before I enter a thesis writing mode. Using this given opportunity, I would like to present to you our recent update to the Observed Antibody Space (OAS) resource where we included paired antibody data (http://opig.stats.ox.ac.uk/webapps/oas).

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Immunoinformatics

Can few-shot language models perform bioinformatics tasks?

Do antibodies care about sex?

Antibody Binding is Mediated by a Compact Vocabulary of Paratope-Epitope Interactions

The Coronavirus Antibody Database: 10 months on, 10x the data!

BioDataScience101: a fantastic initiative to learn bioinformatics and data science

Speaking about Sequence and Structure at a Summit

It’s been here all along: Analysis of the antibody DE loop

Epitope mapping with structural data for SARS-CoV-2 RBD and 10 known binders

What can you do with the OPIG Antibody Suite? v2.0

Adding paired BCR data to OAS