Category Archives: Databases

Exploring the Observed Antibody Space (OAS)

The Observed Antibody Space (OAS) [1,2] is an amazing resource for investigating observed antibodies or for training antibody-specific models; however, its size (over 2.4 billion unpaired and 1.5 million paired antibody sequences as of June 2023) can make it painful to work with. Additionally, OAS is extremely information-rich, with nearly 100 columns for each antibody heavy or light chain, which further complicates handling the data.
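To give a feel for what handling files of that size and width can look like, here is a minimal sketch that streams a single data file and keeps only the columns of interest rather than loading everything into memory. It assumes the file is a gzipped CSV whose first line holds the study metadata as JSON and whose second line is the column header; that layout, the helper names `read_metadata` and `iter_rows`, and the toy file written at the end are all assumptions for illustration, not something taken from the post.

```python
import csv
import gzip
import json
import os
import tempfile


def read_metadata(path):
    """Return the JSON metadata assumed to sit on the first line of the file."""
    with gzip.open(path, "rt", newline="") as handle:
        first_row = next(csv.reader(handle))
    # Re-join in case the JSON line was split on commas by the CSV reader.
    return json.loads(",".join(first_row))


def iter_rows(path, columns):
    """Stream rows one at a time, keeping only the requested columns."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader)              # skip the metadata line
        header = next(reader)     # column names
        positions = [header.index(name) for name in columns]
        for row in reader:
            yield {name: row[pos] for name, pos in zip(columns, positions)}


# Toy file in the assumed layout (hypothetical values, three columns instead of ~100).
demo_path = os.path.join(tempfile.mkdtemp(), "toy_unit.csv.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow([json.dumps({"Species": "human", "Chain": "Heavy"})])
    writer.writerow(["sequence_alignment_aa", "v_call", "cdr3_aa"])
    writer.writerow(["EVQLVESGG", "IGHV3-23*01", "ARDYW"])
    writer.writerow(["QVQLQESGP", "IGHV4-34*01", "ARGGW"])

metadata = read_metadata(demo_path)
cdr3s = [row["cdr3_aa"] for row in iter_rows(demo_path, ["cdr3_aa"])]
print(metadata["Species"], cdr3s)
```

Because `iter_rows` is a generator, memory use stays roughly constant regardless of file size, and selecting a handful of columns up front avoids carrying the rest through every downstream step.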

Having spent a lot of time working with OAS, I wanted to share a few tricks and insights, which I hope will reduce the pain and increase the joy of working with it!

Continue reading

histo.fyi: A Useful New Database of Peptide:Major Histocompatibility Complex (pMHC) Structures

pMHCs are set to become a major target class in drug discovery; unusual peptide fragments presented by MHC can be used to distinguish infected/cancerous cells from healthy cells more precisely than over-expressed biomarkers. In this blog post, I will highlight a prototype resource: Dr. Chris Thorpe’s new database of pMHC structures, histo.fyi.

histo.fyi provides a one-stop shop for data on (currently) around 1400 pMHC complexes. Similar to our dedicated databases for antibody/nanobody structures (SAbDab) and T-cell receptor (TCR) structures (STCRDab), histo.fyi will scrape the PDB on a weekly basis for any new pMHC data and process these structures in a way that facilitates their analysis.

Continue reading

Retrieving AlphaFold models from AlphaFoldDB

There are now nearly a million AlphaFold [1] protein structure predictions openly available via AlphaFoldDB [2]. This represents a huge set of new data that can be used for the development of new methods. The options for downloading structures are either in bulk (sorted by genome), or individually from the webpage for a prediction.

If you want just a few hundred or a few thousand specific structures, across different genomes, neither of these options is particularly practical. For example, if you have several thousand experimental structures whose PDB [3] codes you know, and you want to obtain the equivalent AlphaFold predictions, there is another way!

If we take the example of the PDB’s current molecule of the month, pyruvate kinase (PDB code 4FXF), this is how you can go about downloading the equivalent AlphaFold prediction programmatically.

  1. Query UniProt [4] for the corresponding accession number – an example Python script is shown below:
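The post's own script is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of the idea: build a UniProt search for entries cross-referenced to a PDB code, pull the accession numbers out of the JSON response, and turn each one into an AlphaFoldDB file URL. The query syntax (`xref:pdb-...`), the `primaryAccession` response field, and the file URL pattern with its `v4` version suffix are assumptions about the current services and may change between releases; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; check the UniProt and AlphaFoldDB documentation before relying on them.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"
ALPHAFOLD_FILE = "https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v{version}.pdb"


def uniprot_query_url(pdb_code):
    """Build a search URL for UniProt entries cross-referenced to a PDB code."""
    params = {
        "query": f"xref:pdb-{pdb_code}",
        "fields": "accession",
        "format": "json",
    }
    return f"{UNIPROT_SEARCH}?{urllib.parse.urlencode(params)}"


def accessions_from_response(payload):
    """Extract accession numbers from a parsed UniProt JSON response."""
    return [entry["primaryAccession"] for entry in payload.get("results", [])]


def alphafold_url(accession, version=4):
    """Build the assumed AlphaFoldDB download URL for one accession."""
    return ALPHAFOLD_FILE.format(accession=accession, version=version)


def fetch_alphafold_urls(pdb_code, version=4):
    """Query UniProt over the network and return candidate AlphaFoldDB URLs."""
    with urllib.request.urlopen(uniprot_query_url(pdb_code)) as response:
        payload = json.load(response)
    return [alphafold_url(acc, version) for acc in accessions_from_response(payload)]


# Offline illustration with a placeholder accession (not the real answer for 4FXF).
example_payload = {"results": [{"primaryAccession": "P12345"}]}
example_urls = [alphafold_url(acc) for acc in accessions_from_response(example_payload)]
print(uniprot_query_url("4FXF"))
print(example_urls)
```

Calling `fetch_alphafold_urls("4FXF")` would perform the live lookup. A PDB entry can map to more than one UniProt accession, which is why the helpers return lists rather than a single value.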
Continue reading

CryoEM is now the dominant technique for solving antibody structures

Last year, the Structural Antibody Database (SAbDab) listed a record-breaking 894 new antibody structures, driven in no small part by the continued efforts of researchers to understand SARS-CoV-2.

Fig. 1: The aggregate growth in antibody structure data (all methods) over time. Taken from http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/stats/ on 25th May 2022.

In this blog post I wanted to highlight the major driving force behind this curve – the huge increase in cryo electron microscopy (cryoEM) data – and the implications of this for the field of structure-based antibody informatics.

Continue reading

New Antibody Therapeutic INNs will no longer end in “-mab”!

Happy 2022, Blopiggers!

My first post of the year is about another major change to the way the World Health Organisation will be assigning “International Non-proprietary Names” (INNs) to antibody-based therapeutics. I haven’t seen this publicised widely, so I thought I’d share it here as it is an important consideration for anyone mining or exploiting this data.

Continue reading

New review on BCR/antibody repertoire analysis out in MAbs!

In our latest immunoinformatics review, OPIG has teamed up with experienced antibody consultant Dr. Anthony Rees to outline the evidence for BCR/antibody repertoire convergence on common epitopes post-pathogen exposure, and all the ways we can go about detecting it from repertoire gene sequencing data. We highlight the new advances in the repertoire functional analysis field, including the role for OPIG’s latest tools for structure-aware antibody analytics: Structural Annotation of AntiBody repertoires+ (SAAB+), Paratyping, Ab-Ligity, Repertoire Structural Profiling & Structural Profiling of Antibodies to Cluster by Epitope (‘SPACE’).

Continue reading

Getting the PDB structures of compounds in ChEMBL

Recently I was dealing with a set of compounds with known target activities from the ChEMBL database, and I wanted to find out which of them also had PDB crystal structures in complex with that target.

Referencing this manually is very easy for cases where we are interested in 2-3 compounds, but for any larger number, using the ChEMBL and PDB web services greatly reduces the number of clicks.

Continue reading

Watch out when using PDBbind!

Now that PDBbind 2020 has been released, I want to draw some attention to an issue with using the SDF files that are supplied in the PDBbind refined set 2020.

Normally, SDF files save the chirality information of compounds in the atom block of the file, which is shown below as a snippet of the full SDF file for the ligand of PDB entry 4qsv. The column that defines chirality is marked in red.

As you can see, all columns shown here are 0. The SDF files supplied by PDBbind for some reason do NOT encode chirality information explicitly. This will be a problem when using RDKit to read the molecule and transform it into a SMILES string. By using the following commands to read the ligand for 4qsv from PDBbind 2020 and write a SMILES string, we get:
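The RDKit commands and their output belong to the full post. As a separate, RDKit-free way to look at the zero-valued column described above, here is a minimal sketch that reads the atom block of a V2000 mol block and reports one field per atom. It assumes the column in question is the atom stereo parity field (characters 40 to 42 of each atom line in the fixed-width V2000 layout); the helper names and the two-atom toy record are hypothetical and are not taken from the 4qsv file.

```python
def atom_line(symbol, x, y, z, parity=0):
    """Format one fixed-width V2000 atom line (toy helper for the demo below)."""
    return (
        f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}"
        f"{0:2d}{0:3d}{parity:3d}" + "  0" * 9
    )


def atom_stereo_parities(molblock):
    """Return (element, stereo parity) for every atom in a V2000 mol block."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])          # counts line: first 3 chars = atom count
    parities = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()
        # Characters 40-42 (0-based slice 39:42); treat a missing field as 0.
        parity = int(line[39:42].strip() or 0)
        parities.append((symbol, parity))
    return parities


def has_explicit_parity(molblock):
    """True if any atom carries a non-zero stereo parity value."""
    return any(parity != 0 for _, parity in atom_stereo_parities(molblock))


# Toy record: header (3 lines), counts line, two atoms, terminator.
toy_molblock = "\n".join([
    "toy ligand",
    "  hand-written",
    "",
    f"{2:3d}{0:3d}  0  0  0  0  0  0  0  0999 V2000",
    atom_line("C", 0.0, 0.0, 0.0, parity=0),
    atom_line("C", 1.5, 0.0, 0.0, parity=1),
    "M  END",
])

print(atom_stereo_parities(toy_molblock))
print(has_explicit_parity(toy_molblock))
```

For a multi-record SDF file, the text would first need splitting on the `$$$$` record separator. This check only looks at the atom block; wedge information in the bond block and stereochemistry implied by 3D coordinates are outside its scope.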

Continue reading

2021 likely to be a bumper year for therapeutic antibodies entering clinical trials; massive increase in new targets

Earlier this month the World Health Organisation (WHO) released Proposed International Nonproprietary Name List 125 (PL125), comprising the therapeutics entering clinical trials during the first half of 2021. We have just added this data to our Therapeutic Structural Antibody Database (Thera-SAbDab), bringing the total number of therapeutic antibodies recognised by the WHO to 711.

This is up from 651 at the end of 2020, a year which saw 89 new therapeutic antibodies introduced to the clinic. This rise of 60 in just the first half of 2021 bodes well for a record-breaking year of therapeutics entering trials.

Continue reading