Tag Archives: PDB

Exploring the Protein Data Bank programmatically

The Worldwide Protein Data Bank (wwPDB or just the PDB to its friends) is a key resource for structural biology, providing a single central repository of protein and nucleic acid structure data. Most researchers interact with the PDB either by downloading and parsing individual entries as mmCIF files (or as legacy PDB files), or by downloading aggregated data, such as the RCSB‘s collection in a single FASTA file of all polymer entity sequences. All too often, researchers end up laboriously writing their own file parsers to digest these files. In recent years though, more sophisticated tools have been made available that make it much easier to access only the data that you need.

Continue reading →

Retrieving AlphaFold models from AlphaFoldDB

There are now nearly a million AlphaFold [1] protein structure predictions openly available via AlphaFoldDB [2]. This represents a huge set of new data that can be used for the development of new methods. The options for downloading structures are either in bulk (sorted by genome), or individually from the webpage for a prediction.

If you want just a few hundred or a few thousand specific structures, across different genomes, neither of these options are particularly practical. For example, if you have several thousand experimental structures for which you have their PDB [3] code, and you want to obtain the equivalent AlphaFold predictions, there is another way!

If we take the example of the PDB’s current molecule of the month, pyruvate kinase (PDB code 4FXF), this is how you can go about downloading the equivalent AlphaFold prediction programmatically.

Query UniProt [4] for the corresponding accession number – an example python script is shown below:

Continue reading →

CryoEM is now the dominant technique for solving antibody structures

Last year, the Structural Antibody Database (SAbDab) listed a record-breaking 894 new antibody structures, driven in no small part by the continued efforts of the researchers to understand SARS-CoV-2.

Fig. 1: The aggregate growth in antibody structure data (all methods) over time. Taken from http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/stats/ on 25th May 2022.

In this blog post I wanted to highlight the major driving force behind this curve – the huge increase in cryo electron microscopy (cryoEM) data – and the implications of this for the field of structure-based antibody informatics.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Tag Archives: PDB

Exploring the Protein Data Bank programmatically

Retrieving AlphaFold models from AlphaFoldDB

CryoEM is now the dominant technique for solving antibody structures