Henriette L. Capel, Odysseas Vavourakis, Benjamin H. Williams, Christopher R. Taylor, and Charlotte M. Deane
The Structural Antibody Database
The Structural Antibody Database (SAbDab) [1] is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for single-domain antibodies was added in 2021 with SAbDab-nano [2]. Detailed annotations and consistent maintenance have made SAbDab a central resource supporting important advances in the field. SAbDab has been used to study antibody–antigen interactions, including those involving SARS-CoV-2; to predict antibody structure; to design antibodies de novo; and to investigate antibody flexibility.
SAbDab needs to evolve with experimental and computational advances in the field
Experimental advances in solving protein structures and the growing success of antibodies as therapeutics have expanded the known antibody structural space. These developments have also driven growing interest in alternative antibody formats and constructs, such as multi-specific antibodies and antibody fragments. In parallel, emerging computational technologies have led to substantial advances in protein structure and complex prediction. The success of such models hinges on high-quality data, carefully partitioned into train and test sets to avoid data leakage. Fair and meaningful model comparison is predicated on these data splits being standardised.
SAbDab2 provides easily accessible data for machine learning applications
SAbDab2 is a comprehensive restructuring of SAbDab designed to systematically annotate structures of a wide array of antibody formats for use in machine-learning (ML) applications. As part of this overhaul we introduce SAbDab2 IDs, which uniquely identify single- and paired-chain variable regions by their IMGT numbering. These IDs group structures with identical variable-domain sequences across different PDB IDs, bound states, formats, and constructs. This grouping enables direct comparison of apo and holo conformations, facilitates epitope analysis, supports investigation of antibody flexibility, and simplifies redundancy filtering. SAbDab2 contains 21,237 distinct antibody instances (unique structures), derived from 11,085 PDB IDs, and corresponding to 6,540 SAbDab2 IDs (unique variable regions).
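The grouping idea can be illustrated with a minimal sketch. It is not the SAbDab2 implementation: the record fields, the toy sequences and identifiers, and the hash-based key are all assumptions made for illustration, and the key is computed from raw sequence strings rather than from IMGT-numbered variable regions.

```python
import hashlib
from collections import defaultdict


def variable_region_key(heavy, light=None):
    """Hypothetical stand-in for a SAbDab2 ID: a short hash of the variable-domain sequence(s)."""
    joined = f"{heavy or ''}|{light or ''}"
    return hashlib.sha1(joined.encode()).hexdigest()[:10]


def group_by_variable_region(records):
    """Collect structures that share identical variable-domain sequences under one key."""
    groups = defaultdict(list)
    for rec in records:
        groups[variable_region_key(rec["heavy"], rec.get("light"))].append(rec)
    return dict(groups)


def apo_holo_pairs(groups):
    """Keep only the groups observed in both the apo and the holo state."""
    return {
        key: members
        for key, members in groups.items()
        if {"apo", "holo"} <= {rec["state"] for rec in members}
    }


# Toy records; the identifiers and sequences are invented, not real entries.
records = [
    {"pdb_id": "toy1", "heavy": "EVQLVESGG", "light": "DIQMTQSP", "state": "apo"},
    {"pdb_id": "toy2", "heavy": "EVQLVESGG", "light": "DIQMTQSP", "state": "holo"},
    {"pdb_id": "toy3", "heavy": "QVQLQESGP", "light": None, "state": "holo"},
]

groups = group_by_variable_region(records)
paired = apo_holo_pairs(groups)
```

Keeping one representative per key would give a simple form of redundancy filtering, while the groups with both states are the candidates for apo/holo conformational comparison.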

ML-grade data with versioned splits facilitates realistic ML modelling and comparison
In addition to collecting and annotating all publicly available antibody structures, we curate and clean a high-quality subset of SAbDab2 to create an ML-ready dataset. At launch, it includes 15,641 variable-region structures (1,739 apo; 13,902 holo, 1,245 of which involve multi-polymer antigens), corresponding to 5,301 unique SAbDab2 IDs (462 available in both holo and apo states; 4,158 holo only; 219 apo only; 388 with multi-polymer antigens).
To facilitate comparisons between ML models, we are also releasing standardised, versioned, backward-compatible train/test splits of this dataset, which mitigate the data leakage concerns affecting the date-based splits currently prevalent in the literature. Two distinct train/test splits are available. The first, based on antibody sequence similarity alone (“ab-split”), is suitable for antigen-agnostic settings. For applications involving antibody–antigen complexes where antigen-based leakage is a concern, a second split accounts for both antibody and antigen sequence similarities between instances (“ab-ag-split”).
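To make the difference between the two kinds of split concrete, here is a minimal sketch of one common way to build similarity-aware splits: link instances that are too similar, cluster the links, and assign whole clusters to train or test. It does not reproduce the procedure used for the released splits. The similarity measure (`difflib` ratio instead of a proper sequence alignment), the 0.9 threshold, the test fraction, and the toy sequences are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.9  # assumed similarity cut-off, chosen only for this illustration


def similarity(a, b):
    """Crude sequence similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def ab_linked(x, y):
    """Link two instances if their antibody sequences are similar."""
    return similarity(x["ab"], y["ab"]) >= THRESHOLD


def ab_ag_linked(x, y):
    """Link two instances if either their antibodies or their antigens are similar."""
    return ab_linked(x, y) or similarity(x["ag"], y["ag"]) >= THRESHOLD


def cluster(items, is_linked):
    """Single-linkage clustering with union-find; returns lists of item indices."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if is_linked(items[i], items[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def split(items, is_linked, test_fraction=0.2):
    """Assign whole clusters to test (smallest first) until the test budget is used up."""
    train, test = [], []
    budget = test_fraction * len(items)
    for members in sorted(cluster(items, is_linked), key=len):
        (test if len(test) + len(members) <= budget else train).extend(members)
    return sorted(train), sorted(test)


# Toy instances with invented sequences: items 0 and 1 have near-identical
# antibodies, and items 0 and 2 share the same antigen.
items = [
    {"ab": "EVQLVESGGGLVQ", "ag": "MKTAYIAKQR"},
    {"ab": "EVQLVESGGGLVK", "ag": "GSHMLEDPVA"},
    {"ab": "QSVLTQPPSASGT", "ag": "MKTAYIAKQR"},
    {"ab": "DIQMTQSPSSLSA", "ag": "PPLWNNCCRT"},
    {"ab": "AAAAWWWWHHHHC", "ag": "YYYYFFFFDDDD"},
]

ab_train, ab_test = split(items, ab_linked)
abag_train, abag_test = split(items, ab_ag_linked)
```

In this toy case the antibody-only split places item 2 in the test set even though its antigen also appears in the training set (item 0), which is the kind of antigen-based leakage the second split is meant to avoid; the combined criterion keeps items 0, 1 and 2 on the same side.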
SAbDab2 is easily accessible
SAbDab2 is available at https://sabdab2.opig.stats.ox.ac.uk and can be searched by PDB ID and SAbDab2 ID, by structure and experimental metadata, by CDR sequence, and by sequence similarity. Structural data and summary tables are available for download.
The standardised and versioned dataset splits are available for download at https://zenodo.org/records/20083995.
Our preprint will be online soon.
References
- James Dunbar, Konrad Krawczyk, Jinwoo Leem, Terry Baker, Angelika Fuchs, Guy Georges, Jiye Shi, and Charlotte M Deane. SAbDab: the structural antibody database. Nucleic Acids Research, 42(D1):D1140–D1146, 2014.
- Constantin Schneider, Matthew IJ Raybould, and Charlotte M Deane. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Research, 50(D1):D1368–D1372, 2022.
