{"id":14330,"date":"2026-06-15T15:06:22","date_gmt":"2026-06-15T14:06:22","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=14330"},"modified":"2026-06-15T15:06:23","modified_gmt":"2026-06-15T14:06:23","slug":"sabdab2-the-structural-antibody-database-in-the-age-of-machine-learning","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2026\/06\/sabdab2-the-structural-antibody-database-in-the-age-of-machine-learning\/","title":{"rendered":"SAbDab2: The structural antibody database in the age of machine learning\u00a0"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>Henriette L. Capel, Odysseas&nbsp;Vavourakis, Benjamin H. Williams, Christopher R. Taylor, and Charlotte M. Deane<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Structural Antibody Database<\/strong>&nbsp;<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The Structural Antibody Database (SAbDab) [1] is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for&nbsp;single-domain antibodies was added in 2021, with&nbsp;SAbDab-nano [2]. 
Detailed annotations and consistent maintenance have made&nbsp;SAbDab&nbsp;a central resource supporting important advances in the field.&nbsp;SAbDab&nbsp;has been used to study antibody\u2013antigen interactions, including those involving SARS-CoV-2; to predict antibody structure; to design antibodies&nbsp;<em>de novo<\/em>; and to investigate antibody flexibility.&nbsp;<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\"><strong>SAbDab&nbsp;needs to evolve with experimental and computational advances in the field<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Experimental advances in solving protein structures and the growing success of antibodies as therapeutics have expanded the known antibody structural space.&nbsp;These developments have also driven growing interest in alternative antibody formats and constructs, such as multi-specific antibodies and antibody fragments. In parallel, emerging computational technologies have led to substantial advances&nbsp;in protein structure and complex prediction. The success of such models hinges on high-quality data, carefully partitioned into train and test sets to avoid data leakage. 
Fair and meaningful model comparison is predicated on these data splits being standardised.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>SAbDab2&nbsp;provides easily accessible data for machine learning applications<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">SAbDab2 is&nbsp;a comprehensive restructuring of&nbsp;SAbDab&nbsp;designed to systematically annotate structures of a wide array of antibody formats for use in machine-learning (ML) applications.&nbsp;As part of this&nbsp;overhaul&nbsp;we introduce SAbDab2 IDs, which uniquely&nbsp;identify&nbsp;single- and paired-chain variable regions by their IMGT numbering.&nbsp;These group structures with identical variable-domain sequences together across different PDB IDs, bound states, formats, and constructs, enabling direct comparison of&nbsp;<em>apo&nbsp;<\/em>and&nbsp;<em>holo<\/em>&nbsp;conformations,&nbsp;facilitating&nbsp;epitope analysis, supporting investigation of antibody flexibility,&nbsp;and simplifying redundancy filtering.&nbsp;SAbDab2&nbsp;contains&nbsp;21,237 distinct antibody instances (unique structures), derived from 11,085 PDB&nbsp;IDs, and corresponding to 6,540 SAbDab2 IDs (unique variable regions).&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?ssl=1\"><img decoding=\"async\" width=\"1200\" height=\"700\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?fit=625%2C364&amp;ssl=1\" alt=\"\" class=\"wp-image-14367\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?w=1200&amp;ssl=1 1200w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?resize=300%2C175&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?resize=1024%2C597&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?resize=768%2C448&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/plot_numbers.png?resize=624%2C364&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>ML-grade data with versioned splits&nbsp;facilitates&nbsp;realistic ML modelling and comparison<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In addition to collecting and annotating all publicly available antibody structures, we curate and clean a high-quality subset of SAbDab2 to create an ML-ready dataset.&nbsp;At launch, it includes&nbsp;15,641 variable-region structures&nbsp;(1,739&nbsp;<em>apo<\/em>;&nbsp;13,902&nbsp;<em>holo<\/em>, 1,245 of which involve multi-polymer antigens), corresponding to 5,301 unique SAbDab2 IDs (462 available in both&nbsp;<em>holo&nbsp;<\/em>and&nbsp;<em>apo&nbsp;<\/em>states; 4,158&nbsp;<em>holo&nbsp;<\/em>only; 219&nbsp;<em>apo<\/em>&nbsp;only; 388 with multi-polymer antigens).&nbsp;<br>To&nbsp;facilitate&nbsp;comparisons between ML models, we are also releasing standardised, versioned, backward-compatible train\/test splits of this dataset, which mitigate the data-leakage concerns affecting the date-based splits currently prevalent in the literature. Two distinct train\/test splits are available. The first, based on antibody sequence similarity alone (\u201cab-split\u201d), is suitable for antigen-agnostic settings. 
For applications involving antibody\u2013antigen complexes where antigen-based leakage is a concern, a second split accounts for both antibody and antigen sequence similarities between instances (\u201cab-ag-split\u201d).&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>SAbDab2 is easily accessible<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">SAbDab2 is available at&nbsp;<a href=\"https:\/\/sabdab2.opig.stats.ox.ac.uk\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/sabdab2.opig.stats.ox.ac.uk<\/a>&nbsp;and&nbsp;can be searched by PDB ID and SAbDab2 ID, by structure and experimental&nbsp;metadata, by CDR&nbsp;sequence, and by sequence similarity.&nbsp;Structural data and summary tables are available for download.&nbsp;<br>The standardised and&nbsp;versioned&nbsp;dataset&nbsp;splits are available&nbsp;for download at <a href=\"https:\/\/zenodo.org\/records\/20083995\">https:\/\/zenodo.org\/records\/20083995<\/a>.<br>Our preprint&nbsp;will be online soon.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>References<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"\">James Dunbar, Konrad Krawczyk, Jinwoo Leem, Terry Baker, Angelika Fuchs, Guy&nbsp;Georges, Jiye Shi, and Charlotte M Deane.&nbsp;SAbDab: the structural antibody database. <em>Nucleic Acids Research<\/em>, 42(D1):D1140\u2013D1146, 2014.&nbsp;<\/li>\n\n\n\n<li class=\"\">Constantin Schneider, Matthew IJ Raybould, and Charlotte M Deane.&nbsp;SAbDab&nbsp;in the age of&nbsp;biotherapeutics: updates including&nbsp;SAbDab-nano, the nanobody structure tracker. <em>Nucleic Acids Research<\/em>, 50(D1):D1368\u2013D1372, 2022.&nbsp;<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Henriette L. Capel, Odysseas&nbsp;Vavourakis, Benjamin H. Williams, Christopher R. Taylor, and Charlotte M. 
Deane The Structural Antibody Database&nbsp; The Structural Antibody Database (SAbDab) [1] is a publicly available repository of experimentally determined antibody structures, first released in 2013. Explicit support for&nbsp;single-domain antibodies was added in 2021, with&nbsp;SAbDab-nano [2]. Detailed annotations and consistent maintenance have made&nbsp;SAbDab&nbsp;a [&hellip;]<\/p>\n","protected":false},"author":113,"featured_media":14374,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[466,341,186],"tags":[141,317],"ppma_author":[743,783,874,946,509],"class_list":["post-14330","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-antibodies","category-databases","category-immunoinformatics","tag-antibodies","tag-database"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2026\/06\/SAbDab2_banner_wlogo.png?fit=983%2C302&ssl=1","jetpack_sharing_enabled":true,"authors":[{"term_id":743,"user_id":113,"is_guest":0,"slug":"henriette","display_name":"Henriette Capel","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/f38066305d37e1ea97518e79529d1f53b58b4224d30a660f5f73ed8dcb320e4a?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Capel","first_name":"Henriette","job_title":"","description":""},{"term_id":783,"user_id":125,"is_guest":0,"slug":"ody","display_name":"Odysseas 
Vavourakis","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/b74030bdaef5f39ec32be3ae7bb5af054cbcb0b431b1cc51ba1b41d723ecee48?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Vavourakis","first_name":"Odysseas","job_title":"","description":""},{"term_id":874,"user_id":137,"is_guest":0,"slug":"ben","display_name":"Ben Williams","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/020ea273be8638c64bf77c36493144bb0116ead71fae7fa3c4f95093a9d81da9?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Williams","first_name":"Ben","job_title":"","description":""},{"term_id":946,"user_id":145,"is_guest":0,"slug":"chris","display_name":"Christopher Taylor","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/484343b30bd825ad34f8ae4725d08cf3e43ec3e4139fa95e484d686ecd2a36a5?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Taylor","first_name":"Christopher","job_title":"","description":""},{"term_id":509,"user_id":19,"is_guest":0,"slug":"charlotte","display_name":"Charlotte 
Deane","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/4fe6e4649bc6bfe230c05d03b3d50f6ac788e9d7d86bad33b8928f49dd6a1d7f?s=96&d=mm&r=g","author_category":"","user_url":"http:\/\/www.stats.ox.ac.uk\/~deane\/","last_name":"Deane","first_name":"Charlotte","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14330","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/113"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=14330"}],"version-history":[{"count":4,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14330\/revisions"}],"predecessor-version":[{"id":14378,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/14330\/revisions\/14378"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media\/14374"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=14330"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=14330"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=14330"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=14330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}