{"id":12287,"date":"2025-02-12T00:50:21","date_gmt":"2025-02-12T00:50:21","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=12287"},"modified":"2025-02-12T00:50:23","modified_gmt":"2025-02-12T00:50:23","slug":"narrowing-the-gap-between-machine-learning-scoring-functions-and-free-energy-perturbation-using-augmented-data","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2025\/02\/narrowing-the-gap-between-machine-learning-scoring-functions-and-free-energy-perturbation-using-augmented-data\/","title":{"rendered":"Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data"},"content":{"rendered":"\n<p>I&#8217;m delighted to report that our collaboration (<a href=\"https:\/\/www.linkedin.com\/in\/isak-valsson\/\">\u00cdsak Valsson<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/matthewtwarren\/\">Matthew Warren<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/aniketmagarkar\/\">Aniket Magarkar<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/phil-biggin-b7957419\/\">Phil Biggin<\/a>, &amp; <a href=\"https:\/\/www.linkedin.com\/in\/charlotte-deane-27918614\/\">Charlotte Deane<\/a>) on &#8220;Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data&#8221; has been published in the Nature Portfolio journal <em>Communications Chemistry<\/em> (<a href=\"https:\/\/doi.org\/10.1038\/s42004-025-01428-y\">https:\/\/doi.org\/10.1038\/s42004-025-01428-y<\/a>).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a 
href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/02\/IMG_4982.webp?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"362\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/02\/IMG_4982.webp?resize=625%2C362&#038;ssl=1\" alt=\"\" class=\"wp-image-12288\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/02\/IMG_4982.webp?w=685&amp;ssl=1 685w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/02\/IMG_4982.webp?resize=300%2C174&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/02\/IMG_4982.webp?resize=624%2C362&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p><br>During his MSc dissertation project in the <a href=\"https:\/\/www.linkedin.com\/company\/oxford-statistics\/\">Department of Statistics, University of Oxford<\/a>, OPIG member <a href=\"https:\/\/www.linkedin.com\/in\/isak-valsson\/\">\u00cdsak Valsson<\/a> developed &#8220;AEV-PLIG&#8221;, an attention-based graph neural network (GNN) that predicts protein-ligand binding affinity. It featurizes a ligand&#8217;s atoms using Atomic Environment Vectors to describe the Protein-Ligand Interactions found in a 3D protein-ligand complex. 
AEV-PLIG is free and open source (BSD 3-Clause), available from GitHub at <a href=\"https:\/\/github.com\/oxpig\/AEV-PLIG\">https:\/\/github.com\/oxpig\/AEV-PLIG<\/a>, and forked at <a href=\"https:\/\/github.com\/bigginlab\/AEV-PLIG\">https:\/\/github.com\/bigginlab\/AEV-PLIG<\/a>.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>\u00cdsak also developed a much more challenging protein-ligand binding affinity prediction benchmark than CASF-2016, called &#8220;Out-Of-Distribution Test&#8221; (OOD Test), which is also available (<a href=\"https:\/\/github.com\/isakvals\/OOD-Test\">https:\/\/github.com\/isakvals\/OOD-Test<\/a>). It is designed to assess how well a method generalizes to ligands and proteins more dissimilar than those seen in its training set. AEV-PLIG performed best in terms of Pearson correlation coefficient on OOD Test, and on another tough benchmark also developed in OPIG called &#8220;0-Ligand Bias&#8221; (<a href=\"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf040\">https:\/\/doi.org\/10.1093\/bioinformatics\/btaf040<\/a>). AEV-PLIG proved to be more accurate than RF-score, Pafnucy, OnionNet-2, PointVS, SIGN, and AEScore (Table 1).<br><br><a href=\"https:\/\/www.linkedin.com\/in\/matthewtwarren\/\">Matthew Warren<\/a> showed that augmenting our training data (PDBbind v2020) with semi-synthetic data (BindingNet) boosted the performance of AEScore (<a href=\"https:\/\/github.com\/RMeli\/aescore\">https:\/\/github.com\/RMeli\/aescore<\/a>) and we confirmed this with AEV-PLIG.<br><br>Together, we found that with even more data (BindingDB), our augmented AEV-PLIG model&#8217;s prediction accuracy starts to approach that of Free Energy Perturbation (FEP+) for congeneric series of ligands that bind the same protein (see Figures 3 &amp; 4 in our paper), yet it is ~400,000 times faster, while using a single GPU instead of several. 
We also showed that AEV-PLIG performed better on our FEP Benchmark than the other ML-based scoring functions we looked at (Table 1).<br><br>Another take-home: the performance of AEV-PLIG steadily improved as we increased the fraction of augmented training data, with no sign of leveling off (Figure S5). Whether this putative &#8216;scaling law&#8217; applies to protein-ligand binding affinity prediction remains to be seen, but the notion of a simpler model architecture with more physically-relevant features combined with more data shows great promise&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m delighted to report our collaboration (\u00cdsak Valsson, Matthew Warren, Aniket Magarkar, Phil Biggin, &amp; Charlotte Deane), on &#8220;Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data&#8221;, has been published in Nature&#8217;s Communications Chemistry (https:\/\/doi.org\/10.1038\/s42004-025-01428-y). During his MSc dissertation project in the Department of Statistics, University of Oxford, OPIG 
[&hellip;]<\/p>\n","protected":false},"author":35,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[633,187,29,361,189,274,291,202,227,201],"tags":[837,835,832,455,836,831,288,834],"ppma_author":[488],"class_list":["post-12287","post","type-post","status-publish","format-standard","hentry","category-ai","category-cheminformatics","category-code","category-data-science","category-machine-learning","category-molecular-recognition","category-protein-ligand-docking","category-proteins","category-python-code","category-small-molecules","tag-acsfs","tag-augmented-data","tag-fep","tag-gnns","tag-physicochemical-deep-learning","tag-protein-ligand-binding","tag-scoring-functions","tag-synthetic-data"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":488,"user_id":35,"is_guest":0,"slug":"garrett","display_name":"Garrett","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/df625261419c37dd5c5937e37f17a732626acd6eea1e6fabd03d935c25b453bf?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=12287"}],"version-history":[{"count":1,"href":"https:\/\/www.blopig.co
m\/blog\/wp-json\/wp\/v2\/posts\/12287\/revisions"}],"predecessor-version":[{"id":12289,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12287\/revisions\/12289"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=12287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=12287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=12287"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=12287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}