{"id":10612,"date":"2023-11-14T16:22:25","date_gmt":"2023-11-14T16:22:25","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=10612"},"modified":"2023-11-14T16:27:18","modified_gmt":"2023-11-14T16:27:18","slug":"let-your-library-design-blosum","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2023\/11\/let-your-library-design-blosum\/","title":{"rendered":"Let your library design blosum"},"content":{"rendered":"\n<p>During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity.<\/p>\n\n\n\n<p>There are many ways we could go about suggesting these mutations including using Large Language Models e.g. <a href=\"https:\/\/github.com\/facebookresearch\/esm\">ESM<\/a> and <a href=\"https:\/\/github.com\/oxpig\/AbLang\">AbLang<\/a>, or Inverse Folding methods e.g. <a href=\"https:\/\/github.com\/dauparas\/ProteinMPNN\">ProteinMPNN<\/a> and <a href=\"https:\/\/opig.stats.ox.ac.uk\/data\/downloads\/AntiFold\/\">AntiFold<\/a>. However, some of our recent work (soon to be pre-printed) has shown that classical non-Machine Learning approaches, such as <a href=\"https:\/\/pypi.org\/project\/blosum\/\">BLOSUM<\/a>, could also be worth considering at this stage.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>BLOSUM matrices (BLOcks SUbstitution Matrices) simply describe how often each amino acid is substituted with all other amino acids when considering similar, aligned proteins. Common minimum sequence similarity thresholds used are 45%, 62%, and 80%, with each cut-off resulting in different final matrices. 
These matrices are most often displayed as 20 x 20 arrays of integers, where positive and negative values indicate likely and unlikely substitutions respectively.<\/p>\n\n\n\n<p>Though these matrices were generated from observations of all proteins, we can reverse engineer them with antibodies in mind, the goal being to obtain substitution likelihoods (that sum to one) that could be used to guide mutations. To obtain these BLOSUM likelihoods, it is useful to examine how BLOSUM matrices are calculated:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-18.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"426\" height=\"150\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-18.png?resize=426%2C150&#038;ssl=1\" alt=\"\" class=\"wp-image-10613\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-18.png?w=426&amp;ssl=1 426w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-18.png?resize=300%2C106&amp;ssl=1 300w\" sizes=\"auto, (max-width: 426px) 100vw, 426px\" \/><\/a><\/figure>\n\n\n\n<p>A full description of this formula can be found <a href=\"https:\/\/www.nature.com\/articles\/nbt0804-1035\">here<\/a> but in brief, <strong><code>a<\/code><\/strong> and <strong><code>b<\/code><\/strong> are two dummy amino acids, <strong><code>s(a,b)<\/code><\/strong> are the integer BLOSUM scores, <strong><code>f_a,b<\/code><\/strong> are the background frequencies with which <strong><code>a<\/code><\/strong> and <strong><code>b<\/code><\/strong> occur, <strong><code>lambda<\/code><\/strong> is a scaling factor, and <code><strong>p_ab<\/strong><\/code> are the probabilities we wish to obtain &#8211; how often <strong><code>a<\/code><\/strong> is substituted with <strong><code>b<\/code><\/strong>, and vice versa.<\/p>\n\n\n\n<p>When considering 
mutations to an antibody&#8217;s CDR loops, BLOSUM-45 is a good matrix to choose as our starting point due to the highly variable nature of these loops. If you are interested in mutating the framework region of an antibody, it may be worth considering using BLOSUM-62 or BLOSUM-80 matrices instead.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-19.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"389\" height=\"410\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-19.png?resize=389%2C410&#038;ssl=1\" alt=\"\" class=\"wp-image-10614\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-19.png?w=389&amp;ssl=1 389w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-19.png?resize=285%2C300&amp;ssl=1 285w\" sizes=\"auto, (max-width: 389px) 100vw, 389px\" \/><\/a><\/figure>\n\n\n\n<p>We can obtain antibody-specific amino acid background frequencies, <strong><code>f_a,b<\/code><\/strong>, by using an antibody database, such as SAbDab or OAS e.g.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-20.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"390\" height=\"218\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-20.png?resize=390%2C218&#038;ssl=1\" alt=\"\" class=\"wp-image-10615\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-20.png?w=390&amp;ssl=1 390w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-20.png?resize=300%2C168&amp;ssl=1 300w\" sizes=\"auto, (max-width: 390px) 100vw, 390px\" \/><\/a><\/figure>\n\n\n\n<p>Finally, we can combine the above, tweaking our value of lambda if we wish, to 
obtain substitution likelihoods, <code><strong>p_ab<\/strong><\/code>, for an example CDRH3, such as Trastuzumab&#8217;s &#8211; WGGDGFYAMD.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"177\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?resize=625%2C177&#038;ssl=1\" alt=\"\" class=\"wp-image-10616\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?w=799&amp;ssl=1 799w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?resize=300%2C85&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?resize=768%2C217&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-21.png?resize=624%2C177&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p>All the code to generate the above plots and design your own libraries can be found in the following <a href=\"https:\/\/colab.research.google.com\/drive\/1Ef8v_QP7koa3ftfOpI5r9bgLFRXZJ8ZC?usp=sharing\">Colab Notebook<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity. There are many ways we could go about suggesting these mutations including using Large Language Models e.g. 
ESM and AbLang, or Inverse Folding methods [&hellip;]<\/p>\n","protected":false},"author":91,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[466,14,227],"tags":[741,152],"ppma_author":[558],"class_list":["post-10612","post","type-post","status-publish","format-standard","hentry","category-antibodies","category-howto","category-python-code","tag-blosum","tag-python"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":558,"user_id":91,"is_guest":0,"slug":"lewis","display_name":"Lewis Chinery","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/29bddf38b6dd9db3c7161683ddd1a5fc6bad04b6f40e83334e5d43972380ed54?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10612","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/91"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=10612"}],"version-history":[{"count":3,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10612\/revisions"}],"predecessor-version":[{"id":10620,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10612\/revisions\/10620"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=10612"}],"wp:term":[{"taxonomy":"category","embeddable":true
,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=10612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=10612"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=10612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}