{"id":7528,"date":"2021-10-26T12:24:02","date_gmt":"2021-10-26T11:24:02","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=7528"},"modified":"2021-10-26T15:59:45","modified_gmt":"2021-10-26T14:59:45","slug":"getting-the-pdb-structures-of-compounds-in-chembl","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2021\/10\/getting-the-pdb-structures-of-compounds-in-chembl\/","title":{"rendered":"Getting the PDB structures of compounds in ChEMBL"},"content":{"rendered":"\n<p>Recently I was dealing with a set of compounds with known target activities from the ChEMBL database, and I wanted to find out which of them also had PDB crystal structures in complex with that target.<\/p>\n\n\n\n<p>Cross-referencing this manually is easy when we are interested in only 2-3 compounds, but for any larger number, using the ChEMBL and PDB web services saves a lot of clicking.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>If we have the ChEMBL ID of a compound, the UniChem database (https:\/\/www.ebi.ac.uk\/unichem\/) can be used to cross-reference it with the PDB. A guide to the UniChem web services can be found <a href=\"https:\/\/www.ebi.ac.uk\/unichem\/info\/webservices\">here<\/a>.<\/p>\n\n\n\n<p>Let\u2019s say we are interested in finding structures for Staurosporine, a notoriously promiscuous kinase inhibitor. Staurosporine\u2019s ChEMBL ID is CHEMBL388978.<\/p>\n\n\n\n<p>The UniChem API query to go from a ChEMBL ID to a PDB ligand ID then looks like this:<\/p>\n\n\n\n<p>https:\/\/www.ebi.ac.uk\/unichem\/rest\/src_compound_id\/CHEMBL388978\/1\/3<\/p>\n\n\n\n<p>Here, the \u201c1\u201d after the compound ID is UniChem\u2019s source ID for ChEMBL, telling the service that the input is a ChEMBL ID, and the \u201c3\u201d is the source ID for the PDB, requesting PDB ligand IDs as output. 
Using the Python Requests library lets us integrate such API calls with the rest of our analysis.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import requests\nimport json\n\ndef chembl_comp_to_pdb(chembl_compound_id):\n    \"\"\"Return the PDB ligand ID for a ChEMBL compound ID, or None if there is no mapping.\"\"\"\n    # UniChem source IDs: 1 = ChEMBL (input), 3 = PDB (output)\n    chembl_q = f\"https:\/\/www.ebi.ac.uk\/unichem\/rest\/src_compound_id\/{chembl_compound_id}\/1\/3\"\n    chembl_res = requests.get(chembl_q, timeout=30)\n    chembl_res.raise_for_status()\n    n = chembl_res.json()\n    # A successful lookup returns a non-empty list of {\"src_compound_id\": ...} dicts;\n    # treat any other response as \"no mapping found\"\n    if not isinstance(n, list) or len(n) == 0:\n        return None\n    return n[0][\"src_compound_id\"]<\/pre>\n\n\n\n<p>Supplying Staurosporine\u2019s ChEMBL ID to the above function reveals that its PDB ligand ID is \u2018STU\u2019.<\/p>\n\n\n\n<p>We can then use the PDB\u2019s web services to retrieve the structures in which this ligand is present. Specifically, I used the PDB\u2019s Search API, documentation and examples for which can be found <a href=\"https:\/\/search.rcsb.org\/#search-api\" data-type=\"URL\" data-id=\"https:\/\/search.rcsb.org\/#search-api\">here<\/a>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def get_pdb_entries_with_pdb_comp(pdb_comp_id):\n    \"\"\"Return the IDs of PDB entries containing the given ligand (PDB chemical component ID).\"\"\"\n    pdb_q = {\n        \"query\": {\n            \"type\": \"terminal\",\n            \"service\": \"text\",\n            \"parameters\": {\n                \"attribute\": \"rcsb_nonpolymer_instance_feature_summary.comp_id\",\n                \"operator\": \"exact_match\",\n                \"value\": pdb_comp_id,\n            },\n        },\n        # Return every hit instead of only the first page of 10\n        \"request_options\": {\"return_all_hits\": True},\n        \"return_type\": \"entry\",\n    }\n    # Passing the query through params lets requests URL-encode the JSON\n    pdb_res = requests.get(\"https:\/\/search.rcsb.org\/rcsbsearch\/v1\/query\",\n                           params={\"json\": json.dumps(pdb_q)}, timeout=60)\n    pdb_res.raise_for_status()\n\n    # An empty response body means that no entries matched\n    if not pdb_res.text:\n        return []\n    resp = pdb_res.json()\n    # Keep only the hits with the maximum relevance score of 1\n    return [hit[\"identifier\"] for hit in resp[\"result_set\"] if hit[\"score\"] == 1]<\/pre>\n\n\n\n<p>The service type is \u201ctext\u201d, as we are making a text query using the compound\u2019s ID. The attribute name is somewhat trickier to find, but all attributes are listed <a href=\"https:\/\/search.rcsb.org\/structure-search-attributes.html\">here<\/a> (for structure-related queries) and <a href=\"https:\/\/search.rcsb.org\/chemical-search-attributes.html\">here<\/a> (for chemical queries). On the structural side, ligands are classified as \u2018nonpolymer instance features\u2019, and comp_id is the attribute we want to query by (\u2018STU\u2019 for Staurosporine).<\/p>\n\n\n\n<p>The request options part of the query deals with pagination of the results. By default, only the first 10 results are returned, but all results can be returned by setting the \u2018return_all_hits\u2019 flag to True, as described here: <a href=\"https:\/\/search.rcsb.org\/#pagination\">https:\/\/search.rcsb.org\/#pagination<\/a>.<\/p>\n\n\n\n<p>The \u2018score\u2019 check in the return statement uses the Search API\u2019s relevance score, which goes up to 1.0 (most relevant).<\/p>\n\n\n\n<p>At the time of writing, the above function returned a whopping 86 PDB entries for Staurosporine, which is not surprising given its promiscuity across the kinase family. If we are interested only in complexes with a particular target protein, say the kinase CDK2, we can use the target\u2019s UniProt ID to filter the results. 
For CDK2, the UniProt identifier is &#8220;P24941&#8221;.<\/p>\n\n\n\n<p>To query the PDB by UniProt identifier, I used the PDB\u2019s GraphQL-based API, the documentation for which can be found here: <a href=\"https:\/\/data.rcsb.org\/#gql-api\">https:\/\/data.rcsb.org\/#gql-api<\/a><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def match_uniprot_from_pdbids(pdb_ids, uniprot_id):\n    \"\"\"Return the entries in pdb_ids that contain a polymer entity mapped to uniprot_id.\"\"\"\n    if not pdb_ids:\n        return []\n    # A JSON list of strings is also a valid GraphQL list literal\n    pdb_str = json.dumps(list(pdb_ids))\n\n    pdb_q1 = \"\"\"{\n                  entries(entry_ids:\"\"\" + pdb_str + \"\"\"){\n                    rcsb_id\n                    polymer_entities {\n                      rcsb_id\n                      rcsb_polymer_entity_container_identifiers {\n                        reference_sequence_identifiers {\n                          database_accession\n                          database_name\n                        }\n                      }\n                    }\n                  }\n                }\"\"\"\n    # Passing the query through params lets requests URL-encode it\n    pdb_res = requests.get(\"https:\/\/data.rcsb.org\/graphql\", params={\"query\": pdb_q1}, timeout=60)\n    pdb_res.raise_for_status()\n    m = pdb_res.json()\n\n    struct_list = []\n    for entry in m[\"data\"][\"entries\"] or []:\n        if entry is None:  # skip IDs the API could not find\n            continue\n        for entity in entry[\"polymer_entities\"] or []:\n            # reference_sequence_identifiers can be null, so fall back to an empty list\n            refs = entity[\"rcsb_polymer_entity_container_identifiers\"][\"reference_sequence_identifiers\"] or []\n            if any(db[\"database_name\"] == \"UniProt\" and db[\"database_accession\"] == uniprot_id for db in refs):\n                struct_list.append(entry[\"rcsb_id\"])\n                break  # one matching polymer entity is enough for this entry\n\n    return struct_list<\/pre>\n\n\n\n<p>Running the above shows that there are 4 PDB entries featuring both human CDK2 and Staurosporine:<\/p>\n\n\n\n<p>1AQ1, 4ERW, 4EZ7, 7NVQ<\/p>\n\n\n\n<p>Note: what I have shown above are the solutions I arrived at after some Googling, so do let me know if there are better ways to do this!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I was dealing with a set of compounds with known target activities from the ChEMBL database, and I wanted to find out which of them also had PDB &nbsp;crystal structures in complex with that target. Referencing this manually is very easy for cases where we are interested in 2-3 compounds, but for any larger [&hellip;]<\/p>\n","protected":false},"author":59,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[187,29,361,341,296,14,227,201,15],"tags":[],"ppma_author":[538],"class_list":["post-7528","post","type-post","status-publish","format-standard","hentry","category-cheminformatics","category-code","category-data-science","category-databases","category-hints-and-tips","category-howto","category-python-code","category-small-molecules","category-technical"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":538,"user_id":59,"is_guest":0,"slug":"mihaela","display_name":"Mihaela 
Smilova","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/a422eaec6fce7de94e43875a04a32290ee9cbafab1a172db27b9e7cfa9aa5a97?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7528","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/59"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=7528"}],"version-history":[{"count":4,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7528\/revisions"}],"predecessor-version":[{"id":7532,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7528\/revisions\/7532"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=7528"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=7528"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=7528"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=7528"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}