{"id":9900,"date":"2023-05-19T18:37:17","date_gmt":"2023-05-19T17:37:17","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=9900"},"modified":"2023-05-19T18:37:19","modified_gmt":"2023-05-19T17:37:19","slug":"checking-your-pdb-file-for-clashing-atoms","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2023\/05\/checking-your-pdb-file-for-clashing-atoms\/","title":{"rendered":"Checking your PDB file for clashing atoms"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Detecting atom clashes in protein structures can be useful in a number of scenarios. For example if you are just about to start some molecular dynamics simulation, or if you want to check that a structure generated by a deep learning model is reasonable. It is quite straightforward to code, but I get the feeling that these sort of functions have been written from scratch hundreds of times. So to save you the effort, here is my implementation!!! <\/p>\n\n\n\n<!--more-->\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from Bio import PDB\nimport numpy as np\n\nparser = PDB.PDBParser(QUIET=True)\ndownloader = PDB.PDBList()\n\n# Atomic radii for various atom types. \n# You can comment out the ones you don't care about or add new ones\natom_radii = {\n#    \"H\": 1.20,  # Who cares about hydrogen??\n    \"C\": 1.70, \n    \"N\": 1.55, \n    \"O\": 1.52,\n    \"S\": 1.80,\n    \"F\": 1.47, \n    \"P\": 1.80, \n    \"CL\": 1.75, \n    \"MG\": 1.73,\n}\n\ndef count_clashes(structure, clash_cutoff=0.63):\n    # Set what we count as a clash for each pair of atoms\n    clash_cutoffs = {i + \"_\" + j: (clash_cutoff * (atom_radii[i] + atom_radii[j])) for i in atom_radii for j in atom_radii}\n\n    # Extract atoms for which we have a radii\n    atoms = [x for x in structure.get_atoms() if x.element in atom_radii]\n    coords = np.array([a.coord for a in atoms], dtype=\"d\")\n\n    # Build a KDTree (speedy!!!)\n    kdt = PDB.kdtrees.KDTree(coords)\n\n    # Initialize a list to hold clashes\n    clashes = []\n\n    # Iterate through all atoms\n    for atom_1 in atoms:\n        # Find atoms that could be clashing\n        kdt_search = kdt.search(np.array(atom_1.coord, dtype=\"d\"), max(clash_cutoffs.values()))\n\n        # Get index and distance of potential clashes\n        potential_clash = [(a.index, a.radius) for a in kdt_search]\n\n        for ix, atom_distance in potential_clash:\n            atom_2 = atoms[ix]\n\n            # Exclude clashes from atoms in the same residue\n            if atom_1.parent.id == atom_2.parent.id:\n                continue\n\n            # Exclude clashes from peptide bonds\n            elif (atom_2.name == \"C\" and atom_1.name == \"N\") or (atom_2.name == \"N\" and atom_1.name == \"C\"):\n                continue\n\n            # Exclude clashes from disulphide bridges\n            elif (atom_2.name == \"SG\" and atom_1.name == \"SG\") and atom_distance &gt; 1.88:\n                continue\n\n            if atom_distance &lt; clash_cutoffs[atom_2.element + \"_\" + atom_1.element]:\n                clashes.append((atom_1, atom_2))\n\n    return len(clashes) \/\/ 2<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">ChatGPT explains my code<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I gave ChatGPT my code and asked it to explain. I think it does a good enough job!!!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">ChatGPT explanation:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>count_clashes<\/code> function analyses a PDB structure and determines the number of atom clashes present. It takes the structure and an optional clash cutoff value as input. The clash cutoff values are established for each pair of atom types based on their van der Waals radii. An atom pair is considered clashing if their distance is within a specified value times the sum of their radii.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To begin, the function extracts the coordinates of atoms with defined radii, filtering them based on their element type. A KDTree, a spatial searching data structure, is then constructed using the coordinates of the filtered atoms. This enables efficient searching.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next, the function iterates through each atom in the filtered atom list. For each atom, a KDTree search is performed to identify potential clashes with other atoms. The function applies several conditions to exclude false positives: clashes within the same residue, clashes involving peptide bonds (C-N or N-C), and clashes between atoms participating in disulphide bridges (SG-SG) beyond a specific distance threshold.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If the distance between two atoms falls below the clash cutoff value for their respective atom types, they are considered a clash and added to the clashes list. Finally, the function returns the count of clashes divided by 2. This adjustment is necessary because each clash is detected twice (once for atom A with atom B and vice versa), providing an accurate count of clashes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Feel free to customise the function according to your specific requirements, such as calculating contacts between protein chains or incorporating additional criteria to identify specific types of interactions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Usage<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s consider an example where we have downloaded a PDB file with the ID &#8220;1GIG&#8221; using the PDB downloader. We want to analyse this structure and check for atom clashes using the <code>count_clashes<\/code> function. Here&#8217;s an example usage:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">file = downloader.retrieve_pdb_file(\"1gig\", pdir=\".\", file_format=\"pdb\")\nstructure = parser.get_structure(file, file)\nnum_clashes = count_clashes(structure, clash_cutoff=0.63)\nprint(\"Number of atom clashes:\", num_clashes)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">When running the code, it will display the number of atom clashes detected in the structure:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Number of atom clashes: 5<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So there you go!! A very well explained function to count the number of clashing atoms in a PDB file. Many thanks to my bestie ChatGPT for helping me out!!! If you have read this far I hope this was not a complete waste of your time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Detecting atom clashes in protein structures can be useful in a number of scenarios. For example if you are just about to start some molecular dynamics simulation, or if you want to check that a structure generated by a deep learning model is reasonable. It is quite straightforward to code, but I get the feeling [&hellip;]<\/p>\n","protected":false},"author":71,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"ngg_post_thumbnail":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[29,228,227],"tags":[24,724,5,152],"ppma_author":[552],"class_list":["post-9900","post","type-post","status-publish","format-standard","hentry","category-code","category-protein-structure","category-python-code","tag-bioinformatics","tag-chatgpt","tag-protein-structure","tag-python"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":552,"user_id":71,"is_guest":0,"slug":"brennan","display_name":"Brennan Abanades Kenyon","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/5c85dcbb5b1499e82ecfc264ec387c8302ac238c786e68cc5c92e9c21904d260?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Abanades Kenyon","first_name":"Brennan","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9900","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/71"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=9900"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9900\/revisions"}],"predecessor-version":[{"id":9905,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9900\/revisions\/9905"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=9900"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=9900"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=9900"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9900"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}