{"id":7995,"date":"2022-04-20T15:45:27","date_gmt":"2022-04-20T14:45:27","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=7995"},"modified":"2023-03-03T14:44:24","modified_gmt":"2023-03-03T14:44:24","slug":"how-to-prepare-a-molecule-for-rdkit","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2022\/04\/how-to-prepare-a-molecule-for-rdkit\/","title":{"rendered":"How to prepare a molecule for RDKit"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBbind refined set. PyMOL has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my DPhil. For reasons I have never bothered to explore, using PyMOL to convert from SDF to mol2 and back to SDF again (adding in missing hydrogens along the way) will almost always make a molecule safe to import using RDKit:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from pathlib import Path\nfrom pymol import cmd\n\ndef py_mollify(sdf, overwrite=False):\n    \"\"\"Use PyMOL to sanitise an SDF file for use in RDKit.\n\n    Arguments:\n        sdf: location of faulty sdf file\n        overwrite: whether or not to overwrite the original sdf. 
If False,\n            a new file will be written in the form &lt;sdf_fname&gt;_pymol.sdf\n            \n    Returns:\n        Filename of the sanitised output: the original sdf filename if\n        overwrite == True, else &lt;sdf_fname&gt;_pymol.sdf.\n    \"\"\"\n    sdf = Path(sdf).expanduser().resolve()\n    # Swap only the file extension, leaving the rest of the path untouched\n    mol2_fname = str(sdf.with_name(sdf.stem + '_pymol.mol2'))\n    new_sdf_fname = str(sdf) if overwrite else str(sdf.with_name(sdf.stem + '_pymol.sdf'))\n    # Start from an empty session so that only this molecule is saved\n    cmd.reinitialize()\n    cmd.load(str(sdf))\n    cmd.h_add('all')\n    cmd.save(mol2_fname)\n    cmd.reinitialize()\n    cmd.load(mol2_fname)\n    cmd.save(new_sdf_fname)\n    return new_sdf_fname<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBbind refined set. PyMOL has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my [&hellip;]<\/p>\n","protected":false},"author":61,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[187,29,296,14,291,221,227,201,15],"tags":[129],"ppma_author":[541],"class_list":["post-7995","post","type-post","status-publish","format-standard","hentry","category-cheminformatics","category-code","category-hints-and-tips","category-howto","category-protein-ligand-docking","category-python","category-python-code","category-small-molecules","category-technical","tag-rdkit"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":541,"user_id":61,"is_guest":0,"slug":"jack","display_name":"Jack 
Scantlebury","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/2d30962dbe9d08db0ac110abbf9ffd5bd52f4eb7da79636d286fb584280feb2c?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Scantlebury","first_name":"Jack","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/61"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=7995"}],"version-history":[{"count":2,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7995\/revisions"}],"predecessor-version":[{"id":7997,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/7995\/revisions\/7997"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=7995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=7995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=7995"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=7995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}