{"id":3327,"date":"2017-02-08T10:36:58","date_gmt":"2017-02-08T10:36:58","guid":{"rendered":"http:\/\/www.blopig.com\/blog\/?p=3327"},"modified":"2017-02-08T10:36:58","modified_gmt":"2017-02-08T10:36:58","slug":"using-rdkit-to-load-ligand-sdfs-into-pandas-dataframes","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2017\/02\/using-rdkit-to-load-ligand-sdfs-into-pandas-dataframes\/","title":{"rendered":"Using RDKit to load ligand SDFs into Pandas DataFrames"},"content":{"rendered":"<p>If you have downloaded lots of ligand SDF files from the PDB, then a good way of viewing\/comparing all their properties is to load them into a Pandas DataFrame.<\/p>\n<p>RDKit has a very handy function just for this &#8211; it\u2019s found under the <a href=\"http:\/\/www.rdkit.org\/Python_Docs\/rdkit.Chem.PandasTools-module.html\">PandasTools module<\/a>.<\/p>\n<p>I show an example below in a Jupyter notebook, in which I load the SDF file, view the table of molecules and apply other RDKit functions to the molecules.<\/p>\n<p>First import the PandasTools module:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"enlighter\" data-enlighter-linenumbers=\"false\">from rdkit.Chem import PandasTools\r\n<\/pre>\n<p>Read in the SDF file:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"enlighter\" data-enlighter-linenumbers=\"false\">SDFFile = \".\/Ligands_noHydrogens_noMissing_59_Instances.sdf\"\r\nBRDLigs = PandasTools.LoadSDF(SDFFile)\r\n<\/pre>\n<p>You can see the whole table by evaluating the DataFrame on its own in a notebook cell:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"enlighter\" data-enlighter-linenumbers=\"false\">BRDLigs<\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-3330 
aligncenter\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?resize=419%2C225&#038;ssl=1\" alt=\"\" width=\"419\" height=\"225\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?resize=300%2C161&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?resize=768%2C411&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?resize=1024%2C548&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?resize=624%2C334&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?w=1509&amp;ssl=1 1509w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss1-1.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 419px) 100vw, 419px\" \/><\/a><\/p>\n<p>The ligand properties in the SDF file are stored as columns. Calling info() on the DataFrame lists them; in my case I have loaded 59 ligands, each with up to 26 properties:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-linenumbers=\"false\">BRDLigs.info()<\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-3332 aligncenter\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?resize=401%2C207&#038;ssl=1\" alt=\"\" width=\"401\" height=\"207\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?resize=300%2C155&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?resize=768%2C397&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?resize=1024%2C530&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?resize=624%2C323&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?w=1489&amp;ssl=1 1489w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss2.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 401px) 100vw, 401px\" \/><\/a><\/p>\n<p>It is also very easy to apply other RDKit functions to the DataFrame. For instance, I noticed there is no heavy-atom count column, so I added my own called &#8216;NumHeavyAtoms&#8217;:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-linenumbers=\"false\">BRDLigs['NumHeavyAtoms'] = BRDLigs['ROMol'].apply(lambda mol: mol.GetNumHeavyAtoms())\r\n\r\n<\/pre>\n<p>Here is the column added to the table, alongside the columns containing each molecule&#8217;s SMILES and RDKit molecule object:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-linenumbers=\"false\">BRDLigs[['NumHeavyAtoms','SMILES','ROMol']]\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-3335 aligncenter\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?resize=389%2C136&#038;ssl=1\" alt=\"\" width=\"389\" height=\"136\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?resize=300%2C105&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?resize=768%2C270&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?resize=1024%2C360&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?resize=624%2C219&amp;ssl=1 624w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?w=1480&amp;ssl=1 1480w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/02\/ss3.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 389px) 100vw, 389px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you have downloaded lots of ligand SDF files from the PDB, then a good way of viewing\/comparing all their properties is to load them into a Pandas DataFrame. RDKit has a very handy function just for this &#8211; it\u2019s found under the PandasTools module. I show an example below in a Jupyter notebook, in which [&hellip;]<\/p>\n","protected":false},"author":38,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[29,10,14],"tags":[154,152,129,134],"ppma_author":[530],"class_list":["post-3327","post","type-post","status-publish","format-standard","hentry","category-code","category-groupmeetings","category-howto","tag-jupyter","tag-python","tag-rdkit","tag-small-molecules"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":530,"user_id":38,"is_guest":0,"slug":"susan","display_name":"Susan 
Leung","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/e1a367a2f8d7409be8aa6d2beff2d277525c90331ec34202854a9a5116b4eaa4?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/3327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=3327"}],"version-history":[{"count":6,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/3327\/revisions"}],"predecessor-version":[{"id":3337,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/3327\/revisions\/3337"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=3327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=3327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=3327"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=3327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}