{"id":13102,"date":"2025-09-15T21:05:59","date_gmt":"2025-09-15T20:05:59","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=13102"},"modified":"2025-09-19T10:39:16","modified_gmt":"2025-09-19T09:39:16","slug":"extracting-3d-pharmacophore-points-with-rdkit","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2025\/09\/extracting-3d-pharmacophore-points-with-rdkit\/","title":{"rendered":"Extracting 3D Pharmacophore Points with RDKit"},"content":{"rendered":"\n<p class=\"has-small-font-size wp-block-paragraph\">Pharmacophores are simplified representations of the key interactions ligands make with proteins, such as hydrogen bonds, charge interactions, and aromatic contacts. Think of them as the essential &#8220;bumps and grooves&#8221; on a key that allow it to fit its lock (the protein). These maps can be derived from ligands or protein\u2013ligand complexes and are powerful tools for virtual screening and generative models. Here, we\u2019ll see how to extract 3D pharmacophore points from a ligand using RDKit.<br><em>(Code adapted from Dr. Ruben Sanchez.)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading has-medium-font-size\">Why pharmacophore \u201cpoints\u201d?<\/h2>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\">RDKit represents each pharmacophore feature (donor, acceptor, aromatic, etc.) as a <strong>point in 3D space<\/strong>, located at the feature center. These points capture the essential interaction motifs of a ligand without requiring the full atomic detail.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading has-medium-font-size\">Core idea: RDKit\u2019s FeatureFactory<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-small-font-size\">So, how does this work? RDKit uses a definition file (<code>BaseFeatures.fdef<\/code>) that contains SMARTS patterns to identify common pharmacophore features like donors, acceptors, and aromatic rings.<\/li>\n\n\n\n<li class=\"has-small-font-size\">The <code>ChemicalFeatures.BuildFeatureFactory<\/code> class reads this file and creates a <strong>feature factory<\/strong>. This factory can then scan a molecule with a 3D conformation and identify all matching features.\n<ul class=\"wp-block-list\">\n<li><strong>Family<\/strong>: The type of feature (e.g., &#8220;Donor&#8221;, &#8220;Acceptor&#8221;).<\/li>\n\n\n\n<li class=\"has-small-font-size\"><strong>Atom indices<\/strong>: The specific atoms in the molecule that define the feature.<\/li>\n\n\n\n<li><strong>3D position<\/strong>: A single point in 3D space representing the feature&#8217;s geometric center.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading has-medium-font-size\">The function<\/h2>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from rdkit import Chem\nfrom rdkit.Chem import AllChem\nimport numpy as np\nfrom collections import defaultdict\nfrom typing import Iterable, Dict, List, Tuple\n\n# Define which pharmacophore families we want to extract\n\nPHARMACOPHORE_FAMILES_TO_KEEP = ('Donor', 'Acceptor','Aromatic')\n\n_FEATURES_FACTORY = []\n\ndef get_features_factory(features_names, resetPharmacophoreFactory=False):\n    global _FEATURES_FACTORY, _FEATURES_NAMES\n    if resetPharmacophoreFactory or (len(_FEATURES_FACTORY) &gt; 0 and _FEATURES_FACTORY[-1] != features_names):\n        _FEATURES_FACTORY.pop()\n        _FEATURES_FACTORY.pop()\n    if len(_FEATURES_FACTORY) == 0:\n        feature_factory = AllChem.BuildFeatureFactory(os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef'))\n        _FEATURES_NAMES = features_names\n        if features_names is None:\n            features_names = list(feature_factory.GetFeatureFamilies())\n\n        _FEATURES_FACTORY.extend([feature_factory, features_names])\n    return _FEATURES_FACTORY\n\n\ndef getPharmacophoreCoords(mol, features_names: Iterable[str] = PHARMACOPHORE_FAMILES_TO_KEEP, confId:int=-1) -&gt; \\\n        Tuple[Dict[str, np.ndarray],  Dict[str, List[np.ndarray]]] :\n\n    \"\"\"\n    Extracts 3D pharmacophore points from a molecule.\n\n    Args:\n        mol: An RDKit molecule with an embedded 3D conformer.\n        families: A tuple of pharmacophore family names to extract.\n        confId: The conformer ID to use.\n\n    Returns:\n        A dictionary containing the coordinates ('coords'), family types ('families'),\n        and atom indices ('indices') for each pharmacophore point.\n    \"\"\"\n\n    feature_factory, keep_featnames = get_features_factory(features_names)\n    rawFeats = feature_factory.GetFeaturesForMol(mol, confId=confId)\n    featsDict = defaultdict(list)\n    idxsDict = defaultdict(list)\n    allCoords = defaultdict(list)\n\n    for f in rawFeats:\n        if f.GetFamily() in keep_featnames:\n            featsDict[f.GetFamily()].append(np.array(f.GetPos(confId=f.GetActiveConformer())))\n            idxsDict[f.GetFamily()].append(np.array(f.GetAtomIds()))\n            allCoords['Coords'].append(np.array(f.GetPos(confId=f.GetActiveConformer())))\n            allCoords['Family'].append(f.GetFamily())\n\n    new_feats_dict = {}\n    for key in featsDict:\n        new_feats_dict[key] = np.concatenate(featsDict[key]).reshape((-1,3))\n    return new_feats_dict, idxsDict, allCoords<\/pre>\n\n\n\n<h2 class=\"wp-block-heading has-medium-font-size\">What it returns<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-small-font-size\"><strong><code>new_feats_dict<\/code><\/strong> : Maps each feature family to a NumPy array of its 3D coordinates.<\/li>\n\n\n\n<li class=\"has-small-font-size\"><strong><code>idxsDict<\/code><\/strong> : Maps each feature family to a list of the atom indices that define each feature.<\/li>\n\n\n\n<li class=\"has-small-font-size\"><strong><code>allCoords<\/code><\/strong> Contains two flat lists, one for all feature coordinates and one for their corresponding families.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">End-to-end example<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now, let&#8217;s use our function in a complete workflow.<br>1. First, we need a molecule with a 3D structure.<br>2. Next, we call our function and extract Pharmacophore Points<br>3. Let&#8217;s see what we found.<br><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from rdkit import Chem, RDConfig\nfrom rdkit.Chem import AllChem\nimport numpy as np\nimport os.path\n\n# 1) Build a 3D molecule\nsmiles = \"c1cc(ccc1O)C(=O)NC\"  # example ligand\nmol = Chem.AddHs(Chem.MolFromSmiles(smiles))\nAllChem.EmbedMolecule(mol, AllChem.ETKDGv3())\nAllChem.UFFOptimizeMolecule(mol)\n\n# 2) Extract pharmacophore points\nnew_feats_dict, idxsDict, allCoords = getPharmacophoreCoords(\n    mol, features_names=PHARMACOPHORE_FAMILES_TO_KEEP, confId=-1\n)\n\n# 3) Use the structured outputs\nprint(\"Families present:\", list(new_feats_dict.keys()))\n\nfor fam, coords in new_feats_dict.items():\n    print(f\"- {fam}: {coords.shape[0]} point(s)\")<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Handling Overlapping Atoms<\/p>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\">A single atom can be part of multiple features. If you need a list of unique atoms involved in any pharmacophore, you can get them from <code>idxsDict<\/code><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Global (all families) atom index summary\nall_ids = [int(i) for feats in idxsDict.values() for arr in feats for i in arr]\nprint(\"Total indices:\", len(all_ids))\n\n# Keep only unique atom indices (across all features)\npharm_indices = sorted(set(all_ids))\nprint(f\"Unique atoms involved in features: {len(pharm_indices)}\")\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading has-medium-font-size\">Saving a Subset for Visualization<\/h2>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\">What if you want to visualize only the atoms that contribute to specific features, like Donors and Acceptors? You can create a new, minimal molecule containing only those atoms and save it as an SDF file.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># 1) Pick families to save\nFAMS_TO_SAVE = (\"Donor\", \"Acceptor\")\n\n# 2) Collect unique atom indices from those families only\nids_save = []\nfor fam in FAMS_TO_SAVE:\n    if fam in idxsDict:\n        for arr in idxsDict[fam]:\n            ids_save.extend(int(i) for i in arr)\nids_save = sorted(set(ids_save))\n\n# 3) Build a minimal atom-subset molecule (no bonds for simplicity)\ndef build_subset_mol(mol, atom_indices):\n    rw = Chem.RWMol()\n    conf = mol.GetConformer()\n    coords = conf.GetPositions()\n\n    # add atoms with original Z and charge\n    for old in atom_indices:\n        a_old = mol.GetAtomWithIdx(old)\n        a_new = Chem.Atom(a_old.GetAtomicNum())\n        a_new.SetFormalCharge(a_old.GetFormalCharge())\n        a_new.SetNoImplicit(True)\n        rw.AddAtom(a_new)\n\n    # add conformer with the original coordinates\n    conf_new = Chem.Conformer(len(atom_indices))\n    for i, old in enumerate(atom_indices):\n        x, y, z = coords[old]\n        conf_new.SetAtomPosition(i, (float(x), float(y), float(z)))\n    rw.AddConformer(conf_new, assignId=True)\n\n    return rw.GetMol()\n\nsubset = build_subset_mol(mol, ids_save)\n\n# 4) Save to SDF\nwriter = Chem.SDWriter(\"donor_acceptor_atoms.sdf\")\nwriter.write(subset)\nwriter.close()\n\nprint(f\"Saved {len(ids_save)} atoms to donor_acceptor_atoms.sdf\")\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">You can now open this SDF file in a molecular viewer to see exactly which atoms contribute to the key hydrogen bonding features!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pharmacophores are simplified representations of the key interactions ligands make with proteins, such as hydrogen bonds, charge interactions, and aromatic contacts. Think of them as the essential &#8220;bumps and grooves&#8221; on a key that allow it to fit its lock (the protein). These maps can be derived from ligands or protein\u2013ligand complexes and are powerful [&hellip;]<\/p>\n","protected":false},"author":110,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[187,14,447,227,201,15],"tags":[129],"ppma_author":[750],"class_list":["post-13102","post","type-post","status-publish","format-standard","hentry","category-cheminformatics","category-howto","category-molecular-design","category-python-code","category-small-molecules","category-technical","tag-rdkit"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":750,"user_id":110,"is_guest":0,"slug":"yael","display_name":"Yael Ziv","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/3c9bc72679de72f6dcf8e4c545102ef6e0cf449a772414f2a7958c9c1971b740?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/13102","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/110"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=13102"}],"version-history":[{"count":4,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/13102\/revisions"}],"predecessor-version":[{"id":13190,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/13102\/revisions\/13190"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=13102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=13102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=13102"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=13102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}