{"id":10522,"date":"2023-11-08T10:35:51","date_gmt":"2023-11-08T10:35:51","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=10522"},"modified":"2025-10-03T23:07:19","modified_gmt":"2025-10-03T22:07:19","slug":"the-workings-of-fragmensteins-rdkit-neighbour-aware-minimisation","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2023\/11\/the-workings-of-fragmensteins-rdkit-neighbour-aware-minimisation\/","title":{"rendered":"The workings of Fragmenstein&#8217;s RDKit neighbour-aware minimisation"},"content":{"rendered":"\n<p><a href=\"https:\/\/github.com\/matteoferla\/Fragmenstein\/tree\/master\/fragmenstein\" data-type=\"link\" data-id=\"https:\/\/github.com\/matteoferla\/Fragmenstein\/tree\/master\/fragmenstein\">Fragmenstein<\/a> is a Python module that combine hits or position a derivative following given templates by being very strict in obeying them. This is done by creating a &#8220;monster&#8221;, a compound that has the atomic positions of the templates, which then reanimated by very strict energy minimisation. This is done in two steps, first in RDKit with an extracted frozen neighbourhood and then in PyRosetta within a flexible protein. The mapping for both combinations and placements are complicated, but I will focus here on a particular step the minimisation, primarily in answer to an enquiry, namely how does the RDKit minimisation work.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"415\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=625%2C415&#038;ssl=1\" alt=\"\" class=\"wp-image-10541\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=1024%2C680&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=300%2C199&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=768%2C510&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=1536%2C1020&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=2048%2C1360&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?resize=624%2C415&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?w=1250&amp;ssl=1 1250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/fig_1-schematic.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Normal run<\/h2>\n\n\n\n<p>Choosing as a test the data from <a href=\"https:\/\/doi.org\/10.1073\/pnas.2212931120\" data-type=\"link\" data-id=\"https:\/\/doi.org\/10.1073\/pnas.2212931120\">Gahbauer et al. 2023<\/a> (where Fragmenstein is used to find an awesome hit), I will do a normal merge of the hits from PDB:5RSW and PDB:5RUE, while placing PDB:5SQJ against these. Parenthetically, I do not use the word &#8220;dock&#8221; as it does not test MCMC rototranslations of pre-generating conformers and instead creates a conformer and fixes it with not conformer resampling \u2014the PyRosetta part uses a custom FastRelax script with no rotamers and with extra constraints.<br>In the footnote below the data is prepared.<br>In the footnote a dataclass instance with the three structures was created with values <code>accession<\/code>, <code>clean_pdbblock<\/code> and <code>mol<\/code>. Now let&#8217;s combine them.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from fragmenstein import Victor, Igor\n\napo_pdbblock = '\\n'.join([l for l in template1.clean_pdbblock.split('\\n') if 'ATOM' in l])\n\nIgor.init_pyrosetta()\nvicky = Victor([template1.mol, template2.mol], pdb_block=apo_pdbblock)\nvicky.combine(long_name='merger')\nprint(vicky.summarize())<\/pre>\n\n\n\n<p>The latter gave a potential of -14.1 kcal\/mol with 23 constrained atoms and none unconstrained and an RMSD of 0.23\u00c5 and run in 17s. Sounds great, so let&#8217;s have a gander. Parenthetically, Fragmenstein has nglview display options, but unfortunately this no longer works in Colab and with newer IPython widget module.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">view = py3Dmol.view(width=800, height=400)\nview.addModel(apo_pdbblock, 'pdb')\nview.setStyle({'chain': 'A'}, {'cartoon': {'color': 'turquoise'}})\nfor hit in vicky.hits:\n    view.addModel(Chem.MolToMolBlock(hit), 'mol')\n    view.setStyle({'model': -1}, {'stick': {'colorscheme': 'whiteCarbon'}})\nview.addModel(Chem.MolToMolBlock(vicky.minimized_mol), 'mol')\nview.setStyle({'model': -1}, {'stick': {'colorscheme': 'yellowCarbon'}})\nview.zoomTo({'model': -1})\nview.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-11.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"555\" height=\"323\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-11.png?resize=555%2C323&#038;ssl=1\" alt=\"\" class=\"wp-image-10545\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-11.png?w=555&amp;ssl=1 555w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-11.png?resize=300%2C175&amp;ssl=1 300w\" sizes=\"auto, (max-width: 555px) 100vw, 555px\" \/><\/a><\/figure>\n\n\n\n<p>Now let&#8217;s place. Parenthetically, I should say that I use Fragmenstein like this when testing hypotheses, for large scale mergers I submit HPC jobs running the <code>fragmenstein pipeline<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">vicky = Victor([template1.mol, template2.mol], pdb_block=apo_pdbblock)\nvicky.place(derivative.SMILES, long_name='placed')<\/pre>\n\n\n\n<p>The placement worked well (1\u00c5 RMSD), although I ought to mention that Laboratory (the pipeline, <code>expand_isomers<\/code> of <code>.place<\/code>) makes Victor try all stereoisomers, as a result the resulting stereoisomer will likely be a product of the monster step. About the classes, the class <code>Monster<\/code> handles the making of the stitched together compound, <code>Igor<\/code> the PyRosetta minimisation (<code>Fritz<\/code> the OpenMM minimisation), <code>Victor<\/code> orchestrates the two, while <code>Laboratory<\/code> does multiple mergers or\/and placements. There is also <code>Walter<\/code> which sets up the scene with synthetic tests \u2014although it is .<\/p>\n\n\n\n<p>But what if <code>5RSW<\/code> did not exist? What happens? Let&#8217;s throw Fragmenstein under the bus as it has no idea how to best place the other part. <\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"314\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=625%2C314&#038;ssl=1\" alt=\"\" class=\"wp-image-10543\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?w=799&amp;ssl=1 799w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=300%2C151&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=768%2C385&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=624%2C313&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p>This is because in order to compromise with the atomic positions especially between mismatching templates it does not do a easy constrained embedding via RDKit <code>AllChem.EmbedMolecule<\/code> or even <code>AllChem.ConstrainedEmbed<\/code>, as this would fail. Parenthetically, a water could be provided as a hit along with a simple map override. Sometimes, the list in <code>Victor(...).monster.mol_options<\/code> might have a satisfactory solution. <br>However, here we want to dissect the workings of Monster. Truth be told, the above is actually a third run of the code as the first run gave 0.9\u00c5 relative to the actual conformer of the derivative compound (below) and the second 1.2\u00c5, both of which are good. The fact that each run gives a slightly different outcome for the unconstrained part is rather meaningless without knowing the crystallographic conformer: one needs to balance risk when shortlisting compounds and, if that second hit or the charged native ligand did not exist, few scientists would be that profligate to take that gamble.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-17.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"521\" height=\"382\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-17.png?resize=521%2C382&#038;ssl=1\" alt=\"\" class=\"wp-image-10584\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-17.png?w=521&amp;ssl=1 521w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-17.png?resize=300%2C220&amp;ssl=1 300w\" sizes=\"auto, (max-width: 521px) 100vw, 521px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Neighbourhood<\/h2>\n\n\n\n<p>Monster can do a RDKit based minimisation using MMFF in a frozen neighbourhood.<br>This takes a fraction of a second, but is imperfect so a second minimisation in the protein is done. It is imperfect as it in vacuo, the neighbourhood is fixed and the score is internal energy. The latter is independent of entropy whereas in PyRosetta the hybrid score function is better at accounting for this as its not pure physics-based and instead has fitted values (fudge factors in purist lingo) even if it is in implicit solvent (Lazaridis-Karplus solvation potential)  and the torsions are not explicitly there \u2014for a quick overview of thermodynamics see this <a href=\"https:\/\/www.blopig.com\/blog\/2023\/11\/demystifying-the-thermodynamics-of-ligand-binding\/\" data-type=\"link\" data-id=\"https:\/\/www.blopig.com\/blog\/2023\/11\/demystifying-the-thermodynamics-of-ligand-binding\/\">previous post<\/a>).<\/p>\n\n\n\n<p>On the operations, the first step is getting the neighbourhood.<br>Victor, which handles the apo PDB block, calls <code>monster.get_neighborhood(apo, cutoff, mol)<\/code> with <code>mol<\/code> being optional as otherwise <code>monster.positioned_mol<\/code> is used (<a href=\"https:\/\/github.com\/matteoferla\/Fragmenstein\/blob\/285d27593cc0b0891f9e7a590d227d5228190e21\/fragmenstein\/victor\/_victor_plonk.py#L66\" data-type=\"link\" data-id=\"https:\/\/github.com\/matteoferla\/Fragmenstein\/blob\/285d27593cc0b0891f9e7a590d227d5228190e21\/fragmenstein\/faux_victors\/quick.py#L37\">code<\/a>). The PDB block is read into RDKit (it lacked altLoc see footnote) and then the method gets every atom in the protein Chem.Mol within a atom-to-atom cutoff distance and then expands any aromatic atoms to the full aromatic ring.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Extract the neighbourhood!\nneigh: Chem.Mol = vicky.monster.get_neighborhood(vicky.apo_pdbblock, 8.)\n# `Monster.get_neighborhood(None, vicky.apo_pdbblock, 8., mol)` would have done the same\n\n# view it\nview = py3Dmol.view(width=800, height=400)\nview.addModel(Chem.MolToMolBlock(neigh), 'mol')\nview.setStyle({'model': -1}, {'stick': {'colorscheme': 'cyanCarbon'}})\nview.addModel(Chem.MolToMolBlock(vicky.monster.positioned_mol), 'mol')\nview.setStyle({'model': -1}, {'stick': {'colorscheme': 'magentaCarbon'}})\nview.zoomTo({'model': -1})\nview.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-15.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"636\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-15.png?resize=625%2C636&#038;ssl=1\" alt=\"\" class=\"wp-image-10578\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-15.png?w=688&amp;ssl=1 688w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-15.png?resize=295%2C300&amp;ssl=1 295w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-15.png?resize=624%2C635&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Minimise<\/h2>\n\n\n\n<p>The next step in minimisation where the neighbourhood atoms are, in molecular mechanics parlance, &#8220;frozen&#8221; (<em>i.e. <\/em>unable to move). Parenthetically, crystallographers use the word &#8220;constrained&#8221; for this and &#8220;restrained&#8221; for a constrained atom (say with a harmonic tether): personally I feel it is best avoid using that terminology.<br>The minimisation is done by <code>monster.mmff_minimize<\/code>, which substitutes the dummy atoms for a carbon (which would crash otherwise), then the following happens (with some extra steps such as reruns):<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Given\natom_indices_to_freeze: List[int]\natom_indices_to_tether: List[int]\nmol: Chem.Mol\nff_max_displacement: float\nff_constraint: float\n\n# Run\np: AllChem.ForceField.MMFFMolProperties = AllChem.MMFFGetMoleculeProperties(mol, 'MMFF94')\nff: AllChem.ForceField.ForceField = AllChem.MMFFGetMoleculeForceField(mol, p)\nff.Initialize()\nfor i in atom_indices_to_freeze:\n      ff.AddFixedPoint(i)\nfor i in atom_indices_to_tether:\n      ff.MMFFAddPositionConstraint(i, maxDispl=ff_max_displacement, forceConstant=ff_constraint)\nff.Minimize()\nff.CalcEnergy()<\/pre>\n\n\n\n<p><a href=\"https:\/\/github.com\/rdkit\/rdkit\/blob\/115317f43e3bdfd73673ca0e4c6b4035aa26a034\/Code\/ForceField\/UFF\/PositionConstraint.cpp#L35\" data-type=\"link\" data-id=\"https:\/\/github.com\/rdkit\/rdkit\/blob\/115317f43e3bdfd73673ca0e4c6b4035aa26a034\/Code\/ForceField\/UFF\/PositionConstraint.cpp#L35\">Looking at the C-code<\/a> for the function <code>ff.MMFFAddPositionConstraint<\/code>, one sees that this is a harmonic force: force_constant times the square of the difference between distance and max_displacement_constant (zero if distance is less than max_displacement_constant).<br>The whole operating acts in place on the Chem.Mol.<\/p>\n\n\n\n<p>The last step is extracting the compound from the combined mol\u2013neighbourhood system with is done via <code>monster.extract_from_neighborhood<\/code>. The latter actually removes the neighbourhood before calling <code>Chem.GetMolFrags<\/code> because when things go super-duper badly wrong there may be new bonds.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Example<\/h2>\n\n\n\n<p>So lets use an externally generated conformers and find the best using this code.<br>First, let&#8217;s fix the template and make a bunch of embedded conformers of the derivative constrained against it (below <code>target<\/code>). Fragmenstein is fine with imperfect matches, but for this walkthrough we will use <code>Chem.Mol.GetSubstructMatch<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from fragmenstein import Monster\n\n# no hydroxy and carboxy &gt; amide in target\nmonstah = Monster([template1.mol])\nmonstah.place_smiles('Nc1ccc(C(=O)N)cc1')\nmod_template: Chem.Mol = monstah.positioned_mol<\/pre>\n\n\n\n<p>Next let&#8217;s generate conformers but with certain atoms fixed. Note that <code>AllChem.EmbedMultipleConfs<\/code> will drift so needs realigning.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">target = Chem.MolFromSmiles( derivative.SMILES )\nmod_conf: Chem.Conformer = mod_template.GetConformer()\ncoords = {ti: mod_conf.GetAtomPosition(qi) for qi, ti in enumerate(target.GetSubstructMatch( mod_template ))}\nAllChem.EmbedMultipleConfs(target, coordMap=coords, numConfs=10)\natom_map = [(ti, qi) for qi, ti in enumerate(fauxstah.positioned_mol.GetSubstructMatch(mod_template))]\n# fix drift\nfor ci in range(target.GetNumConformers()):\n    assert rdMolAlign.AlignMol(target, mod_template, prbCid=ci, atomMap=atom_map) &lt; .5 # By definition this will never happen, but do be aware of rounding errors<\/pre>\n\n\n\n<p>So let&#8217;s minimise!<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from typing import List\nfrom rdkit.Chem import rdMolAlign\nfrom fragmenstein import MinizationOutcome  # just a dataclass for typehinting\n\n# make a faux monster\nfauxstah = Monster.__new__(Monster)\nfauxstah.positioned_mol = Chem.Mol(target)\nconf: Chem.Conformer\nsolutions: List[MinizationOutcome] = []\nfor conf in target.GetConformers():\n    fauxstah.positioned_mol.RemoveAllConformers()\n    fauxstah.positioned_mol.AddConformer( conf )\n    neighborhood: Chem.Mol = fauxstah.get_neighborhood(apo_pdbblock, cutoff=8.)\n    out = fauxstah.mmff_minimize(fauxstah.positioned_mol,\n                            neighborhood=neighborhood,\n                            ff_max_displacement=0.1,\n                            ff_constraint=5.0,\n                            allow_lax=True)\n    solutions.append(out)\n\nsolutions = sorted(solutions, key=lambda mo: mo.U_post)\n# let's cheat! RMSD against the actual crystallographic one.\nfor opt in [opt.mol for opt in solutions]:\n    print(rdMolAlign.CalcRMS(derivative.mol, opt))<\/pre>\n\n\n\n<p>The third entry of the internal energy sorted conformers is the closest (1.0\u00c5). Obviously, this is cheating as that info would not be available.<br>Let&#8217;s have a gander of the most similar:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-16.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"560\" height=\"381\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-16.png?resize=560%2C381&#038;ssl=1\" alt=\"\" class=\"wp-image-10583\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-16.png?w=560&amp;ssl=1 560w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-16.png?resize=300%2C204&amp;ssl=1 300w\" sizes=\"auto, (max-width: 560px) 100vw, 560px\" \/><\/a><\/figure>\n\n\n\n<p>The solutions from the conformer hack <\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Footnote: data generation<\/h1>\n\n\n\n<p>Fragmenstein in the demo submodule already has some examples, but here I want to operate from first principles.<\/p>\n\n\n\n<p>First define the accession codes, lets use a <code>dataclass<\/code> instance to hold the data:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from dataclasses import dataclass\nfrom rdkit import Chem\n\n@dataclass\nclass Template:\n    accession: str\n    chemcomp: str\n    SMILES: str\n    raw_pdbblock: str = None\n    clean_pdbblock: str = None\n    mol: Chem.Mol = None\n\nderivative = Template(accession='5SQJ', chemcomp='QRU', SMILES='c1cc2c(c(c1)O)C[C@H]([C@H]2NC(=O)c3ccc(cc3)NC(=O)NC4CC4)C(=O)[O-]')\ntemplate1 = Template(accession='5RUE', chemcomp='BHA', SMILES='c1cc(c(cc1N)O)C(=O)[O-]')\ntemplate2 = Template(accession='5RSW', chemcomp='6FZ', SMILES='c1ccc2c(c1)CC(C2)C(=O)[O-]')<\/pre>\n\n\n\n<p>Second, let&#8217;s retrieve the PDB blocks and remove pesky altloc (RDKit hates them):<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import requests\nfrom typing import Callable\n\ndef remove_altloc(pdbblock: str) -&gt; str:\n    \"\"\"Removes all altlocs and actually segi duplicates.\n    Test line:\n    \n    ATOM      1  N   MET A   1       0.000  0.000  0.000  1.00 60.69           N\n    \"\"\"\n    lines: List[str] = []\n    seen: List[Tuple[str, str, int]] = []\n    for line in pdbblock.split('\\n'):\n        if 'ANISOU' in line:\n            continue # skip\n        if line[:4] != 'ATOM' and line[:6] != 'HETATM':\n            lines.append(line)\n            continue\n        atom_info = line[12:16].strip(), line[21].strip(), int(line[22:26].strip())\n        if atom_info not in seen:\n            lines.append(f'{line[:16]} {line[17:]}')\n            seen.append(atom_info)\n        else: # skip\n            pass\n    return '\\n'.join(lines)\n\ndef fetch(pdb_acc: str) -&gt; str:\n    response: requests.Response = requests.get(f'https:\/\/files.rcsb.org\/download\/{pdb_acc}.pdb')\n    response.raise_for_status()\n    return response.text\n\nmodel: Template\nfor model in (derivative, template1, template2):\n    model.raw_pdbblock=fetch(model.accession)\n    model.clean_pdbblock = remove_altloc( model.raw_pdbblock)<\/pre>\n\n\n\n<p>Let&#8217;s extract the hits<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from fragmenstein import Monster, Victor\n\nfor model in (template1, template2, derivative):\n    model.mol = Victor.extract_mol(model.accession, block=model.clean_pdbblock,\n                                   ligand_resn=model.chemcomp,\n                                   smiles=model.SMILES,\n                                   removeHs=True,\n                                   proximityBonding=False,\n                                   throw_on_error=True)\n    model.mol.SetProp(model.chemcomp)<\/pre>\n\n\n\n<p>And lastly let&#8217;s have a gander.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import py3Dmol\n\ndef get_view(pdbblock: str, resn: str, chain='A', width=800, height=400) -&gt; py3Dmol.view:\n    view = py3Dmol.view(width=width, height=height)\n    view.addModel(pdbblock, 'pdb')\n    view.setStyle({'chain': chain}, {'cartoon': {'color': 'turquoise'}})\n    view.setStyle({'chain': {'$ne': chain}}, {})\n    view.setStyle({'within':{'distance':'5', 'sel':{'resn':resn}}}, {'stick': {}})\n    view.setStyle({'resn':resn}, {'stick': {'colorscheme': 'yellowCarbon'}})\n    view.zoomTo({'resn':resn})\n    return view\n\nmodel = derivative\nview: py3Dmol.view = get_view(model.clean_pdbblock, model.chemcomp)\nview.show()<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"311\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?resize=625%2C311&#038;ssl=1\" alt=\"\" class=\"wp-image-10544\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?w=800&amp;ssl=1 800w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?resize=300%2C149&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?resize=768%2C382&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/11\/image-10.png?resize=624%2C310&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Fragmenstein is a Python module that combine hits or position a derivative following given templates by being very strict in obeying them. This is done by creating a &#8220;monster&#8221;, a compound that has the atomic positions of the templates, which then reanimated by very strict energy minimisation. This is done in two steps, first in [&hellip;]<\/p>\n","protected":false},"author":102,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[187,14,201],"tags":[152,129,134],"ppma_author":[701],"class_list":["post-10522","post","type-post","status-publish","format-standard","hentry","category-cheminformatics","category-howto","category-small-molecules","tag-python","tag-rdkit","tag-small-molecules"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":701,"user_id":102,"is_guest":0,"slug":"nuben","display_name":"Matteo Ferla","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/2185fc527a4ace0863c903d27646996994413c31d80e8e4c175477b836e7715c?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10522","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/102"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=10522"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10522\/revisions"}],"predecessor-version":[{"id":10621,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/10522\/revisions\/10621"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=10522"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=10522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=10522"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=10522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}