{"id":11298,"date":"2024-05-14T22:12:08","date_gmt":"2024-05-14T21:12:08","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=11298"},"modified":"2025-01-02T13:55:30","modified_gmt":"2025-01-02T13:55:30","slug":"pyrosetta-for-rfdiffusion","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2024\/05\/pyrosetta-for-rfdiffusion\/","title":{"rendered":"Pyrosetta for RFdiffusion"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"alignright size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"833\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1.jpg?resize=625%2C833&#038;ssl=1\" alt=\"\" class=\"wp-image-11307\" style=\"width:195px;height:auto\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?resize=768%2C1024&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?resize=225%2C300&amp;ssl=1 225w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?resize=1152%2C1536&amp;ssl=1 1152w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?resize=1536%2C2048&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?resize=624%2C832&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/IMG_6605-1-scaled.jpg?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n<\/div>\n\n\n<p>I will not lie: I often struggle to find a snippet of code that did something in PyRosetta or I spend hours facing a problem caused by something not working as I expect it to. I recently did a tricky project involving RFdiffusion and I kept slipping on the PyRosetta side. So to make future me, others, and ChatGTP5 happy, here are some common operations to make working with PyRosetta for RFdiffusion easier.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Import<\/h2>\n\n\n\n<p>For easy copypasting, let&#8217;s start PyRosetta and import things. As said in previous post I am not keen on star imports, because they are not meant to be used indiscriminately and can cause issues \u2014a common pythonic example is when someone mixes <code>collections.Counter<\/code> and <code>typing.Counter<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pyrosetta\nimport pyrosetta_help as ph\nfrom types import ModuleType\n# Better than star imports:\nprc: ModuleType = pyrosetta.rosetta.core\nprp: ModuleType = pyrosetta.rosetta.protocols\npru: ModuleType = pyrosetta.rosetta.utility\nprn: ModuleType = pyrosetta.rosetta.numeric\npr_conf: ModuleType = pyrosetta.rosetta.core.conformation\npr_scoring: ModuleType = pyrosetta.rosetta.core.scoring\npr_res: ModuleType = pyrosetta.rosetta.core.select.residue_selector\npr_options: ModuleType = pyrosetta.rosetta.basic.options\n\nlogger = ph.configure_logger()\npyrosetta.distributed.maybe_init(extra_options=ph.make_option_string(no_optH=False,\n                                                                     ex1=None,\n                                                                     ex2=None,\n                                                                     ignore_unrecognized_res=True,\n                                                                     load_PDB_components=False,\n                                                                     ignore_waters=True,\n                                                                    )\n                                 )<\/pre>\n\n\n\n<p>Load from a PDB file:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pyrosetta.pose_from_file('\ud83d\udc7e\ud83d\udc7e\ud83d\udc7e.pdb')<\/pre>\n\n\n\n<p>or from string<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pose = pyrosetta.Pose()\nprc.import_pose.pose_from_pdbstring(pose, 'ATOM \ud83d\udc7e\ud83d\udc7e\ud83d\udc7e...')<\/pre>\n\n\n\n<p>or from a mmCIF file. Note this will not work straight from PyMol because an extra entry at end is needed as <a href=\"https:\/\/blog.matteoferla.com\/2023\/02\/reading-mmcif-from-pymol-in-pyrosetta.html\">discussed previously<\/a>. I should say working with mmCIF is uncommon, but does have some nice metadata handling advantages.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pose: pyrosetta.Pose = prc.import_pose.pose_from_file('\ud83d\udc7e\ud83d\udc7e\ud83d\udc7e.cif', read_fold_tree=True, type=prc.import_pose.FileType.CIF_file)<\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"464\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?resize=625%2C464&#038;ssl=1\" alt=\"\" class=\"wp-image-11300\" style=\"width:273px;height:auto\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?resize=1024%2C761&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?resize=300%2C223&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?resize=768%2C571&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?resize=624%2C464&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/thread.png?w=1280&amp;ssl=1 1280w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n<\/div>\n\n\n<p>There are several other pose importers. Another useful one, for threading especially, is <code>pyrosetta.pose_from_sequence<\/code> is rather helpful \u2014just watch out for the eye of Sauron when threading.<\/p>\n\n\n\n<p>Extra complication happens when dealing with residue types that are not in the database. These need to be added to the pose before loading the file.<\/p>\n\n\n\n<p>As always with PyRosetta, if you pass an illegal value, say <code>prc.import_pose.pose_from_file(Exception)<\/code> you&#8217;ll get a <code>TypeError<\/code> that will tell you the accepted arguments of the overloaded function, which sometimes differs from the <code>help(fun<\/code>) info.<\/p>\n\n\n\n<p>A new residue type can be added from a params file or from memory. The official way to create a params file is from the Python 2.7 script from the Rosetta download, but that was not good for my purposes so I wrote a new converter, the <a href=\"https:\/\/github.com\/matteoferla\/rdkit_to_params\" data-type=\"link\" data-id=\"https:\/\/github.com\/matteoferla\/rdkit_to_params\">rdkit_to_params<\/a> module, which I also made into a <a href=\"https:\/\/params.matteoferla.com\/\" data-type=\"link\" data-id=\"https:\/\/params.matteoferla.com\/\">web app<\/a> due to popular demand.<\/p>\n\n\n\n<p>A params file can be passed via the initialisation options (<code>-extra_res_fa<\/code>) or via the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pru: ModuleType = pyrosetta.rosetta.utility\n\nparams_filenames: List[str] = ...\npose: pyrosetta.Pose = ...\nparams_vector = pru.vector1_string()\nparams_vector.extend([f for f in params_filenames if f])\npyrosetta.generate_nonstandard_residue_set(pose, params_vector)<\/pre>\n\n\n\n<p>Having added a residue type does <em>not<\/em> mean a residue was added to the pose. For that you&#8217;d need to do:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prc: ModuleType = pyrosetta.rosetta.core\n\nresiset: prc.chemical.ResidueTypeSet = pyrosetta.generate_nonstandard_residue_set(pose, params_vector)\nnew_res = prc.conformation.ResidueFactory.create_residue( resiset.name_map( name ) )\npose.append_residue_by_jump(new_res, pose.num_jump() + 1)<\/pre>\n\n\n\n<p>In the rdkit-to-params module, <code>Params.add_residuetype<\/code> adds a params but from a string basically.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def add_residuetype(self, pose: pyrosetta.Pose, params_block: str, name3: str, reset:bool=True) \\\n                                            -&gt; pyrosetta.rosetta.core.chemical.ResidueTypeSet:\n        \"\"\"\n        Adds the params to a copy of a residue type set of a pose.\n        If reset is True it will also save it as the default RTS \u2014it keeps other custom residue types.\n        \"\"\"\n        rts = pose.conformation().modifiable_residue_type_set_for_conf(prc.chemical.FULL_ATOM_t)\n        buffer = pyrosetta.rosetta.std.stringbuf(params_block)\n        stream = pyrosetta.rosetta.std.istream(buffer)\n        new = prc.chemical.read_topology_file(stream,\n                                                                 name3,\n                                                                 rts)\n        rts.add_base_residue_type(new)\n        if reset:\n            pose.conformation().reset_residue_type_set_for_conf(rts)\n        return rts\n<\/pre>\n\n\n\n<p>Connections, either polymeric (<code>LOWER<\/code> and <code>UPPER<\/code>) or otherwise (<code>CONN1<\/code> etc.) are a topic that would require its own blog post, so I won&#8217;t cover it here.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">PDBInfo<\/h2>\n\n\n\n<p>PDB information (residue numbering, chain, occupancy and more) is stored in a <code>pyrosetta.rosetta.core.pose.PDBInfo<\/code> instance attached to the pose (accessible via <code>.pdb_info()<\/code> method, which returns the instance attached and not a copy).<\/p>\n\n\n\n<p>This very handy. Two of its methods are <code>pose2pdb(res=\ud83d\udc7e)<\/code> and <code>pdb2pose(chain=\ud83d\udc7e, res=\ud83d\udc7e)<\/code> which allows one to jump between the two standards. A nice thing is that it does not segfault when a pose residue index that does not exist is called (<code>RuntimeError<\/code>). If the a PDB residue is request that does not exist it will return 0.<\/p>\n\n\n\n<p>A parenthetical warning \u2014 one of its methods is <code>.remarks()<\/code>, which is a no go. In the PDB format, REMARK is basically a comment that has really strict tools-specific grammar and is a total mess. The <code>.remarks()<\/code> does not work in PyRosetta and segfaults even when the file loaded did indeed have REMARK lines. And passing a <code>pyrosetta.rosetta.core.io.Remarks<\/code> instance will not work nor will appending to a <code>Remarks<\/code> or even making one:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">remarks = pyrosetta.rosetta.core.io.Remarks()\nremark = pyrosetta.rosetta.core.io.RemarkInfo()\nremark.value = 'Hello world'\nremarks.append(remark) # ValueError: vector<\/pre>\n\n\n\n<p>So were metadata needed to be added to a file, editing the PDB block might be a better bet \u2014editing the mmCIF dictionary data is even betterer. In a pinch appending comment lines to a PDB helps too (but don&#8217;t tell anyone).<\/p>\n\n\n\n<p>Certain operations (RemodelMover, grafting etc.) will make the PDB information obsolete or make it get lost.<\/p>\n\n\n\n<p>The Pose class has a method <code>split_by_chain<\/code> which returns the chains (as defined in the fold tree) preserving the PDBInfo, but the function (external to <code>pyrosetta.Pose<\/code>) <code>pyrosetta.rosetta.core.pose.append_pose_to_pose<\/code> does not, so it needs correcting:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def add_chain(built: pyrosetta.Pose, new: pyrosetta.Pose, reset: bool = False) -&gt; None:\n    \"\"\"\n    Add a chain ``new`` to a pose ``built`` preserving the residue numbering.\n\n    :param built: this is the pyrosetta.Pose that will be built into...\n    :param new: the addendum\n    :param reset: resets the PDBInfo for the chain present to A\n    :return:\n    \"\"\"\n    built_pi = built.pdb_info()\n    if built_pi is None or reset:\n        built_pi = prc.pose.PDBInfo(built)\n        built.pdb_info(built_pi)\n        for r in range(1, built.total_residue() + 1):\n            built_pi.set_resinfo(res=r, chain_id='A', pdb_res=r)\n    for chain in new.split_by_chain():\n        offset: int = built.total_residue()\n        pyrosetta.rosetta.core.pose.append_pose_to_pose(built, chain, new_chain=True)\n        # the new `built` residues will not have PDBinfo\n        chain_pi = chain.pdb_info()\n        for r in range(1, chain.total_residue() + 1):\n            built_pi.set_resinfo(res=r + offset, chain_id=chain_pi.chain(r), pdb_res=chain_pi.number(r))\n    built_pi.obsolete(False)<\/pre>\n\n\n\n<p>The functions <code>pyrosetta.rosetta.protocols.grafting.delete_region<\/code> or <code>pyrosetta.rosetta.protocols.grafting.return_region<\/code> do not impact the PDBinfo.<\/p>\n\n\n\n<p>Given immutation start sequence one can fix the chain infomation thusly:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def fix_starts(pose: pyrosetta.Pose, chain_letters: str, start_seqs: List[str]):\n    \"\"\"\n    Fix the chains in place \n\n    .. code-block:: python\n\n        strep_seq = 'MEAGIT'\n        start_seqs = ['MKIYY', strep_seq, strep_seq, 'GEFAR', strep_seq, strep_seq, 'FKDET']\n        fix_starts(pose, chain_letters='ACDEFGB', start_seq=start_seq)\n\n    :param pose:\n    :param chain_letters:\n    :param start_seq: Confusingly, the first is ignored: the start of the pose is the start of the first chain.\n    :return:\n    \"\"\"\n    pi = pose.pdb_info()\n    seq = pose.sequence()\n    seq_iter = iter(start_seqs[1:]+[None])\n    chain_iter = iter(chain_letters)\n    start_idx = 1\n    while True:\n        this_chain = next(chain_iter)\n        next_seq = next(seq_iter)\n        if next_seq is None:\n            for i in range(start_idx, len(seq)+1):\n                pi.chain(i, this_chain)\n            break\n        else:\n            next_start = seq.find(next_seq, start_idx) + 1\n            for i in range(start_idx, next_start):\n                pi.chain(i, this_chain)\n            start_idx = next_start\n    pose.update_pose_chains_from_pdb_chains()\n    assert pose.num_chains() == len(chain_letters), f'{pose.num_chains()} != {len(chain_letters)}'\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Threading set-up<\/h2>\n\n\n\n<p>The next steps address the fact, that the backbone skeleton lacks sidechains and needs to be threaded with a sequence generated with proteinMPNN. When one imports a pose with missing sidechains into PyRosetta, these get added. They are good, but not perfect, so copying the coordinates of the conserved sidechains is a must. A key move is superposition.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Superposition<\/h2>\n\n\n\n<p>First, a silly pedantic lexical side note. When one has a sequence of elements and one spaces out the elements so they match, that is called <code>to align<\/code>. When one places one object over another but letting one still be visible, say with a stagger like the traces in a joy plot or translating without rotating (<em>e.g.<\/em> \u25bd+\u25b3=\u2721\ufe0e) that is called to <code>superimpose<\/code>, when one rototranslates fully (<em>e.g<\/em>. \u25bd+\u25b3\u21ba=\u25bd), that is called <code>to superpose<\/code>. Just to go through both words, below I will the align sequences, to superposed structures. Although, in this specific case, the metadata has the information so best use that.<\/p>\n\n\n\n<p>There are actually several functions that allow superpositions, including presets for CA atoms or all atoms and so forth. The above is customisable as you have to tell it what atom goes to which.<\/p>\n\n\n\n<p>In PyRosetta one can superpose two poses with the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mobile: pyrosetta.Pose = ...\nref: pyrosetta.Pose = ...\natom_map: prs.map_core_id_AtomID_core_id_AtomID = ... # this is a hash-mapping of mobile `pyrosetta.AtomID` to reference `pyrosetta.AtomID`\nrmsd: float = pr_scoring.superimpose_pose(mod_pose=mobile, ref_pose=ref, atom_map=atom_map)<\/pre>\n\n\n\n<p>If <code>pyrosetta.AtomID<\/code> seems familiar, that is because for constraints one has to play around with them a lot. So where we to align two poses by the CA atoms of a specified PDB chain that ought to be common, we can do:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prc: ModuleType = pyrosetta.rosetta.core\nprp: ModuleType = pyrosetta.rosetta.protocols\npru: ModuleType = pyrosetta.rosetta.utility\nprn: ModuleType = pyrosetta.rosetta.numeric\nprs: ModuleType = pyrosetta.rosetta.std \npr_conf: ModuleType = pyrosetta.rosetta.core.conformation\npr_scoring: ModuleType = pyrosetta.rosetta.core.scoring\npr_res: ModuleType = pyrosetta.rosetta.core.select.residue_selector\npr_options: ModuleType = pyrosetta.rosetta.basic.options\n\ndef superpose_pose_by_chain(pose, ref, chain: str, strict: bool=True) -&gt; float:\n    \"\"\"\n    superpose by PDB chain letter\n\n    :param pose:\n    :param ref:\n    :param chain:\n    :return:\n    \"\"\"\n    atom_map = prs.map_core_id_AtomID_core_id_AtomID()\n    chain_sele: pr_res.ResidueSelector = pr_res.ChainSelector(chain)\n    r: int  # reference pose residue number (Fortran)\n    m: int  # mobile pose residue number (Fortran)\n    for r, m in zip(pr_res.selection_positions(chain_sele.apply(ref)),\n                    pr_res.selection_positions(chain_sele.apply(pose))\n                    ):\n        if strict:\n            assert pose.residue(m).name3() == ref.residue(r).name3(), 'Mismatching residue positions!'\n        ref_atom = pyrosetta.AtomID(ref.residue(r).atom_index(\"CA\"), r)\n        mobile_atom = pyrosetta.AtomID(pose.residue(m).atom_index(\"CA\"), m)\n        atom_map[mobile_atom] = ref_atom\n    return pr_scoring.superimpose_pose(mod_pose=pose, ref_pose=ref, atom_map=atom_map)<\/pre>\n\n\n\n<p>So what operations move the pose? A lot. The main two to keep an eye out for are RFdiffusion itself and Relax and its pesky drift.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">TRB file<\/h2>\n\n\n\n<p>Now that we know how to superpose two poses, we need to figure out which are the conserved ones. To determine what is conserved in a RFdiffusion run, the pickle <code>.trb<\/code> file is a good option, as it will have some good information. If this file was deleted or similar, then see alignment below. The designed protein, is termed the hallucination (technically the field is moving away from that term, but it is a nice one), while the parent template, the reference. The dictionary in the <code>.trb<\/code> file will have several keys some in the pattern <code>[complex_\/]con_[ref\/hal]_[pdb_idx\/idx0]. In RFdiffusion, one designs only chain A and all other chains become chain B: complex_<\/code> includes chain B. So to get the mapping of the conserved residues is as easy as <code>dict(zip(trb['complex_con_ref_idx0'], trb['complex_con_hal_idx0']))<\/code>. <code>inpaint_seq<\/code> is a sequence of booleans spanning the length of the complex.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Digression: Alignment<\/h2>\n\n\n\n<p>Sequence alignment comes into play when superposing protein by the conserved regions \u2014outside of RFdiffusion with the trb file. For that <code>Bio.Align.PairwiseAligner<\/code> can be used (since <code>pairwise2<\/code> was deprecated).<\/p>\n\n\n\n<p>In the default settings, the character &#8216;-&#8216; in the input sequence is treated like a regular character and not a free gap. Were one hellbent to do so a custom substitution matrix would be needed.<\/p>\n\n\n\n<p>However, if the poses to superpose differed by a span that any identity is coincidental, such as a redesigned part of a domain, then we need to strongly penalise mismatches and not penalise gap extensions.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from Bio.Align import PairwiseAligner, Alignment, substitution_matrices\n\ndef superpose_by_common_alignment(mobile: pyrosetta.Pose, ref: pyrosetta.Pose) -&gt; float:\n    \"\"\"\n    Pairwise alignment of the sequences of the poses.\n    This time only spans that are common.\n\n    return  (ref_index, mobile_index)\n    :param mobile:\n    :param ref:\n    :return:\n    \"\"\"\n    # ## align\n    aligner = PairwiseAligner()\n    # We don't want to get mismatches aligned, so no to:\n    # aligner.substitution_matrix = substitution_matrices.load(\"BLOSUM62\")\n    aligner.internal_gap_score = -10\n    aligner.extend_gap_score = -0.01\n    aligner.end_gap_score = -0.01\n    ref_seq: str = ref.sequence()\n    pose_seq: str = mobile.sequence()\n    aln: Alignment = aligner.align(ref_seq, pose_seq)[0]\n    # like before but with an extra condition `ref_seq[t] == pose_seq[q]`\n    aln_map: Dict[int, int] = {t: q for t, q in zip(aln.indices[0], aln.indices[1]) if\n               q != -1 and t != -1 and ref_seq[t] == pose_seq[q]}\n    # ## make pyrosetta atom map\n    ...\n    # return RMSD\n    return pr_scoring.superimpose_pose(mod_pose=mobile, ref_pose=ref, atom_map=atom_map)<\/pre>\n\n\n\n<p>In the case that one end does not matter in the sequence, there are attributes that control the left (N-terminal) an right (C-terminal) side gaps. For example, the attributes <code>target_right_gap_score<\/code> and <code>target_right_extend_gap_score<\/code> could be set to zero, which make the C-terminal difference an inkshed of gaps.<\/p>\n\n\n\n<p>Were one to want align to homologues, this would be fine:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from Bio.Align import PairwiseAligner, Alignment, substitution_matrices\n\ndef superpose_by_alignment(mobile: pyrosetta.Pose, ref: pyrosetta.Pose) -&gt; float:\n    \"\"\"\n    Pairwise alignment of the sequences of the poses.\n\n    return  (ref_index, mobile_index)\n    :param mobile:\n    :param ref:\n    :return:\n    \"\"\"\n    # ## align\n    aligner = PairwiseAligner()\n    aligner.substitution_matrix = substitution_matrices.load(\"BLOSUM62\")\n    ref_seq: str = ref.sequence()\n    pose_seq: str = mobile.sequence()\n    aln: Alignment = aligner.align(ref_seq, pose_seq)[0]\n    # `aln.indices` has the mapping\n    # an index of -1 is a map to a gap\n    aln_map: Dict[int, int] = {t: q for t, q in zip(aln.indices[0], aln.indices[1]) if\n               q != -1 and t != -1}\n    # ## make pyrosetta atom map\n    # for the purpose of explanation the following block is not in its own function, but will be repeated in a minute.\n    # where these all part of the same code, the repeated part would need to be its own function as it is very modular anyway!\n    atom_map = prs.map_core_id_AtomID_core_id_AtomID()\n    for r, m in aln_map.items():\n        ref_atom = pyrosetta.AtomID(ref.residue(r + offset).atom_index(\"CA\"), r + offset)\n        mobile_atom = pyrosetta.AtomID(mobile.residue(m + offset).atom_index(\"CA\"), m + offset)\n        atom_map[mobile_atom] = ref_atom\n    # ## superpose and return RMSD\n    return pr_scoring.superimpose_pose(mod_pose=mobile, ref_pose=ref, atom_map=atom_map)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Stealing sidechains<\/h2>\n\n\n\n<p>One critical detail to keep in mind is that the RFdiffusion\/hallucination is a backbone skeleton without sidechains. When this is imported, PyRosetta will add the missing atoms, sequentially by residue. Intriguingly, the sidechains match somewhat where they should be, but the match is not perfect and the potential energy is worse (higher). As a result we need to steal the sidechains from the template. To do this we need to be careful with variable names as the superposition runs on the words &#8216;ref&#8217; and &#8216;mobile&#8217;, while the trb metadata runs off &#8216;hal&#8217; and &#8216;ref&#8217;.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def steal_frozen(acceptor: pyrosetta.Pose,\n                 donor: pyrosetta.Pose, trb: Dict[str, Any],\n                 move_acceptor: bool = False\n                 ):\n    \"\"\"\n    Copy all the conserved coordinates from the donor to the acceptor.\n    These are determined by ``trb`` dict from RFdiffusion.\n    This is similar to ``prp.comparative_modeling.StealSideChainsMover``,\n    but does not require the threading infrastructure.\n\n    The hallucination is the acceptor, the parent is the donor.\n    The RFdiffused pose is skeleton, but when imported the sidechains are added.\n    The theft is done in 3 steps.\n\n    1. A mapping of residue idx, atom idx to residue idx, atom idx is made.\n    2. The hallucination is superimposed on the parent if move_acceptor is True, else vice versa.\n    3. The coordinates are copied from the parent to the hallucination.\n\n    The term 'ref' gets confusing.\n     hallucination is fixed \/ mutanda, parent is mobile\n    fixed is called ref, but ref means parent for RFdiffusion so it is flipped)\n\n\n    :param acceptor:\n    :param donor:\n    :param trb:\n    :return:\n    \"\"\"\n\n    # ## Make mapping of all atoms of conserved residues\n    donor2acceptor_idx1s: Dict[Tuple[FTypeIdx, FTypeIdx], Tuple[FTypeIdx, FTypeIdx, str]] = {}\n    # these run off 0-based indices\n    for donor_res_idx0, acceptor_res_idx0 in zip(trb['complex_con_ref_idx0'], trb['complex_con_hal_idx0']):\n        donor_res_idx1 = donor_res_idx0 + 1\n        acceptor_res_idx1 = acceptor_res_idx0 + 1\n        acceptor_res = acceptor.residue(acceptor_res_idx1)\n        donor_res = donor.residue(donor_res_idx1)\n        assert donor_res.name3() == acceptor_res.name3(), f'donor {donor_res.name3()} != acceptor {acceptor_res.name3()}'\n        mob_atomnames = [donor_res.atom_name(ai1) for ai1 in range(1, donor_res.natoms() + 1)]\n        for fixed_atm_idx1 in range(1, acceptor_res.natoms() + 1):\n            aname = acceptor_res.atom_name(fixed_atm_idx1)  # key to map one to other: overkill bar for HIE\/HID\n            if aname not in mob_atomnames:\n                print(f'Template residue {donor_res.annotated_name()}{donor_res_idx1} lacks atom {aname}')\n                continue\n            donor_atm_idx1 = donor_res.atom_index(aname)\n            donor2acceptor_idx1s[(donor_res_idx1, donor_atm_idx1)] = (acceptor_res_idx1, fixed_atm_idx1, aname)\n\n    # ## Superpose\n    atom_map = prs.map_core_id_AtomID_core_id_AtomID()\n    if move_acceptor:\n        mobile: pyrosetta.Pose = acceptor\n        fixed: pyrosetta.Pose = donor\n    else:\n        mobile: pyrosetta.Pose = donor\n        fixed: pyrosetta.Pose = acceptor\n    for (donor_res_idx1, donor_atm_idx1), (acceptor_res_idx1, acceptor_atm_idx1, aname) in donor2acceptor_idx1s.items():\n        if move_acceptor:\n            mob_res_idx1, mob_atm_idx1 = acceptor_res_idx1, acceptor_atm_idx1\n            fixed_res_idx1, fixed_atm_idx1 = donor_res_idx1, donor_atm_idx1\n        else:\n            mob_res_idx1, mob_atm_idx1 = donor_res_idx1, donor_atm_idx1\n            fixed_res_idx1, fixed_atm_idx1 = acceptor_res_idx1, acceptor_atm_idx1\n        if aname.strip() not in ('N', 'CA', 'C', 'O'):  # BB superpositon\n            continue\n        fixed_atom = pyrosetta.AtomID(fixed_atm_idx1, fixed_res_idx1)\n        mobile_atom = pyrosetta.AtomID(mob_atm_idx1, mob_res_idx1)\n        atom_map[mobile_atom] = fixed_atom\n    rmsd = prc.scoring.superimpose_pose(mod_pose=mobile, ref_pose=fixed, atom_map=atom_map)\n    # I am unsure why this is not near zero but around 0.1\u20130.3\n    assert rmsd &lt; 1, f'RMSD {rmsd} is too high'\n\n    # ## Copy coordinates\n    to_move_atomIDs = pru.vector1_core_id_AtomID()\n    to_move_to_xyz = pru.vector1_numeric_xyzVector_double_t()\n    for (donor_res_idx1, donor_atm_idx1), (acceptor_res_idx1, acceptor_atm_idx1, aname) in donor2acceptor_idx1s.items():\n        # if aname in ('N', 'CA', 'C', 'O'):  # BB if common\n        #     continue\n        # this does not stick: fixed_res.set_xyz( fixed_ai1, mob_res.xyz(mob_ai1) )\n        to_move_atomIDs.append(pyrosetta.AtomID(acceptor_atm_idx1, acceptor_res_idx1))\n        to_move_to_xyz.append(donor.residue(donor_res_idx1).xyz(donor_atm_idx1))\n\n    acceptor.batch_set_xyz(to_move_atomIDs, to_move_to_xyz)\n\n    # ## Fix HIE\/HID the brutal way\n    v = prc.select.residue_selector.ResidueNameSelector('HIS').apply(acceptor)\n    relax = prp.relax.FastRelax(pyrosetta.get_score_function(), 1)\n    movemap = pyrosetta.MoveMap()\n    movemap.set_bb(False)\n    movemap.set_chi(v)\n    movemap.set_jump(False)\n    relax.apply(acceptor)\n    return rmsd, donor2acceptor_idx1s<\/pre>\n\n\n\n<p>Once in a while, I get an email asking me about HIE\/HID, because I mention it somewhere, so I am reticent to address a wee bit of code there: the HIS tautomerisation. In Rosetta, the HID tautomer is a patch on HIS, (<code>residue.annotated_name() == 'H[HIS_D]'<\/code>. For a specific change one can mutate the residue to alanine, then the patched\/unpatched residue (cf. \u00a7Thread). There are tasks that can added to packers to sample the his tautomers \/ HNQ flips, but the laziest way is to let Relax deal with it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Thread<\/h2>\n\n\n\n<p>In PyRosetta one can mutate a residue via<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prp: ModuleType = pyrosetta.rosetta.protocols\nprp.simple_moves.MutateResidue(1, 'ALA').apply(pose)<\/pre>\n\n\n\n<p>This allows mutations to not only base residues, but also patched residues, such as <code>NtermProteinFull<\/code> (with extra N-terminal proton) or <code>acetylated<\/code> etc. To see if in the database folder there is what you are after it is honestly easier simply looking in the folder:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pyrosetta, itertools\nfrom pathlib import Path\nfrom itertools import chain\nfa = 'database\/chemical\/residue_type_sets\/fa_standard'\n\n# residue types\nprint([p.name for p in (Path(pyrosetta.__file__).parent \/ fa \/ 'residue_types' ).glob('*\/*.params')])\n\n# params types\npatches_folder = (Path(pyrosetta.__file__).parent \/ fa \/ 'patches' )\nprint([p.name for p in itertools.chain(patches_folder.glob('*.txt'), patches_folder.glob('*\/*.txt'))])<\/pre>\n\n\n\n<p><br>Just be aware, that <code>MutateResidue<\/code> segfaults on error, so try the patch first and it is fiddly at first.<\/p>\n\n\n\n<p>To remove the terminus patches one can do the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pyrosetta\nprc: ModuleType = pyrosetta.rosetta.core\npru: ModuleType = pyrosetta.rosetta.utility\n\ndef remove_terminus_patches(pose):\n    clean_template_pose = template_pose.clone()\n    prc.pose.remove_nonprotein_residues(pose)\n    ### find\n    lowers = pru.vector1_std_pair_unsigned_long_protocols_sic_dock_Vec3_t()\n    uppers = pru.vector1_std_pair_unsigned_long_protocols_sic_dock_Vec3_t()\n    prc.sic_dock.get_termini_from_pose(pose, lowers, uppers)\n    ### remove\n    for upper, _ in uppers:\n        prc.conformation.remove_upper_terminus_type_from_conformation_residue(clean_template_pose.conformation(), upper)\n    for lower, _ in lowers:\n        prc.conformation.remove_lower_terminus_type_from_conformation_residue(clean_template_pose.conformation(), lower)<\/pre>\n\n\n\n<p>However, mutating a whole pose based on a sequence is a pillar of homology modelling, namely threading. I have discussed in an other post <a href=\"https:\/\/blog.matteoferla.com\/2021\/10\/filling-missing-loops-by-cannibalising.html\" data-type=\"link\" data-id=\"https:\/\/blog.matteoferla.com\/2021\/10\/filling-missing-loops-by-cannibalising.html\">threading and hybridisation in PyRosetta<\/a> and most examples here are simply ported from <a href=\"https:\/\/pyrosetta-help.readthedocs.io\/en\/latest\/_modules\/pyrosetta_help\/threading.html#thread\" data-type=\"link\" data-id=\"https:\/\/pyrosetta-help.readthedocs.io\/en\/latest\/_modules\/pyrosetta_help\/threading.html#thread\">a helper module of mine<\/a>, so I&#8217;ll be brief. Threading in Rosetta requires a &#8220;Grishin file&#8221;, which is a specific form of pairwise sequence alignment file. The mover is <code>ThreadingMover<\/code> with the aid of <code>StealSideChainsMover<\/code>, namely:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prc: ModuleType = pyrosetta.rosetta.core\nprp: ModuleType = pyrosetta.rosetta.protocols\n\nclean_template_pose = remove_terminus_patches(pose) # no termini from earlier!\ntarget_pose = pyrosetta.Pose()\nprc.pose.make_pose_from_sequence(target_pose, target_sequence, 'fa_standard')\n\n## Thread\nalign = prc.sequence.read_aln(format='grishin', filename=aln_file)[1]\nthreader = prp.comparative_modeling.ThreadingMover(align=align, template_pose=clean_template_pose)\nthreader.apply(target_pose)\n## Steal sidechains\nqt = threader.get_qt_mapping(target_pose)\nsteal = prp.comparative_modeling.StealSideChainsMover(clean_template_pose, qt)\nsteal.apply(target_pose)<\/pre>\n\n\n\n<p>The main point of the aforementioned post about threading was loop modelling by cannibalising AF2, which can be done with the following settings before using the <code>ThreadingMover<\/code> mover:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># optional set for loop modelling from reference poses\nposes: List[pyrosetta.Pose] = ... # poses to cannibilise for loops\nlengths: List[int] = [3,] # 3,6,9 are traditional choices\nfragment_sets = pru.vector1_std_shared_ptr_core_fragment_FragSet_t(len(lengths))\nfor i, l in enumerate(lengths):\n        fragment_sets[i+1] = prc.fragment.ConstantLengthFragSet(l)\n        for pose in poses:\n            prc.fragment.steal_constant_length_frag_set_from_pose(pose, fragment_sets[i+1])\n\nthreader = ...\nthreader.build_loops(True)\nthreader.randomize_loop_coords(True)  # default\nthreader.frag_libs(fragment_sets)\nthreader.apply(...)<\/pre>\n\n\n\n<p>As may be apparent, the <code>ThreadingMover<\/code> does not like non-base residues. So for ligand residues, one can just use <code>append_subpose_to_pose<\/code> function.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prc: ModuleType = pyrosetta.rosetta.core\npr_res: ModuleType = pyrosetta.rosetta.core.select.residue_selector\n\ndef steal_ligands(donor_pose, acceptor_pose) -&gt; None:\n    \"\"\"\n    Steals non-Protein residues from donor_pose and adds them to acceptor_pose\n\n    Do not use with nucleic acid polymers.\n\n    :param donor_pose:\n    :param acceptor_pose:\n    :return:\n    \"\"\"\n    PROTEIN = prc.chemical.ResidueProperty.PROTEIN\n    prot_sele = pr_res.ResiduePropertySelector(PROTEIN)\n    not_sele = pr_res.NotResidueSelector(prot_sele)\n    rv = pr_res.ResidueVector(not_sele.apply(donor_pose))\n    # if it were DNA...\n    # for from_res, to_res in ph.rangify(rv):\n    #     prc.pose.append_subpose_to_pose(acceptor_pose, donor_pose, from_res, to_res, True)\n    for res in rv:\n        prc.pose.append_subpose_to_pose(acceptor_pose, donor_pose, res, res, True)<\/pre>\n\n\n\n<p>Now back to threading. As mentioned, I just use my helper function to deal with the above:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pyrosetta_help\n\ndef thread(template_block, target_seq, target_name, template_name,\n           temp_folder='\/tmp'):\n    # load template\n    template = pyrosetta.Pose()\n    prc.import_pose.pose_from_pdbstring(template, template_block)\n    # thread\n    aln_filename = f'{temp_folder}\/{template_name}-{target_name}.grishin'\n    ph.write_grishin(target_name=target_name,\n                     target_sequence=target_seq,\n                     template_name=template_name,\n                     template_sequence=template.sequence(),\n                     outfile=aln_filename\n                     )\n    aln: prc.sequence.SequenceAlignment = prc.sequence.read_aln(format='grishin', filename=aln_filename)[1]\n    threaded: pyrosetta.Pose\n    threader: prp.comparative_modeling.ThreadingMover\n    threadites: pru.vector1_bool\n    threaded, threader, threadites = ph.thread(target_sequence=target_seq,\n                                               template_pose=template,\n                                               target_name=target_name,\n                                               template_name=template_name,\n                                               align=aln\n                                               )\n    # no need to superpose. It is already aligned\n    # superpose(template, threaded)\n    # fix pdb info\n    n = threaded.total_residue()\n    pi = prc.pose.PDBInfo(n)\n    for i in range(1, n + 1):\n        pi.number(i, i)\n        pi.chain(i, 'A')\n    threaded.pdb_info(pi)\n    return threaded<\/pre>\n\n\n\n<p>A pythonic parenthesis. Names of callable objects are, by good practice, verbs, however stuff gets confusing when a word can be both a noun and a verb: I have tripped up so often with thread and fragment. Hence the use of the past particle in the above. For fragment, which has initial-stress derivation I naughtily use <code>fr\u00e0gment<\/code> and <code>fragm<code>\u00e8<\/code>nt<\/code>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Relax<\/h2>\n\n\n\n<p>The workhorse of Rosetta is without question the FastRelax mover. It is very customisable, so most often there is no near to play with packers, Monte Carlo samplers etc. This is true for design mode. I won&#8217;t go into detail on the basics as this is covered in every tutorial.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">scorefxn: pr_scoring.ScoreFunction = pyrosetta.get_fa_scorefxn()\n# don't forget to set weights!\nscorefxn.set_weight(pr_scoring.ScoreType.atom_pair_constraint, 3)\nscorefxn.set_weight(pr_scoring.ScoreType.coordinate_constraint, 5)\ncycles = 3  # more than 5 is probably not going to do much\nrelax = prp.relax.FastRelax(scorefxn, cycles)\n# set movemap to freeze residues\n# say repack the sidechains of chain A and its neighbours \n# allow backbone movements for chain A only\nmovemap = pyrosetta.MoveMap()\nrs: ModuleType = prc.select.residue_selector\nchainA_sele: rs.ResidueSelector = rs.ChainSelector('A')\nchainA: pru.vector1_bool = chainA_sele.apply(pose)\nneigh_sele: rs.ResidueSelector = rs.NeighborhoodResidueSelector(chainA_sele, True, distance)\nneighs: pru.vector1_bool = neigh_sele.apply(pose)\nmovemap.set_chi(neighs)\nmovemap.set_bb(chainA)\nmovemap.set_jump(False)\nrelax.set_movemap(movemap)\n# go!\nreplax.apply(pose)<\/pre>\n\n\n\n<p>In the case of RFdiffusion the template ought to have been minimised otherwise either the scores will be for which case was minimised better if the other chains are also minimised (don&#8217;t) or the scores will be for cases where the designed binder interacts or not with a spuriously strained part of the binding partner.<br>In the above example the neighbourhood was moved as an example. If you do repack these sidechains, which is reasonable, do make sure to keep the complex and the original binding partner will no longer work.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-full is-resized\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"672\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?resize=625%2C672&#038;ssl=1\" alt=\"\" class=\"wp-image-11308\" style=\"width:207px;height:auto\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?w=930&amp;ssl=1 930w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?resize=279%2C300&amp;ssl=1 279w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?resize=768%2C826&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2024\/05\/image.png?resize=624%2C671&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><figcaption class=\"wp-element-caption\">Parametrically designed Christmas tree (so old school way)<br><a href=\"https:\/\/michelanglo.sgc.ox.ac.uk\/r\/christmas\">https:\/\/michelanglo.sgc.ox.ac.uk\/r\/christmas<\/a><\/figcaption><\/figure>\n<\/div>\n\n\n<p>A cool thing once can do with PyRosetta is use electron design as a constraint (<a href=\"https:\/\/blog.matteoferla.com\/2020\/04\/how-to-set-up-electron-density.html\" data-type=\"link\" data-id=\"https:\/\/blog.matteoferla.com\/2020\/04\/how-to-set-up-electron-density.html\">tutorial<\/a>) \u2014theoretically one could used RFdiffusion via blender API, gemmi and numpy manupulations to make custom shaped protein, but that would be for fun and not actually relevant here.<\/p>\n\n\n\n<p>If there are covalent bond lengths that need fixing, the we have do it cartesian:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">constraint_weight = 5\nscorefxn: pr_scoring.ScoreFunction = pyrosetta.create_score_function('ref2015_cart')\nscorefxn.set_weight(pr_scoring.ScoreType.coordinate_constraint, constraint_weight)\nscorefxn.set_weight(pr_scoring.ScoreType.angle_constraint, constraint_weight)\nscorefxn.set_weight(pr_scoring.ScoreType.atom_pair_constraint, constraint_weight)\n\nrelax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, 3)\nrelax.cartesian(True)\nrelax.minimize_bond_angles(True)\nrelax.minimize_bond_lengths(True)\nrelax.apply(pose)<\/pre>\n\n\n\n<p>In both example I allude to constraints. These are well covered and are set up in various way for example, given the pose index and atom name of the atoms we can do:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def constrain_distance(pose, fore_idx, fore_name, aft_idx, aft_name, x0_in=1.334, sd_in=0.2, tol_in=0.02, weight=1):\n    AtomPairConstraint = pr_scoring.constraints.AtomPairConstraint  # noqa\n    fore = pyrosetta.AtomID(atomno_in=pose.residue(fore_idx).atom_index(fore_name),\n                                rsd_in=fore_idx)\n    aft = pyrosetta.AtomID(atomno_in=pose.residue(aft_idx).atom_index(aft_name),\n                              rsd_in=aft_idx)\n    fun = pr_scoring.func.FlatHarmonicFunc(x0_in=x0_in, sd_in=sd_in, tol_in=tol_in)\n    if weight != 1:\n        fun = pr_scoring.func.ScalarWeightedFunc(weight, fun)\n    con = AtomPairConstraint(fore, aft, fun)\n    pose.add_constraint(con)\n    return con\n\ndef constrain_angle(pose, fore_idx, fore_name, mid_name, mid_idx, aft_idx, aft_name, x0_in=109\/180*3.14, sd_in=10\/180*3.14, weight=1):\n    AngleConstraint = pr_scoring.constraints.AngleConstraint\n    fore = pyrosetta.AtomID(atomno_in=pose.residue(fore_idx).atom_index(fore_name),\n                                rsd_in=fore_idx)\n    mid = pyrosetta.AtomID(atomno_in=pose.residue(mid_idx).atom_index(mid_name),\n                                rsd_in=mid_idx)\n    aft = pyrosetta.AtomID(atomno_in=pose.residue(aft_idx).atom_index(aft_name),\n                              rsd_in=aft_idx)\n    fun = pr_scoring.func.CircularHarmonicFunc(x0_radians=x0_in, sd_radians=sd_in)\n    if weight != 1:\n        fun = pr_scoring.func.ScalarWeightedFunc(weight, fun)\n    con = AngleConstraint(fore, mid, aft, fun)\n    pose.add_constraint(con)\n    return con\n\ndef constrain_position(pose: pyrosetta.Pose, target_index: int, ref_index: int, x0_in=0., sd_in=0.01):\n    ref_ca = pyrosetta.AtomID(atomno_in=pose.residue(ref_index).atom_index('CA'), rsd_in=ref_index)\n    target_ca = pyrosetta.AtomID(atomno_in=pose.residue(target_index).atom_index('CA'), rsd_in=frozen_index)\n    target_xyz = pose.residue(frozen_index).xyz(target_ca.atomno())\n    fun = pr_scoring.func.HarmonicFunc(x0_in=x0_in, sd_in=sd_in)\n    if weight != 1:\n        fun = pr_scoring.func.ScalarWeightedFunc(weight, fun)\n    con = pr_scoring.constraints.CoordinateConstraint(a1=target_ca, fixed_atom_in=ref_ca, xyz_target_in=target_xyz, func=fun, scotype=pr_scoring.ScoreType.coordinate_constraint)\n    pose.add_constraint(con)\n\ndef constrain_peptide_gap(pose, chain_break, x0_in=1.334, sd_in=0.2, tol_in=0.02):\n    \"\"\"\n    Close a gap between \n    \n    \"\"\"\n    AtomPairConstraint = pr_scoring.constraints.AtomPairConstraint  # noqa\n    fore_c = pyrosetta.AtomID(atomno_in=pose.residue(chain_break).atom_index('C'),\n                                rsd_in=chain_break)\n    aft_n = pyrosetta.AtomID(atomno_in=pose.residue(chain_break + 1).atom_index('N'),\n                              rsd_in=chain_break + 1)\n    fun = pr_scoring.func.FlatHarmonicFunc(x0_in=x0_in, sd_in=sd_in, tol_in=tol_in)\n    con = AtomPairConstraint(fore_c, aft_n, fun)\n    pose.add_constraint(con)<\/pre>\n\n\n\n<p>One thing to keep an eye out for is how healthy are the constraints after minimisation:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def show_cons(pose):\n    \"\"\"\n    print the score for the various constraints\n    \"\"\"\n    get_atomname = lambda atomid: pose.residue(atomid.rsd()).atom_name(atomid.atomno()).strip()\n    get_description = lambda atomid: f'{pose.residue(atomid.rsd()).name3()}{pi.pose2pdb(atomid.rsd()).strip().replace(\" \", \":\")}.{get_atomname(atomid)}'\n    for con in pose.constraint_set().get_all_constraints():\n        if con.type() == 'AtomPair':\n            print(con.type(), get_description(con.atom1()), get_description(con.atom2()), con.score(pose))\n        elif con.type() == 'Angle':\n            print(con.type(), get_description(con.atom1()), get_description(con.atom2()), get_description(con.atom3()), con.score(pose))\n        else:\n            print(con.type(), con.score(pose))<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Design<\/h2>\n\n\n\n<p>One can do design with FastRelax. Due to the way the mover samples, it fixes sub-optimal residues one by one as opposed to large scale remodelling \u2014for that Remodel might be better suitable (<a href=\"https:\/\/blog.matteoferla.com\/2021\/04\/remodel-in-pyrosetta.html\" data-type=\"link\" data-id=\"https:\/\/blog.matteoferla.com\/2021\/04\/remodel-in-pyrosetta.html\">tutorial<\/a>). Design is really useful for RFdiffusion as the latter can in essence be thought as coarse-grain so there may be some residue combinations that are subpar. Obviously, when FastDesign choses a residue it is based on the scorefunction, which is imperfect: a quirky example of this is its proclivity to change certain residues in Streptag when bound to streptavidin even though the former is the product of rounds and rounds of phage display.<\/p>\n\n\n\n<p>To run design one needs to create a task factory for it onto which restrictions (operations) are pushed. Confusingly, a default task factory designs everything, while the default task factory in relax does not. The <code>ProhibitSpecifiedBaseResidueTypes<\/code> is really handy for preventing cysteine from making life hard and also for correcting surface charges \u2014protein are not very soluble if the isoelectric point is close to pH 7.4 or aggregate if there are exposed patches. On the latter note, proteinMPNN when using the vanilla weights potentially might make inadvertently a transmembrane helix, so that is something to keep an eye out for.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def create_design_tf(pose:pyrosetta.Pose, design_sele: pr_res.ResidueSelector, distance:int) -&gt; prc.pack.task.TaskFactory:\n    \"\"\"\n    Create design task factory for relax.\n    Designs the ``design_sele`` and repacks around ``distance`` of it.\n\n    Remember to do\n    \n    ... code-block:: python\n\n        relax.set_enable_design(True)\n        relax.set_task_factory(task_factory)\n    \"\"\"\n    # design is default, so this is not done:\n    # residues_to_design = design_sele.apply(pose)\n    # design_ops = prc.pack.task.operation.OperateOnResidueSubset(????, residues_to_design)\n    no_cys = pru.vector1_std_string(1)\n    no_cys[1] = 'CYS'\n    no_cys_ops =  prc.pack.task.operation.ProhibitSpecifiedBaseResidueTypes(no_cys)\n    # No design, but repack\n    repack_sele = pr_res.NeighborhoodResidueSelector(design_sele, distance, False)\n    residues_to_repack = repack_sele.apply(pose)\n    repack_rtl = prc.pack.task.operation.RestrictToRepackingRLT()\n    repack_ops = prc.pack.task.operation.OperateOnResidueSubset(repack_rtl, residues_to_repack)\n    # No repack, no design\n    frozen_sele = pr_res.NotResidueSelector(pr_res.OrResidueSelector(design_sele, repack_sele))\n    residues_to_freeze = frozen_sele.apply(pose)\n    prevent_rtl = prc.pack.task.operation.PreventRepackingRLT()\n    frozen_ops = prc.pack.task.operation.OperateOnResidueSubset(prevent_rtl, residues_to_freeze)\n    # pyrosetta.rosetta.core.pack.task.operation.RestrictAbsentCanonicalAASRLT\n    # pyrosetta.rosetta.core.pack.task.operation.PreserveCBetaRLT\n    task_factory = prc.pack.task.TaskFactory()\n    task_factory.push_back(no_cys_ops)\n    task_factory.push_back(repack_ops)\n    task_factory.push_back(frozen_ops)\n    return task_factory<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Favour native residues<\/h3>\n\n\n\n<p>A residue is changed because the variant has a better score than the wild type \u2014technically within the metropolis criterion of the Monte Carlo sampling. Therefore it is possible for the sequence to drift near neutrally, which is not what we want here. To this effect the constraint <code>ResidueTypeConstraint<\/code> addresses this by favouring the native residue. This is a constraint, not a packer operation, and is controlled by the weight <code>pyrosetta.rosetta.core.scoring.ScoreType.res_type_constraint<\/code>. There are a least three ways to declare the constraint:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># native residues relative to pose object \u2014 the pose is used to get name3 and is not kept (so can change)\n# very much like the 'constrain to starting positions' in relax which creates a myriad coordinate constraints\nprc.scoring.constraints.ResidueTypeConstraint(pose, seqpos=1, native_residue_bonus=1)\n# or relative to user specified residue\nprc.scoring.constraints.ResidueTypeConstraint(seqpos=1, aa_in='A', name3_in='ALA', native_residue_bonus=1)\n# or using a wrapper for every residue in the pose\nprp.protein_interface_design.FavorNativeResidue(pose, 1)\n\n# all options require\nscorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.res_type_constraint, 1)<\/pre>\n\n\n\n<p>The bonus and weight to add is 1 and 1 in this example, but on preliminary tests, I feel like 1 and 2 is a better call, but that is a discussion for another time.<\/p>\n\n\n\n<p>The <code>ProhibitSpecifiedBaseResidueTypes<\/code> operation restricts against given amino acids, while to allow non-canonical amino acids a packer palette needs to be added (<code>task_factory.set_packer_palette<\/code>), but neither gives weights to given residues. One simple option to do that would be to add multiple <code>FavorNativeResidue<\/code> constraints \u2014they need not be wrapped in <code>AmbiguousConstraint<\/code> because the non-scoring ones get zero.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Variable length<\/h3>\n\n\n\n<p>When I used RFdiffusion I gave the algorithm a range in the contigmap, thus allowing the best length of an insertion to win. This does however make the pipeline a bit complicated. First I had to adapt the helper scripts of ProteinMPNN to be used as Python functions (see footnote). There are then two options for the design step: refer to the metadata or  do a sequence alignment. The former requires being organised, while the latter could use the alignment code discussed above.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def design_different_by_trb(pose: pyrosetta.Pose, trb: Dict[str, Any], cycles = 5, scorefxn=None):\n    \"\"\"The RFdiffusion case\"\"\"\n    idx_sele = prc.select.residue_selector.ResidueIndexSelector()\n        for idx0, b in enumerate(trb['inpaint_seq']):\n            if b:\n                continue\n            idx_sele.append_index(idx0 + 1)\n    task_factory: prc.pack.task.TaskFactory = create_design_tf(pose, design_sele=idx_sele, distance=0)\n    if scorefxn is None:\n        scorefxn = pyrosetta.get_fa_scorefxn()\n    relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, cycles)\n    relax.set_enable_design(True)\n    relax.set_task_factory(task_factory)\n    relax.apply(pose)\n\ndef design_different_by_alignment(pose: pyrosetta.Pose, ref: pyrosetta.Pose, cycles = 5, scorefxn=None):\n    \"\"\"Not the RFdiffusion case\"\"\"\n    ref = ref.split_by_chain(1)\n    ref2pose: dict = align_for_atom_map(pose.split_by_chain(1), ref)\n    conserved = list(ref2pose.values())\n    idx_sele = pr_res.ResidueIndexSelector()\n    for i in range(1, len(pose.chain_sequence(1))):\n        if i not in conserved:\n            idx_sele.append_index(i)\n    task_factory: prc.pack.task.TaskFactory = create_design_tf(pose, design_sele=idx_sele, distance=0)\n    if scorefxn is None:\n        scorefxn = pyrosetta.get_fa_scorefxn()\n    relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, cycles)\n    relax.set_enable_design(True)\n    relax.set_task_factory(task_factory)\n    relax.apply(pose)<\/pre>\n\n\n\n<p>The degree of change during a FastDesign run depends on how strained the residues are. So it important that the design be FastRelax properly first and not all at once.<br>Actually I often use FastRelax in a first pass with the backbone fixed to prevent the model from blowing up.<br>This degree of change _I think_ makes a good metric for scoring. If the pose.sequence() changed by more than 20% (irrespective of native favouring), then that is hard no.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scoring<\/h2>\n\n\n\n<p>RFdiffusion will be made to generate lots of designs. The reason is because they are not all amazing. So they need to be scored.<br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Coarse-grain clashes<\/h3>\n\n\n\n<p>First, we need to determine if the polyglycine skeleton will clash with the extended neighbourhood when the protein is part of a greater machinery that could not be feasibly fed into RFdiffusion. An example is a membrane, say one were designing a binder to a transmembrane protein, the design should not cross the membrane, so one could get the transmembrane protein from OPM database, superpose the skeleton and use the layers of O \/ N atoms. For a project of mine, I had a multidimensional polymeric lattice. Scoring via a scorefunction and using only the Lenard\u2013Jones repulsion is a tad overkill and prone to errors. So simply doing distance based calculations works:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">skeleton: pyrosetta.Pose = ...\nclasher: pyrosetta.Pose = ...  # where the skeleton combined with this pose, it should not have clashes\n\nxyz_skeleton = extract_coords(skeleton)\nxyz_clasher = extract_coords(clasher)\n\nall_distances = np.sqrt(np.sum((xyz_clasher[:, np.newaxis, :] - xyz_skeleton[np.newaxis, :, :]) ** 2, axis=-1))\nn_clashing = np.count_nonzero(all_distances &lt; 1.)<\/pre>\n\n\n\n<p>If designing interactions, the above but with &lt; 3. will given backbone hydrogen bonding to any atom of the reference.<br>Say the skeleton passed, we then generate the sequence with proteinMPNN, thread it, relax it and tune the sequence. Now we can do some proper scoring.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\u2206\u2206G<\/h3>\n\n\n\n<p>The first is the predicted Gibbs free energy from a scorefunction. The catch is that this absolute energy, which should not be taken too seriously. The reason being is that they are optimised for \u2206\u2206G calculations, so have empirical terms (i.e. fudge factors) to make them work, including the REF value (a per residue type weight, ref2015 values shown below), which means that were one to make a linear string of peptide it will not have a predicted absolute energy of folding of zero. However, in terms of magnitude is there is a screaming high value then the model is for the bin.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"json\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">{'ALA': 1.32468, 'ARG': -0.09474, 'ASN': -1.34026, 'ASP': -2.14574, 'CYS': 3.25479, 'GLN': -1.45095, 'GLU': -2.72453, 'GLY': 0.79816, 'HIS': -0.30065, 'ILE': 2.30374, 'LEU': 1.66147, 'LYS': -0.71458, 'MET': 1.65735, 'PHE': 1.21829, 'PRO': -1.64321, 'SER': -0.28969, 'THR': 1.15175, 'TRP': 2.26099, 'TYR': 0.58223, 'VAL': 2.64269}<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Per residue score<\/h3>\n\n\n\n<p>The scorefunction is a sum of the scores of each particle\/residue, and the per-residue scores are obtainable. These are a common way to determine how accurate the model likely is. The per-residue score should not &gt;+10 kcal\/mol, and most residues in minimised PDB protein are under 5 kcal\/mol, with one or two at the +8 kcal\/mol mark likely due to removed ligands and crystalline waters (implicit solvent is an approximation of bulk solvent, not frozen waters). Two things are worth remembering, the <code>ScoreFunction<\/code> does not store any data, but <code>Pose.energies<\/code> does. But both can access this data. Personally I prefer the latter, but the former has weights which is handy.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">prc: ModuleType = pyrosetta.rosetta.core\npr_scoring: ModuleType = pyrosetta.rosetta.core.scoring\n\nscorefxn = pyrosetta.ScoreFunction = pyrosetta.get_fa_scorefxn()\n# showcase weights:\nweights: Dict[str, float] = {w.name: scorefxn.get_weight(w) for w in scorefxn.get_nonzero_weighted_scoretypes()}\nprint(weights)\n# get per residue scores:\nres_scores = []\nfor i in range(1, monomer.total_residue() + 1):\n    v = pru.vector1_bool(pose.total_residue())\n    v[i] = 1\n    res_scores.append(scorefxn.get_sub_score(pose, v))\n\n# some magic can be done with the annoying EMapVector\nv: pr_scoring.EMapVector = scorefxn.weights()\nprint( v[pr_scoring.total_score], v[pr_scoring.fa_sol] )<\/pre>\n\n\n\n<p>Where using <code>pose.energies<\/code> is a lot easier<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">scorefxn(pose) # fills pose.energies\nenergies: pr_scoring.Energies = pose.energies()\ntyped_energies: Dict[str, float] = energies.active_total_energies()\na: npt.NDArray = energies.residue_total_energies_array()\nenergy = pd.DataFrame(a)  # amazingly civilised\n# alternative a specific residue can be queried via EMapVector as above\nv: pr_scoring.EMapVector = energies.residue_total_energies(1)<\/pre>\n\n\n\n<p>Do note that due to the fact that it takes two particles for a two body score (e.g. hydrogen bond), there&#8217;s a function to toggle whether these are halved or not (cf. <a href=\"https:\/\/www.rosettacommons.org\/node\/11245\" data-type=\"link\" data-id=\"https:\/\/www.rosettacommons.org\/node\/11245\">this post<\/a>).<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">scorefxn.weights()  # Create the EnergyMap\nemopts = pr_scoring.methods.EnergyMethodOptions(scorefxn.energy_method_options())\nemopts.hbond_options().decompose_bb_hb_into_pair_energies(True)\nscorefxn.set_energy_method_options(emopts)<\/pre>\n\n\n\n<p>Parenthetically, if the <code>pd.DataFrame(pose.energies().residue_total_energies_array())<\/code>  was not saved, but is needed again, here&#8217;s a function to read the junk on a scored dumped PDB.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def rosetta_pdb_to_df(pdbblock: str) -&gt; pd.DataFrame:\n    parsable = False\n    _data = []\n    for line in pdbblock.split('\\n'):\n        if '#BEGIN_POSE_ENERGIES_TABLE' in line:\n            parsable = True\n            continue\n        elif '#END_POSE_ENERGIES_TABLE' in line:\n            break\n        elif not parsable:\n            continue\n        parts = line.strip().split()\n        if parts[0] == 'label':\n            _data.append(parts)\n        elif parts[0] == 'weights':\n            _data.append([parts[0]] + list(map(float, parts[1:-1])) + [float('nan')])\n        else:\n            _data.append([parts[0]] + list(map(float, parts[1:])))\n    data = pd.DataFrame(_data)\n    data.columns = data.iloc[0]\n    data = data.iloc[1:].copy()\n    return data<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Interface energy<\/h3>\n\n\n\n<p>The thing we want to have is a nice interface. There are two ways to this. One is civilised, but prone to segfaults: the <code>InterfaceAnalyzerMover<\/code> mover.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def score_interface(complex: Union[pyrosetta.Pose, Sequence[pyrosetta.Pose]], interface: str):\n    if isinstance(complex, Sequence):\n        _complex = complex[0].clone()\n        for c in complex[1:]:\n            add_chain(_complex, c)\n        complex = _complex\n    ia = pyrosetta.rosetta.protocols.analysis.InterfaceAnalyzerMover(interface)\n    ia.apply(complex)\n    return {'complex_energy': ia.get_complex_energy(),\n            'separated_interface_energy': ia.get_separated_interface_energy(),\n            'complexed_sasa': ia.get_complexed_sasa(),\n            'crossterm_interface_energy': ia.get_crossterm_interface_energy(),\n            'interface_dG': ia.get_interface_dG(),\n            'interface_delta_sasa': ia.get_interface_delta_sasa()}<\/pre>\n\n\n\n<p>Like many other movers, no ligands are allowed.<\/p>\n\n\n\n<p>Interface is a chain letter form, say <code>A_BC<\/code>. This crashes out above 3 chains I believe. <\/p>\n\n\n\n<p>The other way is scoring the monomers in isolation (<code>pose.split_chain<\/code>) or translating a silly distance after having reset the <code>pose.energies<\/code> but preferably repacked the surface layer.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Changed sequence<\/h3>\n\n\n\n<p>As said above, how many residues changed when tweaked can be a proxy for something being not quite right. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Number of neighbours<\/h3>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">pose2pdb = oligomer.pdb_info().pose2pdb\nchainA_sele = pr_res.ChainSelector('A')\n# boolean vector:\nv = pr_res.NeighborhoodResidueSelector(chainA_sele, distance, False).apply(oligomer)\n# vector of pose idxs:\nclose_residues = prc.select.get_residues_from_subset(v)\n# array of PDB numbers\nprint( [pose2pdb(r) for r in close_residues] )<\/pre>\n\n\n\n<p>In another blog post I go through <a href=\"https:\/\/blog.matteoferla.com\/2020\/06\/love-thy-neighbours-but-select-them.html\" data-type=\"link\" data-id=\"https:\/\/blog.matteoferla.com\/2020\/06\/love-thy-neighbours-but-select-them.html\">the two different residue neighbourhood selectors<\/a>, but briefly, <code>NeighborhoodResidueSelector<\/code> is centroid (roughly the beta carbon), while <code>CloseContactResidueSelector<\/code> is closest atom.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">cc_sele = pr_res.CloseContactResidueSelector()\ncc_sele.central_residue_group_selector(chainA_sele)\ncc_sele.threshold(3)  # \u00c5<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Hydrophobicity &amp; co.<\/h3>\n\n\n\n<p>In a tool developed in OPIG, <a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.1810576116\" data-type=\"link\" data-id=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.1810576116\">Therapeutic Antibody Profiler<\/a>, an set of metrics where characterised, such as patches of surface hydrophobicity etc. which most likely would be well worth using in protein binder design. (I&#8217;ve not used these yet so don&#8217;t have a snippet to share).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Movement<\/h3>\n\n\n\n<p>One may be after rigidity. An MD run is best for that, but pyrosetta can do (discussed in detail in <a href=\"https:\/\/blog.matteoferla.com\/2022\/10\/shake-it-like-polaroid-picture-md-in.html\" data-type=\"link\" data-id=\"https:\/\/blog.matteoferla.com\/2022\/10\/shake-it-like-polaroid-picture-md-in.html\">another blog post<\/a>) via the <code>BackrubMover<\/code> mover.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def movement(original: pyrosetta.Pose,\n             trials: int = 100, temperature: int = 1.0, replicate_number: int = 20) -&gt; List[float]:\n    scorefxn = pyrosetta.get_fa_scorefxn()\n    backrub = pyrosetta.rosetta.protocols.backrub.BackrubMover()\n    mon\u00e9gasque = pyrosetta.rosetta.protocols.monte_carlo.GenericMonteCarloMover(maxtrials=trials,\n                                                                                max_accepted_trials=trials,\n                                                                                # gen.max_accepted_trials() = 0\n                                                                                task_scaling=5,\n                                                                                # gen.task_scaling()\n                                                                                mover=backrub,\n                                                                                temperature=temperature,\n                                                                                sample_type='low',\n                                                                                drift=True)\n    mon\u00e9gasque.set_scorefxn(scorefxn)\n    # find most deviant\n    rs = []\n    for i in range(replicate_number):\n        variant = original.clone()\n        mon\u00e9gasque.apply(variant)\n        if mon\u00e9gasque.accept_counter() &gt; 0:\n            rs.append(pr_scoring.bb_rmsd(variant, original))\n        else:\n            rs.append(float('nan'))\n    return rs\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Hopefully this exhaustive, and exhausting, overview of the movers and functions in PyRosetta, has adequately showcases what operations can be done in PyRosetta to better tune and analyse variants from RFdiffusion.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Footnote: ProteinMPNN helper as API functions<\/h2>\n\n\n\n<p>As mentioned, my designs had different lengths and I wanted to create the definitions for ProteinMPNN via my script as it was reading from tar.gz archives and other details. But briefly, I parsed the RFdiffusion skeletons thusly:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">no_fix = []\ndefinitions = []\nglobal_fixed_chains: Dict[str, List[List[str]]] = {}\nglobal_fixed_positions = {}\n\nfor path in paths:\n    # ## Load data\n    name = path.stem\n    pdbblock = get_ATOM_only(path.read_text())\n    # ## Parse definition (i.e. coordinates)\n    definition = parse_PDBblock(pdbblock, name)\n    definitions.append(definition)\n    # ## Fixed chain\n    fixed_chains = define_fixed_chains(definition, designed_chain_list='A') # chain A is designed\n    global_fixed_chains[name] = fixed_chains\n    # ## Fixed pos\n    sequence = get_chainA_sequence(pdbblock)  # assuming the sequence to change is chain A\n    # The skeleton will have stretches of glycines where the design was made:\n    masked = re.sub(r'(G{3,})', lambda match: '_' * len(match.group(1)), sequence)\n    fixed_list = [[i for i, r in enumerate(masked) if r == '_']]\n    fixed_positions = define_unfixed_positions(definition, ['A'], fixed_list)\n    global_fixed_positions[name] = fixed_positions\n\n# write out definitions, global_fixed_chains, global_fixed_positions\n\nwith open(chains_definitions_path, 'w') as fh:  # only chains_definitions.jsonl is a JSONL\n    for definition in definitions:\n        fh.write(json.dumps(definition) + '\\n')\n\nwith open(fixed_chains_path, 'w') as fh:\n    json.dump(global_fixed_chains, fh)\n\nwith open(fixed_positions_path, 'w') as fh:\n    json.dump(global_fixed_positions, fh)<\/pre>\n\n\n\n<p>Where the functions are:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\"\"\"\nThis is a minor refactoring of the original code, with the following changes:\n\n`parse_PDBblock(pdbblock: str, name: str, chain_alphabet, ca_only=False)`\naccepts a PDB block ``pdbblock`` and a name ``name`` and returns a dictionary that forms a line for the JSONL.\n\nThe code is mostly kept as was, with minor chances.\n\"\"\"\n\nimport json\nfrom typing import List, Sequence, Dict, Any\nimport numpy as np\n\nalpha_1 = list(\"ARNDCQEGHILKMFPSTWYV-\")\nstates = len(alpha_1)\nalpha_3 = ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',\n           'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL', 'GAP']\n\naa_1_N = {a: n for n, a in enumerate(alpha_1)}\naa_3_N = {a: n for n, a in enumerate(alpha_3)}\naa_N_1 = {n: a for n, a in enumerate(alpha_1)}\naa_1_3 = {a: b for a, b in zip(alpha_1, alpha_3)}\naa_3_1 = {b: a for a, b in zip(alpha_1, alpha_3)}\n\ndef get_ATOM_only(pdbblock: str) -&gt; str:\n    \"\"\"\n    This gets all ATOM, regardless of name and chain\n    \"\"\"\n    return '\\n'.join([line for line in pdbblock.splitlines() if line.startswith('ATOM')])\n\ndef get_chainA_sequence(pdbblock: str) -&gt; str:\n    sequence = ''\n    residues_seen = set()\n    for line in pdbblock.splitlines():\n        if line.startswith(\"ATOM\") and \" CA \" in line and \" A \" in line:\n            res_info = line[17:26]  # Residue name and number for uniqueness\n            if res_info not in residues_seen:\n                residues_seen.add(res_info)\n                res_name = line[17:20].strip()\n                sequence += three_to_one.get(res_name, '?')\n    return sequence\n\nthree_to_one = {\n    'ALA': 'A', 'CYS': 'C', 'ASP': 'D', 'GLU': 'E',\n    'PHE': 'F', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',\n    'LYS': 'K', 'LEU': 'L', 'MET': 'M', 'ASN': 'N',\n    'PRO': 'P', 'GLN': 'Q', 'ARG': 'R', 'SER': 'S',\n    'THR': 'T', 'VAL': 'V', 'TRP': 'W', 'TYR': 'Y'\n}\ndef AA_to_N(x):\n    # [\"ARND\"] -&gt; [[0,1,2,3]]\n    x = np.array(x);\n    if x.ndim == 0: x = x[None]\n    return [[aa_1_N.get(a, states - 1) for a in y] for y in x]\n\n\ndef N_to_AA(x):\n    # [[0,1,2,3]] -&gt; [\"ARND\"]\n    x = np.array(x);\n    if x.ndim == 1: x = x[None]\n    return [\"\".join([aa_N_1.get(a, \"-\") for a in y]) for y in x]\n\n\ndef inner_parse_PDBblock(pdbblock, atoms=['N', 'CA', 'C'], chain=None) -&gt; tuple:\n    '''\n    input:  pdb_filename = PDB filename\n            atoms = atoms to extract (optional)\n    output: (length, atoms, coords=(x,y,z)), sequence\n    '''\n    xyz, seq, min_resn, max_resn = {}, {}, 1e6, -1e6\n    for line in pdbblock.split(\"\\n\"):\n        if line[:6] == \"HETATM\" and line[17:17 + 3] == \"MSE\":\n            line = line.replace(\"HETATM\", \"ATOM  \")\n            line = line.replace(\"MSE\", \"MET\")\n\n        if line[:4] == \"ATOM\":\n            ch = line[21:22]\n            if ch == chain or chain is None:\n                atom = line[12:12 + 4].strip()\n                resi = line[17:17 + 3]\n                resn = line[22:22 + 5].strip()\n                x, y, z = [float(line[i:(i + 8)]) for i in [30, 38, 46]]\n\n                if resn[-1].isalpha():\n                    resa, resn = resn[-1], int(resn[:-1]) - 1\n                else:\n                    resa, resn = \"\", int(resn) - 1\n                #         resn = int(resn)\n                if resn &lt; min_resn:\n                    min_resn = resn\n                if resn &gt; max_resn:\n                    max_resn = resn\n                if resn not in xyz:\n                    xyz[resn] = {}\n                if resa not in xyz[resn]:\n                    xyz[resn][resa] = {}\n                if resn not in seq:\n                    seq[resn] = {}\n                if resa not in seq[resn]:\n                    seq[resn][resa] = resi\n\n                if atom not in xyz[resn][resa]:\n                    xyz[resn][resa][atom] = np.array([x, y, z])\n    # ^^ end of xyz loop\n\n    # convert to numpy arrays, fill in missing values\n    seq_, xyz_ = [], []\n    try:\n        for resn in range(min_resn, max_resn + 1):\n            if resn in seq:\n                for k in sorted(seq[resn]): seq_.append(aa_3_N.get(seq[resn][k], 20))\n            else:\n                seq_.append(20)\n            if resn in xyz:\n                for k in sorted(xyz[resn]):\n                    for atom in atoms:\n                        if atom in xyz[resn][k]:\n                            xyz_.append(xyz[resn][k][atom])\n                        else:\n                            xyz_.append(np.full(3, np.nan))\n            else:\n                for atom in atoms: xyz_.append(np.full(3, np.nan))\n        return np.array(xyz_).reshape(-1, len(atoms), 3), N_to_AA(np.array(seq_))\n    except TypeError:\n        return 'no_chain', 'no_chain'\n\n\ndef get_chain_alphabet():\n    init_alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',\n                     'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',\n                     'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']\n    extra_alphabet = [str(item) for item in list(np.arange(300))]\n    return init_alphabet + extra_alphabet\n\n\ndef parse_PDBblock(pdbblock: str, name: str, ca_only=False):\n    chain_alphabet = get_chain_alphabet()\n    my_dict = {}\n    s = 0\n    concat_seq = ''\n    for letter in chain_alphabet:\n        if ca_only:\n            sidechain_atoms = ['CA']\n        else:\n            sidechain_atoms = ['N', 'CA', 'C', 'O']\n        xyz, seq = inner_parse_PDBblock(pdbblock=pdbblock, atoms=sidechain_atoms, chain=letter)\n        if type(xyz) != str:\n            concat_seq += seq[0]\n            my_dict['seq_chain_' + letter] = seq[0]\n            coords_dict_chain = {}\n            if ca_only:\n                coords_dict_chain['CA_chain_' + letter] = xyz.tolist()\n            else:\n                coords_dict_chain['N_chain_' + letter] = xyz[:, 0, :].tolist()\n                coords_dict_chain['CA_chain_' + letter] = xyz[:, 1, :].tolist()\n                coords_dict_chain['C_chain_' + letter] = xyz[:, 2, :].tolist()\n                coords_dict_chain['O_chain_' + letter] = xyz[:, 3, :].tolist()\n            my_dict['coords_chain_' + letter] = coords_dict_chain\n            s += 1\n    my_dict['name'] = name\n    my_dict['num_of_chains'] = s\n    my_dict['seq'] = concat_seq\n    if s &lt; len(chain_alphabet):\n        return my_dict\n    else:\n        raise Exception('Too many chains')\n\n\ndef define_fixed_chains(chains_definition: Dict[str, Any], designed_chain_list=('A',)):\n    \"\"\"\n    The fixed chain definition file is a JSON dictionary of name to tuple\/list of two: designed_chain_list and fixed_chain_list\n    \"\"\"\n    all_chain_list = [item[-1:] for item in list(chains_definition) if item[:9] == 'seq_chain']  # ['A','B', 'C',...]\n    fixed_chain_list = [letter for letter in all_chain_list if\n                        letter not in designed_chain_list]  # fix\/do not redesign these chains\n    return list(designed_chain_list), fixed_chain_list\n\n\ndef define_global_fixed_chains(chains_definitions, global_designed_chain_list=('A',)):\n    return {chains_definition['name']: define_fixed_chains(chains_definition, global_designed_chain_list)\n            for chains_definition in chains_definitions}\n\n\ndef define_fixed_positions(chains_definition: Dict[str, Any],\n                           designed_chain_list: Sequence[str],\n                           fixed_list: Sequence[Sequence[int]]):\n    all_chain_list = [item[-1:] for item in list(chains_definition) if item[:9] == 'seq_chain']\n    fixed_position_dict = {}\n    for i, chain in enumerate(designed_chain_list):\n        fixed_position_dict[chain] = fixed_list[i]\n    for chain in all_chain_list:\n        if chain not in designed_chain_list:\n            fixed_position_dict[chain] = []\n    return fixed_position_dict\n\n\ndef define_unfixed_positions(chains_definition,\n                             designed_chain_list: Sequence[str],\n                             unfixed_list: Sequence[Sequence[int]]):\n    \"\"\"\n    This will be an entry in a dictionary passed to ``chain_id_jsonl``\n    (Misnomer: it's not a JSONL, but a JSON, only ``jsonl_path`` is)\n\n    :param chains_definition:\n    :param designed_chain_list:\n    :param unfixed_list:\n    :return:\n    \"\"\"\n    all_chain_list = [item[-1:] for item in list(chains_definition) if item[:9] == 'seq_chain']\n    fixed_position_dict = {}\n    for chain in all_chain_list:\n        seq_length = len(chains_definition[f'seq_chain_{chain}'])\n        all_residue_list = (np.arange(seq_length) + 1).tolist()\n        if chain not in designed_chain_list:\n            fixed_position_dict[chain] = all_residue_list\n        else:\n            idx = np.argwhere(np.array(designed_chain_list) == chain)[0][0]\n            fixed_position_dict[chain] = list(set(all_residue_list) - set(unfixed_list[idx]))\n    return fixed_position_dict\n<\/pre>\n\n\n","protected":false},"excerpt":{"rendered":"<p>I will not lie: I often struggle to find a snippet of code that did something in PyRosetta or I spend hours facing a problem caused by something not working as I expect it to. I recently did a tricky project involving RFdiffusion and I kept slipping on the PyRosetta side. So to make future [&hellip;]<\/p>\n","protected":false},"author":102,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[14,228,227],"tags":[5,58,770,152],"ppma_author":[701],"class_list":["post-11298","post","type-post","status-publish","format-standard","hentry","category-howto","category-protein-structure","category-python-code","tag-protein-structure","tag-protein-structure-prediction","tag-pyrosetta","tag-python"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":701,"user_id":102,"is_guest":0,"slug":"nuben","display_name":"Matteo Ferla","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/2185fc527a4ace0863c903d27646996994413c31d80e8e4c175477b836e7715c?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/11298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/102"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=11298"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/11298\/revisions"}],"predecessor-version":[{"id":12174,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/11298\/revisions\/12174"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=11298"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=11298"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=11298"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=11298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}