Finding the parts in common between two molecules appears to be a straightforward, but actually is a maze of layers. The task, maximum common substructure (MCS) searching, in RDKit is done by Chem.rdFMCS.FindMCS, which is highly customisable with lots of presets. What if one wanted to control in minute detail if a given atom X and is a match for atom Y? There is a way and this is how.
Aim
First, there are several resources on the basics of MCS, such as Greg Landrum’s tutorial on the subject. Herein, I want to discuss a specific use case: full control over the atom matching itself.
There are two options: calculate all combinations and filter them, or create a custom atom matcher. The first option will grind to a halt very quickly, so the second option is often the only viable one.
For Fragmenstein in order to map a molecule onto overlapping 3D molecules I had to resort to the latter deep in the code.

Basics
As the basics are well covered elsewhere, I am going to wizz through them.
To find the MCS between two molecules, the following code can be used.
The function `name2mol` (get a mol from a name via NIH Cactus) and `flatgrid` (a `Draw.MolsToGrid` wrapper) are defined in the footnote.
from rdkit import Chem
from rdkit.Chem import rdFMCS
capsaicin = name2mol('capsaicin')
jasminaldehyde = name2mol('jasminaldehyde')
result: rdFMCS.MCSResult = rdFMCS.FindMCS([capsaicin, jasminaldehyde])
flatgrid([capsaicin, jasminaldehyde, Chem.MolFromSmarts(result.smartsString)])

This is not what one would expect, because one needs to tweak the parameters of the search to match that expectation.
The function rdFMCS.FindMCS went throw several evolutions and is overloaded. The first older/messier way is that it accepts several arguments, which I would say fall into two groups:
- The boolean arguments
matchValences,ringMatchesRingOnly,completeRingsOnly,matchChiralTag - The compare enum arguments
atomCompare,bondCompare,ringCompare
The two are mostly mutually exclusive but don’t have a univocal mapping (ringMatchesRingOnly, completeRingsOnly are both covered by ringCompare). The compare enums are more powerful.
For example, here are the values they can take:
atomCompareargument accepts therdFMCS.AtomCompareenum, which has the values
❧CompareAny,CompareElements,CompareIsotopes,CompareAnyHeavyAtom.bondComparewantsrdFMCS.BondCompare
❧CompareAny,CompareOrder,CompareOrderExactringComparewantsrdFMCS.RingCompare
❧IgnoreRingFusion,PermissiveRingFusion,StrictRingFusion
The newer one, accepts a single parameter object (rdFMCS.MCSParameters). This objects two attributes with boolean-holding namedtuple-like objects (AtomCompareParameters and BondCompareParameters), which are a reshuffle of the boolean arguments; and two “typer” attributes (AtomTyper and BondTyper), which can be more than just the enums and allows some nice stuff.
For Fragmenstein, I was in need of transpiling the former system into the latter as arguments may be passed in the former system but the latter was used. As a result I had a function to do so:
def transmute_FindMCS_parameters(
**mode: Unpack[ExtendedFMCSMode]) -> rdFMCS.MCSParameters: # noqa lowercase not applicable
"""
The function ``rdFMCS.FindMCS`` has two ways of being used.
In one, a series of arguments are passed,
in another a ``rdFMCS.MCSParameters`` object is passed (a wrapped C++ structure).
Unfortunately, there does not seem to be a way to transmute the former into the other.
Hence, this function
The ``params.AtomTyper`` and ``params.BondTyper`` members
can be either
* the enum ``rdFMCS.AtomCompare`` or ``rdFMCS.BondCompare``
* or a subclass of ``rdFMCS.MCSBondCompare`` or ``rdFMCS.MCSAtomCompare``
The ``rdFMCS.RingCompare`` appears to be absorbed into the ``params.BondCompareParameters``
member.
"""
params = rdFMCS.MCSParameters()
# three integers: https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/FMCS/FMCS.h
# they are not rdFMCS.AtomCompare rdFMCS.BondCompare and rdFMCS.RingCompare enums?
# atom parameters
atomCompare: rdFMCS.AtomCompare = mode.get('atomCompare', rdFMCS.AtomCompare.CompareElements)
params.AtomTyper = atomCompare # int or a callable
params.AtomCompareParameters.MatchIsotope = atomCompare == rdFMCS.AtomCompare.CompareIsotopes
params.AtomCompareParameters.CompleteRingsOnly = mode.get('completeRingsOnly', False)
params.AtomCompareParameters.MatchChiralTag = mode.get('matchChiralTag', False)
params.AtomCompareParameters.MatchValences = mode.get('matchValences', False)
params.AtomCompareParameters.RingMatchesRingOnly = mode.get('ringMatchesRingOnly', False)
# bond parameters
bondCompare: rdFMCS.BondCompare = mode.get('bondCompare', rdFMCS.BondCompare.CompareOrder)
ringCompare: rdFMCS.RingCompare = mode.get('ringCompare', rdFMCS.RingCompare.IgnoreRingFusion)
params.BondTyper = bondCompare
params.BondCompareParameters.CompleteRingsOnly = mode.get('completeRingsOnly', False)
params.BondCompareParameters.MatchFusedRings = ringCompare != rdFMCS.RingCompare.IgnoreRingFusion
params.BondCompareParameters.MatchFusedRingsStrict = ringCompare == rdFMCS.RingCompare.StrictRingFusion
params.BondCompareParameters.RingMatchesRingOnly = mode.get('ringMatchesRingOnly', False)
params.Threshold = mode.get('threshold', 1.0)
params.MaximizeBonds = mode.get('maximizeBonds', True)
params.Timeout = mode.get('timeout', 3600)
params.Verbose = mode.get('verbose', False)
params.InitialSeed = mode.get('seedSmarts', '')
# parameters with no equivalence (i.e. made up)
params.BondCompareParameters.MatchStereo = mode.get('matchStereo', False)
params.AtomCompareParameters.MatchFormalCharge = mode.get('matchFormalCharge', False)
params.AtomCompareParameters.MaxDistance = mode.get('maxDistance', -1)
# params.ProgressCallback Deprecated
return params
Comparators
This meandering around legacy code-burden is to get to the new comparator objects. The atomTyper attribute accepts the aforementioned enums, but also can accept a callable… such as a subclass of rdFMCS.MCSAtomCompare or rdFMCS.MCSBondCompare. These are callable class instances that when called with the parameter obejct, the two molecules and the two atoms or bonds that are to be compare, they return a boolean verdict. The largest number of contiguous truths wins!
Obviously, there’s a catch. RDKit is PyBoost11–wrapped C++ code —as is evident by its flagrant disregard for PEP8 guidelines turned dogmas. In the case of subclassing a wrapped C++-class one cannot do super calls to unwrapped methods of the base class as the Python signatures will not match the C++ ones.
As a result the method __call__ needs to be written without parent calls:
class VanillaCompareAtoms(rdFMCS.MCSAtomCompare):
"""
Give than the following does not work, one has to do it in full:
.. code-block:: python
:caption: This will raise an error:
super().__call__(parameters, mol1, atom_idx1, mol2, atom_idx2)
This class replicates the vanilla functionality
"""
def __init__(self, comparison: rdFMCS.AtomCompare = rdFMCS.AtomCompare.CompareAnyHeavyAtom):
"""
Whereas the atomCompare is an enum, this is a callable class.
But in parameters there is no compareElement booleans etc. only Isotope...
In https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FMCS/Wrap/testFMCS.py
it is clear one needs to make one's own.
"""
super().__init__() # noqa the p_obect (python object aka ctypes.py_object) is not used
self.comparison = comparison
def __call__(self, # noqa signature matches... it is just Boost being Boost
parameters: rdFMCS.MCSAtomCompareParameters,
mol1: Chem.Mol,
atom_idx1: int,
mol2: Chem.Mol,
atom_idx2: int) -> bool:
a1: Chem.Atom = mol1.GetAtomWithIdx(atom_idx1)
a2: Chem.Atom = mol2.GetAtomWithIdx(atom_idx2)
# ------- isotope ------------------------
if parameters.MatchIsotope and a1.GetIsotope() != a2.GetIsotope(): # noqa
return False
elif self.comparison == rdFMCS.AtomCompare.CompareIsotopes and a1.GetIsotope() != a2.GetIsotope(): # noqa
return False
elif self.comparison == rdFMCS.AtomCompare.CompareElements and a1.GetAtomicNum() != a2.GetAtomicNum(): # noqa
return False
elif self.comparison == rdFMCS.AtomCompare.CompareAnyHeavyAtom \
and (a1.GetAtomicNum() == 1 or a2.GetAtomicNum() == 1): # noqa
return False
elif self.comparison == rdFMCS.AtomCompare.CompareAny:
pass
# ------- valence ------------------------
if parameters.MatchValences and a1.GetTotalValence() != a2.GetTotalValence(): # noqa
return False
# ------- chiral ------------------------
if parameters.MatchChiralTag and not self.CheckAtomChirality(parameters, mol1, atom_idx1, mol2, atom_idx2):
return False
# ------- formal ------------------------
if parameters.MatchFormalCharge and not self.CheckAtomCharge(parameters, mol1, atom_idx1, mol2, atom_idx2):
return False
# ------- ring ------------------------
if parameters.RingMatchesRingOnly and not self.CheckAtomRingMatch(parameters, mol1, atom_idx1, mol2, atom_idx2):
return False
# ------- complete ------------------------
if parameters.CompleteRingsOnly:
# don't know its intended to be used
pass
# ------- distance ------------------------
if parameters.MaxDistance: # integer...
# todo fill!
pass
return True
Now, we can subclass it and change the call as we please.
In my case, I wrote a class that accepts a mapping that has to be obeyed. For that I still however, need to add a method to filter out the matching substructures, but that is an easy feat as there is already a method that says when two atoms match, i.e. the __call__ method of the comparator!
Edit
In recent versions, the comparisons are called by rdFMCS.FindMCS not only between molA to molB atoms, but also molB to molB atoms, presumably to address isomorphisms.
Footnote
The following snippet to gets me a molecule from a name using Cactus:
import requests
from urllib.parse import quote
from rdkit import Chem
from rdkit.Chem import AllChem
def name2mol(name) -> Chem.Mol:
"""
Using Cactus get a molecule from a name.
"""
response: requests.Response = requests.get(
f'https://cactus.nci.nih.gov/chemical/structure/{quote(name)}/smiles')
response.raise_for_status()
smiles: str = response.text
mol: Chem.Mol = Chem.MolFromSmiles(smiles)
mol.SetProp('_Name', name)
AllChem.EmbedMolecule(mol)
return mol
I always want to see the molecules on a grid but in 2D, so:
from rdkit.Chem import Draw
from typing import List
from IPython.display import Image
def flatgrid(mols, *args, **kwargs) -> Image:
copies: List[Chem.Mol] = [Chem.Mol(m) for m in mols]
*map(AllChem.Compute2DCoords, copies), # noqa, it's in place
if 'legends' not in kwargs:
kwargs['legends'] = [m.GetProp('_Name') if m.HasProp('_Name') else '-' for m in mols]
return Draw.MolsToGridImage(copies, *args, **kwargs)
Obviously, in the main text I have a figure, so the code was in reality:
from IPython.display import Image
img: Image = flatgrid([capsaicin, jasminaldehyde, Chem.MolFromSmarts(result.smartsString)])
with open('molecules.png', 'wb') as fh:
fh.write(image.data)
