Atom mapping with RXNMapper

While looking at some reaction data recently, I ran into the problem of atom-to-atom mapping (AAM) and the question of which tools are available to tackle it. AAM is the process of mapping each atom in the reactants to its corresponding atom in the products, which is important for defining a reaction template and for identifying which bonds are formed and broken. This has many downstream uses for computational chemists, such as reaction searching and forward synthesis and retrosynthesis planning (1). The problem is that many reaction databases do not contain these mappings, and annotation by expert chemists is impractical for databases containing thousands (or more) of data points.

A 2021 paper (1) benchmarks various AAM tools using a curated ‘Golden’ dataset of 1,851 reactions, including a subset of manually mapped reactions from the USPTO database. Among the different algorithms, IBM’s RXNMapper (2), which is based on a transformer model and takes SMILES as input, was found to perform the best, correctly mapping 1,550 (83.74%) of the reactions (although the authors identify some problems, namely that it resets the standardization that had been performed and erroneously maps certain USPTO reactions).

RXNMapper can be installed with conda or pip (the source code is on GitHub). I tried it out and found it quick and easy to use. Here it is applied to a reaction randomly selected from the USPTO data:

from rxnmapper import RXNMapper
rxnmapper = RXNMapper()
# provide SMILES of reactants and products separated by >>
example_reaction = 'CC(C)(C)c1ccc([N+](=O)[O-])cc1Br.CN1CCNCC1>>CN1CCN(c2cc(C(C)(C)C)c(Br)cc2[N+](=O)[O-])CC1'
res = rxnmapper.get_attention_guided_atom_maps([example_reaction])

This returns a list with one result per input reaction, each containing the atom-mapped reaction string and a confidence score:

[{'confidence': 0.46977652256203417,
  'mapped_rxn': '[cH:6]1[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])[c:13]([Br:14])[cH:15][c:16]1[N+:17](=[O:18])[O-:19].[CH3:1][N:2]1[CH2:3][CH2:4][NH:5][CH2:20][CH2:21]1>>[CH3:1][N:2]1[CH2:3][CH2:4][N:5]([c:6]2[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])[c:13]([Br:14])[cH:15][c:16]2[N+:17](=[O:18])[O-:19])[CH2:20][CH2:21]1'}]
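As a quick sanity check on output like the above, the map numbers can be pulled out of the mapped string and compared between the two sides. The sketch below uses only the Python standard library; the helper names `map_numbers` and `check_mapping` are my own, and it assumes the string has the plain `reactants>>products` form shown here, with map numbers written as `:N]` inside bracket atoms.

```python
import re
from collections import Counter

# Matches the ":N]" that closes a mapped bracket atom, e.g. "[CH3:10]" or "[N+:17]"
MAP_NUM = re.compile(r":(\d+)\]")

def map_numbers(side):
    """Count how often each atom map number occurs on one side of the reaction."""
    return Counter(int(n) for n in MAP_NUM.findall(side))

def check_mapping(mapped_rxn):
    """Report duplicated map numbers and numbers present on only one side."""
    reactants, products = mapped_rxn.split(">>")  # assumes no agents between the arrows
    r, p = map_numbers(reactants), map_numbers(products)
    return {
        "duplicated": sorted(n for side in (r, p) for n, k in side.items() if k > 1),
        "only_in_reactants": sorted(set(r) - set(p)),
        "only_in_products": sorted(set(p) - set(r)),
    }

# The mapped reaction from the output above, split over several lines for readability
mapped_rxn = (
    "[cH:6]1[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])[c:13]([Br:14])[cH:15][c:16]1"
    "[N+:17](=[O:18])[O-:19].[CH3:1][N:2]1[CH2:3][CH2:4][NH:5][CH2:20][CH2:21]1>>"
    "[CH3:1][N:2]1[CH2:3][CH2:4][N:5]([c:6]2[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])"
    "[c:13]([Br:14])[cH:15][c:16]2[N+:17](=[O:18])[O-:19])[CH2:20][CH2:21]1"
)

report = check_mapping(mapped_rxn)
print(report)
```

For this reaction all three lists come back empty: each of the 21 map numbers appears exactly once on each side. A non-empty list is not necessarily an error, but it is a useful flag for reactions worth inspecting by hand.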

RDKit makes it easy to visualize the mapping:

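from rdkit.Chem import AllChem, Draw
rxn_smarts = res[0]['mapped_rxn']
rxn = AllChem.ReactionFromSmarts(rxn_smarts)
img = Draw.ReactionToImage(rxn)  # PIL image of the reaction with atom map numbers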

Note: to get the map number of a particular atom in RDKit, we can use GetAtomMapNum() like so:

# to get the map numbers of atoms in one of the reactants
reactant1 = rxn.GetReactants()[0]  # GetReactants() returns a sequence of molecules
for atom in reactant1.GetAtoms():
    print(atom.GetAtomMapNum())
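Since a main reason for mapping atoms is to see which bonds change, the map numbers can also be used to compare the bonds on each side of the reaction. The following is a minimal sketch rather than a definitive method: the helper names `mapped_bonds` and `bond_changes` are hypothetical, and it parses each side as SMILES with `Chem.MolFromSmiles` instead of working from the reaction object. Bonds to implicit hydrogens are not tracked.

```python
from rdkit import Chem

def mapped_bonds(side_smiles):
    """Map each bond between two mapped atoms to its bond order, keyed by map numbers."""
    mol = Chem.MolFromSmiles(side_smiles)
    if mol is None:
        raise ValueError(f"could not parse: {side_smiles}")
    bonds = {}
    for bond in mol.GetBonds():
        i = bond.GetBeginAtom().GetAtomMapNum()
        j = bond.GetEndAtom().GetAtomMapNum()
        if i and j:  # a map number of 0 means the atom is unmapped
            bonds[frozenset((i, j))] = bond.GetBondTypeAsDouble()
    return bonds

def bond_changes(mapped_rxn):
    """Return (formed, broken, changed) sets of map-number pairs."""
    reactants, products = mapped_rxn.split(">>")  # assumes no agents between the arrows
    before, after = mapped_bonds(reactants), mapped_bonds(products)
    formed = set(after) - set(before)
    broken = set(before) - set(after)
    changed = {k for k in set(before) & set(after) if before[k] != after[k]}
    return formed, broken, changed

# The mapped reaction returned by RXNMapper earlier in this post
mapped_rxn = (
    "[cH:6]1[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])[c:13]([Br:14])[cH:15][c:16]1"
    "[N+:17](=[O:18])[O-:19].[CH3:1][N:2]1[CH2:3][CH2:4][NH:5][CH2:20][CH2:21]1>>"
    "[CH3:1][N:2]1[CH2:3][CH2:4][N:5]([c:6]2[cH:7][c:8]([C:9]([CH3:10])([CH3:11])[CH3:12])"
    "[c:13]([Br:14])[cH:15][c:16]2[N+:17](=[O:18])[O-:19])[CH2:20][CH2:21]1"
)

formed, broken, changed = bond_changes(mapped_rxn)
print("formed:", [tuple(sorted(b)) for b in formed])
print("broken:", [tuple(sorted(b)) for b in broken])
print("changed order:", [tuple(sorted(b)) for b in changed])
```

For the example reaction this reports a single new bond between atoms 5 and 6, the piperazine nitrogen and the aromatic carbon it attaches to, with no bonds between mapped heavy atoms broken or changed in order.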

There are many different algorithms available for atom mapping; hopefully this provides one useful example!

References

  • (1) Lin, A. et al. Atom-to-atom mapping: a benchmarking study of popular mapping algorithms and consensus strategies. Molecular Informatics https://doi.org/10.1002/minf.202100138 (2021).
  • (2) Schwaller, P. et al. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances https://doi.org/10.1126/sciadv.abe4166 (2021).
