Finding and testing a reaction SMARTS pattern for any reaction

Have you ever needed to find a reaction SMARTS pattern for a certain reaction but don’t have it already written out? Do you have a reaction SMARTS pattern but need to test it on a set of reactants and products to make sure it transforms them correctly and doesn’t allow for odd reactants to work? I recently did and I spent some time developing functions that can:

  1. Generate a reaction SMARTS for a reaction given two reactants, a product, and a reaction name.
  2. Check the reaction SMARTS on a list of reactants and products that have the same reaction name.

What is SMARTS?

SMARTS stands for SMiles ARbitrary Target Specification, and is a language used for pattern searching in molecules. It was invented by Daylight and they have much more information on their site here. Reaction SMARTS are SMARTS used to define reactions, they will usually have this basic structure:

[reactant_1].[reactant_N]>>[product_1].[product_N]

For the rest of the functions you’ll need to install and import the neccessary libraries.

!pip install reaction-utils rxnmapper rdkit

from rxnutils.chem.reaction import ChemicalReaction
from rxnmapper import RXNMapper
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs, Draw

Example 1: Generate a reaction SMARTS given the reactants and product

Let’s say we need a reaction smarts for the bimolecular reaction, reductive amination. I have these two reactants and product.

reactant1 = Chem.MolFromSmiles('O=Cc1ccc2c(c1)OCO2')
reactant2 = Chem.MolFromSmiles('Cc1nnc2c(N)nccn12')
product = Chem.MolFromSmiles('Cc1nnc2c(NCc3ccc4c(c3)OCO4)nccn12')
captions = ['Reactant 1', 'Reactant 2', 'Product']
Draw.MolsToGridImage([reactant1, reactant2, product], molsPerRow=3, subImgSize=(300,300), legends=captions)

Now lets define the make_rxn_template function, which takes the two reactant mol objects, product mol object, and radius for the reaction template search as inputs. The radius argument specifies the radius of the template. It uses RXNmapper which labels exact atoms in the reactants that are kept in the product. Steph (another lovely OPIG member) wrote a helpful post on this tool, check it out!

This make_rxn_template function outputs the SMARTS reaction template.

def make_rxn_template(reactants: list, product: Chem.Mol, radius=1):
    """
    Given mol objects of two reactants and product, make a SMARTS reaction template.
    """
    reactants = [Chem.MolToSmiles(reactant) for reactant in reactants]
    product = Chem.MolToSmiles(product)
    # Make string of reaction
    reaction = f"{reactants[0]}.{reactants[1]}>>{product}"
    rxnmapper = RXNMapper()
    # Get atom mapped forward template
    res = rxnmapper.get_attention_guided_atom_maps([reaction])[0]
    # Convert to ChemicalReaction object
    rxn = ChemicalReaction(res['mapped_rxn'])
    # Get reaction templates with radius
    mapping = rxn.generate_reaction_template(radius=radius, expand_ring=False, expand_hetero=False)
    forward_template = mapping[0].smarts
    return forward_template
# create reaction SMARTS
first_reductive_amination_mapping = make_rxn_template(reactants, product, radius=1)
print(first_reductive_amination_mapping)
'O=[CH;D2;+0:1]-[c:2].[#7;a:3]:[c:4](-[NH2;D1;+0:5]):[c:6](:[#7;a:7]):[#7;a:8]>>[#7;a:3]:[c:4](-[NH;D2;+0:5]-[CH2;D2;+0:1]-[c:2]):[c:6](:[#7;a:7]):[#7;a:8]'

Great, that is our reaction SMARTS for reductive amination! To check if this reaction SMARTS will work for other reductive amination examples lets test it in Example 2.

Example 2: Check a reaction SMARTS on many reaction examples

Now we need to check if this SMARTS works for other examples of this same reaction in our dataset. I have a dataframe of 10 other reductive amination examples that looks like this:

product reactants name
Cc1nnc2c(NCc3ccc4c(c3)OCO4)nccn12 (O=Cc1ccc2c(c1)OCO2, Cc1nnc2c(N)nccn12) Reductive_amination
O[C@@H](CNCc1nnc2ccccn12)c1ccc2c(c1)OCO2 (O=Cc1nnc2ccccn12, NCC(O)c1ccc2c(c1)OCO2) Reductive_amination
Cc1cc(C(=O)N[C@H](C)[C@H](C)NCc2ccccn2)no1 (O=Cc1ccccn1, Cc1cc(C(=O)NC(C)C(C)N)no1) Reductive_amination
CCOC(=O)C[C@@H](c1cc(C)on1)N1CCCCC1 (CCOC(=O)CC(=O)c1cc(C)on1, C1CCNCC1) Reductive_amination
C[C@@H](F)CCNC1CCN(C(=O)CCN2CCCCC2)CC1 (O=C1CCN(C(=O)CCN2CCCCC2)CC1, CC(F)CCN) Reductive_amination
c1cncc([C@@H]2CCC[C@@H](NCc3ccc4c(c3)OCO4)C2)c1 (O=C1CCCC(c2cccnc2)C1, NCc1ccc2c(c1)OCO2) Reductive_amination
COc1cccc(CN[C@H](Cc2ccc3c(c2)OCO3)C(=O)O)n1 (NC(Cc1ccc2c(c1)OCO2)C(=O)O, COc1cccc(C=O)n1) Reductive_amination
Cc1cc(C(=O)NCC[C@@H](C)NCC(=O)N2CCCCC2)no1 (NCC(=O)N1CCCCC1, CC(=O)CCNC(=O)c1cc(C)on1) Reductive_amination
FC(F)(F)C(F)(F)[C@@H]1CCCN(CCCN2CCCCC2)C1 (O=CCCN1CCCCC1, FC(F)(F)C(F)(F)C1CCCNC1) Reductive_amination
CC(C)(C)c1cncc(CN[C@@H](CN2CCCCC2)C(=O)O)c1 (NC(CN1CCCCC1)C(=O)O, CC(C)(C)c1cncc(C=O)c1) Reductive_amination

Here I define the following functions to test my generated reductive amination template on my examples. Basically check_smarts is the entry function that takes the dataframe of the product and reactants as SMILES as input. For each row in the dataframe check_rxn_template_works applies the given reaction template to reactants and product using check_rxn. Both combinations of reactants are checked since the order is specific for each reaction template.

The check_rxn function applies the reaction SMARTS to the reactants and if a product is found, that product is checked if it matches the original product by tanimoto similarity = 1. There are three possible outputs from check_rxn:

  1. The reaction SMARTS is applicable and produces the expected product with Tanimoto similarity of 1.0.
  2. The reaction SMARTS is applicable but does not produce any of the expected products with Tanimoto similarity of 1.0.
  3. The reaction SMARTS is not applicable to the provided reactants, therefore there is no product.
"""
This code block contains functions for reaction enumeration and reaction mapping. NOTE: This is intended to be run in a jupyter notebook, hence the use of the display() functionality.
"""
def remove_chirality(smiles: str):
    return smiles.replace('@@', '').replace('@', '').replace('/', '').replace('\\', '')

def check_rxn(rxn: AllChem.ChemicalReaction, reactants: list, products: list, verbose=False):
    """
    Check if reaction template works for a combination of reactants
    using Tanimoto similarity to compare against a list of products.
    
    INPUT:
        rxn: AllChem.ReactionSmarts object
        reactants: list of mol objects of reactants
        products: list of smiles of possible products
    """
    predicted_products_flat = []
    if rxn.RunReactants(reactants):
        predicted_products_list = rxn.RunReactants(reactants)
        predicted_products_flat = [remove_chirality(Chem.MolToSmiles(mol)) for product_set in predicted_products_list for mol in product_set]
        # Check Tanimoto similarity with each product in the provided list
        for predicted_smiles in predicted_products_flat:
            predicted_product_mol = Chem.MolFromSmiles(predicted_smiles)
            predicted_product_fp = AllChem.GetMorganFingerprint(predicted_product_mol, 2)
            for product_smiles in products:
                product_mol = Chem.MolFromSmiles(product_smiles)
                product_fp = AllChem.GetMorganFingerprint(product_mol, 2)
                similarity = DataStructs.TanimotoSimilarity(product_fp, predicted_product_fp)
                if similarity == 1.0:
                    if verbose:
                        print("The reaction SMARTS is applicable and produces the expected product with Tanimoto similarity of 1.0.")
                    return 1, predicted_products_flat
        if verbose:
            print("The reaction SMARTS is applicable but does not produce any of the expected products with Tanimoto similarity of 1.0.")
        return -1, predicted_products_flat
    else:
        if verbose:
            print("The reaction SMARTS is not applicable to the provided reactants, therefore there is no product.")
        return 0, predicted_products_flat

def check_rxn_template_works(rxn_temp: str, reactants: list, products: list, verbose=False):
    """
    Check if reaction template works. LOOKS AT BOTH COMBINATIONS OF REACTANTS.
    """
    rxn = AllChem.ReactionFromSmarts(rxn_temp)
    reactants = [Chem.MolFromSmiles(reactant) for reactant in reactants]
    products = [remove_chirality(product) for product in products]
    AllChem.SanitizeRxn(rxn)
    outcome = 0
    # Check if the number of reactants is applicable
    if len(reactants) != len(rxn.GetReactants()):
        if verbose: print("The reaction SMARTS is not applicable to the provided reactants.")
        return 0, []
    # Check if the reaction is applicable to both combinations of reactants
    if verbose: print('Outcome of first combo:')
    outcome, predicted_products_flat = check_rxn(reactants=reactants, rxn=rxn, products=products, verbose=verbose)
    if len(reactants) == 1:
        return outcome, predicted_products_flat
    reactants2 = [reactants[1], reactants[0]]
    if verbose: print('Outcome of second combo:')
    outcome2, predicted_products_flat2 = check_rxn(reactants=reactants2, rxn=rxn, products=products, verbose=verbose)
    predicted_products_flat2
    if outcome == 1 or outcome2 == 1:
        outcome = 1
        predicted_products_flat.extend(predicted_products_flat2)
        return outcome, predicted_products_flat
    if outcome == -1 or outcome2 == -1:
        outcome = -1
        predicted_products_flat.extend(predicted_products_flat2)
        return outcome, predicted_products_flat
    else:
        outcome = 0
        predicted_products_flat.extend(predicted_products_flat2)
        return outcome, predicted_products_flat

def check_smarts(df: pd.DataFrame, rxn_smarts: str):
    success = 0
    failure = 0
    total = len(df)
    for index, row in df.iterrows():
        reactants = row['reactants']
        product = [row['product']]
        print('ROW:', index)
        outcome = check_rxn_template_works(rxn_smarts, reactants, product, verbose=True)
        if outcome[0] != 1:
            failure += 1
            if outcome[0] == -1:
                print('FAILED: THE PREDICTED PRODUCT(S) IS/ARE WRONG')
                for i in outcome[1]:
                    display(Chem.MolFromSmiles(i))
            if outcome[0] == 0:
                print('FAILED: THE REACTION SMARTS IS NOT APPLICABLE TO THE PROVIDED REACTANTS')
            print('REACTANT(S) BELOW:')
            display(Chem.MolFromSmiles(reactants[0]))
            display(Chem.MolFromSmiles(reactants[1]))
            print('PROPER PRODUCT(S) BELOW')
            products = [Chem.MolFromSmiles(i) for i in product]
            for i in products:
                display(i)
        else:
            success += 1
            print('SUCCESS: THE PREDICTED PRODUCT(S) IS/ARE RIGHT')
            for i in outcome[1]:
                display(Chem.MolFromSmiles(i))
    print(f'SUCCESS: {success} / {total} ({(success/len(df)*100)}%)')
    print(f'FAILURE: {failure} / {total} ({failure/len(df)*100}%)')
check_smarts(df, first_reductive_amination_mapping)

Etc…

These results mean the reaction template only worked for one example (the one it was originally generated for) and did not work for any other examples 😥. While not a great result, this is most likely due to its high specificity.

After some manual editing using the helpful SMARTS visualization tool SMARTS PLUS, made by the University of Hamburg, we have a more general reductive amination template.

first_reductive_amination = 'O=[CH;D2;+0:1]-[c:2].[#7;a:3]:[c:4](-[NH2;D1;+0:5]):[c:6](:[#7;a:7]):[#7;a:8]>>[#7;a:3]:[c:4](-[NH;D2;+0:5]-[CH2;D2;+0:1]-[c:2]):[c:6](:[#7;a:7]):[#7;a:8]'

general_reductive_amination_mapping = '[#6:2](=[#8])(-[#6:1]).[#7;H2,H1:3]>>[#6:2](-[#6:1])-[#7:3]'
check_smarts(df, general_reductive_amination_mapping)

Success!!

Acknowledgements

Please check out these tools that were indespensible for this!

Author