Molecular conformation generation with a DL-based force field

Deep learning (DL) methods in structural modelling are outcompeting force fields because they overcome the two main limitations to force fields methods – the prohibitively large search space for large systems and the limited accuracy of the description of the physics [4].

However, the two methods are also compatible. DL methods are helping to close the gap between the applications of force fields and ab initio methods [3]. The advantage of DL-based force fields is that the functional form does not have to be specified explicitly and much more accurate. Say goodbye to the 12-6 potential function.

In principle DL-based force fields can be applied anywhere where regular force fields have been applied, for example conformation generation [2]. The flip-side of DL-based methods commonly is poor generalization but it seems that force fields, when properly trained, generalize well. ANI trained on molecules with up to 8 heavy atoms is able to generalize to molecules with up to 54 atoms [1]. Excitingly for my research, ANI-2 [2] can replace UFF or MMFF as the energy minimization step for conformation generation in RDKit [5].

So let’s use Auto3D [2] to generated low energy conformations for the four molecules caffeine, Ibuprofen, an experimental hybrid peptide, and Imatinib:

CN1C=NC2=C1C(=O)N(C(=O)N2C)C CFF
CC(C)Cc1ccc(cc1)C(C)C(O)=O IBP
Cc1ccccc1CNC(=O)[C@@H]2C(SCN2C(=O)[C@H]([C@H](Cc3ccccc3)NC(=O)c4cccc(c4C)O)O)(C)C JE2
Cc1ccc(cc1Nc2nccc(n2)c3cccnc3)NC(=O)c4ccc(cc4)CN5CCN(CC5)C STI

The first attempt was sobering. UFF does work on my laptop in seconds but without a GPU, auto3d reports it would take hours to finish – I stopped the program after 28 minutes.

~/Projects/auto3d > python auto3D.py "example/files/mymols.smi" --use_gpu=False --k=1

         _              _             _____   ____  
        / \     _   _  | |_    ___   |___ /  |  _ \ 
       / _ \   | | | | | __|  / _ \    |_ \  | | | |
      / ___ \  | |_| | | |_  | (_) |  ___) | | |_| |
     /_/   \_\  \__,_|  \__|  \___/  |____/  |____/  2.0
        // Automatic generation of the low-energy 3D structures                                      
    
Checking input file...
        There are 4 SMILES in the input file example/files/mymols.smi. 
        All SMILES and IDs are valid.
Suggestions for choosing isomer_engine and optimizing_engine: 
        Isomer engine options: RDKit and Omega.
        Optimizing engine options: ANI2x, ANI2xt and AIMNET.
The available memory is 16 GB.
The task will be divided into 1 jobs.
Job1, number of inputs: 4


Isomer generation for job1
Enumerating cis/tran isomers for unspecified double bonds...
Enumerating R/S isomers for unspecified atomic centers...
Removing enantiomers...
Enumerating conformers/rotamers, removing duplicates...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.04it/s]


Optimizing on job1
Preparing for parallel optimizing... (Max optimization steps: 5000)
Total 3D conformers: 81
 10%|█████████▍                                                                                    | 499/5000 [22:21<2:44:24,  2.19s/it]Total 3D structures: 81  Converged: 5   Dropped(Oscillating): 0    Active: 76
 13%|████████████▏                                                                                 | 650/5000 [28:13<2:48:11,  2.32s/it]

Next, I tried on a beefy GPU and the runtime was just about 1 minute.

bash-5.2$ python auto3D.py "example/files/mymols.smi" --k=1

         _              _             _____   ____  
        / \     _   _  | |_    ___   |___ /  |  _ \ 
       / _ \   | | | | | __|  / _ \    |_ \  | | | |
      / ___ \  | |_| | | |_  | (_) |  ___) | | |_| |
     /_/   \_\  \__,_|  \__|  \___/  |____/  |____/  2.0
        // Automatic generation of the low-energy 3D structures                                      
    
Checking input file...
        There are 4 SMILES in the input file example/files/mymols.smi. 
        All SMILES and IDs are valid.
Suggestions for choosing isomer_engine and optimizing_engine: 
        Isomer engine options: RDKit and Omega.
        Optimizing engine options: ANI2x, ANI2xt and AIMNET.
The available memory is 80 GB.
The task will be divided into 1 jobs.
Job1, number of inputs: 4


Isomer generation for job1
Enumerating cis/tran isomers for unspecified double bonds...
Enumerating R/S isomers for unspecified atomic centers...
Removing enantiomers...
Enumerating conformers/rotamers, removing duplicates...
100%|██████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.21it/s]


Optimizing on job1
Preparing for parallel optimizing... (Max optimization steps: 5000)
Total 3D conformers: 84
 10%|██████▊                                                              | 496/5000 [00:11<01:35, 46.92it/s]Total 3D structures: 84  Converged: 3   Dropped(Oscillating): 0    Active: 81
 20%|█████████████▊                                                       | 999/5000 [00:21<01:17, 51.58it/s]Total 3D structures: 84  Converged: 18   Dropped(Oscillating): 0    Active: 66
 30%|████████████████████▎                                               | 1498/5000 [00:29<00:50, 68.81it/s]Total 3D structures: 84  Converged: 38   Dropped(Oscillating): 2    Active: 44
 40%|███████████████████████████▏                                        | 1999/5000 [00:36<00:36, 82.38it/s]Total 3D structures: 84  Converged: 49   Dropped(Oscillating): 3    Active: 32
 50%|█████████████████████████████████▉                                  | 2496/5000 [00:42<00:27, 90.32it/s]Total 3D structures: 84  Converged: 57   Dropped(Oscillating): 5    Active: 22
 60%|████████████████████████████████████████▋                           | 2996/5000 [00:47<00:21, 93.65it/s]Total 3D structures: 84  Converged: 68   Dropped(Oscillating): 8    Active: 8
 70%|███████████████████████████████████████████████▌                    | 3497/5000 [00:52<00:15, 94.99it/s]Total 3D structures: 84  Converged: 70   Dropped(Oscillating): 9    Active: 5
 80%|██████████████████████████████████████████████████████▎             | 3990/5000 [00:57<00:10, 92.39it/s]Total 3D structures: 84  Converged: 72   Dropped(Oscillating): 10    Active: 2
 88%|████████████████████████████████████████████████████████████        | 4420/5000 [01:02<00:08, 70.75it/s]
Optimization finished at step 4421:   Total 3D structures: 84  Converged: 73   Dropped(Oscillating): 11    Active: 0
Beggining to select structures that satisfy the requirements...
Energy unit: Hartree if implicit.
Program running time: 1 minutes

And there we have it: UFF and Auto3D optimized molecule conformations.

Figure: UFF minimized conformations in blue, Auto3D minimized conformations in yellow

We could evaluate the energies of the generated conformations next but staring at the caffeine molecule has made me realize it is coffee break time.

References

[1] J. S. Smith, O. Isayev, and A. E. Roitberg, “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost,” Chem Sci, vol. 8, no. 4, pp. 3192–3203, 2017, doi: 10.1039/C6SC05720A.

[2] Z. Liu, T. Zubatiuk, A. Roitberg, and O. Isayev, “Auto3D: automatic generation of the low-energy 3D structures with ANI neural network potentials,” J Chem Inf Model, vol. 62, no. 22, pp. 5373–5382, Nov. 2022, doi: 10.1021/acs.jcim.2c00817.

[3] O. T. Unke et al., “Machine learning force fields,” Chem Rev, vol. 121, no. 16, pp. 10142–10186, Aug. 2021, doi: 10.1021/acs.chemrev.0c01111.

[4] M. Baek and D. Baker, “Deep learning and protein structure modeling,” Nat Methods, vol. 19, no. 1, pp. 13–14, Jan. 2022, doi: 10.1038/s41592-021-01360-8.

[5] G. Landrum et al., “RDKit Q3 2022 Release.” Zenodo, Feb. 23, 2023. doi: 10.5281/ZENODO.7671152.

Author