ChatGPT can now use RDKit!

All chemistry LLM enthusiasts were treated to a pleasant surprise on Friday when Greg Brockman tweeted that ChatGPT now has access to RDKit. I’ve spent a few hours playing with the updated models and I have summarized some of my findings in this blog.

Assessing the Chemical Intelligence of Large Language Models

I will take this opportunity to advertise that our latest preprint, “Assessing the Chemical Intelligence of Large Language Models”, is now available on arXiv: https://arxiv.org/abs/2505.07735. We evaluated the o3-mini reasoning model and demonstrated that it can reason directly about molecules without access to external tools such as RDKit.

This blog focuses briefly on agentic models, which are the same models as before but with additional access to external tools such as web search or a code interpreter. The latest update integrates RDKit into ChatGPT’s Python environment, enabling the models to write and execute cheminformatics code to address chemistry questions.

First impressions of ChatGPT with RDKit access

My initial impression is that these models don’t suddenly appear “smarter” now that they have access to RDKit. For reasoning tasks like NMR elucidation, the models’ performance seems roughly the same as it was previously (although more work is needed to verify this).

Draw structures from common name

The example in Greg’s tweet shows ChatGPT drawing a molecular structure from the common name. Naturally, I tried replicating this using my favorite drug and was a bit disappointed when both GPT-4o and o4-mini-high entirely hallucinated the structure. It seems these models tried to recall the SMILES string of sildenafil from their training alone, resulting in the hallucination.

The o3 model was successful at this task. It appears the model automatically used web search to find the correct SMILES and used this in the answer. This is consistent with the expectations of an agentic reasoning model, although the model did reason for nearly 2 minutes, which makes this task very expensive to run (my estimate is around £0.60).

Prompt: Draw the structure of sildenafil using RDKit

[Figure: structures returned by GPT-4o, o4-mini-high, and o3]

However, one does not need to use the expensive reasoning models to get the correct structure. Since GPT-4o has web search enabled, I prompted the model to first “search for the SMILES string of sildenafil”, then “use RDKit to visualize this molecule”. The model gave both responses far quicker than the o3 model, and the structure was correct.
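The "look up the SMILES, then draw it" workflow can be sketched in a few lines of RDKit. This is a minimal sketch rather than what ChatGPT actually ran: the SMILES string below was typed in for illustration (in practice you would paste the one returned by the search), and the check against sildenafil's molecular formula, C22H30N6O4S, is my own addition. A formula match catches many hallucinated strings, but it does not prove the connectivity is right.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem.Draw import rdMolDraw2D

# Assumption: SMILES written out by hand for illustration; replace it with
# the string returned by the web search.
SILDENAFIL_SMILES = (
    "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"
)
EXPECTED_FORMULA = "C22H30N6O4S"  # molecular formula of sildenafil

mol = Chem.MolFromSmiles(SILDENAFIL_SMILES)
if mol is None:
    raise ValueError("RDKit could not parse the SMILES string")

# Cheap sanity check before trusting the drawing: compare molecular formulas.
formula = rdMolDescriptors.CalcMolFormula(mol)
if formula != EXPECTED_FORMULA:
    raise ValueError(f"Formula mismatch: got {formula}")

# Render to SVG text, which can be saved to a file or shown in a notebook.
drawer = rdMolDraw2D.MolDraw2DSVG(400, 300)
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()
print(formula, len(svg), "characters of SVG")
```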

Draw a reaction scheme

In all honesty, I don’t know how you draw a reaction scheme using RDKit; but I do know how you synthesize aspirin. I asked o3 to “Use RDKit to draw the Aspirin reaction scheme” and was given a lovely figure:

Of course, this question can be solved with a simple lookup. I followed up with a slightly more complicated prompt testing whether the model understands the molecules it is talking about:

Prompt: “In this scheme I want you to: (1) Para to the hydroxyl group add a chlorine (2) Para to the acid add a bromine (3) Rather than create the methyl ester, I want you to make the ethyl ester.”

There is an obvious error in the placement of the chlorine – placing it meta to the hydroxyl rather than para. Everything else in the answer was correct (or at least what I was expecting). Any astute chemist will have hopefully noticed the error in my prompt, with the acetyl group being incorrectly referred to as a methyl ester. Pleasingly, o3 was able to recognize this error and respond appropriately in the reasoning excerpt:

Reasoning excerpt: “[The user] may be confusing “methyl ester” with acetylation; the original scheme adds an acetyl group (CH3CO-) to the phenolic OH, not a methyl ester. The user likely wants to replace this with an ethyl (propionyl) group instead of the acetyl group.”

The model recognized my intention and correctly changed the (1) reagent, (2) product, and (3) by-product so that all three were consistent with the modification I intended.
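For anyone else wondering how a reaction scheme can be drawn with RDKit, here is one possible approach. It is a minimal sketch for the unmodified aspirin synthesis (salicylic acid plus acetic anhydride giving aspirin and acetic acid), not a reconstruction of the code o3 wrote. The SMILES strings are typed in by hand, and the heavy-atom balance check is an extra sanity test of my own.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import rdMolDraw2D

# Hand-written SMILES for the standard aspirin synthesis.
reactants = ["O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"]   # salicylic acid, acetic anhydride
products = ["CC(=O)Oc1ccccc1C(=O)O", "CC(=O)O"]    # aspirin, acetic acid

# Build a reaction SMILES ("A.B>>C.D") and parse it as SMILES, not SMARTS.
rxn_smiles = ".".join(reactants) + ">>" + ".".join(products)
rxn = AllChem.ReactionFromSmarts(rxn_smiles, useSmiles=True)

def heavy_atoms(smiles_list):
    """Total number of non-hydrogen atoms across a list of SMILES."""
    return sum(Chem.MolFromSmiles(s).GetNumHeavyAtoms() for s in smiles_list)

# A balanced scheme has the same number of heavy atoms on both sides.
balanced = heavy_atoms(reactants) == heavy_atoms(products)

# Draw the whole scheme (reactants, arrow, products) as one SVG.
drawer = rdMolDraw2D.MolDraw2DSVG(900, 250)
drawer.DrawReaction(rxn)
drawer.FinishDrawing()
scheme_svg = drawer.GetDrawingText()
print("balanced:", balanced)
```

Swapping in different reactant and product SMILES is enough to redraw a modified scheme, which makes it easy to check a model's answer by eye.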

NMR elucidation

I asked o3 some NMR elucidation questions. The model didn’t seem substantially more capable than what I have seen previously. Here is an example question from ChemIQ where o3 gets the answer correct:

Prompt
Use RDKit to draw the structure of this molecule:
Formula: C6H14N2O
1H NMR: δ 1.40-1.64 (4H, 1.47 (tt, J = 7.4, 7.3 Hz), 1.58 (quint, J = 7.4 Hz)), 1.86 (3H, s), 2.63 (2H, t, J = 7.3 Hz), 3.17 (2H, t, J = 7.4 Hz).
13C NMR: δ 23.1 (1C, s), 26.9 (1C, s), 27.9 (1C, s), 40.0 (1C, s), 41.0 (1C, s), 179.7 (1C,s).

Response

How the spectral data map onto the structure

| Signal | Assignment | Reasoning |
| --- | --- | --- |
| ¹H NMR 1.86 ppm (s, 3 H) | CH₃–C=O | Acetyl methyl next to carbonyl appears as an isolated singlet. |
| 3.17 ppm (t, 2 H, J ≈ 7.4) | –NH–CH₂–C=O | α-CH₂ to the amide N is deshielded by both N and the carbonyl. |
| 2.63 ppm (t, 2 H, J ≈ 7.3) | CH₂–NH₂ | α-CH₂ to the terminal amine is slightly upfield of the amide α-CH₂. |
| 1.47 ppm (tt, 2 H) & 1.58 ppm (quint, 2 H) | The two internal –CH₂– groups | Coupling pattern (triplet-of-triplets & quintet) matches a –CH₂–CH₂–CH₂–CH₂– spine flanked by two triplet termini. |

| ¹³C NMR δ (ppm) | Assignment |
| --- | --- |
| 179.7 | C=O of the acetamide |
| 41.0 / 40.0 | CH₂ α to N (one beside amide N, one beside amine N) |
| 27.9 / 26.9 | The two internal methylenes |
| 23.1 | CH₃ of the acetyl group |
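An answer like this can be cross-checked against the question with a few RDKit calls. The sketch below assumes the assignments above correspond to N-(4-aminobutyl)acetamide; the SMILES is my own reading of those assignments, not copied from the model's output. It checks the molecular formula, the number of distinct carbon environments (six ¹³C signals were given), and the number of carbon-bound hydrogens (the listed ¹H signals integrate to 4 + 3 + 2 + 2 = 11).

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Assumption: SMILES inferred from the assignments (acetyl CH3, amide NH-CH2,
# two internal CH2 groups, CH2-NH2).
candidate = Chem.MolFromSmiles("CC(=O)NCCCCN")

# 1. The formula should match the one given in the prompt (C6H14N2O).
formula = rdMolDescriptors.CalcMolFormula(candidate)

# 2. Count symmetry-distinct carbons: atoms that share a rank when ties are
#    not broken are topologically equivalent.
ranks = list(Chem.CanonicalRankAtoms(candidate, breakTies=False))
carbon_environments = {
    ranks[atom.GetIdx()]
    for atom in candidate.GetAtoms()
    if atom.GetSymbol() == "C"
}

# 3. Count hydrogens attached to carbon (N-H protons are not listed in the
#    1H data of the prompt).
carbon_bound_h = sum(
    atom.GetTotalNumHs()
    for atom in candidate.GetAtoms()
    if atom.GetSymbol() == "C"
)

print(formula, len(carbon_environments), carbon_bound_h)
```

Passing these checks does not prove the structure is right, since isomers can share all three numbers, but a failure is a quick way to spot a wrong answer.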

Choose something crazy to do with rdkit then do it

Having no sense of creativity, I asked o3 to “choose something crazy to do with rdkit then do it”. The model spent nearly 13 minutes working on the response and gave me a pretty impressive cheminformatics analysis. The model chose a set of ten substituents and enumerated all meta-disubstituted benzenes. The model generated 10 × 9 = 90 molecules, although it missed that half of these are duplicates: swapping the two substituents gives the same molecule, so only 45 are unique.
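The duplication is easy to demonstrate by canonicalizing the enumerated SMILES. This is a sketch under my own assumptions: the ten substituents below are hypothetical stand-ins (I have not reproduced the model's actual list), and `meta_benzene` is a helper written for this example.

```python
from itertools import permutations
from rdkit import Chem

# Hypothetical set of ten substituents, written as SMILES fragments.
substituents = [
    "F", "Cl", "Br", "C", "OC", "N", "O", "C#N", "C(F)(F)F", "[N+](=O)[O-]",
]

def meta_benzene(a, b):
    """SMILES for a benzene ring carrying a and b in a 1,3 (meta) relationship."""
    return f"c1cc({a})cc({b})c1"

# Ordered pairs of two different substituents: 10 x 9 = 90 SMILES strings.
ordered_pairs = list(permutations(substituents, 2))

# Canonical SMILES collapse (a, b) and (b, a) onto the same molecule.
unique_smiles = set()
for a, b in ordered_pairs:
    mol = Chem.MolFromSmiles(meta_benzene(a, b))
    unique_smiles.add(Chem.MolToSmiles(mol))

print(len(ordered_pairs), "enumerated,", len(unique_smiles), "unique")
```

Using `itertools.combinations` instead of `permutations` would avoid generating the duplicates in the first place.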

The model then calculated QED and LogP (what it described as two quick-and-dirty metrics) and plotted this data on a scatter plot.
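Both metrics are single function calls in RDKit. Here is a minimal sketch of that step using two meta-disubstituted benzenes that I picked for illustration (they are not taken from the model's output); the resulting pairs could then be handed to any plotting library.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, QED

# Illustrative molecules chosen for this example.
smiles_list = ["Cc1cccc(Cl)c1", "COc1cccc(F)c1"]

rows = []
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)
    qed_score = QED.qed(mol)        # drug-likeness estimate between 0 and 1
    logp = Crippen.MolLogP(mol)     # Wildman-Crippen logP estimate
    rows.append((smi, qed_score, logp))

for smi, qed_score, logp in rows:
    print(f"{smi:20s} QED = {qed_score:.2f}  logP = {logp:.2f}")
```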

o3’s self-assessment was that this analysis is “chemically bonkers”. I don’t necessarily agree with that claim, but nonetheless I do still find this impressive. From a single open-ended prompt the o3 model generated a “research question”, wrote the cheminformatics scripts, and performed a full analysis. The comparison of LogP and QED is not particularly valuable; however, being able to write and execute cheminformatics code like this is a promising sign that these models might soon be able to do more advanced tasks.


Do something totally different, that’s also amazing, using rdkit.

Continuing with my lack of creativity, I asked o3 to “Do something totally different, that’s also amazing, using rdkit”. The model responded with:

Per-atom hydrophobicity heat-maps for nine famous drugs
* Red circles ➜ atoms that increase logP (make the molecule greasier)
* Blue circles ➜ atoms that lower logP (more polar / H-bond friendly)

These pictures come straight from RDKit’s Crippen atom-type parameters: for each atom I summed its individual contribution to the compound’s overall logP, normalised everything, and painted the atom using a red-white-blue gradient. […]
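The per-atom contributions described here can be obtained from RDKit directly. The sketch below is my own illustration of the idea, not the model's script: it uses aspirin as the example molecule, adds explicit hydrogens so that the per-atom values sum to the whole-molecule Crippen logP, and scales the values to the range -1 to 1 as input for a red-white-blue colour map. Note that `_CalcCrippenContribs` has a leading underscore, so it is not part of RDKit's formally public API even though it is widely used.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, rdMolDescriptors

# Aspirin, used here purely as an example molecule.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# Add explicit hydrogens so every atom (including H) gets its own entry.
mol_h = Chem.AddHs(mol)

# Each entry is a (logP contribution, molar refractivity contribution) pair.
contribs = rdMolDescriptors._CalcCrippenContribs(mol_h)
logp_contribs = [logp for logp, _mr in contribs]

# The per-atom values should add up to the whole-molecule Crippen logP.
total = sum(logp_contribs)
reference = Crippen.MolLogP(mol)

# Scale to [-1, 1]: positive values (greasier) map to red, negative to blue.
largest = max(abs(c) for c in logp_contribs)
normalised = [c / largest for c in logp_contribs]

for atom, weight in zip(mol_h.GetAtoms(), normalised):
    if atom.GetSymbol() != "H":
        print(atom.GetIdx(), atom.GetSymbol(), f"{weight:+.2f}")
```

The normalised weights can then be passed to a drawing routine that colours each atom, for example RDKit's similarity-map utilities.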

Again, the model (1) generated an exploratory question, (2) collected some molecules, (3) wrote the cheminformatics analysis scripts, and (4) visualized the results. The final figure is a nice visualization and would easily have taken me a few hours to generate in the days before LLMs.

That said, there are some major inaccuracies in the molecular structures: several do not match their labels. Just as seen earlier with sildenafil, it’s possible o3 tried to recall the SMILES strings of these molecules from memory, leading to hallucinations. Nonetheless, ChatGPT’s ability to generate this entire analysis and figure from a single open-ended prompt is pretty impressive.


Do something totally different again. Make it epic!

Trying to push o3 even further, I asked: “Do something totally different again. Make it epic!” … I believe I just got trolled by ChatGPT:

Behold an EPIC molecular mosaic!
Every tile that forms the letters “EPIC” is a different meta-disubstituted benzene, spun randomly on the canvas. Ninety tiny molecules, one giant shout-out to chemistry. This blends art with cheminformatics: SMILES strings become brush-strokes, RDKit the paintbox.


Conclusion

As a chemist, the updated ChatGPT is certainly fun to play with. It doesn’t seem like these models are inherently smarter now that they have access to RDKit, but they will likely be more helpful to a typical chemistry user. Visualizing molecules in the chat interface is certainly more convenient than dealing with SMILES strings. This blog hasn’t highlighted any specific use cases where this new functionality is particularly useful, but I’m sure they will exist.

If you decide to play with these models, make sure you use the reasoning models like o3 and o4-mini, as these are far more capable than the default GPT-4o.

I’d love to hear from fellow chemists about their experiences with ChatGPT. Please share your interesting use cases, whether highlighting impressive successes or significant mistakes made by the model.

Lastly, if you’ve read this far, I’d like to remind you about our preprint, “Assessing the Chemical Intelligence of Large Language Models”, which you might find interesting: https://arxiv.org/abs/2505.07735

– Nicholas
