{"id":12852,"date":"2025-08-08T17:38:11","date_gmt":"2025-08-08T16:38:11","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=12852"},"modified":"2025-08-08T17:50:19","modified_gmt":"2025-08-08T16:50:19","slug":"gpt-5-achieves-state-of-the-art-chemical-intelligence","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2025\/08\/gpt-5-achieves-state-of-the-art-chemical-intelligence\/","title":{"rendered":"GPT-5 achieves state-of-the-art chemical intelligence"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I have run ChemIQ (our chemical reasoning benchmark) on GPT-5. The model achieves state-of-the-art performance with substantial improvements in the ability to interpret SMILES strings. Read my analysis and initial findings below. Scroll to the end for some cool demos. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"500\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=625%2C500&#038;ssl=1\" alt=\"\" class=\"wp-image-12856\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=1024%2C819&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=300%2C240&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=768%2C614&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=1536%2C1229&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?resize=624%2C499&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?w=1950&amp;ssl=1 1950w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?w=1250&amp;ssl=1 1250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_bar.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em><strong>Figure 1: <\/strong>Success rates for each model on the ChemIQ reasoning benchmark. Horizontal brackets between adjacent bars indicate the result of a two-tailed McNemar\u2019s test comparing paired outcomes for the same questions. Significance levels are shown as: <code>n.s.<\/code> (not significant, p \u2265 0.05), <code>*<\/code> (p &lt; 0.05), <code>**<\/code> (p &lt; 0.01), and <code>***<\/code> (p &lt; 0.001). <\/em><\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">ChemIQ is a chemical reasoning benchmark<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In May we released our large language model (LLM) chemical reasoning benchmark, which assesses whether LLMs understand the structure of molecules. The benchmark includes a range of questions, starting with easy counting tasks and progressing to very difficult tasks such as structure elucidation from 2D NMR data. In July we released an updated preprint in which we benchmarked additional reasoning models and improved our NMR questions. You can read the paper on arXiv here: Assessing the Chemical Intelligence of Large Language Models (<a href=\"https:\/\/arxiv.org\/abs\/2505.07735\">https:\/\/arxiv.org\/abs\/2505.07735<\/a>).<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">GPT-5 reaches SOTA performance on ChemIQ<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">I ran our benchmark within 2 minutes of gaining access to GPT-5. The headline result is that GPT-5 achieves state-of-the-art (SOTA) performance on ChemIQ, scoring 70.2% and exceeding the previous best models by ~14%. 
It is important to highlight that our benchmark is designed to test the base LLM without tool use &#8211; in each case the LLM has to work through the question step-by-step without the aid of tools or code interpreters.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">GPT-5 is not stupid<\/h1>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"533\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=625%2C533&#038;ssl=1\" alt=\"\" class=\"wp-image-12858\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=1024%2C874&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=300%2C256&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=768%2C656&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=1536%2C1311&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=2048%2C1749&amp;ssl=1 2048w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?resize=624%2C533&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?w=1250&amp;ssl=1 1250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ChemIQ_radar-1.png?w=1875&amp;ssl=1 1875w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><em>Figure 2: <\/em><\/strong><em>Radar plot showing model performance by question subcategory. 
Additional model results can be found in the ChemIQ preprint (<a href=\"https:\/\/arxiv.org\/pdf\/2505.07735\">https:\/\/arxiv.org\/pdf\/2505.07735<\/a>) <\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When presenting my research, I always use the line &#8220;<em>large language models are extremely smart, but also really stupid<\/em>&#8221;. I say this because LLMs struggle with apparently trivial tasks such as counting how many letters are in a word (or, in chemistry, carbon counting), yet can also do really impressive tasks like NMR elucidation. I need to update my research presentations now though: in our benchmark, GPT-5 was essentially perfect at the carbon counting, ring counting, shortest path, and Free-Wilson analysis tasks. <strong>It appears GPT-5 rarely makes stupid mistakes. <\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The greatest surprise to me is that GPT-5 achieved 99% on the shortest path tasks. These questions are genuinely difficult: the model has to interpret the graph structure of the molecule from the SMILES string and then run a path-finding algorithm to find the shortest path between two points. Previously, o3-mini struggled with this task when given a randomized SMILES representation (for this task, canonical SMILES is substantially easier to solve). In these results, GPT-5 made only a single mistake across all 108 shortest path questions.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Gemini 2.5 Pro is still best at NMR elucidation<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">My favorite subcategory in our benchmark is the 2D NMR elucidation questions. As a chemist, the concept of an LLM being able to solve the structure of a molecule from NMR data is mind-blowing (and I think most chemists feel the same way). The results of our test show that GPT-5 has not had a substantial gain in its NMR elucidation capabilities, and Gemini 2.5 Pro is still in the lead. 
Specifically, on the 2D NMR subset, both o3-mini and GPT-5 scored 6% (3\/50), whereas Gemini 2.5 Pro scored 20% (10\/50). I would have expected GPT-5 to do much better at these questions; I will try to investigate this further.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Time for ChemIQ-2?<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">When I started working on reasoning models in January, I anticipated that the performance of LLMs in chemistry would improve rapidly over the course of the year. My supervisors and I expected ChemIQ to be saturated by the end of the year. While GPT-5 only reaches 70% overall success, it has achieved essentially perfect performance on four out of eight question categories. Don&#8217;t worry though, we anticipated this would happen. The questions in ChemIQ are algorithmically generated, meaning we can quickly create a new set of even harder questions when needed. I also have a few more novel benchmark questions that we didn&#8217;t include in the original paper, which I might include in ChemIQ-2.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Demo 1: Interactive kinetics and titration dashboard<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">This is kinda crazy. I gave the model the prompt &#8220;<em>Create an impressive chemistry demo that showcases your capabilities. I need to embed this in our wordpress blog&#8221;<\/em>. In one go, GPT-5 generated an interactive web app (embedded below). You can enter Reaction Kinetics or Acid\u2013Base Titration parameters and the tool will plot the expected curves. 
(I didn&#8217;t do any prompt optimization, I didn&#8217;t try multiple times, this is what came out first try.)<\/p>\n\n\n<div id=\"chemlab-embed\" class=\"chemlab\">\n  <div class=\"wrap\" role=\"region\" aria-label=\"ChemLab Interactive\">\n    <header>\n      <h1>\ud83d\udd2c ChemLab Interactive<\/h1>\n      <span class=\"tag\" aria-label=\"demo tag\">Kinetics \u2022 Titration<\/span>\n    <\/header>\n\n    <nav class=\"tabs\" role=\"tablist\" aria-label=\"ChemLab Tabs\">\n      <button class=\"tab-btn\" role=\"tab\" id=\"chem-tab-kin\" aria-controls=\"chem-panel-kin\" aria-selected=\"true\">Reaction Kinetics<\/button>\n      <button class=\"tab-btn\" role=\"tab\" id=\"chem-tab-titr\" aria-controls=\"chem-panel-titr\" aria-selected=\"false\">Acid\u2013Base Titration<\/button>\n    <\/nav>\n\n    <section id=\"chem-panel-kin\" class=\"panel\" role=\"tabpanel\" aria-labelledby=\"chem-tab-kin\" aria-hidden=\"false\">\n      <div class=\"grid\">\n        <div class=\"card\" aria-live=\"polite\">\n          <h3>Arrhenius & First-Order Decay<\/h3>\n          <div class=\"row\">\n            <label>Temperature (\u00b0C)\n              <input id=\"kin-T\" type=\"number\" value=\"25\" step=\"1\" \/>\n            <\/label>\n            <input id=\"kin-T-range\" type=\"range\" min=\"-20\" max=\"200\" value=\"25\" \/>\n          <\/div>\n          <div class=\"row\">\n            <label>Activation Energy Ea (kJ\/mol)\n              <input id=\"kin-Ea\" type=\"number\" value=\"50\" step=\"1\" \/>\n            <\/label>\n            <input id=\"kin-Ea-range\" type=\"range\" min=\"10\" max=\"120\" value=\"50\" \/>\n          <\/div>\n          <div class=\"row\">\n            <label>Pre\u2011exponential A (s\u207b\u00b9)\n              <input id=\"kin-A\" type=\"number\" value=\"1e13\" step=\"1e12\" \/>\n            <\/label>\n            <input id=\"kin-A-range\" type=\"range\" min=\"1e9\" max=\"1e14\" value=\"1e13\" \/>\n          <\/div>\n          <div class=\"row\">\n    
        <label>Initial [A]\u2080 (mol\u00b7L\u207b\u00b9)\n              <input id=\"kin-A0\" type=\"number\" value=\"1\" step=\"0.1\" \/>\n            <\/label>\n            <input id=\"kin-A0-range\" type=\"range\" min=\"0.1\" max=\"2\" value=\"1\" step=\"0.1\" \/>\n          <\/div>\n          <label class=\"inline\" for=\"kin-cat\">\n            <span>Apply catalyst (\u0394Ea = \u221220 kJ\u00b7mol\u207b\u00b9)<\/span>\n            <span class=\"switch\"><input id=\"kin-cat\" type=\"checkbox\" \/><span class=\"slider\"><\/span><\/span>\n          <\/label>\n\n          <div class=\"meta\" style=\"margin-top:.5rem\">\n            <div class=\"pill\"><strong>k<\/strong> = <span id=\"kin-k\">\u2013<\/span> s\u207b\u00b9<\/div>\n            <div class=\"pill\"><strong>t<sub>1\/2<\/sub><\/strong> = <span id=\"kin-t12\">\u2013<\/span><\/div>\n            <div class=\"pill\"><strong>t<sub>90%<\/sub><\/strong> = <span id=\"kin-t90\">\u2013<\/span><\/div>\n          <\/div>\n        <\/div>\n\n        <div>\n          <div class=\"card\">\n            <canvas id=\"kin-chart\" aria-label=\"Concentration vs Time\"><\/canvas>\n            <div class=\"caption\">First\u2011order decay: [A](t) = [A]\u2080\u00b7e^(\u2212kt). Time axis scales to ~5 half\u2011lives.<\/div>\n          <\/div>\n          <div class=\"card\" style=\"margin-top:1rem\">\n            <canvas id=\"energy-chart\" aria-label=\"Reaction Energy Profile\"><\/canvas>\n            <div class=\"caption\">Energy profile with and without catalyst. 
Barrier height is Ea.<\/div>\n          <\/div>\n        <\/div>\n      <\/div>\n    <\/section>\n\n    <section id=\"chem-panel-titr\" class=\"panel\" role=\"tabpanel\" aria-labelledby=\"chem-tab-titr\" aria-hidden=\"true\">\n      <div class=\"grid\">\n        <div class=\"card\">\n          <h3>Strong\/Weak Acid vs Strong Base<\/h3>\n          <label>Acid type\n            <select id=\"titr-type\">\n              <option value=\"strong\">Strong acid (e.g., HCl)<\/option>\n              <option value=\"weak\" selected>Weak acid (e.g., CH\u2083COOH)<\/option>\n            <\/select>\n          <\/label>\n          <div class=\"row\">\n            <label>Acid concentration C<sub>a<\/sub> (mol\u00b7L\u207b\u00b9)\n              <input id=\"titr-Ca\" type=\"number\" value=\"0.10\" step=\"0.01\" \/>\n            <\/label>\n            <input id=\"titr-Ca-range\" type=\"range\" min=\"0.01\" max=\"1.0\" step=\"0.01\" value=\"0.10\" \/>\n          <\/div>\n          <div class=\"row\">\n            <label>Acid volume V<sub>a<\/sub> (mL)\n              <input id=\"titr-Va\" type=\"number\" value=\"25\" step=\"1\" \/>\n            <\/label>\n            <input id=\"titr-Va-range\" type=\"range\" min=\"5\" max=\"100\" step=\"1\" value=\"25\" \/>\n          <\/div>\n          <div class=\"row\">\n            <label>Base concentration C<sub>b<\/sub> (mol\u00b7L\u207b\u00b9)\n              <input id=\"titr-Cb\" type=\"number\" value=\"0.10\" step=\"0.01\" \/>\n            <\/label>\n            <input id=\"titr-Cb-range\" type=\"range\" min=\"0.01\" max=\"1.0\" step=\"0.01\" value=\"0.10\" \/>\n          <\/div>\n          <div class=\"row\" id=\"ka-row\">\n            <label>Acid pK<sub>a<\/sub>\n              <input id=\"titr-pKa\" type=\"number\" value=\"4.76\" step=\"0.01\" \/>\n            <\/label>\n            <input id=\"titr-pKa-range\" type=\"range\" min=\"2\" max=\"9\" step=\"0.01\" value=\"4.76\" \/>\n          <\/div>\n          <button class=\"btn\" 
id=\"titr-plot\">Plot curve<\/button>\n\n          <div class=\"meta\" style=\"margin-top:.5rem\">\n            <div class=\"pill\">V<sub>eq<\/sub> = <span id=\"titr-Veq\">\u2013<\/span> mL<\/div>\n            <div class=\"pill\">pH<sub>eq<\/sub> \u2248 <span id=\"titr-pHeq\">\u2013<\/span><\/div>\n            <div class=\"pill\">Suggested indicator: <span id=\"titr-ind\">\u2013<\/span><\/div>\n          <\/div>\n        <\/div>\n\n        <div>\n          <div class=\"card\">\n            <canvas id=\"titr-chart\" aria-label=\"pH vs Volume\"><\/canvas>\n            <div class=\"caption\">Titration curve from 0 to 2\u00d7 equivalence volume. Strong acid \u2192 equivalence at pH 7. Weak acid \u2192 pH > 7 at equivalence due to conjugate base hydrolysis.<\/div>\n          <\/div>\n          <div class=\"card\" style=\"margin-top:1rem\">\n            <div class=\"small\">Tip: hover\/click on the chart to read values. Use the sliders\/inputs to explore scenarios.<\/div>\n          <\/div>\n        <\/div>\n      <\/div>\n    <\/section>\n\n    <div class=\"footer\">Made for education. Assumes ideal behavior (dilute solutions, strong electrolytes). \u00a9 ChemLab Interactive<\/div>\n  <\/div>\n<\/div>\n\n\n\n<h1 class=\"wp-block-heading\">Demo 2: Do something really impressive using RDKit. Create an info graphic using the results.<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">As in my previous blog post, I have very little creativity and have no idea what to ask these models. I used the above prompt and this is what I got (single prompt, first try). Previously this multi-step analysis would have taken a short conversation and iterative prompting; GPT-5 did this all by itself in one go. 
You can click on the info graphic below to see it in full; I have also uploaded the figures separately so they render better.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"392\" height=\"1024\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1.png?resize=392%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-12888\" style=\"width:99px;height:auto\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=392%2C1024&amp;ssl=1 392w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=115%2C300&amp;ssl=1 115w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=768%2C2005&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=588%2C1536&amp;ssl=1 588w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=784%2C2048&amp;ssl=1 784w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?resize=624%2C1629&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/rdkit_infographic-1-scaled.png?w=981&amp;ssl=1 981w\" sizes=\"auto, (max-width: 392px) 100vw, 392px\" \/><\/a><\/figure>\n<\/div>\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"750\" loading=\"lazy\" 
src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=625%2C750&#038;ssl=1\" alt=\"\" class=\"wp-image-12870\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=853%2C1024&amp;ssl=1 853w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=250%2C300&amp;ssl=1 250w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=768%2C922&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=1280%2C1536&amp;ssl=1 1280w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?resize=624%2C749&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/molecule_grid.png?w=1500&amp;ssl=1 1500w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"406\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=625%2C406&#038;ssl=1\" alt=\"\" class=\"wp-image-12871\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=1024%2C666&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=300%2C195&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=768%2C499&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=1536%2C999&amp;ssl=1 1536w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?resize=624%2C406&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?w=1693&amp;ssl=1 1693w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/chemical_space_scatter.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"403\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=625%2C403&#038;ssl=1\" alt=\"\" class=\"wp-image-12872\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=1024%2C661&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=300%2C194&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=768%2C496&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=1536%2C992&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?resize=624%2C403&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?w=1686&amp;ssl=1 1686w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2025\/08\/ro5_violations_bar.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Demo 3: Molecular orbital theory app<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">As my last demo, I tried 
making an app for visualizing molecular orbitals. Unfortunately, the model never managed to visualize 3D molecular orbitals (which is what I really wanted). After a bit of vibe coding, I arrived at this app for simple molecular orbital theory. There are clear errors in the chemistry, and I could iterate further to improve the app, but as a quick, few-prompt, vibe-coded project, this is awesome. (WordPress didn&#8217;t allow rendering of Unicode characters, so GPT-5 tried to draw its own electrons, which look a bit questionable.) <\/p>\n\n\n  \n<div id=\"mo-app\" class=\"mo-wrap\" role=\"application\" aria-label=\"Interactive MO energy diagram app\">\n  <div class=\"header\">\n    <div>\n      <div class=\"title\">Molecular Orbital Theory - Energy Diagrams<\/div>\n    <\/div>\n    <div class=\"legend\">\n      <span class=\"dot sigma\"><\/span> sigma bonding\n      <span class=\"dot pi\"><\/span> pi bonding\n      <span class=\"dot star\"><\/span> sigma*\/pi* antibonding\n    <\/div>\n  <\/div>\n\n  <section class=\"card\">\n    <h2>MO Diagram Builder<\/h2>\n    <div class=\"controls\">\n      <div>\n        <label for=\"molecule\">Molecule<\/label>\n        <select id=\"molecule\">\n          <optgroup label=\"1s diatomics\">\n            <option value=\"H2\">H2<\/option>\n            <option value=\"He2\">He2<\/option>\n          <\/optgroup>\n          <optgroup label=\"2nd period homonuclear\">\n            <option value=\"Li2\">Li2<\/option>\n            <option value=\"Be2\">Be2<\/option>\n            <option value=\"B2\">B2<\/option>\n            <option value=\"C2\">C2<\/option>\n            <option value=\"N2\" selected>N2<\/option>\n            <option value=\"O2\">O2<\/option>\n            <option value=\"F2\">F2<\/option>\n            <option value=\"Ne2\">Ne2<\/option>\n          <\/optgroup>\n        <\/select>\n      <\/div>\n      <div>\n        <label for=\"electrons\">Valence electrons (auto; editable for ions)<\/label>\n        <input 
id=\"electrons\" type=\"number\" min=\"0\" max=\"24\" step=\"1\" \/>\n      <\/div>\n      <div>\n        <label for=\"spin\">Apply Hund's rule<\/label>\n        <select id=\"spin\">\n          <option value=\"on\" selected>On (maximize spins in degenerate pi)<\/option>\n          <option value=\"off\">Off (pair as you go)<\/option>\n        <\/select>\n      <\/div>\n      <div>\n        <label for=\"reset\" class=\"sr-only\">Reset<\/label>\n        <button id=\"reset\" class=\"ghost\" type=\"button\">Reset<\/button>\n      <\/div>\n    <\/div>\n\n    <div class=\"diagram-wrap\" aria-live=\"polite\">\n      <div class=\"axis\" aria-hidden=\"true\"><\/div>\n      <svg id=\"mo-svg\" viewBox=\"0 0 1000 560\" role=\"img\" aria-label=\"Energy level diagram showing atomic and molecular orbitals\"><\/svg>\n    <\/div>\n\n    <div class=\"controls\" style=\"margin-top:.75rem;grid-template-columns:1fr 1fr 1fr\">\n      <div class=\"card\" style=\"padding:.6rem\"><small>Bond order<\/small><strong id=\"bo\">-<\/strong><\/div>\n      <div class=\"card\" style=\"padding:.6rem\"><small>Magnetism<\/small><strong id=\"mag\">-<\/strong><\/div>\n      <div class=\"card\" style=\"padding:.6rem\"><small>Stability<\/small><strong id=\"stab\">-<\/strong><\/div>\n    <\/div>\n  <\/section>\n\n  <section class=\"card readme\">\n    <h2>What an MO energy diagram shows<\/h2>\n    <p><strong>Vertical axis = energy.<\/strong> Left\/right columns are atomic orbitals (AOs); the centre column contains molecular orbitals (MOs) formed by symmetry-matched combinations of AOs.<\/p>\n    <p><strong>Bonding vs antibonding:<\/strong> Bonding MOs are lower in energy than the parent AOs; antibonding MOs are higher. Electrons fill from low to high (Aufbau), one per degenerate pi before pairing (Hund), max two with opposite spins (Pauli).<\/p>\n    <p><strong>2p ordering crossover:<\/strong> For B2\u2013N2, strong 2s\u20132p mixing raises sigma(2p) relative to pi(2p), so pi(2p) lies below sigma(2p). 
For O2\u2013F2, mixing weakens with Z and the order inverts: sigma(2p) below pi(2p). Oxygen's two unpaired electrons in pi*(2p) make O2 paramagnetic.<\/p>\n    <p><strong>Bond order<\/strong> = (bonding e - antibonding e) \/ 2. Any unpaired electrons imply paramagnetism.<\/p>\n    <details class=\"tests\">\n      <summary>Developer tests (click to run)<\/summary>\n      <div style=\"margin:.5rem 0\">\n        <button id=\"run-tests\">Run tests<\/button>\n        <button id=\"fill-expected\">Fill expected results (H2 to Ne2)<\/button>\n      <\/div>\n      <pre id=\"test-out\" aria-live=\"polite\"><\/pre>\n    <\/details>\n  <\/section>\n<\/div>\n\n  \n\n\n\n<h2 class=\"wp-block-heading\">The Pauling Principle &#8211; Can language models understand chemistry?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I was recently a guest on The Pauling Principle podcast, where Javier and I discussed language models for chemistry. You can listen to our chat wherever you listen to podcasts (e.g. Spotify: <a href=\"https:\/\/open.spotify.com\/episode\/44PC0FDk0EPyPXsbGS6ulK?si=43f68c77dc254da1\">https:\/\/open.spotify.com\/episode\/44PC0FDk0EPyPXsbGS6ulK?si=43f68c77dc254da1<\/a>) or on YouTube: <\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"The Pauling Principle Episode 13 - Can language models understand chemistry?\" width=\"625\" height=\"352\" src=\"https:\/\/www.youtube.com\/embed\/POkyfawdoc0?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">Conclusion<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">It has been 2 months since I wrote my previous blog post about ChatGPT having 
access to RDKit (<a href=\"https:\/\/www.blopig.com\/blog\/2025\/06\/chatgpt-can-now-use-rdkit\/\">https:\/\/www.blopig.com\/blog\/2025\/06\/chatgpt-can-now-use-rdkit\/<\/a>). The progress that has been made in such a short time is truly astounding. I can&#8217;t possibly capture all the capabilities of GPT-5 in a single blog post; you really must try it out for yourself. As before, I would love for chemists to try asking these models to do crazy tasks like interpreting their data and generating hypotheses. These models are getting very smart very quickly, and I think they will now be helpful in scientific discovery. Please share your findings with me! I&#8217;m interested to see what others get these models to do. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8211; Nicholas<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have run ChemIQ (our chemical reasoning benchmark) on GPT-5. The model achieves state-of-the-art performance with substantial improvements in the ability to interpret SMILES strings. Read my analysis and initial findings below. Scroll to the end for some cool demos. Figure 1: Success rates for each model on the ChemIQ reasoning benchmark. 
Horizontal brackets between [&hellip;]<\/p>\n","protected":false},"author":133,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[187,29,123,263],"tags":[],"ppma_author":[830],"class_list":["post-12852","post","type-post","status-publish","format-standard","hentry","category-cheminformatics","category-code","category-commentary","category-news"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":830,"user_id":133,"is_guest":0,"slug":"nicholas","display_name":"Nicholas 
Runcie","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/0e750509b2d35cd15a7d1a304722cab7dd4601643e63e604b835d0b3ea14a45a?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Runcie","first_name":"Nicholas","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/133"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=12852"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12852\/revisions"}],"predecessor-version":[{"id":12918,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/12852\/revisions\/12918"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=12852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=12852"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=12852"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=12852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}