Are you curious about how scientists use computational tools to design small molecules that treat disease, but the words RDKit, docking, and QED mean nothing to you? These interactive tutorials teach the fundamentals of computational small molecule drug design by introducing the key tools, concepts, and workflows. You will go from generating compounds to evaluating their drug-likeness and binding potential, and by the end you’ll be ready to explore how computational methods can lead to the discovery of your very own (virtual) drug candidates to cure Zika!
Find the materials here: https://github.com/oxpig/dtc-struc-bio-smolecules/tree/main
These materials are designed to help you learn the basics of small molecule drug design. Small molecules are compounds smaller than proteins, typically containing around 20–70 heavy (non-hydrogen) atoms. Using structure-based techniques, you will explore key concepts through a series of interactive Jupyter notebooks that cover:
- Introducing RDKit (Landrum et al., 2025)
- Calculating molecular properties
- Using reinforcement learning in REINVENT (Löffler et al., 2024) to generate compounds with optimal drug-like properties
- Visualizing an experimentally resolved compound bound to a protein in 3D using py3Dmol (GitHub)
- Generating compounds using LibINVENT (Fialková et al., 2021)
- Thinking about how to filter and prioritize compounds
- Predicting the conformations and binding energies of generated compounds by docking with SMINA (Koes et al., 2013)
- Calculating protein-ligand interactions using ProLIF (Bouysset & Fiorucci, 2021)
- Choosing your own adventure to design your very own small molecule! 🧑‍🔬
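To give a flavour of the filtering and prioritizing step, here is a minimal sketch in plain Python. It is not taken from the notebooks: the `Candidate` class, the `prioritise` function, the compound names, and every property value are hypothetical. The heavy-atom window reuses the 20–70 range mentioned above, the QED cutoff of 0.5 is an arbitrary assumption, and the sketch assumes docking scores are reported as energies where more negative means better predicted binding.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A generated compound with precomputed (here: made-up) properties."""
    name: str
    heavy_atoms: int      # number of non-hydrogen atoms
    qed: float            # drug-likeness score between 0 and 1
    docking_score: float  # assumed: predicted binding energy, more negative = better


def prioritise(candidates, min_heavy=20, max_heavy=70, min_qed=0.5):
    """Drop compounds outside the size window or below the QED cutoff,
    then rank the survivors from best to worst docking score."""
    kept = [
        c for c in candidates
        if min_heavy <= c.heavy_atoms <= max_heavy and c.qed >= min_qed
    ]
    # Ascending sort puts the most negative (best) score first.
    return sorted(kept, key=lambda c: c.docking_score)


# Hypothetical example data, for illustration only.
candidates = [
    Candidate("cmpd_A", heavy_atoms=12, qed=0.45, docking_score=-5.1),
    Candidate("cmpd_B", heavy_atoms=28, qed=0.71, docking_score=-8.3),
    Candidate("cmpd_C", heavy_atoms=35, qed=0.32, docking_score=-9.0),
    Candidate("cmpd_D", heavy_atoms=41, qed=0.64, docking_score=-7.2),
]

ranked = prioritise(candidates)
for c in ranked:
    print(f"{c.name}: QED={c.qed:.2f}, docking score={c.docking_score:.1f}")
```

Note that `cmpd_C` has the best docking score but is filtered out for low drug-likeness, which is exactly the kind of trade-off the tutorials ask you to think about.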
⚠️ Disclaimer
This repository is not a comprehensive guide to small molecule drug design. It reflects the perspectives and tool preferences of the authors and is intended for educational purposes only. These materials were developed with support from the Doctoral Training Centre at the University of Oxford.
P.S. For those still unsure what those three words mean:
- RDKit: One of the most widely used open-source cheminformatics toolkits for molecular representation and manipulation (Landrum et al., 2025). The name comes from Rational Discovery, the company where the toolkit was first developed.
- Docking: A computational method used to predict the preferred conformation and binding energy of a molecule bound to a protein target.
- QED: Quantitative Estimate of Drug-likeness. A metric that quantifies how similar a compound is to known drugs based on several physicochemical properties. Scores range from 0 to 1, and a higher QED means the compound is more drug-like (Bickerton et al., 2012).
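For the curious, QED combines a "desirability" score between 0 and 1 for each of eight properties (MW, ALOGP, HBD, HBA, PSA, ROTB, AROM, ALERTS) into a single number using a geometric mean, optionally weighted. The sketch below shows only that final aggregation step. The function name `qed_from_desirabilities` and the desirability values are hypothetical, and the fitted desirability functions and published weights from Bickerton et al. are not reproduced here.

```python
import math


def qed_from_desirabilities(desirabilities, weights=None):
    """Weighted geometric mean of per-property desirability scores.

    Each desirability must lie in (0, 1]. With no weights given,
    every property counts equally.
    """
    if weights is None:
        weights = [1.0] * len(desirabilities)
    if len(weights) != len(desirabilities):
        raise ValueError("need exactly one weight per desirability")
    if any(not (0.0 < d <= 1.0) for d in desirabilities):
        raise ValueError("desirabilities must be in the interval (0, 1]")
    # exp of the weighted average of logs is the weighted geometric mean.
    log_sum = sum(w * math.log(d) for w, d in zip(weights, desirabilities))
    return math.exp(log_sum / sum(weights))


# Made-up desirabilities for one compound, for illustration only.
example = {
    "MW": 0.80, "ALOGP": 0.90, "HBD": 0.95, "HBA": 0.85,
    "PSA": 0.70, "ROTB": 0.75, "AROM": 0.60, "ALERTS": 0.90,
}
score = qed_from_desirabilities(list(example.values()))
print(f"Unweighted QED-style score: {score:.2f}")
```

Because the mean is geometric, a single very poor property drags the overall score down more than it would in a plain average. In practice you would not write this yourself: RDKit ships an implementation (`rdkit.Chem.QED.qed`) that computes the underlying properties directly from a molecule.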
