1. Introduction
Molecular docking with graph neural networks works by representing the molecules as featurized graphs. In DiffDock, each ligand becomes a graph of atoms (nodes) and bonds (edges), with features assigned to every atom using chemical properties such as atom type, implicit valence and formal charge.
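To make the idea concrete, here is a toy sketch of turning a ligand into node features and an edge list. It is not DiffDock's actual featurizer: the vocabularies, the `Atom` class, and the `safe_index` and `featurize` helpers are all hypothetical names invented for illustration, and the atom properties are typed in by hand rather than read from RDKit.

```python
from dataclasses import dataclass

# Hypothetical vocabularies for illustration only (not DiffDock's real lists).
# The last slot of each list is a catch-all for unseen values.
ATOM_TYPES = ["C", "N", "O", "other"]
IMPLICIT_VALENCES = [0, 1, 2, 3, 4, "other"]
FORMAL_CHARGES = [-1, 0, 1, "other"]


def safe_index(vocab, value):
    """Return the index of value in vocab, or the last ('other') slot if absent."""
    return vocab.index(value) if value in vocab else len(vocab) - 1


@dataclass
class Atom:
    symbol: str
    implicit_valence: int
    formal_charge: int


def featurize(atoms, bonds):
    """Map atoms to categorical feature indices and bonds to directed edges."""
    node_features = [
        [
            safe_index(ATOM_TYPES, a.symbol),
            safe_index(IMPLICIT_VALENCES, a.implicit_valence),
            safe_index(FORMAL_CHARGES, a.formal_charge),
        ]
        for a in atoms
    ]
    # Bonds are undirected, so store each one in both directions.
    edge_index = [(i, j) for i, j in bonds] + [(j, i) for i, j in bonds]
    return node_features, edge_index


# Ethanol heavy atoms (CH3-CH2-OH); property values are hand-written assumptions.
atoms = [Atom("C", 3, 0), Atom("C", 2, 0), Atom("O", 1, 0)]
bonds = [(0, 1), (1, 2)]
node_features, edge_index = featurize(atoms, bonds)
print(node_features)
print(edge_index)
```

In a real pipeline the per-atom values would come from a cheminformatics toolkit rather than being hard-coded, which is exactly why the toolkit version matters: if it starts reporting a different implicit valence for the same atom, the model sees a different feature index.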
We recently discovered that a change in RDKit versions significantly reduces performance on the PoseBusters benchmark, due to changes in the “implicit valence” feature. This post walks through:
- How DiffDock featurises ligands
- What happened when we upgraded RDKit 2022.03.3 → 2025.03.1
- Why training with zero-only features and testing on non-zero features is so bad
TL;DR: Use the dependencies listed in the environment.yml file, especially in the case of DiffDock, or your performance could halve!
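If you want a quick sanity check before running inference, something like the following compares an installed package against the version you expect. This is a minimal sketch: `parse_version` and `check_pin` are hypothetical helpers, the distribution name `rdkit` is an assumption about how the package was installed, and `2022.03.3` is simply the older version named in this post, so substitute whatever your environment.yml actually pins.

```python
from importlib import metadata


def parse_version(version):
    """Keep only the numeric dot-separated parts, so '2022.03.3' equals '2022.3.3'."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def check_pin(package, expected):
    """Return a short status line comparing the installed version to the expected one."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed (expected {expected})"
    if parse_version(installed) != parse_version(expected):
        return f"{package}: found {installed}, expected {expected}"
    return f"{package}: OK ({installed})"


# Assumed package name and version; adjust to match your environment.yml.
print(check_pin("rdkit", "2022.03.3"))
```

Recreating the environment directly from environment.yml remains the safer route; a check like this only catches drift after the fact.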