In this post I’ll walk through how to set up the CCDC Python API and use the CSD Geometry Analyser to evaluate the geometric quality of molecules from three representative structure-based de novo design models. I’ve put together a small GitHub repo with the full analysis code where we look at bond lengths, angles, torsions, and ring conformations across the three methods, and compare these against their PoseBusters validity scores to see what each metric is really capturing.
First, register. I used my university email just in case.
Once you register, log in and activate your account. You’ll want to visit the License Portal, which you can find under My Account. Here, enter the activation key along with the customer number, both of which you can obtain from your supervisor. Keep a note of these; you’ll need them in the next several steps.
Now, the next part is kind of weird: despite being registered, you have to go and request a download link, because it is not automatically available to you. The download links expire within 24 hours, so be quick.
You will receive an email with a link to download everything (see image below). Here you can choose what you want to download. It is best to click the recommended ‘small download’ for your machine. Keep in mind that even the small download takes up a good chunk of space on your machine. Once you start the download, you can tick and untick what you’d like to download. Personally, I did not download the GUIs and only kept the Python API and datasets, but it’s up to you. In my Master’s days, I quite enjoyed using Hermes.

This YouTube video is helpful if you get confused by the download process.
Environment Setup
The Python API comes with a ready-to-go environment. Depending on where you saved it during the download, its path will resemble something like this:
[...]/CCDC/ccdc-software/csd-python-api/run_csd_python_api
If you click run_csd_python_api, a terminal will open with the environment ready to go. It’s not my favourite way to work, so I write Python scripts in Cursor or VS Code and just activate the env before running them. Be aware that you don’t get RDKit in this env; although this is upsetting, we can just keep a split terminal open with one env in each pane.
# Activate the CSD Python API conda environment
# Don't try to use Conda activate - it won't work
source ~/CCDC/ccdc-software/csd-python-api/miniconda/bin/activate
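Since the CSD env and your RDKit env live side by side, it is easy to run a script in the wrong one. As a quick sanity check, here is a minimal sketch that reports which interpreter is active and whether `ccdc` and `rdkit` can be imported from it. The function name `check_environment` is my own (hypothetical) and only the Python standard library is used.

```python
import importlib.util
import sys


def check_environment(required=("ccdc",), optional=("rdkit",)):
    """Report which top-level packages are importable from this interpreter."""
    report = {"python": sys.executable}
    for name in (*required, *optional):
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run inside the activated CSD env, you would expect `ccdc` to be reported as available and `rdkit` as missing, and the other way round in your RDKit env.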
Geometry Analyser Snippet
The Geometry Analyser works by comparing molecular features against distributions observed in the CSD. The Cambridge Structural Database (CSD) contains over 1.3 million small-molecule crystal structures and forms the knowledge base that Mogul, the CCDC geometry-checking tool behind the Geometry Analyser, draws on when assessing whether a given bond length, angle, torsion, or ring conformation is chemically reasonable. You can find more on the conformer API here and some nice examples here.
Okay, now for some short example code where we set up the geometry analyser and analyse a single molecule.
from ccdc.io import MoleculeReader
from ccdc.conformer import GeometryAnalyser


# simple enough, build the geometry analyser instance
def build_analyser():
    analyser = GeometryAnalyser()
    # MUCH faster: skips generalised CSD searches, but results are less accurate
    # and the unusual fractions may come out higher
    analyser.settings.generalisation = False
    analyser.settings.organometallic_filter = 'Organic'  # organic mols only
    return analyser


# --- try it on a single molecule ---
analyser = build_analyser()

# load the first molecule from an SDF file
mol = MoleculeReader('data/drugflow_extracted/1a2g-A-rec-4jmv-1ly-lig-tt-min-0-pocket10_1a2g-A-rec-4jmv-1ly-lig-tt-min-0.sdf')[0]

# standardise bond types and add hydrogens (recommended before any Mogul analysis)
# this ensures the fragment matching against the CSD is as accurate as possible
mol.assign_bond_types(which='unknown')  # assign any untyped bonds
mol.standardise_aromatic_bonds()        # normalise aromatic bond representations
mol.standardise_delocalised_bonds()     # normalise delocalised systems e.g. carboxylates
mol.add_hydrogens()                     # Mogul needs explicit hydrogens for fragment matching

# run the full geometry analysis; returns the molecule with analysis attributes attached
analysed = analyser.analyse_molecule(mol)

# for each geometry feature, filter to those where Mogul found enough CSD hits
# to make a confident judgement (default thresholds are predefined by Mogul but may be changed)
# then compute the fraction flagged as unusual
for feature_name, features in [
    ('Bond lengths', analysed.analysed_bonds),
    ('Bond angles', analysed.analysed_angles),
    ('Torsions', analysed.analysed_torsions),
    ('Rings', analysed.analysed_rings),
]:
    # exclude features without enough CSD data: too few hits to call them unusual
    valid = [f for f in features if f.enough_hits]
    if not valid:
        print(f"{feature_name}: no features with enough CSD hits")
        continue
    # fraction of valid features flagged as geometrically unusual by Mogul
    unusual_fraction = sum(1 for f in valid if f.unusual) / len(valid)
    print(f"{feature_name}: {unusual_fraction:.2f} unusual ({len(valid)} features checked)")
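If you want to reuse the unusual-fraction logic across many molecules, it helps to pull it out into a small function. The sketch below does that and runs without the CCDC API installed: `FakeFeature` is a hypothetical stand-in I made up that only mimics the `enough_hits` and `unusual` attributes used above, and `unusual_fraction` is my own helper name, not part of `ccdc`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeFeature:
    """Hypothetical stand-in for an analysed bond, angle, torsion or ring."""
    enough_hits: bool
    unusual: bool


def unusual_fraction(features: Iterable) -> Optional[float]:
    """Fraction of features flagged unusual among those with enough CSD hits.

    Returns None when no feature has enough hits, so callers can tell
    "nothing to judge" apart from "nothing unusual" (0.0).
    """
    valid = [f for f in features if f.enough_hits]
    if not valid:
        return None
    return sum(1 for f in valid if f.unusual) / len(valid)


# made-up demo features, not real Mogul output
demo = [
    FakeFeature(enough_hits=True, unusual=False),
    FakeFeature(enough_hits=True, unusual=True),
    FakeFeature(enough_hits=False, unusual=True),  # ignored: too few hits
    FakeFeature(enough_hits=True, unusual=False),
]
print(unusual_fraction(demo))  # one of the three valid features is unusual
print(unusual_fraction([]))    # nothing to judge
```

With the real API you would pass `analysed.analysed_bonds` (or angles, torsions, rings) instead of the demo list. Returning `None` rather than `0.0` for the empty case keeps molecules with no usable features from looking artificially clean.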
Short Analysis
PoseBusters checks the structural validity of generated molecules and is an extremely useful benchmark in our field. Here I complement it with CSD Mogul geometry analysis, which adds a chemistry-aware layer by comparing bond lengths, angles, torsions and ring conformations against experimentally determined crystal structures. Together these two metrics paint a more complete picture of generated molecule quality.
For this short analysis I used one generated molecule per pocket across 100 pockets from the CrossDocked2020 test set, giving 100 molecules per method. The three methods, Pocket2Mol (autoregressive), TargetDiff (diffusion), and DrugFlow (flow matching), were chosen because their architectures represent a natural progression in generative modelling for structure-based drug design. What’s interesting is that the unusual fractions decrease across the architectural progression, with DrugFlow consistently showing the lowest values across all four geometry features.
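To go from per-molecule numbers to one value per method, the per-molecule unusual fractions need to be averaged. Here is a minimal sketch of one way to do that, assuming a molecule with no usable features is recorded as `None` and skipped. The function name `summarise`, the method labels and the numbers are all placeholders of mine, not results from this post, and the repo may aggregate differently.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarise(
    per_molecule: Dict[str, List[Optional[float]]],
) -> Dict[str, Tuple[Optional[float], int]]:
    """Mean unusual fraction per method, plus how many molecules contributed."""
    summary = {}
    for method, fractions in per_molecule.items():
        # skip molecules where no feature had enough CSD hits
        usable = [f for f in fractions if f is not None]
        summary[method] = (mean(usable) if usable else None, len(usable))
    return summary


# toy placeholder values for one geometry feature (e.g. torsions)
toy = {
    "method_a": [0.0, 0.5, None, 0.25],
    "method_b": [None, None],
}
for method, (avg, n) in summarise(toy).items():
    print(method, avg, n)
```

Reporting the count alongside the mean matters here: with only 100 molecules per method, a handful of skipped molecules can noticeably shift the average.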

Now, let’s see if these results are reflected in PoseBusters pass rates:
| Method | PoseBusters Pass Rate (%) |
| --- | --- |
| DrugFlow | 90 |
| Pocket2Mol | 100 |
| TargetDiff | 45 |
We won’t draw any hard conclusions since this is only a fun blog post. It is worth noting that we are comparing two different things. PoseBusters applies hard pass/fail thresholds to individual bond lengths and angles, whereas the CCDC Geometry Analyser takes a knowledge-based approach and flags features that fall outside distributions observed in the CSD, which makes it a softer signal rather than a strict failure criterion. I think it would be interesting to look at how the minimised poses from each method would fare, alongside some minimised and raw crystal structure poses as well!
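To make the hard-threshold versus knowledge-based distinction concrete, here is a toy illustration. It is not how PoseBusters or Mogul are actually implemented: the tolerance, the reference value and the list of "observed" bond lengths are all made up for the example, and a simple z-score stands in for a distribution-based check.

```python
from statistics import mean, stdev


def hard_check(value, reference, tolerance):
    """Pass/fail: is the value within a fixed tolerance of a reference?"""
    return abs(value - reference) <= tolerance


def z_score(value, observations):
    """How many sample standard deviations the value lies from the observed mean."""
    return (value - mean(observations)) / stdev(observations)


# made-up bond lengths in angstroms, purely illustrative
observed = [1.52, 1.53, 1.54, 1.55, 1.53]
generated = 1.60

passes_hard = hard_check(generated, reference=1.54, tolerance=0.2)
z = z_score(generated, observed)

print(f"hard check passes: {passes_hard}")
print(f"z-score vs observed distribution: {z:.1f}")
```

In this toy case the generated bond sits comfortably inside the generous fixed tolerance, yet lies several standard deviations from the narrow observed distribution. That is the kind of disagreement you can expect between a pass rate and an unusual fraction.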
I provide the scripts and the data used to generate the plot above in a GitHub repo: https://github.com/SanazKaz/molgeom-eval
The data for all models was obtained from the DrugFlow GitHub, which linked to: https://zenodo.org/records/14919171
Happy CCDC’ing!
