Handling electron density in python for fun and profit

No one has ever observed a pdb file in nature. The experimental data we build our protein models from is not quite so nicely parameterised. The vast majority of models are fit into electron density maps, mostly produced from macromolecular crystallography or cryo-EM.

Now admittedly even these electron density maps are not the raw experimental data either, but they’re a lot closer than models. So it’s worth knowing how to handle them.

We’ll be covering a few simple ways to load this data into friendly Python formats, so you can fiddle with it using the standard NumPy-based scientific stack.

Electron density data is primarily served in the .ccp4 format. If you’re coming from crystallography you are also likely to need to work with the reflections, which are primarily stored as .mtz files. There are three main libraries in Python for this:

Library            ccp4             mtz
cctbx              Yes              Yes
GridDataFormats    Yes (limited)    No
clipper            Yes              Yes
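
To make the .ccp4 format a little less mysterious, here is a minimal sketch that decodes the first few fields of a map header using only the standard library. It assumes the usual CCP4/MRC layout (a 1024-byte header of 4-byte words, with the 'MAP ' identifier at word 53) and a little-endian file; real files record their byte order in the machine stamp, which this sketch ignores. The function names and the synthetic header are made up for illustration, and the libraries covered in this post do all of this for you.

```python
import struct

HEADER_BYTES = 1024  # 256 four-byte words


def parse_ccp4_header(raw):
    """Decode a few leading fields of a little-endian CCP4 map header."""
    if len(raw) < HEADER_BYTES:
        raise ValueError("a CCP4 header is 1024 bytes long")
    if raw[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier at word 53")
    # Words 1-10 are 32-bit ints, words 11-16 are 32-bit floats (the cell).
    ints = struct.unpack("<10i", raw[0:40])
    cell = struct.unpack("<6f", raw[40:64])
    axis_order = struct.unpack("<3i", raw[64:76])
    dmin, dmax, dmean = struct.unpack("<3f", raw[76:88])
    (space_group,) = struct.unpack("<i", raw[88:92])
    return {
        "columns_rows_sections": ints[0:3],
        "mode": ints[3],  # 2 means 32-bit float density values
        "start": ints[4:7],
        "sampling": ints[7:10],
        "cell": cell,  # a, b, c, alpha, beta, gamma
        "axis_order": axis_order,
        "min_max_mean": (dmin, dmax, dmean),
        "space_group": space_group,
    }


def make_demo_header():
    """Build a synthetic header so the sketch runs without a map file."""
    raw = bytearray(HEADER_BYTES)
    struct.pack_into("<10i", raw, 0, 4, 5, 6, 2, 0, 0, 0, 4, 5, 6)
    struct.pack_into("<6f", raw, 40, 10.0, 12.5, 15.0, 90.0, 90.0, 90.0)
    struct.pack_into("<3i", raw, 64, 1, 2, 3)
    struct.pack_into("<3f", raw, 76, -1.0, 2.0, 0.5)
    struct.pack_into("<i", raw, 88, 19)
    raw[208:212] = b"MAP "
    return bytes(raw)


header = parse_ccp4_header(make_demo_header())
print(header["columns_rows_sections"], header["space_group"])
```

For a real file you would pass in the first 1024 bytes read in binary mode instead of the synthetic header.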

CCTBX

CCTBX is the oldest and most feature-complete library for dealing with electron density data. It can do everything you want, but getting it to do so is going to be hard. If you want to get any good at it, there’s going to be a lot of emailing the bulletin boards for help.

Pros:
Oldest and most complete crystallographic library

Cons:
Hard to install
Hard to use
Python 2.7 only
Needs special version of python incompatible with many other libraries
Functionally no documentation: you basically need to email the author if you want to know how something works (or even what its arguments are)

Installing cctbx can be… non-trivial, and beyond the scope of this tutorial.

Loading a ccp4 map in cctbx is relatively simple:

from iotbx.file_reader import any_file

# "file" is the path to a .ccp4 map on disk
f = any_file(file)
xmap = f.file_object  # the parsed map object

GridDataFormats

GridDataFormats is in many ways the opposite of cctbx: only a few years old, well documented and pythonic. Unfortunately it is not very feature complete, lacking any functionality for dealing with .mtz files and offering only limited functionality for .ccp4 files.

Pros:
Easy to install
Easy to use: pythonic
Good documentation

Cons:
Does not handle symmetry
A little slow
No reflection data

GridDataFormats is also easy to install! Simply:

conda config --add channels conda-forge
conda install griddataformats

And you’re good! Loading a map is similarly intuitive:

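from gridData import Grid

# Grid picks the reader from the file extension (.ccp4 here)
g = Grid(file)
density = g.grid  # the map values as a numpy array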
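
Once a map is loaded, the density values are just a 3D NumPy array (GridDataFormats' Grid objects expose it as the grid attribute), so the usual NumPy tricks apply. As a rough sketch, the snippet below rescales a map to sigma units, the way map contour levels are usually quoted, and counts how much of the grid sits above 1 sigma. The random array is only a stand-in for a real map, and sigma_scale is a made-up helper name rather than anything from a library.

```python
import numpy as np

# Stand-in for a loaded map: a 3D array of density values on a grid.
rng = np.random.default_rng(0)
density = rng.normal(loc=0.2, scale=0.5, size=(8, 8, 8))


def sigma_scale(values):
    """Rescale a map to zero mean and unit standard deviation."""
    return (values - values.mean()) / values.std()


scaled = sigma_scale(density)

# Boolean mask of grid points above a 1 sigma contour level.
above = scaled > 1.0
print("fraction of grid points above 1 sigma:", above.mean())
```

With a real map you would swap the random array for the array your loader gives you; everything after that line stays the same.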

Clipper

Clipper is a C++ crystallographic library with Python bindings.

Pros:
Easy-ish to install
Easy-ish to use
Complete low level functionality for reflections and maps
Very fast and well tested
Excellent C++ docs that are applicable to Python

Cons:
Low level compared to cctbx
Less easy to use than GridDataFormats
Little dedicated Python docs

Clipper is very easy to get if you don’t mind the slightly older SWIG-wrapped version:

pip install clipper-python

Tristan Croll’s pybind11 wrapped clipper is preferable, but requires installing from source from: https://github.com/clipper-python/clipper-python/tree/pybind11

Loading a map in clipper is very straightforward if you are a C++ programmer, but requires a little thinking if you are used to python.

import clipper_python as clipper

xmap = clipper.Xmap()        # empty crystallographic map to fill
f = clipper.CCP4MAPFile()    # reader for the .ccp4 file
f.open_read(file)
f.import_xmap(xmap)          # fills xmap in place, C++ style
f.close_read()
xmap.export_numpy()

Summary: probably just use GridDataFormats or clipper

CCTBX is a real pain to work with: I’d only use it to interface with legacy code I didn’t want to reimplement.

If you don’t need symmetry or electron density map specific functionality, and don’t mind things being a little slow, use GridDataFormats.

Otherwise use Clipper.

Author