Crystallographic programming: Super short tour of the cctbx

Two of the leading packages in crystallography are Phenix and CCP4. Most practicing crystallographers will interact with these to take a single crystallographic dataset from diffraction images through integration, merging, phasing, model building and, hopefully, deposition.

However, if you want to develop crystallographic software, you will likely need to decide on a framework to build upon. Phenix is built on the comprehensive cctbx library, whereas CCP4 programs are typically standalone, although common crystallographic libraries such as clipper and cctbx are utilised.

CCTBX is written mainly in Python, with core crystallographic functionality written in C++. My usual starting place for understanding functionality is the PDB parser tutorial. This introduces the concept of a hierarchy, a nested representation of a macromolecule that you iterate through level by level:

from iotbx.pdb import hierarchy

# Read the model; the hierarchy nests model > chain > residue group > atom group > atom.
pdb_in = hierarchy.input(file_name="model.pdb")
for chain in pdb_in.hierarchy.only_model().chains():
  for residue_group in chain.residue_groups():
    for atom_group in residue_group.atom_groups():
      for atom in atom_group.atoms():
        # Remove every zinc atom.
        if atom.element.strip().upper() == "ZN":
          atom_group.remove_atom(atom)
      # Prune containers that are left empty, working from the bottom up.
      if atom_group.atoms_size() == 0:
        residue_group.remove_atom_group(atom_group)
    if residue_group.atom_groups_size() == 0:
      chain.remove_residue_group(residue_group)

# Write the edited model, keeping the original crystal symmetry.
with open("model_Zn_free.pdb", "w") as f:
  f.write(pdb_in.hierarchy.as_pdb_string(
    crystal_symmetry=pdb_in.input.crystal_symmetry()))
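
To make the nesting easier to picture without a cctbx install, here is a minimal pure-Python sketch of the same idea. It is not the cctbx API: the classes Atom, AtomGroup, ResidueGroup and Chain and the function strip_element are hypothetical stand-ins I made up for illustration, and the real hierarchy objects carry far more information (alternate conformations, coordinates, B-factors and so on). The sketch only mirrors the pattern above: remove matching atoms, then prune any container left empty.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for the levels of the hierarchy (not cctbx classes).
@dataclass
class Atom:
    name: str
    element: str

@dataclass
class AtomGroup:
    resname: str
    atoms: List[Atom] = field(default_factory=list)

@dataclass
class ResidueGroup:
    resseq: int
    atom_groups: List[AtomGroup] = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residue_groups: List[ResidueGroup] = field(default_factory=list)

def strip_element(chains, element):
    """Remove atoms of one element, then prune emptied containers bottom-up."""
    target = element.strip().upper()
    for chain in chains:
        for rg in chain.residue_groups:
            for ag in rg.atom_groups:
                # Keep only atoms that are not the target element.
                ag.atoms = [a for a in ag.atoms
                            if a.element.strip().upper() != target]
            # Drop atom groups that no longer hold any atoms.
            rg.atom_groups = [ag for ag in rg.atom_groups if ag.atoms]
        # Drop residue groups that no longer hold any atom groups.
        chain.residue_groups = [rg for rg in chain.residue_groups
                                if rg.atom_groups]
    return chains

# A toy chain: one alanine (two atoms only, for brevity) and one zinc ion.
chains = [Chain("A", [
    ResidueGroup(1, [AtomGroup("ALA", [Atom("N", "N"), Atom("CA", "C")])]),
    ResidueGroup(201, [AtomGroup("ZN", [Atom("ZN", "Zn")])]),
])]
strip_element(chains, "ZN")
remaining = [(rg.resseq, ag.resname)
             for c in chains
             for rg in c.residue_groups
             for ag in rg.atom_groups]
```

After the call, remaining holds only the alanine entry, because the zinc atom group and its residue group were both emptied and pruned. As in the cctbx snippet, an emptied chain is not removed.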

Although there are many ways to parse a PDB file, the introduction to iotbx.pdb gives a view of how X-ray structure data can be associated with the model. The tour of the cctbx can be a helpful starting place, especially for understanding how the Python and C++ functionality interact through Boost.Python and scitbx.array_family.flex. Unfortunately, documentation on cctbx tends to vary in quality and quantity across the modules.

Other components of the library include simtbx, for simulating crystallographic data, and tools for processing XFEL data.

As the library is open source, the source code hosted on GitHub lets you explore previously written routines, which can be very helpful for understanding the inner workings of the library. Note that there are also bulletin boards for users and developers of Phenix and cctbx respectively. A few tutorials can also be found.

Hopefully this post will remind someone other than me where to find resources to get started developing within CCTBX.