Unclear documentation? ChatGPT can help!

The PyMOL Python API is a useful resource for most people doing research in OPIG, whether focussed on antibodies, small molecule drug design or protein folding. However, the documentation is poorly structured and difficult to interpret unless you already understand how the module is organised. In particular, the differences between the PyMOL command line and the API can be unclear, which makes debugging take much longer than you’d like.

While I’m reluctant to continue the recent theme of ChatGPT-related posts, this is a use for ChatGPT that would have been incredibly useful to me when I was first getting to grips with the PyMOL API.

While there are plenty of articles on how ChatGPT can write code for you if you instruct it correctly, ChatGPT can also interpret documentation and generate code examples for particular modules, including PyMOL. For example, if I ask ChatGPT to tell me about the PyMOL API and some of its useful functions, the following is returned:

PyMOL is a powerful open-source molecular visualization tool widely used in structural biology and drug design. While PyMOL provides a user-friendly graphical interface, many researchers prefer to use Python scripts to automate tasks or perform custom analyses. In this blog post, we’ll outline some useful functions for PyMOL that can be used in a Python script.

  1. cmd.load(): This function loads a molecular structure into PyMOL. It takes the path to the file containing the structure as an argument. For example, cmd.load('protein.pdb') will load a PDB file called ‘protein.pdb’.
  2. cmd.select(): This function selects atoms or residues based on a given selection string. For example, cmd.select('ligand', 'resn LIG') will select all atoms with residue name ‘LIG’ and label them as ‘ligand’.
  3. cmd.color(): This function assigns a color to a selection of atoms or residues. It takes the selection and the color name as arguments. For example, cmd.color('green', 'ligand') will color the ‘ligand’ selection green.
  4. cmd.hide(): This function hides a selection of atoms or residues from the PyMOL display. It takes the selection as an argument. For example, cmd.hide('everything', 'ligand') will hide all atoms of the ‘ligand’ selection.
  5. cmd.show(): This function shows a selection of atoms or residues in the PyMOL display. It takes the selection and the representation type as arguments. For example, cmd.show('sticks', 'ligand') will show the ‘ligand’ selection as sticks.
  6. cmd.align(): This function aligns two structures based on a specified selection. It takes two selections and an optional RMSD cutoff value as arguments. For example, cmd.align('reference', 'mobile', cutoff=1.0) will align the ‘mobile’ selection to the ‘reference’ selection with an RMSD cutoff of 1.0 Å.
  7. cmd.center(): This function centers the PyMOL display on a specified selection. It takes the selection as an argument. For example, cmd.center('ligand') will center the display on the ‘ligand’ selection.
  8. cmd.save(): This function saves the current PyMOL display to a file. It takes the file name and file format as arguments. For example, cmd.save('image.png', 'png') will save the PyMOL display as a PNG image.
  9. cmd.extract(): This function creates a new object from a selection of atoms or residues. It takes the selection and the new object name as arguments. For example, cmd.extract('protein', 'chain A') will extract chain A from the loaded structure and label it as ‘protein’.
  10. cmd.super(): This function superimposes two structures based on a specified selection. It takes two selections and an optional RMSD cutoff value as arguments. For example, cmd.super('reference', 'mobile', cutoff=1.0) will superimpose the ‘mobile’ selection to the ‘reference’ selection with an RMSD cutoff of 1.0 Å.
  11. cmd.zoom(): This function zooms the PyMOL display in or out. It takes a zoom factor as an argument. For example, cmd.zoom(2) will zoom the display in by a factor of 2.
  12. cmd.label(): This function labels atoms or residues in the PyMOL
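It’s worth checking each signature against PyMOL itself before relying on a list like this. As far as I can tell from PyMOL’s built-in help, cmd.align() and cmd.super() take the mobile selection first and the target second, cmd.zoom() takes a selection and a buffer distance rather than a zoom factor, and images are written with cmd.png(), while cmd.save() writes structures and sessions. The sketch below strings a few of these calls together under those assumptions. The helper names, the ‘LIG’ residue name and the file paths are placeholders of mine, not part of the API.

```python
def render_ligand_view(cmd, structure_path, image_path, ligand_resn="LIG"):
    """Load a structure, highlight one ligand and write a PNG.

    `cmd` is the PyMOL command module, e.g. `from pymol import cmd`.
    The helper name and default residue name are illustrative only.
    """
    cmd.load(structure_path)
    # Named selection: cmd.select(name, selection_expression)
    cmd.select("ligand", f"resn {ligand_resn}")
    # Representation comes first; with no selection given, everything is affected
    cmd.hide("everything")
    cmd.show("cartoon")
    cmd.show("sticks", "ligand")
    # Colour first, selection second
    cmd.color("green", "ligand")
    # Fit the view to the ligand, leaving a 5 Angstrom buffer around it
    cmd.zoom("ligand", 5)
    # Ray-trace the scene and write it out as an image
    cmd.png(image_path, ray=1)


def superpose(cmd, mobile, target):
    """Align `mobile` onto `target` and return the RMSD after refinement."""
    result = cmd.align(mobile, target)
    # cmd.align returns a list whose first element is the refined RMSD
    return result[0]
```

Inside a PyMOL session or a script that has run `from pymol import cmd`, this would be called as `render_ligand_view(cmd, 'protein.pdb', 'image.png')`.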

ChatGPT’s list covers most of the basic functions you are likely to need, in a much more digestible format than the original documentation. Additionally, if you’d like to do something specific, you can ask ChatGPT and it will give you a possible solution. For example, if I ask how to get a list of residues and their corresponding numbers, ChatGPT returns the following:

You can use the PyMOL API function cmd.iterate() to iterate over all residues in your protein and extract their residue numbers and names.

Here’s an example Python script that demonstrates this:

import pymol

# Start PyMOL and load the protein structure
pymol.finish_launching()
cmd = pymol.cmd
cmd.load('protein.pdb')

# Create empty lists for residue numbers and names
residue_numbers = []
residue_names = []

# Iterate over all residues and extract their numbers and names
cmd.iterate('all', 'resi.append(resi_number); resn.append(resi_name)', space=locals())

# Print the list of residue numbers and names
for i in range(len(residue_numbers)):
    print(f'Residue {residue_numbers[i]}: {residue_names[i]}')

This script first starts PyMOL and loads the protein structure from a PDB file. It then creates empty lists for the residue numbers and names. The cmd.iterate() function is then used to iterate over all residues in the protein, and for each residue, the residue number and name are appended to the respective lists.

Finally, the script prints the list of residue numbers and names using a for loop.

Note that in the cmd.iterate() function, 'all' specifies that all residues in the protein should be iterated over, and 'resi.append(resi_number); resn.append(resi_name)' specifies the actions to be taken for each residue. resi_number and resi_name are temporary variables that are defined in the locals() namespace and are used to store the residue number and name for each iteration.
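As returned, the script above would not run as far as I can tell. Inside a cmd.iterate() expression, resi and resn are per-atom properties supplied by PyMOL (both strings), so calling .append on them fails, and resi_number and resi_name are never defined anywhere. The lists residue_numbers and residue_names are also never touched by the expression. cmd.iterate() also visits atoms rather than residues, so selecting 'all' would give one entry per atom. A minimal corrected sketch, assuming one alpha carbon per residue is a good enough proxy for "one residue" (the helper name list_residues is mine):

```python
def list_residues(cmd, selection="name CA"):
    """Return (chain, resi, resn) tuples for every atom in `selection`.

    `cmd` is the PyMOL command module, e.g. `from pymol import cmd`.
    The default selection keeps only atoms named CA, which gives one
    entry per residue for ordinary protein chains (an assumption; an
    ion named CA would also match).
    """
    residues = []
    # `chain`, `resi` and `resn` are atom properties provided by PyMOL;
    # `residues` is found through the `space` dictionary.
    cmd.iterate(
        selection,
        "residues.append((chain, resi, resn))",
        space={"residues": residues},
    )
    return residues


def main():
    # Imported here so the helper above can be read without PyMOL installed
    from pymol import cmd

    cmd.load("protein.pdb")  # same placeholder file name as in the example
    for chain, resi, resn in list_residues(cmd):
        # resi is a string because it can carry an insertion code
        print(f"Residue {chain}/{resi}: {resn}")
```

Passing an explicit dictionary as space, rather than locals(), makes it obvious which Python names the expression is allowed to see.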

These are just examples of how ChatGPT can read the documentation for you, helping you find useful functions and write code without first spending lots of time understanding the module.

Of course, ChatGPT should always be used with caution, as it often produces inaccurate or incomplete information, but it can provide a great starting point for understanding a new field, particularly where the available documentation is difficult to interpret.
