Category Archives: Python

Advanced PyMOL Visualization for Weighted Structural Ensembles (Part 1): Ensemble Comparison

When working with structural ensembles from molecular dynamics, AlphaFold2 subsampling, or ensemble reweighting against experimental data, you quickly run into visualization problems that standard PyMOL tutorials don’t address. Chief among them: what do you do when there’s no single reference structure?

In this two-part series, I’ll share the PyMOL techniques I’ve developed for visualizing weighted ensembles where multiple conformational states coexist. Part 1 covers reference state handling, RMSD-based coloring, and cluster visualization. Part 2 will tackle efficient SASA surface generation for large ensembles. To the best of my knowledge, this is the most advanced PyMOL guide EVER.

The code snippets here are extracted from full scripts attached at the end of this post. All examples use two systems: TeaA (a membrane transporter with distinct open/closed states) and MoPrP (mouse Prion Protein with partially unfolded forms).

Continue reading

Design your very own drug: An introduction to structure-based small molecule drug design

Are you curious about how scientists design small molecules to treat disease using computational tools, but the words RDKit, docking, and QED mean nothing to you? Look no further: these interactive tutorials introduce the key tools, concepts, and workflows of computational small molecule drug design. From generating compounds to evaluating their drug-likeness and binding potential, by the end you’ll be ready to explore how computational methods can lead to the discovery of your very own (virtual) drug candidates to cure Zika!

Find the materials here: https://github.com/oxpig/dtc-struc-bio-smolecules/tree/main.

Continue reading

Exploring the Protein Data Bank programmatically

The Worldwide Protein Data Bank (wwPDB or just the PDB to its friends) is a key resource for structural biology, providing a single central repository of protein and nucleic acid structure data. Most researchers interact with the PDB either by downloading and parsing individual entries as mmCIF files (or as legacy PDB files), or by downloading aggregated data, such as the RCSB’s collection in a single FASTA file of all polymer entity sequences. All too often, researchers end up laboriously writing their own file parsers to digest these files. In recent years though, more sophisticated tools have been made available that make it much easier to access only the data that you need.
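As a hedged illustration of fetching only the data you need, the sketch below asks the RCSB Data API's REST endpoint for a single entry rather than downloading and parsing a whole mmCIF file. The choice of this API, the helper names `entry_url` and `fetch_entry_title`, and the example ID `4HHB` are my own assumptions for illustration, not necessarily the tools the post goes on to cover.

```python
import json
import urllib.request

# REST endpoint of the RCSB Data API for entry-level records.
RCSB_ENTRY_ENDPOINT = "https://data.rcsb.org/rest/v1/core/entry"


def entry_url(pdb_id: str) -> str:
    """Build the REST URL for one entry (classic four-character PDB IDs only)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{RCSB_ENTRY_ENDPOINT}/{pdb_id}"


def fetch_entry_title(pdb_id: str, timeout: float = 10.0):
    """Return the entry title, or None if the response lacks that field.

    Needs network access. The 'struct' -> 'title' layout of the JSON
    response is an assumption of this sketch.
    """
    with urllib.request.urlopen(entry_url(pdb_id), timeout=timeout) as response:
        record = json.load(response)
    return record.get("struct", {}).get("title")


if __name__ == "__main__":
    # Only builds the URL, so this runs without a network connection.
    print(entry_url("4hhb"))
```

Calling `fetch_entry_title("4HHB")` on a machine with network access should return the title of that entry without any file parsing on your side.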

Continue reading

Controlling PyMol from afar

Do you keep downloading .pdb and .sdf files and loading them into PyMOL repeatedly?

If yes, then PyMOL remote might be just for you. With PyMOL remote, you can control a PyMOL session running on your laptop from any other machine, for example from a Jupyter Notebook running on your HPC cluster.

Continue reading

Testing python (or any!) command line applications

Through our work in OPIG, many of our projects come in the form of code bases written in Python. These can be many different things like databases, machine learning models, and other software tools. Often, the user interface for these tools is developed as both a web app and a command line application. Here, I will discuss one of my favourite tools for testing command-line applications: prysk!

Continue reading

Tanimoto similarity of ECFPs with RDKit: Common pitfalls

A common measure for the similarity of two molecules is the Tanimoto similarity of their ECFPs (Extended-Connectivity Fingerprints). However, there is no clear standard in the literature for which kind of ECFP should be used when calculating the Tanimoto similarity, and that choice can lead to substantially different results. In this post I wish to shed light on some results you should know about before you jump into your calculations.
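To make one such choice concrete, here is a minimal pure-Python sketch of the Tanimoto similarity on sets of substructure identifiers, computed once on the sparse identifiers and once after folding them into a fixed-length bit vector. The identifiers are made up, and the modulo folding in `fold` is a simplification chosen for illustration; it is not a claim about how RDKit hashes or folds its fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen in this sketch for two empty fingerprints
    return len(a & b) / len(union)


def fold(bits, length):
    """Fold sparse identifiers into a bit vector of the given length (modulo)."""
    return {bit % length for bit in bits}


# Made-up substructure identifiers for two hypothetical molecules.
mol_a = {3, 17, 1027, 2051}
mol_b = {3, 17, 5000}

# Sparse identifiers: 2 shared out of 5 distinct.
unfolded = tanimoto(mol_a, mol_b)

# After folding to 1024 bits, 1027 and 2051 both collide with bit 3.
folded = tanimoto(fold(mol_a, 1024), fold(mol_b, 1024))

print(f"unfolded: {unfolded:.2f}")
print(f"folded to 1024 bits: {folded:.2f}")
```

In this toy case the unfolded similarity is 0.40 while the folded one is 0.67, because bit collisions erase substructures that only one molecule has. The same pair of molecules can therefore look noticeably more or less similar depending on the fingerprint settings.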

A blog post on how ECFPs are generated was written by Marcus Dablander in 2022 so please take a look at that. In short, ECFPs have a hyperparameter called the radius r, and sometimes a fingerprint length L. Each entry in the fingerprint indicates the presence or absence of a particular substructure in the molecule of interest, and the radius r defines how large the substructures that you consider are. If you have r=3 then you consider substructures made by going up to three hops away from each atom in your molecule. This is best explained by this figure from Marcus’ post:

Continue reading

I really hope my compounds get the green light

Whether you are a cheminformatician in a drug discovery campaign or an algorithm developer making the perfect Figure 1, when you generate a list of compounds for a given target there is a deep desire for the compounds to be well received by the reviewer, be it a med chemist on the team or a peer reviewer. This desire persists despite scientific rigour and training, and stems from the time invested. So, to avoid the slightest shadow of the med chem grey zone, here is a hopefully handy filter against common medchem grey-zone groups.

Continue reading

Making your code pip installable

aka when to use a CustomBuildCommand or a CustomInstallCommand when building Python packages with setup.py

Bioinformatics software is complicated, and often a little bit messy. Recently I found myself wading through a Python package building quagmire and thought I could share something I learnt about when to use a custom build command and when to use a custom install command. I have also provided some information about how to copy executables to your package installation bin. Note: ChatGPT wrote the initial skeleton draft of this post, which I have corrected and edited.

Next time you need to create a pip installable package yourself, hopefully this can save you some time!

Continue reading

Open Source PyMOL installation on Windows

A year ago, I used Gheorghe Rotaru’s helpful blog post to install PyMOL. Unfortunately, after resetting my computer, I have just discovered that some of the links are broken. Here are the installation steps with new links provided by Christoph Gohlke, who generously offers pre-compiled Windows versions of the latest PyMOL software along with all its requirements.

Install the latest version of Python 3 for Windows:
Download the Windows Installer (x-bit) for Python 3 from their website, with x being your Windows architecture – 32 or 64.

Follow the instructions provided on how to install Python. You can confirm the installation by running ‘py’ in PowerShell.

Continue reading