Monthly Archives: April 2022

Women in Computing: past, present and what we can do to improve the future.

Computing is one of the few scientific fields that was once female-dominated. In the 30s and 40s, women made up the bulk of the workforce doing complex, tedious calculations in fields including ballistics, astrophysics, aeronautics (think Hidden Figures) and code-breaking. Engineers found that the female computers were far more reliable than they themselves were at such calculations [9]. As computing machines became available, there was no precedent for the gender of a computer operator, and so the women who had been doing the computing became the computer operators [10].

However, this was not to last. As computing became commercialised in the 50s, the skill required for computing work was starting to be recognised. As written in [1]:

“Software company System Development Corp. (SDC) contracted psychologists William Cannon and Dallis Perry to create an aptitude assessment for optimal programmers. Cannon and Perry interviewed 1,400 engineers — 1,200 of them men — and developed a “vocational interest scale,” a personality profile to predict the best potential programmers. Unsurprisingly given their male-dominated test group, Cannon and Perry’s assessment disproportionately identified men as the ideal candidates for engineering jobs. In particular, the test tended to eliminate extroverts and people who have empathy for others. Cannon and Perry’s paper concluded that typical programmers “don’t like people,” forming today’s now pervasive stereotype of a nerdy, anti-social coder.”


OpenMM Setup: Start Simulating Proteins in 5 Minutes

Molecular dynamics (MD) simulations are a good way to explore the dynamical behaviour of a protein you might be interested in. One common problem is that most MD engines have a relatively steep learning curve.

What if you just want to run a simple, one-off simulation with no fancy enhanced sampling methods? OpenMM Setup is a useful tool for exactly this. It is built on the open-source OpenMM engine and provides an easy-to-install (via conda) GUI that can have you running a simulation in less than 5 minutes. Of course, running a simulation requires setting parameters carefully and being familiar with best practices. That is beyond the scope of this post, but there are many guides out there that can easily be found. Now on to the good stuff: using OpenMM Setup!

When you first run OpenMM Setup, you’ll be greeted by a browser window asking you to choose a structure to use. This can be a crystal structure or a model. Remember, these sometimes have problems that need fixing, like missing density or charged, non-physiological termini that would lead to artefacts, so visual inspection of the input is key! You can then choose the force field and water model you want to use, and tell OpenMM to do some cleaning up of the structure. Here I am running the simulation on hen egg-white lysozyme.
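For orientation, the choices made in the GUI (structure, force field, water model) map onto a fairly short OpenMM script. The following is a minimal sketch written by hand, not the script that OpenMM Setup itself produces. The function name run_short_md, the output file name, the Amber14/TIP3P-FB force field pair, and every numerical setting are assumptions picked purely for illustration. It also assumes the input structure has already been cleaned up, with no missing heavy atoms.

```python
import sys


def run_short_md(pdb_path, n_steps=10_000, traj_path="trajectory.dcd"):
    """Solvate, minimise and run a short MD simulation with OpenMM.

    All settings below are illustrative defaults, not recommendations.
    """
    # Imported inside the function so this module can be imported on a
    # machine where OpenMM is not installed.
    from openmm import LangevinMiddleIntegrator
    from openmm.app import (DCDReporter, ForceField, HBonds, Modeller,
                            PDBFile, PME, Simulation, StateDataReporter)
    from openmm.unit import kelvin, nanometer, picosecond, picoseconds

    # Structure, force field and water model: the three GUI choices.
    pdb = PDBFile(str(pdb_path))
    forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")

    # Add hydrogens and a water box around the protein.
    modeller = Modeller(pdb.topology, pdb.positions)
    modeller.addHydrogens(forcefield)
    modeller.addSolvent(forcefield, padding=1.0 * nanometer)

    system = forcefield.createSystem(
        modeller.topology,
        nonbondedMethod=PME,
        nonbondedCutoff=1.0 * nanometer,
        constraints=HBonds,
    )
    integrator = LangevinMiddleIntegrator(
        300 * kelvin, 1 / picosecond, 0.002 * picoseconds
    )

    simulation = Simulation(modeller.topology, system, integrator)
    simulation.context.setPositions(modeller.positions)
    simulation.minimizeEnergy()

    # Write a trajectory frame and a progress line every 1000 steps.
    simulation.reporters.append(DCDReporter(str(traj_path), 1000))
    simulation.reporters.append(
        StateDataReporter(sys.stdout, 1000, step=True,
                          potentialEnergy=True, temperature=True)
    )
    simulation.step(n_steps)
    return simulation
```

Calling run_short_md("lysozyme.pdb") (a hypothetical file name) would run 10,000 steps of 2 fs each, i.e. 20 ps, which is only enough to check that the system is stable.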


How to prepare a molecule for RDKit

RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBbind refined set. PyMOL has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my DPhil. For reasons I have never bothered to explore, using PyMOL to convert from SDF to mol2 and back to SDF again (adding in missing hydrogens along the way) will almost always make a molecule safe to import using RDKit:

from pathlib import Path
from pymol import cmd

def py_mollify(sdf, overwrite=False):
    """Use pymol to sanitise an SDF file for use in RDKit.

    Arguments:
        sdf: location of faulty sdf file
        overwrite: whether or not to overwrite the original sdf. If False,
            a new file will be written in the form <sdf_fname>_pymol.sdf
            
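    Returns:
        Filename of the sanitised output: the original sdf filename if
        overwrite == True, else <sdf_fname>_pymol.sdf.
    """
    sdf = Path(sdf).expanduser().resolve()
    # Build names from the file stem so a '.sdf' elsewhere in the path is untouched
    mol2_fname = str(sdf.with_name(sdf.stem + '_pymol.mol2'))
    new_sdf_fname = str(sdf if overwrite else sdf.with_name(sdf.stem + '_pymol.sdf'))
    cmd.reinitialize()  # start from an empty session so repeated calls do not mix molecules
    cmd.load(str(sdf))
    cmd.h_add('all')  # add missing hydrogens
    cmd.save(mol2_fname)
    cmd.reinitialize()  # clear the session before reading the mol2 back in
    cmd.load(mol2_fname)
    cmd.save(new_sdf_fname)
    return new_sdf_fname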
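
To find out which files actually need this treatment, it helps to count how many entries RDKit fails to parse. SDMolSupplier yields None for any entry it cannot read, so counting the None values gives the failure rate. The sketch below is only an illustration: the helper names count_failures and sdf_failure_count are made up here and are not part of RDKit.

```python
def count_failures(mols):
    """Return (n_failed, n_total) for an iterable of molecules.

    A failed parse is represented by None, which is what RDKit's
    SDMolSupplier yields for entries it cannot read.
    """
    mols = list(mols)
    n_failed = sum(mol is None for mol in mols)
    return n_failed, len(mols)


def sdf_failure_count(sdf_path):
    """Count the entries in an SDF file that RDKit cannot parse."""
    # Imported here so count_failures stays usable without RDKit installed.
    from rdkit import Chem

    supplier = Chem.SDMolSupplier(str(sdf_path))
    return count_failures(supplier)
```

For example, n_failed, n_total = sdf_failure_count("ligand.sdf") (a hypothetical file name) reports how many entries were rejected. Any file with n_failed greater than zero is a candidate for the PyMOL round trip above.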

Oxford Protein Informatics Group

or "OPIG" to friends
