Writing “vector trajectories” with cpptraj

The program cpptraj, written by Daniel Roe (https://github.com/Amber-MD/cpptraj) and distributed open source with the AmberTools package (https://ambermd.org/AmberTools.php), is a powerful tool for analysing molecular dynamics simulations. In addition to all of the expected analyses, like root-mean-square deviation (RMSD) and native contacts, cpptraj also includes a suite of vector algebra functions.

While this vector algebra functionality is fairly well known and easy to find in the documentation, I think it is less well known that cpptraj can write trajectories of the computed vectors. These trajectories can then be loaded into Visual Molecular Dynamics (VMD) alongside the analysed trajectory and played as a movie. This functionality is a valuable tool for debugging your vector calculations to make sure they are doing precisely what you intend. It may also prove useful for generating visualizations of vectors alongside molecular structures for publications.

The cpptraj script below reads in an Amber parameter file and coordinate file and then calculates the angle between two planes.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
# Two vectors in the plane of the R65 guanidino group
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
# Two vectors between CA atoms of the alpha helix containing R65
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
# Normal vectors of the two planes
vectormath vec1 v1 vec2 v2 crossproduct name n1
vectormath vec1 v3 vec2 v4 crossproduct name n2
# Angle between the normals (i.e. between the planes)
vectormath vec1 n1 vec2 n2 dotangle out 7bbg_ref_plane_angle.dat
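
The dotangle keyword normalises its two input vectors and reports the angle between them in degrees, so the final line evaluates

\theta = \arccos\left( \frac{\mathbf{n}_1 \cdot \mathbf{n}_2}{\lvert \mathbf{n}_1 \rvert \, \lvert \mathbf{n}_2 \rvert} \right)

which is precisely the angle between the two planes.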

The first plane is defined by two vectors in the plane of the guanidino group of residue R65 (v1 and v2); the second plane is defined by two vectors between CA atoms of amino acids in the alpha helix containing R65 (v3 and v4). The first two vectormath calls determine the normal vectors to the planes, and the final vectormath line computes the angle between those normals. Taken together, these commands compute the angle between the arginine side chain and a plane passing through the CA atoms of the alpha helix. Let’s check that the vectors {v1, v2, v3, v4} are being computed correctly.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
run
# Write v1-v4 out as a "vector trajectory" in MOL2 format
writedata vectors.mol2 v1 v2 v3 v4 vectraj trajfmt mol2

The resulting vector trajectory vectors.mol2 can be loaded directly into VMD without a topology. Note that in this case we only analyzed a single frame, but you can run this same procedure on DCD files, too. This is what I get when I load the vectors into VMD alongside the structure:

The vectors are shown as red/pink line segments. The right structure is identical to the left but with the alpha helix cartoon model removed. The blue spheres indicate the locations of the CA atoms used to define the plane of the helix.
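
If you prefer to script this step rather than click through the VMD GUI, the loading can also be done from VMD's Python interface. The sketch below is my own addition, assuming the vmd-python bindings are installed and that the usual VMD plugin names ("parm7", "rst7", "mol2") apply; the same thing can be done interactively in the VMD console with mol new and mol addfile.

# Minimal sketch, assuming the vmd-python bindings are available
from vmd import molecule

# Load the Amber topology, then add the restart coordinates to it
protein = molecule.load("parm7", "7bbg_fixed.prmtop")
molecule.read(protein, "rst7", "7bbg_fixed.rst7")

# Load the cpptraj vector trajectory as a second molecule; no topology needed
vectors = molecule.load("mol2", "vectors.mol2")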

I hope this vector trajectory functionality will be helpful to a few people who like to neurotically check their analyses like I do. You can download the example prmtop and rst7 files below. Note that you should rename them to remove the extra “.txt” file extension before attempting to use them for anything.

The information in this blog post is adapted from an Amber Archive post from Daniel Roe, dated 30-Oct-2018: http://archive.ambermd.org/201811/0058.html

Files for the example:

The Most ReLU-iable Activation Function?

The Rectified Linear Unit (ReLU) activation function was first used in 1975, but its use exploded after Nair & Hinton used it in their 2010 paper on Restricted Boltzmann Machines. ReLU and its derivative are fast to compute, and it has dominated deep neural networks for years. The main problem with the activation function is the so-called dead ReLU problem, where significant negative input to a neuron can cause its gradient to always be zero. To rectify this (har har), modified versions have been proposed, including leaky ReLU, GeLU and SiLU, wherein the gradient for x < 0 is not always zero.
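
To make the distinction concrete, here is a minimal NumPy sketch of ReLU and a leaky variant; the slope value is just a commonly used default, not something taken from the papers above.

import numpy as np

def relu(x):
    # ReLU: max(0, x). The gradient is zero for every x < 0, which is how a
    # neuron can "die" if its inputs stay negative.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small slope alpha for x < 0 keeps the gradient non-zero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negative entries are zeroed out
print(leaky_relu(x))  # negative entries are scaled by alpha instead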

A 2020 paper by Naizat et al., which builds upon ideas set out in a 2014 Google Brain blog post, seeks to explain why ReLU and its variants seem, in general, to be better for classification problems than sigmoidal functions such as tanh and sigmoid.


SnackGPT

One of the most treasured group meeting institutions in OPIG is snackage. Each week, one group member is asked to bring in treats for our sometimes lengthy 16:30 meeting. However, as would be the case for any group of 25 or so people, there are various dietary requirements and preferences that can make this snack-quisition a somewhat tricky process: from gluten allergies to a mild dislike of cucumber, they vary in importance, but nevertheless are all to be taken into account when the pre-meeting supermarket sweep is carried out.

So, the question on every researcher’s mind: can ChatGPT help me? Turns out, yes: I gave it a list of the group’s dietary requirements and preferences, and it gave me a handy little list of snacks I might be able to bring along…

When pushed further, it even provided me with an itemised list of the ingredients required. During this process it seemed to forget a couple of the allergies I’d mentioned earlier, but they were easy to spot; almost more worryingly, it suggested I get a beetroot and mint hummus (!) for the veggie platter:

I don’t know if I’ll actually be using many of these suggestions—judging by the chats I’ve had about the above list, I think bringing in a platter of veggies as the group meeting snack may get me physically removed from the premises—but ChatGPT has once again proven itself a handy tool for saving a few minutes of thinking!

SUMO wrestling with developability

When engineering antibodies into effective biotherapeutics, ideally, factors such as affinity, specificity, chemical stability and solubility should all be optimised. In practice, we know that it’s often not feasible to co-optimise all of these, and so compromises are made, but identifying these developability issues early on in the antibody drug discovery process could save costs and reduce attrition rates. For example, we could avoid choosing a candidate that expresses poorly, which would make it expensive to manufacture as a drug, or one with a high risk of aggregation that would drive unwanted immunogenicity.

On this theme, I was interested to read recently a paper by the Computational Chemistry & Biologics group at Merck (Evers et al., 2022; https://www.biorxiv.org/content/10.1101/2022.11.19.517175v1). They have developed a pipeline called SUMO (In Silico Sequence Assessment Using Multiple Optimization Parameters) that brings together publicly available software for in silico developability assessment and creates an overall developability profile as a starting point for antibody or VHH optimisation.


For each sequence assessed, they report factors such as sequence liabilities (residues liable to chemical modifications that can alter properties such as binding affinity or aggregation propensity), surface hydrophobicity, sequence identity to the most similar human germline, and predicted immunogenicity (based on MHC-II binding). Also provided are an annotated sequence viewer and a 3D visualisation of the calculated properties. Profiles are annotated with a red-yellow-green colour-coding system to indicate which sequences have favourable properties.

Overall, this approach is a useful way to discriminate between candidates and steer away from those with major developability issues prior to the optimisation stage. Given that the thresholds for their colour-coding system are based on data from marketed therapeutic antibodies, and that the software used has primarily been designed for use on antibody datasets, I would be interested to see whether the particular descriptors chosen for SUMO translate well to VHHs, or whether there are other properties that are stronger indicators of nanobody developability.

Datamining Wikipedia and writing JS with ChatGPT just to swap the colours on university logos…

I am not sure the University of Oxford logo works in the gold from the University of Otago…

A few months back I moved from the Oxford BRC to OPIG, both within the University of Oxford, but like many in academia I have moved across a few universities. As this is my first post here I wanted to do something neat: a JS tool that swaps the colours in university logos!
It was a rather laborious task requiring a lot of coding, but once I got it working, I ended up tripping up at the last metre. So, for technical reasons, I have resorted to hosting it on my own blog (see post), but nevertheless the path towards it is worth discussing.


Creating a Personal Website

Personal websites are a great and increasingly important way to build your online presence. Along with professional social media pages, such as on LinkedIn and Twitter, a website can provide a boost to your career and/or job search.

This blog post is based on my recent experience creating a personal website, following guidelines from Lewis’ talk at the OPIG Retreat last year (thank you Lewis!). The method I used and will cover here, based on an HTML5 UP! template and GitHub Pages, is free and fast.

Why have a personal website?

  • Improves your online presence and brand
  • Boosts your career, for example by allowing potential future employers to find you
  • Lets you share things you have accomplished or are interested in

Some ponderings on generalisability

Now that machine learning has managed to get its proverbial fingers into just about every pie, people have started to worry about the generalisability of methods used. There are a few reasons for these concerns, but a prominent one is that the pressure and publication biases that have led to reproducibility issues in the past are also very present in ML-based science.

The Center for Statistics and Machine Learning at Princeton University hosted a workshop last July highlighting the scale of this problem. Alongside it, they released a running list of papers that document reproducibility issues in ML-based science. The list currently includes 20 papers reporting errors from 17 distinct fields, collectively affecting a whopping 329 papers.


Happy 10th Birthday, Blopig!

OPIG recently celebrated its 20th year, and on 10 January 2023 I gave a talk just a day before the 10th anniversary of BLOPIG’s first blog post. It’s worth reflecting on what’s stayed the same and what’s changed since then.


SAbBox in 2023: ImmuneBuilder and more!

For several years now, we have distributed the SAbDab database and SAbPred tools as a virtual machine, SAbBox, via Oxford University Innovation. This virtual machine lets users run the tools and database locally, enabling high-throughput analysis while keeping confidential data within a local network. Initially distributed under a commercial licence, the platform proved popular and, in 2020, we introduced a free academic licence to enable our academic colleagues to use our tools and database locally.

Following requests from users, in 2021 we released a new version of the platform packaged as a Singularity container. This included all of the features of SAbBox, allowing Linux users to take advantage of the near bare-metal performance of Singularity when running SAbPred tools. Over the past year, we have made lots of improvements to both SAbBox platforms, and have more work planned for the coming year. I’ll briefly outline these developments below.
