Monthly Archives: February 2020

Molecular dynamics analysis in MDAnalysis

Any opportunity to use rigorously tested and supported analysis tools rather than in-house code is, in my opinion, an opportunity you owe it to yourself to explore.

My preferred tool for analyzing the output of molecular dynamics (MD) simulations is MDAnalysis, a Python library that provides robust and easy-to-use tools for analyzing the most common file formats output by MD packages (including PDB, DCD, COR, and XTC). But, of course, MDAnalysis can analyze any PDB file, not just one output from an MD simulation. There may be an opportunity in your workflow to incorporate MDAnalysis to save time or to provide more robust error handling than whatever in-house code you currently use.

State of the art in AI for drug discovery: more wet-lab please

The medchem community has received ML approaches for the drug discovery pipeline rather skeptically, especially those focused on the hit-to-lead optimization process. One of the main drivers for that is the way many ML publications benchmark their models: historic datasets are split into two parts, with the larger part used to train and the smaller to test ML models. In order to standardize that validation process, computational chemists have constructed widely used benchmark datasets such as the DUD-E set, which is commonly used as a standard for protein-ligand binding classification tasks. Common criticism from medicinal chemists centers on the main problem associated with benchmark datasets: the absence of direct lab validation.
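
The retrospective split described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the toy (ligand, label) records and the 80/20 ratio are assumptions made for the example, not details taken from any particular publication or from DUD-E.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and return (train, test); the larger part is train."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (ligand, binds?) pairs standing in for a historic dataset.
dataset = [(f"ligand_{i}", i % 2) for i in range(10)]
train, test = random_split(dataset, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Nothing in such a split touches the wet lab: both parts come from the same historic data, which is exactly the point of the criticism.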

Using SLURM a little bit more efficiently

Your research group slurmified their servers? You basically have two options now.

Either you install all your necessary things on one of the slurm nodes within an interactive session, e.g.:

srun -p funkyserver-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:10:00 --wait=0 /bin/bash

and always specify this node by adding the ‘#SBATCH --nodelist=funkyserver.cpu.do.work’ line to your sbatch scripts, or you set up some template scripts that help you install all your requirements on multiple nodes, so you can enjoy the benefits of the slurm system.

Here is how I did it; comments and suggestions welcome!

Step 1: Create an sbatch template file (e.g. sbatch_job_on_server.template_sh) on the submission node that does what you want. In the ‘#SBATCH --partition’ or ‘--nodelist’ lines, use a placeholder, e.g. ‘<server>’, instead of funkyserver.
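
One way to turn such a template into one script per server is a small substitution helper. The sketch below is an assumption about how this could be done, not the template or tooling used in this post: the helper name, the output file naming, the toy template body and the second server name (groovyserver) are all hypothetical.

```python
from pathlib import Path
import tempfile

PLACEHOLDER = "<server>"


def render_sbatch_scripts(template_text, servers, out_dir):
    """Write one sbatch script per server, replacing the placeholder."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for server in servers:
        script = out_dir / f"sbatch_job_on_{server}.sh"
        script.write_text(template_text.replace(PLACEHOLDER, server))
        written.append(script)
    return written


# Toy template; a real one would hold the commands you want to run.
template = """#!/bin/bash
#SBATCH --partition=<server>-debug
#SBATCH --time=00:10:00
echo "running on $(hostname)"
"""

# "groovyserver" is a made-up second server for illustration.
scripts = render_sbatch_scripts(
    template, ["funkyserver", "groovyserver"], tempfile.mkdtemp()
)
for script in scripts:
    print(script.name)
```

Each generated file can then be submitted with sbatch from the submission node.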

For example, for installing the same conda environment on all nodes that you want to work on:

Continue reading

Robust networks to study omics data

One of the challenges that biology-related sciences are facing is the exponential increase of data. Nowadays, thanks to all the sequencing techniques that are available, we are generating more data than we can study. We all love the genomic, epigenomic, transcriptomic, proteomic, … , glycomic, lipidomic, and metagenomic studies because of how rich they are. However, most of the time, the analysis of the results uses only a fraction of all the generated data. For example, it is quite common to study the transcriptome of an organism in different environments and then just focus on identifying which 2 or 3 genes are upregulated. This type of analysis does not exploit the data to its maximum extent, and this is where network analysis makes its appearance!

Cooking Up a (Deep)STORM with a Little Cup of Super Resolution Microscopy

Recently, I attended the Quantitative BioImaging (QBI) Conference 2020, served right here in Oxford. Amongst the many methods on the menu were new recipes for spicing up your Cryo-EM images with a bit of CiNNamon, a peppering of Poisson point processes in the inhomogeneous spatial case, and many others. However, like many of today’s top-tier restaurants, most of the courses on offer were on the smaller side, nano-scale in fact, serving up the new field of Super Resolution Microscopy!

Finding The Gene Responsible for Huntington’s Disease – The Story of Nancy Wexler.

Huntington’s disease is an inherited disorder that results in the loss of movement and speech, dementia and ultimately death. The earliest symptoms include lack of coordination and an unsteady gait; physical abilities worsen until the complete physiological breakdown of the patient’s body. Meanwhile, mental abilities decline as well, progressing into dementia. Overall, Huntington’s disease results in the death of brain cells.

Effect of Debiasing Protein-Ligand binding data on Generalization

Virtual screening is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures that bind tightly and specifically to a given protein target. Many machine learning (ML) models have been proposed for virtual screening. However, it is not clear whether these models can truly predict molecular properties accurately across chemical space or simply overfit the training data. As chemical space contains clusters of molecules around scaffolds, memorising the properties of a few scaffolds can be sufficient to perform well, masking the fact that the model may not generalise beyond close analogues. Different debiasing algorithms have been introduced to address this problem. These algorithms systematically partition the data to reduce bias and provide a more accurate metric of the model performance.
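
To make the idea of systematic partitioning concrete, here is a minimal sketch of a scaffold-grouped split, in which every molecule sharing a scaffold lands on the same side of the train/test boundary. This is a deliberately simple stand-in and not one of the debiasing algorithms referred to above. It assumes scaffold labels have already been computed elsewhere (for instance with a cheminformatics toolkit), and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def scaffold_group_split(molecules, test_fraction=0.2):
    """Split (molecule_id, scaffold_id) pairs without splitting a scaffold.

    Scaffold groups are visited from largest to smallest and placed in the
    training set while they still fit; all remaining groups go to test.
    """
    groups = defaultdict(list)
    for mol_id, scaffold in molecules:
        groups[scaffold].append(mol_id)

    n_total = len(molecules)
    train_target = n_total - round(n_total * test_fraction)

    train, test = [], []
    # Sort by size (descending), then by name so the result is deterministic.
    for scaffold in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        members = groups[scaffold]
        if len(train) + len(members) <= train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: 10 molecules spread over four hypothetical scaffolds A to D.
molecules = (
    [(f"a{i}", "A") for i in range(5)]
    + [(f"b{i}", "B") for i in range(3)]
    + [("c0", "C"), ("d0", "D")]
)
train, test = scaffold_group_split(molecules, test_fraction=0.2)
print(len(train), len(test))
```

Because no scaffold appears on both sides, a model cannot score well on the test set merely by memorising scaffolds it saw during training.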
