Monthly Archives: October 2020

3 Useful UNIX commands you might not know

nohup

The command nohup (stands for “No hang up”) allows your script to run even if you quit the terminal. It can be very useful, especially if your terminal has been opened through ssh and you have a dodgy connection. It can be used as follows:

nohup python my_script.py > log.out &

By default, nohup appends the output from your script to a file named nohup.out. Adding the > log.out part of the command saves the output to a file of your choice instead, and the trailing & runs the job in the background so that you get your prompt back.
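
One practical wrinkle when following the log: once stdout is redirected to a file, Python buffers it in blocks, so log.out can stay empty for a while after the job has started. Below is a minimal sketch of what a hypothetical my_script.py could do about that. The function name run_job and the sleep-based workload are placeholders of mine, not part of the original command.

```python
import sys
import time


def run_job(n_steps, stream=sys.stdout, delay=0.0):
    """Placeholder workload: report progress after every step."""
    for step in range(1, n_steps + 1):
        time.sleep(delay)  # stands in for the real work
        # flush=True pushes the line to the log file immediately
        # instead of waiting for Python's output buffer to fill up.
        print(f"finished step {step}/{n_steps}", file=stream, flush=True)
    return n_steps


# Hypothetical usage inside my_script.py:
# run_job(100, delay=1.0)
```

With this in place, tail -f log.out should show progress as it happens. Running the script with python -u has a similar effect without touching the code.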


NeurIPS 2020: Chemistry / Biology papers

Another blog post, another look at accepted papers for a major ML conference. NeurIPS joins the other major machine learning conferences (and others) in moving virtual this year, running from 6th – 12th December 2020. In a continuation of past posts (ICML 2020, NeurIPS 2019), I will highlight several papers of potential interest to the chem-/bio-informatics communities.

The list of accepted papers can be found here, with 1,903 papers accepted out of 9,467 submissions (20% acceptance rate).

In addition to the main conference, there are several workshops highly related to the type of research undertaken in OPIG: Machine Learning in Structural Biology and Machine Learning for Molecules.

The usual caveat: given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”). If you find any I have missed, please reach out and I will update accordingly.


Speaking about Sequence and Structure at a Summit

A couple of weeks ago I was lucky enough to be asked to speak at the 5th Computational Drug Discovery & Development for Biologics Summit. This was my first virtual conference – it was a shame I didn’t get to visit Boston, and presenting to my empty room was slightly bizarre, but it was great to hear what people have been working on, and there’s definitely something to be said for attending a conference in fluffy socks…

A, antibody structure. An antibody is made up of four chains: two light (orange) and two heavy (blue). Each chain is made up of a series of domains; the variable domains of the light and heavy chains together are known as the Fv region (shown on the right; PDB entry 12E8). The Fv features six loops known as complementarity determining regions or CDRs (shown in dark blue); these are mainly responsible for antigen binding. B, example sequences for the VH and VL, highlighting the CDR regions and the genetic composition. It is estimated that the human antibody repertoire contains up to 10^13 unique sequences, enabling the immune system to respond to almost any antigen. This is possible through the recombination of V, D and J gene segments, junctional diversification, and somatic hypermutation.

Improving your Python code quality using git pre-commit hooks

Intro

I recently completed an internship during which I spent a considerable amount of time doing software engineering. One of my main take-aways from this experience was that in industry, a lot more attention is paid to ensuring that code committed to a GitHub repo is clean and bug-free.

This is achieved through several means, such as code review (get other people to read your code), test-driven development (make sure your code works as you are adding functionality) or pair programming (have two people work together on the same piece of code). Here, I will instead focus on a useful tool that is easy to integrate into your existing git workflow: pre-commit hooks.
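
To make the idea concrete, here is a minimal sketch of the bare git mechanism: git runs the executable at .git/hooks/pre-commit before each commit and aborts the commit if it exits with a non-zero status. The check shown (refusing to commit Python files that do not parse) and the function names are my own illustration, not necessarily the tooling the full post describes.

```python
#!/usr/bin/env python3
import ast
import subprocess


def staged_python_files():
    """Return paths of added/copied/modified .py files in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in result.stdout.splitlines() if p.endswith(".py")]


def find_syntax_errors(paths):
    """Parse each file and collect (path, message) for those that fail.

    Note: this reads the working-tree copy of each file, which may
    differ from the staged version if it was edited after `git add`.
    """
    errors = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            source = handle.read()
        try:
            ast.parse(source, filename=path)
        except SyntaxError as exc:
            errors.append((path, f"line {exc.lineno}: {exc.msg}"))
    return errors


def main():
    errors = find_syntax_errors(staged_python_files())
    for path, message in errors:
        print(f"{path}: {message}")
    # git aborts the commit when the hook exits with a non-zero status
    return 1 if errors else 0


# To use this as a hook, end the file with: raise SystemExit(main())
```

Saved as .git/hooks/pre-commit and made executable (chmod +x), a script along these lines would block any commit containing an unparseable Python file. Linters and formatters can be slotted in following the same pattern.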


The right tool for the job – The Joy of Excel

Excel’s pervasiveness has resulted in it being used (correctly or incorrectly) in just about every area of science.

Unfortunately, Excel has some traps for the new player and unless you’ve fallen for them before, they are not entirely obvious. They stem from the fact that Excel tries to help by reformatting your data into what it thinks you mean.
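
A well-known instance of this is gene symbols such as SEPT2 or MARCH1 being silently turned into dates. As a rough sketch, one could scan a column for risky values before opening the file in Excel. The patterns below are illustrative approximations of my own, not Excel's actual conversion rules, and excel_risk is a hypothetical helper name.

```python
import re

# Rough patterns for text that Excel is known to reinterpret on import.
# These are illustrative approximations, not Excel's actual rules.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
LEADING_ZERO = re.compile(r"^0\d+$")
SCIENTIFIC = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(value):
    """Return a short reason if `value` looks at risk, else None."""
    text = value.strip()
    if MONTH_LIKE.match(text):
        return "may become a date"
    if LEADING_ZERO.match(text):
        return "may lose leading zeros"
    if SCIENTIFIC.match(text):
        return "may become scientific notation"
    if text.isdigit() and len(text) > 15:
        return "may lose precision beyond 15 digits"
    return None
```

Calling excel_risk on each cell of an identifier column and reviewing anything that does not return None is a cheap sanity check. Importing such columns explicitly as text is the more robust fix.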


Constrained docking for bump and hole methodology

Selectivity is an important trait to consider when designing small molecule probes for chemical biology. If you wish to use a small molecule to study a particular protein, but that small molecule is fairly promiscuous in its binding habits, there is a risk that any effects you observe are due to it binding to other proteins with similarly shaped binding pockets, instead of your protein of interest.
