Category Archives: Hints and Tips

Making pwd redundant

I’m going to keep this one brief, because I am mid-confirmation-and-paper-writing madness. I have seen too many people – both beginners and seasoned veterans – wandering around their Linux filesystem blindfolded:

Isn’t it hideous?

Whenever you want to see where you are, you have to execute pwd (present working directory), which will print your absolute location to stdout. If you have many terminals open at the same time, it is easy to lose track of where you are, and every other command becomes pwd; surely, I hear you cry, there has to be a better way!

Well, fear not! With a little tinkering with ~/.bashrc, we can display the working directory as part of the special PS1 environment variable, responsible for how your username and computer are displayed above. Putting the following at the top of ~/.bashrc

me=`id | awk -F\( '{print $2}' | awk -F\) '{print $1}'`
export PS1="`uname -n |  /bin/sed 's/\..*//'`{$me}:\$PWD$ "

… saving, and starting a new termanal window results in:

Much better!

I haven’t used pwd in 3 years.

Non-linear Dependence? Mutual Information to the Rescue!

We are all familiar with the idea of a correlation. In the broadest sense of the word, a correlation can refer to any kind of dependence between two variables. There are three widely used tests for correlation:

Spearman’s r: Used to measure a linear relationship between two variables. Requires linear dependence and each marginal distribution to be normal.
Pearson’s ρ: Used to measure rank correlations. Requires the dependence structure to be described by a monotonic relationship
Kendall’s 𝛕: Used to measure ordinal association between variables.

While these three measures give us plenty of options to work with, they do not work in all cases. Take for example the following variables, Y1 and Y2. These might be two variables that vary in a concerted manner.

Perhaps we suspect that a state change in Y1 leads to a state change in Y2 or vice versa and we want to measure the association between these variables. Using the three measures of correlation, we get the following results:

Continue reading →

How to Blopig

Blopig has a wealth of knowledge, with everything from a Bayesian answer to the question “should ketchup be stored in the fridge?“* to the Nobel-Prize-Winner-approved analysis of AlphaFold2. Blopig runs on WordPress and uses blocks, components for adding different types of content to a post. These are blocks like paragraphs, headers, images, image galleries, and videos. Here are some hints and tips for getting the most out of WordPress.

One of the first blocks worth mentioning is the “Read More…” block…

Continue reading →

Python’s Data Classes

When writing code, you have inevitably needed to store data throughout your pipeline. In these cases you store your value, list or data frame as a variable to easily use it elsewhere in your code. However, sometimes your data has an awkward form, consisting of a number of different length lists or data of different types and sizes. While it is still doable to work with, and using tuples or dictionaries can help, accessing different elements in your data quickly becomes messy and it is less intuitive what your code is actually doing.

To solve the above stated problem, data classes were introduced as a new feature in Python 3.7. A data class is a regular Python class, but with certain methods already implemented for you. This makes them easy to create and removes a lot of boilerplate (repeated code) making them simpler, more intuitive and pretty. Further, as data classes are part of the standard library, you can directly import it without needing to install any external dependencies (noice).

With the sales pitch out of the way, let us look at how we can use data classes.

from dataclasses import dataclass
from typing import Any

@dataclass
class Antibody:
    vgene: str
    jgene: None
    sequence: Any = 'EVQ'

Continue reading →

Tracking Changes in LaTeX

Tracking changes in Microsoft Word is easy – we just click the ‘Track Changes’ button. It requires a little more work in LaTeX but here is a quick guide to doing it as painlessly as possible!

First, we want our original document to be stored in a *.tex file in an Overleaf project (as shown below)

Continue reading →

Solving WORDLE with grep

People seem to have become obsessed with wordle, just like they became obsessed with sudoku. After my initial burst of “oh a new game!” had waned, I was left thinking “my time is precious and this is exactly what we have computers for”. With this in mind, below is my quick and dirty way of solving these. I’m sure the regexp gurus amongst you will have a more elegant solution.

Step 1: Make sure you’ve got /usr/share/dict/words installed. This is just a huge list of words in a specific language and for me, this required installing the British words list.

sudo apt-get install wbritish

Step 2: Go to wordle

Step 3: Pick a random 5-letter word as your starting point. This is where grep and /usr/share/dict/words comes in:

Continue reading →

Simplify your life with SLURM and sync

For my first blog post of the year, we’re talking about SLURM, everyone’s favorite job manager. If like me, you have the joy of running a literal boat-load of jobs with all kinds of parameters and command-line arguments you’ll know there are a few tips and tricks that make the process of managing these tasks and results as painless as possible. Now, I do expect most people reading this will already be aware of these tricks but for those who don’t, I hope this is helpful. After all, it’s impossible to know what you don’t know you need to know, you know? Any alternatives, improvements, or suggestions are welcome!

Array Jobs

Job arrays are perfect for the times you want to run the same job several times with slight differences each time. Imagine you need to repeat a job 10 times with slightly different arguments with each run. Rather than submit 10 (slightly different) batch scripts you can submit 1 script with all the information needed to complete all 10 jobs.

Continue reading →

snakeMAKE better workflows with your code

When developing your pipeline for processing, annotating and/or analyzing data, you will probably find yourself needing to continuously re-run it, as you play around with your code. This can become a problem when working with long pipelines, large datasets and cpu’s begging you not to run some pieces of code again.

Luckily, you are not the first one to have been annoyed by this and other related struggles. Some people were actually so annoyed that they created Snakemake. Snakemake can be used to create workflows and help solve problems, such as the one mentioned above. This is done using a Snakefile, which helps you split your pipeline into “rules”. To illustrate how this helps you create a better workflow, we will be looking at the example below.

Continue reading →

Packaging with Conda

If you are as happy for the big snake as I am, you have probably wondered how you can create a Conda package with your amazing code. Fear not, in the following text you will learn how to make others go;

conda install -c coolperson amazingcode

Roughly, the only thing needed to create a Conda package, is a ‘meta.yaml’ file specific for your code. This file contains all the metadata needed to create your package and is highly customizable. While this means the meta.yaml can be written to allow your Conda package to work on any operating system and with any dependencies (doesn’t have to be python) it can be annoying to write from scratch (here is a guide for manually writing this file). Since we just want to create a simple Conda package, we will in this guide avoid fiddling around with the meta.yaml file and instead create the file based on a PyPI package. This will also give you a nice template, if you later need to adapt your meta.yaml file.
Note: Conda packages can also be made from GitHub repositories, which is likely favorable in most cases, but it also requires some manual work on the meta.yaml.

1. Create a PyPI package of your code

Continue reading →

Five Nuggets of Wisdom for Chairing at a Conference

I recently spoke at the Festival of Biologics 2021 conference in Basel (in-person, just in time!), and was lucky enough to be offered the chance to chair a session of talks. As this was the first time I’d ever been asked to do this, I asked Charlotte for some hints to make things go more smoothly. I found her advice very useful, so I thought I’d share it here for other first-time “chairers”!

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Hints and Tips

Making pwd redundant

Non-linear Dependence? Mutual Information to the Rescue!

How to Blopig

Python’s Data Classes

Tracking Changes in LaTeX

Solving WORDLE with grep

Simplify your life with SLURM and sync

Array Jobs

snakeMAKE better workflows with your code

Packaging with Conda

1. Create a PyPI package of your code

Five Nuggets of Wisdom for Chairing at a Conference