Monthly Archives: March 2024

Under-rated or overlooked, these libraries might be helpful.

Discovering a library that massively simplifies the exact thing you just did, right after you’ve finished doing it, has to be one of the top 14 worst things about writing code. You might think it’s a part of the life we’ve all chosen, but it doesn’t have to be. Beyond the popular libraries you already know lies a treasure trove of under-appreciated packages waiting to be wielded. Being the saint I am, I’ve scoured the depths of pypi.org to find some underrated and hopefully useful packages to make your life a little easier.

Pitfalls of using Pearson’s correlation for comparing model performance

Pearson’s R (the correlation coefficient) is a measure of the linear correlation between two variables, giving a value between -1 and 1, where 1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative linear correlation. While it is a useful statistic for understanding the relationship between two variables, it is also often used to compare the performance of two or more models. For example, imagine we have experimental values that we are trying to predict, along with several models’ predictions of them. Obviously, we would prefer the model with the highest Pearson’s R … or perhaps not?
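
As a quick illustration of why the highest R is not necessarily the best model, here is a minimal sketch (with made-up data, not taken from the post) showing that Pearson’s R is blind to systematic errors of scale and offset:

```python
# Minimal sketch with synthetic data: two hypothetical models, both highly
# correlated with experiment, but one is far off in absolute terms.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
experimental = rng.normal(size=50)

model_a = experimental + rng.normal(scale=0.1, size=50)           # close to the truth
model_b = 10 * experimental + 5 + rng.normal(scale=0.1, size=50)  # wrong scale and offset

r_a, _ = pearsonr(experimental, model_a)
r_b, _ = pearsonr(experimental, model_b)
print(f"Model A: R = {r_a:.3f}")  # ~0.99, and its predictions are accurate
print(f"Model B: R = {r_b:.3f}")  # also ~0.99, despite large absolute errors
```

Because Pearson’s R is invariant to linear rescaling, both models score near 1 even though only one of them actually predicts the experimental values.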

Open Source PyMOL installation on Windows

A year ago, I used Gheorghe Rotaru’s helpful blog post to install PyMOL. Unfortunately, after resetting my computer, I have just discovered that some of the links are broken. Here are the installation steps with new links provided by Christoph Gohlke, who generously offers pre-compiled Windows versions of the latest PyMOL software along with all its requirements.

Install the latest version of Python 3 for Windows:
Download the Windows installer (x-bit) for Python 3 from the Python website (python.org), where x is your Windows architecture – 32 or 64.

Follow the installation instructions provided. You can confirm the installation succeeded by running ‘py’ in PowerShell.
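
If you are unsure whether your installed Python (and therefore the wheels you need) is a 32- or 64-bit build, a quick check from Python itself is shown below; this snippet is a convenience, not part of the original instructions:

```python
# Convenience check (not from the original post): report the Python version and
# whether this interpreter is a 32-bit or 64-bit build, so downloaded wheels match.
import platform
import struct

print("Python", platform.python_version())
print(8 * struct.calcsize("P"), "bit build")  # 64 on a 64-bit Python, 32 otherwise
```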

An Open-Source CUDA for AMD GPUs – ZLUDA

Lots of work has been put into making AMD-designed GPUs work nicely with GPU-accelerated frameworks like PyTorch. Despite this, getting performant code on non-NVIDIA graphics cards can be challenging for both users and developers. Even when the developer has optimised appropriately for each platform, there are often gaps in performance where, at the driver level, instructions to the GPU are not fully optimised. This is because software developed using CUDA can benefit from optimisations like operation fusing without the developer having to request them explicitly in many cases.
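
At the framework level, at least, the user-facing API is largely the same on both vendors: to my understanding, PyTorch’s ROCm builds reuse the torch.cuda namespace, so a hedged sketch of backend detection (assuming a working PyTorch install) looks like this:

```python
# Sketch (assumes PyTorch is installed): ROCm/HIP builds of PyTorch expose the
# same torch.cuda API as CUDA builds, so user code rarely needs to change.
import torch

if torch.cuda.is_available():
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    print(f"GPU backend: {backend}, device: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU backend available; running on CPU")
```

The harder problem the post is about sits below this layer, in how well the driver and libraries optimise what the framework asks for.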

This may not be much of a concern for most researchers, as we simply use what is available to us – which usually means NVIDIA GPUs, with little real choice in the matter. NVIDIA is aware of this and prices its products accordingly. Part of the problem is that system designers just don’t have an incentive to build AMD platforms other than for highly specialised machines.

Optimising for PR AUC vs ROC AUC – an intuitive understanding

When training a machine learning (ML) model, our main aim is usually to get the ‘best’ model out the other end in an unbiased manner. Of course, there are other considerations such as quick training and inference, but mostly we want to be good at predicting the right answer.

A number of factors will affect the quality of our final model, including the chosen architecture, optimiser, and – importantly – the metric we are optimising for. So, how should we pick this metric?
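
To make the comparison concrete, here is a minimal sketch (synthetic, imbalanced data – not taken from the post) computing both metrics with scikit-learn; on rare-positive problems the ROC AUC often looks flattering while the PR AUC (average precision) stays modest:

```python
# Sketch with synthetic data: compare ROC AUC and PR AUC on an imbalanced problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# ~2% positives, mimicking a rare-event classification task.
X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

print("ROC AUC:", round(roc_auc_score(y_test, scores), 3))
print("PR AUC :", round(average_precision_score(y_test, scores), 3))
```

Which of these numbers you optimise for changes what kind of mistakes the final model makes, which is exactly the choice the post goes on to unpack.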
