Tag Archives: machine learning

Using JAX and Haiku to build a Graph Neural Network


JAX

Last year, I had an opportunity to delve into the world of JAX whilst working at InstaDeep. My first blopig post seems like an ideal time to share some of that knowledge. JAX is an experimental Python library created by Google’s DeepMind for applying accelerated differentiation. JAX can be used to differentiate functions written in NumPy or native Python, just-in-time compile and execute functions on GPUs and TPUs with XLA, and mini-batch repetitious functions with vectorization. Collectively, these qualities place JAX as an ideal candidate for accelerated deep learning research [1].

JAX is inspired by the NumPy API, making usage very familiar for any Python user who has already worked with NumPy [2]. However, unlike NumPy, JAX arrays are immutable; once they are assigned in memory they cannot be changed. As such, JAX includes specific syntax for index manipulation. In the code below, we create a JAX array and change the 1^{st} element to a 4:

Continue reading

Tracking the change in ML performance for popular small molecule benchmarks

The power of machine learning (ML) techniques has captivated the field of small molecule drug discovery. Increasingly, researchers and organisations have employed ML to create more accurate algorithms to improve the efficiency of the discovery process.

To be published, methods have to prove they have improved upon others. Often, methods are tested against the same benchmarks within a field, allowing us to track progress over time. To explore the rate of improvement, I curated the performance on three popular benchmarks. The first benchmark is CASF 2016, used to test the accuracy of methods that predict the binding affinity of experimental determined protein-ligand complexes. Accuracy was measured using the Pearson’s R value between predicted and experimental affinity values.

Continue reading

Optimising for PR AUC vs ROC AUC – an intuitive understanding

When training a machine learning (ML) model, our main aim is usually to get the ‘best’ model out the other end in an unbiased manner. Of course, there are other considerations such as quick training and inference, but mostly we want to be good at predicting the right answer.

A number of factors will affect the quality of our final model, including the chosen architecture, optimiser, and – importantly – the metric we are optimising for. So, how should we pick this metric?

Continue reading

Navigating the world of GNN layers with PyTorch Geometric

Data can often naturally be represented in a graph format and being able to directly employ a deep learning architecture on that data without finding a different representation is an appealing idea. Graph neural networks (GNNs) have become a standard part of the ML toolbox but navigating the world of different architectures available out-of-the-box can be a daunting task. A great place to start looking for architectures is with PyTorch Geometric, which provides an extensive list of readily available GNN layers and tutorials on how to use them in your standard PyTorch models. There are many things to consider when choosing a GNN layer, but the two considerations that I think are a great place to start are expressiveness and edge feature handling. In general, it is hard to predict what will work best for the task at hand and hence it’s optimal to try a wide range of different layers. This blogpost is meant as a brief introduction for what I would find useful to know before I started using GNNs, and a starting point for exploring the GNN literature.

Continue reading

AlphaGeometry: are computers taking over math?

Last week, Google DeepMind announced AlphaGeometry, a novel deep learning system that is able to solve geometry problems of the kind presented at the International Mathematics Olympiad (IMO). The work is described in a recent Nature paper, and is accompanied by a GitHub repo including full code and weights.

This paper has caused quite a stir in some circles. Well, at least the kind of circles that you tend to get in close contact with when you work at a Department of Statistics. Like folks in structural in biology wondered three years ago, those who earn a living by veering into the mathematical void and crafting proofs, were wondering if their jobs may also have a close-by expiration date. I found this quite interesting, so I decided to read the paper and try to understand it — and, to motivate myself, I set to present this paper at an upcoming journal club, and also write this blog post.

So, let’s ask, what has actually been achieved and how powerful is this model?

What has been achieved

The image that has been making the rounds this time is the following benchmark:

Continue reading

Taking Equivariance in deep learning for a spin?

I recently went to Sheh Zaidi‘s brilliant introduction to Equivariance and Spherical Harmonics and I thought it would be useful to cement my understanding of it with a practical example. In this blog post I’m going to start with serotonin in two coordinate frames, and build a small equivariant neural network that featurises it.

Continue reading

What the heck are TPUs?

I recently became curious about TPUs, a specialised hardware for training Machine- and Deep-Learning models, where TPU stands for Tensor Processing Unit. This fancy chip can provide very high gains for anyone aiming to perform really massive parallelisation of AI tasks such as training, fine-tuning, and inference.

In this blog post, I will touch on what a TPU is, why it could be useful for AI applications when compared to GPUs and briefly discuss associated opportunity costs.

What’s a TPU?

Continue reading

Understanding positional encoding in Transformers

Transformers are a very popular architecture in machine learning. While they were first introduced in natural language processing, they have been applied to many fields such as protein folding and design.
Transformers were first introduced in the excellent paper Attention is all you need by Vaswani et al. The paper describes the key elements, including multiheaded attention, and how they come together to create a sequence to sequence model for language translation. The key advance in Attention is all you need is the replacement of all recurrent layers with pure attention + fully connected blocks. Attention is very efficeint to compute and allows for fast comparisons over long distances within a sequence.
One issue, however, is that attention does not natively include a notion of position within a sequence. This means that all tokens could be scrambled and would produce the same result. To overcome this, one can explicitely add a positional encoding to each token. Ideally, such a positional encoding should reflect the relative distance between tokens when computing the query/key comparison such that closer tokens are attended to more than futher tokens. In Attention is all you need, Vaswani et al. propose the slightly mysterious sinusoidal positional encodings which are simply added to the token embeddings:

Continue reading

Current strategies to predict structures of multiple protein conformational states

Since the release of AlphaFold2 (AF2), the problem of protein structure prediction is widely believed to be solved. Current structure prediction tools, such as AF2, are able to model most proteins with high accuracy. These methods, however, have a major limitation as they have been trained to predict a single structure for a given protein. Proteins are highly dynamic molecules, and their function often depends on transitions between several conformational states. Despite research focusing on the task of predicting the structures of multiple conformations of a protein, currently, no accurate and reliable method is available. In this blog post, I will provide a short overview of the strategies developed for predicting protein conformations. I have grouped these into three sets of related approaches. To conclude, I will also demonstrate how to run one of these strategies on your own.

Continue reading

Machine learning strategies to overcome limited data availability

Machine learning (ML) for biological/biomedical applications is very challenging – in large part due to limitations in publicly available data (something we recently published about [1]). Substantial amounts of time and resources may be required to generate the types of data (eg protein structures, protein-protein binding affinity, microscopy images, gene expression values) required to train ML models, however.

In cases where there is sufficient data available to provide signal, but not enough for the desired performance, ML strategies can be employed:

Continue reading