Tag Archives: Pytorch

Einops: Powerful library for tensor operations in deep learning

Tobias and I recently gave a talk at the OPIG retreat on tips for using PyTorch. For this we created a tutorial on Google Colab notebook (link can be found here). I remember rambling about the advantages of implementing your own models against using other peoples code. Well If I convinced you, einops is for you!!

Basically, einops lets you perform operations on tensors using the Einstein Notation. This package comes with a number of advantages a few of which I will try and summarise here:

Continue reading

How to turn a SMILES string into a molecular graph for Pytorch Geometric

Despite some of their technical issues, graph neural networks (GNNs) are quickly being adopted as one of the state-of-the-art methods for molecular property prediction. The differentiable extraction of molecular features from low-level molecular graphs has become a viable (although not always superior) alternative to classical molecular representation techniques such as Morgan fingerprints and molecular descriptor vectors.

But molecular data usually comes in the sequential form of labeled SMILES strings. It is not obvious for beginners how to optimally transform a SMILES string into a structured molecular graph object that can be used as an input for a GNN. In this post, we show how to convert a SMILES string into a molecular graph object which can subsequently be used for graph-based machine learning. We do so within the framework of Pytorch Geometric which currently is one of the best and most commonly used Python-based GNN-libraries.

We divide our task into three high-level steps:

  1. We define a function that maps an RDKit atom object to a suitable atom feature vector.
  2. We define a function that maps an RDKit bond object to a suitable bond feature vector.
  3. We define a function that takes as its input a list of SMILES strings and associated labels and then uses the functions from 1.) and 2.) to create a list of labeled Pytorch Geometric graph objects as its output.
Continue reading