Author Archives: Fergus Imrie

Fragment-to-Lead Successes in 2019

In this blog post, I want to highlight the excellent work by Jahnke and collaborators. For the past five years, they have published an annual perspective covering fragment-to-lead success stories from the previous year. Very helpfully, their work includes a table detailing the hit fragment(s) and lead molecule, together with key experimental results and parameters.

NeurIPS 2020: Chemistry / Biology papers

Another blog post, another look at accepted papers for a major ML conference. NeurIPS joins the other major machine learning conferences (and many others) in moving online this year, running from 6th – 12th December 2020. In a continuation of past posts (ICML 2020, NeurIPS 2019), I will highlight several papers of potential interest to the chem-/bio-informatics communities.

The list of accepted papers can be found here, with 1,903 papers accepted out of 9,467 submissions (20% acceptance rate).

In addition to the main conference, there are several workshops highly related to the type of research undertaken in OPIG: Machine Learning in Structural Biology and Machine Learning for Molecules.

The usual caveat: given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”). If you find any I have missed, please reach out and I will update accordingly.

Learning from Biased Datasets

Both the beauty and the downfall of learning-based methods lie in the same fact: the data used for training largely determines the quality of any model or system.

While there have been numerous algorithmic advances in recent years, the most successful applications of machine learning have been in areas where either (i) you can generate your own data in a fully understood environment (e.g. AlphaGo/AlphaZero), or (ii) data is so abundant that you’re essentially training on “everything” (e.g. GPT-2/GPT-3, CNNs trained on ImageNet).

This covers only a narrow range of applications, with most data not falling into either of these two categories. Unfortunately, when your data falls outside them (and sometimes even when it falls within one of them), it is almost certainly biased – you just may not know it.

ICML 2020: Chemistry / Biology papers

ICML is one of the largest machine learning conferences and, like many other conferences this year, is running virtually from 12th – 18th July.

The list of accepted papers can be found here, with 1,088 papers accepted out of 4,990 submissions (22% acceptance rate). Similar to my post on NeurIPS 2019 papers, I will highlight several papers of potential interest to the chem-/bio-informatics communities. As before, given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).

Journal Club: Is our data biased, and should it be?

Jia, X., Lynch, A., Huang, Y. et al. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 573, 251–255 (2019) doi:10.1038/s41586-019-1540-5 https://www.nature.com/articles/s41586-019-1540-5

Last week I presented the above paper at group meeting. While the paper is a little different from a typical OPIG journal club choice, the data we have access to almost certainly suffers from the same range of (possible) biases that it explores.

NeurIPS 2019: Chemistry/Biology papers

NeurIPS is the largest machine learning conference (by number of participants), with over 8,000 attendees in 2017. This year, the conference will be held in Vancouver, Canada from 8th – 14th December.

Recently, the list of accepted papers was announced, with 1,430 papers accepted. Here, I will highlight several papers of potential interest to the chem-/bio-informatics communities. Given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).

Python Handout

Many OPIGlets use Jupyter extensively (in either Notebook or Lab flavour) to prototype and present their work. However, as projects progress, notebooks are frequently converted into regular Python files for a number of reasons, losing the notebook functionality.

Wouldn’t it be nice if we could combine some of the benefits of Jupyter notebooks (not least the ability to present both code & results naturally) with regular Python files?

Enter Python Handout.

Python Handout was recently (5th August 2019) released by Danijar Hafner and allows Python scripts to be converted into handouts with Markdown comments and inline figures (see above picture).

Installation is via pip (pip3 install -U handout), and Python Handout supports Python 3 scripts.
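
To give a flavour of what such a script might look like, here is a minimal sketch. It assumes the handout package exposes a Handout class with add_text and show methods, and that triple-quoted strings at the top level of the script are rendered as Markdown; the function name build_handout, the file name example.py and the output directory are my own placeholders. Check the project README before relying on any of this.

```python
"""
## A tiny handout
Hypothetical example script (e.g. saved as example.py).
Triple-quoted strings like this one are assumed to be rendered as Markdown.
"""


def build_handout(output_dir="output"):
    """Create a small handout in output_dir (a placeholder directory name)."""
    # Imported lazily so this file can still be loaded if Handout is missing.
    import handout  # third-party package: pip3 install -U handout

    # Assumption: the handout is written into the directory passed here.
    doc = handout.Handout(output_dir)

    # add_text is assumed to behave like print(), taking several arguments.
    for n in range(3):
        doc.add_text("n =", n, "n squared =", n * n)

    # Flush everything added so far into the rendered handout.
    doc.show()
    return doc
```

Calling build_handout() at the bottom of the script and running it with python3 example.py should then write the rendered handout into the output directory. The import sits inside the function purely so the file can be loaded on a machine without Handout installed; in a real script you would more likely import it at the top.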

While I’ve not used Handout much (yet), I will definitely be experimenting more in the coming weeks.

Graph-based Methods for Cheminformatics

In cheminformatics, there are many possible ways to encode chemical data such as small molecules and proteins, including SMILES, fingerprints and chemical descriptors. Recently, graph-based methods for machine learning have become more prominent. In this post, we will explore why representing molecules as graphs is a natural and suitable encoding.
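
As a rough illustration of the idea (a sketch in plain Python, not the approach taken in the full post), a molecule can be written down as a graph with atoms as nodes and bonds as edges. The example below encodes the heavy atoms of ethanol (SMILES: CCO); the atom indices and the helper names build_adjacency and degree are arbitrary choices of mine.

```python
from collections import defaultdict

# Ethanol (SMILES: CCO), heavy atoms only; the indices are arbitrary labels.
atoms = {0: "C", 1: "C", 2: "O"}  # nodes: atom index -> element symbol
bonds = [(0, 1, 1), (1, 2, 1)]    # edges: (atom i, atom j, bond order)


def build_adjacency(bond_list):
    """Return an undirected adjacency list: atom -> [(neighbour, bond order)]."""
    adjacency = defaultdict(list)
    for i, j, order in bond_list:
        # Bonds are undirected, so record the edge in both directions.
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))
    return dict(adjacency)


adjacency = build_adjacency(bonds)


def degree(atom_index):
    """Number of heavy-atom neighbours of the given atom."""
    return len(adjacency.get(atom_index, []))


for index, element in atoms.items():
    neighbours = [atoms[j] for j, _ in adjacency[index]]
    print(index, element, "bonded to", neighbours)
```

Node and edge attributes (element, bond order) sit directly on the graph, which is part of what makes this encoding feel natural: nothing about the molecule's connectivity has to be flattened into a string or a fixed-length vector first.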