Category Archives: Deep Learning

Lucubration or Gaslighting?

Or: The best lies have a nugget of truth in them.

Lucubration – The action or occupation of intensive study originally by candle or lamplight.

Gaslighting – Psychological abuse in which a person or group causes someone to question their own sanity, memories, or perception.

I was recently having a play with Google Bard. Bard, unlike ChatGPT, has access to live data. It also undergoes live feedback and quality control. I was hoping to see if it would find me any journals with articles on prion research which I’d previously overlooked.

Me: Please show me some recent articles about prion research.
(Because always be polite to our AI overlords, they’ll remember!)

Continue reading

What can you do with the OPIG Immunoinformatics Suite? v3.0

OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).

Continue reading

The State of Computational Protein Design

Last month, I had the privilege to attend the Keystone Symposium on Computational Design and Modeling of Biomolecules in beautiful Banff, Canada. This conference gave an incredible insight into the current state of the protein design field, as we are on the precipice of advances catalyzed by deep learning.

Here are my key takeaways from the conference:

Continue reading

Can AlphaFold predict protein-protein interfaces?

Since its release, AlphaFold has been the buzz of the computational biology community. It seems that every group in the protein science field is trying to apply the model in their respective areas of research. Already we are seeing numerous papers attempting to adapt the model to specific niche domains across a broad range of life sciences. In this blog post I summarise a recent paper’s use of the technology for predicting protein-protein interfaces.

Continue reading

The Most ReLU-iable Activation Function?

The Rectified Linear Unit (ReLU) activation function was first used in 1975, but its use exploded when it was used by Nair & Hinton in their 2010 paper on Restricted Boltzmann Machines. ReLU and its derivative are fast to compute, and it has dominated deep neural networks for years. The main problem with the activation function is the so-called dead ReLU problem, where significant negative input to a neuron can cause its gradient to always be zero. To rectify this (har har), modified versions have been proposed, including leaky ReLU, GELU and SiLU, wherein the gradient for x < 0 is not always zero.
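
To make the difference concrete, here is a minimal sketch of these activation functions and their derivatives in plain Python. The function names and the leaky slope `alpha=0.01` are my own illustrative choices, and the GELU shown is the exact form based on the Gaussian CDF rather than any particular library's approximation.

```python
import math

def relu_grad(x):
    # ReLU(x) = max(0, x); the gradient is exactly zero for all x <= 0
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small constant slope alpha for negative inputs
    return 1.0 if x > 0 else alpha

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu_grad(x):
    # SiLU(x) = x * sigmoid(x), so d/dx = s + x * s * (1 - s)
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def gelu_grad(x):
    # GELU(x) = x * Phi(x), so d/dx = Phi(x) + x * phi(x)
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

# Compare the gradients at a negative input
x = -2.0
grads = {
    "relu": relu_grad(x),
    "leaky_relu": leaky_relu_grad(x),
    "silu": silu_grad(x),
    "gelu": gelu_grad(x),
}
```

At `x = -2.0` only the plain ReLU gradient is exactly zero, which is why a neuron stuck in that region stops receiving updates while the variants still pass some signal back.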

A 2020 paper by Naitzat et al., which builds upon ideas set out in a 2014 Google Brain blog post, seeks to explain why ReLU and its variants seem to be better in general for classification problems than sigmoidal functions such as tanh and sigmoid.

Continue reading

Universal graph pooling for GNNs

Graph neural networks (GNNs) have quickly become one of the most important tools in computational chemistry and molecular machine learning. GNNs are a type of deep learning architecture designed for the adaptive extraction of vectorial features directly from graph-shaped input data, such as low-level molecular graphs. The feature-extraction mechanism of most modern GNNs can be decomposed into two phases:

  • Message-passing: In this phase the node feature vectors of the graph are iteratively updated following a trainable local neighbourhood-aggregation scheme often referred to as message-passing. Each iteration delivers a set of updated node feature vectors which is then imagined to form a new “layer” on top of all the previous sets of node feature vectors.
  • Global graph pooling: After a sufficient number of layers has been computed, the updated node feature vectors are used to generate a single vectorial representation of the entire graph. This step is known as global graph readout or global graph pooling. Usually only the top layer (i.e. the final set of updated node feature vectors) is used for global graph pooling, but variations of this are possible that involve all computed graph layers and even the set of initial node feature vectors. Commonly employed global graph pooling strategies include taking the sum or the average of the node features in the top graph layer.
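
The two phases above can be sketched in a few lines of plain Python. This is only a toy illustration under my own assumptions: the fixed `self_weight` stands in for the trainable parameters of a real message-passing scheme, and the three-node path graph and its features are made up.

```python
def message_passing_step(node_features, neighbours, self_weight=0.5):
    """One simplified update: mix each node vector with the mean of its neighbours."""
    updated = []
    for i, h in enumerate(node_features):
        nbrs = neighbours[i]
        if nbrs:
            agg = [sum(node_features[j][k] for j in nbrs) / len(nbrs)
                   for k in range(len(h))]
        else:
            agg = [0.0] * len(h)
        updated.append([self_weight * a + (1.0 - self_weight) * b
                        for a, b in zip(h, agg)])
    return updated

def sum_pool(node_features):
    # Global pooling: element-wise sum over all node feature vectors
    return [sum(col) for col in zip(*node_features)]

def mean_pool(node_features):
    # Global pooling: element-wise average over all node feature vectors
    n = len(node_features)
    return [s / n for s in sum_pool(node_features)]

# Toy path graph 0 - 1 - 2 with two features per node (made-up values)
neighbours = {0: [1], 1: [0, 2], 2: [1]}
layer0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Phase 1: each iteration yields a new "layer" of node feature vectors
layer1 = message_passing_step(layer0, neighbours)
layer2 = message_passing_step(layer1, neighbours)

# Phase 2: pool the top layer into one vector for the whole graph
graph_sum = sum_pool(layer2)
graph_mean = mean_pool(layer2)
```

Whatever the number of nodes, the pooled output has the same length as a single node feature vector, which is what lets a downstream model treat graphs of different sizes uniformly.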

While a lot of research attention has been focused on designing novel and more powerful message-passing schemes for GNNs, the global graph pooling step has often been treated with relative neglect. As mentioned in my previous post on the issues of GNNs, I believe this to be problematic. Naive global pooling methods (such as simply summing up all final node feature vectors) can potentially form dangerous information bottlenecks within the neural graph learning pipeline. In the worst case, such information bottlenecks pose the risk of largely cancelling out the information signal delivered by the message-passing step, no matter how sophisticated the message-passing scheme.
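
A tiny example shows how such a bottleneck can arise. The node feature values below are invented purely for illustration: sum pooling cannot tell the first two sets of node features apart, and mean pooling cannot tell the last two apart.

```python
def sum_pool(node_features):
    # Element-wise sum over all node feature vectors
    return [sum(col) for col in zip(*node_features)]

def mean_pool(node_features):
    # Element-wise average over all node feature vectors
    n = len(node_features)
    return [s / n for s in sum_pool(node_features)]

# Three made-up sets of final node feature vectors
graph_a = [[1.0, 3.0], [3.0, 1.0]]   # two clearly different nodes
graph_b = [[2.0, 2.0], [2.0, 2.0]]   # two identical nodes
graph_c = [[2.0, 2.0]] * 4           # four identical nodes

# Different node features, identical sum-pooled representation
same_sum = sum_pool(graph_a) == sum_pool(graph_b)

# Different graph sizes, identical mean-pooled representation
same_mean = mean_pool(graph_b) == mean_pool(graph_c)
```

Both comparisons come out equal, so any distinction the message-passing layers had preserved between these inputs is lost at the readout.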

Continue reading

5th Artificial Intelligence in Chemistry Symposium

The lineup for the Royal Society of Chemistry’s 5th “Artificial Intelligence in Chemistry” Symposium (Thursday-Friday, 1st-2nd September 2022) is now complete for both oral and poster presentations. It really is a fantastic selection of topics and speakers and it is clear this event is now a highlight of the scientific calendar. Our very own Prof. Charlotte M. Deane, MBE will be giving a keynote.

5th RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry Symposium, 1st-2nd September, Churchill College, Cambridge + Zoom broadcast.

It marks a return to in-person meetings: it will be held at Churchill College, Cambridge, with a conference dinner at Trinity Hall.

More details are here: https://www.rscbmcs.org/events/aichem22/.

Registration for in-person attendance is open until Monday 29th August 17:00 (BST).

It is also possible to register for virtual attendance; the meeting will be broadcast on Zoom.

Cool ideas in Deep Learning and where to find more about them

I was planning on doing a blog post about some cool random deep learning paper that I have read in the last year or so. However, I keep finding that someone else has already written a way better blog post than I could. Instead I have decided to write a very brief summary of some hot ideas and then provide a link to some other page where someone describes it way better than me.

The Lottery Ticket Hypothesis

This idea has to do with pruning a model, which is when you remove parts of your model to make it more computationally efficient while barely losing accuracy. The lottery ticket hypothesis also has to do with how weights are initialized in neural networks and why larger models often achieve better performance.

Anyways, the hypothesis says the following: “Dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations.” In their analogy, the random initialization of a model's weights is treated like a lottery, where some combination of a subset of these weights is already pretty close to the network you want to train (winning ticket). For a better description and a summary of advances in this field I would recommend this blog post.
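
As a rough sketch of how a candidate ticket can be extracted, the snippet below prunes the smallest-magnitude weights after training and rewinds the survivors to their initial values. The helper names, the 80% pruning fraction and the noise used as a stand-in for training are all my own assumptions; a real experiment would train the network and then retrain the masked subnetwork.

```python
import random

def magnitude_mask(trained_weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    n_prune = int(len(trained_weights) * prune_fraction)
    order = sorted(range(len(trained_weights)),
                   key=lambda i: abs(trained_weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(trained_weights))]

def candidate_ticket(initial_weights, mask):
    """Rewind surviving weights to their initial values; pruned ones become zero."""
    return [w0 * m for w0, m in zip(initial_weights, mask)]

rng = random.Random(0)

# Random initialization: the "lottery draw"
initial = [rng.gauss(0.0, 1.0) for _ in range(10)]

# Stand-in for training (assumption): perturb the initial weights with noise
trained = [w + rng.gauss(0.0, 0.5) for w in initial]

# Keep only the 20% of weights with the largest trained magnitude
mask = magnitude_mask(trained, prune_fraction=0.8)
ticket = candidate_ticket(initial, mask)
```

The hypothesis is then a claim about `ticket`: trained in isolation, this sparse subnetwork should reach accuracy comparable to the dense network.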

SAM: Sharpness aware minimization

The key idea here has to do with finding the best optimizer to train a model capable of generalization. According to this paper, a model that has converged to a sharp minimum will be less likely to generalize than one that has converged to a flatter minimum. They show the following plot to provide an intuition of why this may be the case.

In the SAM paper (and ASAM for adaptive) the authors implement an optimizer that is more likely to converge to a flat minimum. I found this blog post by the authors of ASAM gives a very good description of the field.
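
For intuition, a single SAM update can be sketched as two gradient evaluations: first step a small distance `rho` uphill along the normalised gradient, then apply the gradient measured at that perturbed point to the original weights. The toy quadratic loss, the learning rate and `rho` below are my own illustrative choices, and the adaptive scaling used in ASAM is not shown.

```python
import math

def sam_step(weights, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update on a list of weights."""
    g = grad_fn(weights)
    # Small constant avoids division by zero when the gradient vanishes
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Step 1: move to the (approximate) worst-case point within radius rho
    perturbed = [w + rho * gi / norm for w, gi in zip(weights, g)]
    # Step 2: take the gradient there and apply it to the original weights
    g_sharp = grad_fn(perturbed)
    return [w - lr * gi for w, gi in zip(weights, g_sharp)]

def loss(weights):
    # Toy loss (assumption): a simple quadratic bowl
    return sum(w * w for w in weights)

def grad(weights):
    return [2.0 * w for w in weights]

start = [1.0, -2.0]
w = list(start)
for _ in range(50):
    w = sam_step(w, grad)
final_loss = loss(w)
```

On this bowl SAM behaves much like plain gradient descent; the difference only matters on loss surfaces where sharp and flat minima coexist.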

Continue reading

Exploring topological fingerprints in RDKit

Finding a way to express the similarity of irregular and discrete molecular graphs to enable quantitative algorithmic reasoning in chemical space is a fundamental problem in data-driven small molecule drug discovery.

Virtually all algorithms that are widely and successfully used in this setting boil down to extracting and comparing (multi-)sets of subgraphs, differing only in the space of substructures they consider and the extent to which they are able to adapt to specific downstream applications.

A large body of recent work has explored approaches centred around graph neural networks (GNNs), which can often maximise both of these considerations. However, the subgraph-derived embeddings learned by these algorithms may not always perform well beyond the specific datasets they are trained on, and for many generic or resource-constrained applications more traditional “non-parametric” topological fingerprints may still be a viable and often preferable choice.

This blog post gives an overview of the topological fingerprint algorithms implemented in RDKit. In general, they count the occurrences of a certain family of subgraphs in a given molecule and then represent this set/multiset as a bit/count vector, which can be compared to other fingerprints with the Jaccard/Dice similarity metric or further processed by other algorithms.
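
The comparison step can be sketched without any cheminformatics library. Below, each molecule is represented by a hypothetical multiset of subgraph identifiers (the string keys and counts are invented, and this is not RDKit's API); the functions compute Jaccard and Dice similarity on the set view and a count-based Jaccard on the multiset view.

```python
from collections import Counter

def jaccard_bits(a, b):
    """Jaccard (Tanimoto) similarity on the sets of subgraphs present."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice_bits(a, b):
    """Dice similarity on the sets of subgraphs present."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def jaccard_counts(a, b):
    """Multiset Jaccard: sum of minimum counts over sum of maximum counts."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0

# Hypothetical subgraph counts for two molecules (illustrative only)
mol_a = Counter({"C-C": 2, "C-O": 1, "C=O": 1})
mol_b = Counter({"C-C": 1, "C-O": 1, "C-N": 1})

sim_jaccard = jaccard_bits(mol_a, mol_b)     # 2 shared / 4 distinct = 0.5
sim_dice = dice_bits(mol_a, mol_b)           # 2 * 2 / (3 + 3)
sim_counts = jaccard_counts(mol_a, mol_b)    # 2 / 5 = 0.4
```

The count-based score is lower here because the multiset view also penalises the differing number of `C-C` occurrences, which the bit view ignores.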

Continue reading

Entering a Stable Relationship with your Neural Network

Over the past year, I have been working on building a graph-based paratope (antibody binding site) prediction tool – Paragraph. Fortunately, I have had moderate success with this and you can now check out the preprint of this work here.

However, for a long time, I struggled with a highly unstable network, where different random seeds yielded very different results. I believe this instability was largely due to the high class imbalance in my data – only ~10% of all residues in the Fv (variable region of the antibody) belong to the paratope.
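
To see why a ~10% positive rate can be troublesome, here is a minimal sketch of a class-weighted binary cross-entropy. This is one common way of handling imbalance and is shown only as an illustration; it is not necessarily what was used for Paragraph, and the function name, the toy labels and the `pos_weight` heuristic are my own assumptions.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive class up-weighted."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p)
                   + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Toy data mirroring a 10% positive rate: 1 positive, 9 negatives
labels = [1] + [0] * 9

# Common heuristic (assumption): weight positives by the negative/positive ratio
pos_weight = labels.count(0) / labels.count(1)

# A lazy model that predicts "negative" with high confidence everywhere
always_negative = [0.05] * 10

unweighted = weighted_bce(always_negative, labels, pos_weight=1.0)
weighted = weighted_bce(always_negative, labels, pos_weight=pos_weight)
```

With the weighting, the lazy all-negative predictor is penalised far more heavily, so the loss no longer rewards simply ignoring the minority class.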

I tried many different things in an attempt to stabilise my training, most of which failed. I will share all of these ideas with you though – successful or not – as what works for one person/network is never guaranteed to work for another. I hope that the below may provide some ideas to try out for others facing similar issues. Where possible, I also provide some example hyperparameter values that could act as sensible starting points.

Continue reading