Today is the day for another (potentially penultimate) blog post from me. I would like to use this opportunity to introduce our recent update to the Observed Antibody Space (OAS) resource.
No labels, no problem! A quick introduction to Gaussian Mixture Models
Statistical Modelling Big Data Analytics™ is in vogue at the moment, and there’s nothing quite so fashionable as the neural network. Capable of capturing complex non-linear relationships and scalable to high-dimensional datasets, neural networks are here to stay.
For your garden-variety neural network, you need two things: a set of features, X, and a label, Y. But what do you do if labelling is prohibitively expensive or your expert labeller goes on holiday for 2 months and all you have in the meantime is a set of features? Happily, we can still learn something about the labels, even if we might not know what they are!
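As a rough illustration of learning about labels from features alone, here is a minimal sketch of a two-component, one-dimensional Gaussian Mixture Model fitted with expectation-maximisation. The function names (`normal_pdf`, `fit_gmm_1d`), the toy data and the starting values are all assumptions made for this example; they are not taken from the post itself.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(xs, mus, sigmas, weights, n_iter=50):
    """Fit a 1D Gaussian mixture by EM (illustrative sketch, not production code)."""
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(len(mus)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return mus, sigmas, weights, resp

# Hypothetical unlabelled features drawn from two well-separated groups.
xs = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 4.8, 5.1]
mus, sigmas, weights, resp = fit_gmm_1d(xs, mus=[0.0, 6.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5])

# Hard labels: the component with the largest responsibility for each point.
labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
```

The responsibilities in `resp` are soft labels: each point gets a probability of belonging to each component, which is what separates a mixture model from a hard clustering.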
K-Means clustering made simple
The 21st century is often referred to as the age of “Big Data” due to the unprecedented increase in the volumes of data being generated. As most of this data comes without labels, making sense of it is a non-trivial task. To gain insight from unlabelled data, unsupervised machine learning algorithms have been developed and continue to be refined. These algorithms determine underlying relationships within the data by grouping data points into cluster families. The resulting clusters not only highlight associations within the data, but they are also critical for creating predictive models for new data.
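To make the idea of grouping data points into clusters concrete, the following is a minimal sketch of Lloyd's algorithm, the standard k-means procedure. The `kmeans` function, the toy points and the choice of initial centroids are assumptions for illustration only.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

New data can then be assigned to a cluster by finding its nearest centroid. That is one simple way the resulting clusters feed into predictions.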
Real Space Correlation Coefficient
Introduction
In crystallography we are often faced with the question of how well a part of our model fits the data. Crystallography has well-developed probability models for the reflection amplitudes given the entire fitted model, but these do not provide a metric for “how much of the ligand is inside the blob”. This is because the reflection-based models are inherently global.
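As a sketch of what a local metric can look like, the real space correlation coefficient is commonly defined as the Pearson correlation between observed and model-calculated electron density, sampled at the grid points around the part of the model in question. The `rscc` function and the density values below are hypothetical; a real calculation would sample both maps on the same grid around the ligand.

```python
import math

def rscc(rho_obs, rho_calc):
    """Pearson correlation between observed and calculated density samples."""
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_o * var_c)

# Hypothetical density values at the same grid points around a ligand.
rho_obs = [0.2, 0.9, 1.4, 1.1, 0.3]
rho_calc = [0.1, 1.0, 1.5, 1.0, 0.2]
score = rscc(rho_obs, rho_calc)
```

Because the sums run only over grid points near the chosen atoms, the score is local: values close to 1 indicate that the modelled density follows the observed density in that region.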
ICML 2020: Chemistry / Biology papers
ICML is one of the largest machine learning conferences and, like many other conferences this year, is running virtually from 12th – 18th July.
The list of accepted papers can be found here, with 1,088 papers accepted out of 4,990 submissions (22% acceptance rate). Similar to my post on NeurIPS 2019 papers, I will highlight several of potential interest to the chem-/bio-informatics communities. As before, given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).
Uploading/downloading small files across systems
Sometimes you just want to quickly move a copy of a script, image or binary from, for example, your local (Linux) machine to another (Linux) machine. The usual tool would be scp, but this can get complicated when there are several layers of SSH, and sometimes it doesn’t work at all (as is the case for transfers between the Department of Statistics computers and the outside world).
ProCare: cavity similarity searching and its applications to fragment-based drug design
ProCare [1] is a package developed at the University of Strasbourg that aligns protein cavities and scores their similarity. The aim is to find ligand binding sites in different proteins that are similar enough to bind the same ligand. The method is designed to look particularly at fragment binding sites (a fragment is ~⅓ the size of a druglike ligand), so that potential fragment hits can be predicted by comparing the cavities of the targets.
Journal Club: the Dynamics of Affinity Maturation
Last week at our group meeting I presented on a paper titled “T-cell Receptor Variable beta Domains Rigidify During Affinity Maturation” by Monica L. Fernández-Quintero, Clarissa A. Seidler and Klaus R. Liedl. The authors use metadynamics simulations of the same T-cell Receptor (TCR) at different stages of affinity maturation to study the conformational landscape of the complementarity-determining regions (CDRs), and how this might relate to an increase in affinity. Not only do they conclude that affinity maturation leads to rigidification of CDRs in solution, but they also present some evidence for the conformational selection model of biomolecular binding events in TCR-antigen interactions.
Where do OPIGlets come from?
Now you might think the answer to this question is OSOWs, but in fact they come from a wide variety of undergraduate degrees!
EEGor on Proteins: A Brain-based Perspective on Crowd-sourced Protein Structure Prediction
EEG-based Brain-Computer Interfaces (BCIs) are becoming increasingly popular, with products such as the Muse Headband and g-tec’s Unicorn Hybrid Black taking off, while in the protein folding space, Foldit and distributed/crowd computing efforts like Folding@home don’t seem to be talked about as much as they once were.
Gamification is still just as effective a tool for harnessing human ingenuity as it once was, so perhaps what is needed is a new approach to crowd-folding efforts that can tap into the full potential of the human mind to manipulate and visualise new 3D structures, by drawing inspiration directly from the minds of users…