Finding the lowest energy conformation of a given molecule!

Generating low-energy molecular conformers is important in many areas of computational chemistry, molecular modeling and cheminformatics. Many tools have been developed to generate conformers, including BALLOON (1), Confab (2), FROG2 (3), MOE (4), OMEGA (5) and RDKit (6). The search algorithms implemented in these tools can be broadly classified as either systematic or stochastic, and they primarily focus on generating geometrically diverse low-energy conformers. Here, we are interested in finding the lowest-energy conformation of a molecule rather than achieving geometric diversity, and we use Bayesian optimization to find it (7). Continue reading

Check My Blob

A brief overview and discussion of: Automatic recognition of ligands in electron density by machine learning. This paper aims to reduce the bias introduced when crystallographers fit ligands into electron density for protein-ligand complexes. The authors train a supervised machine learning model on known ligand sites from across the whole Protein Data Bank, producing a classifier that can identify which common ligands could fit a given region of electron density.

Continue reading

OPIG Putts Up

Tonight, post-OPIG Group Meeting, most of us visited the local crazy golf course “Junkyard Golf” for some serious fun. Three groups of us teed off at different times, negotiating dimly lit Heath-Robinson/Rube Goldberg-style courses leading into bathtubs, past bears and through volcanoes. We’re not competitive at all (Serenity & Crunch) so it was a great surprise to learn at the end of our games that CW had won…

Post-putting OPIGlets

What can you do with the OPIG Antibody Suite?

OPIG has now developed a whole range of tools for antibody analysis. I thought it might be helpful to summarise all the different tools we are maintaining (some of which are brand new, and some are not hosted at opig.stats), and what they are useful for.

Immunoglobulin Gene Sequencing (Ig-Seq/NGS) Data Analysis

1. OAS
Required Input: N/A (Database)

OAS (Observed Antibody Space) is a quality-filtered, consistently annotated database of all the publicly available next-generation sequencing (NGS) data of antibodies. Here you can:

Continue reading

docopt for dummies

Parsing command line arguments is an annoying piece of boilerplate we all have to do. Documenting our code is either an absolutely essential part of software engineering, or a frivolous waste of research time, depending on who you ask. But what if I told you that we can combine the two? That you can handle your argument parsing simply by documenting how your code works? Well, the dream is now reality. Continue reading

How to get seasick without leaving your desk

It’s always easy to get caught up in computational work, so let me describe a quick experiment you can do to relieve the boredom. Sit down in your spinny chair and cross your legs. (You do have a spinny chair, right? If not, get one – they prevent repetitive strain injuries, and more importantly, they are great fun. This experiment works best if the chair has no arms.) Start spinning that spinny chair until you feel like you’ve achieved a stable rotation. Then – smartly – stop the rotation and sit up straight. Continue reading

Preparing a five minute conference talk: an honest account

On 26 September I had the opportunity to give a short talk at the COSTNET18 conference in Warsaw. I’d never done anything like it before, which made it both exciting and a tiny bit terrifying. I thought I’d share how I prepared for it, in the hope that other conference newbies might find some of it useful, or at least funny.

20 July

I register for the conference, and apply to give a talk. I use a version of my paper draft abstract, to which I add a couple of introductory sentences. I submit successfully, but at the end of the day accidentally delete this version of the abstract from my computer. I guess if I need it, I just need to wait until the conference programme becomes available. #fail Continue reading

So, you are interested in compound selectivity and machine learning papers?

At the last OPIG meeting, I gave a talk about compound selectivity and machine learning approaches for predicting whether a compound might be selective. As promised, I hereby provide a list of publications I would hand to a beginner in the field of compound selectivity and machine learning. Continue reading

Mol2vec: Finding Chemical Meaning in 300 Dimensions

Embeddings of Amino Acids

2D projections (t-SNE) of Mol2vec vectors of amino acids (bold arrows). These vectors were obtained by summing the vectors of the Morgan substructures (small arrows) present in the respective molecules (amino acids in the present example). The directions of the vectors provide a visual representation of similarities. Magnitudes reflect importance, i.e. more meaningful words. [Figure from Ref. 1]

Natural Language Processing (NLP) algorithms are usually used for analyzing human communication, often in the form of textual information such as scientific papers and Tweets. One aspect, coming up with a representation that clusters words with similar meanings, has been achieved very successfully with the word2vec approach. This involves training a shallow, two-layer artificial neural network on a very large body of words and sentences — the so-called corpus — to generate “embeddings” of the constituent words into a high-dimensional space. By computing the vector from “woman” to “queen”, and adding it to the position of “man” in this high-dimensional space, the answer, “king”, can be found.
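The vector arithmetic behind that analogy can be sketched in a few lines of plain Python. The two-dimensional vectors below are hand-picked for illustration and are not learned embeddings; real word2vec models use hundreds of dimensions and typically rank candidates by cosine similarity, whereas this toy uses Euclidean distance for simplicity.

```python
import math

# Toy 2-D "embeddings", chosen by hand (an assumption, not trained vectors).
# Axis 0 loosely encodes gender, axis 1 loosely encodes royalty.
embeddings = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "king": (1.0, 1.0),
    "queen": (-1.0, 1.0),
    "apple": (0.0, -1.0),
}


def add(u, v):
    """Component-wise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))


def nearest(target, exclude=()):
    """Return the word whose vector is closest to target (Euclidean distance)."""
    candidates = (w for w in embeddings if w not in exclude)
    return min(candidates, key=lambda w: math.dist(embeddings[w], target))


# The vector from "woman" to "queen" ...
royal_offset = sub(embeddings["queen"], embeddings["woman"])
# ... added to the position of "man" lands on "king".
answer = nearest(add(embeddings["man"], royal_offset),
                 exclude={"man", "woman", "queen"})
print(answer)
```

The query words are excluded from the search, as is common practice in analogy tasks, so that the result is not simply one of the inputs.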

A recent publication by one of my former InhibOx colleagues, Simone Fulle, and her co-workers, Sabrina Jaeger and Samo Turk, shows how we can embed molecular substructures and chemical compounds into a similarly high-dimensional, continuous vectorial representation, which they dubbed “mol2vec” [Ref. 1]. They also released a Python implementation, available on Samo Turk’s GitHub repository.
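The figure caption above describes a compound vector as the sum of the vectors of the Morgan substructures present in the molecule. A toy sketch of that summation step follows. The substructure names and three-dimensional vectors are invented for illustration, and this is not the API of the released mol2vec package, which learns 300-dimensional vectors from a corpus of compounds.

```python
# Hypothetical substructure vectors (made-up names and values, 3-D for brevity).
substructure_vectors = {
    "amine": [1.0, 0.0, 0.5],
    "carbonyl": [0.0, 1.0, 0.5],
    "hydroxyl": [0.5, 0.5, 0.0],
}


def embed_molecule(substructure_ids, vectors):
    """Sum the vectors of all substructures found in a molecule.

    A substructure that occurs several times is listed, and therefore
    counted, several times.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for sub_id in substructure_ids:
        for i, value in enumerate(vectors[sub_id]):
            total[i] += value
    return total


# A toy "molecule" described only by the substructures it contains.
toy_amino_acid = ["amine", "carbonyl", "hydroxyl"]
molecule_vector = embed_molecule(toy_amino_acid, substructure_vectors)
print(molecule_vector)
```

Because the molecule vector is a plain sum, molecules that share many substructures end up pointing in similar directions, which is what the projection in the figure visualises.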


Continue reading