Author Archives: Garrett

OPIG Retreat, 2019

For the third year running, the Oxford Protein Informatics Group of Professors Deane and Morris traveled to a bucolic, remote location for a series of talks (long and lightning), journal clubs, and hands-on practicals, not to mention evenings of quizzes and board games, and an afternoon exploring local attractions.

Kington, Herefordshire

Thanks to the organization of OPIG Members Mark Chonofsky and Javier Prado Diaz, five hire cars and one motorbike, some two dozen of us traveled from Oxford to the rolling hills and orchard country of Herefordshire, and Kington, near the border with Wales. We had the whole YHA Kington to ourselves from Wednesday until Friday, September 18-20, 2019. Our schedule was packed with great talks, and a few opportunities to press, watch people press, or tell people to press, <shift><enter>.


Prof. Charlotte Deane on the World Service

Prof. Charlotte Deane, the new Deputy Executive Chair of the EPSRC, Deputy Head of Division of MPLS, and Head of the Oxford Protein Informatics Group, was interviewed on the BBC World Service programme “Tech Tent” about the role of AI in drug discovery. The segment on AI in healthcare starts at 9:45; jump to about 13:30 to hear Charlotte:

https://www.bbc.co.uk/sounds/play/w3csymsv

On the Virtues of the Command Line

Wind the clock back about 50 years, and you would have found the DSKY interface, with its display (DS) and keyboard (KY), quite familiar. It was the front end to the guidance computer used on the Apollo missions, which ultimately allowed Neil Armstrong to utter the celebrated line, “One small step for [a] man, one giant leap for mankind.” The device effectively used a command line.


OPIG Putts Up

Tonight, post-OPIG Group Meeting, most of us visited the local crazy golf course “Junkyard Golf” for some serious fun. Three groups of us teed off at different times, negotiating dimly lit Heath-Robinson/Rube Goldberg-style courses leading into bathtubs, past bears and through volcanoes. We’re not competitive at all (Serenity & Crunch) so it was a great surprise to learn at the end of our games that CW had won…

Post-putting OPIGlets


Mol2vec: Finding Chemical Meaning in 300 Dimensions

Embeddings of Amino Acids

2D projections (t-SNE) of Mol2vec vectors of amino acids (bold arrows). These vectors were obtained by summing the vectors of the Morgan substructures (small arrows) present in the respective molecules (amino acids in the present example). The directions of the vectors provide a visual representation of similarities. Magnitudes reflect importance, i.e. more meaningful words. [Figure from Ref. 1]

Natural Language Processing (NLP) algorithms are usually used to analyze human communication, often in the form of text such as scientific papers and Tweets. One task, finding a representation that clusters words with similar meanings, has been achieved very successfully with the word2vec approach. This involves training a shallow, two-layer artificial neural network on a very large body of words and sentences (the so-called corpus) to generate “embeddings” of the constituent words in a high-dimensional space. Relationships between words then become vector arithmetic: by computing the vector from “woman” to “queen” and adding it to the position of “man” in this space, the answer, “king”, can be found as the nearest word.
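To make the analogy arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented by hand purely for illustration (real word2vec embeddings are learned from a corpus and have hundreds of dimensions), and the helper names `cosine` and `analogy` are hypothetical, not part of any word2vec library.

```python
import math

# Toy 3-dimensional "embeddings", made up for illustration only.
# Real word2vec vectors are learned, not hand-written.
toy_vectors = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, -1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(start, end, base, vectors):
    """Add the offset (end - start) to base and return the closest other word."""
    target = [
        b + (e - s)
        for s, e, b in zip(vectors[start], vectors[end], vectors[base])
    ]
    # Exclude the three query words so they cannot be returned as the answer.
    candidates = [w for w in vectors if w not in (start, end, base)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# The vector from "woman" to "queen", added to "man".
result = analogy("woman", "queen", "man", toy_vectors)
print(result)
```

With these hand-picked vectors the offset lands exactly on “king”; with learned embeddings the result is only the nearest neighbour of the target point, which is why a similarity measure such as the cosine is needed.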

A recent publication by one of my former InhibOx colleagues, Simone Fulle, and her co-workers, Sabrina Jaeger and Samo Turk, shows how we can embed molecular substructures and chemical compounds into a similarly high-dimensional, continuous vector representation, which they dubbed “mol2vec”.[1] They also released a Python implementation, available on Samo Turk’s GitHub repository.
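The figure caption above notes that a compound's vector is obtained by summing the vectors of the substructures it contains. The sketch below illustrates only that summation step, under loud assumptions: the substructure identifiers and their four-dimensional vectors are made up, `embed_molecule` is a hypothetical helper, and nothing here uses the released mol2vec package or its API.

```python
# Hypothetical substructure identifiers with made-up 4-dimensional vectors.
# In practice these would be learned embeddings of Morgan substructures.
substructure_vectors = {
    "sub_a": [0.5, 0.0, 0.0, 0.5],
    "sub_b": [0.0, 1.0, 0.0, 0.0],
    "sub_c": [0.25, 0.0, 1.0, 0.0],
}


def embed_molecule(substructures, vectors):
    """Sum the vectors of all substructures found in one molecule."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for sub in substructures:
        vec = vectors.get(sub)
        if vec is None:
            # Simplification of this sketch: silently skip unknown identifiers.
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total


# A toy "molecule" listed as the substructures it contains (repeats count twice).
molecule = ["sub_a", "sub_b", "sub_a"]
molecule_vector = embed_molecule(molecule, substructure_vectors)
print(molecule_vector)
```

Because the result is a plain sum, a substructure that occurs twice contributes twice, and molecules sharing many substructures end up pointing in similar directions.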

 
