Category Archives: Group Meetings

What we discuss during cake at our Tuesday afternoon group meetings

Non-alcoholic fatty liver disease

In my new research project, I investigate Non-alcoholic fatty liver disease (NAFLD). This term describes a variety of conditions that are associated with fatty livers. While the early stages of this disease are not harmful, it can lead to cirrhosis (Cirrhosis is the scaring of liver tissue that prevents a liver to function properly). Ultimately, if a liver stops working it can be fatal unless treated, for example, with a liver transplant. NAFLD is the most common liver disease in developed countries and is expected to become the leading cause of liver transplant by 2020 [1].

The disease progresses in four stages: Continue reading →

Graph-based Methods for Cheminformatics

In cheminformatics, there are many possible ways to encode chemical data represented by small molecules and proteins, such as SMILES, fingerprints, chemical descriptors etc. Recently, utilising graph-based methods for machine learning have become more prominent. In this post, we will explore why representing molecules as graphs is a natural and suitable encoding. Continue reading →

Automated testing with doctest

One of the ways to make your code more robust to unexpected input is to develop with boundary cases in your mind. Test-driven code development begins with writing a set of unit tests for each class. These tests often includes normal and extreme use cases. Thanks to packages like doctest for Python, Mocha and Jasmine for Javascript etc., we can write and test codes with an easy format. In this blog post, I will present a short example of how to get started with doctest in Python. N.B. doctest is best suited for small tests with a few scripts. If you would like to run a system testing, look for some other packages!

Continue reading →

Because not all interesting biology is health-related!

Nowadays, biological research science spins around health: Cancer. Neuroscience. Immunology. Pharmacology. And many more health-related areas which are being deeply studied. It seems that everyone is keen to spend their lives looking for the cure of cancer or Alzheimer. What a drag! For this reason (and also to show that research in less popular and less founded sectors can also improve significantly human lives), I have decided to write about something completely different: plant microbiome!

Indeed, I am going to write about bacteria. And no, they are not related to health at all. These bacteria live the soil and infect plants. However, they are not “bad”. Actually, they favour the plant’s growth and development. This is possible thanks to a fascinating process which finishes (ALERT SPOILER!!) with the bacteria transforming the atmospheric nitrogen into ammonia that can be used by the plant (nitrogen fixation).

The process starts with some kind of small talk between Rhizobium (the bacteria) and the legume (the plant): Legumes secrete compounds through their roots that the bacteria living close by can detect. In response to this stimulus, bacteria approach the root hairs of the plant and attach and secrete lipo-chitooligosaccharides known as Nod factors.

It continues with some action: The plants sense the Nod factors, which induce the root hairs curling and trapping the bacteria. The bacteria continue to grow and eventually form an infection thread whose growth allows the bacteria to reach other plant cells.

And it finishes with a happily ever after ending: A structure called a nodule is formed. The bacteria in the nodule form an organelle called the symbiosome, within which the bacteria differentiate to a state called bacteroid. In this stage, the bacteroid fixes nitrogen for the plant.

I know… Everything has happened too fast (the process can take 1 – 2 weeks). And I have not been bothered to explain it in detail so you can enjoy reading this amazing review: https://www.ncbi.nlm.nih.gov/pubmed/23493145

But wait! I almost forget to say why is worth studying this… The point is that plants need nitrogen to grow and they cannot use atmospheric nitrogen. Therefore, the more nitrogen they receive from the bacteria, the more they will grow. Consequently, we may increase the quantity of food available by improving this process.

Vim and I

Vim is great. Despite its steep learning curve , it has many advantages and many loyal Vim followers will tell you that you should force yourself to use it.

Personally I started using Vim when I was ssh-ing into the group servers or into my computer in department. In such scenarios, I could not open the IDEs with the nice GUIs 🙁 However, as time passed, Vim started to grow on me. Now, I can list a few reasons why I think it is great, for example, it requires a small amount of memory to run, has a short start up time and can handle large files pretty well.

Although, I am definitely not a Vim expert, I will tell you about some of the things I have added to my .vimrc. The .vimrc file is very handy for containing all your favourite settings, such as key mappings, custom commands, formatting and syntax highlighting. The file uses vimscript which is a programming language in itself. However, there is a lot of help online that tells you with what lines to add to your .vimrc. I would recommend installing Vundle which is a Vim plugin manager.

Here I will list some cool things that I have discovered you can do with your .vimrc. It has certainly made my life a bit nicer.

Code Folding
Most IDEs provide a way to collapse functions and classes that results in only seeing the function/class definition and hiding the code. To do this in Vim add the following lines to your .vimrc
```
" Enable folding
set foldmethod=indent
set foldlevel=99
" Enable folding with the spacebar
nnoremap <space> za
```
Alternatively, you can install the Vim plugin SimpylFold.
Python indentation
Vim does not do auto indention like many IDEs. To automatically do PEP-8 indentation for Python, add the following to your .vimrc .
```
" PEP indentation
au BufNewFile,BufRead *.py
\ set tabstop=4    
\ set softtabstop=4    
\ set shiftwidth=4    
\ set textwidth=79    
\ set expandtab    
\ set autoindent    
\ set fileformat=unix
```
You can also install the Vim plugin vim-flake8 which is a static syntax and style checker for Python source code. It shows errors in a quickfix window and lets you jump to their location inside your code.
Turn line numbers on
Rather than typing in
:set nu
every time you open your files. You can always have them turned on by adding :set nu to your .vimrc
Autocompletion
When I switch from PyCharm to Vim I feel a bit lost without the autocompletion however, after a quick search I found many are using the Vim package Youcompleteme and it is awesome.

Finding the lowest energy conformation of given molecule!

Generating low-energy molecular conformers is important for many areas of computational chemistry, molecular modeling and cheminformatics. Many tools have been developed to generate conformers, including BALLOON (1), Confab (2), FROG2 (3), MOE (4), OMEGA (5) and RDKit (6). The search algorithm implemented in these tools can be broadly classified as either systematic or stochastic. These algorithms primarily focus on generating geometrically diverse low-energy conformers. Here, we are interested in finding lowest energy conformation of a molecule instead of achieving geometric diversity and Bayesian optimization is used to find the lowest energy conformation (7). Continue reading →

What can you do with the OPIG Antibody Suite?

OPIG has now developed a whole range of tools for antibody analysis. I thought it might be helpful to summarise all the different tools we are maintaining (some of which are brand new, and some are not hosted at opig.stats), and what they are useful for.

Immunoglobulin Gene Sequencing (Ig-Seq/NGS) Data Analysis

1. OAS
Link: http://antibodymap.org/
Required Input: N/A (Database)
Paper: http://www.jimmunol.org/content/201/8/2502

OAS (Observed Antibody Space) is a quality-filtered, consistently-annotated database of all of the publicly available next generation sequencing (NGS) data of antibodies. Here you can:

Continue reading →

Mol2vec: Finding Chemical Meaning in 300 Dimensions

2D projections (t-SNE) of Mol2vec vectors of amino acids (bold arrows). These vectors were obtained by summing the vectors of the Morgan substructures (small arrows) present in the respective molecules (amino acids in the present example). The directions of the vectors provide a visual representation of similarities. Magnitudes reflect importance, i.e. more meaningful words. [Figure from Ref. 1]

Natural Language Processing (NLP) algorithms are usually used for analyzing human communication, often in the form of textual information such as scientific papers and Tweets. One aspect, coming up with a representation that clusters words with similar meanings, has been achieved very successfully with the word2vec approach. This involves training a shallow, two-layer artificial neural network on a very large body of words and sentences — the so-called corpus — to generate “embeddings” of the constituent words into a high-dimensional space. By computing the vector from “woman” to “queen”, and adding it to the position of “man” in this high-dimensional space, the answer, “king”, can be found.

A recent publication of one of my former InhibOx-colleagues, Simone Fulle, and her co-workers, Sabrina Jaeger and Samo Turk, shows how we can embed molecular substructures and chemical compounds into a similarly high-dimensional, continuous vectorial representation, which they dubbed “mol2vec“.¹ They also released a Python implementation, available on Samo Turk’s GitHub repository.

Continue reading →

Cinder: Crystallographic Tinder

Protein structure determination is still dominated by xray diffraction. For diffraction studies structural biologists need to grow and optimise protein crystals until they diffract to an usable and optimal resolution. A purified protein sample is exposed to a number of crystallisation screens, each comprising a selection of chemical conditions that are designed to explore a reasonably wide area of potential crystallisation conditions.

Many crystallography labs routinely image these in large plate storage systems, which reduces the human interaction to viewing a set of usually 100-1000 images at various time points. This is a slow and laborious process, and highly applicable to machine learning approaches tailored to looking at images. TexRank, a texton analysis ranking software was developed by Jia Tsing in OPIG and is used at the Structural Genomics Consortium (SGC). This ranking reduces the number of images that a human needs to search through, providing a quicker review process. Continue reading →

Rasmus Fonseca and GetContacts

We welcomed Rasmus Fonseca to last week’s OPIG Group Meeting. Rasmus is currently a Visiting Scholar at Stanford. He gave a fascinating talk about the interaction analysis of molecular structures and ensembles using the GetContacts package, one of many projects that he has contributed to that you can find on his GitHub repo.

Rasmus was kind enough to share his slides with us:

https://docs.google.com/presentation/d/1HmN9AuU4gL-jMlJdR6cMleueQ-nRWOE_hiWtO8OQEoo/edit?usp=sharing.

He is looking for new users (and developers), so if you have questions, he would be very happy to help get you started.

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Group Meetings

Non-alcoholic fatty liver disease

Graph-based Methods for Cheminformatics

Automated testing with doctest

Because not all interesting biology is health-related!

Vim and I

Finding the lowest energy conformation of given molecule!

What can you do with the OPIG Antibody Suite?

Mol2vec: Finding Chemical Meaning in 300 Dimensions

Cinder: Crystallographic Tinder

Rasmus Fonseca and GetContacts