
Every Protein needs a Friend – Community Detection in Protein Interaction Networks

To make the OPIG soup, which has tasted of antibodies a lot lately, a little more diverse, I will try to spice things up with a dash of protein interaction networks, a pinch of community detection and a shot of functional similarity evaluation. I hope it remains edible!


In the 10 weeks I have spent at OPIG, my main focus has been on protein interaction networks, or more specifically, on this network:

View of the largest connected component of the HINT binary physical interaction network. Nodes represent proteins and edges are protein interactions.
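The image shows only the largest connected component of the network. As a rough illustration of what that means, here is a minimal Python sketch that extracts it with a breadth-first search, assuming the network is available as a list of undirected (protein, protein) pairs. The function names and the toy edge list are hypothetical placeholders, not the actual HINT data or the code behind the figure.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency list from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return graph


def largest_connected_component(graph):
    """Return the node set of the largest connected component, found by BFS."""
    seen = set()
    best = set()
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from `start`.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy edge list; protein names are placeholders, not HINT data.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "F")]
graph = build_graph(edges)
print(sorted(largest_connected_component(graph)))  # ['A', 'B', 'C', 'D']
```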

Looking at this image, I am reminded of a popular German phrase which, badly translated, means: “As you see, you see nothing”. However, trying to “see” something in this network is exactly what I have been doing, and, as it turns out, I’m not the only one.

If we had a data set that recorded exactly which proteins interact with which others, then surely all the information about biological pathways would be contained in it, and we should be able to cluster the network into smaller modules, or communities, each representing a biological function. This Gedankenexperiment (thought experiment) is the idea underlying my approach to these networks.

In reality, however, we don’t have this perfect data set. Protein interaction networks are very noisy, with high estimated false-positive and false-negative rates for interactions. Yet community detection algorithms have still been shown to produce meaningful partitions of the network into communities. Here, “meaningful” refers to communities that group together proteins with a similar biological function.
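To make “community detection” a little more concrete, the sketch below implements one standard approach, greedy modularity maximisation: every protein starts in its own community, and the pair of connected communities whose merge most increases the modularity Q is joined, repeatedly, until no merge improves Q. The gain from merging communities i and j is e_ij/m - d_i·d_j/(2m²), where e_ij is the number of edges between them, d_i and d_j are their summed degrees and m is the total number of edges. This is a naive version for small graphs and for illustration only; the post does not say which algorithm was used, and the toy graph (two tightly knit groups joined by a single interaction) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def greedy_modularity(edges):
    """Naive agglomerative modularity maximisation (illustrative, small graphs)."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # total number of edges
    degree = {node: len(nbrs) for node, nbrs in graph.items()}
    communities = [{node} for node in sorted(graph)]  # one community per protein

    def edges_between(c1, c2):
        # Each edge between two disjoint communities is counted exactly once.
        return sum(1 for u in c1 for v in graph[u] if v in c2)

    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(communities)), 2):
            e_ij = edges_between(communities[i], communities[j])
            if e_ij == 0:
                continue  # only merge communities that actually interact
            d_i = sum(degree[n] for n in communities[i])
            d_j = sum(degree[n] for n in communities[j])
            gain = e_ij / m - d_i * d_j / (2 * m * m)  # change in modularity Q
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return communities  # no merge increases modularity any further
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        communities[i] |= communities[j]
        del communities[j]


# Hypothetical toy network: two 4-cliques joined by the single edge D-E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H"),
         ("D", "E")]
for community in greedy_modularity(edges):
    print(sorted(community))
# ['A', 'B', 'C', 'D']
# ['E', 'F', 'G', 'H']
```

Recomputing every pairwise gain on each step is fine for a toy graph but far too slow for a network of thousands of proteins, where an optimised library implementation (for example the greedy modularity routine in NetworkX) would be the practical choice.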

This brings us to a whole new problem: what is a “similar biological function”, and how do you measure it? This question cannot be answered perfectly, but the Gene Ontology (GO) annotations for biological process seem a good place to start. In this framework, proteins are annotated with terms describing the biological processes they participate in. Of course, there is not always a consensus about which term should be assigned to a protein, and it is questionable how precisely a protein’s function within a process can be determined, but it wouldn’t be called work if it were easy.
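One simple way to turn GO annotations into a number is sketched below, assuming each protein maps to a set of GO biological-process terms: score a community by the mean pairwise Jaccard similarity of its members’ term sets, skipping unannotated proteins. This is a deliberately crude, hypothetical measure chosen for illustration, not necessarily the evaluation used here; more refined GO semantic-similarity measures exploit the structure of the ontology. The term labels are placeholders rather than real GO identifiers.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def functional_coherence(community, annotations):
    """Mean pairwise Jaccard similarity of GO biological-process term sets.

    Unannotated proteins are skipped, since a missing annotation is not
    evidence of a dissimilar function.
    """
    annotated = [p for p in sorted(community) if annotations.get(p)]
    pairs = list(combinations(annotated, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)


# Hypothetical placeholder annotations: "term1", "term2", ... stand in for
# real GO biological-process term IDs; this is not real GO data.
annotations = {
    "A": {"term1", "term2"},
    "B": {"term1"},
    "C": {"term1", "term3"},
    "D": set(),  # unannotated protein
    "E": {"term4"},
}
print(functional_coherence({"A", "B", "C", "D"}, annotations))  # about 0.444 (4/9)
print(functional_coherence({"A", "E"}, annotations))  # 0.0, no shared terms
```

On its own, such a score says little; it becomes more informative when compared with the scores of randomly drawn protein sets of the same size, which gives a rough baseline for how much functional similarity arises by chance.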

In my 10 weeks here, I’ve only scratched the surface of detecting functional communities in protein interaction networks, but the results so far suggest that the communities obtained may have some significance as biological modules. I hope to use data sets such as gene expression studies to investigate this significance further and maybe, if I’m very lucky, to work towards helping people classify macrophage phenotypes or identify cancer in the distant future. The best place to do this would definitely be the friendly atmosphere of OPIG!