Monthly Archives: November 2015

Isoform-resolved protein interaction networks and poker

Every time I talk about protein interaction networks, I put up a the nice little figure below. This figure suggests that experiments are done to detect which proteins interact with each other, and lines are drawn between them to create a network. Sounds pretty easy, right? It does, and it probably should, because otherwise you wouldn’t listen past the words “Yeast Two-Hybrid”. However, sometimes it’s important to hear about how it’s not at all easy, how there are compromises made, and how those compromises mean there are some inherent limitations. This post is one of those things it is important to listen to if you care about protein interaction networks (and if you don’t… they’re pretty cool, so don’t be a downer!).

Classical image showing the design of a protein interaction network

Schematic image showing the assembly of a human protein interaction network with ~25 000 interactions

So what’s wrong with that nice figure? Just to deal with the obvious: the colour scheme isn’t great… there’s a red protein and the interactions are also red… and then a yellow protein and red interactions aren’t that easy on the eyes. But you’re not here to judge my choice of colours (I hope… otherwise you’d be better off criticizing the graphs in this beast). You’re here to hear me rant about networks… much more fun ;). So here goes:

  1. Not all interactions come from the same experiments, they investigate different “types” of binding.
  2. Each experiment has experimental errors associated with it, the interactions are not all correct.
  3. The network is not complete (estimation is that there are somewhere between 150k and 600k interactions depending on what paper you read).
  4. People tend to investigate proteins that they know are associated with a disease, so more interactions are known for certain proteins and their neighbours resulting in an “inspection bias”.
  5. That’s a stupid depiction of a network, it’s just a blue blob and you can’t see anything (hence termed “ridiculogram” by Mark Newman).
  6. The arrow is wrong, you are not reporting interactions between proteins!

Points 1-5 should more or less make sense as they’re written. Point 6 however sounds a little cryptic. They’re called PROTEIN interaction networks, why would they not be interactions between proteins, you might ask.. and you’d be right in asking, because it’s really quite annoying. The problem is one of isoforms. The relation gene to protein is not 1-to-1. After a gene is transcribed into RNA, that piece of RNA is cut up and rearranged into mRNA, which is then translated into a protein. This rearranging process can occur in different ways so that a single gene may encode for different proteins, or as they are called, different protein isoforms. Testing for interactions between isoforms is difficult, so what tends to be done is that people test for interactions for THE ONE isoform of a protein to rule them all  (the “reference isoform”) and then report these interactions as interactions for the gene. Sneaky! What you end up seeing are interactions mostly tested between reference isoforms (or any that happened to be in the soup) and reported as interactions for the gene product.

So how much does it matter if we don’t know isoform interaction information? Are there even different interacting partners for different isoforms? Do they have different functions? Well… yes, they can have different interactions and different functions. As to whether they matter… according to Corominas et al that answer is also a resounding yes… or at least in Autism Spectrum Disorder (ASD) it is.

The paper is the result of a 5-year investigation which investigates isoform interactions and the effect of knowing them vs not knowing them on predicting candidate ASD genes of interest. And seeing as a bunch of people spent a lot of time on this stuff, it was definitely worth a read. Corominas et al found that in an ASD-related protein interaction network, there is a significant number of interactions that would not be found if only the reference isoform interactions were used. Furthermore, compared to a “high-quality” literature curated network, the generated isoform-resolved ASD network added a lot of interactions. They then went on to show that these additional interactions played an important role in the decision of which genes to prioritize as important “players in Autism”.

Should these results make us renounce the use of currently available non-isoform-resolved protein interaction networks lest we burn in the depths of bioinformatics hell? Well… probably not. While the paper is very interesting and shows the importance of isoforms, it does so in the context of Autism only. The paper itself states that ASD is a brain-related disease which is an environment known for many isoforms. In many cases, it will likely be the case that the “dominant isoform” is just that… dominant. Moreover, the results may sound a little stronger than they are. The literature curated network that was compared to, to say that this isoform-resolved network is really important, was quoted as being “high-quality”. It is likely that many of the isoform interactions would be included in lower quality networks, but they have simply not been as well-studied as dominant isoforms. Thus, their isoform-resolved network would just confirm lower quality interactions as high-quality ones. That being said, if you want to look at the specific mechanisms causing a phenotype, it is likely that isoform information will be necessary. It really depends on what you want to achieve.

Let’s say you’re playing Texas Hold’em poker and you have two pairs. You’d like to have that full house, but it’s elusive and you’re stuck with the hand you have when your opponent bids high. That’s the situation we were in with protein interaction networks: you know you’re missing something, but you don’t know how bad it is that you’re missing it. This paper addresses part of that problem. We now know that your opponent could have the flush, but possibly only if you’re in Vegas. If you only want to play in a local casino, you’ll likely be fine.

Molecular Diversity and Drug Discovery

reportdraft_2 copyFor my second short project I have developed Theox, molecular diversity software, to aid the selection of synthetically viable molecules from a subset of diverse molecules. The selection of molecules for synthesis is currently based on synthetic intuition. The developed software indicates whether the selection is an adequate representation of the initial dataset, or whether molecular diversity has been compromised. Theox plots the distribution of diversity indices for 10,000 randomly generated subsets of the same size as the chosen subset. The diversity index of the chosen subset can then be compared to the distributions, to determine whether the molecular diversity of the chosen subset is sufficient. The figure shows the distribution of the Tanimoto diversity indices with the diversity index of the subset of molecules shown in green.