Modeling of Protein Interaction Networks

In the group meeting on the 20th of August, I presented the paper by Vazquez et al (2002). This was one of the first papers proposing the duplication divergence model of evolution for protein interaction networks, and thus has had a significant impact on the field, inspiring many variants of the basic model. The paper starts out by noting that the yeast protein network has the ‘small world’ property – following along links in the network, it requires only a handful of steps to go from any one protein to any other. Another property is the manner in which the links are shared out among the various proteins: empirically, the probability that a protein interacts with k other proteins follows a power-law distribution.

Vazquez et al. show how evolution can produce scale-free networks. They explore a model for the evolution of protein networks that accurately reproduces the topological features seen in the yeast S. cerevisae. As the authors point out, proteins fall into families according to similarities in their amino-acid sequences and functions, and it is natural to suppose that such proteins have all evolved from a common ancestor. A favoured hypothesis views such evolution as taking place through a sequence of gene duplications – a relatively frequent occurrence during cell reproduction. Following each duplication, the two resulting genes are identical for the moment, and naturally lead to the production of identical proteins. But between duplications, random genetic mutations will lead to a slow divergence of the genes and their associated proteins. This repetitive, two-stage process can be captured in a relatively simple model for the growth of a protein interaction network .

Simulations by the author show that the model, starting out with a seed capture the degree distribution of the yeast network with high fidelity (Fig 1) and also possesses the quality of high tolerance to random node removal seen in biological networks. While the results are more qualitative in nature, the model still serves as the basis of most biologically rooted explanations of protein network evolution, with minor improvements. Some of these additions have been the use of asymmetric divergence, whole genome duplication events as well as interaction site modelling. As the jury is still out on what model (if any) best fits current interaction data, the basic model is still relevant as a benchmark for newer models.

Fig. 2. Zipf plot for the PIN and the DD model with p = 0.1, q = 0.7 with N = 1,825. k is the connectivity of a node and r is its rank in decreasing order of k. Error bars represent standard de- viation on a single network realization.

Fig. 1. Zipf plot for the PIN and the DD model with p = 0.1, q = 0.7 with N = 1,825. k is the connectivity of a node and r is its rank in decreasing order of k. Error bars represent standard deviation on a single network realization.

Author