December | 2014 | Oxford Protein Informatics Group

Protein-protein interaction (PPI) networks describe how proteins are connected to one another in terms of physical interactions. They can be used to aid our understanding of the individual roles of proteins (Sarajli ́c et al., 2013), the co-functioning properties of sets of proteins (West et al., 2013) and even the operation of the complete system (Janowski et al., 2014).

Different approaches have been proposed to analyse, describe and predict these PPI networks, such as network summary statistics, clustering methods, random graph models and machine learning methods. However, despite the large biological, computational and statistical interest in PPI net- works, current models insufficiently describe PPI networks (Winterbach et al., 2013; Ali et al., 2014; Rito et al., 2010). It is commonly accepted that proteins perform functions usually in conjunction with other proteins, forming a functional module (Lewis et al., 2010). Hence local structure is found to be important in protein-protein interaction networks (Planas-Iglesias et al., 2013).

Here we address the modelling problem locally by modelling the ego-networks of PPI networks by means of random graph models.

Random graph models

Loosely speaking, a random graph model is a set of rules that define an edge generation process among a set of nodes. Usually this set of rules relate to particular characteristics that are embedded in the network generation process. Here are three examples of such characteristics:

Independence (each edge has a probability $p$ of being present).
Preferential attachment (nodes form edges with highly interacting nodes).
Space-based interactions (an edge is present between two nodes if the distance between them small).

A classical model representing an independence structure is the ER(nv,p) model. This is a random graph on nv nodes, and where edges are present independently at random with probability p.

Now, the preferential attachment characteristic can be illustrated by the Chung-Lu model. That is, given an expected sequence of weights $\{d_1,d_2,...,d_{n_v}\}$ . The probability of obtaining an edge between nodes i and j is given by $P((i,j)\in E)=d_id_j / \sum_j d_j.$

Finally, a model representing a spaced based network generation process could be the Geometric model. Here, nodes are placed uniformly at random in a d-dimensional square $[0,1]^d$ . Now, given a radius or threshold distance (r), edges are drawn among nodes $v_i,\,v_j$ $i\neq j$ if $d(v_i,v_j)\leq r$ .

From the latter figures it can be seen that different models often lead to different network structures. Thus, although standard random graph models do not reproduce a sufficiently similar network structure to the one of PPI networks, they could still be good approximations for different local regions in a PPI network.

Finding a null model for PPI ego-networks

Our approach consist in finding local regions of the PPI networks that could be represented well by the random graph models. To do so, we propose to extract all 2-step ego-networks and classifying them according to some simple characteristic, network density for example.

Now, once the ego-networks of the PPI network have been extracted and binned according to their network density (ego-density). We assess the fit of the model to the PPI networks by comparing each bin of PPI ego-networks to the ego-networks extracted from a random graph model. This comparison is made by comparing the difference in the resulting number of subgraph counts, triangles for example, in each of the ego-networks within each bin.

The following figure illustrates the underlying idea of this procedure:

Following this approach we aim to find bins for which, possibly different models, reproduce similar subgraph counts as the ones obtained in the PPI ego-networks. However we expect to fin bins for which none of the standard models fit.

Motivation

Recently I published a paper on OOMMPPAA, my 3D matched-molecular pair tool to aid drug discovery. The tool is available to try here, download here and read about here. OOMMPPAA aims to tackle the big-data problem in drug discovery – where X-ray structures and activity data have become increasingly abundant. Below I will explain what a 3D MMP is.

What are MMPs?

MMPs are two compounds that are identical apart from one structural change, as shown in the example below. Across a set of a thousand compounds, tens of thousands of MMPs can be found. Each pair can be represented as a transformation. Below would be Br >> OH. Within the dataset possibly hundreds of Br >> OH will exist.

An example of an MMP

Each of these transformations will also be associated with a change in a measurable compound property, (solubility, acidity, binding energy, number of bromine atoms…). Each general transformation (e.g. Br>>OH) therefore would have a distribution of values for a given property. From this distribution we can infer the likely effect of making this change on an unmeasured compound. From all the distributions (for all the transformations) we can then predict the most likely transformation to improve the property we are interested in.

The issue with MMPs

Until recently the MMP approach had been of limited use for predicting compound binding energies. This is for two core reasons.

1) Most distributions would be pretty well normally distributed around zero. Binding is complex and the same change can have both positive and negative effects.

2) Those distributions that are overwhelmingly positive, by definition, produce increased binding against many proteins. A key aim of drug discovery is to produce selective compounds that increase binding energy only for the protein you are interested in. So increasing binding energy like that is not overall very useful.

3D MMPs to the rescue

3D MMPs aim to resolve these issues and allow MMP analysis to be applied to binding energy predictions. 3D MMPs use structural data to place the transformations in the context of the protein. One method, VAMMPIRE, asks what is the overall effect of Br>>OH when it is near to a Leucine, Tyrosine and Tryptophan (for example). In this way selective changes can be found.

Another method by BMS aggregates these changes across a target class, in their cases kinases. From this they can ask questions like, “Is it overall beneficial to add a cyclo-proply amide in this region of the kinase binding site”.

What does my tool do differently?

OOMMPPAA differs from the above in two core ways. First, OOMMPPAA uses a pharmacophore abstraction to analyse changes between molecules. This is an effective way of increasing the number of observations for each transition. Secondly OOMMPPAA does not aggregate activity changes into general trends but considers positive and negative activity changes separately. We show in the paper that this allows for more nuanced analysis of confounding factors in the available data.

Oxford Protein Informatics Group

or "OPIG" to friends

Monthly Archives: December 2014

Looking for a null model of PPI ego-networks

OOMMPPAA: A tool to aid directed synthesis by the combined analysis of activity and structural data