Author Archives: Florian Klimm

Drawing Networks in LaTeX with tikz-network

While researching on protein interaction networks it is often important to illustrate networks. For this many different tools are available, for example, Python’s NetworkX and Matlab, that allow the export of figures as pixelated images or vector graphics. Usually, these figures are then incorporated in the papers, which are commonly written in LaTeX. In this post, I want to present `tikz-network’, which is a novel tool to code and illustrate networks directly in LaTeX.

To create an illustration you define the network’s nodes with their positions and edges between these nodes. An example of a simple network is

\begin{tikzpicture}
   \Vertex[color = blue]{A}
   \Vertex[x=3,y=1,color=red]{B}
   \Vertex[x=0,y=2,color=orange]{C}
   \Edge[lw=5pt](A)(B)
   \Edge[lw=3pt,bend=15,Direct](A)(C)
\end{tikzpicture}

The illustrations can be much more complex and allow dashed lines, opacity, and many other features. Importantly, the properties do not need to be specified in the LaTeX file itself but can also be saved in an external file and imported with the  \Vertices{data/vertices.csv}command. This allows the representation of more complex networks, for example the multilayer network below is created from the two files, the first representing the nodes

id, x, y ,size, color,opacity,label,layer 
A, 0, 0, .4 , green, .9 , a , 1
B, 1, .7, .6 , , .5 , b , 1
C, 2, 1, .8 ,orange, .3 , c , 1
D, 2, 0, .5 , red, .7 , d , 2
E,.2,1.5, .5 , gray, , e , 1
F,.1, .5, .7 , blue, .3 , f , 2
G, 2, 1, .4 , cyan, .7 , g , 2
H, 1, 1, .4 ,yellow, .7 , h , 2

and the second having the edge information:

u,v,label,lw,color ,opacity,bend,Direct
A,B, ab  ,.5,red   ,   1   ,  30,false
B,C, bc  ,.7,blue  ,   1   , -60,false
A,E, ae  , 1,green ,   1   ,  45,true
C,E, ce  , 2,orange,   1   ,   0,false
A,A, aa  ,.3,black ,  .5   ,  75,false
C,G, cg  , 1,blue  ,  .5   ,   0,false
E,H, eh  , 1,gray  ,  .5   ,   0,false
F,A, fa  ,.7,red   ,  .7   ,   0,true
D,F, df  ,.7,cyan  ,   1   ,   30,true
F,H, fh  ,.7,purple,   1   ,   60,false
D,G, dg  ,.7,blue  ,  .7   ,   60,false

For details, please see the extensive manual on the GitHub page of the project. It is a very new project and I only started using it but I like it so far for a couple of reasons:

  • it is easy to use, especially for small example graphs
  • the multilayer functionality is very convenient
  • included texts are automatically in the correct size and font with the rest of the LaTeX document
  • it can be combined with regular tikz commands to create more complex illustrations

Multiomics data analysis

Cells are the basic functional and structural units of living organisms. They are the location of many different biological processes, which can be probed by various biological techniques. Until recently such data sets have been analysed separately. The aim is to better understand the underlying biological processes and how they influence each other. Therefore techniques that integrate the data from different sources might be applicable [1].

In the image below you see the four main entities that are active throughout the cell: Genome, RNA, proteins, and metabolites. All of them are in constant interaction, for example, some proteins are transcription factors and influence the transcription of DNA into RNA. Metabolites that are present in the cell also influence the activity of proteins as ligands but at the same time are altered through enzymatic activity. This ambiguity of interactions makes it clear that probing the system at a single level gives only limited insight into the structure and function of the cellular processes.

 

multiomics_schematic

The different levels of biological information (genome, proteome, …) work mutually and influence each other through processes as transcription regulation through transcription factors. All levels are influenced by external factors, as drug treatment or nutrient availability. Multiomics is the measurement of multiple of those populations and their integrated analysis.

In the last years, different ways to integrate such data have been developed. Broadly speaking there are three levels of data integration: conceptual integration, statistical integration, and model-based integration [2]. Conceptual integration means that the data sets are analysed separately and the conclusions are compared and integrated. This method can easily use already existing analysis pipelines but the way in which conclusions are compared and integrated is non-trivial. Statistical Integration combines data sets and analyses them jointly, reaching conclusions that match all data and potentially finding signals that are not observable with the conceptual approach. Model-based integration indicates the joint analysis of the data in a combination of training of a model, which itself might incorporate prior beliefs of a system.

[1] Gehlenborg, Nils, Seán I. O’donoghue, Nitin S. Baliga, Alexander Goesmann, Matthew A. Hibbs, Hiroaki Kitano, Oliver Kohlbacher et al. “Visualization of omics data for systems biology.” Nature methods 7 (2010): S56-S68.

[2] Cavill, Rachel, Danyel Jennen, Jos Kleinjans, and Jacob Jan Briedé. “Transcriptomic and metabolomic data integration.” Briefings in bioinformatics 17, no. 5 (2016): 891-901.

A global genetic interaction network maps a wiring diagram of cellular function

In our last group meeting, I talked about a recent paper which presents a vast amount of genetic interaction data as well as some spatial analysis of the created data. Constanzo et al. used temperature-sensitive mutant alleles to measure the interaction of ~6000 genes in the yeast Saccharomyces cerevisiae [1]. A typical way to analyse such data would be the use of community detection to find groups of genes with similar interaction pattern, see for example [2] for a review. Instead, the authors of this paper created a two-dimensional embedding of the network with a spring-layout, which places nodes close to each other if they show similar interaction pattern.

The network layout is then compared with Gene Ontology by applying a spatial analysis of functional enrichment (SAFE) [3].  Clusters enriched are associated for example with cell polarity, protein degradation, and ribosomal RNA. By filtering the network for different similarities they find a hierarchical organisation of genetic function with small dense modules of pathways or complexes at the bottom and sparse clusters representing different cell compartments at the top.

In this extensive paper, they then go further into detail to quantify gene pleiotropy, predict gene function, and how the interaction structure differs between essential and non-essential genes. They also provide that data online under http://thecellmap.org/costanzo2016/ .

(Left) A network of genetic interaction was embedded into a two-dimensional space using a spring-layout (Right) This embedding was compared with Gene Ontology terms to find regions of spatial enrichment. Image from [1]

(Left) A network of genetic interaction was embedded into a two-dimensional space using a spring-layout (Right) This embedding was compared with Gene Ontology terms to find regions of spatial enrichment.
Image from [1]

References:

[1] Costanzo, Michael, et al. “A global genetic interaction network maps a wiring diagram of cellular function.” Science 353.6306 (2016): aaf1420.

[2] Fortunato, Santo. “Community detection in graphs.” Physics Reports 486.3 (2010): 75-174.

[3] Baryshnikova, Anastasia. “Systematic Functional Annotation and Visualization of Biological Networks.” Cell Systems (2016).

Community structure in multilayer networks

 

Multilayer networks are a generalisation of network that may incorporate different types of interactions [1]. This could be different time points in temporal data, measurements in different individuals or under different experimental conditions. Currently many measures and methods from monolayer networks are extended to be applicabile to multilayer networks. Those include measures of centrality [2], or methods that enable to find mesoscale structure in networks [3,4].

Examples of such mesoscale structure detection methods are stochastic block models and community detection. Both try to find groups of nodes that behave structurally similar in a network. In its most simplistic way you might think of two groups that are densely connected internally but only sparsely between the groups. For example two classes in a high school, there are many friendships in each class but only a small number between the classes. Often we are interested in how such patterns evolve with time. Here, the usage of multilayer community detection methods is fruitful.

mucha_senate_pic

From [4]: Multislice community detection of U.S. Senate roll call vote similarities across time. Colors indicate assignments to nine communities of the 1884 unique senators (sorted vertically and connected across Congresses by dashed lines) in each Congress in which they appear. The dark blue and red communities correspond closely to the modern Democratic and Republican parties, respectively. Horizontal bars indicate the historical period of each community, with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration; AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J, Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communities appeared simultaneously.

Mucha et al. analysed the voting pattern in the U.S. Senate [4]. They find that the communities are oriented as the political party organisation. However, the restructuring of the political landscape over time is observable in the multilayered community structure. For example, the 37th Congress during the beginning of the American Civil War brought a major change in the voting patterns. Modern politics is dominated by a strong partition into Democrats and Republicans with third minor group that can be identified as the ‘Southern Democrats’ that had distinguishable voting patterns during the 1960.

Such multilayer community detection methods can be insightful for networks from other disciplines. For example they have been adopted to describe the reconfiguration in the human brain during learning [5]. Hopefully they will be able to give us insight in the structure and function of protein interaction.

[1] De Domenico, Manlio; Solé-Ribalta, Albert; Cozzo, Emanuele; Kivelä, Mikko; Moreno, Yamir; Porter, Mason A.; Gómez, Sergio; and Arenas, Alex [2013]. Mathematical Formulation of Multilayer NetworksPhysical Review X, Vol. 3, No. 4: 041022.

[2] Taylor, Dane; Myers, Sean A.; Clauset, Aaron; Porter, Mason A.; and Mucha, Peter J. [2016]. Eigenvector-based Centrality Measures for Temporal Networks

[3]  Tiago P. Peixoto; Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Phys. Rev. E 92, 042807

[4] Mucha, Peter J.; Richardson, Thomas; Macon, Kevin; Porter, Mason A.; and Onnela, Jukka-Pekka [2010]. Community Structure in Time-Dependent, Multiscale, and Multiplex NetworksScience, Vol. 328, No. 5980: 876-878.

[5] Bassett, Danielle S.; Wymbs, Nicholas F.; Porter, Mason A.; Mucha, Peter J.; Carlson, Jean M.; and Grafton, Scott T. [2011]. Dynamic Reconfiguration of Human Brain Networks During LearningProceedings of the National Academy of Sciences of the United States of America, Vol. 118, No. 18: 7641-7646.

 

At this week’s group meeting I presented on my second SABS short project, which is supervised by Charlotte Deane, Mason Porter, and Jonny Wray from e-Therapeutics. It has the title “Multilayer-Network Analysis of Protein Interaction Networks”.
Protein interactions can be represented using networks. Accordingly, approaches that have been developed in network science are appropriate for the analysis of protein interactions, and they can lead to the detection of new drug targets. Thus far, only ordinary (“monolayer”) protein interaction networks have been exploited for drug discovery. However, because “multilayer networks” allow the representation of multiple types of interactions and of time-dependent interactions, they have the potential to improve insight from network-based approaches [1].
Aim of my project was to employ known multilayer methods on well-established data to investigate potential use cases of multilayer protein interaction networks. We focussed on various community detection methods [3,4] to find groups of proteins as candidates of functional, biological modules. Additionally, temporal centrality [5] measures were used to identify important proteins across time.

References:
[1] Kivelä, Mikko, et al. “Multilayer networks.” Journal of Complex Networks (2014) [2] Calvano, Steve E., et al. “A network-based analysis of systemic inflammation in humans.” Nature (2005) [3] Peixoto, Tiago P. “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models.” PRE (2014) [4] Mucha, Peter J., et al. “Community structure in time-dependent, multiscale, and multiplex networks.” Science (2010) [5] Taylor, Dane, et al. “Eigenvector-Based Centrality Measures for Temporal Networks.” arXiv preprint (2015).