Author Archives: Florian Klimm

Dimensions: The Mathematics of Symmetry and Space

Update: The exhibition was Highly Commended in Oxford’s The Vice-Chancellor’s Public Engagement with Research Awards 2019! [Link]

As part of outreach efforts, I have been involved with the exhibition “Dimensions: The Mathematics of Symmetry and Space” at the Ashmolean Museum. This exhibition is a great opportunity to explore a selection of the Ashmolean’s impressive collection from a mathematical (but very accessible) point of view.

Continue reading →

Non-alcoholic fatty liver disease

In my new research project, I investigate Non-alcoholic fatty liver disease (NAFLD). This term describes a variety of conditions that are associated with fatty livers. While the early stages of this disease are not harmful, it can lead to cirrhosis (Cirrhosis is the scaring of liver tissue that prevents a liver to function properly). Ultimately, if a liver stops working it can be fatal unless treated, for example, with a liver transplant. NAFLD is the most common liver disease in developed countries and is expected to become the leading cause of liver transplant by 2020 [1].

The disease progresses in four stages: Continue reading →

Interactive Illustration of Collaboration Networks with D3

D3 is a JavaScript Library that allows the creation of interactive data visualisations. D3 stands for Data-Driven Documents and its advantage is that it is using internet standards as HTML, CSS, and SVG as the foundation. This gives maximal compatibility across all moderns browsers. It is widely used by journalists, data scientists, and starts to be used by academic scientists, too.

Here I want to present a simple way of creating interactive illustrations of collaboration networks, e.g., for your own website. Before going into details, have a look at the example below, it presents some of Rosalind Franklin’s papers and her co-authors.

The network illustration is a so-called bipartite network because it consists of two types of nodes, blue nodes represent scientists and orange nodes represent publications. These nodes are connected if a scientist is an author on a publication. This way of presenting is a so-called force layout, which simulates repelling Coulomb forces between all nodes and spring-like attractive forces act on nodes that are connected via links. You can drag the nodes around and explore the behaviour of the network.

You will have noticed that you can also see the name of the publications and co-authors when you hover over the nodes. For some of them, a representative figure or photograph is shown, too. You can also double-click the nodes and will be directed to the webpage of the publication or author.

For larger collaboration networks the visualisation is still possible but might get a bit messy. Thus, not showing any figures is advised. Furthermore, the name of the nodes should be shown either below the illustration or as a tooltip (the later is not possible in WordPress but on normal web pages as here). The illustration below shows all 124 publications written by OPIG, together with the 310 authors. Here, I had to remove the Head of the Group Charlotte Deane, as she is on almost all of these papers and would clutter the illustration, unfortunately. In network science, we call such a node an ego node.

If you want to play around with this, check out my Github repository. The creation of the network is fairly simple. You only need a Bibtex file and can then execute a Python script that creates a JSON file that is read into the JavaSript. This can then be incorporated in all web pages easily.

PS: To enable D3 in WordPress you need a special plugin. Apparently, it is not up to date with the current stable D3 version 4 but you can load any missing functions manually.

Teaching Network Science to High School Students

In the recent years, a lot of effort went into outreach events, in particular for science and mathematics. Here, I am going to mention a summer course on Network Science which I organized and taught together with Benjamin F. Maier from the Humboldt University Berlin.

The course was part of an established German summer school called Deutsche Schülerakademie (German Pupils Academy), an extracurricular event for highly motivated pupils. It lasts sixteen days and the participants join one of six courses, which cover all ranges of academic disciplines, from philosophy over music to science.

Our course was titled Netzwerke und Komplexe Systeme (Networks and Complex Systems) and rather than going too much in depth in one particular area we covered a broad selection of topics, as we wanted to give students an overview and also an idea of how different disciplines approach complex phenomena. We discussed pure Mathematics topics as the colouring of graphs, algorithmic discussions as the travelling salesman problem, social network analysis, computational neuroscience, dynamical systems, and fractals.

A network of the former monastery in Rossleben, where the summer school was held. The students created the network themselves. To parallelise the task they split up into four groups, each covering one level of the building. They then used this network to simulate the spread of a contagious disease, starting at the biological lab (A35, in red).

A couple of thoughts on what went well and which parts might need improvement for further of such events:

We did a questionnaire before and asked the pupils some questions like “Do you know what a vector is?” and also concerning their motivation to join the course. This was very helpful in getting a rough idea about their knowledge level.
We gave them some material to read before the course. In retrospective, it probably would be better to give them something to read, as well as, some problems to solve, such that the learning outcome is clearer and more effective.
The students gave presentations on topics we choose for them based on their answers to the questionnaire. The presentations were good but a lot of students overrun the allocated time because they were very enthusiastic about the topics.
The students were also enthusiastic about the programming exercises, for which we used Python and the NetworkX library. One challenge was the heterogeneity in programming experience, this made the splitting up into two groups, beginner and advanced, necessary.
In contrast to courses covering similar topics at university-level, the students did not have the necessary mathematical background for the more complicated aspects of network science. Accordingly, it is better to choose less of these and allocate time to introduce the mathematical methods, for example, eigenvectors or differential equations, beforehand.
The students very much liked hand on exercises, for example, the creation of random networks of different connection probabilities with the help of dice or the creation of a network of the floor plan of the building in which the summer school was held, as shown in the Figure.

It was great fun to introduce the students to the topic of network science and I strongly can recommend others to organise similar outreach events! You can find some of our teaching materials, including the worksheets and programming exercises in the original German and a translated English version, online. A paper describing our endeavours is under review.

Four Erdős–Rényi random graphs as generated by the participants by rolling dice. A twenty-sided dice was used for the probabilities p = 1/20 and p = 10 and a six-sided dice for p = 1/6 and p=1/3. This fun exercise allows the discussion of degree distributions, the size of the largest connected component, and similar topics for ER random graphs.

Drawing Networks in LaTeX with tikz-network

While researching on protein interaction networks it is often important to illustrate networks. For this many different tools are available, for example, Python’s NetworkX and Matlab, that allow the export of figures as pixelated images or vector graphics. Usually, these figures are then incorporated in the papers, which are commonly written in LaTeX. In this post, I want to present `tikz-network’, which is a novel tool to code and illustrate networks directly in LaTeX.

To create an illustration you define the network’s nodes with their positions and edges between these nodes. An example of a simple network is

\begin{tikzpicture}
   \Vertex[color = blue]{A}
   \Vertex[x=3,y=1,color=red]{B}
   \Vertex[x=0,y=2,color=orange]{C}
   \Edge[lw=5pt](A)(B)
   \Edge[lw=3pt,bend=15,Direct](A)(C)
\end{tikzpicture}

The illustrations can be much more complex and allow dashed lines, opacity, and many other features. Importantly, the properties do not need to be specified in the LaTeX file itself but can also be saved in an external file and imported with the \Vertices{data/vertices.csv}command. This allows the representation of more complex networks, for example the multilayer network below is created from the two files, the first representing the nodes

id, x, y ,size, color,opacity,label,layer 
A, 0, 0, .4 , green, .9 , a , 1
B, 1, .7, .6 , , .5 , b , 1
C, 2, 1, .8 ,orange, .3 , c , 1
D, 2, 0, .5 , red, .7 , d , 2
E,.2,1.5, .5 , gray, , e , 1
F,.1, .5, .7 , blue, .3 , f , 2
G, 2, 1, .4 , cyan, .7 , g , 2
H, 1, 1, .4 ,yellow, .7 , h , 2

and the second having the edge information:

u,v,label,lw,color ,opacity,bend,Direct
A,B, ab  ,.5,red   ,   1   ,  30,false
B,C, bc  ,.7,blue  ,   1   , -60,false
A,E, ae  , 1,green ,   1   ,  45,true
C,E, ce  , 2,orange,   1   ,   0,false
A,A, aa  ,.3,black ,  .5   ,  75,false
C,G, cg  , 1,blue  ,  .5   ,   0,false
E,H, eh  , 1,gray  ,  .5   ,   0,false
F,A, fa  ,.7,red   ,  .7   ,   0,true
D,F, df  ,.7,cyan  ,   1   ,   30,true
F,H, fh  ,.7,purple,   1   ,   60,false
D,G, dg  ,.7,blue  ,  .7   ,   60,false

For details, please see the extensive manual on the GitHub page of the project. It is a very new project and I only started using it but I like it so far for a couple of reasons:

it is easy to use, especially for small example graphs
the multilayer functionality is very convenient
included texts are automatically in the correct size and font with the rest of the LaTeX document
it can be combined with regular tikz commands to create more complex illustrations

Multiomics data analysis

Cells are the basic functional and structural units of living organisms. They are the location of many different biological processes, which can be probed by various biological techniques. Until recently such data sets have been analysed separately. The aim is to better understand the underlying biological processes and how they influence each other. Therefore techniques that integrate the data from different sources might be applicable [1].

In the image below you see the four main entities that are active throughout the cell: Genome, RNA, proteins, and metabolites. All of them are in constant interaction, for example, some proteins are transcription factors and influence the transcription of DNA into RNA. Metabolites that are present in the cell also influence the activity of proteins as ligands but at the same time are altered through enzymatic activity. This ambiguity of interactions makes it clear that probing the system at a single level gives only limited insight into the structure and function of the cellular processes.

The different levels of biological information (genome, proteome, …) work mutually and influence each other through processes as transcription regulation through transcription factors. All levels are influenced by external factors, as drug treatment or nutrient availability. Multiomics is the measurement of multiple of those populations and their integrated analysis.

In the last years, different ways to integrate such data have been developed. Broadly speaking there are three levels of data integration: conceptual integration, statistical integration, and model-based integration [2]. Conceptual integration means that the data sets are analysed separately and the conclusions are compared and integrated. This method can easily use already existing analysis pipelines but the way in which conclusions are compared and integrated is non-trivial. Statistical Integration combines data sets and analyses them jointly, reaching conclusions that match all data and potentially finding signals that are not observable with the conceptual approach. Model-based integration indicates the joint analysis of the data in a combination of training of a model, which itself might incorporate prior beliefs of a system.

[1] Gehlenborg, Nils, Seán I. O’donoghue, Nitin S. Baliga, Alexander Goesmann, Matthew A. Hibbs, Hiroaki Kitano, Oliver Kohlbacher et al. “Visualization of omics data for systems biology.” Nature methods 7 (2010): S56-S68.

[2] Cavill, Rachel, Danyel Jennen, Jos Kleinjans, and Jacob Jan Briedé. “Transcriptomic and metabolomic data integration.” Briefings in bioinformatics 17, no. 5 (2016): 891-901.

A global genetic interaction network maps a wiring diagram of cellular function

In our last group meeting, I talked about a recent paper which presents a vast amount of genetic interaction data as well as some spatial analysis of the created data. Constanzo et al. used temperature-sensitive mutant alleles to measure the interaction of ~6000 genes in the yeast Saccharomyces cerevisiae [1]. A typical way to analyse such data would be the use of community detection to find groups of genes with similar interaction pattern, see for example [2] for a review. Instead, the authors of this paper created a two-dimensional embedding of the network with a spring-layout, which places nodes close to each other if they show similar interaction pattern.

The network layout is then compared with Gene Ontology by applying a spatial analysis of functional enrichment (SAFE) [3]. Clusters enriched are associated for example with cell polarity, protein degradation, and ribosomal RNA. By filtering the network for different similarities they find a hierarchical organisation of genetic function with small dense modules of pathways or complexes at the bottom and sparse clusters representing different cell compartments at the top.

In this extensive paper, they then go further into detail to quantify gene pleiotropy, predict gene function, and how the interaction structure differs between essential and non-essential genes. They also provide that data online under http://thecellmap.org/costanzo2016/ .

(Left) A network of genetic interaction was embedded into a two-dimensional space using a spring-layout (Right) This embedding was compared with Gene Ontology terms to find regions of spatial enrichment.
Image from [1]

References:

[1] Costanzo, Michael, et al. “A global genetic interaction network maps a wiring diagram of cellular function.” Science 353.6306 (2016): aaf1420.

[2] Fortunato, Santo. “Community detection in graphs.” Physics Reports 486.3 (2010): 75-174.

[3] Baryshnikova, Anastasia. “Systematic Functional Annotation and Visualization of Biological Networks.” Cell Systems (2016).

Community structure in multilayer networks

Multilayer networks are a generalisation of network that may incorporate different types of interactions [1]. This could be different time points in temporal data, measurements in different individuals or under different experimental conditions. Currently many measures and methods from monolayer networks are extended to be applicabile to multilayer networks. Those include measures of centrality [2], or methods that enable to find mesoscale structure in networks [3,4].

Examples of such mesoscale structure detection methods are stochastic block models and community detection. Both try to find groups of nodes that behave structurally similar in a network. In its most simplistic way you might think of two groups that are densely connected internally but only sparsely between the groups. For example two classes in a high school, there are many friendships in each class but only a small number between the classes. Often we are interested in how such patterns evolve with time. Here, the usage of multilayer community detection methods is fruitful.

From [4]: Multislice community detection of U.S. Senate roll call vote similarities across time. Colors indicate assignments to nine communities of the 1884 unique senators (sorted vertically and connected across Congresses by dashed lines) in each Congress in which they appear. The dark blue and red communities correspond closely to the modern Democratic and Republican parties, respectively. Horizontal bars indicate the historical period of each community, with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration; AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J, Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communities appeared simultaneously.

Mucha et al. analysed the voting pattern in the U.S. Senate [4]. They find that the communities are oriented as the political party organisation. However, the restructuring of the political landscape over time is observable in the multilayered community structure. For example, the 37th Congress during the beginning of the American Civil War brought a major change in the voting patterns. Modern politics is dominated by a strong partition into Democrats and Republicans with third minor group that can be identified as the ‘Southern Democrats’ that had distinguishable voting patterns during the 1960.

Such multilayer community detection methods can be insightful for networks from other disciplines. For example they have been adopted to describe the reconfiguration in the human brain during learning [5]. Hopefully they will be able to give us insight in the structure and function of protein interaction.

[1] De Domenico, Manlio; Solé-Ribalta, Albert; Cozzo, Emanuele; Kivelä, Mikko; Moreno, Yamir; Porter, Mason A.; Gómez, Sergio; and Arenas, Alex [2013]. Mathematical Formulation of Multilayer Networks, Physical Review X, Vol. 3, No. 4: 041022.

[2] Taylor, Dane; Myers, Sean A.; Clauset, Aaron; Porter, Mason A.; and Mucha, Peter J. [2016]. Eigenvector-based Centrality Measures for Temporal Networks

[3] Tiago P. Peixoto; Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Phys. Rev. E 92, 042807

[4] Mucha, Peter J.; Richardson, Thomas; Macon, Kevin; Porter, Mason A.; and Onnela, Jukka-Pekka [2010]. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks, Science, Vol. 328, No. 5980: 876-878.

[5] Bassett, Danielle S.; Wymbs, Nicholas F.; Porter, Mason A.; Mucha, Peter J.; Carlson, Jean M.; and Grafton, Scott T. [2011]. Dynamic Reconfiguration of Human Brain Networks During Learning, Proceedings of the National Academy of Sciences of the United States of America, Vol. 118, No. 18: 7641-7646.

At this week’s group meeting I presented on my second SABS short project, which is supervised by Charlotte Deane, Mason Porter, and Jonny Wray from e-Therapeutics. It has the title “Multilayer-Network Analysis of Protein Interaction Networks”.
Protein interactions can be represented using networks. Accordingly, approaches that have been developed in network science are appropriate for the analysis of protein interactions, and they can lead to the detection of new drug targets. Thus far, only ordinary (“monolayer”) protein interaction networks have been exploited for drug discovery. However, because “multilayer networks” allow the representation of multiple types of interactions and of time-dependent interactions, they have the potential to improve insight from network-based approaches [1].
Aim of my project was to employ known multilayer methods on well-established data to investigate potential use cases of multilayer protein interaction networks. We focussed on various community detection methods [3,4] to find groups of proteins as candidates of functional, biological modules. Additionally, temporal centrality [5] measures were used to identify important proteins across time.

References:
[1] Kivelä, Mikko, et al. “Multilayer networks.” Journal of Complex Networks (2014) [2] Calvano, Steve E., et al. “A network-based analysis of systemic inflammation in humans.” Nature (2005) [3] Peixoto, Tiago P. “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models.” PRE (2014) [4] Mucha, Peter J., et al. “Community structure in time-dependent, multiscale, and multiplex networks.” Science (2010) [5] Taylor, Dane, et al. “Eigenvector-Based Centrality Measures for Temporal Networks.” arXiv preprint (2015).

Oxford Protein Informatics Group

or "OPIG" to friends