Prediction of Parkinson subtypes at COXIC 2020

Last week I attended the COXIC seminar (a joint Oxford – Imperial seminar focused on networks and complex systems), organised by Florian Klimm from Imperial College London (and former OPIG member!). We had several interesting talks at the seminar, but one of them caught my eye more than the rest: the talk by Dr Sanjukta Krishnagopal (UCL), titled “Predicting Parkinson’s Sub-types through Trajectory Clustering in Bipartite Networks”, which I will briefly summarise here. I hope you like it (at least) as much as I did!

This blogpost is based on these two articles:

  1. Sanjukta Krishnagopal, Rainer Von Coelln, Lisa Shulman, Michelle Girvan. “Identifying and predicting Parkinson’s disease subtypes through trajectory clustering via bipartite networks”. PloS one (2020)
  2. Sanjukta Krishnagopal. “Multi-layer Trajectory Clustering Network Algorithm for Disease Subtyping”. Biomedical Physics & Engineering Express (2020)

Parkinson’s disease (PD) is the second most common neurodegenerative disorder. The disease course is variable, with the age of onset and the rate of progression differing across the population. Unfortunately, there is currently no consensus on Parkinson’s subtypes that are both biologically valid and clinically relevant. During the talk, the author presented two methods that use bipartite networks to predict PD subtypes from clinical data on patients followed over five years. These networks have two sets of nodes: patients and variables (mainly demographic data and PD symptoms).

First Order Bipartite Trajectory Similarity

For each time point (year), they generate a bipartite network to model connections between individuals and disease variables (see image below). Then, they stack all the networks in a three-dimensional array containing the values of the variables of all patients at each time point.
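To make the stacking step concrete, here is a minimal sketch in plain Python. The toy values, the variable names and the `stack_networks` helper are my own assumptions for illustration; they are not taken from the paper. Each yearly bipartite network is written as a patient-by-variable matrix (its biadjacency matrix), and the stack is indexed as [time][patient][variable].

```python
# Hypothetical toy data: 3 patients, 2 variables, 2 time points.
# Entry [p][v] of a yearly matrix is the value of variable v for patient p,
# i.e. the weight of the patient-variable edge in that year's bipartite graph.
year0 = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
year1 = [[2.0, 1.0], [2.0, 2.0], [1.0, 3.0]]


def stack_networks(snapshots):
    """Stack yearly patient-by-variable matrices into a [time][patient][variable] array."""
    n_patients = len(snapshots[0])
    n_vars = len(snapshots[0][0])
    for snap in snapshots:
        # Every year must describe the same patients and the same variables.
        if len(snap) != n_patients or any(len(row) != n_vars for row in snap):
            raise ValueError("all snapshots must have the same shape")
    # Copy the rows so later edits to the stack do not alter the inputs.
    return [[list(row) for row in snap] for snap in snapshots]


data = stack_networks([year0, year1])
# Value of variable 1 for patient 0 at time point 1.
value = data[1][0][1]
```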

An illustration of an individual-variable bipartite graph at one timestep (left). Set of bipartite graphs across time (right). From Krishnagopal et al. 2020.

For each patient, they compute a trajectory profile: a matrix in which each row contains the normalised scores of all the variables at a given time point. Finally, they create a patient-patient similarity matrix in which the value for each pair of individuals is derived from the distance between their trajectory profiles, and they run standard Louvain community detection on it to identify patient subtypes. Each subtype is characterised by a distinctive trajectory profile.
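The profile and similarity steps can be sketched roughly as follows. This is only an illustration under my own assumptions: the min-max normalisation, the Euclidean distance between profiles and the 1 / (1 + distance) conversion to a similarity are choices I made for the example, and the toy data and function names are hypothetical. The community detection step itself is left out; in practice the resulting matrix would be handed to a Louvain implementation.

```python
import math

# Hypothetical toy data: data[t][p][v] is the value of variable v
# for patient p at time point t (2 time points, 3 patients, 2 variables).
data = [
    [[1.0, 0.0], [1.0, 0.2], [5.0, 4.0]],
    [[2.0, 1.0], [2.0, 1.0], [9.0, 8.0]],
]


def normalise(data):
    """Min-max scale each variable to [0, 1] over all patients and years (assumed scheme)."""
    n_vars = len(data[0][0])
    lo = [min(row[v] for year in data for row in year) for v in range(n_vars)]
    hi = [max(row[v] for year in data for row in year) for v in range(n_vars)]
    return [
        [[(row[v] - lo[v]) / (hi[v] - lo[v]) if hi[v] > lo[v] else 0.0
          for v in range(n_vars)] for row in year]
        for year in data
    ]


def trajectory_profile(data, patient):
    """Return the time-by-variable matrix of a single patient."""
    return [year[patient] for year in data]


def profile_distance(a, b):
    """Euclidean distance between two trajectory profiles, taken entry by entry."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def similarity_matrix(data):
    """Patient-patient similarity; 1 / (1 + distance) is an assumed conversion."""
    scaled = normalise(data)
    n = len(scaled[0])
    profiles = [trajectory_profile(scaled, p) for p in range(n)]
    return [[1.0 / (1.0 + profile_distance(profiles[i], profiles[j])) for j in range(n)]
            for i in range(n)]


sim = similarity_matrix(data)
```

In this toy example patients 0 and 1 follow almost the same trajectory, so their similarity is much higher than the similarity of either of them to patient 2, and a community detection algorithm run on `sim` would be expected to group them together.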

Once they have clustered the patients, they assign each subtype a value for every variable at each time point (see image below, top). These values are the averages over all patients in the subtype. They use these subtype characterisations to predict the final (year 4) subtype of 39 patients from their baseline (year 0) variable information. The data for these patients were not used to identify the subtypes. The ‘predicted subtype’ is the subtype with the minimum distance to the patient at baseline. The image below (bottom) shows the Euclidean distances between the profiles of patients and the profiles of each subtype for different years. The predicted subtype (coloured in red) had the smallest distance to the actual patient profile for 72% of the patients.
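The prediction rule described above is essentially a nearest-centroid lookup at baseline, which can be sketched as below. The subtype labels, the toy profiles and the helper names `mean_profile` and `predict_subtype` are hypothetical and only serve the illustration.

```python
import math

# Hypothetical clustered patients: subtype -> list of trajectory profiles,
# where each profile is a list of per-year variable vectors (year 0 first).
members = {
    "A": [[[0.0, 0.2], [0.2, 0.2]], [[0.2, 0.2], [0.2, 0.4]]],
    "B": [[[0.8, 0.6], [0.9, 0.9]], [[0.8, 0.8], [0.9, 0.9]]],
}


def mean_profile(profiles):
    """Average several time-by-variable profiles entry by entry."""
    n = len(profiles)
    n_years = len(profiles[0])
    n_vars = len(profiles[0][0])
    return [[sum(p[t][v] for p in profiles) / n for v in range(n_vars)]
            for t in range(n_years)]


# One averaged profile per subtype, as in the top panel of the figure.
subtype_profiles = {name: mean_profile(ps) for name, ps in members.items()}


def predict_subtype(baseline, subtype_profiles):
    """Return the subtype whose year-0 row is closest (Euclidean) to the baseline vector."""
    return min(subtype_profiles,
               key=lambda name: math.dist(baseline, subtype_profiles[name][0]))


# A held-out patient, described only by baseline (year 0) values.
predicted = predict_subtype([0.15, 0.25], subtype_profiles)
```

Only the first row of each subtype profile is used for the prediction; the later rows are what the predicted subtype says about how the patient is expected to progress.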

Top: PD subtype profiles across different years. Bottom: Prediction of the PD subtype of 39 patients. Adapted from Krishnagopal et al. 2020.

Second Order Multilayer Trajectory Similarity

They generate an independent patient-variable bipartite network for each layer, where a layer represents a range of outcome variable values (see image below, left). Then, they run community detection on each layer to identify variable-communities comprising patients and variables. Since a patient may belong to a different community at each time point, they can track each patient’s trajectory across communities, which may or may not be in the same layer (see image below, right).

Left: bipartite patient-variable network for one layer. The highlighted ovals represent variable-communities consisting of patients and disease variables. Right: a stacked multi-layer graph over three layers, where the variable-communities from the left panel form the first layer. Three sample trajectories are shown, and the corresponding closeness between nodes X and Y is calculated. From Krishnagopal, 2020.

They define the node closeness between two nodes (variable-communities) as the fraction of all trajectories passing through the two nodes that overlap. Then, they construct a patient-patient similarity matrix whose values are the sum of the corresponding node closeness values. Lastly, they perform community detection on this matrix to identify patient subtypes. These subtypes are characterised by patterns in variable interaction that emerge through higher-order relationships.
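Here is one possible reading of these definitions as a small sketch; it is my interpretation and not necessarily the exact formulation in the paper. I assume that node closeness is the number of trajectories passing through both nodes divided by the number passing through at least one of them, and that the similarity of two patients sums the closeness over every pair of nodes taken from their two trajectories. The community labels (such as "L1-c1" for layer 1, community 1) and the function names are made up.

```python
# Hypothetical trajectories: patient -> variable-community visited at each time point.
trajectories = {
    "p1": ["L1-c1", "L2-c1", "L3-c1"],
    "p2": ["L1-c1", "L2-c1", "L3-c2"],
    "p3": ["L1-c2", "L2-c2", "L3-c2"],
}


def patients_through(node, trajectories):
    """Set of patients whose trajectory visits the given node."""
    return {p for p, path in trajectories.items() if node in path}


def node_closeness(x, y, trajectories):
    """Assumed reading: trajectories through both nodes / trajectories through either node."""
    through_x = patients_through(x, trajectories)
    through_y = patients_through(y, trajectories)
    union = through_x | through_y
    return len(through_x & through_y) / len(union) if union else 0.0


def patient_similarity(a, b, trajectories):
    """Sum node closeness over all pairs of nodes from the two trajectories (assumed pairing)."""
    return sum(node_closeness(x, y, trajectories)
               for x in trajectories[a] for y in trajectories[b])


patients = sorted(trajectories)
# Patient-patient similarity matrix, ready for a community detection step.
similarity = [[patient_similarity(a, b, trajectories) for b in patients] for a in patients]
```

With these toy trajectories, p1 and p2 share two of their three nodes and end up far more similar to each other than p1 is to p3, whose path only touches a node that p2 also visits.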

The image below shows the subtypes identified by the trajectory clustering algorithm. The size of each node indicates the number of patient-years in that variable-community. The lines connecting different communities represent the patients’ trajectories, and each trajectory cluster is depicted using a different colour.

Trajectory clustering across the outcome variable. Each node represents a variable-community consisting of variables and patients. From Krishnagopal, 2020.
