Monthly Archives: March 2015

Graphical User Interface for MOSAICS as a Pymol plugin — PymoSAICS

MOSAICS is suite of sampling methods for molecular simulations of motion of nucleic acid and protein structures. It’s applicability has been demonstrated in simulating large ensembles of nucleic acids (Sim 2012, Minary 2014) and proteins.

Starting with a protein/dna/rna structure you would like to examine, the basic modus operandi of MOSAICS is divided into three parts:

  • Pick your energy function or a statistical/empirical potentials (e.g. empirical Amber or CHARMM)
  • Pick your sampling methodology — e.g. parallel tempering
  • Details of simulation: solvent (implicit?), degrees of freedom (cartesian, torsional?) etc.

The energy function defines the energy surface with respect to your degrees of freedom (DoFs) and the sampling methodology is supposed to explore the conformational space along DoFs.

One of the main fortes of MOSAICS lies in the ability of defining hierarchichal natural moves. Defining regions of collective motion introduces experimental knowledge and intuition into the simulations, greatly accelerating sampling. Ability to define such regions  was one of the main reasons to start the development of the graphical user interface (GUI) for MOSAICS — preliminary version can be seen in the Figure below.


Overview of PymoSAICS in its current form.

Since we are developing the GUI as a plugin for Pymol, we called it PymoSAICS. The initial focus of the project is on nucleic acids due to our interests in the structural effects of epigenetic modifications. As demonstrated in the Figure above, the GUI is divided into three main panels:

  • Current run — prepare a simulation
  • Simulation Manager — manage previous runs, import, export protocols
  • Help — That’s just a link to our website!

Users can upload their favorite structure via PymoSAICS or Pymol and play with the available parameters. The GUI is also an ongoing effort to streamline the available protocols in MOSAICS to shield the user from the many parameters that are available but perhaps not relevant to the simulation at hand.

We are currently starting beta tests of the application which (if you don’t mind not getting any support just yet) is available here, Therefore, if you are interested in becoming a tester please let me know, and you will receive a version around Easter (April-ish)! Contact me via konrad.krawczyk at


A topology-based distance measure for network data

In last week’s group meeting, I introduced our network comparison method (Netdis) and presented some new results that enable the method to be applied to larger networks.

The most tractable methods for network comparison are those which compare at the level of the entire network using statistics that describe global properties, but these statistics are not sensitive enough to be able to reconstruct phylogeny or shed light on evolutionary processes. In contrast, there are several network alignment based methods that compare networks using the properties of the individual proteins (nodes) e.g. local network similarity and/or protein functional or sequence similarity. The aim of these methods is to identify matching proteins/nodes between networks and use these to identify exact or close sub-network matches. These methods are usually computationally intensive and tend to yield an alignment which contains only a relatively small proportion of the network, although this has been alleviated to some extent in more recent methods.

Thus, we do not follow the network alignment paradigm, but instead we take our lead from alignment-free sequence comparison methods that have been used to identify evolutionary relationships. Alignment-free methods based on k-tuple counts (also called k-grams or k-words) have been applied to construct trees from sequence data. A key feature is the standardisation of the counts to separate the signal from the background noise. Inspired by alignment-free sequence comparison we use subgraph counts instead of sequence homology or functional one-to-one matches to compare networks. Our proposed method, Netdis, compares the subgraph content not of the networks themselves but instead of the ensemble of all protein neighbourhoods (ego-networks) in each network, through an averaging many-to-many approach. The comparison between these ensembles is summarised in a Netdis value, which in turn is used as input for phylogenetic tree reconstruction.

Effect of sub-sampling egos on the resulting grouping of networks generated by Netdis. Higher Rand index values indicate better fit to non-sampling results.

Fig1: Effect of sub-sampling egos on the resulting grouping of networks generated by Netdis. Higher Rand index values indicate better fit to non-sampling results.

Extensive tests on simulated and empirical data-sets show that it is not necessary to analyze all possible ego-networks within a network for Netdis to work. Our results indicate that in general, randomly sampling around 10% of egos in each network results in a very similar clustering of networks on average, compared to the tree with 100% sampling (Fig 1). This result has important implications for use-cases where eextremely large graphs are to be compared (e.g > 100,000 nodes). Related to the ego-nework sub-sampling idea is the notion of size-limiting the ego-networks that are to be analyzed by the algorithm. Our tests show that the vast majrity of ego-netowrks in most networks have a relatively low coverage of the overall network. Moreover, by introducing lower-size threshold on the egos, we observe better results on average. Together, this means a limited range of ego-network sizes to be analyzed for each network, which should lead to better statitical properties as the sub-sampling scheme is inspired by bootstrapping.