Drawing Custom Unrooted Trees from Sequence Alignments

Multiple Sequence Alignments can provide a lot of information relating to the relationships between proteins. One notable example was the map of the kinome space published in 2002 (Figure 1).

 

Figure 1. Kinase space as presented by Manning et al. 2002;

Such images organize our thinking about the possible space of such proteins/genes going beyond long lists of multiple sequence alignments. The image in Figure 1, got a revamp later which now is the popular ‘kinome poster’ (Figure 2).

Revamped dendrogram of the kinome fro Fig. 1. Downloaded from http://i.imgur.com/BPLUvfc.png.

Here we have created a script to produce similar dendrograms straight from the multiple sequence alignment files (although clearly not as pretty as Fig 2!). It is not difficult to find software that would produce ‘a dendrogram’ from an MSA but making it do the simple thing of annotating the nodes with colors, shapes etc. with respect to the labels of the genes/sequences is slightly more problematic. Sizes might correspond to the importance of given nodes and colors can organize by their tree branches. The script uses the Biopython module Phylo to construct a tree from an arbitrary MSA and networkx to draw it:

python Treebeard.py
import networkx, pylab
from networkx.drawing.nx_agraph import graphviz_layout
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
from Bio import AlignIO

#What color to give to the edges?
e_color = '#ccccff'
#What colors to give to the nodes with similar labels?
color_scheme = {'RSK':'#e60000','SGK':'#ffff00','PKC':'#32cd32','DMPK':'#e600e6','NDR':'#3366ff','GRK':'#8080ff','PKA':'magenta','MAST':'green','YANK':'pink'}
#What sizes to give to the nodes with similar labels?
size_scheme = {'RSK':200,'SGK':150,'PKC':350,'DMPK':400,'NDR':280,'GRK':370,'PKA':325,'MAST':40,'YANK':200}

#Edit this to produce a custom label to color mapping
def label_colors(label):
	color_to_set = 'blue'
	for label_subname in color_scheme:
		if label_subname in label:
			color_to_set = color_scheme[label_subname]
	return color_to_set

#Edit this to produce a custom label to size mapping
def label_sizes(label):
	#Default size
	size_to_set = 20
	for label_subname in size_scheme:
		if label_subname in label:
			size_to_set = size_scheme[label_subname]
	return size_to_set

#Draw a tree whose alignment is stored in msa.phy
def draw_tree():
	
	#This loads the default kinase alignment that should be in the same directory as this script
	aln = AlignIO.read('agc.aln', 'clustal')
	#This will construct the unrooted tree.
	calculator = DistanceCalculator('identity')
	dm = calculator.get_distance(aln)
	constructor = DistanceTreeConstructor()
	tree = constructor.nj(dm)
	G = Phylo.to_networkx(tree)
	node_sizes = []
	labels = {}
	node_colors = []
	for n in G:
		label = str(n)
		if 'Inner' in label:
			#These are the inner tree nodes -- leave them blank and with very small sizes.
			node_sizes.append( 1 )
			labels[n] = ''
			node_colors.append(e_color)
		else:
			#Size of the node depends on the labels!
			node_sizes.append( label_sizes(label) )
			#Set colors depending on our color scheme and label names
			node_colors.append(label_colors(label))
			#set the label that will appear in each node			
			labels[n] = label
	#Draw the tree given the info we provided!
	pos = graphviz_layout(G)
	networkx.draw(G, pos,edge_color=e_color,node_size = node_sizes, labels=labels, with_labels=True,node_color=node_colors)
	#Showing	
	pylab.show()
	#Saving the image -- uncomment
	#pylab.savefig('example.png')

if __name__ == '__main__':
	
	draw_tree()

We are going to use the kinase alignment example to demonstrate how the script can be used. The kinase alignment we use can be found here on the kinase.com website. We load the alignment and construct the unrooted tree using the Bio.Phylo module. Note that on each line of the alignment there is a name. These names are the labels that we use to define the colors and sizes of nodes. There are two dummy functions that achieve that label_nodes() and label_sizes() — if you look at them it should be clear how to define your own custom labeling.

If you download the code and the alignment and run it by:

python Treebeard.py

You should see a similar image as in Fig 3.

Fig 3. Size-color-customized unrooted tree straight from a multiple sequence alignment file of protein kinases. Constructed using the script Treebeard.py