{"id":2873,"date":"2016-02-25T12:18:22","date_gmt":"2016-02-25T12:18:22","guid":{"rendered":"http:\/\/www.blopig.com\/blog\/?p=2873"},"modified":"2016-02-25T12:25:55","modified_gmt":"2016-02-25T12:25:55","slug":"drawing-custom-unrooted-trees-from-sequence-alignments","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2016\/02\/drawing-custom-unrooted-trees-from-sequence-alignments\/","title":{"rendered":"Drawing Custom Unrooted Trees from Sequence Alignments"},"content":{"rendered":"<p>Multiple sequence alignments (MSAs) can tell us a lot about the relationships between proteins. One notable example is the map of the kinome space <a href=\"http:\/\/science.sciencemag.org\/content\/298\/5600\/1912.full\">published in 2002 (Figure 1)<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"width: 311px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/i0.wp.com\/d2ufo47lrtsv5s.cloudfront.net\/content\/sci\/298\/5600\/1912\/F1.large.jpg?resize=301%2C254&#038;ssl=1\" alt=\"\" width=\"301\" height=\"254\" \/><p class=\"wp-caption-text\">Figure 1. Kinase space as presented by Manning et al. 2002.<\/p><\/div>\n<p>Such images organize our thinking about the possible space of such proteins\/genes in a way that long lists of aligned sequences cannot. The image in Figure 1 was later revamped into the now popular &#8216;kinome poster&#8217; (Figure 2).<\/p>\n<div style=\"width: 304px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/i0.wp.com\/i.imgur.com\/BPLUvfc.png?resize=294%2C380\" alt=\"\" width=\"294\" height=\"380\" \/><p class=\"wp-caption-text\">Figure 2. Revamped dendrogram of the kinome from Fig. 1. 
Downloaded from http:\/\/i.imgur.com\/BPLUvfc.png.<\/p><\/div>\n<p>Here we present a script that produces similar dendrograms straight from multiple sequence alignment files (although clearly not as pretty as Fig 2!). It is not difficult to find software that will produce &#8216;a dendrogram&#8217; from an MSA, but getting it to do the simple thing of annotating the nodes with colors, shapes etc. according to the labels of the genes\/sequences is slightly more problematic. Sizes might reflect the importance of given nodes, and colors can group nodes by tree branch. The script uses the Biopython module Phylo to construct a tree from an arbitrary MSA and networkx to draw it:<\/p>\n<pre class=\"lang:py decode:true\">#Treebeard.py\r\n#Note: graphviz_layout needs pygraphviz to be installed\r\nimport networkx, pylab\r\nfrom networkx.drawing.nx_agraph import graphviz_layout\r\nfrom Bio import Phylo\r\nfrom Bio.Phylo.TreeConstruction import DistanceCalculator\r\nfrom Bio.Phylo.TreeConstruction import DistanceTreeConstructor\r\nfrom Bio import AlignIO\r\n\r\n#What color to give to the edges?\r\ne_color = '#ccccff'\r\n#What colors to give to the nodes with similar labels?\r\ncolor_scheme = {'RSK':'#e60000','SGK':'#ffff00','PKC':'#32cd32','DMPK':'#e600e6','NDR':'#3366ff','GRK':'#8080ff','PKA':'magenta','MAST':'green','YANK':'pink'}\r\n#What sizes to give to the nodes with similar labels?\r\nsize_scheme = {'RSK':200,'SGK':150,'PKC':350,'DMPK':400,'NDR':280,'GRK':370,'PKA':325,'MAST':40,'YANK':200}\r\n\r\n#Edit this to produce a custom label to color mapping\r\ndef label_colors(label):\r\n\t#Default color, used when no key of color_scheme occurs in the label\r\n\tcolor_to_set = 'blue'\r\n\tfor label_subname in color_scheme:\r\n\t\tif label_subname in label:\r\n\t\t\tcolor_to_set = color_scheme[label_subname]\r\n\treturn color_to_set\r\n\r\n#Edit this to produce a custom label to size mapping\r\ndef label_sizes(label):\r\n\t#Default size, used when no key of size_scheme occurs in the label\r\n\tsize_to_set = 20\r\n\tfor label_subname in size_scheme:\r\n\t\tif label_subname in label:\r\n\t\t\tsize_to_set = 
size_scheme[label_subname]\r\n\treturn size_to_set\r\n\r\n#Draw a tree from the alignment stored in agc.aln\r\ndef draw_tree():\r\n\t\r\n\t#This loads the default kinase alignment (Clustal format) that should be in the same directory as this script\r\n\taln = AlignIO.read('agc.aln', 'clustal')\r\n\t#Compute identity-based distances and construct the unrooted tree by neighbor joining.\r\n\tcalculator = DistanceCalculator('identity')\r\n\tdm = calculator.get_distance(aln)\r\n\tconstructor = DistanceTreeConstructor()\r\n\ttree = constructor.nj(dm)\r\n\t#Convert the tree into a networkx graph\r\n\tG = Phylo.to_networkx(tree)\r\n\tnode_sizes = []\r\n\tlabels = {}\r\n\tnode_colors = []\r\n\tfor n in G:\r\n\t\tlabel = str(n)\r\n\t\tif 'Inner' in label:\r\n\t\t\t#These are the inner tree nodes: leave them blank and with very small sizes.\r\n\t\t\tnode_sizes.append( 1 )\r\n\t\t\tlabels[n] = ''\r\n\t\t\tnode_colors.append(e_color)\r\n\t\telse:\r\n\t\t\t#Size of the node depends on the labels!\r\n\t\t\tnode_sizes.append( label_sizes(label) )\r\n\t\t\t#Set colors depending on our color scheme and label names\r\n\t\t\tnode_colors.append(label_colors(label))\r\n\t\t\t#Set the label that will appear in each node\r\n\t\t\tlabels[n] = label\r\n\t#Draw the tree given the info we provided!\r\n\tpos = graphviz_layout(G)\r\n\tnetworkx.draw(G, pos, edge_color=e_color, node_size=node_sizes, labels=labels, with_labels=True, node_color=node_colors)\r\n\t#Saving the image: uncomment (keep this before show(), otherwise the saved file may be blank)\r\n\t#pylab.savefig('example.png')\r\n\t#Showing\r\n\tpylab.show()\r\n\r\nif __name__ == '__main__':\r\n\t\r\n\tdraw_tree()\r\n<\/pre>\n<p>We use the kinase alignment example to demonstrate how the script works. The alignment can be found <a href=\"http:\/\/kinase.com\/human\/kinome\/groups\/agc.aln\">here<\/a> on the kinase.com website. We load the alignment and construct the unrooted tree using the Bio.Phylo module. Note that each sequence line of the alignment carries a name. These names are the labels that we use to define the colors and sizes of nodes. 
There are two dummy functions that do this, label_colors() and label_sizes(); if you look at them it should be clear how to define your own custom labeling.<\/p>\n<p>If you <a href=\"http:\/\/www.stats.ox.ac.uk\/~krawczyk\/treebeard\/Treebeard.py\">download the code<\/a> and <a href=\"http:\/\/kinase.com\/human\/kinome\/groups\/agc.aln\">the alignment<\/a> into the same directory and run the script with:<\/p>\n<pre class=\"lang:sh decode:true\">python Treebeard.py\r\n<\/pre>\n<p>you should see an image similar to Fig 3.<\/p>\n<div style=\"width: 372px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/i0.wp.com\/www.stats.ox.ac.uk\/~krawczyk\/treebeard\/example.png?resize=362%2C272\" alt=\"\" width=\"362\" height=\"272\" \/><p class=\"wp-caption-text\">Fig 3. Size-color-customized unrooted tree straight from a multiple sequence alignment file of protein kinases. Constructed using the script Treebeard.py.<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Multiple Sequence Alignments can provide a lot of information relating to the relationships between proteins. One notable example was the map of the kinome space published in 2002 (Figure 1). &nbsp; Such images organize our thinking about the possible space of such proteins\/genes going beyond long lists of multiple sequence alignments. 
The image in Figure [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[29],"tags":[],"ppma_author":[482],"class_list":["post-2873","post","type-post","status-publish","format-standard","hentry","category-code"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":482,"user_id":4,"is_guest":0,"slug":"konrad","display_name":"Konrad Krawczyk","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/fdb224fe7b0775e3c9a6956ae2a5ffd7c35ab8ce3ff99c5f6e0a51d45557cdd6?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/2873","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=2873"}],"version-history":[{"count":8,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/2873\/revisions"}],"predecessor-version":[{"id":2881,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/2873\/revisions\/2881"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=2873"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=2873"},{"taxonomy":"post_tag","embeddable":t
rue,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=2873"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2873"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}