{"id":1543,"date":"2017-01-31T15:41:48","date_gmt":"2017-01-31T15:41:48","guid":{"rendered":"http:\/\/www.blopig.com\/blog\/?p=1543"},"modified":"2017-01-31T19:07:46","modified_gmt":"2017-01-31T19:07:46","slug":"r_or_python","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2017\/01\/r_or_python\/","title":{"rendered":"R or Python for data vis?"},"content":{"rendered":"<p>Python users: ever wanted to learn R?<br \/>\nR users: ever wanted to learn Python?<br \/>\nCheck out:\u00a0http:\/\/mathesaurus.sourceforge.net\/r-numpy.html<\/p>\n<p>Both languages are incredibly powerful for doing large-scale data analyses. They both have amazing data visualisation platforms, allowing you to make custom graphs very easily (e.g. with your own set of fonts, colour palette choices, etc.). Here is a quick run-down of the good, the bad, and the ugly:<\/p>\n<p><strong>R<\/strong><\/p>\n<ul>\n<li>The good:\n<ul>\n<li>More established in statistical analyses; if you can&#8217;t find an R package for something, chances are it won&#8217;t be available in Python either.<\/li>\n<li>Data frame parsing is fast and efficient, and incredibly easy to use (e.g. indexing specific rows, which is surprisingly hard in pandas).<\/li>\n<li>If GUIs are your thing, there are programs like RStudio that mesh the console, plotting, and code.<\/li>\n<\/ul>\n<\/li>\n<li>The bad:\n<ul>\n<li>For loops are traditionally slow, meaning that you have to use lots of apply commands (e.g. <code class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">tapply<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\">sapply<\/code>).<\/li>\n<\/ul>\n<\/li>\n<li>The ugly:\n<ul>\n<li>Help documentation can be challenging to\u00a0read and follow, leading to (potentially) a steep learning curve.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Python<\/strong><\/p>\n<ul>\n<li>The good:\n<ul>\n<li>If you have existing code in Python (e.g. 
analysing protein sequences\/structures), then you can plot straight away without having to save your results to a separate CSV file first.<\/li>\n<li>Lots of support for different packages such as NumPy, SciPy, scikit-learn, etc., with good documentation and lots of help on forums (e.g. Stack Overflow).<\/li>\n<li>It&#8217;s more useful for string manipulation (e.g.\u00a0parsing out the ordering of IMGT numbering for antibodies, which goes from 111A-&gt;111B-&gt;112B-&gt;112A-&gt;112).<\/li>\n<\/ul>\n<\/li>\n<li>The bad:\n<ul>\n<li>Matplotlib, which is the go-to for data visualisation, has a pretty\u00a0steep learning curve.<\/li>\n<\/ul>\n<\/li>\n<li>The ugly:\n<ul>\n<li>For statistical analyses,\u00a0model building can have an unusual syntax. For example, building a linear model in R is incredibly easy (<code class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\">lm<\/code>), whereas Python involves\u00a0<code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sklearn.linear_model.LinearRegression().fit<\/code>. Otherwise you have to code up a lot of things yourself, which might not be practical.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>For me, Python\u00a0wins because I find it&#8217;s much easier to create an analysis pipeline where you can go from raw data (e.g. PDB structures) to analysing it (e.g. with Biopython) and then plotting custom graphics. Another big selling point is that Python packages have great documentation. 
Of course, there are libraries to do the analyses\u00a0in R but the level of freedom, I find, is a bit more restricted, and R&#8217;s documentation means you&#8217;re often stuck interpreting what the package vignette is saying, rather than doing actual coding.<\/p>\n<p>As for plotting (because pretty graphs are where it&#8217;s at!), here are two very simple implementations, first in Python (matplotlib) and then in R, that plot the densities of two normal distributions, along with their means and standard deviations.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import numpy as np\r\nimport matplotlib.pyplot as plt\r\nfrom matplotlib import rcParams\r\n\r\n# plt.xkcd() # Uncomment to make your plots look XKCD-like; matplotlib also has stylesheets, e.g. plt.style.use('ggplot')\r\n\r\n# change font to Arial\r\n# you can change this to any TrueType font that you have on your machine\r\nrcParams['font.family'] = 'sans-serif'\r\nrcParams['font.sans-serif'] = ['Arial']\r\n\r\n# Generate two sets of numbers from a normal distribution\r\n# one with mean (loc) = 4 and sd (scale) = 0.5, another with mean (loc) = 1 and sd (scale) = 2\r\nrandomSet = np.random.normal(loc = 4, scale = 0.5, size = 1000)\r\nanotherRandom = np.random.normal(loc = 1, scale = 2, size = 1000)\r\n\r\n# Define a Figure and Axes object using plt.subplots\r\n# Axes object is where we do the actual plotting (i.e. draw the histogram)\r\n# Figure object is used to configure the actual figure (e.g. 
the dimensions of the figure)\r\nfig, ax = plt.subplots()\r\n\r\n# Plot a histogram with custom-defined bins, with a blue colour, transparency of 0.4\r\n# Plot the density rather than the raw count using density = True\r\n# (the older normed argument was removed in Matplotlib 3.1)\r\nax.hist(randomSet, bins = np.arange(-3, 6, 0.5), color = '#134a8e', alpha = 0.4, density = True)\r\nax.hist(anotherRandom, bins = np.arange(-3, 6, 0.5), color = '#e8291c', alpha = 0.4, density = True)\r\n\r\n# Plot solid lines for the means\r\nplt.axvline(np.mean(randomSet), color = 'blue')\r\nplt.axvline(np.mean(anotherRandom), color = 'red')\r\n\r\n# Plot dashed lines for the std devs\r\nplt.axvline(np.mean(randomSet) - np.std(randomSet), linestyle = '--', color = 'blue')\r\nplt.axvline(np.mean(randomSet) + np.std(randomSet), linestyle = '--', color = 'blue')\r\n\r\nplt.axvline(np.mean(anotherRandom) - np.std(anotherRandom), linestyle = '--', color = 'red')\r\nplt.axvline(np.mean(anotherRandom) + np.std(anotherRandom), linestyle = '--', color = 'red')\r\n\r\n# Set the title, x- and y-axis labels\r\nplt.title('A fancy plot')\r\nax.set_xlabel(\"Value of $x$\")\r\nax.set_ylabel(\"Density\")\r\n\r\n# Set the Figure's size as a 5in x 5in figure\r\nfig.set_size_inches((5,5))\r\n\r\n<\/pre>\n<div id=\"attachment_3323\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" aria-describedby=\"caption-attachment-3323\" loading=\"lazy\" class=\"size-large wp-image-3323\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=625%2C625&#038;ssl=1\" alt=\"\" width=\"625\" height=\"625\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=1024%2C1024&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=150%2C150&amp;ssl=1 150w, 
https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=300%2C300&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=768%2C768&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?resize=624%2C624&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?w=1500&amp;ssl=1 1500w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/figure.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><p id=\"caption-attachment-3323\" class=\"wp-caption-text\">Figure made by matplotlib using the code above.<\/p><\/div>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">randomSet = rnorm(mean = 4, sd = 0.5, n = 1000)\r\nanotherRandom = rnorm(mean = 1, sd = 2, n = 1000)\r\n\r\n# Let's define a range for binning the histogram, padded by 10% on each side\r\nlimits = range(randomSet, anotherRandom)\r\nlbound = limits[1] - (diff(limits) * 0.1)\r\nubound = limits[2] + (diff(limits) * 0.1)\r\n# use freq = F to plot density\r\n# in breaks, we define the bins of the histogram by providing a vector of values using seq\r\n# xlab, ylab define axis labels; main sets the title\r\n# rgb defines the colour in RGB values from 0-1, with the fourth value setting transparency\r\n# e.g. rgb(0,1,0,1) is R = 0, G = 1, B = 0, with an alpha of 1 (i.e. 
not transparent)\r\nhist(randomSet, freq = F, breaks = seq(lbound, ubound, 0.5), col = rgb(0,0,1,0.4), xlab = 'Value of x', ylab = 'Density', main = 'A fancy plot')\r\n# Use add = T to keep both histograms in one graph\r\n# other parameters, such as breaks, etc., can be introduced here\r\nhist(anotherRandom, freq = F, breaks = seq(lbound, ubound, 0.5), col = rgb(1,0,0,0.4), add = T)\r\n\r\n# Plot vertical lines with v =\r\n# lty = 2 generates a dashed line\r\nabline(v = c(mean(randomSet), mean(anotherRandom)), col = c('blue', 'red'))\r\n\r\nabline(v = c(mean(randomSet)-sd(randomSet), mean(randomSet)+sd(randomSet)), col = 'blue', lty = 2)\r\nabline(v = c(mean(anotherRandom)-sd(anotherRandom), mean(anotherRandom)+sd(anotherRandom)), col = 'red', lty = 2)<\/pre>\n<div id=\"attachment_3322\" style=\"width: 490px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/blah.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" aria-describedby=\"caption-attachment-3322\" loading=\"lazy\" class=\"size-full wp-image-3322\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/blah.png?resize=480%2C480&#038;ssl=1\" alt=\"\" width=\"480\" height=\"480\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/blah.png?w=480&amp;ssl=1 480w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/blah.png?resize=150%2C150&amp;ssl=1 150w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2017\/01\/blah.png?resize=300%2C300&amp;ssl=1 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><p id=\"caption-attachment-3322\" class=\"wp-caption-text\">Similar figure made using R code from above.<\/p><\/div>\n<p><em>*Special thanks go out to Ali and Lyuba for helpful fixes to make the R code more efficient!<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Python users: ever wanted to learn R? 
R users: ever wanted to learn Python? Check out:\u00a0http:\/\/mathesaurus.sourceforge.net\/r-numpy.html Both languages are incredibly powerful for doing large-scale data analyses. They both have amazing data visualisation platforms, allowing you to make custom graphs very easily (e.g. with your own set of fonts, color palette choices, etc.) These are just [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[29],"tags":[],"ppma_author":[511],"class_list":["post-1543","post","type-post","status-publish","format-standard","hentry","category-code"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":511,"user_id":22,"is_guest":0,"slug":"jinwoo","display_name":"Jinwoo 
Leem","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/65d338dc0b03d3026aa9a98f5e43889ca6c9ac9d0f45fe65ea5931207597ce2d?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Leem","first_name":"Jinwoo","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=1543"}],"version-history":[{"count":10,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1543\/revisions"}],"predecessor-version":[{"id":3325,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/1543\/revisions\/3325"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=1543"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=1543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=1543"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}