{"id":9617,"date":"2023-04-11T09:24:16","date_gmt":"2023-04-11T08:24:16","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=9617"},"modified":"2023-04-18T16:22:41","modified_gmt":"2023-04-18T15:22:41","slug":"better-histograms-with-python","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2023\/04\/better-histograms-with-python\/","title":{"rendered":"Better histograms with Python"},"content":{"rendered":"\n<p>Histograms are frequently used to visualize the distribution of a data set or to compare multiple distributions. Python, via matplotlib.pyplot, provides convenient functions for plotting histograms; the default plots it generates, however, leave much to be desired in terms of visual appeal and clarity. <br><br>The two code blocks below generate histograms of two normally distributed data sets: the first uses the default matplotlib.pyplot.hist settings, and the second adds a few lines to improve the presentation of the data. The comments explain what each individual line does. 
<\/p>\n\n\n\n<!--more-->\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">## DEFAULT HISTOGRAMS\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# set random seed so we get reproducible behavior\nnp.random.seed(1)\n\n# generate two data series each containing 1,000 normally distributed values\nd1 = np.random.normal(5.0, 2.0, 1000)\nd2 = np.random.normal(6.0, 2.0, 1000)\n\n# make the plot with default settings\nplt.clf()\nplt.hist(d1)\nplt.hist(d2)\nplt.savefig('default_hist.png', dpi=300)<\/pre>\n\n\n\n<p>The output of this program is:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"417\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=625%2C417&#038;ssl=1\" alt=\"\" class=\"wp-image-9621\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?w=1800&amp;ssl=1 1800w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=300%2C200&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=1024%2C683&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=768%2C512&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=1536%2C1024&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?resize=624%2C416&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/default_hist.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/figure>\n\n\n\n<p>And now for the 
slightly longer but much improved histogram code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">## BETTER HISTOGRAMS\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# set random seed so we get reproducible behavior\nnp.random.seed(1)\n\n# generate two data series each containing 1,000 normally distributed values\nd1 = np.random.normal(5.0, 2.0, 1000)\nd2 = np.random.normal(6.0, 2.0, 1000)\n\n# clear the current figure before plotting\nplt.clf()\n\n# create an Axes object so we can modify its axis lines (spines) easily\nax = plt.subplot(111)\n\n# updated histogram commands\n# use colors from Paul Tol's notes that colorblind readers can tell apart\n# draw unfilled (\"step\") histograms so all bin heights remain visible where the data sets overlap\nplt.hist(d1, histtype='step', color='#EE8026', label='Data Set 1', alpha=0.7)\nplt.hist(d2, histtype='step', color='#BA8DB4', label='Data Set 2', alpha=0.7)\n\n# declutter the axes, label them, and add a legend\nax.spines['top'].set_visible(False)   # hide the top axis line\nax.spines['right'].set_visible(False) # hide the right axis line\nplt.ylabel('Counts')                  # label the y axis\nplt.xlabel('Values')                  # label the x axis\nplt.xlim(-2, 14)                      # set x limits that span the full data range\nplt.ylim(-10, 300)                    # set y limits so the full range of counts is visible\nplt.legend(loc='best', fancybox=True) # add a legend\n\nplt.savefig('better_hist.png', dpi=300)<\/pre>\n\n\n\n<p>The result of this program is:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"417\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=625%2C417&#038;ssl=1\" alt=\"\" class=\"wp-image-9620\" 
srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?w=1800&amp;ssl=1 1800w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=300%2C200&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=1024%2C683&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=768%2C512&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=1536%2C1024&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?resize=624%2C416&amp;ssl=1 624w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/04\/better_hist.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/figure>\n\n\n\n<p>This second plot is easier to read: drawing the histograms as outlines instead of &#8220;filled&#8221; bars reduces visual clutter and keeps both data sets visible where they overlap, the colors remain distinguishable for colorblind readers, and the axes are labeled. The choice of histogram bins is an important consideration that I am not going to cover here; by default, each plt.hist call uses 10 equal-width bins spanning that data set's own range. You can experiment to see how adding, for example, bins='fd' (the Freedman-Diaconis rule) to the plt.hist calls in the second program changes the visual depiction of the results with all else held constant. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Histograms are frequently used to visualize the distribution of a data set or to compare multiple distributions. Python, via matplotlib.pyplot, provides convenient functions for plotting histograms; the default plots it generates, however, leave much to be desired in terms of visual appeal and clarity. 
The two code blocks below generate histograms of two normally [&hellip;]<\/p>\n","protected":false},"author":69,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[721,723,722,256,720],"ppma_author":[542],"class_list":["post-9617","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-data-visualization","tag-general-programming","tag-making-figures","tag-plotting","tag-python3"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":542,"user_id":69,"is_guest":0,"slug":"dan","display_name":"Daniel Nissley","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/fd064a72579f11063ca36621317b744b6bc9df79116bc01af9b57f531bf10662?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9617","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/69"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=9617"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9617\/revisions"}],"predecessor-version":[{"id":9658,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/9617\/revisions\/9658"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=9617"}],
"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=9617"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=9617"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9617"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}