{"id":8600,"date":"2023-09-21T20:19:10","date_gmt":"2023-09-21T19:19:10","guid":{"rendered":"https:\/\/www.blopig.com\/blog\/?p=8600"},"modified":"2023-09-21T20:19:12","modified_gmt":"2023-09-21T19:19:12","slug":"the-surprising-shape-of-normal-distributions-in-high-dimensions","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2023\/09\/the-surprising-shape-of-normal-distributions-in-high-dimensions\/","title":{"rendered":"The Surprising Shape of Normal Distributions in High Dimensions"},"content":{"rendered":"\n<p>Multivariate Normal distributions are an essential component of virtually any modern deep learning method&#8212;be it to initialise the weights and biases of a neural network, perform variational inference in a probabilistic model, or provide a tractable noise distribution for generative modelling.<\/p>\n\n\n\n<p>What most of us (including&#8212;until very recently&#8212;me) aren&#8217;t aware of, however, is that these Normal distributions begin to look less and less like the characteristic bell curve that we associate them with as their dimensionality increases.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>I stumbled across this interesting and counterintuitive fact in Roman Vershynin&#8217;s excellent <a href=\"https:\/\/www.math.uci.edu\/~rvershyn\/papers\/HDP-book\/HDP-book.html\" data-type=\"link\" data-id=\"https:\/\/www.math.uci.edu\/~rvershyn\/papers\/HDP-book\/HDP-book.html\">&#8220;High-Dimensional Probability: An Introduction with Applications in Data Science&#8221;<\/a> and will re-use some of its expository figures below.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p>As with many surprising properties of high-dimensional spaces, this behaviour has its roots in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Curse_of_dimensionality\">curse of dimensionality<\/a>&#8212;i.e. the fact that higher-dimensional spaces have exponentially more volume than lower-dimensional ones. For instance, a cube of width 2 will be 2<sup>3<\/sup>=8 times as large as a unit cube in three dimensions, but 2<sup>10<\/sup>=1024 times as large in ten dimensions.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"625\" height=\"269\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?resize=625%2C269&#038;ssl=1\" alt=\"\" class=\"wp-image-10380\" srcset=\"https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?w=1012&amp;ssl=1 1012w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?resize=300%2C129&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?resize=768%2C330&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.blopig.com\/blog\/wp-content\/uploads\/2023\/09\/image.png?resize=624%2C268&amp;ssl=1 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/figure>\n<\/div>\n\n\n<p>This exponential increase in volume means that, while the density of a standard Normal distribution is still maximal at the origin, most of it is not actually concentrated around its mean&#8212;as our low-dimensional bell-curve-intuition would suggest. 
This exponential increase in volume means that, while the density of a standard Normal distribution is still maximal at the origin, most of its probability mass is not actually concentrated around the mean, as our low-dimensional bell-curve intuition would suggest. Instead, as the dimensionality N increases, the distribution becomes increasingly indistinguishable from a uniform distribution on a [spherical shell](https://en.wikipedia.org/wiki/Spherical_shell) of constant width and radius √N. Intuitively, this is because the squared radius ‖x‖² = Σᵢ xᵢ² is a sum of N independent terms, each with mean 1, so by the law of large numbers it concentrates around N, and the radius around √N.

*[Figure: the probability mass of a high-dimensional standard Normal concentrates on a thin spherical shell of radius √N around the origin.]*

---

We can easily verify this empirically by converting i.i.d. Normal samples of increasing dimensionality into spherical coordinates and comparing their radii, yielding the following histograms.

```python
import numpy as np
import pandas as pd
import seaborn as sns

num_samples = 100_000
dims = [1, 2, 10, 20]

# draw num_samples i.i.d. standard Normal vectors per dimensionality
rng = np.random.default_rng()
samples = {d: rng.standard_normal((num_samples, d)) for d in dims}

# reduce each sample to its radius (Euclidean norm)
radii = {d: np.linalg.norm(samples[d], axis=-1) for d in dims}

# convert to a long-format DataFrame and plot one histogram per dimension
radii_df = pd.DataFrame(radii).melt(var_name="dim", value_name="r")
g = sns.FacetGrid(data=radii_df, col="dim")
g.map_dataframe(sns.histplot, x="r", element="step")

# mark the predicted radius sqrt(d) in red
for i, d in enumerate(dims):
    ax = g.axes[0][i]
    ax.axvline(np.sqrt(d), c="r")
```

---

*[Figure: histograms of the sampled radii for N = 1, 2, 10 and 20; each distribution concentrates around √N (red line).]*
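For a more quantitative check (an addition of mine, not in the original post): the radius of a standard Normal in d dimensions follows a chi distribution with d degrees of freedom, so we can compare its mean and standard deviation to the √N prediction directly.

```python
import numpy as np
from scipy.stats import chi

# the Euclidean norm of a N(0, I_d) sample is chi-distributed with
# d degrees of freedom: its mean approaches sqrt(d) while its standard
# deviation approaches the constant 1/sqrt(2), i.e. a shell of
# roughly constant width and radius sqrt(d)
for d in [1, 2, 10, 20]:
    print(
        f"d = {d:>2}: E[r] = {chi.mean(d):.3f} "
        f"(sqrt(d) = {np.sqrt(d):.3f}), std[r] = {chi.std(d):.3f}"
    )
```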
Conversely, this means that the probability mass contained in μ±2σ (interpreted componentwise, i.e. the hypercube [−2, 2]^N), which, as a useful rule of thumb, amounts to around 95% in one dimension, decays rapidly as well. Since the coordinates of a standard Normal are independent, this mass factorises into N one-dimensional terms.

```python
from scipy.stats import norm

dims = [1, 2, 10, 20]

for d in dims:
    # independent coordinates: the mass inside the hypercube [-2, 2]^d
    # is the product of d one-dimensional masses
    p = (norm.cdf(2) - norm.cdf(-2)) ** d
    print(f"Dimension: {d}\tmass in μ±2σ: {p:.3f}")
```

Output:

```
Dimension: 1    mass in μ±2σ: 0.954
Dimension: 2    mass in μ±2σ: 0.911
Dimension: 10   mass in μ±2σ: 0.628
Dimension: 20   mass in μ±2σ: 0.394
```

---

While such surprising and counterintuitive properties can be a lot of fun to think about from a theoretical standpoint, I also found them extremely helpful when implementing, debugging and testing models that make heavy use of high-dimensional multivariate Normals, of which there are, as mentioned, quite a few.
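For instance, one cheap sanity check along these lines is to assert that vectors that are supposed to come from a standard Normal have radii concentrated around √d; a minimal sketch with a hypothetical helper of my own (name and tolerance are illustrative, not from the post):

```python
import numpy as np

def radii_look_gaussian(x: np.ndarray, rtol: float = 0.05) -> bool:
    """Hypothetical sanity check: vectors drawn from N(0, I_d) should have
    radii concentrated around sqrt(d) for moderately large d (for small d,
    the exact mean is the chi-distribution mean, slightly below sqrt(d))."""
    _, d = x.shape
    radii = np.linalg.norm(x, axis=-1)
    return abs(radii.mean() - np.sqrt(d)) / np.sqrt(d) < rtol

# e.g. a latent code that accidentally collapsed towards 0 fails the check
rng = np.random.default_rng(0)
print(radii_look_gaussian(rng.standard_normal((10_000, 64))))        # True
print(radii_look_gaussian(0.5 * rng.standard_normal((10_000, 64))))  # False
```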
Klarner","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/8a288902cdb15c98aa887d33d06a4061fa3ebe87388f89f76734cf2be40ec362?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=8600"}],"version-history":[{"count":5,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8600\/revisions"}],"predecessor-version":[{"id":10394,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/8600\/revisions\/10394"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=8600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=8600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=8600"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=8600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}