The Surprising Shape of Normal Distributions in High Dimensions

Multivariate Normal distributions are an essential component of virtually any modern deep learning method—be it to initialise the weights and biases of a neural network, perform variational inference in a probabilistic model, or provide a tractable noise distribution for generative modelling.

What most of us (including—until very recently—me) aren’t aware of, however, is that these Normal distributions begin to look less and less like the characteristic bell curve that we associate them with as their dimensionality increases.

I stumbled across this interesting and counterintuitive fact in Roman Vershynin’s excellent “High-Dimensional Probability: An Introduction with Applications in Data Science” and will re-use some of its expository figures below.


As with many surprising properties of high-dimensional spaces, this behaviour has its roots in the curse of dimensionality—i.e. the fact that higher-dimensional spaces have exponentially more volume than lower-dimensional ones. For instance, a cube of width 2 will be 2³ = 8 times as large as a unit cube in three dimensions, but 2¹⁰ = 1024 times as large in ten dimensions.
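
As a throwaway illustration (the dimensions below are arbitrary choices of mine), the ratio is simply 2 raised to the number of dimensions:

# a cube of width w has volume w**d in d dimensions,
# so a width-2 cube is 2**d times as large as the unit cube
for d in (1, 3, 10, 100):
    print(f"d = {d}: {2 ** d:,}× as large")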

This exponential increase in volume means that, while the density of a standard Normal distribution is still maximal at the origin, most of its probability mass is not actually concentrated around the mean, as our low-dimensional bell-curve intuition would suggest. Instead, as the dimensionality N increases, the distribution becomes increasingly indistinguishable from a uniform distribution on a spherical shell of radius √N and constant width: the squared radius is a sum of N independent terms, each with mean 1 and finite variance, so it concentrates around N, and the radius itself concentrates around √N with fluctuations of order one.


We can easily verify this empirically by drawing i.i.d. Normal samples of increasing dimensionality, computing their radii (Euclidean norms), and comparing the resulting histograms.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

num_samples = 100_000
dims = [1, 2, 10, 20]

# draw i.i.d. standard Normal samples for each dimensionality
rng = np.random.default_rng()
samples = {d: rng.standard_normal((num_samples, d)) for d in dims}

# compute the radius (Euclidean norm) of every sample
radii = {d: np.linalg.norm(samples[d], axis=-1) for d in dims}

# reshape into long format and plot one radius histogram per dimensionality
radii_df = pd.DataFrame(radii).melt(var_name="dim", value_name="r")
g = sns.FacetGrid(data=radii_df, col="dim")
g.map_dataframe(sns.histplot, x="r", element="step")

# mark the radius √d that the samples concentrate around
for i, d in enumerate(dims):
    ax = g.axes[0][i]
    ax.axvline(np.sqrt(d), c="r")

plt.show()
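
To put a number on the "constant width" part of this claim, the following quick check (separate from the plotting code above; the sample count and dimensions are arbitrary choices of mine) estimates the mean and spread of the radius directly: the mean tracks √d, while the standard deviation quickly settles near 1/√2 ≈ 0.71 as d grows.

import numpy as np

rng = np.random.default_rng(0)

for d in [1, 10, 100, 1_000]:
    # radii of 10,000 standard Normal samples in d dimensions
    r = np.linalg.norm(rng.standard_normal((10_000, d)), axis=-1)
    print(f"d={d}:\tmean radius = {r.mean():.2f} (√d = {np.sqrt(d):.2f}), std = {r.std():.2f}")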

Conversely, this means that the probability mass contained in the hypercube μ±2σ, which in one dimension amounts to the familiar rule-of-thumb value of roughly 95%, decays rapidly as well: the coordinates are independent, so this mass is simply the per-coordinate mass (≈0.954) raised to the power N, and therefore shrinks exponentially.

from scipy.stats import norm

dims = [1, 2, 10, 20]

# probability of a single coordinate falling within μ±2σ
p_1d = norm.cdf(2) - norm.cdf(-2)

for d in dims:
    # the coordinates are independent, so the mass inside the
    # hypercube μ±2σ is the product of the per-coordinate masses
    p = p_1d ** d
    print(f"Dimension: {d}\tmass in μ±2σ: {p:.3f}")

Output:
Dimension: 1    mass in μ±2σ: 0.954
Dimension: 2    mass in μ±2σ: 0.911
Dimension: 10   mass in μ±2σ: 0.628
Dimension: 20   mass in μ±2σ: 0.394

While such surprising and counterintuitive properties can be a lot of fun to think about from a theoretical standpoint, I also found them extremely helpful when implementing, debugging and testing models that make heavy use of high-dimensional multivariate Normals—of which there are, as mentioned, quite a few.
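
In that spirit, the radius concentration above makes for a cheap sanity check on sampling code. Below is a minimal sketch of such a test (the function name, dimensionality and tolerances are my own choices): it asserts that the radii of standard Normal samples concentrate around √d with a spread of order one.

import numpy as np

def test_standard_normal_radii_concentrate():
    rng = np.random.default_rng(0)
    d, n = 512, 10_000
    radii = np.linalg.norm(rng.standard_normal((n, d)), axis=-1)
    # the mean radius should sit close to √d ...
    assert abs(radii.mean() - np.sqrt(d)) < 0.5
    # ... while its spread stays of order one, independent of d
    assert radii.std() < 1.0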
