What makes the Normal distribution so central in statistics, and when does it fail?
The Normal distribution owes its central role to the Central Limit Theorem: the standardized average of a large i.i.d. sample with finite variance is approximately Normal, whatever the underlying distribution. It is fully characterized by its mean and variance, which enables closed-form inference. It fails for heavy-tailed data, skewed outcomes, bounded quantities, and rare extreme events.
How to think about it
The Normal distribution is the most-used distribution in statistics, but its ubiquity can breed overconfidence. Know why it works and where it breaks.
Why Normal is special
The CLT states: if X₁, X₂, …, Xₙ are i.i.d. with mean μ and finite variance σ², then:
√n · (X̄ₙ - μ) / σ → N(0, 1) in distribution as n → ∞
This means sample means are approximately Normal for large n, whatever the underlying distribution, provided its variance is finite. It justifies the large-sample use of t-tests, z-tests, confidence intervals, and inference in ordinary least squares.
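The convergence can be checked by simulation. The sketch below is illustrative only: the Exponential source distribution, the sample size n = 50, the 5000 repetitions, and the helper names `standardized_means` and `fraction_within` are assumptions chosen for the example, not part of the theorem.

```python
import random
import statistics

def standardized_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` sample means of n Exponential(rate) values and standardize each one."""
    rng = random.Random(seed)
    mu = 1.0 / rate      # mean of Exponential(rate)
    sigma = 1.0 / rate   # standard deviation of Exponential(rate)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(rate) for _ in range(n)) / n
        out.append((n ** 0.5) * (xbar - mu) / sigma)
    return out

def fraction_within(zs, k):
    """Share of values whose absolute value is at most k."""
    return sum(abs(z) <= k for z in zs) / len(zs)

zs = standardized_means(n=50, reps=5000)
print("mean:", statistics.fmean(zs))               # should be close to 0
print("std dev:", statistics.pstdev(zs))           # should be close to 1
print("within 1.96:", fraction_within(zs, 1.96))   # should be close to 0.95
```

The Exponential distribution is strongly right-skewed, yet the standardized means already behave much like N(0, 1) at n = 50. Rerunning with a smaller n shows the approximation degrading.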
Properties of N(μ, σ²)
- Symmetric about μ; skewness = 0, kurtosis = 3.
- 68-95-99.7 rule: [μ ± σ] contains ~68 %, [μ ± 2σ] ~95 %, [μ ± 3σ] ~99.7 % of probability mass.
- Sum of independent Normals is Normal: if X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) independently, then X + Y ~ N(μ₁+μ₂, σ₁²+σ₂²).
- It is the maximum-entropy distribution given a fixed mean and variance.
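Two of these properties can be verified numerically with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: the helper `mass_within` and the parameters of `x` and `y` are made up for illustration.

```python
from statistics import NormalDist

std = NormalDist(0.0, 1.0)

def mass_within(k):
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {mass_within(k):.4f}")

# Sum of independent Normals: means add and variances add.
x = NormalDist(mu=1.0, sigma=3.0)   # variance 9
y = NormalDist(mu=2.0, sigma=4.0)   # variance 16
s = x + y                           # NormalDist addition assumes independence
print("mean:", s.mean, "variance:", s.variance)
```

Note that standard deviations do not add: the sum has variance 9 + 16 = 25, so its standard deviation is 5 rather than 7.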
Worked numeric example
Heights of adult men are approximately N(178 cm, 7²). What fraction are taller than 192 cm?
z = (192 - 178) / 7 = 2.0
P(Z > 2) ≈ 1 - 0.9772 = 2.28 %
About 2.3 % of men exceed 192 cm. This is consistent with the 68-95-99.7 rule: roughly 95 % of the mass lies within 2σ, leaving about 2.5 % in each tail.
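The same calculation can be reproduced without a z-table. This sketch uses the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=178.0, sigma=7.0)   # model from the example, in cm

z = (192.0 - heights.mean) / heights.stdev  # standardized threshold
p_taller = 1.0 - heights.cdf(192.0)         # upper-tail probability P(X > 192)

print(f"z = {z:.2f}")
print(f"P(height > 192 cm) = {p_taller:.4f}")
```

Working with the fitted distribution directly avoids rounding z before the lookup, which matters when the threshold does not land on a whole number of standard deviations.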
When Normal fails
- Heavy tails: stock returns, insurance claims, internet traffic have fat tails — use Student-t, Pareto, or stable distributions.
- Skewness: income, latency, time-to-event are right-skewed — use log-Normal or Gamma.
- Bounded support: probabilities live in [0,1]; use Beta. Counts are non-negative integers; use Poisson or negative binomial.
- Rare extremes: for tail risk (VaR, stress testing), a Normal model systematically underestimates extreme probabilities when the true tails are heavier.
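To make the tail-risk point concrete, the sketch below compares exact tail probabilities of a heavy-tailed distribution with those of a Normal that has the same mean and variance. The Pareto parameters (α = 3, minimum 1) and the helper names `pareto_tail` and `tail_ratio` are assumptions chosen for illustration. The mean and variance formulas are the standard ones for a Pareto distribution with α > 2.

```python
from statistics import NormalDist

def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto(alpha, x_min) variable."""
    return 1.0 if x <= x_min else (x_min / x) ** alpha

alpha, x_min = 3.0, 1.0   # illustrative parameters; alpha > 2 keeps the variance finite
mean = alpha * x_min / (alpha - 1.0)
var = alpha * x_min ** 2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))

# Normal distribution with the same first two moments as the Pareto.
matched = NormalDist(mu=mean, sigma=var ** 0.5)

def tail_ratio(k):
    """How many times larger the Pareto tail is than the Normal tail at mean + k*sd."""
    x = mean + k * matched.stdev
    return pareto_tail(x, alpha, x_min) / (1.0 - matched.cdf(x))

for k in (2, 3, 5):
    print(f"{k} sd above the mean: Pareto tail is {tail_ratio(k):.1f}x the Normal tail")
```

The two tails are of similar size near 2 standard deviations, but the gap widens rapidly further out. This is the pattern that makes Normal-based risk estimates too optimistic for heavy-tailed data.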