Statistics & Probability · Medium · Asked at Google, Meta, Two Sigma, Citadel

What makes the Normal distribution so central in statistics, and when does it fail?

For: Data Scientist, Data Analyst, ML Engineer, AI / LLM Engineer

The short answer

The Normal distribution is justified by the Central Limit Theorem: suitably standardized averages of many i.i.d. observations with finite variance are approximately Normal, whatever the shape of the underlying distribution. It is fully characterized by its mean and variance, which makes closed-form inference possible. It fails for heavy-tailed data, skewed outcomes, bounded quantities, and rare extreme events.

How to think about it

The Normal distribution is the most-used distribution in statistics, but its ubiquity can breed overconfidence. Know why it works and where it breaks.

Why Normal is special

The CLT states: if X₁, X₂, …, Xₙ are i.i.d. with mean μ and finite variance σ², then:

√n · (X̄ₙ - μ) / σ  →  N(0, 1)  as n → ∞

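In words: for large n the sample mean is approximately Normal whatever the underlying distribution, provided its variance is finite. How large n must be depends on how skewed or heavy-tailed the data are. This is what justifies large-sample z-tests, t-tests, confidence intervals for means, and the usual approximate inference on ordinary least squares coefficients.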
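A quick simulation makes the statement concrete. The sketch below uses only Python's standard library. The Exponential(1) source, n = 50, and 20,000 replications are illustrative assumptions; Exponential(1) is strongly right-skewed, with μ = σ = 1. If the CLT approximation is good, the standardized means should have a mean near 0, a standard deviation near 1, and about 95 % of their values inside ±1.96.

```python
import math
import random
import statistics


def standardized_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Return `reps` values of sqrt(n) * (xbar_n - mu) / sigma, where each xbar_n
    is the mean of n i.i.d. draws from Exponential(rate=1), so mu = sigma = 1."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu, sigma = 1.0, 1.0
    values = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        values.append(math.sqrt(n) * (xbar - mu) / sigma)
    return values


z = standardized_means(n=50, reps=20_000)
z_mean = statistics.fmean(z)
z_sd = statistics.stdev(z)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)  # share inside the N(0,1) 95 % band
print(f"mean = {z_mean:.3f}, sd = {z_sd:.3f}, share within ±1.96 = {coverage:.3f}")
```

You can also count the values above +1.96 and below -1.96 separately. For skewed source data, the two tails approach 2.5 % each only gradually as n grows.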

Properties of N(μ, σ²)

Symmetric about μ; skewness = 0, kurtosis = 3 (excess kurtosis = 0).
68-95-99.7 rule: [μ ± σ] contains ~68 %, [μ ± 2σ] ~95 %, [μ ± 3σ] ~99.7 % of probability mass.
Sum of independent Normals is Normal: if X ~ N(μ₁,σ₁²) and Y ~ N(μ₂,σ₂²) independently, then X+Y ~ N(μ₁+μ₂, σ₁²+σ₂²).
Among all continuous distributions on the real line with a given mean and variance, it has the maximum (differential) entropy.
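The 68-95-99.7 rule, the sum rule, and the maximum-entropy property can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`, which treats `X + Y` for two `NormalDist` objects as the sum of independent variables. The component parameters N(1, 3²) and N(2, 4²) are arbitrary illustrative values. The entropy comparison uses the textbook closed forms for a Laplace and a uniform distribution, each scaled to the same variance as the Normal.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# 68-95-99.7 rule: probability mass within k standard deviations of the mean
masses = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, mass in masses.items():
    print(f"P(|Z| <= {k}) = {mass:.4f}")

# Sum of independent Normals: means add and variances add
x = NormalDist(mu=1.0, sigma=3.0)
y = NormalDist(mu=2.0, sigma=4.0)
total = x + y
print(f"X + Y ~ N({total.mean}, {total.variance})")

# Maximum entropy: differential entropies (in nats) of variance-matched distributions
sigma = 1.0
entropy = {
    "normal": 0.5 * math.log(2 * math.pi * math.e * sigma**2),
    "laplace": 1 + math.log(2 * sigma / math.sqrt(2)),  # scale b = sigma / sqrt(2)
    "uniform": math.log(sigma * math.sqrt(12)),         # width = sigma * sqrt(12)
}
for name, h in entropy.items():
    print(f"{name:>8}: {h:.4f}")
```

No variance-matched alternative should exceed the Normal's ½ ln(2πeσ²). That bound is exactly what the maximum-entropy property guarantees.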

Worked numeric example

Heights of adult men are approximately N(178 cm, 7²). What fraction are taller than 192 cm?

z = (192 - 178) / 7 = 2.0
P(Z > 2) ≈ 1 - 0.9772 = 0.0228, i.e. 2.28 %

About 2.3 % of men exceed 192 cm. This matches the 2σ rule: roughly 5 % of the mass lies outside μ ± 2σ, split evenly between the two tails.
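The same calculation can be done with `statistics.NormalDist`. This is a minimal sketch that reuses the N(178, 7²) assumption from the example rather than any real height data. It also runs the inverse question, finding the height that marks the top 2.28 %.

```python
from statistics import NormalDist

# Model from the worked example: adult male height ~ N(178 cm, 7^2)
heights = NormalDist(mu=178.0, sigma=7.0)

threshold_cm = 192.0
z = (threshold_cm - heights.mean) / heights.stdev  # standardize
p_taller = 1.0 - heights.cdf(threshold_cm)         # upper-tail probability

print(f"z = {z:.1f}")
print(f"P(height > {threshold_cm:.0f} cm) = {p_taller:.2%}")

# Inverse question: which height marks the top 2.28 %?
cutoff = heights.inv_cdf(1.0 - 0.0228)
print(f"97.72nd percentile ≈ {cutoff:.1f} cm")
```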

When Normal fails

Heavy tails: stock returns, insurance claims, internet traffic have fat tails — use Student-t, Pareto, or stable distributions.
Skewness: income, latency, time-to-event are right-skewed — use log-Normal or Gamma.
Bounded support: probabilities and proportions live in [0,1], so use Beta; counts are non-negative integers, so use Poisson or negative binomial.
Rare extremes: for tail risk (VaR, stress testing), Normal systematically underestimates extreme probabilities.
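A Normal model can badly understate tail risk. The sketch below compares two-sided tail probabilities P(|X| > kσ) for a standard Normal and for a Student-t with 3 degrees of freedom, rescaled to unit variance. A t₃ variable has variance 3, so T/√3 has variance 1. The choice of ν = 3 is an illustrative assumption. It is used here because its CDF has a simple closed form, F(t) = 1/2 + (1/π)[(t/√3)/(1 + t²/3) + arctan(t/√3)], so no SciPy dependency is needed.

```python
import math
from statistics import NormalDist


def t3_cdf(t: float) -> float:
    """CDF of Student's t with 3 degrees of freedom (closed form valid only for nu = 3)."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def two_sided_tail_normal(k: float) -> float:
    """P(|Z| > k) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(k))


def two_sided_tail_t3_unit_var(k: float) -> float:
    """P(|T / sqrt(3)| > k) for T ~ t_3, i.e. a t_3 rescaled to variance 1."""
    return 2 * (1 - t3_cdf(k * math.sqrt(3)))


rows = {}
print(" k   Normal      t3 (unit var)   ratio")
for k in (2, 3, 4, 5):
    p_norm = two_sided_tail_normal(k)
    p_t = two_sided_tail_t3_unit_var(k)
    rows[k] = (p_norm, p_t)
    print(f"{k:>2}   {p_norm:.2e}    {p_t:.2e}        {p_t / p_norm:,.1f}")
```

At 2σ the rescaled t₃ puts slightly less mass in the tails than the Normal. The gap opens up only further out, which is exactly the region that VaR and stress testing care about.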
