Central Limit Theorem & Confidence Intervals
Average enough independent samples and the result is approximately Normal, whatever the original shape (provided the variance is finite). That single fact powers GATE's NAT questions on sums, proportions, and confidence intervals.
What you'll learn
- The sample mean of n iid variables has mean μ and variance σ²/n, so its SD is σ/√n
- The Central Limit Theorem: for large n the sample mean (or sum) is approximately Normal, whatever the population shape
- Standardising a sample mean or sum and reading the answer off the Φ table
- A confidence interval for a known-σ mean: x̄ ± z·(σ/√n), with z = 1.96 for 95%
Before you start
Here is one of the strangest facts in all of probability. Take almost any population (skewed, discrete, lopsided; all that matters is that its variance is finite), draw a sample, and compute the average. Repeat that a few thousand times. Once the sample size is reasonably large, the histogram of those averages pulls itself into a clean bell shape. The underlying data can look like anything; the act of averaging is what manufactures the normal curve. That fact is the Central Limit Theorem, and it is the engine behind every GATE question that hands you a sum or a sample mean and expects you to standardise and read Φ, just as in the previous lesson.
The sample mean: mean μ, variance σ²/n
Take n independent draws X₁, …, Xₙ from a population with mean μ and variance
σ². Their average is the sample mean x̄ = (X₁ + … + Xₙ) / n. Two facts hold
for any population, even before the CLT:
- Mean of x̄ is μ: averaging does not shift the centre.
- Variance of x̄ is σ²/n: averaging shrinks the spread. So the standard deviation of the sample mean is σ/√n (the “standard error”).
More data gives a tighter estimate, and it tightens like √n, not like n. To halve
the standard error you need four times the data.
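The √n scaling is easy to check numerically. The sketch below is only an illustration: the helper name `standard_error` and the population SD of 12 are assumptions chosen to make the arithmetic clean, not values from this lesson.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n iid draws with population SD sigma."""
    return sigma / math.sqrt(n)


sigma = 12.0                          # illustrative population SD (assumed)
se_25 = standard_error(sigma, 25)     # 12 / 5
se_100 = standard_error(sigma, 100)   # 12 / 10

# Four times the data (25 -> 100) halves the standard error.
print(se_25, se_100, se_25 / se_100)
```

Quadrupling n from 25 to 100 cuts the standard error in half, which is the "four times the data" rule stated above.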
The Central Limit Theorem
Central Limit Theorem. For independent, identically distributed X₁, …, Xₙ with
mean μ and finite variance σ², as n grows the sample mean is approximately
x̄ ≈ Normal(μ, σ²/n), or equivalently sum ≈ Normal(nμ, nσ²),
regardless of the population’s original distribution. Once you accept that, every question becomes the same move you learned for the Normal: standardise, then read Φ.
z = (value − mean) / (std. dev) = (x̄ − μ) / (σ/√n) = (sum − nμ) / √(nσ²)
where Φ(z) = P(Z ≤ z) is the standard-normal CDF, the value you read off the table. A rule of thumb: n ≥ 30 is
“large enough” for the approximation in most GATE problems.
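As a rough sketch of the "standardise, then read Φ" move, the snippet below uses `statistics.NormalDist` from the Python standard library in place of a printed table. The function names and the numbers (μ = 50, σ = 10, n = 100) are illustrative assumptions, not part of any GATE question.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard-normal CDF, PHI(z) = P(Z <= z)


def prob_mean_at_most(value: float, mu: float, sigma: float, n: int) -> float:
    """CLT approximation of P(sample mean <= value)."""
    z = (value - mu) / (sigma / sqrt(n))
    return PHI(z)


def prob_sum_at_most(value: float, mu: float, sigma: float, n: int) -> float:
    """CLT approximation of P(sum of n draws <= value)."""
    z = (value - n * mu) / sqrt(n * sigma ** 2)
    return PHI(z)


# Illustrative numbers (assumed): mu = 50, sigma = 10, n = 100.
p_mean = prob_mean_at_most(52, 50, 10, 100)    # z = 2 / (10/10) = 2
p_sum = prob_sum_at_most(5200, 50, 10, 100)    # z = 200 / 100 = 2
print(round(p_mean, 4), round(p_sum, 4))
```

Both calls standardise to z = 2, so they return the same probability: asking about the mean or about the sum is the same question in different units.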
Try it. Pick the most lopsided base population you can find (the exponential,
the bimodal) and start drawing samples. At n = 1 the histogram of “sample
means” still looks like the base population. Bump n up and watch the shape
straighten into a bell, hugging Normal(μ, σ²/n) whatever the population was. That
shrinking standard error σ/√n is exactly why bigger samples give tighter
estimates, and why a 95% confidence interval gets narrower as n grows.
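If no interactive plot is at hand, the same experiment can be approximated in a few lines. This is a minimal sketch under stated assumptions: the population is an exponential with rate 1 (so μ = 1 and σ = 1), and the seed, trial count, and helper name `sample_means` are arbitrary choices for illustration.

```python
import random
import statistics


def sample_means(n: int, trials: int, rng: random.Random) -> list[float]:
    """Draw `trials` samples of size n from Exponential(rate=1); return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so reruns give the same numbers
results = {}
for n in (1, 5, 30):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # Compare the observed spread of the sample means with sigma / sqrt(n).
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(1 / n ** 0.5, 3))
```

The centre of the sample means should stay near μ = 1 while their spread tracks σ/√n; plotting a histogram of `means` for each n would show the shape moving toward a bell.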
Confidence intervals (known σ)
The CLT also tells you how trustworthy an estimate is. If x̄ is approximately
Normal(μ, σ²/n), then μ lies within a predictable band around x̄. A confidence
interval for the mean, when the population σ is known, is
x̄ ± z · (σ/√n)
The multiplier z comes from the standard normal and sets the confidence level:
- 90% confidence: z = 1.645
- 95% confidence: z = 1.96
- 99% confidence: z = 2.576
A higher confidence level needs a bigger z, which makes the interval wider:
being more sure costs precision. The interval also narrows as n grows, with its
width shrinking like 1/√n.
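The formula x̄ ± z·(σ/√n) can be sketched as a small function. The names `z_for_confidence` and `known_sigma_ci` and the inputs (x̄ = 50, σ = 10, n = 100) are illustrative assumptions; the multiplier is taken from `statistics.NormalDist().inv_cdf` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided multiplier: the z with P(-z <= Z <= z) = level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def known_sigma_ci(xbar: float, sigma: float, n: int, level: float = 0.95):
    """Confidence interval for the mean when the population sigma is known."""
    margin = z_for_confidence(level) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Illustrative inputs (assumed): sample mean 50, sigma 10, n = 100.
ci_low, ci_high = known_sigma_ci(xbar=50.0, sigma=10.0, n=100)
print(round(ci_low, 2), round(ci_high, 2))

# The multipliers listed above, recovered from the inverse CDF.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

Here σ/√n = 1, so the 95% interval is roughly 50 ± 1.96. In an exam you would plug in the tabulated z directly; the inverse CDF is only a convenience for checking.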
How GATE asks this
A NAT or MCQ. The classic pattern hands you a sum or proportion built from many
iid pieces, tells you to treat it as Normal, and either gives you a Φ value (often
Φ(2) ≈ 0.9772 or Φ(1) ≈ 0.8413) to reach a decimal, or — as in GATE DA 2025
— asks you to pick the right Φ expression from four options. Either way you compute
the mean and variance of the sum, standardise the endpoints, and combine the Φ
readings; the 2025 sum-of-300-Bernoulli question is worked below. A close cousin asks
for a 95% confidence interval given x̄, σ, and n, expecting x̄ ± 1.96·(σ/√n).
Worked example — a real GATE DA 2025 question
Let Y be the sum of 300 independent Bernoulli(0.25) random variables (each is 1 with probability 0.25, else 0). Using the normal approximation, P(60 ≤ Y ≤ 90) equals which of Φ(2) − Φ(−2), Φ(1) − Φ(−1), Φ(3) − Φ(−3), or Φ(90) − Φ(60)? (We also evaluate it numerically with Φ(2) ≈ 0.9772.)
Step 1 — mean and variance of the sum. A single Bernoulli(p) has mean p and
variance p(1−p). For the sum of 300 of them:
mean = 300 · 0.25 = 75
variance = 300 · 0.25 · 0.75 = 56.25
std. dev = √56.25 = 7.5
Step 2 — standardise both endpoints. Subtract the mean and divide by 7.5:
lower: z = (60 − 75) / 7.5 = −15 / 7.5 = −2
upper: z = (90 − 75) / 7.5 = 15 / 7.5 = +2
Step 3 — read the Φ table. Using the symmetry Φ(−2) = 1 − Φ(2):
P(60 ≤ Y ≤ 90) = Φ(2) − Φ(−2)
= 0.9772 − (1 − 0.9772)
= 0.9772 − 0.0228
= 0.9544
So the answer is the expression Φ(2) − Φ(−2), which evaluates to ≈ 0.9544. This is
the real GATE DA 2025 question (an MCQ over the four Φ-expressions) — the whole
solution is “mean, variance, standardise, subtract two Φ values.”
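The three steps can be replayed in code as a sanity check. The sketch below also sums the exact binomial probabilities for comparison; that comparison is an addition for illustration and is not part of the GATE question. Variable names are arbitrary.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.25
mean = n * p                      # Step 1: 75
sd = sqrt(n * p * (1 - p))        # Step 1: sqrt(56.25) = 7.5

z_low = (60 - mean) / sd          # Step 2: -2
z_high = (90 - mean) / sd         # Step 2: +2

phi = NormalDist().cdf            # standard-normal CDF
approx = phi(z_high) - phi(z_low)  # Step 3: PHI(2) - PHI(-2)

# Exact binomial probability of 60 <= Y <= 90, for comparison only.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(60, 91))

print(z_low, z_high, round(approx, 4), round(exact, 4))
```

The normal approximation uses the unrounded Φ(2), so its last digit can differ slightly from the table-based 0.9544. The exact sum shows how close the approximation is at n = 300.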
Practice this in an interview
The CLT states that the sampling distribution of the sample mean converges to a normal distribution as sample size grows, regardless of the shape of the underlying population distribution (as long as its variance is finite). It is the theoretical foundation for confidence intervals, hypothesis tests, and many machine-learning approximations, but it applies to the distribution of the mean, not to the raw data.
The Normal distribution is justified by the Central Limit Theorem — averages of large i.i.d. samples converge to Normal regardless of the underlying distribution. It is fully characterized by mean and variance, enabling closed-form inference. It fails for heavy-tailed data, skewed outcomes, bounded quantities, and rare extreme events.
The Law of Large Numbers (LLN) says the sample mean converges to the true mean as sample size grows — it is a statement about where the mean lands. The Central Limit Theorem says the sampling distribution of the mean is approximately normal — it is a statement about the shape of that distribution. LLN guarantees convergence; CLT characterises the rate and shape of that convergence.
A 95% confidence interval means that if you repeated the sampling procedure many times and built an interval each time, 95% of those intervals would contain the true parameter. It does not mean there is a 95% probability that this specific interval contains the parameter.
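The repeated-sampling reading of a confidence interval can be checked by simulation. This is a minimal sketch under assumptions chosen for illustration: a Normal population with μ = 10 and σ = 2, samples of size 25, 4000 repetitions, and a fixed seed. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt


def coverage(mu: float, sigma: float, n: int, trials: int,
             rng: random.Random, z: float = 1.96) -> float:
    """Fraction of known-sigma intervals xbar ± z*sigma/sqrt(n) that contain mu."""
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Illustrative settings (assumed): mu = 10, sigma = 2, n = 25, 4000 repetitions.
rate = coverage(10.0, 2.0, 25, 4000, random.Random(1))
print(rate)
```

The fraction of intervals that capture μ should land close to 0.95. Each individual interval either contains μ or does not; the 95% describes the procedure across repetitions.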