datarekha

Estimation & confidence intervals

You never see the true parameter — only an estimate from a finite sample. Estimators, standard error, and confidence intervals are how you say how much to trust that number, and how the most-misunderstood interval in statistics actually works.

8 min read Intermediate Math for ML Lesson 24 of 30

What you'll learn

  • Estimators and their sampling distribution: your estimate is itself random
  • Standard error as the spread of an estimate, and why it shrinks like 1/√n
  • Bias vs variance of an estimator — and what 'consistent' means
  • What a 95% confidence interval really claims (and the interpretation everyone gets wrong)
  • The bootstrap — confidence intervals when you have no formula

You measure the average session time from 200 users and get 4.2 minutes. But the true average — over all users, forever — you’ll never see. Your 4.2 is an estimate, and the honest next question is: how far off could it be?

Estimators are random

An estimator is any rule that turns a sample into a guess — the sample mean estimates the population mean. Run it on a different sample and you’d get a slightly different number. So the estimate has its own distribution, the sampling distribution, and its spread is the standard error:

SE(x̄) = σ / √n

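Here σ is the population standard deviation (in practice you plug in the sample standard deviation s) and n is the sample size. That √n is the central fact of statistics: to halve your uncertainty you need four times the data. The 1/√n scaling comes from averaging independent observations; the CLT adds that the sampling distribution of the mean is approximately normal for large n.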
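To see the 1/√n behaviour without taking the formula on faith, here is a minimal simulation sketch. It assumes a normal population with σ = 1 (an illustrative choice, not something the lesson specifies); the function names are hypothetical. It draws many samples of size n, computes each sample's mean, and compares the spread of those means to σ/√n.

```python
import random
import statistics


def empirical_se(n, sigma=1.0, trials=1000, seed=0):
    """Spread of the sample mean across many simulated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    # The standard deviation of the estimates is the standard error.
    return statistics.stdev(means)


def theoretical_se(n, sigma=1.0):
    """Standard error of the mean from the formula sigma / sqrt(n)."""
    return sigma / n ** 0.5


# Each 4x increase in n should roughly halve both columns.
for n in (25, 100, 400):
    print(n, round(empirical_se(n), 3), round(theoretical_se(n), 3))
```

The simulated and theoretical values should agree closely, and each fourfold jump in n should cut both roughly in half.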

Bias and variance of an estimate

  • Bias: does the estimator systematically miss? The sample mean is unbiased; the variance estimator that divides by n instead of n−1 is biased (it underestimates the true variance on average).
  • Variance: how much does it bounce around between samples? This is the square of the standard error, SE².
  • Consistent: does it converge to the truth as n → ∞? The sample mean does. (Same bias/variance language as models, applied to estimates.)
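The bias of the divide-by-n variance estimator can be checked by simulation. The sketch below assumes a standard normal population (true variance 1), which is an illustrative choice; the function name is hypothetical. It averages both versions of the estimator over many small samples.

```python
import random


def average_variance_estimates(n=5, trials=5000, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variance estimates."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        squared_dev = sum((x - mean) ** 2 for x in sample)
        biased_total += squared_dev / n          # divides by n
        unbiased_total += squared_dev / (n - 1)  # divides by n - 1
    return biased_total / trials, unbiased_total / trials


# True variance is 1. The divide-by-n average should sit near (n - 1) / n.
for n in (5, 50):
    print(n, average_variance_estimates(n=n))
```

With n = 5 the divide-by-n estimator should average near 0.8 rather than 1. At n = 50 the gap mostly closes, which is what "biased yet consistent" looks like in practice.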

Confidence intervals — and the trap

A 95% confidence interval is estimate ± z · SE (with z ≈ 1.96). Here’s the catch almost everyone gets wrong. It does not mean “there’s a 95% probability the true value is in this interval.” The true value is fixed; it’s either in or out. What’s random is the interval. The honest statement:

If you repeated the whole experiment many times, about 95% of the intervals you’d construct would contain the true value.

Picture many repeated experiments, each producing its own interval. About 1 in 20 of those intervals misses the true value, which is exactly the 5% you signed up for. Crank n up and the intervals get tighter, but the coverage stays at 95%.
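The repeated-experiment reading of a confidence interval can be simulated directly. This is a sketch under stated assumptions: the population is normal with true mean 4.2 (echoing the session-time example) and σ = 1.5, a made-up value; the function names are hypothetical. Each experiment builds estimate ± 1.96 · SE from its own sample, and we count how often the interval contains the true mean.

```python
import math
import random

Z_95 = 1.96


def mean_ci(sample, z=Z_95):
    """Normal-approximation interval: sample mean +/- z * estimated SE."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    se = s / math.sqrt(n)
    return mean - z * se, mean + z * se


def coverage(true_mean=4.2, sigma=1.5, n=200, experiments=2000, seed=2):
    """Fraction of simulated experiments whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = mean_ci(sample)
        hits += low <= true_mean <= high
    return hits / experiments


# Intervals get narrower as n grows, but coverage should stay close to 0.95.
print(coverage(n=50), coverage(n=200))
```

Any single interval either contains 4.2 or it does not; the 95% describes the long-run hit rate of the procedure.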

The bootstrap: a CI with no formula

When there’s no neat formula for SE (a median, a weird metric, a model score), resample your data with replacement thousands of times, recompute the statistic each time, and read the interval off the 2.5th and 97.5th percentiles.
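A minimal sketch of that percentile bootstrap, applied to a median. The data here is synthetic (skewed, exponential session times with a mean of about 4.2) and the function name and defaults are hypothetical choices for illustration, not a reference implementation.

```python
import random
import statistics


def bootstrap_ci(data, statistic=statistics.median, resamples=2000,
                 alpha=0.05, seed=3):
    """Percentile bootstrap interval for any statistic of the data."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(resamples))
    # Nearest-rank positions of the alpha/2 and 1 - alpha/2 percentiles.
    lo_idx = round((alpha / 2) * (resamples - 1))
    hi_idx = round((1 - alpha / 2) * (resamples - 1))
    return stats[lo_idx], stats[hi_idx]


# Synthetic, skewed "session times" standing in for real data.
data_rng = random.Random(42)
data = [data_rng.expovariate(1 / 4.2) for _ in range(200)]

low, high = bootstrap_ci(data)
print(statistics.median(data), (low, high))
```

Passing a different `statistic` (a trimmed mean, a model metric computed on the resample) reuses the same machinery, which is the appeal of the method.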

Where this lives in ML

  • Reporting metrics with error bars instead of a single accuracy number.
  • A/B testing: the difference in conversion comes with a CI; if it straddles zero, you can’t claim a winner.
  • Model comparison: is model A really better, or is the gap within noise?
  • “We need more data”: the √n law tells you how much uncertainty extra data removes. Quadrupling n only halves the standard error, so returns diminish sharply.
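For the A/B testing case, one common way to get the interval is the normal-approximation (Wald) interval for a difference of two proportions. The sketch below uses that technique with made-up conversion counts; the function name and the numbers are hypothetical.

```python
import math


def diff_in_rates_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Wald interval for (rate_b - rate_a) from conversion counts."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Independent groups: the variances of the two rates add.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical test: 10% vs 12% conversion with 1,000 users per arm.
small = diff_in_rates_ci(conv_a=100, n_a=1000, conv_b=120, n_b=1000)

# Same rates with 10x the users per arm.
large = diff_in_rates_ci(conv_a=1000, n_a=10000, conv_b=1200, n_b=10000)

print(small, large)
```

With 1,000 users per arm the interval straddles zero, so the 2-point lift is within noise. With ten times the data the same observed rates give an interval that excludes zero, at a width only about √10 times narrower.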

Quick check

Q1. You compute a 95% CI of [4.0, 4.4] for the mean. Which statement is correct?
Q2. To cut your standard error in half, you need…
Q3. When would you reach for a bootstrap confidence interval?

Practice this in an interview

What is the correct interpretation of a 95% confidence interval?

A 95% confidence interval means that if you repeated the sampling procedure many times and built an interval each time, 95% of those intervals would contain the true parameter. It does not mean there is a 95% probability that this specific interval contains the parameter.

What is the difference between a biased estimator and an inconsistent estimator?

Bias measures the systematic error of an estimator at a fixed sample size — whether its expected value equals the true parameter. Consistency is an asymptotic property — whether the estimator converges in probability to the true parameter as sample size grows to infinity. An estimator can be biased yet consistent, or unbiased yet inconsistent.

What does the Central Limit Theorem actually say, and why does it matter?

The CLT states that the sampling distribution of the (standardized) sample mean converges to a normal distribution as sample size grows, whatever the shape of the underlying population distribution, provided the observations are independent and identically distributed with finite variance. It is the theoretical foundation for confidence intervals, hypothesis tests, and many machine-learning approximations. Note that it applies to the distribution of the mean, not to the raw data.

What is maximum likelihood estimation, and what is the intuition behind it?

Maximum likelihood estimation finds the parameter values that make the observed data most probable under the assumed model. Intuitively, you ask: given this data, which world would have been most likely to generate it?
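As a small illustration, assume a Bernoulli model (each trial succeeds with unknown probability p) and made-up data of 30 successes in 100 trials. The sketch below scans candidate values of p and keeps the one with the highest log-likelihood; the function names are hypothetical, and a grid search is used only because it makes the idea visible.

```python
import math


def bernoulli_log_likelihood(p, successes, trials):
    """Log-probability of the observed counts if the success rate were p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


def mle_by_grid(successes, trials, steps=999):
    """Pick the candidate p on a grid that maximizes the log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.001 ... 0.999
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, successes, trials))


p_hat = mle_by_grid(successes=30, trials=100)
print(p_hat, 30 / 100)
```

For this model the maximizer coincides with the sample proportion, so the "most likely world" is the one whose success rate matches what you observed.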
