What is the difference between standard error and standard deviation?
Standard deviation measures the spread of individual observations around their mean. Standard error (of the mean) measures the spread of sample means around the true population mean: it equals the standard deviation divided by the square root of the sample size, so it shrinks as the sample grows while the standard deviation does not.
How to think about it
The confusion between these two is one of the most common statistical errors in industry. Nail the conceptual distinction, the formula, and where each belongs in a report.
Standard deviation (SD)
SD = sigma = sqrt( Var(X) )
SD describes variability in the data itself. It answers: “How spread out are individual observations?” If you measure the heights of 1,000 people, the SD of that sample tells you how far a typical person’s height deviates from the average. Adding more people to the sample does not materially change the SD: the sample SD converges to the population SD rather than shrinking toward zero.
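To see that the sample SD stabilizes instead of shrinking, here is a minimal simulation sketch. It assumes a hypothetical height population distributed Normal(170 cm, 10 cm); those parameters are illustrative, not from real data. `statistics.stdev` uses the n − 1 denominator and equals sqrt(Var(X)) estimated from the sample.

```python
import random
import statistics

# Hypothetical population for illustration only: heights ~ Normal(170 cm, 10 cm).
POP_MEAN, POP_SD = 170.0, 10.0

def sample_sd(n: int, seed: int = 0) -> float:
    """Draw n simulated heights and return their sample SD (n - 1 denominator)."""
    rng = random.Random(seed)
    heights = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.stdev(heights)

for n in (10, 100, 1_000, 100_000):
    # The sample SD settles near POP_SD as n grows; it does not head toward zero.
    print(f"n={n:>7}: sample SD = {sample_sd(n):.2f}")
```

Small samples give noisier SD estimates, but larger samples pull the estimate toward 10, not toward 0.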
Standard error (SE)
SE = sigma / sqrt(n) ≈ s / sqrt(n)
where s is the sample SD, used in place of sigma when sigma is unknown.
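A minimal sketch of the estimated standard error, s / sqrt(n), assuming the population sigma is unknown and s is the usual n − 1 sample SD. The function name `standard_error` and the data values are hypothetical.

```python
import math
import statistics

def standard_error(values: list[float]) -> float:
    """Estimated standard error of the mean: s / sqrt(n).

    s is the sample SD with the n - 1 denominator (statistics.stdev),
    used because the population sigma is assumed unknown.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    return statistics.stdev(values) / math.sqrt(n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print("SD:", statistics.stdev(data))
print("SE:", standard_error(data))
```

Here s = sqrt(32/7) and n = 8, so the SE is s divided by sqrt(8), smaller than the SD by that factor.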
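SE describes variability of the sample mean as an estimator. It answers: “If we repeated this study many times, how much would the computed mean bounce around?” For n independent observations with variance sigma^2, Var(X̄) = sigma^2 / n exactly, so the standard deviation of X̄ is sigma / sqrt(n); this part needs no normality assumption. The Central Limit Theorem adds the shape: for large n, X̄ is approximately N(mu, sigma^2 / n), which is what justifies normal-based confidence intervals.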
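The “repeat the study many times” idea can be checked directly by simulation. This sketch assumes a deliberately skewed exponential population with SD sigma = 1 (rate 1 / sigma), a choice made only to show that sigma / sqrt(n) does not depend on normality. The function name and the repetition count are illustrative.

```python
import math
import random
import statistics

def sd_of_sample_means(n: int, reps: int = 10_000, sigma: float = 1.0, seed: int = 1) -> float:
    """Run a hypothetical 'study' of n draws reps times; return the SD of the study means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Exponential population with rate 1 / sigma has SD equal to sigma.
        draws = [rng.expovariate(1 / sigma) for _ in range(n)]
        means.append(statistics.fmean(draws))
    return statistics.stdev(means)

for n in (4, 16, 64):
    # Empirical spread of the means next to the theoretical SE, sigma / sqrt(n).
    print(n, round(sd_of_sample_means(n), 3), round(1 / math.sqrt(n), 3))
```

The two columns should agree to within simulation noise, and each fourfold increase in n should roughly halve both.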
The key relationship
SE = SD / sqrt(n)
Double the sample size: SE drops by a factor of sqrt(2) ≈ 1.41. To halve the SE, you need four times as many observations. This is the diminishing-returns cost of precision in statistics.
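The arithmetic of diminishing returns is easy to tabulate. A minimal sketch, assuming sigma is known and fixed; `se_ratio` and `n_for_target_se` are hypothetical helper names.

```python
import math

def se_ratio(n_old: int, n_new: int) -> float:
    """Factor by which the SE shrinks when n goes from n_old to n_new (sigma fixed)."""
    return math.sqrt(n_new / n_old)

def n_for_target_se(sigma: float, target_se: float) -> int:
    """Smallest n such that sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)

print(se_ratio(100, 200))  # doubling n: SE shrinks by sqrt(2)
print(se_ratio(100, 400))  # quadrupling n: SE halves
print(n_for_target_se(sigma=10.0, target_se=1.0))
print(n_for_target_se(sigma=10.0, target_se=0.5))  # halving the target SE needs 4x the n
```

Because the required n grows with the square of sigma / target_se, each extra halving of the SE costs four times the observations of the previous step.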
When to use each
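| Situation | Use | Notes |
|---|---|---|
| Describing spread in a dataset or population | Standard deviation | Answers “how variable are individual observations?” |
| Describing precision of an estimate (mean, proportion, regression coefficient) | Standard error | Answers “how much would this estimate vary across repeated samples?” |
| Confidence interval for a mean | Standard error | X_bar ± z * SE, or X_bar ± t * SE (t with n − 1 degrees of freedom) when sigma is estimated by s and n is small |
| Error bars on raw data | Standard deviation | Shows data variability, not estimation uncertainty |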
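As a sketch of the confidence-interval row, the function below builds X_bar ± z * SE with the standard library's `statistics.NormalDist`. It assumes the normal approximation is adequate; for small samples a t critical value (for example `scipy.stats.t.ppf(0.975, df=n - 1)`) would replace z, which the standard library does not provide. The data and the name `mean_ci_z` are hypothetical.

```python
import math
import statistics

def mean_ci_z(values: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for the mean: X_bar ± z * SE, with SE = s / sqrt(n)."""
    n = len(values)
    x_bar = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar - z * se, x_bar + z * se

data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9, 5.2, 4.7]  # hypothetical sample
low, high = mean_ci_z(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Note that the interval uses the SE, not the SD: its width shrinks with sqrt(n) even though the spread of the data does not.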
Common mistake in reporting
Error bars in plots are ambiguous. Always label whether they represent ±1 SD (data spread) or ±1 SE (estimation precision). SE bars are narrower than SD bars by a factor of sqrt(n), so readers who assume SD will conclude the data are far more tightly concentrated than they really are.
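One way to make mislabeling harder is to compute the bar half-width and its label together. A minimal sketch; `error_bar` is a hypothetical helper and the data are illustrative.

```python
import math
import statistics

def error_bar(values: list[float], kind: str) -> tuple[float, float, str]:
    """Return (center, half_width, label) for an error bar of the stated kind.

    kind="sd": ±1 SD, shows the spread of the raw data.
    kind="se": ±1 SE, shows the precision of the mean (narrower by sqrt(n)).
    """
    n = len(values)
    center = statistics.fmean(values)
    sd = statistics.stdev(values)
    if kind == "sd":
        return center, sd, "mean ± 1 SD"
    if kind == "se":
        return center, sd / math.sqrt(n), "mean ± 1 SE"
    raise ValueError(f"unknown kind: {kind!r}")

data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9, 5.2, 4.7]  # hypothetical sample
for kind in ("sd", "se"):
    center, half, label = error_bar(data, kind)
    print(f"{label}: {center:.2f} ± {half:.2f}")
```

When plotting, the half-width can be passed as `yerr` to matplotlib's `errorbar`, with the returned label carried into the legend or caption so the reader knows which quantity the bars show.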