datarekha
Statistics & Probability | Easy | Asked at Amazon, Google, Meta, Airbnb

When is the mean a misleading summary statistic, and what should you use instead?

The short answer

The mean is distorted by skewness and outliers, masks multimodality, and can describe a value that no individual in the dataset actually holds. Skewed, heavy-tailed, or multimodal distributions almost always require the median, percentiles, or the full distributional picture rather than the mean.

How to think about it

The mean is the right summary when the distribution is symmetric, unimodal, and free of gross outliers. Much business data meets none of these conditions, and there the mean misleads more often than it informs.

Why the mean fails

Outlier sensitivity: the mean of {10, 11, 12, 13, 14} is 12. Add one value of 1 000 and the mean becomes about 176.7 (1 060 / 6), which no longer represents any element in the set. The median barely moves, from 12 to 12.5.
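The outlier effect is easy to check with Python's standard library. This is a minimal sketch using the five values from the text plus the single extreme value; the variable names are illustrative only.

```python
from statistics import mean, median

values = [10, 11, 12, 13, 14]
with_outlier = values + [1000]

# Without the outlier, mean and median agree.
print(mean(values), median(values))

# One extreme value drags the mean far from every data point,
# while the median shifts only slightly.
print(round(mean(with_outlier), 1), median(with_outlier))
```

The second line prints a mean of roughly 176.7 and a median of 12.5.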

Skewed distributions: income, revenue per user, latency, house prices, and insurance claims are all right-skewed. In such data the mean typically exceeds the median, so it overstates what the typical individual experiences.
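A quick simulation illustrates the gap between mean and median under right skew. The sketch below assumes a log-normal distribution as a stand-in for revenue per user; the parameters (3.0, 1.0), the seed, and the sample size are arbitrary choices for illustration, not values from any real dataset.

```python
import random
from statistics import mean, median

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed "revenue per user" sample (log-normal is an assumption).
revenue = [rng.lognormvariate(3.0, 1.0) for _ in range(10_000)]

sample_mean = mean(revenue)
sample_median = median(revenue)

# Fraction of users whose revenue falls below the mean.
share_below_mean = sum(x < sample_mean for x in revenue) / len(revenue)

print(f"mean={sample_mean:.1f}  median={sample_median:.1f}  "
      f"share below mean={share_below_mean:.0%}")
```

In a sample like this, well over half of the users sit below the mean, which is why the mean is a poor description of the typical user.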

Multimodal distributions: a bimodal distribution of session lengths (quick reads at 30 s and deep reads at 8 min) might have a mean of 2.5 min — a value almost no user actually produces. The mean obscures the two distinct user behaviours.
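The session-length case can be reproduced with a small constructed dataset. The sketch assumes a hypothetical mix of 11 quick reads for every 4 deep reads, chosen only because it yields a mean of exactly 2.5 minutes; real traffic would have spread around each mode.

```python
from collections import Counter
from statistics import mean

# Hypothetical mix: quick reads of 30 s and deep reads of 480 s (8 min).
sessions = [30] * 1100 + [480] * 400

avg_seconds = mean(sessions)

# Count sessions that land within 30 s of the mean.
near_mean = sum(abs(s - avg_seconds) <= 30 for s in sessions)

print(avg_seconds / 60)                 # mean in minutes
print(near_mean)                        # sessions close to the mean
print(Counter(sessions).most_common())  # the two modes and their counts
```

The mean comes out at 2.5 minutes, yet no session is anywhere near that length; the counts per mode describe the data far better.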

Discrete or bounded data: a survey asks “how many times per week do you exercise?” Most respondents answer 0 or 1; the mean of 1.3 is not a valid answer to the question.
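For the survey case, a sketch with made-up responses shows how the mean lands on a value nobody reported. The response counts below are hypothetical and were picked so that the mean equals 1.3.

```python
from statistics import mean, median, mode

# Hypothetical weekly exercise counts for 100 respondents.
responses = [0] * 45 + [1] * 30 + [3] * 10 + [4] * 10 + [6] * 5

print(mean(responses))    # 1.3, not a possible answer
print(median(responses))  # middle respondent's answer
print(mode(responses))    # most common answer
```

Here the median is 1 and the mode is 0, both of which are answers a respondent could actually give.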

Worked example — AWS Lambda latency

P50 (median) latency: 12 ms. P95: 180 ms. P99: 2 400 ms. Mean: 47 ms.

The mean (47 ms) is pulled up by a heavy tail of slow requests, yet it still hides how bad that tail is. An SLA built on the mean would be satisfied while 1 in 100 requests takes 2.4 seconds or longer, a terrible experience that is invisible in the mean. Reliability targets are therefore usually set on P99 or P99.9 rather than on the mean.
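Percentiles like these can be computed with `statistics.quantiles`. The sketch below uses simulated latencies, not real AWS Lambda measurements: the three uniform ranges and their proportions are assumptions chosen to produce a fast majority with a slow tail.

```python
import random
from statistics import mean, quantiles

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical latency sample in milliseconds (all ranges are assumptions).
latencies = (
    [rng.uniform(8, 16) for _ in range(9_300)]         # fast path
    + [rng.uniform(100, 300) for _ in range(550)]      # slower requests
    + [rng.uniform(2_000, 3_000) for _ in range(150)]  # slow tail
)

# n=100 returns 99 cut points; cuts[k - 1] is the k-th percentile.
cuts = quantiles(latencies, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
avg = mean(latencies)

# Share of requests that are faster than the mean.
share_below_mean = sum(x < avg for x in latencies) / len(latencies)

print(f"P50={p50:.0f} ms  P95={p95:.0f} ms  P99={p99:.0f} ms  mean={avg:.0f} ms")
print(f"{share_below_mean:.0%} of requests are faster than the mean")
```

In this simulated sample the mean sits well above the median and far below P99, so it describes neither the typical request nor the tail.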

Better alternatives by situation

| Distribution shape | Preferred summary |
| --- | --- |
| Symmetric, unimodal, no outliers | Mean + SD |
| Skewed or heavy-tailed | Median + IQR or percentiles |
| Heavy right tail (revenue, latency) | P50, P90, P95, P99 |
| Multimodal | Mixture model or histogram; report modes separately |
| Bounded, sparse integer data | Median or mode + distribution plot |
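As one way to produce the median + IQR summary in practice, here is a minimal sketch. The helper name `robust_summary` is hypothetical, and the quartiles use the default "exclusive" method of `statistics.quantiles`; other tools may use slightly different quartile conventions.

```python
from statistics import quantiles

def robust_summary(data):
    """Return median and interquartile range for a list of numbers."""
    # n=4 yields the three quartile cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = quantiles(data, n=4)
    return {"median": q2, "q1": q1, "q3": q3, "iqr": q3 - q1}

summary = robust_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

For the values 1 through 9 this gives a median of 5.0 with quartiles at 2.5 and 7.5, so the IQR is 5.0.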

When the mean is exactly right

The mean is the correct summary when you need a quantity that, when summed over n observations, recovers the total. Total revenue, total cost, total impressions — these are exactly mean × n. For aggregations that depend on sums, the mean is both correct and sufficient.
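The sum-recovery property is simple to verify. The order values below are made up for illustration; the point is that multiplying the mean by n returns the total, whereas the median does not.

```python
from statistics import mean, median

# Hypothetical order values with one large order.
order_values = [20, 25, 30, 35, 890]
n = len(order_values)
total = sum(order_values)

mean_times_n = mean(order_values) * n      # recovers the total
median_times_n = median(order_values) * n  # does not

print(total, mean_times_n, median_times_n)
```

Here the total is 1000, mean × n is also 1000, and median × n is only 150, so forecasts of totals should be built on the mean even when the data are skewed.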
