datarekha

Averages That Lie

Your dashboard says average revenue per customer is $590 — but almost no real customer spends that. Here is why the mean misleads on skewed business data, and what to use instead.

7 min read · Beginner · Business Analytics · Lesson 8 of 21

What you'll learn

  • Why the arithmetic mean is pulled by outliers and rarely represents a 'typical' customer
  • What the median is and why it stays honest when data is skewed
  • How percentiles (p50, p90, p99) describe the whole picture
  • The Pareto principle and why you manage the head and tail of your customer base differently

Before you start

Your dashboard is not broken. The math is correct. The problem is that the arithmetic mean — the most familiar kind of average — is easily hijacked by a single big spender, and on most business data that hijacking happens constantly. By the end of this lesson you will be able to look at any “average” on a report and immediately ask the right follow-up question.

The dataset that breaks the dashboard

Imagine your business has exactly 10 customers this month:

| Customer | Monthly spend |
|---|---|
| Customers 1–9 | $100 each |
| Customer 10 (the whale) | $5,000 |

The mean — the arithmetic average, calculated by adding all values and dividing by the count — is:

(9 × $100 + $5,000) ÷ 10 = $5,900 ÷ 10 = $590

So the dashboard reports $590. Now go talk to your customers. Nine of the ten spend $100. One spends $5,000. The “average customer” who spends $590 does not exist. Not one actual person is near that number.
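The arithmetic above can be checked in a few lines of Python. This is a minimal sketch: the list name `spend` and the "within $100 of the mean" check are illustrative choices, not part of any real dashboard.

```python
# Monthly spend for the 10 customers in the example (illustrative data).
spend = [100] * 9 + [5000]

# Mean = sum of all values divided by the count.
mean_spend = sum(spend) / len(spend)

# Which customers are within $100 of the reported mean?
near_mean = [s for s in spend if abs(s - mean_spend) <= 100]

print(mean_spend)      # 590.0
print(len(near_mean))  # 0
```

The empty `near_mean` list is the point: the reported average describes nobody in the dataset.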

Why the mean gets dragged around

The mean gives every data point equal weight in the sum. Customer 10 contributes $5,000 to the numerator while each of the other nine contributes only $100. One observation fifty times larger than the others drags the entire average upward into empty space — a gap where nobody actually sits.

This is not a flaw you can fix by collecting more data. It is a mathematical property of the mean. Whenever data has a long tail (a few very large values stretching out to the right), the mean is pulled toward that tail and away from the crowd.
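One way to see that sample size is not the issue is to scale the example up. The sketch below assumes, purely for illustration, a hypothetical 10,000-customer base with the same mix as the original ten (90% at $100, 10% at $5,000).

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

small = [100] * 9 + [5000]            # the 10 customers from the example
large = [100] * 9000 + [5000] * 1000  # hypothetical larger base, same mix

print(mean(small))  # 590.0
print(mean(large))  # 590.0

# Fraction of the numerator contributed by the single whale in the small sample.
whale_share = 5000 / sum(small)
print(f"{whale_share:.1%}")  # 84.7%
```

A thousand times more data leaves the mean exactly where it was, because the shape of the distribution has not changed.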

The median: the honest middle

The median is the middle value when all observations are sorted from smallest to largest — by definition, half the values fall below it and half above. It does not add values up; it just finds the center of the ranked list.

Sort our 10 customers by spend:

$100, $100, $100, $100, $100, $100, $100, $100, $100, $5,000

With 10 values the median sits between the 5th and 6th entries — both are $100 — so the median = $100.

That is the honest answer to “what does a typical customer spend?” Nine out of ten customers spend $100. The median says so; the mean does not.
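The same result can be reproduced with Python's standard `statistics` module. This sketch also adds a hypothetical, ten-times-larger whale (not part of the lesson's dataset) to show that the median does not react to the size of the outlier.

```python
from statistics import median

spend = [100] * 9 + [5000]
print(median(spend))  # 100.0

# Hypothetical variation: the whale spends $50,000 instead of $5,000.
bigger_whale = [100] * 9 + [50_000]
print(median(bigger_whale))                   # 100.0
print(sum(bigger_whale) / len(bigger_whale))  # 5090.0
```

The median stays at $100 while the mean jumps from $590 to $5,090, because the median depends only on the rank order of the values, not on how large the largest one is.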

Visualizing the gap

The diagram below plots all 10 customers as dots along a revenue axis. Notice where the mean lives: in the empty gap between the cluster and the whale.

[Figure: revenue axis from $0 to $5,000, with the median marked at $100 and the mean marked at $590 in the empty gap.]
Nine customers cluster at $100; one whale sits at $5,000. The mean ($590) floats in the gap where no customer actually is. The median ($100) sits with the crowd.

Skewed data is the normal case in business

Revenue per customer, order values, salary distributions, session lengths, support ticket resolution times — nearly all of these are right-skewed (meaning the tail points to the right, toward large values). A handful of power users, big orders, or senior executives pull the mean up and away from the typical experience.

In skewed data the median is almost always the more honest description of “typical.” The mean is not useless — it tells you the total divided by count, which matters for budgeting — but it is a terrible proxy for the individual customer.

Percentiles: describing the whole shape

A single number — mean or median — always hides information. Percentiles give you the shape. A percentile is the value below which a given percentage of observations fall.

  • p50 is the 50th percentile, which is identical to the median: half of customers spend less, half spend more.
  • p90 means 90% of customers spend less than this value. If p90 = $350, then the top 10% of customers spend $350 or more — a very different group from the median customer.
  • p99 is the value that 99% of customers spend less than; the top 1% at or above it are your whales.

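In our 10-customer example, using the nearest-rank definition, the p90 is the 9th value in the sorted list, which is $100 (customers 1–9 all spend $100). Anything above the 90th percentile, such as p99, lands on the 10th value: the whale at $5,000. Tools that interpolate between ranks can report a different p90 on a sample this small, so check which convention your dashboard uses. Either way, percentiles make the whale visible without letting it distort every other number.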
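Percentile values depend on the convention used to compute them, and the difference is large on a sample this small. The sketch below compares two common conventions on the example data; the function names `nearest_rank` and `linear_interp` are illustrative, and a given dashboard or library may use either convention or a third one.

```python
import math

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at 1-based rank ceil(p% of n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def linear_interp(values, p):
    """Percentile by linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = p / 100 * (len(ordered) - 1)  # fractional 0-based position
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

spend = [100] * 9 + [5000]
for p in (50, 90, 100):
    print(p, nearest_rank(spend, p), round(linear_interp(spend, p), 2))
# 50 100 100.0
# 90 100 590.0
# 100 5000 5000.0
```

Both conventions agree on p50 and p100, but the interpolated p90 sits between the 9th and 10th customers and picks up part of the whale. With hundreds or thousands of customers the two conventions usually land much closer together.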

Practically, a product team might track p50 and p90 load times separately: p50 tells you the median user experience; p90 tells you how bad it gets for the slowest 10%.

The Pareto principle and managing the tail

There is a well-documented pattern across many businesses called the Pareto principle, or the 80/20 rule, which observes that roughly 80% of revenue tends to come from roughly 20% of customers (the exact ratio varies, but the lopsidedness is common). Your one whale in a 10-customer sample is a textbook version: 10% of customers generate $5,000 of the $5,900 total, about 85% of revenue.

This matters for strategy: the top 20% of customers (your high-value segment) often deserve different attention — dedicated account managers, loyalty programs, early access — than the long tail of occasional small spenders. Averaging everyone together obscures that the two groups need completely different treatment.
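To check how lopsided a customer base is, you can compute the share of revenue that comes from the top slice of customers. This is a minimal sketch on the lesson's example data; `top_share` is a hypothetical helper name, and rounding the slice size to a whole number of customers is one of several reasonable choices.

```python
def top_share(values, top_fraction):
    """Share of the total contributed by the top `top_fraction` of customers."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # customers in the top slice
    return sum(ordered[:k]) / sum(ordered)

spend = [100] * 9 + [5000]
print(f"{top_share(spend, 0.10):.1%}")  # 84.7%
print(f"{top_share(spend, 0.20):.1%}")  # 86.4%
```

Here the top 20% of customers bring in about 86% of revenue, which is in the neighborhood the 80/20 rule describes.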

The mean hides the whale. The median finds the crowd. Percentiles show you both.

What to report instead

| Metric | What it tells you | When to use it |
|---|---|---|
| Mean | Total ÷ count; useful for budgeting | Always show it, but never alone |
| Median (p50) | The typical individual experience | Default for describing “average customer” |
| p90 | The ceiling for 90% of customers | Spotting the power-user or high-load tail |
| p99 | The whale tier | Account management, outlier investigation |

A good dashboard shows at minimum: mean, median, and p90. If mean is close to median, the data is roughly symmetric and the mean is a fair summary. If mean is materially larger than median, you are looking at a skewed distribution and the story is in the gap.
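The mean-versus-median check can be automated. The sketch below is one possible way to do it, not a standard recipe: `summarize` is a hypothetical helper, the p90 uses the nearest-rank convention, and the `skew_ratio` threshold of 1.2 is an arbitrary assumption you would tune for your own data.

```python
import math
from statistics import mean, median

def summarize(values, skew_ratio=1.2):
    """Return mean, median, nearest-rank p90, and a simple skew flag.

    skew_ratio is an assumed threshold: the data is flagged as skewed
    when the mean exceeds the median by more than this factor.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(90 * len(ordered) / 100))  # 1-based p90 rank
    m, med = mean(ordered), median(ordered)
    return {
        "mean": m,
        "median": med,
        "p90": ordered[rank - 1],
        "skewed": m > skew_ratio * med,
    }

print(summarize([100] * 9 + [5000]))  # whale data: skew flag is True
print(summarize([90, 100, 110]))      # symmetric data: skew flag is False
```

A ratio test like this is crude (it breaks down when the median is zero or negative), but it is enough to prompt the follow-up question whenever a report shows only an average.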

Next

Segmentation and RFM — stop averaging customers; group them by recency, frequency, and monetary value so you can manage each segment on its own terms.


Quick check

Q1. In our 10-customer dataset (nine at $100, one at $5,000), why does the mean equal $590 while the median equals $100?
Q2. A startup reports 'average session length: 14 minutes.' The median session length is 3 minutes. What does this most likely mean?
Q3. Your e-commerce platform records daily orders for 100 customers. The p90 of order value is $800. What does this tell you?

Practice this in an interview

When is the mean a misleading summary statistic, and what should you use instead?

The mean is distorted by skewness and outliers, masks multimodality, and can describe a value that no individual in the dataset actually holds. Skewed, heavy-tailed, or multimodal distributions almost always require the median, percentiles, or the full distributional picture rather than the mean.

What is regression to the mean, and why does it fool analysts into seeing treatment effects that do not exist?

Regression to the mean is the statistical tendency for extreme measurements to be followed by measurements closer to the population mean, purely due to random noise — not because of any intervention. Analysts who intervene after observing an extreme value and then observe improvement often incorrectly attribute the recovery to their action.

When should you use mean vs median vs mode, and which is most robust to outliers?

Mean is optimal for symmetric, outlier-free data; median is the go-to for skewed distributions or when outliers are real rather than errors; mode is the only sensible average for nominal/categorical data. Robustness is a formal concept — the median's breakdown point is 50%, meaning half the data can be corrupted before it fails, while the mean's breakdown point is essentially 0%.

What makes a chart misleading, and how do you spot truncated y-axes, dual axes, and 3D distortion?

Charts mislead when visual area or slope no longer encodes the underlying ratio faithfully. The three most common traps are a truncated y-axis that magnifies trivial differences, dual axes that let the designer set any ratio between scales, and 3D perspective that foreshortens far elements and inflates near ones.
