datarekha

Mean, Median, Mode & z-scores

Summarise a dataset in a few numbers: centre with mean/median/mode, spread with variance, and standardise with the z-score that ML preprocessing relies on.

6 min read Beginner GATE DA Lesson 3 of 122

What you'll learn

  • Mean, median, mode, and range as measures of centre and spread
  • Population variance (divide by n) vs sample variance (divide by n-1, Bessel's correction)
  • Standard deviation and the z-score = (x - mu)/sigma standardization
  • Why the mean is sensitive to outliers while the median is robust

Before you start

“What’s the typical income in this town?” — that one question already needs two answers, because the mean and the median can be very different numbers, and which one is honest depends on whether one billionaire moved in. Descriptive statistics is the small toolkit for that: a couple of ways to say “where is the centre” (mean, median, mode), a couple for “how spread out” (variance, standard deviation), and at the end a tiny rescaling — the z-score — that turns “I got 80 on this test” into “I was 2 SDs above the class mean.”

These are cheap marks on the exam, and the z-score in particular is the exact move ML preprocessing makes when it standardises features.

Measures of centre

Mean: (Σ x) / n · balance point · sensitive to outliers
Median: middle value · sort, take centre · robust to outliers
Mode: most frequent · can be none or many · works for categories
Three notions of “typical”: balance point, middle, and most common.
  • Mean μ = (Σ x) / n — the arithmetic average, the balance point of the data.
  • Median — sort the values; the median is the middle one (average of the two middle values if n is even).
  • Mode — the most frequently occurring value; a dataset can have one, several, or no mode, and it is the only centre that works for categorical data.
  • Range = max − min, the crudest measure of spread.

For the dataset 2, 4, 4, 6, 9: mean = 25/5 = 5, median = 4 (middle of five sorted values), mode = 4 (it appears twice), range = 9 − 2 = 7.
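The same numbers can be checked with Python's standard `statistics` module. This is a minimal sketch rather than anything the lesson prescribes; `multimode` is used instead of `mode` because it returns every most-frequent value, which matches the "one, several, or no mode" caveat above.

```python
import statistics

data = [2, 4, 4, 6, 9]

mean = statistics.mean(data)          # arithmetic average: sum / n
median = statistics.median(data)      # middle value after sorting
modes = statistics.multimode(data)    # list of all most-frequent values
value_range = max(data) - min(data)   # max - min

print(mean, median, modes, value_range)
```

If every value occurs equally often, `multimode` returns all of them, which is the "no single mode" case.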

Spread: variance and standard deviation

Variance measures the average squared distance from the mean. There are two versions, and the divisor is the whole exam trap:

  • Population variance divides by n: σ² = (Σ (x − μ)²) / n.
  • Sample variance divides by n − 1: s² = (Σ (x − x̄)²) / (n − 1).

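That n − 1 is Bessel's correction. The sample mean x̄ is computed from the same data, and it is the value that minimises the sum of squared deviations, so deviations measured from x̄ come out slightly smaller on average than deviations from the true mean μ. Dividing by n − 1 instead of n corrects this downward bias when you only have a sample. Standard deviation is just the square root of variance (σ or s), back in the original units.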
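To make the divisor concrete, here is a small sketch that computes both variances for the dataset 2, 4, 4, 6, 9 by hand and then with the standard library. The variable names are illustrative only.

```python
import statistics

data = [2, 4, 4, 6, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean (same numerator for both versions)
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n            # population variance: divide by n
sample_var = sum_sq / (n - 1)   # sample variance: divide by n - 1 (Bessel)

# The standard library exposes both divisors under separate names
lib_pop_var = statistics.pvariance(data)
lib_sample_var = statistics.variance(data)
lib_pop_sd = statistics.pstdev(data)
lib_sample_sd = statistics.stdev(data)

print(pop_var, sample_var)
```

Here the squared deviations sum to 28, so the population variance is 28/5 = 5.6 and the sample variance is 28/4 = 7. The sample version is always the larger of the two for the same data.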

The z-score — standardization

z = (x − μ) / σ, in SD units: how many standard deviations x sits from the mean
The z-score rescales any value to “standard deviations from the mean”.

The z-score z = (x − μ) / σ answers: how many standard deviations does a value x sit above (positive) or below (negative) the mean? It strips away the units and scale, which is exactly why standardization is a staple of ML preprocessing — it puts every feature on the same footing. A z-score of 0 is exactly average; +2 means two SDs above the mean.
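As a rough sketch of what standardising a feature looks like in code, the function below applies z = (x − μ) / σ to every value in a list. The function name `standardize` and the height data are hypothetical, and the population standard deviation is assumed, matching the σ in the formula.

```python
import statistics

def standardize(values):
    """Return the z-score of every value, using the population SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant feature has no spread, so z is undefined
        raise ValueError("cannot standardise a constant feature")
    return [(x - mu) / sigma for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical feature
z = standardize(heights_cm)

print([round(v, 3) for v in z])
```

After standardising, the feature has mean 0 and standard deviation 1 regardless of its original units, which is what puts different features on the same footing.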

How GATE asks this

Reliably a quick NAT: a value, a mean, and a standard deviation are given, and you compute the z-score to a few decimals — or you are given a small dataset and asked for its mean/median/variance. MCQs probe the concepts: which divisor is the sample variance (n − 1), or which measure of centre is robust to outliers (the median). z-score normalization appeared in GATE DA 2024.

Worked example — a real 2024 question

A data value of 106000 comes from a distribution with mean μ = 96000 and standard deviation σ = 21000. Find its z-score.

Plug straight into the formula:

z = (x − μ) / σ
  = (106000 − 96000) / 21000
  = 10000 / 21000
  ≈ 0.476

So z ≈ 0.476 — the value sits roughly half a standard deviation above the mean. This is a real GATE DA 2024 question, and 0.476 is the verified answer. Note how the large raw numbers collapse to a clean, unit-free score once standardized.
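The arithmetic is easy to confirm in a couple of lines. The helper name `z_score` is illustrative, not part of any library.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(106000, 96000, 21000)
print(round(z, 3))  # 0.476
```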

Quick check

Q1. A value x = 80 comes from a distribution with mean μ = 50 and standard deviation σ = 12. Compute its z-score (about 3 decimals). (Numerical answer)
Q2. A test score of 68 comes from a class with mean 74 and standard deviation 8. What is the z-score? (Numerical answer)
Q3. For the dataset 3, 5, 5, 7, 10, what is the median? (Numerical answer)
Q4. Which statements about descriptive statistics are TRUE? (Select all that apply)
Q5. The population variance of a dataset is 49. What is its population standard deviation? (Numerical answer)
Q6. Annual incomes in a town are 30k, 32k, 35k, 38k, and 5,000k (one billionaire). Which measure best describes the 'typical' income?

Practice this in an interview

What is a z-score and what is standardization used for in data science?

A z-score expresses how many standard deviations an observation is from the mean of its distribution, converting raw values to a common unitless scale. Standardization — subtracting the mean and dividing by the standard deviation — is essential before algorithms that depend on distances or regularization penalties, because it prevents features with large numeric ranges from dominating those with small ranges.
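A small sketch can show why distance-based methods care about scale. The three (age, income) rows below are made up for illustration, and `standardize_columns` is a hypothetical helper. On raw values the income column decides which points look close; after standardising, both columns contribute comparably.

```python
import math
import statistics

# Hypothetical rows: (age in years, annual income)
rows = [(25, 50_000), (60, 51_000), (26, 58_000)]

def standardize_columns(rows):
    """Z-score each column independently, using the population SD."""
    scaled_columns = []
    for col in zip(*rows):
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        scaled_columns.append([(x - mu) / sigma for x in col])
    return list(zip(*scaled_columns))

a, b, c = rows
raw_ab = math.dist(a, b)   # 35-year age gap, small income gap
raw_ac = math.dist(a, c)   # 1-year age gap, larger income gap

sa, sb, sc = standardize_columns(rows)
std_ab = math.dist(sa, sb)
std_ac = math.dist(sa, sc)

print(round(raw_ac / raw_ab, 2), round(std_ac / std_ab, 2))
```

On the raw data, the first row looks about eight times farther from the third row than from the second, purely because income is measured in larger numbers. After standardising, the two distances are nearly equal.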

When is the mean a misleading summary statistic, and what should you use instead?

The mean is distorted by skewness and outliers, masks multimodality, and can describe a value that no individual in the dataset actually holds. Skewed, heavy-tailed, or multimodal distributions almost always require the median, percentiles, or the full distributional picture rather than the mean.

When should you use mean vs median vs mode, and which is most robust to outliers?

Mean is optimal for symmetric, outlier-free data; median is the go-to for skewed distributions or when outliers are real rather than errors; mode is the only sensible average for nominal/categorical data. Robustness is a formal concept: the median's breakdown point is 50%, meaning it stays bounded until about half the data is corrupted, while the mean's breakdown point is essentially 0%, because a single extreme value can move it arbitrarily far.
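The town-income figures from the quick check (30k, 32k, 35k, 38k, plus one 5,000k earner) make the contrast concrete. This is only an illustrative sketch, with incomes expressed in thousands.

```python
import statistics

incomes_k = [30, 32, 35, 38]            # incomes in thousands
with_billionaire = incomes_k + [5000]   # one extreme value added

mean_before = statistics.mean(incomes_k)
median_before = statistics.median(incomes_k)

mean_after = statistics.mean(with_billionaire)
median_after = statistics.median(with_billionaire)

print(mean_before, median_before)   # 33.75 33.5
print(mean_after, median_after)     # 1027 35
```

One extra value moves the mean from 33.75k to 1,027k, while the median only shifts from 33.5k to 35k.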

How do you handle skewed features in a machine learning dataset, and why does skew matter?

Right-skewed features (long tail on the right) concentrate most values at the low end of the range while a few extreme values pull the mean up, which distorts distance-based models and linear regression. Common fixes are log, square-root, or Box-Cox transformations that compress the tail and make the distribution closer to normal, improving model convergence and reducing the undue influence of large values.
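As a hedged illustration of how a log transform compresses a right tail, the sketch below uses a made-up feature whose values double at each step. The helper `mean_median_gap` is a hypothetical, rough skew indicator, (mean − median) / SD, not a standard library function.

```python
import math
import statistics

def mean_median_gap(values):
    """(mean - median) / SD: a rough, scale-free indicator of skew."""
    gap = statistics.fmean(values) - statistics.median(values)
    return gap / statistics.pstdev(values)

# Hypothetical right-skewed feature: each value is double the previous one
values = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# math.log needs strictly positive inputs; math.log1p is a common
# alternative when the feature contains zeros
logged = [math.log(v) for v in values]

skew_raw = mean_median_gap(values)
skew_log = mean_median_gap(logged)

print(round(skew_raw, 3), round(skew_log, 3))
```

On the raw values the mean sits well above the median. After the log transform the values are evenly spaced, so the mean and median coincide. Real features are rarely this tidy, so the transformed data would usually be less skewed rather than perfectly symmetric.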
