What is a z-score and what is standardization used for in data science?

A z-score expresses how many standard deviations an observation is from the mean of its distribution, converting raw values to a common unitless scale. Standardization — subtracting the mean and dividing by the standard deviation — is essential before algorithms that depend on distances or regularization penalties, because it prevents features with large numeric ranges from dominating those with small ranges.
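As a rough sketch of what standardization does, the Python below puts two features with very different numeric ranges onto the same scale. The feature names and values are hypothetical, chosen only for illustration, and the helper `standardize` is an assumption rather than any particular library's API. It divides by the population standard deviation; a sample standard deviation would work equally well as long as it is used consistently.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; assumes the values are not all identical
    return [(x - mu) / sigma for x in values]

# Hypothetical features on very different scales.
income_rupees = [20000, 35000, 50000, 65000, 80000]
age_years = [22, 31, 40, 49, 58]

z_income = standardize(income_rupees)
z_age = standardize(age_years)

print([round(z, 3) for z in z_income])
print([round(z, 3) for z in z_age])
```

Both lists come out on the same unitless scale, so a distance computed across the two features no longer depends on the fact that incomes are written in tens of thousands and ages in tens.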

When is the mean a misleading summary statistic, and what should you use instead?

The mean is distorted by skewness and outliers, masks multimodality, and can describe a value that no individual in the dataset actually holds. Skewed, heavy-tailed, or multimodal distributions almost always require the median, percentiles, or the full distributional picture rather than the mean.

When should you use mean vs median vs mode, and which is most robust to outliers?

Mean is optimal for symmetric, outlier-free data; median is the go-to for skewed distributions or when outliers are real rather than errors; mode is the only sensible average for nominal/categorical data. Robustness is a formal concept: the median's breakdown point is 50%, meaning just under half the data can be corrupted before the median can be pushed arbitrarily far, while the mean's breakdown point is essentially 0%, because a single extreme value can move it without limit.
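A minimal Python sketch of that difference in robustness, assuming a small list of quiz marks in which one value is replaced by a hypothetical data-entry error (the 9000 is invented for illustration):

```python
from statistics import mean, median

marks = [2, 4, 4, 6, 9]
corrupted = [2, 4, 4, 6, 9000]  # hypothetical error: one mark mistyped

mean_before, median_before = mean(marks), median(marks)
mean_after, median_after = mean(corrupted), median(corrupted)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One corrupted value out of five is enough to move the mean from 5 to above 1800, while the median stays at 4.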

How do you handle skewed features in a machine learning dataset, and why does skew matter?

Right-skewed features (long tail on the right) concentrate most values at the low end while a few extreme values pull the mean up, which distorts distance-based models and linear regression. Common fixes are log, square-root, or Box-Cox transformations, which compress the tail and bring the distribution closer to normal. Log and Box-Cox need strictly positive values, so features that contain zeros are usually shifted first, for example with log(1 + x). These transforms tend to improve model convergence and reduce the undue influence of large values.
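The effect of a log transform can be sketched in a few lines of Python. The data below is hypothetical, and `skewness` is an illustrative helper (the average cubed z-score), not a library function. The sketch uses `math.log1p`, that is log(1 + x), as an assumption so that a zero in the feature would not break the transform.

```python
import math
from statistics import mean, median, pstdev

def skewness(values):
    """Population skewness: the average cubed z-score (near 0 when symmetric)."""
    mu = mean(values)
    sigma = pstdev(values)
    return mean([((x - mu) / sigma) ** 3 for x in values])

# Hypothetical right-skewed feature: mostly small values, one long-tail value.
raw = [1, 2, 2, 3, 4, 5, 8, 20, 100]

# log1p(x) = log(1 + x); it compresses large values far more than small ones.
logged = [math.log1p(x) for x in raw]

print("raw    skew:", round(skewness(raw), 2), "mean:", round(mean(raw), 2), "median:", median(raw))
print("logged skew:", round(skewness(logged), 2), "mean:", round(mean(logged), 2), "median:", round(median(logged), 2))
```

On this list the skewness drops after the transform and the mean moves much closer to the median, which is the compression of the tail described above.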

Mean, Median, Mode & z-scores — GATE DA

Here are the marks five students scored on a short quiz, out of ten: 2, 4, 4, 6, 9. A parent asks how the class did, and you want to answer with a single number. But which single number? You already have three honest answers, and they do not all agree.

Three ways to say “typical”

Let us find all three on these five marks, because each is a different idea of the centre.

Add the marks and share them out equally: 2 + 4 + 4 + 6 + 9 = 25, shared among five students, is 5 each. That equal-share value is the mean.

Or line the marks up in order — 2, 4, 4, 6, 9 — and read the one in the middle. Here it is 4. That middle value is the median. (If there were an even number of marks, you would average the two in the middle.)

Or simply ask which mark came up most often. The 4 appears twice and every other mark once, so the mode is 4. The mode is the only one of the three that also works for things you cannot average, like favourite colours.

So the same five marks give mean 5, median 4, mode 4. Close together here. The crudest measure of spread, the range, is just max − min = 9 − 2 = 7. Now watch what happens when the numbers are not so gentle.
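The three centres and the range can be checked with Python's standard `statistics` module. This is only a sketch of the arithmetic above; the four-mark list and the list of colours are hypothetical additions to show the even-count median and a mode on categories.

```python
from statistics import mean, median, mode

marks = [2, 4, 4, 6, 9]

centre = {
    "mean": mean(marks),      # equal share: 25 / 5
    "median": median(marks),  # middle of the sorted list
    "mode": mode(marks),      # most frequent value
}
marks_range = max(marks) - min(marks)

# With an even number of marks, the median averages the two middle values.
even_median = median([2, 4, 6, 9])

# The mode also works on categories, which cannot be averaged.
favourite_colour = mode(["red", "blue", "red", "green"])

print(centre, marks_range, even_median, favourite_colour)
```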

Take the incomes on a street where one resident earns vastly more than everyone else. The mean adds everything and shares it, so that single huge income drags it far above what any ordinary neighbour earns, past ₹10,00k, a figure no one on the street would recognise as “typical”. The median just walks to the middle of the sorted list and stops at ₹35k, calmly ignoring how large the top value is. This is the rule worth remembering: the mean is pulled by outliers; the median is not. For skewed things (incomes, house prices) the median is the more honest centre.

Spread is the other half of the story

The centre alone can hide a lot. Two classes can share a mean of 5 and still feel completely different — one with every mark near 5, the other with marks flung out to 0 and 10. To capture that, we measure how far the values sit from the mean, on average. Squaring each distance (so that below and above both count as “far”) and averaging gives the variance; its square root, back in the original units, is the standard deviation.

There is one fork that costs marks, so let us name it carefully. If your data is the whole population, divide the summed squared distances by n. If it is only a sample drawn from a larger population, divide by n − 1 instead. This fix is called Bessel’s correction: the sample mean is built from the same data, so the squared distances measured from it come out slightly too small on average, and the smaller divisor compensates.

Watch it once, on the list 2, 4, 6, 8, 10, whose mean is 30/5 = 6:

distances from the mean:   −4, −2, 0, 2, 4
squared:                    16,  4, 0, 4, 16     →  sum = 40

population variance  = 40 / 5     = 8      →  SD = √8  ≈ 2.83
sample variance      = 40 / (5−1) = 10     →  SD = √10 ≈ 3.16

Same 40 on top; only the divisor changes, and n − 1 always gives the larger answer. The exam’s job is usually just to see whether you pick the right one.
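The same fork shows up in code. As a sketch, Python's standard `statistics` module keeps the two divisors in separately named functions: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n − 1. The hand computation is included so the shared numerator is visible.

```python
from statistics import mean, pvariance, variance, pstdev, stdev

data = [2, 4, 6, 8, 10]

mu = mean(data)
sum_sq = sum((x - mu) ** 2 for x in data)  # the shared numerator

pop_var = sum_sq / len(data)          # population: divide by n
samp_var = sum_sq / (len(data) - 1)   # sample: divide by n - 1 (Bessel's correction)

# The library functions should agree with the hand computation.
print(pop_var, pvariance(data), round(pstdev(data), 2))
print(samp_var, variance(data), round(stdev(data), 2))
```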

The z-score — how unusual is one value?

Subtract the mean, divide by the standard deviation — a value’s distance from average, measured in SDs.

“I scored 80” means little until you know the class. Was that one mark above average, or three? The z-score answers exactly that. It subtracts the mean and divides by the standard deviation, so a raw value becomes “how many standard deviations from the mean” — z = (x − μ)/σ. A z-score of 0 is dead average, +2 is two SDs above, a negative z-score is below. Because it strips away the units and the scale, the same z is comparable across any two measurements — which is the exact move machine-learning preprocessing makes when it standardises features.

A worked example — a real 2024 question

A data value of 106000 comes from a distribution with mean μ = 96000 and standard deviation σ = 21000. Find its z-score.

The value sits 10000 above the mean, and one whole standard deviation is 21000 — so the value is less than one SD out, and the z-score should come in under one. The arithmetic agrees:

z = (x − μ) / σ = (106000 − 96000) / 21000 = 10000 / 21000 ≈ 0.476

So z ≈ 0.476 — about half a standard deviation above the mean. This is a real GATE DA 2024 answer, and notice how the large, awkward numbers collapse to one clean, unit-free score once standardised.
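The same calculation as a small Python sketch. The function name `z_score` and the guard against a non-positive standard deviation are illustrative choices, not part of the question.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

z = z_score(106000, 96000, 21000)
print(round(z, 3))
```

A value below the mean gives a negative result from the same function, and a value equal to the mean gives exactly 0.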

A question to carry forward

The z-score reshapes any measurement to a common scale of “SDs from the mean”. Here is the thread for the next lesson: if you collected the z-scores of every value in a large, bell-shaped dataset, what would their own mean and standard deviation turn out to be — and why might that be a convenient thing?

In one breath

Centre: mean (equal share, pulled by outliers), median (the middle of the sorted list, calm under outliers), mode (most frequent, the only one for categories).
Range = max − min; for skewed data prefer the median.
Spread: variance = average squared distance from the mean; SD = √variance, in the original units.
The divisor: population variance ÷ n; sample variance ÷ n − 1 (Bessel’s, always the larger). On 2,4,6,8,10: σ² = 8, s² = 10.
z-score z = (x − μ)/σ = how many SDs from the mean; unit-free, the move ML standardisation makes (2024: z ≈ 0.476).

Practice

Quick check

Q1. Recall: which single measure of centre is least disturbed by one extreme outlier?

Q2. Trace: for the dataset 3, 5, 5, 7, 10, what is the median? (numerical answer)

Q3. Trace: the SAMPLE 4, 8, 6, 10, 12 has mean 8. Compute its sample variance (divide by n − 1). (numerical answer)

Q4. Apply: a value x = 80 comes from a distribution with mean μ = 50 and standard deviation σ = 12. Compute its z-score. (numerical answer)

Q5. Apply: the population variance of a dataset is 49. What is its population standard deviation? (numerical answer)

Q6. Create: two classes both have mean mark 5. Class A's marks are 5, 5, 5, 5, 5; class B's are 0, 2, 5, 8, 10. Without full calculation, reason which has the larger standard deviation and why.
