datarekha
Statistics & Probability · Medium · Asked at Amazon, Apple, Palantir

When would you use Spearman correlation instead of Pearson correlation?

The short answer

Pearson correlation measures the strength of the linear relationship between two continuous variables and is sensitive to outliers and non-normality. Spearman correlation is Pearson applied to the ranks of the data, making it appropriate for monotonic (not necessarily linear) relationships, ordinal variables, and data with outliers or heavy-tailed distributions.

How to think about it

State what each measures precisely, explain the rank transformation and why it confers robustness, then give concrete scenarios where the choice matters.

Pearson correlation (r)

r = Cov(X, Y) / (SD(X) * SD(Y))

Pearson measures the degree to which X and Y lie on a straight line. It is the maximum-likelihood estimate of the population correlation when (X, Y) is bivariate normal. It is sensitive to:

  • Outliers: A single extreme point can drive r from near zero to 0.9 or vice versa.
  • Non-linearity: A strong monotonic but curved relationship (e.g., exponential) yields r < 1 even though the relationship is perfectly predictable.
  • Non-normality: Hypothesis tests on r assume bivariate normality; inference is distorted for heavy-tailed distributions.
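The first two sensitivities can be seen numerically. The sketch below computes r straight from the definition using only the standard library; the data sets are made up for illustration, and in practice a library routine such as scipy.stats.pearsonr would normally be used instead of the hand-written helper.

```python
import math

def pearson(x, y):
    """Pearson r = Cov(X, Y) / (SD(X) * SD(Y)), computed from the definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (an assumption, not from any real study):
# ten points with essentially no linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 3, 6, 2, 7, 4, 6, 3, 5, 4]
r_clean = pearson(x, y)

# The same data with a single extreme point appended.
r_outlier = pearson(x + [100], y + [100])

# A perfectly monotonic but curved (exponential) relationship.
x_curve = list(range(1, 11))
y_curve = [math.exp(v) for v in x_curve]
r_curve = pearson(x_curve, y_curve)

print(f"no outlier:   r = {r_clean:.3f}")
print(f"one outlier:  r = {r_outlier:.3f}")
print(f"exponential:  r = {r_curve:.3f}")
```

On this toy data the single added point moves r from roughly zero to above 0.9, and the exponential curve scores well below 1 despite being perfectly predictable.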

Spearman correlation (ρ)

Spearman replaces each value with its rank, then computes Pearson on those ranks:

rho = Pearson(rank(X), rank(Y))

Shortcut formula (when no ties): rho = 1 - (6 * sum(d_i^2)) / (n * (n^2 - 1)) where d_i = rank(x_i) - rank(y_i).

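Spearman equals 1 for any strictly increasing relationship (and -1 for any strictly decreasing one), not just linear ones. Because ranks are bounded between 1 and n, an extreme raw value can sit at most n - 1 rank positions away from any other point, so the influence of outliers is automatically capped.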
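A minimal sketch of both routes to rho, assuming tie-free data (with ties, average ranks are needed and the shortcut formula is no longer exact). The helper names and the sample values are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson r from the definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """rho = Pearson(rank(X), rank(Y))."""
    return pearson(ranks(x), ranks(y))

def spearman_shortcut(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative data: the last y value is extreme, but its rank is just 5.
x = [10, 20, 30, 40, 50]
y = [1.2, 0.9, 3.5, 2.8, 1000.0]
rho = spearman(x, y)
rho_short = spearman_shortcut(x, y)

# A strictly increasing but curved relationship gives rho = 1.
x_curve = list(range(1, 11))
rho_curve = spearman(x_curve, [math.exp(v) for v in x_curve])

print(rho, rho_short, rho_curve)
```

Both routes agree on the tie-free sample, and the value 1000.0 contributes no more than any other largest observation would.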

Choosing between them

Scenario | Use
Both variables continuous, relationship expected to be linear, no extreme outliers | Pearson
Ordinal variables (e.g., survey ratings) | Spearman
Heavy-tailed distributions or known outliers | Spearman
Monotonic but curved relationship | Spearman
Inference needed but the normality assumption is doubtful | Spearman or bootstrap Pearson
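For the bootstrap Pearson option, one common approach is a percentile bootstrap: resample (x, y) pairs with replacement, recompute r each time, and read the interval off the sorted results. The sketch below is one simple variant under stated assumptions; the function name, the 2000 resamples, the fixed seed, and the sample data are all illustrative choices, not a prescribed procedure.

```python
import math
import random

def pearson(x, y):
    """Pearson r from the definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_pearson_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs together
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample: r is undefined, draw again
        stats.append(pearson(xs, ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Made-up data with one unusually large y value.
x = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.3, 8.0, 9.1, 10.4, 11.2, 12.5]
y = [2.3, 1.9, 4.1, 3.8, 6.5, 5.2, 7.9, 7.1, 30.0, 9.4, 11.8, 10.9]
lo, hi = bootstrap_pearson_ci(x, y)
print(f"approximate 95% interval for r: [{lo:.2f}, {hi:.2f}]")
```

The interval makes no normality assumption, though it remains an interval for Pearson's r and so still inherits r's sensitivity to outliers.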

A diagnostic approach

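Compute both. If they agree closely, Pearson is likely fine. A large gap between them suggests that outliers or non-linearity are distorting the Pearson estimate: Pearson far above Spearman typically points to a few extreme points, while Spearman far above Pearson typically points to a monotonic but curved relationship.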
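This check can be sketched as a small helper that returns both coefficients and a flag. The 0.2 gap threshold is an arbitrary illustrative value, not a standard cutoff, and the helper names and data are hypothetical. Ranks here are averaged across ties, which is the usual convention when tied values are present.

```python
import math

def pearson(x, y):
    """Pearson r from the definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def compare_correlations(x, y, gap=0.2):
    """Return (pearson_r, spearman_rho, flagged); gap is an assumed threshold."""
    r = pearson(x, y)
    rho = pearson(average_ranks(x), average_ranks(y))
    return r, rho, abs(r - rho) > gap

# Made-up data: trendless values plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
y = [5, 3, 6, 2, 7, 4, 6, 3, 5, 4, 100]
r, rho, flagged = compare_correlations(x, y)
print(f"Pearson = {r:.2f}, Spearman = {rho:.2f}, flagged = {flagged}")
```

Here Pearson comes out far above Spearman, which is the pattern expected when a single extreme point is driving the linear estimate.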

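Kendall’s tau is a third option: also rank-based, and generally considered more robust than Spearman for small samples, but more expensive to compute. The direct pairwise definition is O(n^2) (O(n log n) with a merge-sort-based algorithm), whereas Spearman needs only a sort, which is O(n log n).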
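The pair-counting definition of Kendall's tau can be sketched directly. This naive version compares every pair of observations and so runs in O(n^2); faster O(n log n) algorithms exist, and a library routine such as scipy.stats.kendalltau would normally be used in practice. The sketch assumes no ties (the tau-a variant), and the data is illustrative.

```python
def kendall_tau_naive(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs. Assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Positive product: the pair is ordered the same way in x and y.
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Illustrative tie-free data with one extreme y value.
x = [10, 20, 30, 40, 50]
y = [1.2, 0.9, 3.5, 2.8, 1000.0]
tau = kendall_tau_naive(x, y)
print(tau)
```

Of the 10 pairs in this sample, 8 are concordant and 2 are discordant, so tau is 0.6; as with Spearman, the extreme value matters only through its ordering.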
