datarekha
Statistics & Probability · Easy · Asked at Google, Amazon, Meta, Netflix

What is the difference between correlation and causation, and why does the distinction matter?

The short answer

Correlation measures how strongly two variables move together (Pearson's r captures the linear part of that relationship). A shared cause, reverse causation, or coincidence can each produce a correlation between X and Y without X causing Y. Treating correlation as causation leads to interventions that fail or cause harm.

How to think about it

Correlation quantifies co-movement; causation asserts that changing one variable produces a change in another. Conflating the two is one of the most common and consequential errors in applied analysis.

Why correlation is not causation

A Pearson correlation of r = 0.9 between X and Y means they move together strongly. It does not tell you:

  • Whether X causes Y
  • Whether Y causes X
  • Whether a third variable Z drives both (confounding)
  • Whether the relationship is spurious coincidence

Classic example: Ice cream sales and drowning rates rise and fall together over the year, peaking in summer. The confounder is hot weather: it drives both. Banning ice cream would not reduce drowning deaths.
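
The ice cream example can be checked with a small simulation. The sketch below is illustrative only: the variable names, coefficients, and noise levels are made-up assumptions, not real data. Temperature drives both series, neither series affects the other, and the correlation is recomputed after the linear effect of temperature has been removed from each.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """Residuals of y after a least-squares line fit on z."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data: temperature is the only driver of both series.
temperature = [rng.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Correlation as observed, ignoring temperature.
raw_r = pearson(ice_cream, drownings)

# Correlation after removing the linear effect of temperature from both.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:          {raw_r:.2f}")
print(f"controlling for weather:  {partial_r:.2f}")
```

Under these assumed parameters the raw correlation is strongly positive, while the correlation of the residuals is close to zero, which is what a pure common-cause structure predicts.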

Three causal structures that all produce correlation

Structure          Description
X → Y              Direct causation
Y → X              Reverse causation
Z → X and Z → Y    Common cause (confounding)

Each of the three can produce the same observed correlation between X and Y. Without further assumptions or an intervention, observational data on X and Y alone cannot distinguish them.
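
A minimal simulation, assuming simple linear models with Gaussian noise, suggests how the three structures can look identical from the correlation alone. All coefficients are hypothetical; the 0.64 scale in the common-cause case is chosen so that its expected correlation roughly matches the other two.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

def noise():
    return rng.gauss(0, 1)

# Structure 1: X -> Y (Y is X plus independent noise).
x1 = [noise() for _ in range(n)]
y1 = [x + noise() for x in x1]

# Structure 2: Y -> X (X is Y plus independent noise).
y2 = [noise() for _ in range(n)]
x2 = [y + noise() for y in y2]

# Structure 3: Z -> X and Z -> Y (no arrow between X and Y).
z = [noise() for _ in range(n)]
x3 = [zi + 0.64 * noise() for zi in z]
y3 = [zi + 0.64 * noise() for zi in z]

r_direct = pearson(x1, y1)
r_reverse = pearson(x2, y2)
r_common = pearson(x3, y3)

print(f"X -> Y:        r = {r_direct:.2f}")
print(f"Y -> X:        r = {r_reverse:.2f}")
print(f"Z -> X, Z -> Y: r = {r_common:.2f}")
```

The three printed values should be close to one another (near 0.7 in expectation for these settings), even though the data-generating mechanisms differ.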

How to establish causation

  1. Randomised controlled trial (RCT): randomly assign treatment so confounders are balanced.
  2. Natural experiment: exploit an exogenous shock (lottery, policy change) that mimics randomisation.
  3. Causal graph + do-calculus: encode assumptions explicitly and identify adjustment sets.
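  4. Granger causality: a time-series test of whether past X predicts future Y beyond what past Y alone predicts. It shows predictive precedence rather than causation, since an unobserved common driver can produce the same pattern.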
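
To illustrate why randomisation (item 1) helps, the sketch below compares a naive difference in means on observational data with the same estimate under random assignment. Everything here is a hypothetical setup: a single confounder z, an assumed true treatment effect of 1.0, and linear outcomes with Gaussian noise.

```python
import random

TRUE_EFFECT = 1.0  # assumed for the simulation, not an empirical value

def mean(values):
    return sum(values) / len(values)

def diff_in_means(treated_flags, outcomes):
    """Mean outcome of treated units minus mean outcome of controls."""
    treated = [y for t, y in zip(treated_flags, outcomes) if t]
    control = [y for t, y in zip(treated_flags, outcomes) if not t]
    return mean(treated) - mean(control)

def simulate(randomised, n=20000, seed=2):
    """Return the difference-in-means estimate for one simulated study."""
    rng = random.Random(seed)
    flags, outcomes = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # confounder, affects the outcome directly
        if randomised:
            # RCT: a coin flip decides treatment, independent of z.
            treated = rng.random() < 0.5
        else:
            # Observational: units with high z are more likely to be treated.
            treated = z + rng.gauss(0, 1) > 0
        y = TRUE_EFFECT * treated + 2.0 * z + rng.gauss(0, 1)
        flags.append(treated)
        outcomes.append(y)
    return diff_in_means(flags, outcomes)

naive_estimate = simulate(randomised=False)
rct_estimate = simulate(randomised=True)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"observational estimate:   {naive_estimate:.2f}")
print(f"randomised estimate:      {rct_estimate:.2f}")
```

Under these assumptions the observational estimate overstates the effect, because treated units also tend to have higher z, while the randomised estimate lands near the assumed true effect.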
