datarekha
Statistics & Probability · Medium · Asked at Google, Amazon, Meta, Microsoft

What is Simpson's paradox? Walk through a concrete example.

The short answer

Simpson's paradox occurs when a trend that appears in several subgroups disappears or reverses when those subgroups are combined. It arises when a lurking variable is correlated with both the treatment and the outcome, so the subgroups carry different weights in each treatment group and the aggregate is distorted.

How to think about it

Simpson’s paradox is the phenomenon where an association present in every subgroup reverses in the combined data. It is not a mathematical contradiction; it is a signal that group membership is a confounder that must be controlled for.

Worked numeric example — hospital survival rates

Two hospitals treat the same two conditions: mild cases and severe cases.

Hospital A

Condition   Survived   Total   Rate
Mild        800        1 000   80%
Severe      200        1 000   20%
Combined    1 000      2 000   50%

Hospital B

Condition   Survived   Total   Rate
Mild        90         100     90%
Severe      400        900     44%
Combined    490        1 000   49%

Within each condition, Hospital B has a higher survival rate (90% vs 80% for mild; 44% vs 20% for severe). Yet the combined rate favours Hospital A (50% vs 49%).

The reason: half of Hospital A's patients are mild cases (1 000 out of 2 000 = 50%), compared with only 10% at Hospital B (100 out of 1 000), and the high mild-case survival rate inflates A's aggregate. Hospital B is disproportionately sent severe cases (900 out of 1 000 = 90%), dragging its aggregate down. Condition severity is the lurking confounder.
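The reversal can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts from the two tables above; the names `counts`, `rate`, `combined_rate`, and `mild_share` are illustrative and not part of any library.

```python
# (survived, total) per condition, copied from the two hospital tables.
counts = {
    "A": {"mild": (800, 1000), "severe": (200, 1000)},
    "B": {"mild": (90, 100), "severe": (400, 900)},
}


def rate(survived, total):
    """Survival rate for a single group."""
    return survived / total


def combined_rate(hospital):
    """Pooled survival rate: total survivors over total patients."""
    survived = sum(s for s, _ in counts[hospital].values())
    total = sum(t for _, t in counts[hospital].values())
    return survived / total


def mild_share(hospital):
    """Fraction of a hospital's patients that are mild cases (its case mix)."""
    total = sum(t for _, t in counts[hospital].values())
    return counts[hospital]["mild"][1] / total


# Within each condition, Hospital B has the higher rate.
for condition in ("mild", "severe"):
    a = rate(*counts["A"][condition])
    b = rate(*counts["B"][condition])
    print(f"{condition:>8}: A={a:.1%}  B={b:.1%}")

# Pooled over conditions, the ordering flips in favour of Hospital A.
print(f"combined: A={combined_rate('A'):.1%}  B={combined_rate('B'):.1%}")

# The case mix explains the flip: A is half mild, B is mostly severe.
print(f"mild share: A={mild_share('A'):.0%}  B={mild_share('B'):.0%}")
```

Comparing the per-condition lines with the combined line shows the same ordering change described in the text, and the mild-share line shows how different the two case mixes are.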

Inline diagram — within-group vs aggregate trend

[Figure: survival rates for Hospital A and Hospital B on an axis marked 20%, 50%, and 90%, with lines for Mild (B better), Severe (B better), and Aggregate (A ≥ B).]
Within each condition Hospital B outperforms, yet the aggregate slightly favours Hospital A (50% vs 49%): a classic Simpson reversal caused by case-mix imbalance.

Why it happens mathematically

The aggregate rate is a weighted average of subgroup rates where the weights (group sizes) differ between A and B. When groups with systematically different base rates also differ in size across treatments, the weights distort the aggregate.
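Written out for the hospital example, the weighted average looks as follows. The notation is an assumption introduced here for illustration: $r_{h,g}$ is the survival rate of hospital $h$ in condition $g$, $n_{h,g}$ is the number of patients, and $w_{h,g}$ is the resulting weight.

```latex
R_h = \sum_{g} w_{h,g}\, r_{h,g},
\qquad
w_{h,g} = \frac{n_{h,g}}{\sum_{g'} n_{h,g'}}

R_A = \tfrac{1000}{2000}\cdot 0.80 + \tfrac{1000}{2000}\cdot 0.20 = 0.40 + 0.10 = 0.50

R_B = \tfrac{100}{1000}\cdot 0.90 + \tfrac{900}{1000}\cdot \tfrac{400}{900} = 0.09 + 0.40 = 0.49
```

Hospital B has the larger rate in both terms, but 90% of its weight sits on the low-rate severe group, while Hospital A splits its weight evenly. That difference in weights alone is enough to pull $R_B$ below $R_A$.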

