Machine Learning | Hard | Asked at Google, Meta, Stripe, Airbnb

What is the Precision-Recall curve, and why does it outperform ROC-AUC on imbalanced datasets?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

The PR curve plots precision against recall as the decision threshold varies. On imbalanced datasets it is more informative than the ROC curve because neither precision nor recall uses true negatives. ROC's false positive rate divides by the (huge) number of negatives, so even many false positives barely move it, and a model that looks good on ROC can still have dismal precision, which PR-AUC immediately exposes. PR-AUC is the better metric whenever the positive class is rare and what matters is the quality of the positive predictions rather than how well the model ranks the mostly-negative data overall.

How to think about it

Explain the PR curve axes, derive why FPR is misleading when negatives dominate, then show the numeric example that makes the difference concrete.

The PR curve

x-axis: Recall = TP / (TP + FN)
y-axis: Precision = TP / (TP + FP)

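As the threshold falls, recall climbs (you catch more positives) but precision typically falls (you also pick up more false positives); unlike recall, precision is not guaranteed to move monotonically. The curve starts near recall 0 with high precision and ends at (1, base rate): once every example is predicted positive, precision equals the fraction of positives in the data. A model with no skill sits on the horizontal line y = base rate. The area under the PR curve, reported as PR-AUC or as Average Precision (AP, a step-wise sum over thresholds), summarises performance across all thresholds.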
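A minimal pure-Python sketch of the threshold sweep described above: sort by score, lower the threshold one distinct score at a time, and record (recall, precision) after each step. AP is then the step-wise sum of (R_n - R_{n-1}) * P_n, which is the definition scikit-learn's average_precision_score documents. The function names and the toy data are hypothetical, chosen only for illustration.

```python
def pr_curve(y_true, scores):
    """Sweep the threshold from the highest score down and return one
    (recall, precision) point per distinct score value."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    points = []
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Everything scoring >= threshold is predicted positive, so examples
        # with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area: sum of (R_n - R_{n-1}) * P_n over the curve points."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy data: 4 positives among 10 examples, so base rate = 0.4.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

points = pr_curve(y_true, scores)
print(points[0])    # (0.25, 1.0): highest threshold, one true positive
print(points[-1])   # (1.0, 0.4): everything predicted positive, precision = base rate
print(round(average_precision(points), 4))  # 0.747
```

The last point lands exactly on (1, base rate), as the text says; the AP here equals the mean of the precision values observed each time a new positive is recovered.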

Why ROC misleads on imbalanced data

ROC’s y-axis is True Positive Rate = TP / (TP + FN), which is identical to recall; its x-axis is False Positive Rate = FP / (FP + TN).

When the negative class is enormous (say 10,000 negatives and 100 positives), even 500 false positives produce an FPR of only 500 / 10,000 = 0.05. If that threshold also catches 80 of the 100 positives (TPR = 0.80), the ROC point (0.05, 0.80) looks respectable. But those 500 false positives swamp the 80 true positives: precision = 80 / (80 + 500) ≈ 0.14, more than six false alarms for every real positive. The model is close to useless for finding positives, yet its ROC-AUC might still be around 0.93.

PR-AUC uses precision directly, so this collapse is immediately visible.
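The arithmetic can be checked directly. The counts below are the ones from the example (100 positives, 10,000 negatives, 80 true positives, 500 false positives); nothing else is assumed.

```python
# Counts from the example: 100 positives, 10,000 negatives, and a threshold
# that yields 80 true positives and 500 false positives.
n_pos, n_neg = 100, 10_000
tp, fp = 80, 500

fn = n_pos - tp              # positives the model missed
tn = n_neg - fp              # negatives correctly rejected

recall = tp / (tp + fn)      # = TPR, the y-axis of ROC
fpr = fp / (fp + tn)         # ROC x-axis: FP diluted by all 10,000 negatives
precision = tp / (tp + fp)   # PR y-axis: FP compared with TP only
fp_per_tp = fp / tp          # false alarms per real positive

print(f"recall={recall:.2f}  FPR={fpr:.2f}  precision={precision:.2f}  FP per TP={fp_per_tp:.2f}")
# recall=0.80  FPR=0.05  precision=0.14  FP per TP=6.25
```

The same 500 false positives appear in both denominators; only FPR divides them by the size of the negative class, which is why it stays small.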

Numeric example

Suppose a test set has 100 positives and 9,900 negatives (1% prevalence). A model scores:

Metric	Value
ROC-AUC	0.96
Precision at 90% recall	0.08
PR-AUC	0.41

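The ROC number looks excellent, but the precision row shows that at the 90%-recall operating point only 8 of every 100 positive predictions are true positives, which would be unacceptable for most fraud-detection workloads. PR-AUC (0.41) summarises that weakness across all thresholds, where ROC-AUC hides it.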
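The table's figures are illustrative; taking them at face value, the 90%-recall operating point implies the counts below. Only the prevalence, recall, and precision from the table are used.

```python
n_pos, n_neg = 100, 9_900
recall, precision = 0.90, 0.08   # operating point from the table

tp = recall * n_pos              # positives caught at this threshold
predicted_pos = tp / precision   # total flagged, since precision = TP / flagged
fp = predicted_pos - tp          # false alarms
fpr = fp / n_neg                 # where this point sits on the ROC x-axis

print(f"TP={tp:.0f}  flagged={predicted_pos:.0f}  FP={fp:.0f}  FPR={fpr:.3f}")
# TP=90  flagged=1125  FP=1035  FPR=0.105
```

An FPR of roughly 10% looks modest on a ROC plot, but against 9,900 negatives it means over a thousand false alarms, more than eleven for each fraud actually caught.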

When to use each

Situation	Prefer
Balanced classes, or true negatives matter	ROC-AUC
Rare positives (fraud, disease, anomaly)	PR-AUC / AP
Ranking quality across all thresholds	ROC-AUC
Retrieving high-precision positive predictions	PR-AUC
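A small, entirely synthetic simulation illustrates the "rare positives" row: the same scoring model is evaluated once with balanced classes and once at 1% prevalence, changing only how many negatives are included. The Gaussian score distributions, the seed, and the helper names roc_auc and average_precision are assumptions for illustration; scikit-learn's roc_auc_score and average_precision_score compute the same quantities. For this equal-variance setup you should expect ROC-AUC close to Φ(√2) ≈ 0.92 in both rows, while AP drops sharply in the rare case.

```python
import bisect
import random


def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney statistic: the probability that a random
    positive outscores a random negative, with ties counted as half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = sorted(s for s, y in zip(scores, y_true) if y == 0)
    wins = 0.0
    for p in pos:
        below = bisect.bisect_left(neg, p)          # negatives strictly below p
        ties = bisect.bisect_right(neg, p) - below   # negatives equal to p
        wins += below + 0.5 * ties
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k where a positive appears
    (matches the step-wise AP definition when there are no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k
    return total / tp


random.seed(0)
N_POS = 1_000
# Hypothetical scorer: positives ~ N(2, 1), negatives ~ N(0, 1).
pos_scores = [random.gauss(2.0, 1.0) for _ in range(N_POS)]
neg_scores = [random.gauss(0.0, 1.0) for _ in range(99_000)]


def evaluate(n_neg):
    """Score the same model on all positives plus the first n_neg negatives."""
    scores = pos_scores + neg_scores[:n_neg]
    labels = [1] * N_POS + [0] * n_neg
    return roc_auc(labels, scores), average_precision(labels, scores)


auc_bal, ap_bal = evaluate(1_000)     # 50% prevalence
auc_rare, ap_rare = evaluate(99_000)  # 1% prevalence
print(f"balanced  ROC-AUC={auc_bal:.3f}  AP={ap_bal:.3f}")
print(f"1% rare   ROC-AUC={auc_rare:.3f}  AP={ap_rare:.3f}")
```

Because FPR is normalised by the number of negatives, ROC-AUC is essentially unchanged when negatives are added from the same distribution; AP, which compares false positives with true positives, absorbs the full cost of the extra negatives.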

Baseline reference

ROC baseline for a random classifier: diagonal line, AUC = 0.5.
PR baseline for a random classifier: horizontal line at y = prevalence (not 0.5).
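Why the two baselines differ follows from a short derivation. Assume a classifier whose scores are independent of the label, with P positives, N negatives, and prevalence π = P / (P + N), and suppose a threshold flags a random fraction q of all examples. Working with expected counts:

```latex
\begin{aligned}
\mathbb{E}[TP] &= qP, \qquad \mathbb{E}[FP] = qN \\
\mathbb{E}[\text{Recall}] = \mathbb{E}[\text{TPR}] &= \frac{qP}{P} = q, \qquad \mathbb{E}[\text{FPR}] = \frac{qN}{N} = q \\
\text{Precision} &\approx \frac{\mathbb{E}[TP]}{\mathbb{E}[TP] + \mathbb{E}[FP]} = \frac{qP}{qP + qN} = \frac{P}{P + N} = \pi
\end{aligned}
```

Sweeping q from 0 to 1 traces the diagonal TPR = FPR on the ROC plot, giving AUC = 0.5, while precision stays at π for every recall, so the no-skill PR-AUC is about π rather than 0.5.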
