What is the Precision-Recall curve, and why does it outperform ROC-AUC on imbalanced datasets?
The PR curve plots precision against recall as the decision threshold varies. On imbalanced datasets it is more informative than ROC because neither of its axes involves true negatives, whose sheer number keeps the false positive rate low and ROC-AUC high. A model that looks good on ROC can still have dismal precision, which the PR curve exposes immediately. Prefer PR-AUC whenever the positive class is rare and the quality of the positive predictions matters more than how well positives are ranked against negatives overall.
How to think about it
Explain the PR curve axes, derive why FPR is misleading when negatives dominate, then show the numeric example that makes the difference concrete.
The PR curve
As the threshold falls, recall climbs (you catch more positives) but precision typically falls (you also pick up more false positives). For a good model the curve starts near recall 0 with high precision and ends at recall 1 with precision equal to the base rate (the fraction of positives), because at the lowest threshold everything is predicted positive. A model with no skill sits at the horizontal line y = base rate. The area under the PR curve (PR-AUC), commonly estimated as Average Precision (AP), summarises performance across all thresholds.
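To make the threshold sweep concrete, here is a minimal sketch in plain Python. The function names `pr_curve` and `average_precision` and the toy labels and scores are illustrative assumptions, not part of any library. AP is computed here as the step-wise sum of recall gains weighted by precision, which is one common estimator of PR-AUC.

```python
def pr_curve(y_true, scores):
    """Return (recall, precision) points, one per distinct score threshold,
    ordered from the highest threshold to the lowest."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Everything scoring >= threshold is predicted positive; ties move together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area: sum of (recall gain) * (precision at that threshold)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical toy data: 4 positives out of 8, so the base rate is 0.5.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = pr_curve(y_true, scores)
ap = average_precision(points)
print(points[0])     # (0.25, 1.0): highest threshold, one correct prediction
print(points[-1])    # (1.0, 0.5): lowest threshold, precision equals the base rate
print(round(ap, 3))  # 0.747
```

The last point lands at precision 0.5 because every example is predicted positive at the lowest threshold, which is why the curve always ends at the base rate.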
Why ROC misleads on imbalanced data
ROC’s x-axis is False Positive Rate = FP / (FP + TN).
When the negative class is enormous (say 10,000 negatives, 100 positives), even 500 false positives produce an FPR of only 500 / 10,000 = 0.05, so the curve looks well-behaved. Now suppose the model catches 80 of the 100 positives at that threshold (recall 0.80). The 500 false positives still swamp the 80 true positives: precision = 80 / (80 + 500) ≈ 0.14. Roughly six out of every seven alerts are wrong, yet ROC-AUC might report 0.93.
PR-AUC uses precision directly, so this collapse is immediately visible.
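The arithmetic behind that contrast can be checked in a few lines. This is a sketch using the counts from the example above (10,000 negatives, 100 positives, 500 false positives, 80 true positives); the variable names are illustrative.

```python
negatives, positives = 10_000, 100
fp, tp = 500, 80

tn = negatives - fp   # 9,500 true negatives
fn = positives - tp   # 20 missed positives

fpr = fp / (fp + tn)        # ROC x-axis: diluted by the huge TN count
recall = tp / (tp + fn)     # shared by both curves (TPR)
precision = tp / (tp + fp)  # PR y-axis: TN never appears

print(f"FPR       = {fpr:.3f}")        # 0.050
print(f"Recall    = {recall:.3f}")     # 0.800
print(f"Precision = {precision:.3f}")  # 0.138
```

TN appears only in the FPR denominator, so adding more negatives that the model rejects correctly shrinks FPR further while leaving precision untouched.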
Numeric example
Suppose 100 positives, 9,900 negatives (1% prevalence). A model scores:
| Metric | Value |
|---|---|
| ROC-AUC | 0.96 |
| Precision at 90% recall | 0.08 |
| PR-AUC | 0.41 |
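The ROC number looks excellent, but the precision at 90% recall shows that only 8 of every 100 positive predictions are true positives, which is unacceptable for fraud detection. The PR-AUC of 0.41 signals the same weakness in a single number, where the ROC-AUC of 0.96 hides it.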
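The "precision at 90% recall" row can be unpacked into the confusion counts it implies. This sketch assumes only the figures in the table (100 positives, 9,900 negatives, recall 0.90, precision 0.08); it does not reproduce the ROC-AUC or PR-AUC values, which depend on the full score distribution.

```python
positives, negatives = 100, 9_900
recall, precision = 0.90, 0.08

tp = round(recall * positives)         # 90 positives caught
predicted_pos = round(tp / precision)  # 1,125 alerts raised in total
fp = predicted_pos - tp                # 1,035 of them are false alarms
fpr = fp / negatives

print(tp, predicted_pos, fp)  # 90 1125 1035
print(f"FPR = {fpr:.3f}")     # 0.105
```

More than a thousand false alarms correspond to an FPR of about 0.10, which is small enough that the ROC curve still hugs the top-left corner.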
When to use each
| Situation | Prefer |
|---|---|
| Balanced classes, or true negatives matter | ROC-AUC |
| Rare positives (fraud, disease, anomaly) | PR-AUC / AP |
| Ranking quality across all thresholds | ROC-AUC |
| Retrieving high-precision positive predictions | PR-AUC |
Baseline reference
- ROC baseline for a random classifier: diagonal line, AUC = 0.5.
- PR baseline for a random classifier: horizontal line at y = prevalence (not 0.5).
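Both baselines can be checked empirically by scoring examples with pure noise. The sketch below assumes a hypothetical dataset of 20,000 examples at 1% prevalence and uses hand-written helpers (`roc_auc`, `average_precision`) rather than any library API. Both helpers assume there are no tied scores, which holds in practice for random floats.

```python
import random


def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if y_true[i] == 1)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / k  # precision among the top-k predictions
    return total / n_pos


rng = random.Random(0)                # fixed seed for repeatability
y_true = [1] * 200 + [0] * 19_800     # 1% prevalence
scores = [rng.random() for _ in y_true]  # scores carry no information

prevalence = sum(y_true) / len(y_true)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"prevalence = {prevalence:.3f}")
print(f"ROC-AUC    = {auc:.3f}")  # expected to land near 0.5
print(f"AP         = {ap:.3f}")   # expected to land near the prevalence, not 0.5
```

A PR-AUC should therefore be read against the prevalence of the dataset it was measured on: 0.41 at 1% prevalence is far above chance, while the same value at 40% prevalence would be no better than guessing.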