datarekha
Machine Learning | Easy | Asked at Google, Amazon, Microsoft

What is the accuracy paradox and how does it expose the failure of accuracy as a metric?

The short answer

The accuracy paradox occurs when a trivial model, one that always predicts the majority class, achieves high accuracy on an imbalanced dataset despite having zero predictive power for the minority class. A model that predicts 'not fraud' on every transaction achieves 99.9% accuracy if fraud is 0.1% of the data, yet its recall for fraud is zero. Accuracy is only informative when classes are roughly balanced and the two kinds of error cost about the same.

How to think about it

Lead with the clearest concrete example, derive the numbers, then show what metric to use instead.

The paradox in one example

A credit-card fraud dataset: 99,900 legitimate transactions and 100 fraudulent ones (0.1% fraud rate).

A “model” that predicts legitimate for every single row achieves:

  • Accuracy = 99,900 / 100,000 = 99.9%
  • Recall for fraud = 0 / 100 = 0%
  • Precision for fraud = undefined (no positive predictions)
  • F1 for fraud = 0
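The arithmetic above can be checked with a short sketch in plain Python (no libraries assumed). The labels reproduce the example's 99,900 / 100 split with 1 meaning fraud; `confusion_counts` and `binary_report` are hypothetical helper names. F1 is computed as 2·TP / (2·TP + FP + FN), which stays defined even when precision is not.

```python
# Labels: 1 = fraud, 0 = legitimate (counts from the example above).
y_true = [0] * 99_900 + [1] * 100
# The trivial "model": predict legitimate for every row.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_report(y_true, y_pred, positive=1):
    """Accuracy plus positive-class precision, recall and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined (0 / 0) when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # 2TP / (2TP + FP + FN) equals the harmonic mean of P and R when both exist.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(binary_report(y_true, y_pred))
# → {'accuracy': 0.999, 'precision': None, 'recall': 0.0, 'f1': 0.0}
```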

The 99.9% figure is misleading in the worst way: it is exactly the majority-class share, so it measures the class distribution, not anything the model has learned. Any real fraud model must beat this baseline on fraud-specific metrics (recall, precision, PR-AUC), not on accuracy.

Why accuracy fails in this setting

Accuracy treats every mistake equally: missing one fraud counts the same as wrongly flagging one legitimate transaction. On imbalanced data, the cheapest route to high accuracy is never to predict the minority class, and a model that is tuned or selected on accuracy will readily take that route.
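To make the equal-weighting problem concrete, this sketch compares the trivial model with a hypothetical detector on the same 100,000 transactions. The detector's counts (90 frauds caught, 10 missed, 100 legitimate rows wrongly flagged) are invented for illustration, not taken from any real system.

```python
def accuracy_and_recall(tp, fp, fn, tn):
    """Accuracy and positive-class recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# 99,900 legitimate and 100 fraudulent transactions, as in the example.
# Trivial model: never flags fraud, so all 100 frauds are missed.
trivial = accuracy_and_recall(tp=0, fp=0, fn=100, tn=99_900)
# Hypothetical detector (illustrative counts only): catches 90 frauds,
# misses 10, and wrongly flags 100 legitimate transactions.
detector = accuracy_and_recall(tp=90, fp=100, fn=10, tn=99_800)

print(trivial)   # (0.999, 0.0)
print(detector)  # (0.9989, 0.9)
```

The detector makes 110 mistakes against the trivial model's 100, so accuracy ranks it lower even though it catches 90% of the fraud.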

When accuracy is valid

Accuracy is a fine metric when:

  1. Classes are approximately balanced (for a binary problem, say, the minority class makes up at least 30–40% of samples).
  2. The costs of false positives (FP) and false negatives (FN) are roughly symmetric.
  3. You are comparing multiple models head-to-head on the same balanced dataset.
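Condition 1 can be checked mechanically from the labels; conditions 2 and 3 cannot. A minimal sketch for a binary problem, assuming a 0.3 cutoff taken from the low end of the range above (`minority_share` and `accuracy_looks_reasonable` are hypothetical names):

```python
from collections import Counter


def minority_share(labels):
    """Fraction of samples that belong to the rarest class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


def accuracy_looks_reasonable(labels, min_share=0.3):
    """Heuristic check of the balance condition only.

    min_share is an assumed cutoff. Cost symmetry cannot be read from the
    labels and has to be judged from the problem itself.
    """
    return minority_share(labels) >= min_share


fraud_labels = [0] * 99_900 + [1] * 100
balanced_labels = [0] * 550 + [1] * 450

print(minority_share(fraud_labels))                # 0.001
print(accuracy_looks_reasonable(fraud_labels))     # False
print(accuracy_looks_reasonable(balanced_labels))  # True
```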

What to use instead

Situation                          Preferred metric
Imbalanced binary classification   F1 (positive class), PR-AUC
Rare events (fraud, disease)       Recall @ fixed FPR, PR-AUC
Multiple minority classes          Macro F1 (weighted F1 leans toward frequent classes)
Ranked retrieval                   MAP, NDCG
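The two rare-event metrics in the table can be sketched in plain Python. This assumes the model outputs one score per transaction (higher means more likely fraud) and uses average precision, the step-wise sum of (R_k − R_{k−1}) · P_k over thresholds, as the PR-AUC estimate; trapezoidal PR-AUC gives slightly different numbers. The function names are hypothetical and the six-row dataset is a toy.

```python
def threshold_sweep(y_true, scores):
    """Yield cumulative (tp, fp) after admitting each distinct score, high to low.

    Tied scores are admitted together, so each yield is one threshold.
    """
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield tp, fp


def average_precision(y_true, scores):
    """Step-wise PR-curve summary: sum of (R_k - R_{k-1}) * P_k."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ap, prev_recall = 0.0, 0.0
    for tp, fp in threshold_sweep(y_true, scores):
        recall = tp / total_pos
        precision = tp / (tp + fp)  # tp + fp >= 1 at every threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def recall_at_fpr(y_true, scores, max_fpr):
    """Best recall over thresholds whose false-positive rate is <= max_fpr."""
    total_pos = sum(y_true)
    total_neg = len(y_true) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need both positive and negative labels")
    best = 0.0  # flagging nothing: recall 0 at FPR 0
    for tp, fp in threshold_sweep(y_true, scores):
        if fp / total_neg > max_fpr:
            break  # FPR never decreases as the threshold drops
        best = tp / total_pos
    return best


# Toy scores for six transactions (1 = fraud); illustrative only.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

print(round(average_precision(y_true, scores), 4))    # 0.7222
print(round(recall_at_fpr(y_true, scores, 0.34), 4))  # 0.6667
```

In practice, scikit-learn's `sklearn.metrics.average_precision_score` uses the same step-wise definition, and `f1_score(..., average="macro")` covers the multi-class row. A useful sanity check: a constant score yields average precision equal to the positive rate, so on the fraud data the trivial model's PR-AUC baseline is 0.001, not 0.999.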