
What is the accuracy paradox and how does it expose the failure of accuracy as a metric?


The short answer

The accuracy paradox occurs when a trivial model, one that always predicts the majority class, achieves high accuracy on an imbalanced dataset despite having zero predictive power for the minority class. A model that predicts 'not fraud' on every transaction achieves 99.9% accuracy if fraud is 0.1% of the data, yet its recall on fraud is zero. Accuracy is only a trustworthy headline metric when classes are roughly balanced and the two kinds of error cost about the same.

How to think about it

The accuracy paradox is the gap between a number that looks excellent and a model that has learned nothing at all. Interviewers love it because it catches people who reach for accuracy by reflex — and it rewards anyone whose instinct is to ask, “accurate compared to what?”

The paradox in one example

Picture a credit-card fraud dataset: 99,900 legitimate transactions and 100 fraudulent ones, so fraud is 0.1% of the data. Now build the laziest “model” imaginable — one that ignores its input and predicts legitimate every single time. Score it:

Accuracy = 99,900 / 100,000 = 99.9%
Recall on fraud = 0 / 100 = 0%
Precision on fraud = 0 / 0, undefined, because it never makes a positive prediction
F1 on fraud = 2·0 / (2·0 + 0 + 100) = 0

Ninety-nine point nine percent, and it has never once caught fraud. That accuracy isn't measuring skill; it is just echoing the class distribution back at you. Any model worth deploying has to beat this baseline on fraud-specific metrics such as recall and F1, where the baseline scores zero. Matching its 99.9% accuracy proves nothing.
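The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch: it rebuilds the 99,900 / 100 split from the example and scores an always-"legitimate" predictor. The helper name `binary_metrics` is hypothetical, and it returns `None` for precision when that metric is undefined.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision is 0/0 when the model never predicts the positive class.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    # F1 = 2TP / (2TP + FP + FN) stays defined whenever at least one positive exists.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else None
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 99,900 legitimate (0) and 100 fraudulent (1) transactions, as in the example.
y_true = [0] * 99_900 + [1] * 100
# The "laziest model": predict legitimate for every transaction.
y_pred = [0] * len(y_true)

print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.999, 'precision': None, 'recall': 0.0, 'f1': 0.0}
```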

Why accuracy fails here

Accuracy weighs every mistake the same: missing a real fraud costs exactly as much as falsely flagging a clean transaction. On heavily imbalanced data, that even-handedness backfires. The cheapest route to a high score is to never predict the rare class at all, and if accuracy is the signal you optimise, the model has every incentive to take that route.
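To make the equal-weighting problem concrete, compare the always-legitimate baseline with a hypothetical detector that catches 80 of the 100 frauds at the price of 500 false alarms. The detector's confusion-matrix counts and the 100:1 cost ratio are illustrative assumptions, not figures from the text. Under those assumptions accuracy prefers the useless baseline, while a cost that charges more for a missed fraud prefers the detector.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    tp: int  # frauds correctly flagged
    fp: int  # legitimate transactions wrongly flagged
    fn: int  # frauds missed
    tn: int  # legitimate transactions correctly passed


def accuracy(c: Confusion) -> float:
    # Every correct prediction counts the same, whichever class it belongs to.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def total_cost(c: Confusion, cost_fn: float, cost_fp: float) -> float:
    # Unlike accuracy, weigh each kind of mistake by what it costs.
    return c.fn * cost_fn + c.fp * cost_fp


# Always predicts "legitimate": every fraud is missed, nothing is flagged.
baseline = Confusion(tp=0, fp=0, fn=100, tn=99_900)
# Hypothetical detector: catches 80 frauds, raises 500 false alarms.
detector = Confusion(tp=80, fp=500, fn=20, tn=99_400)

# Assumed costs: a missed fraud is 100x worse than a false alarm.
COST_FN, COST_FP = 100.0, 1.0

for name, c in [("baseline", baseline), ("detector", detector)]:
    print(f"{name}: accuracy={accuracy(c):.4f}, cost={total_cost(c, COST_FN, COST_FP):.0f}")
# baseline: accuracy=0.9990, cost=10000
# detector: accuracy=0.9948, cost=2500
```

The ranking flips only because the cost ratio is lopsided; with costs close to 1:1 the baseline would win on cost too, which is exactly the setting where accuracy is fair.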

When accuracy is the right call

Accuracy earns its keep when the setting is fair to it:

Classes are roughly balanced: in a binary problem, say, the smaller class makes up at least 30–40% of the samples.
A false positive and a false negative cost about the same.
You are comparing models head-to-head on the same balanced set.

Outside those conditions, report it next to a metric that can actually see the minority class.
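The first condition is easy to check before trusting accuracy: look at the share of the smallest class. The sketch below assumes a binary problem and uses the 30% lower end of the rule of thumb above as its default; the function names and threshold are illustrative, not a standard API. It says nothing about the other two conditions, which still need judgement.

```python
from collections import Counter
from typing import Hashable, Sequence


def minority_share(labels: Sequence[Hashable]) -> float:
    """Fraction of samples belonging to the least frequent observed class."""
    counts = Counter(labels)
    return min(counts.values()) / len(labels)


def accuracy_looks_safe(labels: Sequence[Hashable], min_share: float = 0.30) -> bool:
    """Heuristic: treat accuracy as informative only if the smallest class is not too rare."""
    return minority_share(labels) >= min_share


fraud_labels = [0] * 99_900 + [1] * 100
balanced_labels = [0] * 550 + [1] * 450

print(minority_share(fraud_labels), accuracy_looks_safe(fraud_labels))        # 0.001 False
print(minority_share(balanced_labels), accuracy_looks_safe(balanced_labels))  # 0.45 True
```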

What to use instead

Situation	Reach for
Imbalanced binary classification	F1 on the positive class, PR-AUC
Rare events (fraud, disease)	Recall at a fixed false-positive rate, PR-AUC
Several minority classes	Macro-F1 (weighted-F1 weights classes by frequency, so common classes dominate it)
Ranked retrieval	MAP, NDCG
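Most of the classification metrics in the table are short to compute. The sketch below uses plain Python on a made-up ten-sample dataset so it runs without dependencies. The function names are hypothetical, PR-AUC is computed as the step-wise "average precision" summary, and tied scores are not handled; MAP and NDCG are not covered. In practice, scikit-learn's `sklearn.metrics.f1_score` (with `average='macro'`) and `average_precision_score` do the same job. Note how the majority-class predictor ties the model on accuracy but scores zero positive-class F1.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 = 2TP / (2TP + FP + FN) for one class; 0.0 if the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: mean of precision@k over the ranks k where a positive appears.
    Assumes binary 0/1 labels and no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / k
    return total / n_pos


def recall_at_fpr(y_true: Sequence[int], scores: Sequence[float], max_fpr: float) -> float:
    """Best recall reachable by a score threshold whose false-positive rate stays <= max_fpr.
    Assumes binary 0/1 labels and no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    best = 0.0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Toy data (made up for illustration): 3 positives among 10 samples, plus a model's scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

model_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5
majority_pred = [0] * len(y_true)                    # always predict the majority class

for name, pred in [("model", model_pred), ("majority", majority_pred)]:
    acc = sum(t == p for t, p in zip(y_true, pred)) / len(y_true)
    print(f"{name}: accuracy={acc:.3f}, "
          f"F1(pos)={f1_for_class(y_true, pred, 1):.3f}, macro-F1={macro_f1(y_true, pred):.3f}")

print(f"PR-AUC (AP)={average_precision(y_true, scores):.3f}, "
      f"recall@FPR<=0.2={recall_at_fpr(y_true, scores, 0.2):.3f}")
# model: accuracy=0.700, F1(pos)=0.667, macro-F1=0.697
# majority: accuracy=0.700, F1(pos)=0.000, macro-F1=0.412
# PR-AUC (AP)=0.722, recall@FPR<=0.2=0.667
```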

