datarekha
Machine Learning · Medium · Asked at Google, Amazon, Meta, Apple, Stripe

What is the ROC curve and what does AUC actually measure?

The short answer

The ROC curve plots True Positive Rate (recall) against False Positive Rate at every decision threshold. AUC — the area under that curve — equals the probability that the model ranks a randomly chosen positive example above a randomly chosen negative one. A random classifier scores 0.5; a perfect classifier scores 1.0.

How to think about it

Define the axes, explain AUC as a ranking probability, cover the operating-point interpretation, and flag the imbalance pitfall.

The two axes

  • True Positive Rate (TPR) = Recall = TP / (TP + FN) — y-axis.
  • False Positive Rate (FPR) = FP / (FP + TN) — x-axis.
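As a minimal sketch of the two definitions above, the snippet below counts the confusion-matrix cells at a single threshold and derives TPR and FPR from them. The helper name `tpr_fpr` and the toy labels and scores are made up for illustration; treating `score >= threshold` as a positive prediction is an assumed convention.

```python
def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1          # positive correctly flagged
        elif label == 1:
            fn += 1          # positive missed
        elif predicted_positive:
            fp += 1          # negative wrongly flagged
        else:
            tn += 1          # negative correctly ignored
    return tp / (tp + fn), fp / (fp + tn)


# Toy data, invented for this example: 3 positives, 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

tpr, fpr = tpr_fpr(y_true, y_score, threshold=0.5)
print(f"TPR={tpr:.3f}  FPR={fpr:.3f}")  # TPR=0.667  FPR=0.250
```

Note that the two rates use different denominators: TPR is normalised by the number of actual positives, FPR by the number of actual negatives.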

As the decision threshold drops from 1.0 to 0.0, more and more predictions flip to positive. The curve traces a path from (0, 0), where nothing is predicted positive, to (1, 1), where everything is. A useful model bows toward the top-left corner, staying above the diagonal that a random classifier would follow.
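The threshold sweep can be sketched in a few lines of plain Python. This is an illustrative implementation, not how any particular library does it: `roc_points` is a hypothetical helper, the data is invented, and each distinct score is used as a candidate threshold.

```python
def roc_points(y_true, y_score):
    """Return ROC points as (FPR, TPR) pairs, from the strictest threshold down."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

points = roc_points(y_true, y_score)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both coordinates are non-decreasing along the path and the last point is always (1, 1).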

What AUC measures: the ranking interpretation

AUC = P(score of a random positive > score of a random negative).

This is the Wilcoxon–Mann–Whitney U statistic for the two score distributions, normalised by the number of positive–negative pairs (ties count as one half). It tells you how well the model ranks positives above negatives, independent of any threshold. An AUC of 0.85 means: pick one positive and one negative at random, and the model will rank the positive higher 85% of the time.
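The ranking interpretation can be checked directly by brute force: compare every positive score with every negative score and count the fraction of pairs ranked correctly. The sketch below assumes ties are credited one half; `pairwise_auc` is a hypothetical helper and the data is invented. It is quadratic in the number of examples, so it is meant for understanding rather than production use.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit (assumed convention)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example: 3 positives x 4 negatives = 12 pairs.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

auc = pairwise_auc(y_true, y_score)
print(f"AUC={auc:.4f}")  # 10 of 12 pairs ranked correctly -> AUC=0.8333
```

On the same inputs this value should agree with the area under the ROC curve, since both count the same correctly ordered pairs.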

Reading a specific point on the curve

Each point on the ROC curve is an operating point for a fixed threshold. To choose a threshold in production, plot the curve and select the point that satisfies your business constraint — for example, “keep FPR below 5%” — then read off the corresponding TPR.
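One way to turn a constraint like this into a threshold is sketched below: walk the thresholds from strictest to loosest and keep the last one whose FPR is still within budget. The helper `pick_threshold`, the `max_fpr` parameter, and the toy data are all assumptions for illustration; in practice the labels and scores would come from a held-out validation set.

```python
def pick_threshold(y_true, y_score, max_fpr):
    """Return (threshold, TPR, FPR) with the highest TPR such that FPR <= max_fpr.

    Returns None if even the strictest threshold exceeds the FPR budget.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        fpr = fp / n_neg
        if fpr > max_fpr:
            break  # FPR only grows as the threshold drops, so stop here
        best = (t, tp / n_pos, fpr)
    return best


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

strict = pick_threshold(y_true, y_score, max_fpr=0.05)
loose = pick_threshold(y_true, y_score, max_fpr=0.25)
print("FPR <= 5%: ", strict)
print("FPR <= 25%:", loose)
```

With only four negatives in the toy data the achievable FPR values are coarse (0, 0.25, 0.5, ...), which is why the 5% budget forces the strictest threshold here. A real validation set gives a much finer curve.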

AUC in code

from sklearn.metrics import roc_auc_score

# y_true: 0/1 labels; y_score: positive-class scores, e.g. clf.predict_proba(X)[:, 1]
auc = roc_auc_score(y_true, y_score)

Limitations

  • AUC summarises the entire threshold range, including operating points you’d never use. A model optimised for AUC may not be optimal at your deployment threshold.
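  • On heavily imbalanced data, a high AUC can coexist with very low precision. FPR is normalised by the number of negatives, so when negatives vastly outnumber positives even a small FPR can produce more false positives than true positives. A model with 0.97 AUC on a 1% positive dataset can still be nearly useless in production. Report PR-AUC alongside or instead of ROC-AUC in that setting (see the PR curve question).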
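The imbalance pitfall can be made concrete with a deliberately artificial construction. This is synthetic data, not a real model: negatives get evenly spaced scores, and every positive is placed just above the 97th percentile of the negatives. The function name `imbalance_demo` and the two class ratios are assumptions chosen for illustration.

```python
def imbalance_demo(n_pos, n_neg):
    """Return (AUC, precision, recall) for a synthetic, well-separated scorer."""
    neg_scores = [float(i) for i in range(n_neg)]   # 0, 1, ..., n_neg - 1
    cut = n_neg * 97 // 100                         # 97% of negatives fall below this
    pos_score = cut - 0.5                           # every positive shares this score

    # All positives have the same score, so AUC is the fraction of negatives below it.
    auc = sum(1 for s in neg_scores if s < pos_score) / n_neg

    # Threshold set so that every positive is caught (recall = 1).
    tp = n_pos
    fp = sum(1 for s in neg_scores if s >= pos_score)
    return auc, tp / (tp + fp), tp / n_pos


for n_pos, n_neg in [(100, 9900), (10, 9990)]:      # 1% and 0.1% positives
    auc, precision, recall = imbalance_demo(n_pos, n_neg)
    print(f"positives={n_pos:>3}  AUC={auc:.3f}  "
          f"precision={precision:.3f}  recall={recall:.1f}")
```

In this construction the AUC stays at about 0.97 in both cases, yet precision is roughly 0.25 at 1% positives and roughly 0.03 at 0.1% positives, because the 3% of negatives above the threshold outnumber the positives.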