What is a confusion matrix and what four quantities does it report?

A confusion matrix tallies predictions against ground truth in a 2x2 table: true positives, true negatives, false positives, and false negatives. From those four cells every classification metric — accuracy, precision, recall, F1, specificity — can be derived. It exposes *which kind* of error a model makes, not just how often it errs.

What is the Precision-Recall curve, and why does it outperform ROC-AUC on imbalanced datasets?

The PR curve plots precision against recall as the decision threshold varies. On imbalanced datasets it is more informative than ROC because neither precision nor recall uses true negatives, whereas the false positive rate FP/(FP+TN) stays small whenever negatives are abundant, which flatters ROC-AUC. A model that looks good on ROC can still have dismal precision, and PR-AUC exposes that immediately. Prefer PR-AUC whenever the positive class is rare and the quality of the positive predictions matters more than overall ranking.
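A little arithmetic shows how a strong ROC operating point can hide poor precision. The sketch below is only an illustration: the class sizes (10 positives, 10,000 negatives) and the operating point (TPR 0.9, FPR 0.01) are assumed numbers, not taken from any real dataset.

```python
# Hypothetical, heavily imbalanced test set (illustrative numbers only).
n_pos, n_neg = 10, 10_000

# One operating point that looks excellent in ROC space (assumed values).
tpr, fpr = 0.9, 0.01

tp = round(tpr * n_pos)   # positives caught
fp = round(fpr * n_neg)   # negatives wrongly flagged

# Precision ignores true negatives, so the 100 false alarms dominate.
precision = tp / (tp + fp)
print(f"TPR={tpr}, FPR={fpr}, TP={tp}, FP={fp}, precision={precision:.3f}")
```

Under these assumptions only 9 of the 109 flagged samples are real positives, even though the same point sits close to the top-left corner of the ROC plot.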

What is the ROC curve and what does AUC actually measure?

The ROC curve plots True Positive Rate (recall) against False Positive Rate at every decision threshold. AUC — the area under that curve — equals the probability that the model ranks a randomly chosen positive example above a randomly chosen negative one. A random classifier scores 0.5; a perfect classifier scores 1.0.
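The ranking interpretation can be checked directly by comparing every positive score with every negative score. This is a minimal sketch, assuming ties count as half a win (a common convention); the function name `pairwise_auc` and the scores are hypothetical.

```python
from itertools import product

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores, chosen only for illustration.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = pairwise_auc(pos, neg)
print(auc)  # 8 of the 9 pairs are ordered correctly
```

The quadratic loop is fine for a handful of samples; for large datasets a rank-based computation would be the usual choice.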

How do you approach anomaly detection, and why is accuracy a bad metric for it?

Anomaly detection finds rare points that deviate from normal patterns, using statistical, distance, density, or model-based methods like isolation forest and one-class SVM, often trained mostly on normal data. Accuracy is misleading because anomalies are extremely rare, so a model that predicts 'normal' for everything scores high accuracy while catching nothing. Use precision, recall, F1, PR-AUC, or ROC-AUC instead, chosen by the cost of false positives vs false negatives.
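The accuracy trap is easy to reproduce. The sketch below assumes a made-up dataset with 10 anomalies among 1,000 points and a baseline that labels everything as normal; none of these numbers come from a real system.

```python
# Hypothetical labels: 1 = anomaly, 0 = normal (10 anomalies in 1000 points).
y_true = [1] * 10 + [0] * 990
y_pred = [0] * len(y_true)          # baseline: predict "normal" for everything

correct = sum(t == p for t, p in zip(y_true, y_pred))
caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

accuracy = correct / len(y_true)    # dominated by the 990 easy negatives
recall = caught / sum(y_true)       # share of anomalies actually found
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

The baseline reaches 99% accuracy while its recall is zero, which is why recall-aware metrics are preferred here.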

Confusion Matrix, Precision, Recall, ROC — GATE DA

What you'll learn

The confusion matrix: TP, FP, FN, TN — the four cells every metric is built from

accuracy = (TP+TN)/total; precision = TP/(TP+FP); recall = TP/(TP+FN)

F1 as the harmonic mean of precision and recall, and why accuracy lies on imbalanced data

ROC curve (TPR vs FPR) and AUC as a threshold-free ranking score

Last lesson ended with the indictment: on the 5% positive medical set, a model that predicts “negative” for everyone scores 95% accuracy and catches not one sick patient. Accuracy lied because it lumped every error into one heap. To see which kind of mistake a model makes, you have to pull that heap apart — and a single number never can.

Think about what is really at stake. A model that flags a healthy patient as sick raises a false alarm; a model that waves a sick patient through commits a miss. Those two failures cost wildly different things — and accuracy, which only counts total rights and wrongs, cannot tell them apart. What can is a small table that cross-tabulates what the model said against what was true. That table is the confusion matrix, and precision, recall, F1, and AUC all read straight off its four cells.

The confusion matrix

Cross “what the model predicted” against “what was actually true.” Each of the four cells counts one outcome:

TP: predicted positive, actually positive (a correct hit).
FP: predicted positive, actually negative (a false alarm).
FN: predicted negative, actually positive (a miss).
TN: predicted negative, actually negative (a correct rejection).

Every classification metric is just a ratio of these four counts.
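The four cells can be tallied with a single pass over paired labels and predictions. This is a minimal sketch, assuming labels are encoded as 1 for positive and 0 for negative; the function name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1          # correct hit
        elif pred == 1 and truth == 0:
            fp += 1          # false alarm
        elif pred == 0 and truth == 1:
            fn += 1          # miss
        else:
            tn += 1          # correct rejection
    return tp, fp, fn, tn

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```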

The four metrics

From those four counts come the four numbers GATE asks for:

Accuracy = (TP + TN) / total: the fraction of all predictions that were correct. This is the number that lied last lesson.
Precision = TP / (TP + FP): of everything flagged positive, how many really were. High precision means few false alarms.
Recall (sensitivity, TPR) = TP / (TP + FN): of everything actually positive, how many you caught. High recall means few misses.
F1 = 2 · P · R / (P + R): the harmonic mean of precision and recall. It stays low unless both are high, so it folds the trade-off into one number.

Precision and recall differ only in their denominator: predicted positives versus actual positives.
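The four formulas translate directly into code. The sketch below is one possible implementation: the function name `metrics`, the sample counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration (other conventions for the undefined case exist).

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard the ratios: a model that never predicts positive has TP + FP = 0.
    # Returning 0.0 in that case is an assumed convention, not a standard.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
m = metrics(tp=30, fp=10, fn=20, tn=40)
print({k: round(v, 3) for k, v in m.items()})
```

With these counts precision divides by the 40 predicted positives and recall by the 50 actual positives, which is the only difference between the two.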

One model score, infinite classifiers: move the threshold

[Interactive demo: threshold and confusion matrix. Each dot is one sample; positives (actual 1, top row) and negatives (actual 0, bottom row) are sorted by the model's probability score.]

The threshold line decides what counts as a positive prediction. Move it left and you catch more real positives (recall ↑, precision ↓); move it right and you raise fewer false alarms (precision ↑, recall ↓). Either way the four cells, and therefore every metric, move together.

Snapshot at threshold = 0.50 (predict positive when score ≥ 0.50):

            pred +    pred −
actual +    TP = 40   FN = 10
actual −    FP = 7    TN = 41

Precision = 85.1%   Recall = 80.0%   F1 = 82.5%   Accuracy = 82.7%

The threshold is not "0.5 by default" — it is a tunable parameter. Shift it toward 0 to maximise recall (catch everything, accept false alarms); shift toward 1 to maximise precision (only flag what you're sure about, miss some real cases). Which trade-off is right depends on the problem — not on the model.
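The trade-off can be seen by evaluating one set of scores at several thresholds. This is a small sketch with made-up scores and labels; the function name `precision_recall_at` is hypothetical, and reporting precision as 1.0 when nothing is flagged is an assumed convention.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Assumed convention: with nothing flagged there are no false alarms.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical scores and labels, for illustration only.
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

for t in (0.25, 0.50, 0.80):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data the lowest threshold catches every positive at the cost of two false alarms, while the highest threshold raises no false alarms but misses half the positives.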

ROC and AUC

A classifier really outputs a score, and you pick a threshold to turn it into a yes/no. Sweep that threshold from strict to lenient and plot TPR (recall, TP/(TP+FN)) on the y-axis against FPR (FP/(FP+TN)) on the x-axis — the traced curve is the ROC curve. AUC, the area under it, is the probability that the model ranks a random positive above a random negative: 0.5 is coin-flip guessing, 1.0 is perfect. Because AUC never fixes a single threshold, it is a clean one-number comparison of two models’ ranking ability.
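The threshold sweep described above can be sketched in a few lines. The code assumes both classes are present and uses the trapezoid rule for the area; the function names `roc_points` and `trapezoid_auc` and the sample scores are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from the strictest threshold to the most lenient."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]               # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
pts = roc_points(scores, labels)
print(pts)
print(trapezoid_auc(pts))
```

Here only one positive (score 0.4) is ranked below a negative (score 0.7), so 8 of the 9 positive-negative pairs are ordered correctly and the area comes out at 8/9, matching the ranking interpretation of AUC.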

One model, no single score: move the threshold, trade precision for recall

[Interactive demo: threshold, ROC and PR. Two overlapping score distributions (negatives and positives) with a movable threshold and a separation control.]

The model gives every row a score between 0 and 1, and you choose where to cut: everything at or to the right of the threshold is predicted positive. Move the threshold and the confusion matrix, ROC, and PR displays update together. Pull the two score distributions apart and the ROC curve bows toward the top-left corner; push them together and no threshold can win, so the ROC collapses onto the diagonal (AUC → 0.5).

Snapshot at threshold = 0.50 (separation control shown as "overlapping"):

            pred +     pred −
actual +    TP = 933   FN = 67
actual −    FP = 82    TN = 918

Precision = 91.9%   Recall (TPR) = 93.3%   F1 = 92.6%   Accuracy = 92.6%   FPR = 8.2%

ROC curve: AUC = 0.979. PR curve: baseline precision = 0.50, the positive fraction (1,000 of 2,000 samples).

There is no single “accuracy.” Sliding the threshold left catches more real positives (recall ↑) but flags more junk (precision ↓); sliding it right does the reverse. ROC and PR summarise every threshold at once — and only when the classes actually separate does the ROC bow toward the top-left corner.

How GATE asks this

The reliable NAT (numerical answer type) question hands you a populated confusion matrix and asks for one metric (precision, recall, F1, or accuracy) to two or three decimals. The recipe never varies: read off TP, FP, FN, TN, then plug into the ratio. The MSQ (multiple select question) variant builds the same counts from a word problem and asks which statements hold. GATE DA 2026 (Q47) gave one class 20 items and the other 10, with a handful misclassified each way, then asked you to compare the two classes’ accuracy, precision, and recall. Either way you must keep the precision-vs-recall denominators straight, and remember that accuracy lies under class imbalance.

Worked example — GATE DA 2026

A binary classifier on 30 samples produces TP = 8, FP = 2, FN = 6, TN = 14. Compute accuracy, precision, recall, and F1.

Check the total first: 8 + 2 + 6 + 14 = 30. Then take each metric in turn:

accuracy  = (TP + TN) / total = (8 + 14) / 30 = 22/30  ≈ 0.733
precision = TP / (TP + FP)    = 8 / (8 + 2)   = 8/10   = 0.800
recall    = TP / (TP + FN)    = 8 / (8 + 6)   = 8/14   ≈ 0.571
F1        = 2·P·R / (P + R)   = 2·0.8·0.571 / (0.8 + 0.571)
          = 0.914 / 1.371                              ≈ 0.667

So accuracy ≈ 0.733, precision = 0.80, recall ≈ 0.571, F1 ≈ 0.667. Precision (0.80) beats recall (0.571), exactly as the few-false-alarms / more-misses cell counts predicted: the model is cautious — when it says positive it is usually right, but it still misses 6 of the 14 actual positives.
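The arithmetic can be double-checked with exact fractions, which avoids the rounding in the intermediate steps. The sketch below uses the counts from the worked example; the count form of F1, 2·TP / (2·TP + FP + FN), follows from substituting the precision and recall ratios into the harmonic mean.

```python
from fractions import Fraction

tp, fp, fn, tn = 8, 2, 6, 14          # counts from the worked example

accuracy = Fraction(tp + tn, tp + fp + fn + tn)
precision = Fraction(tp, tp + fp)
recall = Fraction(tp, tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Equivalent count form of F1, obtained by substituting the two ratios.
f1_counts = Fraction(2 * tp, 2 * tp + fp + fn)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1)]:
    print(f"{name:9s} = {value} (about {float(value):.3f})")
```

Both routes give F1 = 2/3 exactly, so the 0.667 in the worked example is not an artifact of rounding recall to 0.571 first.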

In one breath

The confusion matrix cross-tabulates predictions against truth into four counts — TP, FP, FN, TN — and every metric is a ratio of them: accuracy (TP+TN)/total counts all correct calls (but lies under imbalance), precision TP/(TP+FP) is how many flagged positives are real, recall TP/(TP+FN) is how many real positives are caught, and F1 2PR/(P+R) is their harmonic mean; sweeping the score threshold traces the ROC curve of TPR against FPR, whose area AUC is the threshold-free chance the model ranks a random positive above a random negative.

Practice

Quick check

Q1 (Recall). Which statements about precision vs recall are correct? (Select all that apply.)

Q2 (Recall). A dataset is 99% negative. A model predicts 'negative' for every sample. Which statements are TRUE? (Select all that apply.)

Q3 (Trace). A confusion matrix has TP = 8, FP = 2, FN = 6, TN = 14. Compute the precision. (Numerical answer, 2 decimals.)

Q4 (Trace). Same matrix (TP = 8, FP = 2, FN = 6, TN = 14). Compute the recall. (Numerical answer, 3 decimals.)

Q5 (Trace). Same matrix (TP = 8, FP = 2, FN = 6, TN = 14, total 30). Compute the accuracy. (Numerical answer, 3 decimals.)

Q6 (Apply). With precision = 0.80 and recall = 0.571 (from the matrix above), compute the F1 score. (Numerical answer, 3 decimals.)

A question to carry forward

Every metric here — the confusion matrix, precision, recall, the ROC curve — quietly assumes you already have a classifier: something that emits “positive” or “negative,” or better, a score you can threshold. We have spent six lessons building models, but every one of them predicted a number. Not once have we built a machine that outputs a class.

So the obvious gap opens. The ROC curve practically begs for it — it sweeps the threshold on a model’s score, treating that score as a probability of being positive. Where does such a probability come from? Here is the thread onward: can you take the linear machinery of regression — that same wᵀx + b weighted sum — and bend its unbounded output into a clean probability between 0 and 1, giving you the first true classifier, the one every metric in this lesson was waiting to grade?
