What is k-fold cross-validation and when should you use it over a single train/validation split?

K-fold CV partitions data into k equal folds, trains on k-1 and validates on the remaining fold k times, then averages the k scores. It gives a lower-variance estimate of generalization error than a single split and is preferred when the dataset is small enough that a single held-out set would be too noisy or wasteful.

What is stratified k-fold cross-validation and when is it necessary?

Stratified k-fold ensures each fold has the same class-label proportions as the full dataset. It is necessary for imbalanced classification because standard random k-fold can produce folds where a minority class is entirely absent, making per-fold metrics undefined or severely misleading.

Why can't you use standard k-fold cross-validation on time-series data, and what should you use instead?

Standard k-fold ignores time order (and is often run with shuffling), so a validation fold can contain timestamps earlier than the training data, which amounts to training on the future to predict the past. Time-series CV uses walk-forward (expanding-window or sliding-window) splits that always validate on data strictly after the training window.
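
As a rough illustration of the expanding-window idea, here is a minimal from-scratch sketch. The function name `expanding_window_splits` and its parameters are hypothetical, and it assumes the rows are already sorted by timestamp.

```python
def expanding_window_splits(n, n_splits, val_size):
    """Yield (train_idx, val_idx) where validation always follows training in time.

    Assumes rows 0..n-1 are already sorted by timestamp (an assumption of this sketch).
    """
    first_val_start = n - n_splits * val_size
    if first_val_start <= 0:
        raise ValueError("not enough rows for the requested splits")
    for s in range(n_splits):
        val_start = first_val_start + s * val_size
        train = list(range(0, val_start))                    # everything before the window
        val = list(range(val_start, val_start + val_size))   # the next block in time
        yield train, val


splits = list(expanding_window_splits(n=10, n_splits=3, val_size=2))
for train_idx, val_idx in splits:
    # The training window grows each step; validation rows are always later.
    print(train_idx, "->", val_idx)
```

A sliding-window variant would drop the oldest training rows at each step instead of keeping them all.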

Why do you need nested cross-validation, and what problem does it solve over regular cross-validation?

Nested cross-validation separates hyperparameter tuning from performance estimation using an inner loop for model selection and an outer loop for evaluation. It solves the optimistic-bias problem: if you tune and evaluate on the same folds, the validation data leaks into model selection and your reported score overestimates real-world performance. The inner loop never touches the outer test fold, giving an approximately unbiased estimate of the whole pipeline's generalization.
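
The two loops can be sketched in plain Python. Everything here is illustrative: the "model" is a toy shrunken mean whose hyperparameter `alpha` is made up for the example, the data are hypothetical, and the helper names are not from any library.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit_predict(y_train, alpha):
    """Toy model: the training mean shrunk toward zero; alpha is the hyperparameter."""
    return sum(y_train) / (len(y_train) + alpha)


def cv_error(y, k, alpha):
    """Mean absolute error of the toy model under plain k-fold CV."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        pred = fit_predict([y[i] for i in train_idx], alpha)
        errors.append(mean(abs(y[i] - pred) for i in val_idx))
    return mean(errors)


def nested_cv(y, outer_k, inner_k, alphas):
    outer_errors = []
    for train_idx, test_idx in k_fold_indices(len(y), outer_k):
        y_train = [y[i] for i in train_idx]
        # Inner loop: choose alpha using only the outer-training rows.
        best_alpha = min(alphas, key=lambda a: cv_error(y_train, inner_k, a))
        # Outer loop: refit with the chosen alpha, score on the untouched fold.
        pred = fit_predict(y_train, best_alpha)
        outer_errors.append(mean(abs(y[i] - pred) for i in test_idx))
    return outer_errors


# Hypothetical toy targets.
y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 4.0, 6.0, 5.0, 5.0, 6.0, 4.0]
outer_errors = nested_cv(y, outer_k=4, inner_k=3, alphas=[0.0, 1.0, 5.0])
print(f"nested CV MAE: {mean(outer_errors):.3f}")
```

The point of the structure is that `test_idx` is never passed to `cv_error`, so the outer score is computed on rows that played no part in choosing `alpha`.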

Cross-Validation: k-fold, LOO, Stratified — GATE DA

What you'll learn

Why a single train/test split gives a noisy, luck-dependent score

k-fold CV: split into k folds, train on k−1, validate on 1, rotate, average — that is k models

Leave-one-out (LOO) is k = n, so the number of folds equals the number of training samples

Stratified CV preserves class proportions — essential for imbalanced data

Last lesson left us hunting the bottom of the U-curve with only one crude instrument: test the model on data it has not seen. But watch that instrument wobble. You split your data 80/20, train, and score 0.84. You reshuffle the split, retrain, and score 0.79. Which number is the truth? Neither — a single train/test split is a coin flip, because the test set is just one small random sample, and your score depends on which rows happened to land in it.

The fix is almost embarrassingly simple: stop trusting one lucky test, and instead test on every row, then average. If a single referee can be unfair, ask several and average their verdicts. That averaging-away of the luck is cross-validation, and it turns a jittery one-off reading into a stable estimate you can actually compare complexities with.

k-fold cross-validation

Split the data into k equal folds. Hold one fold out as the validation set, train on the other k−1, and record the score. Then rotate, so each fold takes its turn as the validation set exactly once. You train k separate models and average their k scores into one stable estimate (with a standard deviation for the spread).

Each fold (highlighted) is the validation set exactly once; k folds means k models, averaged into one score.
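
The rotation described above can be sketched with the standard library alone. This is a minimal illustration rather than a reference implementation: the helper `k_fold_indices`, the toy targets, and the mean-predictor "model" are all assumptions made for the example, and the folds are contiguous (no shuffling).

```python
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    # Spread any remainder over the first n % k folds so sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


# Hypothetical toy targets; the "model" just predicts the training mean.
y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 4.0, 6.0, 5.0, 5.0]

fold_scores = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    prediction = mean(y[i] for i in train_idx)            # fit one model per fold
    mae = mean(abs(y[i] - prediction) for i in val_idx)   # score it on the held-out fold
    fold_scores.append(mae)

print(f"{len(fold_scores)} models, MAE = {mean(fold_scores):.3f} ± {stdev(fold_scores):.3f}")
```

For data without a natural order, shuffling the row indices once before splitting is a common choice.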

Leave-one-out (LOO): the extreme case

Push k all the way up to n, the number of training samples. Now each fold holds exactly one row: you train on n−1 rows, validate on the single left-out row, and repeat for every row. So the number of folds — and the number of models you train — equals the number of training samples. LOO squeezes the most data into each model (only one row held out), but pays for it by fitting n models in all.
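
A short sketch makes the "one fold per row" count concrete. The helper name `leave_one_out_indices` is hypothetical and the six-row dataset size is chosen only for illustration.

```python
def leave_one_out_indices(n):
    """Yield (train_idx, val_idx) with exactly one validation row per split."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, [held_out]


splits = list(leave_one_out_indices(6))
n_models = len(splits)  # one model per training sample

for train_idx, val_idx in splits:
    print("train on", train_idx, "validate on", val_idx)
```

Each split trains on n−1 rows, so the cost grows with the dataset: n rows means n fits.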

Stratified cross-validation

Plain k-fold assigns rows to folds without looking at their labels. On imbalanced data (say 5% positives) a random fold can end up with zero positive examples, which makes its score meaningless. Stratified CV fixes this by preserving the class proportions in every fold: if the full set is 5% positive, each fold is held to roughly 5% positive too. For classification, and especially imbalanced classification, stratified k-fold is the default.
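
One simple way to get this behaviour is to deal each class's rows round-robin across the folds. The sketch below assumes that approach; the function name and the 100-row label list are hypothetical, and library implementations may distribute rows differently.

```python
from collections import defaultdict


def stratified_k_fold_indices(labels, k):
    """Return k validation folds whose class mix tracks the full dataset."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal each class's rows round-robin so every fold gets its share.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical imbalanced labels: 5 positives among 100 rows (5% positive).
labels = [1] * 5 + [0] * 95
folds = stratified_k_fold_indices(labels, k=5)

positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
fold_sizes = [len(fold) for fold in folds]
print(positives_per_fold, fold_sizes)
```

With these labels every fold holds 20 rows and exactly one positive, so no fold is left without the minority class.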

How GATE asks this

The signature NAT hands you a dataset size and a CV scheme and asks for the number of iterations (models). The trap is the held-out test set: GATE DA 2026 gave 1000 samples with 100 held out as a test set, then asked how many iterations LOOCV runs on the remainder. LOO trains one model per training sample, and there are 1000 − 100 = 900 of those — so the answer is 900, not 1000. The MCQ variant describes a 5% positive dataset and asks which scheme to use and what to report: stratified CV, scored with AUC rather than plain accuracy.

Worked example — GATE DA 2026

A dataset has 1000 samples. You set aside 100 as a held-out test set and run leave-one-out cross-validation on the rest. How many iterations does LOOCV run?

Trace it carefully. The held-out test set never enters cross-validation, so the training set size is 1000 − 100 = 900. LOO sets k = n, training one model per training sample — so it runs 900 iterations, one model per left-out row. Were you instead to use plain 5-fold CV on those same 900, you would train only 5 models, each validating on 900 / 5 = 180 rows.
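
The same count can be checked in a few lines. The helper `cv_model_count` is a hypothetical name introduced only to restate the arithmetic above.

```python
def cv_model_count(n_total, n_test, k=None):
    """Number of models CV trains on the rows left after removing the test set.

    k=None means leave-one-out, i.e. k equals the training set size.
    """
    n_train = n_total - n_test
    return n_train if k is None else k


loo_models = cv_model_count(1000, 100)            # LOO on the remaining 900 rows
five_fold_models = cv_model_count(1000, 100, 5)   # plain 5-fold on the same rows
rows_per_fold = (1000 - 100) // 5                 # validation rows in each of the 5 folds

print(loo_models, five_fold_models, rows_per_fold)
```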

In one breath

A single train/test split scores the model on one small random subset, so the number swings with the luck of the draw; cross-validation averages that luck away by splitting into k folds, rotating each through the validation role once, and averaging the k scores — with leave-one-out the extreme k = n (one model per sample, most data but n fits) and stratified k-fold the variant that preserves each class’s proportion in every fold, which is essential on imbalanced data where a random fold might otherwise contain no positives.

Practice

Quick check

Q1 (Recall): Why is a single 80/20 train/test split a poor way to compare two models?

Q2 (Recall, select all that apply): When is STRATIFIED k-fold cross-validation the right choice?

Q3 (Trace, numerical answer): You have 1000 samples, hold out 100 as a test set, and run LOOCV on the remainder. How many models (iterations) does LOOCV train?

Q4 (Trace, numerical answer): Using plain 5-fold cross-validation on a training set of 800 samples, how many models are trained, in total, to produce the cross-validated score?

Q5 (Trace, numerical answer, 2 decimals): A 5-fold CV reports accuracy scores [0.80, 0.82, 0.78, 0.81, 0.79] across its folds. What single number best summarises the model's performance?

Q6 (Apply): A medical dataset is 5% positive. Which combination is the safest evaluation choice?

A question to carry forward

Cross-validation hands back a single, trustworthy number for “how good is this model.” But look closely at the score it averaged — for a classifier, what is that number? Quietly, all along, it has been accuracy: the fraction of predictions that were right.

And accuracy is a liar on lopsided data. On the 5% positive medical set, a model that smugly predicts “negative” for everyone scores 95% — while catching not one sick patient. The last quiz already reached for “report AUC instead,” but why? Here is the thread onward: when the classes are imbalanced, or a false alarm and a missed case cost wildly different amounts, what richer set of numbers — drawn from a little table of right and wrong predictions — tells the honest story that one accuracy figure hides?
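
The 95% figure is easy to reproduce. The sketch below assumes a hypothetical set of 1000 patients with 5% positives and a "model" that predicts negative for everyone.

```python
# Hypothetical labels: 50 sick patients (1) among 1000, i.e. 5% positive.
labels = [1] * 50 + [0] * 950

# A "model" that predicts negative for every patient.
predictions = [0] * len(labels)

correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)

# How many of the truly sick patients did it flag?
sick_patients_caught = sum(1 for p, t in zip(predictions, labels) if p == 1 and t == 1)

print(f"accuracy = {accuracy:.2f}, sick patients caught = {sick_patients_caught} of {sum(labels)}")
```

The accuracy looks strong only because the negatives dominate the count of correct predictions.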
