datarekha

Cross-Validation: k-fold, LOO, Stratified

One train/test split is a coin flip. Cross-validation rotates the validation set so every sample is tested once — averaging out the luck.

7 min read Intermediate GATE DA Lesson 83 of 122

What you'll learn

  • Why a single train/test split gives a noisy, luck-dependent score
  • k-fold CV: split into k folds, train on k−1, validate on 1, rotate, average — that is k models
  • Leave-one-out (LOO) is k = n, so the number of folds equals the number of training samples
  • Stratified CV preserves class proportions — essential for imbalanced data

Before you start

You split your data 80/20, train, and score 0.84. You reshuffle the split, retrain, and now score 0.79. Which number is the truth? Neither. A single train/test split is noisy, because the test set is just one small random sample and your score depends on which rows happened to land in it. Cross-validation averages out most of that luck: every row is used for validation exactly once, and the per-fold scores are averaged.

k-fold cross-validation

Split the data into k folds of roughly equal size (exactly equal only when k divides the number of rows). Hold out one fold as the validation set, train on the other k−1, and record the score. Then rotate: each fold gets a turn as the validation set exactly once. You train k separate models and average their k scores into one more stable estimate, and you report the standard deviation across folds as a measure of spread.
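The rotation described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: `k_fold_indices`, `cross_validate`, and `toy_score` are hypothetical names, the folds are contiguous and unshuffled, and `toy_score` only stands in for "fit a model on the training rows, score it on the validation rows".

```python
from statistics import mean, stdev

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n_samples)."""
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(n_samples, k, fit_and_score):
    """Call fit_and_score once per fold; return (mean score, std dev, number of models)."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return mean(scores), stdev(scores), len(scores)

# Hypothetical scorer: a placeholder for real training and evaluation.
def toy_score(train_idx, val_idx):
    return len(train_idx) / (len(train_idx) + len(val_idx))

avg, spread, n_models = cross_validate(10, 5, toy_score)
```

In practice you would usually shuffle the rows once before cutting folds, unless the row order carries meaning (as it does for time series).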

[Figure: 5-fold CV runs 5 rounds and trains 5 models. In each round, one fold is the validation set (val) and the other four are training data (train).]
Each fold is the validation set exactly once; k folds means k models, averaged into one score.

Leave-one-out (LOO): the extreme case

Push k all the way up to n, the number of training samples. Now each fold holds exactly one sample: you train on n−1 rows and validate on the single left-out row, then repeat for every row. So the number of folds — and the number of models you train — equals the number of training samples. LOO uses the most data per model (only one row held out), but it pays for that by fitting n models.
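As a small sketch of LOO as the k = n case, the loop below runs one iteration per row. The "model" is a deliberately trivial stand-in (it predicts the mean of the training rows), and both the function name `leave_one_out` and the toy data are assumptions made for illustration.

```python
def leave_one_out(values):
    """Run LOO with a toy 'model' that predicts the mean of the training rows."""
    n = len(values)
    errors = []
    for i in range(n):                        # one iteration (one model) per row
        train = values[:i] + values[i + 1:]   # the other n - 1 rows
        prediction = sum(train) / len(train)  # "fit" the toy model
        errors.append(abs(values[i] - prediction))  # validate on the left-out row
    return errors

values = [2.0, 4.0, 6.0, 8.0]   # hypothetical toy data
errors = leave_one_out(values)
n_models = len(errors)          # always equals len(values)
```

The count of fitted models grows linearly with the data, which is why LOO becomes expensive on large training sets.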

Stratified cross-validation

Plain k-fold assigns rows to folds without looking at their labels. On imbalanced data (say 5% positives) a fold can end up with very few or even zero positive examples, which makes its score unreliable or undefined. Stratified CV fixes this by preserving the class proportions in every fold: if the full set is 5% positive, each fold is held to ~5% positive too. For classification, especially imbalanced classification, stratified k-fold is the usual default.
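One simple way to get this behaviour, shown here only as a sketch, is to deal the rows of each class out to the folds round-robin. The function name `stratified_fold_ids` and the 5%-positive label list are hypothetical, and real implementations typically also shuffle within each class first.

```python
from collections import defaultdict

def stratified_fold_ids(labels, k):
    """Assign each row a fold id in range(k), dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k   # spread this class evenly over the folds
    return fold_of

labels = [1] * 5 + [0] * 95               # hypothetical data: 5% positives
fold_of = stratified_fold_ids(labels, 5)
positives_per_fold = [
    sum(1 for i, f in enumerate(fold_of) if f == fold and labels[i] == 1)
    for fold in range(5)
]
fold_sizes = [fold_of.count(fold) for fold in range(5)]
```

With 5 positives and 5 folds, each fold receives exactly one positive. If a class has fewer rows than there are folds, some folds will still miss it, so k has to be chosen with the rarest class in mind.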

How GATE asks this

The signature NAT hands you a dataset size and a CV scheme and asks for the number of iterations (models). The trap is the held-out test set: GATE DA 2026 gave 1000 samples with 100 held out as a test set, then asked how many iterations LOOCV runs on the remainder. LOO trains one model per training sample, and there are 1000 − 100 = 900 of those — so the answer is 900, not 1000. The MCQ variant describes a 5% positive dataset and asks which scheme to use and what to report: stratified CV, scored with AUC rather than plain accuracy.

Worked example

A dataset has 1000 samples. You set aside 100 as a held-out test set and run leave-one-out cross-validation on the rest. How many iterations does LOOCV run?

Training set size after the holdout: 1000 − 100 = 900. LOO sets k = n, training one model per training sample. So it runs 900 iterations — one model per left-out row. Were you instead to use plain 5-fold CV on those 900, you would train only 5 models, each validating on 900 / 5 = 180 rows.
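The arithmetic of the worked example can be checked with a few lines of Python. The helper `cv_model_count` is a hypothetical name introduced for this sketch; it only counts models and does not train anything.

```python
def cv_model_count(n_total, n_test, k=None):
    """Number of models CV trains on the rows left after the test holdout.

    k=None means leave-one-out, where k equals the training set size.
    """
    n_train = n_total - n_test
    return n_train if k is None else k

loo_models = cv_model_count(1000, 100)             # LOO on the 900 remaining rows
five_fold_models = cv_model_count(1000, 100, k=5)  # plain 5-fold on the same rows
rows_per_val_fold = (1000 - 100) // 5              # validation rows per fold in 5-fold
```

The held-out test rows never enter the count: only the training set size matters for LOO.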

Quick check

Q1. You have 1000 samples, hold out 100 as a test set, and run LOOCV on the remainder. How many models (iterations) does LOOCV train? (numerical answer)
Q2. Using plain 5-fold cross-validation on a training set of 800 samples, how many models are trained, in total, to produce the cross-validated score? (numerical answer)
Q3. A 5-fold CV reports accuracy scores [0.80, 0.82, 0.78, 0.81, 0.79] across its folds. What single number best summarises the model's performance? (numerical answer)
Q4. When is stratified k-fold cross-validation the right choice? (select all that apply)
Q5. Why is a single 80/20 train/test split a poor way to compare two models?
Q6. A medical dataset is 5% positive. Which combination is the safest evaluation choice?

Practice this in an interview

What is k-fold cross-validation and when should you use it over a single train/validation split?

K-fold CV partitions the data into k roughly equal folds, trains on k−1 of them and validates on the remaining one, repeats this k times so that each fold serves as the validation set once, then averages the k scores. It gives a lower-variance estimate of generalization error than a single split and is preferred when the dataset is small enough that a single held-out set would be too noisy or wasteful.

What is stratified k-fold cross-validation and when is it necessary?

Stratified k-fold ensures each fold has the same class-label proportions as the full dataset. It is necessary for imbalanced classification because standard random k-fold can produce folds where a minority class is entirely absent, making per-fold metrics undefined or severely misleading.

Why can't you use standard k-fold cross-validation on time-series data, and what should you use instead?

Standard k-fold ignores time order (and is often run with shuffling), so the training folds can contain observations that come after the validation fold: the model trains on the future to predict the past. Time-series CV uses walk-forward (expanding-window or sliding-window) splits that always validate on data strictly after the training window.
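A walk-forward split with an expanding window might look like the sketch below. It assumes the rows are already sorted by time, and `expanding_window_splits` is a hypothetical helper written for illustration rather than any particular library's API.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs where validation always follows training in time."""
    fold_size = n_samples // (n_splits + 1)
    for split in range(1, n_splits + 1):
        train_end = split * fold_size
        # The last validation block absorbs any leftover rows.
        val_end = train_end + fold_size if split < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, val_end))

splits = list(expanding_window_splits(10, 4))
first_train, first_val = splits[0]
last_train, last_val = splits[-1]
```

Each round trains on everything before the validation block, so the training window grows from split to split. A sliding-window variant would instead drop the oldest rows to keep the window length fixed.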

Why do we split data into train, validation, and test sets, and what are the typical proportions?

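The train set fits the model, the validation set tunes hyperparameters and guides model selection, and the held-out test set provides an unbiased estimate of final generalization error. Using the test set during development causes optimistic bias because the evaluation signal leaks into decisions. Commonly quoted proportions range from about 60/20/20 to 80/10/10 (train/validation/test); larger datasets can usually afford smaller validation and test fractions.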
