Deep Learning · Easy · Asked at Google, Amazon, Microsoft

What is early stopping, and how does it prevent overfitting?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

Early stopping monitors validation loss after each epoch and halts training once it has failed to improve for a set number of consecutive epochs (the patience), then keeps the weights from the best epoch. This stops the model from memorising training data past the point of best generalisation. It acts as a nearly free regulariser: it requires no change to the model or loss function, only a held-out validation set.

How to think about it

Early stopping is one of the simplest regularisation techniques. Instead of running for a fixed number of epochs, you keep the checkpoint at which held-out performance is best.

Why training and validation loss diverge

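After enough epochs, the model starts learning patterns specific to the training set that do not generalise. Training loss keeps falling, but validation loss typically plateaus and then rises. The gap between the two curves (the generalisation gap) is a direct indicator of overfitting, and the epoch with the lowest validation loss is the one early stopping aims to keep.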
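To make the divergence concrete, here is a small sketch on made-up loss values. These are illustrative numbers, not measurements from a real run. It computes the per-epoch gap between validation and training loss and finds the epoch where validation loss bottoms out.

```python
# Illustrative (made-up) per-epoch losses: training loss keeps falling,
# while validation loss bottoms out and then climbs.
train_loss = [1.20, 0.90, 0.70, 0.55, 0.45, 0.37, 0.30, 0.25]
val_loss   = [1.25, 0.98, 0.82, 0.76, 0.75, 0.78, 0.83, 0.90]

# Generalisation gap: how much worse the model does on held-out data.
gap = [round(v - t, 2) for t, v in zip(train_loss, val_loss)]

# The epoch early stopping would keep: the validation-loss minimum (0-indexed).
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

print("gap per epoch:", gap)
print("best epoch:", best_epoch, "val loss:", val_loss[best_epoch])
```

In this toy curve the gap widens steadily even before validation loss turns upward. A widening gap therefore shows overfitting is building, but it is the validation minimum (epoch 4 here) that marks where to stop.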

Save the model at the validation-loss minimum; restore it when patience runs out.

Implementation

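```python
import torch

max_epochs    = 100   # upper bound on training length (illustrative value)
patience      = 5     # epochs to wait for an improvement
best_val_loss = float("inf")
strikes       = 0     # consecutive epochs without improvement

# model, optimizer, train_loader and val_loader are assumed to exist.
# train_one_epoch() and evaluate() are user-defined helpers; evaluate()
# should call model.eval(), run under torch.no_grad() and return mean val loss.
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)
    val_loss = evaluate(model, val_loader)

    if val_loss < best_val_loss:
        # New best: checkpoint the weights and reset the counter.
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_checkpoint.pt")
        strikes = 0
    else:
        strikes += 1
        if strikes >= patience:
            print(f"Early stopping at epoch {epoch}")
            break

# Restore the best weights (the final weights are past the optimum).
model.load_state_dict(torch.load("best_checkpoint.pt", weights_only=True))
```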
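The same logic can be packaged as a small reusable helper. The sketch below is framework-agnostic. `EarlyStopping` is a hypothetical class written for this example, not a PyTorch API. It keeps an in-memory deep copy of the best state instead of writing a checkpoint file. The demo uses made-up losses and a plain dict standing in for the model's weights.

```python
import copy


class EarlyStopping:
    """Hypothetical helper: tracks the best validation loss and a copy of the best state."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.best_epoch = -1
        self.strikes = 0

    def step(self, epoch, val_loss, state):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            # Deep copy so later in-place weight updates don't alter the snapshot.
            self.best_state = copy.deepcopy(state)
            self.strikes = 0
            return False
        self.strikes += 1
        return self.strikes >= self.patience


# Demo: made-up losses, and a dict standing in for model.state_dict().
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.66, 0.70]
stopper = EarlyStopping(patience=3)
weights = {"w": 0.0}
for epoch, loss in enumerate(val_losses):
    weights["w"] = float(epoch)      # pretend training changed the weights in place
    if stopper.step(epoch, loss, weights):
        print(f"Early stopping at epoch {epoch}")
        break

weights = stopper.best_state         # restore the best snapshot
print(stopper.best_epoch, stopper.best_loss, weights)
```

With a real PyTorch model, `state` would be `model.state_dict()` and restoring would be `model.load_state_dict(stopper.best_state)`. The deep copy matters there because `state_dict()` returns tensors that share storage with the live parameters.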

Patience tuning

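Too small a patience stops training on a transient spike in validation loss. Too large a patience wastes compute on epochs whose weights will be discarded when the best checkpoint is restored. A patience of 5–20 epochs is typical. With a learning-rate warmup, start counting patience only after warmup ends, since validation loss can be noisy or even rise while the learning rate ramps up.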
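A minimal sketch of how patience interacts with noisy curves and warmup, using made-up loss sequences. `stop_epoch` and its `warmup` argument are hypothetical names for this illustration. Here, warmup epochs still update the best loss but never count as strikes. This is one reasonable way to delay patience until warmup ends, not the only one.

```python
def stop_epoch(val_losses, patience, warmup=0):
    """Return the 0-indexed epoch at which the patience rule stops, or None.

    Epochs before `warmup` may set a new best loss but never add a strike.
    """
    best = float("inf")
    strikes = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            strikes = 0
        elif epoch >= warmup:
            strikes += 1
            if strikes >= patience:
                return epoch
    return None


# Made-up curve with a one-epoch spike at epoch 2 before further improvement.
spiky = [1.00, 0.80, 0.85, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.60]
print(stop_epoch(spiky, patience=1))  # stops on the spike, misses the 0.55 minimum
print(stop_epoch(spiky, patience=3))  # rides out the spike

# Made-up curve where loss rises during the first warmup epochs.
warm = [1.00, 1.10, 1.20, 0.90, 0.70, 0.65, 0.66, 0.67, 0.68]
print(stop_epoch(warm, patience=2))            # stops inside warmup
print(stop_epoch(warm, patience=2, warmup=3))  # waits until warmup is over
```

On these toy curves, patience 1 halts at epoch 2 before the real minimum, while patience 3 continues to epoch 8. Without the warmup offset, the rule halts at epoch 2 during the early rise. With `warmup=3` it reaches the minimum at epoch 5 and stops at epoch 7.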
