Explain the bias-variance tradeoff and how it relates to overfitting.

Bias is error from overly simple assumptions (underfitting) and variance is error from sensitivity to training-data noise (overfitting); reducing one often increases the other. An overfit model has low bias but high variance, so regularization and simpler models trade a little bias for a large reduction in variance, while more training data reduces variance without adding bias.

What is the bias–variance tradeoff?

A model's expected test error splits into squared bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity lowers bias but raises variance, so the best model is the one that minimises their sum on unseen data, not the one with the lowest training error.
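For squared-error loss, the split described above can be written out explicitly. The notation here is an assumption made for illustration: $f$ is the true function, $\hat f$ is the model fitted on a random training set, $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why model selection comes down to trading them against each other.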

Explain the bias-variance tradeoff and how you'd diagnose which one you have.

Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.

What are overfitting and underfitting, and how do you fix each?

Overfitting occurs when a model memorizes training noise and fails to generalize; underfitting occurs when the model is too simple to capture the true signal. Fixes differ: overfitting requires regularization, more data, or reduced complexity; underfitting requires a more expressive model or better features.
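As one concrete form of regularization, here is a minimal sketch using ridge regression (an L2 penalty, the same idea as weight decay). The data, the degree-9 polynomial features, and the penalty strengths are all assumptions chosen for illustration. It shows the penalty shrinking the weights at the cost of a higher training error, and the largest penalty pushing the model toward underfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 noisy samples of a sine curve.
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(0.0, 0.3, size=x.size)

# Degree-9 polynomial features: flexible enough to chase the noise.
X = np.vander(x, N=10, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimise ||Xw - y||^2 + lam * ||w||^2.

    For simplicity the intercept is penalised too.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lams = [1e-6, 1e-2, 1.0, 100.0]   # illustrative penalty strengths
weight_norms, train_errors = [], []
for lam in lams:
    w = ridge_fit(X, y, lam)
    weight_norms.append(float(np.linalg.norm(w)))
    train_errors.append(float(np.mean((X @ w - y) ** 2)))
    print(f"lambda {lam:g}: ||w|| = {weight_norms[-1]:.3f}, "
          f"train MSE = {train_errors[-1]:.4f}")
```

A stronger penalty always gives smaller weights and a training error that is no lower. Whether it helps on unseen data depends on whether the model was overfitting to begin with.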

Overfitting & bias–variance — Deep Learning

A model that gets 100% on your training data and 60% on new data hasn’t learned — it has memorized. The whole point of training is to do well on data you’ve never seen, and the gap between training performance and real performance is the central tension in all of machine learning. Before you reach for the fixes (dropout, BatchNorm, more data), you need to diagnose what’s going wrong. That diagnosis is the bias–variance tradeoff.

Bias and variance

Bias is error from a model too simple to capture the real pattern. A straight line trying to fit a curve is high-bias — it’s wrong in a consistent, systematic way. This is underfitting.
Variance is error from a model so flexible it fits the noise in the training set, not just the signal. Resample the data and it learns something wildly different. This is overfitting.
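Both quantities can be estimated empirically by refitting a model on many redrawn datasets. The sketch below is a minimal illustration under assumed settings: the sine ground truth, the noise level, and polynomial degrees 1 and 9 are hypothetical choices. It compares a straight line with a flexible polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground-truth curve, chosen only for illustration.
    return np.sin(3 * x)

x = np.linspace(-1, 1, 30)   # fixed inputs; only the noisy labels are redrawn
noise_std = 0.3
n_trials = 200

def bias2_and_variance(degree):
    """Fit a polynomial of the given degree to many noisy datasets."""
    preds = np.empty((n_trials, x.size))
    for t in range(n_trials):
        y = true_fn(x) + rng.normal(0.0, noise_std, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coeffs, x)
    mean_pred = preds.mean(axis=0)
    bias2 = float(np.mean((mean_pred - true_fn(x)) ** 2))  # error of the average fit
    variance = float(np.mean(preds.var(axis=0)))           # spread across datasets
    return bias2, variance

results = {d: bias2_and_variance(d) for d in (1, 9)}
for d, (b2, var) in results.items():
    print(f"degree {d}: bias^2 = {b2:.4f}, variance = {var:.4f}")
```

The line should show the larger squared bias (it is wrong in the same way on every dataset). The degree-9 fit should show the larger variance (it changes noticeably from one dataset to the next).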

Capacity (how flexible the model is) trades one for the other: too little → high bias; too much → high variance. The art is landing in between, at the capacity where the test error bottoms out.

The crucial pattern when you plot error against capacity: training error keeps falling as you add capacity, but test error falls and then rises. The minimum of the test curve is the sweet spot. If you only watched training error, you’d happily crank capacity straight into overfitting and never know.
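A capacity sweep makes the two curves concrete. This is a sketch under assumed settings: polynomial degree stands in for capacity, and the sine-plus-noise data and sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: a sine curve plus Gaussian noise.
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

x_train, y_train = make_data(20)     # small training set
x_test, y_test = make_data(2000)     # large held-out set

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in range(1, 11):          # polynomial degree plays the role of capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree:2d}  train {train_mse[degree]:.4f}  "
          f"test {test_mse[degree]:.4f}")

best_degree = min(test_mse, key=test_mse.get)
print("lowest test error at degree", best_degree)
```

For least-squares polynomial fits, the training error cannot increase with degree, because each higher-degree model contains the lower-degree ones. The test error is the only column that tells you where to stop.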

The train/validation curve

That’s why you never train without a held-out validation set. You train on one split and watch the loss on the other. The shape tells you everything:

Validation loss bottoms out, then rises as the model starts memorizing. Stop at the minimum.

Both high, still falling → underfitting. Train longer / bigger model.
Train low, validation low, small gap → good fit. Ship it.
Train low, validation rising, growing gap → overfitting. The model is memorizing the training set.
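The three cases above can be turned into a rough rule of thumb. In this sketch the function name and both thresholds (`low`, the loss that counts as "low", and `max_gap`) are hypothetical and task-specific, not standard values.

```python
def diagnose(train_loss, val_loss, low=0.35, max_gap=0.10):
    """Rough label for a training run from its loss curves.

    `low` and `max_gap` are illustrative thresholds; pick them per task.
    """
    train_now, val_now = train_loss[-1], val_loss[-1]
    if train_now > low and val_now > low:
        return "underfitting"   # both high: train longer or use a bigger model
    if val_now - train_now > max_gap:
        return "overfitting"    # train low, validation well above it
    return "good fit"           # both low, small gap

print(diagnose([1.0, 0.9, 0.8], [1.05, 0.95, 0.85]))    # underfitting
print(diagnose([0.9, 0.5, 0.30], [0.95, 0.55, 0.33]))   # good fit
print(diagnose([0.9, 0.4, 0.10], [0.95, 0.50, 0.60]))   # overfitting
```

A check like this only looks at the final values. Looking at the full curves is still worthwhile, since a validation loss that is still falling points to a different action than one that has turned upward.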

Early stopping

The cheapest regularizer in existence: stop training when validation loss stops improving. Keep a copy of the best-so-far weights; if validation loss hasn’t improved for a set number of epochs (the patience), halt and restore them.

# Simulated validation losses over 20 epochs: down, bottom, then creeping up.
val_loss = [0.92,0.71,0.55,0.44,0.38,0.34,0.31,0.30,0.305,0.31,
            0.32,0.34,0.37,0.40,0.43,0.45,0.48,0.50,0.52,0.55]

best, best_epoch, patience, wait = float("inf"), -1, 3, 0
for epoch, v in enumerate(val_loss):
    if v < best:
        best, best_epoch, wait = v, epoch, 0     # new best → save weights here
    else:
        wait += 1
        if wait >= patience:
            print(f"early stop at epoch {epoch} (no improvement for {patience})")
            break

print(f"best val loss {best:.3f} at epoch {best_epoch} — restore THOSE weights")

early stop at epoch 10 (no improvement for 3)
best val loss 0.300 at epoch 7 — restore THOSE weights
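The simulation above only marks where the weights would be saved. The sketch below adds the save-and-restore step in a framework-agnostic way. The `EarlyStopper` class, the dict standing in for model weights, and the loss values are all hypothetical; in a real training loop, the state would be whatever your framework uses to hold parameters.

```python
import copy

class EarlyStopper:
    """Tracks the best validation loss and a snapshot of the weights behind it."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.wait = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)  # snapshot, not a reference
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience

# Stand-in for a real model: a dict of "weights" that changes every epoch.
weights = {"w": 0.0}
stopper = EarlyStopper(patience=3)
losses = [0.9, 0.6, 0.4, 0.3, 0.31, 0.33, 0.36, 0.40]   # simulated validation losses
for epoch, loss in enumerate(losses):
    weights["w"] = float(epoch)        # pretend this epoch updated the weights
    if stopper.step(loss, weights):
        break

weights = stopper.best_state           # restore the best-so-far snapshot
print(f"stopped at epoch {epoch}, restored weights {weights}")
```

The deep copy matters: storing a plain reference would leave the "best" weights pointing at the same object that later epochs keep overwriting.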

In one breath

A big gap between training and held-out performance means the model memorized noise — it overfit.
Bias is error from too-simple a model (underfitting, high train and test error); variance is error from too-flexible a model fitting noise (overfitting, low train but high test).
Capacity trades bias for variance: as you add it, training error always falls, but test error falls then rises — aim for the bottom of that U.
Always watch a held-out validation set; the train/validation curve diagnoses underfit (both high), good fit (both low, small gap), or overfit (train low, validation rising).
Early stopping is the cheapest regularizer: keep the best-so-far weights and halt when validation stops improving — and diagnose before you regularize, since regularizing an underfit model makes it worse.

Quick check

Q1. Your model has very low training error but high validation error. What's happening?

Q2. As you increase model capacity, how do training and test error typically behave?

Q3. When does adding regularization (dropout, weight decay) HURT rather than help?

Now that you can diagnose overfitting, here are the treatments: dropout, BatchNorm, and LayerNorm. And to make every epoch count, see how batch size and learning rate shape the optimization itself.
