datarekha

Overfitting & bias–variance

A model that nails the training data but fails on new data has memorized noise. The bias–variance tradeoff, the train/validation curve, and early stopping — the diagnosis behind every regularization fix.

7 min read Beginner Deep Learning Lesson 13 of 27

What you'll learn

  • What bias and variance are, and why capacity trades one for the other
  • How to read the train/validation curve to catch overfitting
  • Early stopping, and why diagnosis must come before regularization

Before you start

A model that gets 100% on your training data and 60% on new data hasn’t learned — it has memorized. The whole point of training is to do well on data you’ve never seen, and the gap between training performance and real performance is the central tension in all of machine learning. Before you reach for the fixes (dropout, BatchNorm, more data), you need to diagnose what’s going wrong. That diagnosis is the bias–variance tradeoff.

Bias and variance

  • Bias is error from a model too simple to capture the real pattern. A straight line trying to fit a curve is high-bias — it’s wrong in a consistent, systematic way. This is underfitting.
  • Variance is error from a model so flexible it fits the noise in the training set, not just the signal. Resample the data and it learns something wildly different. This is overfitting.
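To make the two definitions concrete, here is a minimal sketch that estimates both quantities by resampling. Everything in it is an assumption made for illustration: the sine-shaped signal, the noise level, and the two toy models (one that always predicts the average, one that copies the nearest training point). It draws many fresh training sets and looks at how each model's prediction at a single point behaves.

```python
import math
import random
import statistics

def true_signal(x):
    """The underlying pattern the models try to learn (an assumed toy signal)."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    """Draw a fresh noisy training set of (x, y) pairs."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_signal(x) + rng.gauss(0, noise)) for x in xs]

def predict_mean(train, x0):
    """Very simple model: ignore x0 and predict the average y."""
    return statistics.fmean(y for _, y in train)

def predict_nearest(train, x0):
    """Very flexible model: copy the y of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]

def bias_and_variance(predict, x0=0.25, trials=500, seed=0):
    """Estimate squared bias and variance of a model's prediction at x0."""
    rng = random.Random(seed)
    preds = [predict(sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_signal(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

simple_bias_sq, simple_var = bias_and_variance(predict_mean)
flexible_bias_sq, flexible_var = bias_and_variance(predict_nearest)
print(f"mean model:       bias^2={simple_bias_sq:.3f}  variance={simple_var:.3f}")
print(f"nearest neighbor: bias^2={flexible_bias_sq:.3f}  variance={flexible_var:.3f}")
```

The averaging model should come out consistently wrong but stable across resamples (high bias, low variance). The nearest-neighbor model should be right on average but jumpy (low bias, high variance).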

Capacity (how flexible the model is) trades one for the other: too little → high bias; too much → high variance. The art is landing in between.

The crucial pattern in the error curves: as you add capacity, training error keeps falling, but test error falls and then rises. The minimum of the test curve is the sweet spot. If you only watched training error, you’d happily crank capacity straight into overfitting and never know.
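To reproduce this pattern yourself, here is a small capacity sweep. It is a sketch under stated assumptions, not the lesson's own demo: it swaps in k-nearest-neighbor regression on a synthetic sine dataset, where a smaller k means more capacity (k=1 memorizes every training point, k=40 averages the whole set). The data generator, noise level, and the list of k values are all illustrative choices.

```python
import math
import random

def make_data(rng, n, noise=0.3):
    """Noisy samples from an assumed sine-shaped signal."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, noise)))
    return data

def knn_predict(train, x0, k):
    """Average the y values of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model (fit on train) evaluated on data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(rng, 40)
test = make_data(rng, 200)

# Ordered from least capacity (k=40) to most capacity (k=1).
ks = [40, 20, 10, 5, 2, 1]
train_err = {k: mse(train, train, k) for k in ks}
test_err = {k: mse(train, test, k) for k in ks}

for k in ks:
    print(f"k={k:>2}  train MSE={train_err[k]:.3f}  test MSE={test_err[k]:.3f}")
```

At k=1 the training error is exactly zero, because each training point is its own nearest neighbor, yet the test error is not. On most seeds the lowest test error lands at an intermediate k, which is the sweet spot described above.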

The train/validation curve

That’s why you never train without a held-out validation set. You train on one split and watch the loss on the other. The shape tells you everything:

[Figure: loss versus training time (epochs) for the train and validation curves, with the early-stop point marked and the gap between the curves labeled as overfitting.]
Validation loss bottoms out, then rises as the model starts memorizing. Stop at the minimum.

  • Both high, still falling → underfitting. Train longer or use a bigger model.
  • Train low, validation low, small gap → good fit. Ship it.
  • Train low, validation rising, growing gap → overfitting. The model is memorizing the training set.
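The three cases above can be turned into a rough rule-of-thumb checker. This is only a sketch: the function name `diagnose`, the `target_loss` you would consider acceptable, and the `gap_tol` and `window` thresholds are all hypothetical choices you would tune for your own problem.

```python
def diagnose(train_losses, val_losses, target_loss, gap_tol=0.05, window=3):
    """Label a training run from its per-epoch losses.

    Compares the latest epoch with the epoch `window` steps earlier.
    All thresholds are illustrative assumptions, not universal constants.
    """
    if len(train_losses) != len(val_losses):
        raise ValueError("need one train and one validation loss per epoch")
    if len(val_losses) <= window:
        raise ValueError("not enough epochs to judge a trend")

    train_now, val_now = train_losses[-1], val_losses[-1]
    train_then, val_then = train_losses[-1 - window], val_losses[-1 - window]
    gap_now = val_now - train_now
    gap_then = val_then - train_then

    # Validation rising while the gap grows: the model is memorizing.
    if val_now > val_then and gap_now > gap_then:
        return "overfitting"
    # Both losses above target but still falling: not done learning yet.
    if train_now > target_loss and val_now > target_loss:
        if train_now < train_then and val_now < val_then:
            return "underfitting"
    # Both losses at or below target with a small gap.
    if train_now <= target_loss and val_now <= target_loss and gap_now <= gap_tol:
        return "good fit"
    return "unclear"

under = diagnose([1.0, 0.9, 0.8, 0.7, 0.6], [1.1, 1.0, 0.9, 0.8, 0.7], target_loss=0.3)
good = diagnose([1.0, 0.5, 0.3, 0.25, 0.22], [1.1, 0.6, 0.35, 0.28, 0.25], target_loss=0.3)
over = diagnose([1.0, 0.5, 0.2, 0.1, 0.05], [1.1, 0.6, 0.5, 0.6, 0.7], target_loss=0.3)
print(under, "|", good, "|", over)
```

Real curves are noisy, so a checker like this is a prompt to go and look at the plot, not a replacement for it.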

Early stopping

The cheapest regularizer in existence: stop training when validation loss stops improving. Keep a copy of the best-so-far weights; if validation loss hasn’t improved for a set number of epochs (the patience), halt and restore them.
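Here is a minimal, framework-agnostic sketch of that loop. The function name `train_with_early_stopping` and its two callbacks (`train_one_epoch`, `validation_loss`) are hypothetical. In practice you would plug in your own training step and evaluation code, or use the early-stopping utility your framework provides. The "model" in the demo is just a dict with scripted validation losses, so the control flow is easy to follow.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Train until validation loss has not improved for `patience` epochs.

    Returns a copy of the best model seen, its epoch index, and its loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    best_model = copy.deepcopy(model)

    for epoch in range(max_epochs):
        train_one_epoch(model)           # mutates the model in place
        loss = validation_loss(model)
        if loss < best_loss:
            # New best: remember the loss and snapshot the weights.
            best_loss, best_epoch = loss, epoch
            best_model = copy.deepcopy(model)
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: halt.
            break

    return best_model, best_epoch, best_loss

# Toy demo: a fake model whose validation loss follows a scripted curve.
scripted_losses = [0.9, 0.7, 0.6, 0.55, 0.58, 0.6, 0.65, 0.7, 0.8]
model = {"epochs_trained": 0}

def train_one_epoch(m):
    m["epochs_trained"] += 1

def validation_loss(m):
    return scripted_losses[m["epochs_trained"] - 1]

best_model, best_epoch, best_loss = train_with_early_stopping(
    model, train_one_epoch, validation_loss,
    max_epochs=len(scripted_losses), patience=3,
)
print(best_epoch, best_loss, best_model, model)
```

The live model keeps training past the minimum until patience runs out, which is why the function returns the saved snapshot instead of the final state.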

Quick check

Q1. Your model has very low training error but high validation error. What's happening?
Q2. As you increase model capacity, how do training and test error typically behave?
Q3. When does adding regularization (dropout, weight decay) HURT rather than help?

Next

Now that you can diagnose overfitting, here are the treatments: dropout, BatchNorm, and LayerNorm. And to make every epoch count, see how batch size and learning rate shape the optimization itself.

Practice this in an interview

Explain the bias-variance tradeoff and how it relates to overfitting.

Bias is error from overly simple assumptions (underfitting) and variance is error from sensitivity to training-data noise (overfitting); reducing one often increases the other. An overfit model has low bias but high variance. Regularization and simpler models trade a little bias for a large reduction in variance, while more data reduces variance without adding bias.

What is the bias–variance tradeoff?

A model's expected test error splits into bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity lowers bias but raises variance, so the best model minimises their sum on unseen data — not the training error.

Explain the bias-variance tradeoff and how you'd diagnose which one you have.

Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.
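For squared-error loss, that decomposition can be written out explicitly. The notation here is an assumption added for illustration: the data follow y = f(x) + ε with zero-mean noise of variance σ², the fitted model is f̂, and the expectation is taken over random training sets and the noise.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average prediction is from the truth, the second how much predictions scatter from one training set to the next, and the third is the noise floor that no model can remove.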

What are overfitting and underfitting, and how do you fix each?

Overfitting occurs when a model memorizes training noise and fails to generalize; underfitting occurs when the model is too simple to capture the true signal. Fixes differ: overfitting requires regularization, more data, or reduced complexity; underfitting requires a more expressive model or better features.
