datarekha

Bias–variance & learning curves

The single most useful diagnostic in ML: is your model too simple (bias) or too flexible (variance)? Read the learning curve to decide whether to get more data or change the model.

7 min read · Beginner · Machine Learning · Lesson 5 of 33

What you'll learn

  • The bias–variance decomposition and what under/overfitting really mean
  • How to read a learning curve to diagnose bias vs variance
  • The decision it drives — more data, more capacity, or more regularization

Before you start

If you learn one diagnostic in classical ML, make it this one. When a model underperforms, you have a fork in the road: get more data, add capacity, or add regularization — and choosing wrong wastes weeks. The bias–variance framework, read off a learning curve, tells you which way to go.

Two ways to be wrong

Apart from irreducible noise, which no model can remove, expected error decomposes into two sources you trade off against each other:

  • Bias — error from a model too simple to capture the real pattern. A line fitting a curve is high-bias. It’s wrong in a consistent, systematic way. This is underfitting.
  • Variance — error from a model so flexible it fits the noise in the training set. Resample the data and it learns something wildly different. This is overfitting.
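For squared-error loss, the two sources above can be written down exactly. The notation here is an assumption introduced for illustration: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term is the systematic miss of the average fitted model, the second is how much the fit moves when the training set is resampled, and the third is a floor that neither more data nor a better model can lower.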

Model capacity trades one for the other: too little → high bias; too much → high variance. (This is why L1/L2 regularization exists — it deliberately adds a little bias to cut a lot of variance.)
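The "resample the data" idea can be checked numerically. The sketch below is a toy setup, not taken from the lesson: the target `sin(x)`, the noise level, and the two stand-in models (a constant mean predictor for low capacity, 1-nearest-neighbour for high capacity) are all assumptions chosen to keep the code dependency-free.

```python
import math
import random


def sample_training_set(rng, n, noise_sd=0.3):
    """Draw n noisy samples of the assumed true function sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Low-capacity model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    """High-capacity model: copy the label of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bias_variance(fit, n_train=30, n_resamples=200, seed=0):
    """Estimate average bias^2 and variance over a grid of test points
    by refitting the model on many freshly sampled training sets."""
    rng = random.Random(seed)
    test_xs = [2.0 * math.pi * (i + 0.5) / 20 for i in range(20)]
    preds = [[] for _ in test_xs]
    for _ in range(n_resamples):
        model = fit(*sample_training_set(rng, n_train))
        for j, x in enumerate(test_xs):
            preds[j].append(model(x))

    bias_sq = 0.0
    variance = 0.0
    for x, p in zip(test_xs, preds):
        mean_p = sum(p) / len(p)
        bias_sq += (mean_p - math.sin(x)) ** 2
        variance += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(test_xs), variance / len(test_xs)


for name, fit in (("mean", fit_mean), ("1-NN", fit_1nn)):
    b, v = bias_variance(fit)
    print(f"{name:5s} bias^2={b:.3f} variance={v:.3f}")
```

In this setup the mean predictor should show the larger bias term and 1-NN the larger variance term, which is the trade the paragraph above describes.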

The learning curve tells you which one you have

Plotting train and validation error against capacity gives the classic U-curve. But the more actionable view plots them against training-set size — the learning curve. Its shape is a direct diagnosis:

Read it like this:

| Learning curve shape | Diagnosis | What to do |
| --- | --- | --- |
| Both errors high, converged, small gap | High bias (underfit) | More capacity / features. More data won’t help. |
| Both errors low, small gap | Good fit | Ship it. |
| Big gap, low train + high validation, gap still shrinking | High variance (overfit) | More data (the gap is closing), or regularize / simplify. |

That middle column is the payoff: a high-bias model and a high-variance model both show bad validation error, but their fixes are opposite. The learning curve lets you tell them apart instead of guessing.
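A learning curve is just train and validation error recomputed at increasing training-set sizes. The following is a minimal sketch under assumed conditions: synthetic `sin(x)` data, a mean predictor standing in for a high-bias model, and 1-nearest-neighbour standing in for a high-variance one. None of these names or settings come from the lesson.

```python
import math
import random


def make_data(rng, n, noise_sd=0.3):
    """Noisy samples of sin(x) on [0, 2*pi] (an assumed toy problem)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """High-bias stand-in: a constant prediction."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    """High-variance stand-in: memorise the training set."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(fit, sizes=(10, 20, 40, 80, 160, 320), seed=0):
    """Return (n, train_error, validation_error) for each training size.
    Each size uses the first n points of one fixed training pool."""
    rng = random.Random(seed)
    train_x, train_y = make_data(rng, max(sizes))
    val_x, val_y = make_data(rng, 500)
    rows = []
    for n in sizes:
        model = fit(train_x[:n], train_y[:n])
        rows.append((n,
                     mse(model, train_x[:n], train_y[:n]),
                     mse(model, val_x, val_y)))
    return rows


for name, fit in (("mean (high bias)", fit_mean),
                  ("1-NN (high variance)", fit_1nn)):
    print(name)
    for n, train_err, val_err in learning_curve(fit):
        print(f"  n={n:4d}  train={train_err:.3f}  "
              f"val={val_err:.3f}  gap={val_err - train_err:+.3f}")
```

The mean predictor should print two high, nearly equal errors at every size, while 1-NN prints zero training error and a persistent gap. One caveat of this toy choice: the 1-NN gap tends to narrow with more data but never closes fully, because each prediction still copies one noisy label.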

Quick check

Q1. Your learning curve shows training and validation error both flat at ~30%, with almost no gap between them. What's the diagnosis and fix?
Q2. A different model shows ~2% training error but ~22% validation error, and the gap is still shrinking as you add data. Diagnosis?
Q3. Why does the bias–variance framework explain why regularization exists?

Next

Once you can diagnose the fit, the treatments follow: regularization for high variance, richer feature engineering for high bias, and honest cross-validation to measure it all.

Practice this in an interview

Explain the bias-variance tradeoff and how you'd diagnose which one you have.

Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.
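The comparison rule in this answer can be sketched as a small helper. The function name `diagnose` and both thresholds (`target_err`, `gap_tol`) are hypothetical: what counts as "high" error or a "large" gap depends on the problem, so treat the defaults as placeholders rather than recommended values.

```python
def diagnose(train_err, val_err, target_err=0.10, gap_tol=0.05):
    """Map a (train, validation) error pair to a rough diagnosis.

    target_err: the error level you would accept (assumed, problem-specific).
    gap_tol:    how large a train/validation gap you tolerate (assumed).
    """
    gap = val_err - train_err
    if gap > gap_tol:
        # Low train error with much higher validation error: overfitting.
        return "high variance", "more data, or regularize / simplify"
    if train_err > target_err:
        # Both errors high and close together: underfitting.
        return "high bias", "more capacity / features"
    return "good fit", "ship it"


print(diagnose(0.30, 0.30))  # both errors high, no gap
print(diagnose(0.02, 0.22))  # low train error, large gap
print(diagnose(0.04, 0.06))  # both errors low, small gap
```

A single pair of numbers only gives the diagnosis at one training-set size. Whether more data will help still requires looking at how the gap changes as the training set grows.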

What is the bias–variance tradeoff?

A model's expected test error splits into bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity lowers bias but raises variance, so the best model minimises their sum on unseen data — not the training error.
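Choosing complexity by validation error rather than training error can be illustrated with k-nearest-neighbour regression, where a smaller `k` means a more flexible model. This is a sketch on assumed synthetic data (`sin(x)` plus noise); the candidate values of `k` are arbitrary.

```python
import math
import random


def make_data(rng, n, noise_sd=0.3):
    """Draw n noisy samples of sin(x) on [0, 2*pi] as (x, y) pairs."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        data.append((x, math.sin(x) + rng.gauss(0.0, noise_sd)))
    return data


def knn_predict(train, x, k):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def knn_mse(train, data, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(rng, 200)
val = make_data(rng, 200)

ks = (1, 3, 9, 27, 81)  # k=1 is the most flexible, k=81 the smoothest
train_err = {k: knn_mse(train, train, k) for k in ks}
val_err = {k: knn_mse(train, val, k) for k in ks}

for k in ks:
    print(f"k={k:2d}  train={train_err[k]:.3f}  val={val_err[k]:.3f}")

best_by_train = min(ks, key=train_err.get)
best_by_val = min(ks, key=val_err.get)
print("picked by training error:", best_by_train)
print("picked by validation error:", best_by_val)
```

Training error always favours `k=1`, since every training point is its own nearest neighbour and is reproduced exactly. Validation error should favour a larger `k` that averages out some of the noise.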

Explain the bias-variance tradeoff and how it relates to overfitting.

Bias is error from overly simple assumptions (underfitting) and variance is error from sensitivity to training-data noise (overfitting); reducing one often increases the other. An overfit model has low bias but high variance, so regularization and simpler models trade a little bias for a large reduction in variance, while more training data reduces variance without adding bias.

How do L1 and L2 regularization affect bias and variance, and when would you pick one over the other?

Both L1 and L2 add a penalty on coefficient size that increases bias slightly but reduces variance, combating overfitting. L2 (ridge) shrinks all coefficients smoothly and handles correlated features well; L1 (lasso) drives some coefficients exactly to zero, performing feature selection. Choose L1 when you want sparsity and interpretability, L2 when you want stability, and elastic net to get both.
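The difference between smooth shrinkage and exact zeros is easiest to see in a special case with closed-form solutions. The sketch below assumes an orthonormal design (XᵀX = I) and the objectives ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge and ½‖y − Xβ‖² + λ‖β‖₁ for lasso. Under those assumptions each penalized coefficient is a simple function of the unpenalized least-squares coefficient. The example coefficients and the value of `lam` are made up for illustration.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: scale every coefficient
    by the same factor 1 / (1 + lam). Nothing becomes exactly zero."""
    return [b / (1.0 + lam) for b in beta_ols]


def soft_threshold(b, lam):
    """Move b toward zero by lam, stopping at zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(beta_ols, lam):
    """Lasso under an orthonormal design: soft-threshold each coefficient.
    Coefficients with magnitude at most lam are set exactly to zero."""
    return [soft_threshold(b, lam) for b in beta_ols]


beta_ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical least-squares coefficients
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)

print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", lasso)
print("zeros under lasso:", sum(1 for b in lasso if b == 0.0))
```

With correlated features the solutions no longer separate coefficient by coefficient, so this closed form does not apply directly, but the qualitative behaviour (ridge shrinks, lasso selects) is the same one the answer describes.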
