Bias–variance & learning curves
The single most useful diagnostic in ML: is your model too simple (bias) or too flexible (variance)? Read the learning curve to decide whether to get more data or change the model.
What you'll learn
- The bias–variance decomposition and what under/overfitting really mean
- How to read a learning curve to diagnose bias vs variance
- The decision it drives — more data, more capacity, or more regularization
Before you start
If you learn one diagnostic in classical ML, make it this one. When a model underperforms, you have a fork in the road: get more data, add capacity, or add regularization — and choosing wrong wastes weeks. The bias–variance framework, read off a learning curve, tells you which way to go.
Two ways to be wrong
Setting aside irreducible noise, expected error decomposes into two sources that you trade off against each other:
- Bias — error from a model too simple to capture the real pattern. A line fitting a curve is high-bias. It’s wrong in a consistent, systematic way. This is underfitting.
- Variance — error from a model so flexible it fits the noise in the training set. Resample the data and it learns something wildly different. This is overfitting.
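For reference, the two sources above can be written as a single identity. This is the standard decomposition for squared-error loss, assuming the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$; the symbols $f$, $\hat f$, and $\sigma^2$ are notation introduced here, and the expectation is taken over training sets and noise:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term is what a too-simple model gets wrong on average; the second is how much the fitted model moves when the training set is redrawn.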
Model capacity trades one for the other: too little → high bias; too much → high variance. (This is why L1/L2 regularization exists — it deliberately adds a little bias to cut a lot of variance.)
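The trade-off can be checked empirically. The sketch below is one possible illustration, not a canonical recipe: it assumes a made-up ground truth (`true_fn`, a sine curve), a fixed grid of inputs, and fresh noise on each resample, then fits a low-capacity line and a high-capacity degree-9 polynomial many times and estimates squared bias and variance from the spread of predictions. All names and constants here are hypothetical choices.

```python
import numpy as np

def true_fn(x):
    # Hypothetical ground-truth curve, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_train=20, n_datasets=200, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by resampling noise."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed design; only the noise changes
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Each iteration simulates drawing a new training set.
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                  # sensitivity to the sample
    return bias_sq, variance

line_bias, line_var = bias_variance(degree=1)
flex_bias, flex_var = bias_variance(degree=9)
print(f"degree 1: bias^2={line_bias:.4f}, variance={line_var:.4f}")
print(f"degree 9: bias^2={flex_bias:.4f}, variance={flex_var:.4f}")
```

Under these assumptions the line should show the larger squared bias and the degree-9 fit the larger variance, which is the trade described above.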
The learning curve tells you which one you have
Plotting error against capacity gives the classic picture: training error keeps falling while validation error traces a U. But the more actionable view plots both errors against training-set size: the learning curve. Its shape is a direct diagnosis.
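A learning curve can be computed by refitting the model on progressively larger training sets and recording both errors. The following is a minimal sketch using only NumPy on synthetic data; the target function, noise level, polynomial degrees, and sizes are all assumptions made for illustration, and `learning_curve_points` is a hypothetical helper, not a library function.

```python
import numpy as np

def target(x):
    # Hypothetical ground truth used to generate synthetic data.
    return np.sin(2 * np.pi * x)

def learning_curve_points(degree, sizes, n_val=500, noise_sd=0.3,
                          n_repeats=50, seed=0):
    """Return mean train and validation MSE for each training-set size."""
    rng = np.random.default_rng(seed)
    # One held-out validation set, reused for every fit.
    x_val = rng.uniform(0.0, 1.0, n_val)
    y_val = target(x_val) + rng.normal(0.0, noise_sd, n_val)
    train_err, val_err = [], []
    for n in sizes:
        tr, va = [], []
        for _ in range(n_repeats):
            # Average over several random training sets of size n.
            x = rng.uniform(0.0, 1.0, n)
            y = target(x) + rng.normal(0.0, noise_sd, n)
            coefs = np.polyfit(x, y, degree)
            tr.append(np.mean((np.polyval(coefs, x) - y) ** 2))
            va.append(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
        train_err.append(np.mean(tr))
        val_err.append(np.mean(va))
    return np.array(train_err), np.array(val_err)

sizes = [20, 40, 80, 160, 320]
line_train, line_val = learning_curve_points(degree=1, sizes=sizes)
flex_train, flex_val = learning_curve_points(degree=6, sizes=sizes)
for n, lt, lv, ft, fv in zip(sizes, line_train, line_val, flex_train, flex_val):
    print(f"n={n:4d}  line: train={lt:.3f} val={lv:.3f}   "
          f"degree 6: train={ft:.3f} val={fv:.3f}")
```

Plotting each pair of arrays against `sizes` gives the two curves. In this setup the line's errors should converge to a similar, high value, while the degree-6 model should start with a wide gap that narrows as data is added.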
Read it like this:
| Learning curve shape | Diagnosis | What to do |
|---|---|---|
| Both errors high, converged, small gap | High bias (underfit) | More capacity / features. More data won’t help. |
| Both errors low, small gap | Good fit | Ship it. |
| Big gap, low train + high validation, gap still shrinking | High variance (overfit) | More data (the gap is closing), or regularize / simplify. |
That middle column is the payoff: a high-bias model and a high-variance model both have “bad validation error,” but their fixes are opposite. The learning curve is how you tell them apart instead of guessing.
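The table can be turned into a rough rule of thumb in code. This is only a sketch: `diagnose` is a hypothetical helper, and both thresholds (`target_err`, the error you would accept, and `gap_tol`, the largest gap you would call small) are assumptions you would need to set for your own problem.

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.05):
    """Map the right-hand end of a learning curve to a diagnosis.

    target_err: acceptable error level (hypothetical, problem-specific).
    gap_tol:    largest train/validation gap treated as small (hypothetical).
    """
    high_bias = train_err > target_err          # even the training data is fit poorly
    high_variance = (val_err - train_err) > gap_tol
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"        # more capacity or features; more data won't help
    if high_variance:
        return "high variance"    # more data, or regularize / simplify
    return "good fit"

print(diagnose(train_err=0.30, val_err=0.32, target_err=0.10))
print(diagnose(train_err=0.02, val_err=0.25, target_err=0.10))
print(diagnose(train_err=0.05, val_err=0.07, target_err=0.10))
```

The combined case is an addition beyond the three rows of the table: a curve can show both a high training error and a large gap, in which case neither fix alone is likely to be enough.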
Next
Once you can diagnose the fit, the treatments follow: regularization for high variance, richer feature engineering for high bias, and honest cross-validation to measure it all.
Practice this in an interview
Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.
A model's expected test error splits into bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity lowers bias but raises variance, so the best model minimises their sum on unseen data — not the training error.
Bias is error from overly simple assumptions (underfitting) and variance is error from sensitivity to training-data noise (overfitting); reducing one often increases the other. An overfit model has low bias but high variance, so techniques like regularization and simpler models trade a little bias for a large reduction in variance, while more data reduces variance without adding bias.
Both L1 and L2 add a penalty on coefficient size that increases bias slightly but reduces variance, combating overfitting. L2 (ridge) shrinks all coefficients smoothly and handles correlated features well; L1 (lasso) drives some coefficients exactly to zero, performing feature selection. Choose L1 when you want sparsity and interpretability, L2 when you want stability, and elastic net to get both.
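The difference between smooth shrinkage and exact zeros is easiest to see in the special case of orthonormal features, where both penalties have closed-form solutions in terms of the unpenalized least-squares coefficients. The sketch below assumes that special case and the objectives noted in the comments; the coefficient values and `lam` are made up for illustration.

```python
import numpy as np

def ridge_shrink(beta_ols, lam):
    # Minimizes 0.5*||y - Xb||^2 + 0.5*lam*||b||^2 when X has orthonormal columns:
    # every coefficient is scaled by the same factor and none becomes exactly zero.
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    # Minimizes 0.5*||y - Xb||^2 + lam*||b||_1 when X has orthonormal columns:
    # soft-thresholding, which sets coefficients smaller than lam to zero.
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([3.0, -1.5, 0.4, -0.2])   # hypothetical least-squares fit
ridge = ridge_shrink(beta_ols, lam=0.5)
lasso = lasso_shrink(beta_ols, lam=0.5)
print("ridge:", ridge)
print("lasso:", lasso)
print("nonzero ridge coefficients:", np.count_nonzero(ridge))
print("nonzero lasso coefficients:", np.count_nonzero(lasso))
```

With correlated features there is no such closed form for the lasso, but the qualitative behaviour is the same: ridge keeps every feature with a smaller weight, while lasso drops the weakest ones.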