What is the bias–variance tradeoff?
A model's expected test error splits into squared bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity typically lowers bias but raises variance, so the best model is the one that minimises their sum on unseen data, not the one with the lowest training error.
How to think about it
Give the decomposition, then the picture, then how you’d diagnose and fix it — interviewers want the last part most, because it shows you’ve debugged a real model.
The decomposition
For a given test point, expected squared error breaks into three pieces:
Bias is error from the model being too simple to capture the true signal — wrong assumptions, like fitting a straight line to a curve. High bias → underfitting: the model is wrong even on the training data.
Variance is how much the fitted model changes if you swap in a different training sample of the same size. A deep tree memorises its sample, so a new sample gives a very different tree. High variance → overfitting: great on training data, poor on new data.
Irreducible noise is the floor — randomness in the labels that no model can explain. You can’t beat it; you can only avoid adding to the other two.
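Written out, the three pieces look like this. The notation is an assumption made for illustration: labels are generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat f_D$ is the model fitted on training sample $D$, and expectations are taken over training samples and label noise.

$$
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f_D(x))^2\big]
= \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Note that bias enters squared, so an estimator that is systematically too high is penalised the same as one that is systematically too low.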
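The picture
Plot error against model complexity: training error keeps falling as the model gets more flexible, while error on unseen data falls at first and then rises again, giving a U-shape. The bottom of the U is the complexity to aim for.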
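One way to reproduce the picture is to sweep a complexity knob and print training and test error side by side. The sketch below is a toy under stated assumptions, not a recipe: the data come from a made-up `sin(x)` signal with Gaussian noise, the model is a hand-rolled k-nearest-neighbours regressor, and `make_data`, `knn_predict` and `mse` are hypothetical helper names. Smaller `k` means a more flexible model.

```python
import math
import random

def make_data(n, rng, noise_sd=0.5):
    """Sample n points from y = sin(x) + Gaussian noise on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(x, train_x, train_y, k):
    """Average the labels of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(y - knn_predict(x, train_x, train_y, k)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(500, rng)

# k=1 memorises the training set; k=200 predicts the global mean everywhere.
train_err, test_err = {}, {}
for k in (1, 10, 200):
    train_err[k] = mse(train_x, train_y, train_x, train_y, k)
    test_err[k] = mse(test_x, test_y, train_x, train_y, k)
    print(f"k={k:3d}  train MSE={train_err[k]:.3f}  test MSE={test_err[k]:.3f}")
```

At `k=1` the training error is zero because every training point is its own nearest neighbour, yet the test error is worse than at a moderate `k`. At `k=200` both errors are high. The moderate setting sits near the bottom of the test-error curve.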
How to diagnose it
Compare training error and validation error:
- High training error and high validation error → high bias. The model can’t even fit what it’s seen.
- Low training error but a large gap to validation error → high variance. It memorised the sample.
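The two rules above can be sketched as a small helper. This is only an illustration: `diagnose`, `target_error` and `gap_tolerance` are hypothetical names, and the thresholds are assumptions, since what counts as a "high" error or a "large" gap depends on the task and its noise floor.

```python
def diagnose(train_error, val_error, target_error, gap_tolerance=0.05):
    """Label a model from its training and validation error.

    target_error and gap_tolerance are illustrative assumptions, not
    standard values; pick them per task.
    """
    high_bias = train_error > target_error                      # can't fit what it has seen
    high_variance = (val_error - train_error) > gap_tolerance   # large train/validation gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "no clear problem"

print(diagnose(0.30, 0.32, target_error=0.10))  # both errors high, small gap
print(diagnose(0.02, 0.25, target_error=0.10))  # low training error, large gap
```

A model can fail both checks at once, which is why the helper tests them independently rather than as an either/or.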
How to fix each
| If you have… | Lower it by… |
|---|---|
| High bias (underfit) | A more expressive model, better features, less regularization, training longer |
| High variance (overfit) | More data, stronger regularization, fewer features, a simpler model, bagging, early stopping |
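To see one row of the table in action, the sketch below measures how stronger regularization trades variance for bias. Everything in it is an assumption chosen for illustration: a one-feature model `y = w * x` with no intercept, a true slope of 0.5, noisy labels, only 10 training points per sample, and the hypothetical helpers `ridge_slope` and `bias_variance`. It measures bias and variance of the fitted slope rather than of predictions. For standard-normal inputs the two differ only by a constant factor.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-feature model y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def bias_variance(lam, true_w=0.5, n=10, noise_sd=2.0, trials=2000, seed=0):
    """Refit on many fresh training samples; return (bias^2, variance) of the slope."""
    rng = random.Random(seed)  # same seed for every lam, so samples are comparable
    estimates = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [true_w * x + rng.gauss(0.0, noise_sd) for x in xs]
        estimates.append(ridge_slope(xs, ys, lam))
    mean = sum(estimates) / trials
    variance = sum((w - mean) ** 2 for w in estimates) / trials
    return (mean - true_w) ** 2, variance

# lam=0.0 is plain least squares; lam=10.0 is heavy shrinkage toward zero.
for lam in (0.0, 10.0):
    b2, var = bias_variance(lam)
    print(f"lambda={lam:4.1f}  bias^2={b2:.3f}  variance={var:.3f}  sum={b2 + var:.3f}")
```

In this small, noisy setting the regularized fit has higher bias but much lower variance, and the sum comes out lower. With more data or less noise the balance can tip the other way, which is why the strength is usually tuned on validation data.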