Explain the bias-variance tradeoff and how you'd diagnose which one you have.
Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.
How to think about it
The crisp answer
Under squared-error loss, expected prediction error decomposes into bias² + variance + irreducible noise. Bias is systematic error from a model too simple to capture the true relationship; variance is how much the fitted model would change if you trained it on a different sample. The tradeoff is that pushing model complexity down raises bias and lowers variance, and pushing it up does the reverse.
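Written out, the decomposition looks like the following. The notation is an assumption for illustration: data are generated as y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and the expectation is taken over training sets and noise, for squared-error loss.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```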
Why it happens
A linear model on nonlinear data has high bias — it underfits no matter how much data you give it. A deep tree or high-degree polynomial fits the training noise, so it has high variance and fails to generalize. As GeeksforGeeks describes the tradeoff, the goal is the sweet spot of complexity that minimizes total error, not either extreme.
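The two extremes can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a standard benchmark: the true function is a sine wave, the "too simple" model predicts the training mean, the "too flexible" model is 1-nearest-neighbor, and every function name here is hypothetical. It retrains each model on many fresh samples and estimates bias² and variance on a fixed grid.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed ground-truth relationship (illustrative only).
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    # Very simple model: always predict the training mean.
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_1nn(xs, ys):
    # Very flexible model: copy the label of the nearest training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def bias2_and_variance(fit, n_repeats=300, seed=0):
    rng = random.Random(seed)
    grid = [i / 20 for i in range(21)]
    preds = [[] for _ in grid]
    # Retrain on a fresh training set each time and record predictions.
    for _ in range(n_repeats):
        model = fit(*sample_training_set(rng))
        for j, x in enumerate(grid):
            preds[j].append(model(x))
    # Bias^2: squared gap between the average prediction and the truth.
    bias2 = statistics.fmean(
        [(statistics.fmean(p) - true_f(x)) ** 2 for p, x in zip(preds, grid)]
    )
    # Variance: spread of predictions across training sets.
    variance = statistics.fmean([statistics.pvariance(p) for p in preds])
    return bias2, variance

const_bias2, const_var = bias2_and_variance(fit_constant)
nn_bias2, nn_var = bias2_and_variance(fit_1nn)
print(f"constant: bias^2={const_bias2:.3f} variance={const_var:.3f}")
print(f"1-NN:     bias^2={nn_bias2:.3f} variance={nn_var:.3f}")
```

In this setup the constant model should show the larger bias² and the 1-NN model the larger variance, which is the pattern the paragraph above describes.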
How to diagnose
Plot training and validation error against training-set size (a learning curve):
- High bias: both train and validation error are high and close together. The model can’t even fit the training data.
- High variance: train error is low but validation error is much higher — a large gap.
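The two rules above can be sketched as a small helper. The function name, the `target_err` baseline (the error you would consider acceptable, such as human-level or a strong baseline), and the `gap_tol` threshold are all assumptions for illustration; real thresholds depend on the problem and the metric.

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.05):
    """Classify a model from its training and validation error.

    target_err and gap_tol are hypothetical knobs, not standard constants.
    """
    underfit = train_err > target_err            # cannot even fit the training data
    overfit = (val_err - train_err) > gap_tol    # large train/validation gap
    if underfit and overfit:
        return "high bias and high variance"
    if underfit:
        return "high bias"
    if overfit:
        return "high variance"
    return "ok"

print(diagnose(0.30, 0.32, target_err=0.10))  # high bias
print(diagnose(0.02, 0.25, target_err=0.10))  # high variance
```

Making the baseline explicit matters: "high" training error only means something relative to the error you believe is achievable.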
Fixes for each
- High bias: add features or interactions, use a more expressive model, reduce regularization, train longer.
- High variance: get more data, add regularization (L1/L2, dropout), reduce complexity, use bagging, or do early stopping.
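To see one of the variance fixes at work, here is a minimal sketch of L2 regularization on a one-feature linear model with no intercept. The setup is assumed for illustration (true slope 2.0, ten training points, Gaussian noise), and the function names are hypothetical. It refits the slope on many resampled training sets and reports how far the average slope is from the truth (bias) and how much the slope varies between fits (variance).

```python
import random
import statistics

def fit_ridge_slope(xs, ys, lam):
    # Closed-form ridge solution for y ~ w * x (single feature, no intercept).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def slope_bias_and_variance(lam, true_slope=2.0, n=10, noise_sd=1.0,
                            n_repeats=2000, seed=0):
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_repeats):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0, noise_sd) for x in xs]
        slopes.append(fit_ridge_slope(xs, ys, lam))
    bias = statistics.fmean(slopes) - true_slope
    variance = statistics.pvariance(slopes)
    return bias, variance

for lam in (0.0, 1.0, 5.0):
    bias, variance = slope_bias_and_variance(lam)
    print(f"lambda={lam}: bias={bias:+.3f} variance={variance:.3f}")
```

As the penalty grows, the fitted slope should vary less between training sets while drifting further from the true value, so regularization strength is itself a bias-variance dial.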
The common trap
Candidates conflate the symptom (high validation error) with the cause (bias or variance) and then reach for the wrong fix. More data reduces variance but barely helps bias. Also note the modern caveat: very overparameterized deep networks can show “double descent,” where test error drops again past the interpolation point, so the classic U-shaped curve is a guide, not a law. Expected follow-up: “Which does regularization target?” Answer: variance, by constraining the effective model complexity.