Explain the bias-variance tradeoff and how you'd diagnose which one you have.

Bias is error from oversimplifying assumptions (underfitting); variance is error from sensitivity to the training set (overfitting). Total error decomposes into bias squared, variance, and irreducible noise, and reducing one often increases the other. You diagnose by comparing training and validation error: high error on both means high bias, while a large gap (low train, high validation) means high variance.
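For squared-error loss the decomposition can be written out explicitly. This assumes the data come from y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and takes expectations over both the random training set used to fit f̂ and the noise. At a fixed input x:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why they are the ones you trade off.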

How do L1 and L2 regularization affect bias and variance, and when would you pick one over the other?

Both L1 and L2 add a penalty on coefficient size that increases bias slightly but reduces variance, combating overfitting. L2 (ridge) shrinks all coefficients smoothly and handles correlated features well; L1 (lasso) drives some coefficients exactly to zero, performing feature selection. Choose L1 when you want sparsity and interpretability, L2 when you want stability, and elastic net to get both.
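A minimal sketch of the sparsity difference, using scikit-learn's `Lasso` (L1) and `Ridge` (L2) on synthetic data. The dataset shape and `alpha=1.0` are illustrative assumptions, not tuned values. The point is only that L1 sets some coefficients exactly to zero while L2 shrinks them without eliminating them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Illustrative assumption: 30 features, only 5 of which carry signal.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

# Same penalty strength for both models; alpha=1.0 is an arbitrary choice.
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1's soft-thresholding produces exact zeros; L2 only shrinks toward zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print(f"Lasso: {n_zero_lasso} of 30 coefficients exactly zero")
print(f"Ridge: {n_zero_ridge} of 30 coefficients exactly zero")
```

The zeroed Lasso coefficients are the "feature selection" the answer refers to. Ridge keeps every feature in play, which is what makes it steadier when features are correlated.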

What is the bias–variance tradeoff?

A model's expected test error splits into squared bias (error from over-simplified assumptions, causing underfitting), variance (sensitivity to the particular training sample, causing overfitting), and irreducible noise. Adding complexity lowers bias but raises variance, so the best model is the one that minimises their sum on unseen data, not the one with the lowest training error.
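The two model-dependent terms can be estimated by simulation: refit the same model on many independent training sets, then measure how far the average prediction is from the truth (bias) and how much the predictions spread (variance). This is a minimal sketch under illustrative assumptions: a known true function sin(2πx), Gaussian noise with σ = 0.3, 20 points per training set, and polynomial fits of degree 1, 3, and 9.

```python
import numpy as np

# Illustrative assumptions: true function sin(2*pi*x), Gaussian noise sigma=0.3,
# 20 training points per draw, 500 independent training sets.
rng = np.random.default_rng(42)
sigma, n_train, n_sets = 0.3, 20, 500
x_eval = np.linspace(0.05, 0.95, 50)      # points where bias and variance are measured
f_eval = np.sin(2 * np.pi * x_eval)       # noise-free target at those points

def bias_variance(degree):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit."""
    preds = np.empty((n_sets, x_eval.size))
    for s in range(n_sets):
        # Draw a fresh training set and refit the model on it.
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n_train)
        preds[s] = np.polyval(np.polyfit(x, y, degree), x_eval)
    mean_pred = preds.mean(axis=0)                    # average model over training sets
    bias_sq = np.mean((mean_pred - f_eval) ** 2)      # systematic error
    variance = np.mean(preds.var(axis=0))             # spread across training sets
    return bias_sq, variance

results = {}
for degree in (1, 3, 9):
    bias_sq, variance = bias_variance(degree)
    results[degree] = (bias_sq, variance)
    # Expected test MSE at these points is approximately bias^2 + variance + sigma^2.
    print(f"degree {degree}: bias^2 {bias_sq:.4f}  variance {variance:.4f}  "
          f"approx. test MSE {bias_sq + variance + sigma**2:.4f}")
```

Under these assumptions, bias should dominate the degree-1 fit and variance should dominate the degree-9 fit. The exact figures depend on the seed.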

Explain the bias-variance tradeoff and how it relates to overfitting.

Bias is error from overly simple assumptions (underfitting), and variance is error from sensitivity to training-data noise (overfitting); reducing one often increases the other. An overfit model has low bias but high variance. Regularization and simpler models help by trading a little bias for a large reduction in variance, while more training data reduces variance without adding bias.

Bias–variance & learning curves — Machine Learning

If you learn one diagnostic in classical ML, make it this one. When a model underperforms, you have a fork in the road: get more data, add capacity, or add regularization — and choosing wrong wastes weeks. The bias–variance framework, read off a learning curve, tells you which way to go.


Slide the capacity — find the sweet spot

The same 12 noisy points are fit by a polynomial of adjustable degree. A low degree can't bend enough (underfit), and a high degree wiggles to chase the noise (overfit). As the degree rises, train error keeps falling while test error traces a U.

Figure (interactive in the original): the fit (solid) vs. the true function (dashed), plus train and test error vs. polynomial degree. At degree 4, train error is 0.022 and test error is 0.081. This is the sweet spot near the minimum of the test curve, with enough capacity to fit the signal but not the noise.
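The widget's experiment can be approximated in a few lines of NumPy. This is a minimal sketch that assumes the true function is sin(2πx) with Gaussian noise (σ = 0.2) and uses a large held-out test set. The widget's actual data are not given, so the numbers will differ from those above.

```python
import numpy as np

# Assumed setup (the widget's actual data are unknown): 12 noisy training points.
rng = np.random.default_rng(0)
sigma = 0.2
x_train = np.sort(rng.uniform(0, 1, 12))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, sigma, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, sigma, x_test.size)

train_mse, test_mse = [], []
for degree in range(10):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_mse.append(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    print(f"degree {degree}: train {train_mse[-1]:.3f}  test {test_mse[-1]:.3f}")

best = int(np.argmin(test_mse))
print(f"lowest test error at degree {best}")
```

The polynomial models are nested, so least-squares training error cannot rise as the degree grows. Test error has no such guarantee, and that is where the U comes from.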

Two ways to be wrong

Apart from irreducible noise, total error comes from two sources that you trade off against each other:

Bias — error from a model too simple to capture the real pattern. A line fitting a curve is high-bias. It’s wrong in a consistent, systematic way. This is underfitting.
Variance — error from a model so flexible it fits the noise in the training set. Resample the data and it learns something wildly different. This is overfitting.

Model capacity trades one for the other: too little → high bias; too much → high variance. (This is why L1/L2 regularization exists — it deliberately adds a little bias to cut a lot of variance.)
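A minimal sketch of that trade. It assumes a deliberately over-flexible degree-9 polynomial model fit to 15 noisy points (illustrative choices) and regularized with scikit-learn's `Ridge` (L2). As `alpha` grows, training error can only rise (more bias). Validation error typically falls at first (less variance) and rises again once the model is too constrained.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)

def make_data(n):
    # Assumed ground truth: sin(2*pi*x) plus Gaussian noise (sigma = 0.3).
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0, 0.3, size=n)
    return x, y

X_train, y_train = make_data(15)
X_val, y_val = make_data(500)

alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0, 1000.0]
train_mse, val_mse = [], []
for alpha in alphas:
    # Degree 9 gives far more capacity than 15 points can pin down (high variance).
    model = make_pipeline(PolynomialFeatures(degree=9), StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))
    print(f"alpha={alpha:<8g} train {train_mse[-1]:.3f}  val {val_mse[-1]:.3f}")
```

In practice you would pick `alpha` by cross-validation rather than a single validation split.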

The learning curve tells you which one you have

Plotting train and validation error against capacity gives the classic picture: training error keeps falling while validation error traces a U. The more actionable view, though, plots them against training-set size. That plot is the learning curve, and its shape is a direct diagnosis:

Figure: learning-curve shapes. Both errors high (bias), both low (good), or a big train–validation gap (variance).

Read it like this:

Learning curve shape	Diagnosis	What to do
Both errors high, converged, small gap	High bias (underfit)	More capacity / features. More data won’t help.
Both errors low, small gap	Good fit	Ship it.
Big gap, low train + high validation, gap still shrinking	High variance (overfit)	More data (the gap is closing), or regularize / simplify.

That middle column is the payoff: a high-bias model and a high-variance model both have “bad validation error,” but the fix is the opposite. The learning curve is how you tell them apart instead of guessing.
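The table's decision rules can be written as a small function. This is a minimal sketch in which the function name `diagnose` and the thresholds `high_err=0.15` and `big_gap=0.05` are hypothetical and problem-dependent. In practice, "high" should be judged against a baseline or your target error.

```python
import numpy as np

def diagnose(train_err, val_err, high_err=0.15, big_gap=0.05):
    """Map a learning curve (errors ordered by increasing training-set size)
    to a row of the table. Thresholds are hypothetical, not standard values."""
    train_err = np.asarray(train_err, dtype=float)
    val_err = np.asarray(val_err, dtype=float)
    first_gap = val_err[0] - train_err[0]
    final_gap = val_err[-1] - train_err[-1]

    if final_gap > big_gap:
        # Big train/validation gap: variance. Is the gap still closing?
        if final_gap < first_gap:
            return "high variance: more data (gap still closing), or regularize/simplify"
        return "high variance: regularize/simplify"
    if val_err[-1] > high_err:
        # Small gap but both errors high: bias. More data won't help.
        return "high bias: more capacity/features"
    return "good fit: ship it"

# Hand-made curves echoing the three rows of the table.
print(diagnose([0.29, 0.30, 0.30], [0.32, 0.31, 0.30]))  # high bias
print(diagnose([0.01, 0.02, 0.02], [0.35, 0.28, 0.22]))  # high variance, gap closing
print(diagnose([0.03, 0.04, 0.04], [0.08, 0.06, 0.05]))  # good fit
```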

import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1500, n_features=20, n_informative=6, random_state=0)

# A deep tree = high variance. Watch the gap between train and CV error.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=None, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 6), cv=5, scoring="accuracy",
)
train_err = 1 - train_scores.mean(axis=1)
val_err   = 1 - val_scores.mean(axis=1)

print(f"{'n_train':>8} {'train_err':>10} {'val_err':>9} {'gap':>7}")
for n, te, ve in zip(sizes, train_err, val_err):
    print(f"{int(n):8d} {te:10.3f} {ve:9.3f} {ve-te:7.3f}")
print("\nTrain error ~0 but a large, slowly-shrinking gap = high variance.")
print("Fix: limit max_depth, or add more data (the gap is still closing).")
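For contrast, the same experiment with a depth-1 tree (a "stump") should show the high-bias signature instead: both errors high and close together. This is a minimal sketch that rebuilds the setup above. Using `max_depth=1` as the deliberately under-powered model is an illustrative assumption, and exact numbers may vary across scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, n_informative=6, random_state=0)

def final_errors(model):
    """Return (train_err, val_err) at the largest training size."""
    _, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=np.linspace(0.1, 1.0, 6), cv=5, scoring="accuracy",
    )
    return 1 - train_scores.mean(axis=1)[-1], 1 - val_scores.mean(axis=1)[-1]

results = {}
for name, depth in [("stump", 1), ("deep", None)]:
    train_err, val_err = final_errors(DecisionTreeClassifier(max_depth=depth, random_state=0))
    results[name] = (train_err, val_err)
    print(f"{name:>5}: train_err {train_err:.3f}  val_err {val_err:.3f}  gap {val_err - train_err:.3f}")

# Expected pattern: the stump has high train error and a small gap (bias);
# the deep tree has ~0 train error and a large gap (variance).
```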

In one breath

Total error splits into bias (too simple → systematic error → underfit) and variance (too flexible → fits noise → overfit); model capacity trades one against the other.
The learning curve (train + validation error vs training-set size) is the actionable diagnostic — its shape is the diagnosis.
Both errors high, small gap = high bias → add capacity/features (more data won’t help). Both low, small gap = good fit → ship. Big gap (low train, high val) = high variance → more data if the gap is still closing, or regularize/simplify.
The most expensive mistake is throwing more data at a high-bias model. Both curves have already converged, so more data leaves the error where it is. Read the curve first.
This is why regularization exists: deliberately add a little bias to cut a lot of variance.

Quick check


Q1. Your learning curve shows training and validation error both flat at ~30%, with almost no gap between them. What's the diagnosis and fix?

Q2. A different model shows ~2% training error but ~22% validation error, and the gap is still shrinking as you add data. Diagnosis?

Q3. Why does the bias–variance framework explain why regularization exists?

Once you can diagnose the fit, the treatments follow: regularization for high variance, richer feature engineering for high bias, and honest cross-validation to measure it all.
