datarekha

The Bias-Variance Trade-off

Total error splits into bias, variance, and irreducible noise. Reduce one and you usually raise the other — the conceptual backbone of the ML section.

8 min read Intermediate GATE DA Lesson 82 of 122

What you'll learn

  • Bias = error from over-simplifying (underfitting); variance = sensitivity to the training set (overfitting)
  • Total expected error ≈ bias² + variance + irreducible noise
  • The trade-off: lowering one term often raises the other
  • What lowers variance (more data, regularization, simpler models, bagging) vs what lowers bias
  • Reading the U-shaped error-vs-complexity curve

Before you start

Every supervised model makes errors for three different reasons, and separating them is the single most useful lens in machine learning. Bias is the error from a model that is too simple to capture the truth — it underfits. Variance is the error from a model so flexible that it chases the quirks of this particular training set — it overfits. The third piece, irreducible noise, is randomness in the data itself that no model can remove.

The catch is that bias and variance pull in opposite directions: make a model more flexible and its bias drops but its variance climbs. You cannot drive both to zero; you find the sweet spot.

The decomposition and the U-curve

Expected prediction error decomposes into three additive pieces:

Expected error  ≈  bias²  +  variance  +  irreducible noise
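For squared-error loss the decomposition can be written out term by term. The sketch below assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², and that the model f̂ is fit on a random training set D. These symbols are introduced here for illustration and are not part of the lesson's notation.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Under these assumptions the bias term measures how far the average fitted model is from the truth, and the variance term measures how much individual fits scatter around that average.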

As model complexity grows, bias falls (the model can express more) while variance rises (it reacts more to the training sample). Their sum is U-shaped — the lowest point is the best-generalising model:

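[Figure: error plotted against model complexity, from underfit (simple) on the left to overfit (flexible) on the right. Curves: bias², variance, and total error, with the sweet spot at the minimum of total error.]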
Bias² falls and variance rises with complexity; their sum (total error) is U-shaped. The minimum is the best generalisation.
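The shape of the curve can be explored with a small Monte Carlo simulation. This is a minimal sketch, not a definitive experiment: the true function `true_f`, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration. It refits a polynomial on many freshly drawn training sets and estimates bias² and variance from the spread of the predictions.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_variance_by_degree(degrees, n_train=30, n_trials=300,
                            noise_sd=0.3, seed=0):
    """Estimate bias^2, variance, and total error for polynomial fits."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)  # fixed evaluation points
    results = {}
    for d in degrees:
        preds = np.empty((n_trials, x_test.size))
        for t in range(n_trials):
            # Draw a fresh training set for every trial.
            x = rng.uniform(0.0, 1.0, n_train)
            y = true_f(x) + rng.normal(0.0, noise_sd, n_train)
            coefs = P.polyfit(x, y, d)
            preds[t] = P.polyval(x_test, coefs)
        mean_pred = preds.mean(axis=0)
        bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
        variance = np.mean(preds.var(axis=0))
        # Total = bias^2 + variance + irreducible noise (noise_sd squared).
        results[d] = (bias_sq, variance, bias_sq + variance + noise_sd ** 2)
    return results

results = bias_variance_by_degree([0, 1, 3, 9])
for d, (b2, var, total) in results.items():
    print(f"degree {d}: bias^2={b2:.3f}  variance={var:.3f}  total={total:.3f}")
```

In this setup the degree-0 fit should show the largest bias², the degree-9 fit the largest variance, and an intermediate degree the smallest total. The exact numbers depend on the assumed function and noise level.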

The handles you can pull, and which term they move:

  • Lowers variance: more training data, regularization (e.g. ridge’s λ), simpler models, bagging / averaging ensembles.
  • Lowers bias: more expressive models (higher-degree polynomials, deeper trees), adding informative features.

Most levers that cut one term raise the other. The main exception is more training data, which lowers variance without making the model any simpler. That tension is the trade-off.
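The effect of the regularization lever can be sketched with closed-form ridge regression. The example below is illustrative only: the linear data-generating model, the weights `w_true`, the problem sizes, and the λ values are assumptions. The training inputs are held fixed and only the label noise is redrawn across trials, which keeps the comparison between λ values clean.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def ridge_bias_variance(lams, n_train=20, n_features=10,
                        n_trials=500, noise_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)          # hypothetical true weights
    X = rng.normal(size=(n_train, n_features))    # fixed training inputs
    X_test = rng.normal(size=(200, n_features))   # fixed evaluation inputs
    f_test = X_test @ w_true
    out = {}
    for lam in lams:
        preds = np.empty((n_trials, X_test.shape[0]))
        for t in range(n_trials):
            # Redraw the label noise so each trial is a new training set.
            y = X @ w_true + rng.normal(0.0, noise_sd, n_train)
            preds[t] = X_test @ ridge_fit(X, y, lam)
        bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)
        variance = np.mean(preds.var(axis=0))
        out[lam] = (bias_sq, variance)
    return out

res = ridge_bias_variance([0.0, 10.0, 100.0])
for lam, (b2, var) in res.items():
    print(f"lambda={lam:>5}: bias^2={b2:.3f}  variance={var:.3f}")
```

As λ grows, the estimated variance should shrink while the estimated bias² grows, matching the direction stated in the list above.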

How GATE asks this

Reliably an MCQ or MSQ on the direction of an effect: “increasing model complexity does what to bias and variance?”, or “which of the following reduce variance?” The answer pattern is fixed: more complexity means lower bias and higher variance; regularization means lower variance and higher bias; more data means lower variance with bias roughly unchanged. It also rides inside ridge questions (more λ → more bias, less variance) and overfitting/underfitting questions across nearly every paper.
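The "more data" part of that pattern can be checked numerically. The sketch below assumes a correctly specified straight-line model with hypothetical slope 2 and intercept 1, and tracks the prediction at a single point `x0` as the training set grows. All constants are illustrative.

```python
import numpy as np

def variance_vs_sample_size(sizes, n_trials=2000, noise_sd=1.0, seed=0):
    """Estimate bias^2 and variance of a linear fit's prediction at x0."""
    rng = np.random.default_rng(seed)
    x0 = 0.5
    truth = 2.0 * x0 + 1.0  # hypothetical true line: y = 2x + 1
    out = {}
    for n in sizes:
        preds = np.empty(n_trials)
        for t in range(n_trials):
            x = rng.uniform(0.0, 1.0, n)
            y = 2.0 * x + 1.0 + rng.normal(0.0, noise_sd, n)
            slope, intercept = np.polyfit(x, y, 1)  # highest degree first
            preds[t] = slope * x0 + intercept
        out[n] = ((preds.mean() - truth) ** 2, preds.var())
    return out

by_size = variance_vs_sample_size([10, 100, 1000])
for n, (b2, var) in by_size.items():
    print(f"n={n:>4}: bias^2={b2:.5f}  variance={var:.5f}")
```

Here the variance should fall roughly in proportion to 1/n while bias² stays near zero at every sample size, because the model family already contains the true line.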

Worked example

Compare two extreme models on the same data:

  • A high-degree polynomial can bend through almost every training point. It has low bias (flexible enough to capture the true shape) but high variance — shift the training set slightly and the wiggly curve changes drastically. This is the overfitting corner (right side of the U-curve).
  • A constant predictor (always outputs the mean) ignores the inputs entirely. It has high bias (it cannot represent any real structure) but low variance — it barely changes when the training set changes. This is the underfitting corner (left side).

Neither extreme generalises well. The best model sits in between, where the rising variance and falling bias sum to the smallest total error.

Now put numbers on the decomposition. Suppose a tuned model has bias = 0.2, variance = 0.05, and irreducible noise = 0.01. Then

total error = bias² + variance + noise
            = 0.2² + 0.05 + 0.01
            = 0.04 + 0.05 + 0.01
            = 0.10

so the expected error is 0.10. (Note it is bias², not bias, that enters the sum.)
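The same arithmetic as a tiny helper, which makes the squaring of the bias explicit. The function name `total_expected_error` is a hypothetical one used only for this sketch.

```python
def total_expected_error(bias, variance, noise):
    """Bias enters squared; variance and noise are added as-is."""
    return bias ** 2 + variance + noise

total = total_expected_error(bias=0.2, variance=0.05, noise=0.01)
print(round(total, 2))  # prints 0.1
```

Passing the unsquared bias into the sum (0.2 + 0.05 + 0.01 = 0.26) is the mistake the note above warns against.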

Quick check

Q1. Which changes tend to REDUCE variance (often at the cost of higher bias)? (select all that apply)
Q2. A high-degree polynomial fit (relative to the true function) typically has…
Q3. A constant predictor that always outputs the training mean has which profile?
Q4. Given bias = 0.2, variance = 0.05, irreducible noise = 0.01, what is the total expected error (bias² + variance + noise)? (numerical answer)
Q5. Which statements about the bias-variance trade-off are TRUE? (select all that apply)
Q6. Two models on the same data: A has bias² = 0.09 and variance = 0.02; B has bias² = 0.01 and variance = 0.12. With noise = 0.01 for both, which has the LOWER total error, and what is that lower total?
