datarekha

Bagging, boosting & stacking

Why a committee of models beats any single one. The unifying theory behind random forests and XGBoost — bagging cuts variance, boosting cuts bias, and stacking blends diverse models to win.

7 min read Intermediate Machine Learning Lesson 14 of 33

What you'll learn

  • Why ensembles win — diverse models with uncorrelated errors cancel out
  • Bagging (parallel, cuts variance) vs boosting (sequential, cuts bias)
  • Stacking and voting — blending different model families

Before you start

If you’ve wondered why random forests and XGBoost dominate tabular ML, here’s the unifying answer: ensembles. A committee of models, combined well, usually beats any single member, and ensembles are a fixture of winning Kaggle solutions. This lesson is the theory that ties the tree methods together.

Why a committee wins

The intuition is the wisdom of crowds: if you average many models that each make different, uncorrelated errors, the errors cancel and the consensus is more accurate than any individual. The crucial word is uncorrelated — ten copies of the same model add nothing. Diversity is the whole game. Ensembles work precisely to the degree their members are wrong in different ways.
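The effect of correlation can be checked with a small simulation. This is a minimal sketch, not part of the original lesson: it assumes every model's error has unit variance and that every pair of models shares the same correlation `rho`. The function name and parameters are hypothetical.

```python
import random
import statistics

def ensemble_error_std(n_models, rho, trials=20000, seed=0):
    """Std of the averaged error when each model's error has unit variance
    and every pair of models shares correlation rho (an assumed setup)."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # error component common to all models
        errors = [
            (rho ** 0.5) * shared + ((1 - rho) ** 0.5) * rng.gauss(0, 1)
            for _ in range(n_models)
        ]
        averaged.append(sum(errors) / n_models)
    return statistics.pstdev(averaged)

uncorrelated = ensemble_error_std(n_models=10, rho=0.0)
partly = ensemble_error_std(n_models=10, rho=0.5)
identical = ensemble_error_std(n_models=10, rho=1.0)
print(uncorrelated, partly, identical)
```

Under these assumptions the variance of the average is rho + (1 - rho) / n, so with ten models the three values should land near 0.32, 0.74, and 1.0: fully correlated members give no improvement over a single model.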

Bagging — parallel, cuts variance

Bagging (bootstrap aggregating) trains many models in parallel, each on a different bootstrap sample (a random resample of the data, with replacement), then averages them. Because each model sees a slightly different dataset, they make different errors, and averaging cancels much of that noise, sharply reducing variance.
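The two steps, resample then average, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: `bootstrap_sample`, `bagged_predict`, and the 1-nearest-neighbor learner are hypothetical names chosen for the example, and the data is synthetic.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_predict(data, fit, x, n_models=50, seed=0):
    """Fit one model per bootstrap sample and average their predictions at x.
    `fit` is a hypothetical callable: it takes a list of (x, y) pairs and
    returns a function mapping x to a prediction."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return sum(model(x) for model in models) / n_models

def fit_one_nearest_neighbor(pairs):
    """A deliberately high-variance learner: predict the y of the closest x."""
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

rng = random.Random(42)
# Synthetic data (an assumption for this sketch): y = x^2 plus Gaussian noise.
data = [(i / 100, (i / 100) ** 2 + rng.gauss(0, 0.1)) for i in range(100)]

sample = bootstrap_sample(data, rng)
unique_fraction = len(set(sample)) / len(data)  # tends toward 1 - 1/e, about 0.632
prediction = bagged_predict(data, fit_one_nearest_neighbor, x=0.5)
print(unique_fraction, prediction)
```

Each bootstrap sample leaves out roughly a third of the rows and repeats others, which is what makes the fitted models differ from one another.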

A random forest is exactly this: bag decision trees, and also randomize the features each split considers, which decorrelates the trees even more.
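The feature randomization can be pictured as drawing a fresh subset of column indices at every split. The sketch below is illustrative only: `candidate_features` is a hypothetical helper, and the square-root subset size is a common heuristic assumed here, not something the lesson specifies.

```python
import math
import random

def candidate_features(n_features, rng, k=None):
    """Pick the random subset of feature indices one split is allowed to consider."""
    if k is None:
        k = max(1, int(math.sqrt(n_features)))  # assumed heuristic, not from the lesson
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
# Three splits on a 16-feature dataset each see their own 4-feature subset.
splits = [candidate_features(16, rng) for _ in range(3)]
print(splits)
```

Because different splits see different features, two trees grown on similar bootstrap samples are less likely to make the same decisions.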

Boosting — sequential, cuts bias

Boosting flips the idea: train models one after another, each new one focused on the examples the ensemble got wrong so far. Instead of averaging independent models, it builds an additive sequence that keeps correcting its own mistakes — which reduces bias and produces the extremely accurate models you saw in XGBoost.
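One concrete form of this idea is gradient boosting for squared error, where each new model is fit to the current residuals. The following is a minimal sketch under assumptions, not how XGBoost is implemented: it uses one-split decision stumps on a single feature, a fixed learning rate, and a synthetic target, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold on x whose two-constant fit to the residuals has lowest SSE."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Return the ensemble's predict function and the training MSE after each round."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    predictions = [base] * len(xs)
    history = []
    for _ in range(n_rounds):
        # Each new stump targets what the ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history

xs = [i / 20 for i in range(40)]
ys = [x * x for x in xs]  # synthetic target, assumed for illustration
predict, history = boost(xs, ys)
print(history[0], history[-1])
```

A single stump is a heavily biased model, yet the training error keeps falling as stumps are added, which is the bias reduction the paragraph describes. On real data, held-out error would need to be monitored as well, since training error alone says nothing about overfitting.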

  • Bagging (parallel) → average → cuts variance
  • Boosting (sequential) → each fixes the last → cuts bias
  • Stacking → meta-model blends them
Three ways to combine models: bag in parallel, boost in sequence, or stack with a meta-learner.

Stacking & voting — blend different families

The third family combines different model types. Voting just averages (or majority-votes) their predictions. Stacking goes further: it trains a small meta-model on the base models’ predictions, learning how to weight them. Because a tree, a linear model, and a k-NN make very different errors, blending them often beats any one — which is why multi-level stacking routinely wins Kaggle competitions.
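The difference between voting and stacking can be shown with the smallest possible meta-model: a single blend weight learned by least squares. This is a sketch under assumptions, not a full stacking pipeline. The prediction lists are made-up numbers standing in for held-out predictions from two hypothetical base models, and `blend_weight` is a hypothetical helper.

```python
def blend_weight(pred_a, pred_b, targets):
    """Least-squares weight w for the blend w * pred_a + (1 - w) * pred_b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical predictions: any weight is equivalent

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)

# Made-up held-out predictions from two hypothetical base models.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_tree = [1.4, 1.9, 3.5, 3.8, 5.6]    # the noisier model
pred_linear = [0.9, 2.1, 2.9, 4.2, 4.9]  # the more accurate model

# Voting: a fixed 50/50 average.
voting = [(a + b) / 2 for a, b in zip(pred_tree, pred_linear)]

# Stacking (one-parameter version): learn the weight from the predictions.
w = blend_weight(pred_tree, pred_linear, targets)
stacked = [w * a + (1 - w) * b for a, b in zip(pred_tree, pred_linear)]
print(w, mse(voting, targets), mse(stacked, targets))
```

The learned weight leans toward the more accurate model instead of splitting evenly. In practice the meta-model should be fit on predictions for rows the base models did not train on (for example, out-of-fold predictions); otherwise it rewards whichever base model overfits most.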

Quick check

Q1. What is the key requirement for an ensemble to outperform its members?
Q2. What's the difference between bagging and boosting?
Q3. What does stacking add over simple voting?

Next

That completes the supervised core. Next, evaluation done rigorously — feature selection and model selection with nested CV.

Practice this in an interview

Bagging vs boosting — how do they differ, and when does each help?

Bagging trains many independent models in parallel on bootstrap samples and averages them, which mainly reduces variance; boosting trains models sequentially so each corrects its predecessor's errors, which mainly reduces bias. Use bagging (e.g. random forests) when your base learner is high-variance and overfits; use boosting (e.g. gradient boosting) when you need to squeeze out bias and maximize accuracy, accepting more tuning and overfitting risk.

What is the difference between bagging and boosting, and what error component does each primarily reduce?

Bagging trains many independent models on bootstrap samples in parallel and averages their predictions, primarily reducing variance. Boosting trains models sequentially, each correcting the errors of its predecessor, primarily reducing bias.

When would you choose a random forest over gradient boosting (XGBoost/LightGBM), and vice versa?

Random forests are faster to train, easier to tune, robust to noisy features, and hard to overfit with more trees — making them a strong default baseline. Gradient boosting typically achieves higher accuracy on structured/tabular data, but requires careful tuning of learning rate, tree depth, and early stopping to avoid overfitting.

Random forest vs gradient boosting — which would you choose and why?

Random forest builds deep trees independently in parallel and averages them, making it robust, low-tuning, and resistant to overfitting; gradient boosting builds shallow trees sequentially to correct residual errors, usually achieving higher accuracy when carefully tuned. Choose random forest for a fast, stable baseline on noisy data, and gradient boosting when squeezing out maximum accuracy on tabular data is worth the tuning effort.
