datarekha
Machine Learning Medium Asked at GoogleAsked at Uber

What is pruning in decision trees and when would you use pre-pruning versus post-pruning?

The short answer

Pruning removes splits that do not improve generalisation. Pre-pruning stops growth early via hyperparameters like max_depth or min_samples_leaf. Post-pruning (cost-complexity pruning) grows the full tree then collapses nodes whose removal does not hurt held-out accuracy enough.

How to think about it

An unpruned tree will grow until every leaf is pure, perfectly fitting training data but failing on new samples. Pruning is the primary regularisation tool for single decision trees.

Pre-pruning (early stopping)

Controlled entirely by hyperparameters evaluated before or during growth:

ParameterEffect
max_depthHard cap on tree height
min_samples_splitNode must have at least N samples to split
min_samples_leafEach resulting leaf must have at least N samples
min_impurity_decreaseSplit must improve impurity by at least this amount

Pre-pruning is fast but can stop too early — a split with zero immediate gain might still unlock a high-gain split one level deeper.

Post-pruning (cost-complexity / reduced-error pruning)

sklearn implements minimal cost-complexity pruning, which adds a penalty proportional to the number of leaves:

R_alpha(T) = R(T) + alpha * |leaves(T)|

As alpha increases, more leaves are collapsed. Cross-validate over alpha to find the right balance:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas

scores = [
    cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0),
        X_train, y_train, cv=5
    ).mean()
    for a in alphas
]

best_alpha = alphas[np.argmax(scores)]
final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha).fit(X_train, y_train)

Which to use? Pre-pruning is cheaper and sufficient for most ensemble contexts (trees in random forests are deliberately shallow). Post-pruning is better when you want a single interpretable tree that is as small as possible without sacrificing accuracy.

Learn it properly Decision trees

Keep practising

All Machine Learning questions

Explore further

Skip to content