datarekha
Machine Learning · Medium · Asked at Google, Meta, Amazon, Microsoft, Spotify

When should you use grid search vs random search vs Bayesian optimisation for hyperparameter tuning?

The short answer

Grid search exhaustively tries every combination in a predefined grid, which is only practical for 1–2 hyperparameters. Random search samples combinations at random from ranges or distributions you specify and usually finds good values with less compute, especially when only a few hyperparameters actually matter. Bayesian optimisation fits a surrogate model of the objective and uses it to choose the next trial, giving the best sample efficiency when evaluations are expensive.

How to think about it

The right choice depends on how many hyperparameters matter, how cheap each evaluation is, and whether you have a compute budget or a time budget.

Grid search

Tests every combination on a discrete grid. With 5 hyperparameters each having 5 values, that is 5^5 = 3,125 combinations, each of which is evaluated with cross-validation (k model fits per combination under k-fold CV). Combinatorial explosion makes it impractical beyond 2–3 hyperparameters, and it wastes budget testing values of irrelevant hyperparameters.
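The arithmetic above can be checked with a short sketch. The hyperparameter names and values below are hypothetical, chosen only to give 5 hyperparameters with 5 values each, and 5-fold CV is an assumption for the fit count.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 5 hyperparameters with 5 candidate values each.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3, 1.0],
    "max_depth": [2, 4, 6, 8, 10],
    "subsample": [0.5, 0.6, 0.7, 0.8, 1.0],
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "n_estimators": [50, 100, 200, 400, 800],
}

# ParameterGrid enumerates the full Cartesian product of the value lists.
n_combinations = len(ParameterGrid(param_grid))

# Assumed 5-fold CV: every combination is fitted once per fold.
cv_folds = 5
n_fits = n_combinations * cv_folds

print(n_combinations)  # 3125
print(n_fits)          # 15625
```

Adding a sixth hyperparameter with 5 values would multiply both numbers by 5 again, which is why grid search stops scaling so quickly.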

Random search

Samples hyperparameter combinations at random from the ranges or distributions you specify (uniform or log-uniform, for example). A key insight from Bergstra & Bengio (2012): if only 2 of 20 hyperparameters matter, grid search wastes budget repeating the same few values of those 2 while it varies the 18 irrelevant ones, whereas every random trial tries a new value of the 2 that matter. In practice, random search often matches or beats grid search with far fewer evaluations.
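A toy illustration of that insight, assuming two hyperparameters on the unit interval of which only the first affects the score. With the same budget of 9 trials, it counts how many distinct values of the important hyperparameter each method actually tries.

```python
import numpy as np

rng = np.random.default_rng(0)
budget = 9

# Grid search: 3 values per hyperparameter -> 3 x 3 = 9 trials.
grid_values = np.linspace(0.0, 1.0, 3)
grid_trials = np.array([(a, b) for a in grid_values for b in grid_values])

# Random search: 9 trials drawn uniformly from the unit square.
random_trials = rng.uniform(0.0, 1.0, size=(budget, 2))

# Assume only the first hyperparameter (column 0) matters.
distinct_grid = len(np.unique(grid_trials[:, 0]))
distinct_random = len(np.unique(random_trials[:, 0]))

print(distinct_grid)    # 3
print(distinct_random)  # 9
```

The grid spends two thirds of its budget re-testing values of the important hyperparameter it has already seen, while each random trial probes a new one.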

Bayesian optimisation

Maintains a surrogate model (commonly a Gaussian Process or a Tree-structured Parzen Estimator) that relates hyperparameters to CV score. At each step it uses an acquisition function (e.g., expected improvement) to select the next point, trading off exploration (uncertain regions) against exploitation (regions near the best result so far). This typically makes it the most sample-efficient of the three methods, which matters most when each evaluation takes hours (deep learning, large ensembles).
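The loop described above can be sketched in a few lines. This is a minimal illustration rather than how any particular library implements it: the objective is a toy stand-in for a CV score as a function of log10(C), the surrogate is scikit-learn's GaussianProcessRegressor, and the candidate grid, kernel and xi value are all assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(log_c):
    """Toy stand-in for a CV score; peaks at log_c = 1 (assumed for illustration)."""
    return -(log_c - 1.0) ** 2


def expected_improvement(candidates, gp, best_score, xi=0.01):
    """Expected improvement over best_score for a maximisation problem."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    improvement = mu - best_score - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
candidates = np.linspace(-3.0, 3.0, 301).reshape(-1, 1)

# Start from a few random evaluations so the surrogate has data to fit.
X = rng.uniform(-3.0, 3.0, size=(3, 1))
y = objective(X).ravel()

for _ in range(10):
    # Refit the surrogate on everything observed so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-6)
    gp.fit(X, y)
    # Pick the candidate with the highest expected improvement and evaluate it.
    ei = expected_improvement(candidates, gp, y.max())
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

best_log_c = X[np.argmax(y), 0]
print(best_log_c, y.max())
```

Each iteration costs one real evaluation plus a cheap surrogate fit, so the overhead only pays off when the real evaluation is the expensive part.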

# Example estimator (an assumption; any estimator exposing these parameters works)
from sklearn.svm import SVC
estimator = SVC()

# Grid search (small grids only): 4 x 2 = 8 combinations, each scored with 5-fold CV
from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(estimator, {"C": [0.01, 0.1, 1, 10], "gamma": ["scale", "auto"]}, cv=5)

# Random search (budget-constrained): n_iter fixes the number of sampled settings
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform
rs = RandomizedSearchCV(estimator, {"C": loguniform(1e-3, 1e3)}, n_iter=60, cv=5, random_state=42)

# Bayesian optimisation (expensive evaluations); requires the scikit-optimize package
from skopt import BayesSearchCV
from skopt.space import Real
bs = BayesSearchCV(estimator, {"C": Real(1e-3, 1e3, prior="log-uniform")}, n_iter=40, cv=5, random_state=42)
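All three search objects follow the same scikit-learn pattern: call fit, then read best_params_ and best_score_. A small usage sketch with random search, assuming a synthetic dataset from make_classification and an SVC estimator (both chosen only for illustration):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic data, assumed here only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-3, 1e3)},  # sample C on a log scale
    n_iter=20,                     # total number of sampled settings
    cv=5,
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)  # the sampled C with the best mean CV score
print(search.best_score_)   # that mean CV accuracy
```

The full trial history is available in search.cv_results_, which is useful for checking whether the best values sit at the edge of the search range.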

When to use each

Method                  Best when
Grid search             ≤ 2 hyperparameters, small discrete ranges
Random search           3+ hyperparameters, compute budget matters
Bayesian optimisation   Each evaluation is expensive (minutes–hours)