Backtracking
How to search a combinatorial space by building solutions step-by-step, pruning dead branches early — the engine behind N-Queens, Sudoku, and constraint satisfaction.
What you'll learn
- What backtracking is: DFS over a decision tree with undo on failure
- How the choose-explore-unchoose template applies to every backtracking problem
- Why pruning is the difference between tractable and exponential blowup
- How constraint satisfaction and combinatorial search appear in data and ML workflows
Before you start
Backtracking is the idea that you can build a solution one piece at a time, and whenever a partial solution cannot possibly lead to a valid complete solution, you stop and try something else.
It is depth-first search over a decision tree — a tree where each level represents one choice to make and each branch represents one option — where each node represents a partial state, each edge represents one choice, and the leaves are either valid complete solutions or dead ends. The “backtrack” step is simply undoing the last choice and moving to the next sibling edge.
The decision tree picture
Suppose you want all subsets of {A, B, C}. At each position you make a binary choice — include the element or skip it.
[]
/ \
[A] []
/ \ / \
[A,B] [A] [B] []
/ \ / \ / \ / \
[A,B,C][A,B][A,C][A] [B,C][B] [C] []
Traversing this tree depth-first and collecting the leaves gives all 8 subsets. For permutations the branching factor is larger, but the structure is identical.
The key observation: if you can detect that a partial path is already invalid — a queen is already under attack, a Sudoku row already has a repeat — you prune the entire subtree rooted at that node and never recurse into it.
The universal template
Every backtracking algorithm follows the same three-line loop:
def backtrack(state, choices):
if is_complete(state):
record(state)
return
for choice in choices:
if is_valid(state, choice): # prune before recursing
make_choice(state, choice) # choose
backtrack(state, next_choices(state, choice)) # explore
undo_choice(state, choice) # un-choose
The undo_choice step is what makes it backtracking rather than plain recursion — you restore state to exactly what it was before, so the next iteration of the loop starts from a clean slate.
Classic examples
Subsets and permutations
Generating all subsets of [1, 2, 3] is the simplest case: the validity check is always true (any partial subset is valid), so there is no pruning — the work is genuinely O(2ⁿ). Permutations are similar but require tracking which elements have already been used; the validity check is “not already in the current path”, which prunes the obviously duplicate branches.
N-Queens
Place n queens on an n×n board so that no two queens share a row, column, or diagonal. The decision tree has n levels (one per row) and n branches per level (one per column). Without pruning you would examine nⁿ configurations. With pruning — reject a column choice the moment it conflicts with any already-placed queen — the search space shrinks dramatically. For n = 8 there are only 92 solutions out of 8⁸ = 16,777,216 possible placements; pruning makes this explorable in milliseconds.
Sudoku
Fill a 9×9 grid so every row, column, and 3×3 box contains each digit exactly once. The validity check is “this digit does not already appear in this row, column, or box”. Because each cell constrains its neighbors, good pruning propagation (trying only legal digits for each empty cell) reduces the effective branching factor from 9 to 1–2 on typical puzzles.
Word search
Given a grid of letters and a target word, find whether the word exists as a connected path. At each cell you try all four neighbors; the validity check is “in-bounds, not yet visited, and the next letter matches”. You mark cells visited before recursing (choose) and unmark them after (un-choose).
Run it: all subsets via backtracking
Notice that the number of subsets is exactly 2ⁿ — there is no pruning here because every partial subset is valid. The loop variable start is the structural pruning that avoids generating duplicate subsets in different orders.
N-Queens counter
The N-Queens problem is where pruning matters. Run this to see how many solutions exist for each board size:
For n = 6 there are 4 solutions; for n = 8 there are 92. The three sets (cols, diag1, diag2) give O(1) conflict checks, keeping each placement decision fast.
Complexity and pruning’s impact
Worst-case backtracking is still exponential — you cannot escape that when the solution space itself is exponential. What pruning changes is the effective branching factor: instead of exploring every branch at every level, you skip entire subtrees the moment a constraint is violated.
For N-Queens with n = 12, the naive O(nⁿ) search would examine ~8.9 × 10¹² configurations. With pruning, the actual nodes visited number in the millions — a factor of millions improvement without changing the worst-case class.
This is the practical insight: write the validity check to be as tight as possible, and check it as early as possible — before recursing, not after.
Connection to data science and ML
Backtracking appears in data work in a few concrete ways:
Constraint satisfaction. Assigning ML pipeline stages, scheduling jobs under resource limits, and allocating feature groups to model slots are all constraint satisfaction problems. Backtracking (or its industrial cousin, branch-and-bound) is the standard exact solver.
Combinatorial feature search. Exhaustive feature subset selection is the subset enumeration problem. Practitioners almost never run it to completion, but the pruning intuition — “if this subset already scores worse than the current best, skip all supersets” — is exactly backtracking logic. Branch-and-bound hyperparameter search uses the same idea.
Search in planning and config spaces. When an agent must plan a sequence of actions or explore a configuration space, the decision tree is the state space. Backtracking (with memoization added) becomes dynamic programming; with a cost function it becomes A*.
Quick check
Quick check
Practice this in an interview
All questionsAt each step of a recursive walk through the input, you make a binary choice: include the current element or skip it. Recording the current path at every node of the recursion tree (not just the leaves) collects all 2^n subsets. A `start` index prevents duplicates by ensuring elements are only considered left-to-right.
Use backtracking with a running total. At each step, try adding a candidate to the current path. If the total equals the target, record the path. If it exceeds the target, prune. Passing the same start index (not i+1) back into the recursion allows unlimited reuse of the same element.
At each recursion level, swap one of the remaining (unused) elements into the current position, recurse to fill the rest, then swap back to restore the state. Alternatively, track a 'used' set and build the permutation in a separate path list. The result is all n! orderings.
Binary search halves the search space each iteration to find a target in O(log n). The tricky part is not the idea but the boundary conditions: closed vs. half-open intervals, how to update lo/hi, and when to use lo < hi vs. lo <= hi. One clean template eliminates all the classic bugs.