Greedy Algorithms
How to solve optimization problems by always picking the best available option right now — and how to know when that local optimism is provably correct versus dangerously wrong.
What you'll learn
- What makes a strategy greedy: one irrevocable locally optimal choice per step
- When greedy is provably correct: interval scheduling and canonical coin systems
- How to construct a counterexample to catch a wrong greedy (coins {1,3,4}, target 6)
- Why DP is the fallback when greedy fails, and how to tell the difference
Before you start
A greedy algorithm makes the locally optimal choice at each step and never looks back.
There is no reconsideration, no backtracking, no storing of what-ifs. You pick the best-looking option available right now, move forward, and repeat. That simplicity is exactly what makes greedy algorithms fast — and exactly what makes them dangerous to trust without proof.
The core idea
Every greedy algorithm follows the same skeleton:
- Define what “best available option right now” means for your problem.
- Pick it.
- Reduce the problem to a smaller version of itself.
- Repeat until done.
The hard part is step 1. Defining the right local criterion is everything. Pick the wrong one and you will get an answer that looks reasonable but is not optimal.
When greedy is correct
Interval scheduling (activity selection)
You have a list of activities, each with a start and finish time. You want to schedule the maximum number of non-overlapping activities in a single room.
The greedy rule: always pick the activity that finishes earliest among those that do not conflict with what you have already scheduled.
This is provably optimal. Intuitively: by finishing as early as possible, you leave the most remaining time open for future activities. Any other first pick either starts later (same argument applies) or finishes later (which can only leave less room). A formal exchange argument proves no other schedule can include more activities.
Canonical coin systems
In the US coin system (1, 5, 10, 25 cents), making change by always picking the largest coin that fits is optimal. Most national currencies are designed so that this greedy property holds.
Huffman coding
Huffman encoding — used inside DEFLATE, PNG, and most compression formats — builds a variable-length prefix code by greedily merging the two lowest-frequency symbols at each step. The resulting tree is provably the minimum-entropy prefix code.
When greedy fails
The coin change trap
Consider coins {1, 3, 4} and a target of 6.
Greedy (largest-coin-first): pick 4, remaining 2, pick 1, remaining 1, pick 1. Total: 3 coins.
Optimal: pick 3, remaining 3, pick 3. Total: 2 coins.
The greedy is wrong. The problem is that taking the locally largest coin cuts off a globally better path. Dynamic programming finds the correct answer by considering all subproblems — it does not commit to a choice without knowing the downstream consequences.
The US coin system avoids this failure because it is a canonical coin system — a mathematically verified set where the greedy-choice property holds for every target amount. Arbitrary coin sets do not share this property, and the only safe way to check is proof or exhaustive testing.
Contrast with dynamic programming
| Greedy | Dynamic programming | |
|---|---|---|
| Choices | One irrevocable pick per step | Explores all sub-choices |
| Reconsideration | Never | Always |
| Speed | Usually O(n log n) or O(n) | Usually O(n²) or O(n·k) |
| Correctness | Only for problems with the greedy-choice property | Correct for any problem with overlapping subproblems |
Use greedy when you can prove (or cite a known proof) that the local criterion yields a globally optimal solution. Use DP otherwise.
Activity selection in practice
The following code implements activity selection by earliest finish time. It sorts activities by finish time and greedily picks each one that starts at or after the last selected activity ends.
The assertion at the end confirms correctness structurally. For this classic eight-activity input the greedy finds all four non-overlapping slots; no algorithm can do better.
Run the greedy choice and compare it with the naive one:
Greedy in data science and ML
Greedy approximations are pervasive in applied machine learning:
Feature selection heuristics — forward selection adds one feature at a time, always picking the feature that most improves the current model score. It is greedy: it never removes a previously added feature.
Decision tree splits — at each node, CART and ID3 pick the single split that maximizes information gain or Gini reduction right now. The resulting tree is not globally optimal, but the greedy construction is tractable and usually good.
Beam search — instead of exploring all decoding paths in sequence models, beam search keeps only the top-k candidates at each step. It is a bounded greedy search. Large language models use it at inference time.
In each case the greedy approximation trades provable optimality for speed, and practitioners accept that trade because exhaustive search is infeasible at scale.
Quick check
Quick check
Practice this in an interview
All questionsThe NFL theorem proves that averaged over all possible data distributions, no learning algorithm outperforms any other — including random guessing. In practice it means there is no universally best algorithm; the right choice depends on inductive biases that match the actual problem structure.
Optimize precision when a false positive is costly — spam filters, ad targeting, legal evidence — because you'd rather miss some positives than act on wrong ones. Optimize recall when a false negative is costly — cancer screening, fraud detection, safety systems — because missing a true positive can be catastrophic. The business cost of each error type should drive the choice, not the metric itself.
At each house you make one choice: rob it (and skip the previous) or skip it (and carry forward whatever you had). dp[i] = max(dp[i-1], dp[i-2] + nums[i]). Since you only look back two steps, two variables replace the full array, giving O(n) time and O(1) space.
Grid search exhaustively tries every combination in a predefined grid, which is only practical for 1–2 hyperparameters. Random search samples combinations uniformly at random and finds good values faster per compute budget, especially when only a few hyperparameters actually matter. Bayesian optimisation fits a surrogate model of the objective and proposes the next trial intelligently, giving the best sample efficiency for expensive evaluations.