datarekha

Greedy Algorithms

How to solve optimization problems by always picking the best available option right now — and how to know when that local optimism is provably correct versus dangerously wrong.

6 min read Intermediate Data Structures & Algorithms Lesson 26 of 32

What you'll learn

  • What makes a strategy greedy: one irrevocable locally optimal choice per step
  • When greedy is provably correct: interval scheduling and canonical coin systems
  • How to construct a counterexample to catch a wrong greedy (coins {1,3,4}, target 6)
  • Why DP is the fallback when greedy fails, and how to tell the difference

Before you start

A greedy algorithm makes the locally optimal choice at each step and never looks back.

There is no reconsideration, no backtracking, no storing of what-ifs. You pick the best-looking option available right now, move forward, and repeat. That simplicity is exactly what makes greedy algorithms fast — and exactly what makes them dangerous to trust without proof.

The core idea

Every greedy algorithm follows the same skeleton:

  1. Define what “best available option right now” means for your problem.
  2. Pick it.
  3. Reduce the problem to a smaller version of itself.
  4. Repeat until done.

The hard part is step 1. Defining the right local criterion is everything. Pick the wrong one and you will get an answer that looks reasonable but is not optimal.

When greedy is correct

Interval scheduling (activity selection)

You have a list of activities, each with a start and finish time. You want to schedule the maximum number of non-overlapping activities in a single room.

The greedy rule: always pick the activity that finishes earliest among those that do not conflict with what you have already scheduled.

This is provably optimal. Intuitively: by finishing as early as possible, you leave the most remaining time open for future activities. Any other first pick either starts later (same argument applies) or finishes later (which can only leave less room). A formal exchange argument proves no other schedule can include more activities.

Canonical coin systems

In the US coin system (1, 5, 10, 25 cents), making change by always picking the largest coin that fits is optimal. Most national currencies are designed so that this greedy property holds.

Huffman coding

Huffman encoding — used inside DEFLATE, PNG, and most compression formats — builds a variable-length prefix code by greedily merging the two lowest-frequency symbols at each step. The resulting tree is provably the minimum-entropy prefix code.

When greedy fails

The coin change trap

Consider coins {1, 3, 4} and a target of 6.

Greedy (largest-coin-first): pick 4, remaining 2, pick 1, remaining 1, pick 1. Total: 3 coins.

Optimal: pick 3, remaining 3, pick 3. Total: 2 coins.

The greedy is wrong. The problem is that taking the locally largest coin cuts off a globally better path. Dynamic programming finds the correct answer by considering all subproblems — it does not commit to a choice without knowing the downstream consequences.

The US coin system avoids this failure because it is a canonical coin system — a mathematically verified set where the greedy-choice property holds for every target amount. Arbitrary coin sets do not share this property, and the only safe way to check is proof or exhaustive testing.

Contrast with dynamic programming

GreedyDynamic programming
ChoicesOne irrevocable pick per stepExplores all sub-choices
ReconsiderationNeverAlways
SpeedUsually O(n log n) or O(n)Usually O(n²) or O(n·k)
CorrectnessOnly for problems with the greedy-choice propertyCorrect for any problem with overlapping subproblems

Use greedy when you can prove (or cite a known proof) that the local criterion yields a globally optimal solution. Use DP otherwise.

Activity selection in practice

The following code implements activity selection by earliest finish time. It sorts activities by finish time and greedily picks each one that starts at or after the last selected activity ends.

The assertion at the end confirms correctness structurally. For this classic eight-activity input the greedy finds all four non-overlapping slots; no algorithm can do better.

Run the greedy choice and compare it with the naive one:

Greedy in data science and ML

Greedy approximations are pervasive in applied machine learning:

Feature selection heuristics — forward selection adds one feature at a time, always picking the feature that most improves the current model score. It is greedy: it never removes a previously added feature.

Decision tree splits — at each node, CART and ID3 pick the single split that maximizes information gain or Gini reduction right now. The resulting tree is not globally optimal, but the greedy construction is tractable and usually good.

Beam search — instead of exploring all decoding paths in sequence models, beam search keeps only the top-k candidates at each step. It is a bounded greedy search. Large language models use it at inference time.

In each case the greedy approximation trades provable optimality for speed, and practitioners accept that trade because exhaustive search is infeasible at scale.

Quick check

Quick check

0/3
Q1You have coins {1, 5, 10, 25} and need to make 41 cents. The greedy (largest-coin-first) gives 25+10+5+1 = 4 coins. Is this optimal?
Q2For coins {1, 3, 4} and target 6, the greedy (largest-coin-first) returns 3 coins. Why is this not optimal?
Q3In the activity-selection problem, why is 'pick the earliest-finishing compatible activity' the correct greedy rule — rather than, say, 'pick the shortest activity'?

Practice this in an interview

All questions
What does the No Free Lunch theorem state, and what are its practical implications for choosing algorithms?

The NFL theorem proves that averaged over all possible data distributions, no learning algorithm outperforms any other — including random guessing. In practice it means there is no universally best algorithm; the right choice depends on inductive biases that match the actual problem structure.

When should you optimize precision and when should you optimize recall?

Optimize precision when a false positive is costly — spam filters, ad targeting, legal evidence — because you'd rather miss some positives than act on wrong ones. Optimize recall when a false negative is costly — cancer screening, fraud detection, safety systems — because missing a true positive can be catastrophic. The business cost of each error type should drive the choice, not the metric itself.

You are a robber. Given an array of house values, find the maximum money you can rob without robbing two adjacent houses.

At each house you make one choice: rob it (and skip the previous) or skip it (and carry forward whatever you had). dp[i] = max(dp[i-1], dp[i-2] + nums[i]). Since you only look back two steps, two variables replace the full array, giving O(n) time and O(1) space.

When should you use grid search vs random search vs Bayesian optimisation for hyperparameter tuning?

Grid search exhaustively tries every combination in a predefined grid, which is only practical for 1–2 hyperparameters. Random search samples combinations uniformly at random and finds good values faster per compute budget, especially when only a few hyperparameters actually matter. Bayesian optimisation fits a surrogate model of the objective and proposes the next trial intelligently, giving the best sample efficiency for expensive evaluations.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content