A* Search
Add what a step actually cost to what the heuristic guesses is left, then always expand the smallest sum. That's A*.
What you'll learn
- A* expands the node with the lowest f(n) = g(n) + h(n) from a priority queue
- g(n) is cost-so-far from the start; h(n) is the heuristic guess of cost-to-go
- Admissible h means A* is optimal; consistent h means no node is expanded twice
- Hand-tracing the expansion order on a small weighted graph, with alphabetical tie-breaks
Before you start
Greedy search rushes toward the goal using only the heuristic’s guess. Uniform-cost search ignores the goal entirely and just minimises the cost so far. Both miss the trick. A* is the small idea that fuses them: at every step, expand the node whose total estimated path cost — what you’ve already paid plus what the heuristic thinks is left — is smallest.
That balance is why A* is the textbook search algorithm. With a sensible heuristic
it still finds the cheapest path, and it usually expands far fewer nodes than
uniform-cost search to get there. It is also the workhorse behind real routing and
pathfinding — the same f = g + h rule plans the shortest drive in a maps app and the
path a game character walks across a map.
The evaluation function
- g(n) — actual cost of the cheapest path found so far from start to n.
- h(n) — heuristic estimate of cost from n to the nearest goal.
- f(n) — the priority key. Expand the smallest f from the frontier.
Two heuristic properties decide the algorithm’s guarantees:
- Admissible —
h(n)never overestimates the true cost to a goal (h(n) <= h*(n)). With an admissible h, A* is optimal: the first goal it pops is on the cheapest path. - Consistent (monotonic) — for every edge
n -> n'with costc,h(n) <= c + h(n'). Consistency implies admissibility and guarantees every node is expanded at most once.
A small graph to expand
Five nodes, a few weighted edges, one start S and one goal G. Each node carries its heuristic value; we will walk the frontier step by step.
How GATE asks this
A real NAT/MCQ pattern: a tiny weighted graph with heuristic values printed on each node, and “List the order in which A* expands nodes” or “What is the f-value when node X is expanded?”. Standing convention: alphabetical tie-break when two frontier nodes share the lowest f. This pattern appeared in both GATE DA 2024 and 2025.
Worked example — tracing the expansion order
Using the graph above. Start S, goal G. Initialise the frontier with S.
Step 1. Frontier: {S(g=0, f=0+4=4)}. Expand S. Push neighbours:
- A via S: g = 0 + 1 = 1, f = 1 + 3 = 4
- B via S: g = 0 + 2 = 2, f = 2 + 1 = 3
Frontier: {A(f=4), B(f=3)}.
Step 2. Lowest f is B (f = 3). Expand B. Push G via B:
- G via B: g = 2 + 3 = 5, f = 5 + 0 = 5
Frontier: {A(f=4), G(f=5)}.
Step 3. Lowest f is A (f = 4). Expand A. Push G via A:
- G via A: g = 1 + 5 = 6, f = 6 + 0 = 6
But G was already on the frontier with g = 5, which is cheaper. The new path
through A is worse and gets discarded. Frontier: {G(f=5)}.
Step 4. Pop G — goal reached on the path S -> B -> G with total
cost 5.
So the expansion order is S, B, A, G (or S, B, A if you stop at “non-goal expansions”; many GATE keys count G’s pop as the fourth expansion). The optimal path cost is 5 and matches the f-value of G when it was popped.
Quick check
Quick check
Practice this in an interview
All questionsUse backtracking with a running total. At each step, try adding a candidate to the current path. If the total equals the target, record the path. If it exceeds the target, prune. Passing the same start index (not i+1) back into the recursion allows unlimited reuse of the same element.
Grid search exhaustively tries every combination in a predefined grid, which is only practical for 1–2 hyperparameters. Random search samples combinations uniformly at random and finds good values faster per compute budget, especially when only a few hyperparameters actually matter. Bayesian optimisation fits a surrogate model of the objective and proposes the next trial intelligently, giving the best sample efficiency for expensive evaluations.
Scan left to right, carrying a running sum. At each element, decide: extend the existing subarray (add to the running sum) or start fresh (take just the current element). Whichever is larger becomes the new running sum. Track the global maximum throughout. One pass, O(n) time, O(1) space.
Catalyst is a rule-based and cost-based query optimizer that transforms a logical plan through four phases — analysis, logical optimization, physical planning, and code generation — before any data is touched. Adaptive Query Execution (AQE), introduced in Spark 3, extends this by re-optimizing the physical plan at runtime using actual shuffle statistics rather than stale estimates.