Decision Trees — Making Choices Under Uncertainty
Launch nationally, run a pilot, or skip it entirely? Decision trees give you a structured way to compare options when the future is uncertain — and fold-back arithmetic tells you which branch wins.
What you'll learn
- What decision nodes (squares) and chance nodes (circles) mean in a decision tree
- How fold-back works: evaluate right-to-left, pick the highest EV at every decision node
- Why a lower-EV pilot can still be the smarter choice — value of information
- How a single shaky probability estimate can flip the entire recommendation
Before you start
Your market research says there’s a 45% chance the market is good. A full national launch could return $900k — or cost you $300k if it flops. A cautious pilot caps your downside at $50k but also caps your upside at $250k. Doing nothing costs nothing and earns nothing.
These three options feel incomparable — different upside, different downside, different probabilities. A decision tree is the tool that makes them comparable.
What a decision tree is
A decision tree lays your problem out left to right like a branching path. Two kinds of nodes (junction points) do all the work:
- A square node is a decision node — a fork where you pick which branch to take (launch, pilot, skip).
- A circle node is a chance node — a fork where the market picks, and each branch carries a probability (good market, bad market).
At the far right of every branch sits a payoff — the dollar outcome if the path up to that point actually happened.
The art is drawing the tree accurately. The math is called fold-back.
Fold-back: evaluate right to left
Fold-back (also called roll-back) is the procedure for collapsing the tree into a single recommended action. You work from the tips back toward the root:
- At every chance node, compute the expected value: probability × payoff, summed across all branches. (Expected value, introduced in the previous lesson, is the probability-weighted average outcome.)
- At every decision node, keep only the branch with the highest EV and discard the others.
- The EV that survives at the root is the value of the whole decision — and the surviving path is your recommendation.
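The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard library routine: the tuple-based tree encoding and the `fold_back` name are assumptions made for this example, and the numbers are the launch figures from the start of this lesson (in $k).

```python
# Hypothetical tree encoding, chosen for this sketch only:
#   a plain number                      -> payoff at the tip of a branch
#   ("chance",   [(prob, subtree), ...]) -> circle node
#   ("decision", {label: subtree, ...})  -> square node

def fold_back(node):
    """Return (value, chosen_label) for a node, working from the tips to the root.

    chosen_label is only set for decision nodes; it is None otherwise.
    """
    if isinstance(node, (int, float)):
        return float(node), None

    kind, branches = node
    if kind == "chance":
        # Probability-weighted average of the folded-back branch values.
        ev = sum(p * fold_back(sub)[0] for p, sub in branches)
        return ev, None
    if kind == "decision":
        # Keep only the branch with the highest folded-back value.
        values = {label: fold_back(sub)[0] for label, sub in branches.items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node kind: {kind!r}")


launch_tree = ("decision", {
    "Go national": ("chance", [(0.45, 900), (0.55, -300)]),
    "Run a pilot": ("chance", [(0.45, 250), (0.55, -50)]),
    "Don't launch": 0,
})

value, choice = fold_back(launch_tree)
print(f"{choice}: EV = ${value:.0f}k")  # Go national: EV = $240k
```

Because the function recurses, the same sketch handles deeper trees in which a chance node leads to a further decision node.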
Let’s do it with real numbers.
The launch decision — by hand
Here are the payoffs (in $k for readability):
| Path | Probability | Payoff |
|---|---|---|
| Go national, market good | 0.45 | +$900 |
| Go national, market bad | 0.55 | −$300 |
| Run a pilot, market good | 0.45 | +$250 |
| Run a pilot, market bad | 0.55 | −$50 |
| Don’t launch | 1.00 | $0 |
Fold back the two chance nodes first:
EV(Go national) = 0.45 × $900 + 0.55 × (−$300)
= $405 − $165
= $240
EV(Run a pilot) = 0.45 × $250 + 0.55 × (−$50)
= $112.50 − $27.50
= $85
EV(Don't launch) = $0
At the decision node you compare $240k, $85k, and $0. Go national wins: at 45% odds of a good market, its expected payoff of $240k beats the pilot ($85k) and doing nothing ($0).
Try it: drag the probability and watch the recommendation flip
The widget shows the exact tree above. At 45%, Go national is highlighted green at $240k. Now drag P(market is good) down toward 20% and watch the recommendation shift. Below about 28% the pilot’s EV overtakes the national launch; at 25% the national EV hits zero and then turns negative; below about 17% even the pilot loses to doing nothing. The tree doesn’t change; only one input changes, and the whole recommendation flips.
That is both the power and the risk of a decision tree.
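The same flip can be reproduced without the widget. The sketch below recomputes the three EVs from the lesson's payoffs for a range of probabilities; the function names `expected_values` and `recommend` are illustrative assumptions, not part of any tool mentioned here.

```python
def expected_values(p_good):
    """EV in $k of each option, given P(market is good)."""
    p_bad = 1 - p_good
    return {
        "Go national": p_good * 900 + p_bad * -300,
        "Run a pilot": p_good * 250 + p_bad * -50,
        "Don't launch": 0.0,
    }


def recommend(p_good):
    """Label of the option with the highest EV at this probability."""
    evs = expected_values(p_good)
    return max(evs, key=evs.get)


# Sweep P(good) from 10% to 45% in 5-point steps.
for pct in range(10, 50, 5):
    p = pct / 100
    evs = expected_values(p)
    summary = ", ".join(f"{name}: {ev:.1f}" for name, ev in evs.items())
    print(f"P(good) = {p:.2f} -> {recommend(p)}  ({summary})")
```

Algebraically, EV(Go national) = 1200p − 300 and EV(Run a pilot) = 300p − 50, so the national launch leads only when p is above 250/900 ≈ 0.28, and the pilot beats doing nothing only when p is above 1/6 ≈ 0.17.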
Why a pilot can beat national even with a lower EV
Notice that the pilot EV ($85k) is well below the national EV ($240k) at 45% odds. Yet there are two reasons a rational manager might still choose the pilot:
Downside protection. The worst case nationally is −$300k. The worst case for a pilot is −$50k. If your company cannot absorb a $300k loss (say, it would force layoffs or kill another project), the pilot’s capped downside has real value that the EV number doesn’t capture.
Value of information. A pilot lets you learn before committing fully. Running a small test and observing real customer behaviour updates your probability estimate. If the pilot goes well you can then launch nationally with much higher confidence. The technical term is value of information — sometimes the smart move is the option that lets you learn cheaply before betting big, even if its standalone EV is lower.
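One way to put a rough number on learning is to ask what the decision would be worth if you could see the market outcome before choosing. The sketch below assumes a hypothetical signal that is both perfectly accurate and free, which a real pilot is not, so the result is best read as an upper bound on what any information could be worth rather than the value of this particular pilot. The payoffs are the lesson's; the variable names are illustrative.

```python
P_GOOD = 0.45
probs = {"good": P_GOOD, "bad": 1 - P_GOOD}

# Payoffs in $k, taken from the lesson's table.
PAYOFFS = {
    "Go national": {"good": 900, "bad": -300},
    "Run a pilot": {"good": 250, "bad": -50},
    "Don't launch": {"good": 0, "bad": 0},
}

# Without information: commit to one option, then the market resolves.
ev_without = max(
    sum(probs[s] * payoff[s] for s in probs) for payoff in PAYOFFS.values()
)

# With the hypothetical perfect signal: learn the market state first,
# then pick the best option for that state.
ev_with = sum(
    probs[s] * max(payoff[s] for payoff in PAYOFFS.values()) for s in probs
)

value_of_perfect_information = ev_with - ev_without
print(f"EV without information: {ev_without:.0f}")
print(f"EV with perfect information: {ev_with:.0f}")
print(f"Upper bound on value of information: {value_of_perfect_information:.0f}")
```

Under these assumptions the informed decision is worth about $405k against $240k for committing blind, so no information source, however good, would be worth paying more than roughly $165k for.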
Where trees go wrong
The practical implication: spend your energy getting the probabilities right, not just the payoffs. A $900k upside is irrelevant if the probability attached to it is fantasy.
Decision trees in practice
Decision trees show up wherever structured choices meet uncertain outcomes: capital allocation, product roadmaps, hiring a senior role vs. promoting internally, entering a new market. The format scales — you can nest chance nodes inside chance nodes, add more decision points later in the tree (should we double down if the pilot succeeds?), and attach costs to the act of gathering information itself.
The core discipline is always the same: draw the tree honestly, assign probabilities carefully, fold back from right to left, and let the EV guide — not override — your judgment.
Next
Sensitivity analysis — when your probability estimate is uncertain (and it always is), sensitivity analysis tells you which shaky assumption matters most and how far it has to move before your recommendation changes.