datarekha

Decision Trees — Making Choices Under Uncertainty

Launch nationally, run a pilot, or skip it entirely? Decision trees give you a structured way to compare options when the future is uncertain — and fold-back arithmetic tells you which branch wins.

8 min read · Intermediate · Business Analytics · Lesson 14 of 21

What you'll learn

  • What decision nodes (squares) and chance nodes (circles) mean in a decision tree
  • How fold-back works: evaluate right-to-left, pick the highest EV at every decision node
  • Why a lower-EV pilot can still be the smarter choice — value of information
  • How a single shaky probability estimate can flip the entire recommendation

Before you start

Your market research says there’s a 45% chance the market is good. A full national launch could return $900k — or cost you $300k if it flops. A cautious pilot caps your downside at $50k but also caps your upside at $250k. Doing nothing costs nothing and earns nothing.

These three options feel incomparable — different upside, different downside, different probabilities. A decision tree is the tool that makes them comparable.

What a decision tree is

A decision tree lays your problem out left to right like a branching path. Two kinds of nodes (junction points) do all the work:

  • A square node is a decision node — a fork where you pick which branch to take (launch, pilot, skip).
  • A circle node is a chance node — a fork where the market picks, and each branch carries a probability (good market, bad market).

At the far right of every branch sits a payoff — the dollar outcome if the path up to that point actually happened.

The art is drawing the tree accurately. The math is called fold-back.

Fold-back: evaluate right to left

Fold-back (also called roll-back) is the procedure for collapsing the tree into a single recommended action. You work from the tips back toward the root:

  1. At every chance node, compute the expected value: probability × payoff, summed across all branches. (Expected value, introduced in the previous lesson, is the probability-weighted average outcome.)
  2. At every decision node, keep only the branch with the highest EV and discard the others.
  3. The EV that survives at the root is the value of the whole decision — and the surviving path is your recommendation.

Let’s do it with real numbers.

The launch decision — by hand

Here are the payoffs (in $k for readability):

Path                        Probability   Payoff ($k)
Go national, market good    0.45          +$900
Go national, market bad     0.55          −$300
Run a pilot, market good    0.45          +$250
Run a pilot, market bad     0.55          −$50
Don’t launch                1.00          $0

Fold back the two chance nodes first:

EV(Go national) = 0.45 × $900 + 0.55 × (−$300)
               = $405 − $165
               = $240

EV(Run a pilot) = 0.45 × $250 + 0.55 × (−$50)
               = $112.50 − $27.50
               = $85

EV(Don't launch) = $0

At the decision node you compare $240, $85, and $0. Go national wins — at 45% odds of a good market, the expected payoff of $240 beats the pilot ($85) and doing nothing ($0).
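The hand calculation above can be reproduced with a short recursive fold-back. This is a minimal sketch rather than a standard library API: the tuple-based tree encoding and the names `fold_back` and `best_option` are assumptions made for illustration, and the figures are the lesson's launch numbers in $k.

```python
# Hypothetical encoding (an assumption of this sketch, not a standard format):
#   a payoff is a plain number
#   ("chance",   [(probability, subtree), ...])
#   ("decision", [(label, subtree), ...])

def fold_back(node):
    """Return the folded-back expected value of a node."""
    if isinstance(node, (int, float)):      # payoff at the tip of a branch
        return float(node)
    kind, branches = node
    if kind == "chance":                    # step 1: probability-weighted sum
        total = sum(p for p, _ in branches)
        if abs(total - 1.0) > 1e-9:
            raise ValueError("chance-node probabilities must sum to 1")
        return sum(p * fold_back(sub) for p, sub in branches)
    if kind == "decision":                  # step 2: keep the highest-EV branch
        return max(fold_back(sub) for _, sub in branches)
    raise ValueError(f"unknown node kind: {kind!r}")


def best_option(decision):
    """Return the label of the highest-EV branch at a decision node."""
    kind, branches = decision
    if kind != "decision":
        raise ValueError("best_option expects a decision node")
    return max(branches, key=lambda branch: fold_back(branch[1]))[0]


launch_tree = ("decision", [
    ("Go national",  ("chance", [(0.45, 900), (0.55, -300)])),
    ("Run a pilot",  ("chance", [(0.45, 250), (0.55, -50)])),
    ("Don't launch", 0),
])

root_ev = fold_back(launch_tree)    # step 3: the value that survives at the root
choice = best_option(launch_tree)
print(f"{choice}: EV = ${root_ev:.0f}k")
```

Because the function recurses, the same code handles trees with further decision or chance nodes nested inside a branch. Floating-point arithmetic can leave the results a hair away from the hand-computed figures, so compare with a tolerance rather than with `==`.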

Try it: drag the probability and watch the recommendation flip

The widget shows the exact tree above. At 45%, Go national highlights green at $240. Now drag P(market is good) down toward 20% and watch the recommendation shift. With these payoffs, the pilot overtakes the national launch once the probability falls below about 28% (250/900), the national EV turns negative below 25%, and below about 17% (1/6) even the pilot’s EV is negative, so doing nothing becomes the smart call. The tree doesn’t change; only one input changes, and the whole recommendation flips.
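For readers without the widget, the flip can be checked with a small sweep over P(market is good). This is a sketch using the lesson's payoffs in $k; the function names are illustrative, and the two breakeven probabilities are derived by setting the EV expressions equal (1200p − 300 = 300p − 50, and 300p − 50 = 0).

```python
def ev_national(p_good):
    """EV of the national launch in $k for a given P(market is good)."""
    return p_good * 900 + (1 - p_good) * -300


def ev_pilot(p_good):
    """EV of the pilot in $k for a given P(market is good)."""
    return p_good * 250 + (1 - p_good) * -50


def recommend(p_good):
    """Return the option with the highest EV at this probability."""
    options = {
        "Go national": ev_national(p_good),
        "Run a pilot": ev_pilot(p_good),
        "Don't launch": 0.0,
    }
    return max(options, key=options.get)


# Sweep the probability from 10% to 45% in 5-point steps.
for pct in range(10, 50, 5):
    p = pct / 100
    print(f"P(good)={p:.2f}  national={ev_national(p):7.1f}  "
          f"pilot={ev_pilot(p):6.1f}  -> {recommend(p)}")

# Breakeven probabilities, solved algebraically from the two EV lines.
p_national_vs_pilot = 250 / 900   # national and pilot have equal EV here
p_pilot_vs_nothing = 50 / 300     # pilot EV is exactly zero here
print(f"National beats pilot above P(good) = {p_national_vs_pilot:.3f}")
print(f"Pilot beats doing nothing above P(good) = {p_pilot_vs_nothing:.3f}")
```

The sweep shows three regimes: doing nothing at very low probabilities, the pilot in a narrow middle band, and the national launch above roughly 28%.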

That is both the power and the risk of a decision tree.

Why a pilot can beat national even with a lower EV

Notice that the pilot EV ($85) is well below the national EV ($240) at 45% odds. Yet there are two reasons a rational manager might still choose the pilot:

Downside protection. The worst case nationally is −$300. The worst case for a pilot is −$50. If your company cannot absorb a $300k loss — say, it would force layoffs or kill another project — the pilot’s capped downside has real value that the EV number doesn’t capture.
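One way to make downside protection explicit is to screen out any option whose worst case exceeds what the company can absorb, then pick the highest EV among what remains. The sketch below assumes this screening rule and a hypothetical $100k loss limit; neither comes from the lesson, and other risk treatments (for example, utility functions) are equally valid.

```python
# Lesson payoffs in $k: each option is a list of (probability, payoff) pairs.
options = {
    "Go national":  [(0.45, 900), (0.55, -300)],
    "Run a pilot":  [(0.45, 250), (0.55, -50)],
    "Don't launch": [(1.00, 0)],
}


def expected_value(outcomes):
    """Probability-weighted average payoff."""
    return sum(p * payoff for p, payoff in outcomes)


def worst_case(outcomes):
    """Lowest payoff the option can produce."""
    return min(payoff for _, payoff in outcomes)


def choose(options, max_tolerable_loss=None):
    """Highest-EV option among those whose worst case stays within the limit.

    max_tolerable_loss is a positive number in $k (hypothetical parameter);
    None means no constraint, which reduces to plain fold-back.
    """
    feasible = {
        name: outcomes
        for name, outcomes in options.items()
        if max_tolerable_loss is None
        or worst_case(outcomes) >= -max_tolerable_loss
    }
    return max(feasible, key=lambda name: expected_value(feasible[name]))


print(choose(options))                          # unconstrained choice
print(choose(options, max_tolerable_loss=100))  # hypothetical $100k loss limit
```

Without a limit the national launch wins on EV; with the assumed $100k limit the national launch is screened out and the pilot becomes the best remaining option.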

Value of information. A pilot lets you learn before committing fully. Running a small test and observing real customer behaviour updates your probability estimate. If the pilot goes well you can then launch nationally with much higher confidence. The technical term is value of information — sometimes the smart move is the option that lets you learn cheaply before betting big, even if its standalone EV is lower.
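A common textbook way to put a ceiling on what learning is worth is the expected value of perfect information (EVPI): the EV you would get if you could see the market state before choosing, minus the best EV without that knowledge. The lesson does not name EVPI, so treat this as an illustrative sketch using the lesson's payoffs in $k.

```python
p_good, p_bad = 0.45, 0.55

# Payoff of each action in each market state, in $k (lesson figures).
payoffs = {
    "Go national":  {"good": 900, "bad": -300},
    "Run a pilot":  {"good": 250, "bad": -50},
    "Don't launch": {"good": 0,   "bad": 0},
}

# Best EV when you must choose before knowing the market state.
ev_without = max(
    p_good * v["good"] + p_bad * v["bad"] for v in payoffs.values()
)

# With perfect information you pick the best action for each state,
# then weight those best payoffs by how likely each state is.
best_if_good = max(v["good"] for v in payoffs.values())
best_if_bad = max(v["bad"] for v in payoffs.values())
ev_with = p_good * best_if_good + p_bad * best_if_bad

evpi = ev_with - ev_without
print(f"EV without information: {ev_without:.1f}")
print(f"EV with perfect information: {ev_with:.1f}")
print(f"EVPI: {evpi:.1f}")
```

EVPI is an upper bound: a real pilot gives imperfect information, so it is worth less than this figure. Valuing an imperfect pilot would need an estimate of how reliably its result predicts the national market, which the lesson does not provide.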

Where trees go wrong

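A decision tree is only as trustworthy as the numbers you feed it. The fold-back arithmetic is exact, but the probabilities are estimates, and as the launch example shows, moving a single one of them can flip the entire recommendation.

The practical implication: spend your energy getting the probabilities right, not just the payoffs. A $900k upside is irrelevant if the probability attached to it is fantasy.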

Decision trees in practice

Decision trees show up wherever structured choices meet uncertain outcomes: capital allocation, product roadmaps, hiring a senior role vs. promoting internally, entering a new market. The format scales — you can nest chance nodes inside chance nodes, add more decision points later in the tree (should we double down if the pilot succeeds?), and attach costs to the act of gathering information itself.
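To illustrate a later decision point and a cost attached to gathering information, here is a sketch of a staged version of the launch tree. The staging is hypothetical and is not part of the lesson's figures: it assumes the pilot costs $50k, reveals the market state perfectly, and leaves the national payoffs unchanged apart from the pilot cost already spent.

```python
def fold_back(node):
    """Fold back a tree of payoffs, ("chance", ...) and ("decision", ...) nodes."""
    if isinstance(node, (int, float)):
        return float(node)
    kind, branches = node
    if kind == "chance":
        return sum(p * fold_back(sub) for p, sub in branches)
    if kind == "decision":
        return max(fold_back(sub) for _, sub in branches)
    raise ValueError(f"unknown node kind: {kind!r}")


PILOT_COST = 50  # $k, hypothetical cost of gathering information


def after_pilot(national_payoff):
    """Second-stage decision once the pilot has revealed the market state."""
    return ("decision", [
        ("Go national", national_payoff - PILOT_COST),
        ("Stop", -PILOT_COST),
    ])


staged_tree = ("decision", [
    ("Go national now", ("chance", [(0.45, 900), (0.55, -300)])),
    ("Pilot, then decide", ("chance", [
        (0.45, after_pilot(900)),    # pilot shows a good market
        (0.55, after_pilot(-300)),   # pilot shows a bad market
    ])),
    ("Don't launch", 0),
])

# Fold back each first-stage option so they can be compared side by side.
for label, subtree in staged_tree[1]:
    print(f"{label}: EV = {fold_back(subtree):.1f}")
```

Under these assumptions the staged option folds back to about $355k against $240k for launching immediately, because the second decision node lets you walk away after a bad pilot. With a less reliable pilot the gap would shrink.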

The core discipline is always the same: draw the tree honestly, assign probabilities carefully, fold back from right to left, and let the EV guide — not override — your judgment.

Check your understanding

Q1. At P(good) = 45%, what is the EV of Go National?
Q2. You are evaluating two options. Option A: 60% chance of +$200, 40% chance of −$100. Option B: guaranteed $60. Fold back and choose.
Q3. A pilot has a lower EV than a national launch. Under which circumstance is choosing the pilot still rational?

Next

Sensitivity analysis — when your probability estimate is uncertain (and it always is), sensitivity analysis tells you which shaky assumption matters most and how far it has to move before your recommendation changes.

Practice this in an interview

Walk me through exactly how a decision tree chooses a split at each node.

At each node the algorithm iterates over every feature and every candidate threshold, scores each candidate split by the weighted impurity of the two child nodes, and selects the pair that gives the largest impurity reduction. It then recurses on each child until a stopping criterion is met.

How do you choose the optimal decision threshold for a binary classifier?

The optimal threshold depends on the business cost of false positives versus false negatives, not on defaulting to 0.5. You choose it by plotting the PR or ROC curve on a held-out set, computing the metric that captures your cost function (e.g., F-beta, revenue, expected cost) at each threshold, and selecting the point that maximises it. Threshold tuning is free and should always precede resampling or model changes.

What is pruning in decision trees and when would you use pre-pruning versus post-pruning?

Pruning removes splits that do not improve generalisation. Pre-pruning stops growth early via hyperparameters like max_depth or min_samples_leaf. Post-pruning (cost-complexity pruning) grows the full tree then collapses nodes whose removal does not hurt held-out accuracy enough.

How do you choose the classification threshold for a model when the goal is a business outcome, not pure accuracy?

The default 0.5 threshold implicitly treats false positives and false negatives as equally costly, which is rarely true for business objectives. The correct threshold is found by translating the business cost of false positives and false negatives into a cost matrix, then sweeping the threshold on a held-out set to find the point that minimises expected cost or maximises expected profit. Operational constraints, such as review-team capacity, further bound the feasible region.
