Case & Behavioral | Hard | Asked at Stripe, PayPal, Netflix, Uber

How do you choose the classification threshold for a model when the goal is a business outcome, not pure accuracy?

The short answer

The default 0.5 threshold treats both kinds of error as equally costly (for a calibrated model it minimises the plain error rate), so it is rarely the right choice for business objectives. The correct threshold is found by translating the business cost of false positives and false negatives into a cost matrix, then sweeping the threshold on a held-out set to find the point that minimises expected cost or maximises expected profit. Operational constraints, such as review-team capacity, further limit which thresholds are feasible.

How to think about it

Why 0.5 is almost never optimal

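A threshold of 0.5 is optimal only when false positives and false negatives carry equal cost and the model's scores are calibrated probabilities for the population being scored (a model trained on a rebalanced 50/50 sample, for example, will overstate the fraud probability on real traffic). In business problems neither condition can be taken for granted.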
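For a model whose scores are calibrated probabilities, the cost-minimising threshold has a closed form. Writing C_FP and C_FN for the cost of one false positive and one false negative, and assuming correct decisions cost nothing, a case with predicted probability p should be flagged when the expected cost of letting it through is at least the expected cost of flagging it:

```latex
p \, C_{FN} \;\ge\; (1 - p) \, C_{FP}
\quad\Longleftrightarrow\quad
p \;\ge\; t^{*} = \frac{C_{FP}}{C_{FP} + C_{FN}}
```

With equal costs this gives t* = 0.5. With the costs from Step 1 ($95 per false positive, $240 per false negative) it gives t* = 95 / 335 ≈ 0.28. The formula relies on calibration; when the scores are not calibrated, the empirical sweep in Step 3 is the safer route.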

Framework

Step 1 — Quantify the cost asymmetry. Ask the business stakeholder:

What does a false negative cost? (Missing a fraud case: average fraud loss = $240.)
What does a false positive cost? (Wrongly blocking a legitimate transaction: lost sale ~$80, customer service call ~$15. Total ~$95.)
Cost ratio: FN cost / FP cost = 240 / 95 ≈ 2.5, so each false negative costs about 2.5x as much as a false positive.

Step 2 — Define the objective function.

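Expected cost at threshold t = (FP rate x FP cost x N_neg) + (FN rate x FN cost x N_pos), where N_neg and N_pos are the numbers of negative and positive cases, FP rate = FP / N_neg is the share of legitimate cases flagged, and FN rate = FN / N_pos = 1 − recall is the share of positives missed. Equivalently, expected cost = (FP x FP cost) + (FN x FN cost).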

Minimise expected cost over t.
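Step 2's objective translates directly into a function. A minimal sketch: the function and parameter names are illustrative (not from any library), and it assumes correct decisions (true positives and true negatives) cost nothing.

```python
def expected_cost(fp_rate: float, fn_rate: float, n_neg: int, n_pos: int,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected cost at one threshold, written exactly as the Step 2 formula.

    fp_rate: FP / n_neg, the share of legitimate cases that get flagged.
    fn_rate: FN / n_pos, the share of positive cases that are missed (1 - recall).
    """
    return fp_rate * cost_fp * n_neg + fn_rate * cost_fn * n_pos


# Hypothetical rates, for illustration only: 0.1 % of legitimate traffic
# flagged and 40 % of fraud missed, priced with the Step 1 costs.
cost = expected_cost(fp_rate=0.001, fn_rate=0.40, n_neg=99_500, n_pos=500,
                     cost_fp=95, cost_fn=240)
print(f"${cost:,.0f}")
```

Because fp_rate x n_neg is simply the false-positive count, the same number can be computed from raw confusion-matrix counts, which is what a threshold sweep does in practice.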

Step 3 — Sweep the threshold.

For each candidate threshold t in {0.10, 0.15, …, 0.90}:

Compute confusion matrix on the validation set.
Compute expected cost using the formula above.
Record the result.

Plot expected cost vs threshold. The minimum is the business-optimal threshold.
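The sweep in Step 3 is a short loop over candidate thresholds. This sketch uses plain Python lists; the function name, the toy validation set, and the rule that a case is flagged when its score is >= t are assumptions for illustration. It also records how many cases each threshold flags, which Step 4 needs.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(y_true: Sequence[int], scores: Sequence[float],
                     thresholds: Sequence[float], cost_fp: float,
                     cost_fn: float) -> List[Tuple[float, float, int]]:
    """Return (threshold, expected_cost, n_flagged) for each candidate threshold.

    Cost comes from confusion-matrix counts on the validation set:
    FP x cost_fp + FN x cost_fn, which equals the Step 2 rate-based formula.
    """
    results = []
    for t in thresholds:
        tp = fp = fn = 0
        for label, score in zip(y_true, scores):
            flagged = score >= t
            if flagged and label == 1:
                tp += 1          # caught positive
            elif flagged and label == 0:
                fp += 1          # legitimate case flagged
            elif not flagged and label == 1:
                fn += 1          # missed positive
        results.append((t, fp * cost_fp + fn * cost_fn, tp + fp))
    return results


# Toy validation set (hypothetical): 1 = fraud, 0 = legitimate.
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
s_val = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

# Candidate grid 0.10, 0.15, ..., 0.90, rounded to avoid floating-point drift.
grid = [round(0.10 + 0.05 * i, 2) for i in range(17)]
results = sweep_thresholds(y_val, s_val, grid, cost_fp=95, cost_fn=240)
best_t, best_cost, _ = min(results, key=lambda r: r[1])
print(best_t, best_cost)
```

On real data the validation set is rarely exactly one day of traffic, so costs and flag counts would need rescaling to daily volume before they are compared with a daily review capacity.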

Step 4 — Apply operational constraints. If the fraud-review team can process at most 500 flagged cases per day, the threshold must be set high enough that the flag rate does not exceed their capacity. This may shift the threshold away from the mathematical minimum.
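Step 4 can be expressed as a filter applied before the minimisation: discard thresholds whose daily flag count exceeds capacity, then take the cheapest of what remains. A minimal sketch; the function name and the sweep rows below are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[float, float, float]  # (threshold, expected daily cost, flags per day)


def best_feasible_threshold(results: Sequence[Row], capacity: float) -> Optional[Row]:
    """Lowest-cost row whose daily flag count fits within review capacity.

    Returns None when no candidate threshold is feasible.
    """
    feasible = [r for r in results if r[2] <= capacity]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r[1])


# Hypothetical sweep output, already scaled to one day of traffic.
sweep = [(0.2, 41_000, 900), (0.3, 44_000, 650), (0.4, 47_500, 480), (0.5, 52_000, 390)]
print(best_feasible_threshold(sweep, capacity=500))
```

Returning None rather than the least-bad infeasible row forces an explicit decision (add reviewers, or extend the grid to higher thresholds) instead of silently exceeding capacity.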

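Worked example. Fraud model, 100,000 daily transactions, 0.5 % base rate (500 fraud, 99,500 legitimate). Recall gives the true positives (TP = 500 x recall) and hence the false negatives (FN = 500 − TP); precision gives the false positives (FP = TP x (1 − precision) / precision).

At threshold 0.5: precision 72 %, recall 61 %. TP = 305, FN = 195, FP ≈ 119. Expected cost ≈ (195 x $240) + (119 x $95) ≈ $46,800 + $11,300 = $58,100, with about 424 flags per day.
At threshold 0.3: precision 58 %, recall 81 %. TP = 405, FN = 95, FP ≈ 293. Expected cost ≈ $22,800 + $27,900 = $50,700, with about 698 flags per day.
At threshold 0.65: precision 84 %, recall 52 %. TP = 260, FN = 240, FP ≈ 50. Expected cost ≈ $57,600 + $4,700 = $62,300, with about 310 flags per day.

On cost alone, 0.3 wins: a missed fraud costs about 2.5x a false block, so it pays to flag more. But 0.3 generates roughly 700 flags a day, above the 500-case review capacity in Step 4, so among these three candidates 0.5 is the best feasible threshold.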
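A short sketch that recomputes the example's daily cost and flag volume from the quoted precision and recall, deriving false positives as FP = TP x (1 − precision) / precision. It assumes each precision and recall figure was measured on traffic with the same 0.5 % base rate; the constant and function names are illustrative.

```python
N_POS, N_NEG = 500, 99_500   # 100,000 daily transactions at a 0.5 % fraud rate
COST_FN, COST_FP = 240, 95   # Step 1: cost per missed fraud / per false block
CAPACITY = 500               # Step 4: review capacity, flags per day

# (threshold, precision, recall) as quoted in the worked example.
operating_points = [(0.3, 0.58, 0.81), (0.5, 0.72, 0.61), (0.65, 0.84, 0.52)]


def daily_cost_and_flags(precision: float, recall: float):
    """Expected daily cost and number of flagged cases at one operating point."""
    tp = N_POS * recall                     # fraud caught
    fn = N_POS - tp                         # fraud missed
    fp = tp * (1 - precision) / precision   # from precision = TP / (TP + FP)
    return fn * COST_FN + fp * COST_FP, tp + fp


for t, p, r in operating_points:
    cost, flags = daily_cost_and_flags(p, r)
    status = "within capacity" if flags <= CAPACITY else "over capacity"
    print(f"t={t}: cost ~ ${cost:,.0f}, flags ~ {flags:.0f}/day ({status})")
```

Deriving false positives from precision, rather than assuming a separate false-positive rate, keeps each operating point's cost and flag count consistent with each other, which is what the capacity check in Step 4 depends on.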
