Machine Learning | Medium | Asked at Stripe, Google, Amazon, Airbnb

How do you choose the optimal decision threshold for a binary classifier?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

The optimal threshold depends on the relative business cost of false positives and false negatives; 0.5 is only a convention. Choose it by scoring a held-out validation set, computing the metric that captures your cost function (e.g., F-beta, revenue, expected cost) at each candidate threshold, and selecting the threshold that maximises it (or minimises it, if the metric is a cost). The PR and ROC curves are convenient ways to inspect that sweep. Threshold tuning needs no retraining, so it is usually worth trying before resampling or model changes.

How to think about it

Explain why 0.5 is wrong by default, then cover each method for finding the right threshold.

Why 0.5 is not a safe default

Logistic regression and most probabilistic models output an estimate of P(Y=1 | x). The threshold converts that probability into a hard prediction. A 0.5 cutoff implicitly treats a false positive and a false negative as equally costly, and it assumes the predicted probabilities are well calibrated. Both conditions rarely hold in practice, and with imbalanced classes a 0.5 cutoff can also score poorly on metrics such as recall or F1.

For a fraud model where catching fraud is worth $200 but a wrongly blocked transaction costs $2 in customer service, the optimal threshold is far below 0.5.
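As a rough check on that claim: if the model's probabilities are well calibrated and correct decisions carry no cost (both assumptions), expected cost is minimised by flagging a transaction whenever p * cost_FN >= (1 - p) * cost_FP, which rearranges to p >= cost_FP / (cost_FP + cost_FN). The helper name below is illustrative, and reading the $200 as the cost of a missed fraud is also an assumption.

```python
def cost_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold minimising expected cost, assuming calibrated probabilities
    and zero cost for correct decisions (both are assumptions)."""
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


# Figures from the fraud example: a missed fraud costs $200, a wrongly
# blocked transaction costs $2.
fraud_threshold = cost_optimal_threshold(cost_fp=2.0, cost_fn=200.0)
print(f"{fraud_threshold:.4f}")  # 0.0099
```

Under these assumptions the cutoff sits near 1%, which is why an empirical sweep on validation data is still worth running: real models are seldom perfectly calibrated.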

Method 1: business cost function

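Define cost(threshold) = FP(threshold) * cost_FP + FN(threshold) * cost_FN, where FP and FN are the counts of false positives and false negatives on a validation set. Use counts rather than the false-positive and false-negative rates: the rates are conditioned on the true class, so summing them directly ignores class prevalence. Sweep the threshold and pick the value minimising total cost. This is the most rigorous approach when you have reliable cost estimates.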
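The sweep can be sketched in a few lines of NumPy. The function name, the threshold grid, and the tiny demo arrays are all illustrative assumptions, and the costs reuse the $2 and $200 figures from the fraud example.

```python
import numpy as np


def min_cost_threshold(y_true, y_prob, cost_fp, cost_fn, thresholds=None):
    """Return (best_threshold, best_cost) from a sweep over candidate thresholds.

    Cost is counted in absolute terms: number of false positives times cost_fp
    plus number of false negatives times cost_fn.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 201)
    thresholds = np.asarray(thresholds, dtype=float)

    # Rows are thresholds, columns are examples.
    pred_pos = y_prob[None, :] >= thresholds[:, None]
    fp = (pred_pos & ~y_true[None, :]).sum(axis=1)
    fn = (~pred_pos & y_true[None, :]).sum(axis=1)
    total_cost = fp * cost_fp + fn * cost_fn

    best = int(np.argmin(total_cost))  # first minimum if there are ties
    return float(thresholds[best]), float(total_cost[best])


# Tiny made-up validation set, for illustration only.
y_val_demo = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob_demo = [0.02, 0.05, 0.10, 0.30, 0.35, 0.60, 0.80, 0.90]
t_best, c_best = min_cost_threshold(y_val_demo, y_prob_demo, cost_fp=2, cost_fn=200)
```

The thresholds-by-examples boolean matrix keeps the code short but uses memory proportional to both sizes; for large validation sets, a loop over thresholds or a sort-based cumulative count would be the more economical choice.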

Method 2: F-beta optimisation

If you can express the relative importance of recall versus precision as a ratio beta (beta > 1 favours recall, beta < 1 favours precision), optimise F-beta over the validation set:

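from sklearn.metrics import fbeta_score
import numpy as np

# y_val: true 0/1 labels; y_prob: predicted P(Y=1) on the validation set
y_prob = np.asarray(y_prob)
thresholds = np.linspace(0, 1, 201)
# beta=2 weights recall more heavily than precision;
# zero_division=0 scores thresholds that predict no positives as 0
scores = [fbeta_score(y_val, (y_prob >= t).astype(int), beta=2, zero_division=0)
          for t in thresholds]
best_threshold = thresholds[np.argmax(scores)]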

Method 3: PR curve — maximum F1 point

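Compute F1 = 2 * precision * recall / (precision + recall) at every point on the PR curve and take the threshold with the highest value. The point where precision equals recall (the break-even point) is not, in general, the F1-maximising threshold. sklearn.metrics.precision_recall_curve returns precision, recall, and thresholds arrays, with one more precision/recall entry than thresholds, so the search takes only a few lines.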
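A minimal sketch of that search using precision_recall_curve, assuming binary 0/1 labels and scores where higher means more likely positive. The function name and the demo arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def max_f1_threshold(y_true, y_prob):
    """Return (threshold, f1) at the F1-maximising point of the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision and recall have one more entry than thresholds; the final
    # (precision=1, recall=0) point has no threshold, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Tiny made-up validation set, for illustration only.
y_val_demo = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob_demo = [0.02, 0.05, 0.10, 0.30, 0.35, 0.60, 0.80, 0.90]
t_f1, best_f1 = max_f1_threshold(y_val_demo, y_prob_demo)
```

On this made-up data the best F1 is reached at a threshold of 0.35, where precision is 0.75 and recall is 1.0. Predictions are then made with score >= threshold, matching how the curve is computed.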

Method 4: ROC curve — Youden’s J

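Youden’s J = Sensitivity + Specificity - 1 = TPR - FPR. Maximising J on the ROC curve gives the threshold that maximises the sum of the true-positive and true-negative rates. This is appropriate when errors on the two classes matter equally. Because J ignores both class prevalence and error costs, it can be a poor fit for imbalanced problems with asymmetric costs.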
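One way to find that point with sklearn.metrics.roc_curve is sketched below. The function name and demo data are illustrative, and the code assumes binary 0/1 labels.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximising J = TPR - FPR on the ROC curve."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j = tpr - fpr
    # Skip index 0: roc_curve prepends a sentinel threshold above every score
    # (the "predict nothing positive" point, where J is 0).
    best = 1 + int(np.argmax(j[1:]))
    return float(thresholds[best]), float(j[best])


# Tiny made-up validation set, for illustration only.
y_val_demo = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob_demo = [0.02, 0.05, 0.10, 0.30, 0.35, 0.60, 0.80, 0.90]
t_j, j_best = youden_threshold(y_val_demo, y_prob_demo)
```

Skipping the sentinel entry keeps the returned threshold usable as a real cutoff on predicted probabilities.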

Applying the chosen threshold

Once the threshold is chosen on the validation set, apply it to the test set without re-tuning. Tuning the threshold on the test set inflates estimated performance.

In deployment, expose the threshold as a configurable parameter so operations teams can adjust it when business costs change, without retraining the model.
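A minimal sketch of such a parameter, read from an environment variable. The class name, the DECISION_THRESHOLD variable name, and the fallback mechanism are all hypothetical; a real service might use a config file or feature-flag system instead.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdConfig:
    threshold: float

    def __post_init__(self):
        # Reject values that cannot be a probability cutoff.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {self.threshold}")

    @classmethod
    def from_env(cls, fallback, var="DECISION_THRESHOLD"):
        # DECISION_THRESHOLD is a hypothetical variable name; fallback is the
        # value chosen on the validation set.
        return cls(float(os.environ.get(var, fallback)))

    def decide(self, prob: float) -> bool:
        # Same rule as during tuning: positive when prob >= threshold.
        return prob >= self.threshold


config = ThresholdConfig.from_env(fallback=0.35)
```

Validating the range at load time means a mistyped value fails at startup rather than silently flagging everything or nothing.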
