Why your model made that prediction: SHAP in production

A risk team at a mid-size lender shipped a gradient-boosted model for loan approvals. It beat their old scorecard on every metric they cared about. Then a regulator asked a simple question: when you deny someone, why? Not “the model said no” — the specific principal reasons, in plain English, within thirty days, as the Equal Credit Opportunity Act has required since 1974.

They reached for SHAP, generated reason codes, and felt safe. Six months later an internal audit found that the third-ranked reason on a chunk of denials was a feature the deployed model barely used. It had picked up credit from a correlated cousin under an explainer mode someone had chosen without thinking about it. The reasons were defensible-looking. They were also, for those applicants, not quite true.

This is the SHAP trap in miniature. The library is extraordinary — shap/shap sits around 25.5k GitHub stars and ships something like 15 million downloads a month — and the math underneath it is some of the cleanest in all of machine learning. That is exactly what makes it dangerous. SHAP gives you a number with a guarantee, and a number with a guarantee feels like truth. Most of the time it is answering a narrower question than the one you think you asked.

This is a field guide to that gap.

What Shapley values actually compute

The idea predates machine learning by half a century. Lloyd Shapley, working in cooperative game theory in 1953, asked: when a coalition of players produces some total payout, what is each player’s fair share? His answer, the Shapley value, is the average marginal contribution of a player across every possible order in which the coalition could have formed. It is the unique allocation satisfying a short list of fairness axioms. Shapley went on to share the 2012 Nobel in economics, awarded for his related work on stable allocations.

Scott Lundberg and Su-In Lee’s contribution, in A Unified Approach to Interpreting Model Predictions at NIPS 2017, was to cast model explanation as exactly this game. The “players” are your features. The “payout” is the model’s prediction for one row. Each feature’s SHAP value is its fair share of the credit for pushing that prediction away from a baseline. The paper proved their formulation is the only additive feature-attribution method satisfying local accuracy, missingness, and consistency — and showed that LIME, DeepLIFT, and layer-wise relevance propagation were all approximations of it. That unification is why SHAP ate the field; the paper now has tens of thousands of citations.

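The one property you must internalize is additivity, also called efficiency or local accuracy:

$$f(x) = E[f(X)] + \sum_{i=1}^{M} \phi_i$$

Here f(x) is the model’s prediction for the row, E[f(X)] is the base value (the average prediction over the background data), and φ_i is the SHAP value of feature i out of M features.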

That accounting identity is the whole game. It means you can take a single prediction, decompose it into per-feature contributions, and know the parts genuinely add up to the whole. No other popular explainability method gives you that. It is also — we will get to this — the single feature that makes SHAP suitable for a regulator.
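To make the identity concrete, here is a minimal sketch that computes exact Shapley values by brute force for a three-feature toy model and checks that they sum to prediction minus base value. The `model` function, the background rows, and the explained row are all made up for illustration, and "absent" features are filled in from the background rows, which is one common convention rather than the only one.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy scoring function with an interaction term.
    return 2.0 * x[0] + x[1] * x[2] - 1.0


def coalition_value(x, background, present):
    # Average prediction when features in `present` take the row's values
    # and every other feature is taken from each background row in turn.
    total = 0.0
    for b in background:
        mixed = [x[i] if i in present else b[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def exact_shapley(x, background):
    # Weighted average of each feature's marginal contribution over all
    # coalitions of the other features (exponential cost, toy sizes only).
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(x, background, s | {i})
                        - coalition_value(x, background, s))
                phi[i] += weight * gain
    return phi


# Illustrative data, not from any real dataset.
background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
x = [3.0, 2.0, 4.0]

base_value = coalition_value(x, background, set())
phi = exact_shapley(x, background)
prediction = model(x)

print("base value:", base_value)
print("SHAP values:", phi)
print("base + sum(phi):", base_value + sum(phi), "prediction:", prediction)
```

The last line is the waterfall in numbers: the bars start at the base value and land exactly on the prediction.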

The waterfall is the honest picture of one local explanation. The additivity guarantee is what makes the bars reach exactly from base value to prediction — and why the units on the axis matter so much.

Local first, global as a smell test

SHAP is fundamentally a local method: one explanation, one row. That waterfall above is the native object. “Global importance” — the bar chart of mean(|SHAP|) across your dataset, the beeswarm plot everyone screenshots — is not a separate thing the algorithm computes. It is an aggregate of many local explanations, and it inherits every caveat of the locals it averages.

This matters more than it sounds. When you collapse thousands of signed local attributions into one magnitude per feature, you throw away direction. A feature can rank near the top of a global importance plot while its effect is strongly positive for half your population and strongly negative for the other half. The global plot tells you the feature moves the output a lot here. It does not tell you which way, and it does not tell you the feature matters in the real world. Treat global SHAP as a smell test for “what is this model leaning on,” never as ground truth about importance.
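A small sketch of how the aggregation discards direction. The attribution values below are synthetic numbers generated for illustration (the feature names `tenure` and `region_score` are hypothetical), not output from any real explainer.

```python
import random

random.seed(0)
rows = 1000

# "tenure": modest attribution that always pushes the output the same way.
tenure = [random.uniform(0.2, 0.4) for _ in range(rows)]

# "region_score": large attribution, positive for half the rows and
# negative for the other half.
region_score = [
    random.uniform(0.8, 1.2) * (1 if r % 2 == 0 else -1) for r in range(rows)
]


def mean_abs(values):
    # What a global importance bar chart ranks by.
    return sum(abs(v) for v in values) / len(values)


def mean_signed(values):
    # The average direction, which the bar chart throws away.
    return sum(values) / len(values)


for name, values in [("tenure", tenure), ("region_score", region_score)]:
    print(f"{name:13s} mean|SHAP|={mean_abs(values):.3f} "
          f"mean SHAP={mean_signed(values):+.3f}")
```

On a mean(|SHAP|) chart, `region_score` ranks first by a wide margin, yet its average signed effect is close to zero. The ranking alone cannot tell you that.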

Local versus global is the difference between two products built from the same primitive. Keep them mentally separate.

TreeSHAP vs KernelSHAP: the cost gulf is enormous

Computing exact Shapley values is brutal. The honest definition requires evaluating the model on every possible coalition of present-and-absent features — 2^M subsets for M features. At 30 features that is a billion model evaluations per row. You cannot do this in production.

There are two escape routes, and choosing the wrong one is a six-figure compute mistake.

KernelSHAP is model-agnostic. It works on anything that maps inputs to outputs — a black-box API, a stacked ensemble, whatever. It cleverly reframes Shapley estimation as a weighted least-squares problem, then samples a subset of coalitions weighted by the Shapley kernel and solves for the attributions. The catch: it returns an approximation, not the exact value. Under-sample it and your attributions get noisy and unstable; under correlated predictors the approximation can be imprecise enough to flip an attribution’s sign. There is active research, like Improving the Sampling Strategy in KernelSHAP, precisely because the sampling is the weak point.
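The weighted least-squares idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the shap library's implementation: it enumerates every coalition instead of sampling (only feasible for a handful of features), it approximates the infinite weight on the empty and full coalitions with a large constant `big_weight`, and the `model` and data are made up.

```python
from itertools import combinations
from math import comb

import numpy as np


def model(X):
    # Hypothetical black-box model with an interaction; X has shape (n, 3).
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2] - 1.0


def coalition_value(x, background, mask):
    # Features where mask is True take the row's value, the rest come
    # from each background row; return the mean prediction.
    mixed = np.where(mask, x, background)
    return model(mixed).mean()


def kernel_shap_full(x, background, big_weight=1e7):
    m = x.shape[0]
    masks, weights = [], []
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            masks.append(np.array([i in subset for i in range(m)]))
            if size in (0, m):
                # Stand-in for the infinite weight that pins the intercept
                # to the base value and the sum to the prediction.
                weights.append(big_weight)
            else:
                # Shapley kernel weight for a coalition of this size.
                weights.append((m - 1) / (comb(m, size) * size * (m - size)))
    Z = np.array(masks, dtype=float)
    y = np.array([coalition_value(x, background, mk) for mk in masks])
    A = np.hstack([np.ones((len(Z), 1)), Z])  # intercept + one column per feature
    sw = np.sqrt(np.array(weights))
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


# Illustrative data, not from any real dataset.
background = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0],
                       [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]])
x = np.array([3.0, 2.0, 4.0])

base_value, phi = kernel_shap_full(x, background)
print("base value:", base_value)
print("attributions:", phi)
```

With every coalition included, the regression recovers the exact Shapley values up to numerical tolerance. The production version solves the same regression on a random sample of coalitions, which is where the noise comes from.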

TreeSHAP is the one to reach for whenever you can. Introduced in Lundberg et al.’s From local explanations to global understanding with explainable AI for trees (Nature Machine Intelligence, 2020), it exploits tree structure to compute exact Shapley values for tree ensembles in polynomial time, collapsing the cost from the exponential O(TL·2^M) of brute force to O(TLD²), where T is the number of trees, L the maximum number of leaves per tree, D the maximum depth, and M the number of features. For XGBoost, LightGBM, or any random forest, this is not a marginal win. It is the difference between feasible and impossible.
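Plugging numbers into the two complexity expressions shows the size of the gap. This is back-of-the-envelope arithmetic on the big-O terms, not a runtime measurement. The forest shape (400 trees, depth 12) and the 30-feature count are borrowed from figures elsewhere in this post, and the leaf count assumes full binary trees, which is an upper bound.

```python
def brute_force_cost(trees, leaves, features):
    # Growth term of the naive algorithm: T * L * 2^M.
    return trees * leaves * 2 ** features


def treeshap_cost(trees, leaves, depth):
    # Growth term of TreeSHAP: T * L * D^2.
    return trees * leaves * depth ** 2


trees, depth, features = 400, 12, 30
leaves = 2 ** depth  # assumed upper bound: a full binary tree of this depth

brute = brute_force_cost(trees, leaves, features)
fast = treeshap_cost(trees, leaves, depth)
ratio = brute / fast

print(f"brute force term: {brute:.3e}")
print(f"TreeSHAP term:    {fast:.3e}")
print(f"ratio:            {ratio:,.0f}x")
```

The ratio is 2^M / D², so every additional feature doubles the advantage while deeper trees erode it only quadratically.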

Even exact TreeSHAP gets expensive at scale, which is why an entire cottage industry of accelerators exists. LinkedIn open-sourced FastTreeSHAP after hitting a wall they describe candidly: explaining 20 million samples for a 400-tree random forest at depth 12 took as long as 30 hours on a 50-core server — the scale at which they need to explain user-level models like feed ranking and job search. Their v2 variant runs roughly 2.5x faster. NVIDIA’s GPUTreeShap, now an XGBoost backend, reports up to 19x speedups for SHAP values and up to 340x for interaction values on a single V100 versus a multi-core CPU baseline — in the extreme case, six hours down to one minute. Eight V100s hit 1.2 million rows per second, throughput they estimate would need around 6,850 CPU cores to match.

The headline: SHAP is cheap to try and decidedly not free to run at scale. Plan for it like any other production workload.

Where SHAP misleads

Here is the part most tutorials skip, and the reason this post exists. SHAP’s guarantees are real but narrow, and the gap between what it guarantees and what people read into it is where careers and compliance programs get hurt.

1. Correlated features split credit arbitrarily

Give SHAP a correlated cluster — say {credit utilization, balance, available credit} — and watch what happens. The model may load almost entirely on one of them; SHAP hands that feature most of the credit and the others near-zero. Which feature “wins” can be partly an artifact of how the trees split, not a statement about real importance. Worse, the answer depends on a mode setting you may not know you chose.

TreeSHAP has two perturbation modes that give genuinely different numbers. The interventional mode computes classic Shapley values and is “true to the model” — it credits only features the model actually uses. The tree_path_dependent (observational) mode respects feature correlations and is “true to the data” — and under it, a feature with no influence on the prediction can receive a nonzero SHAP value, because credit gets spread across correlated proxies the model never touched. Lundberg’s own co-authored paper, True to the Model or True to the Data?, frames the observational value as spreading importance among correlated features such that “intervening on such features will not impact the model’s output.”

This is not a bug to be fixed. It is a question you must answer: do you want to know what the model used, or what the data relationships are? The mistake is picking blindly — or, like our opening lender, comparing or stacking explanations computed under different modes without realizing they answer different questions. The shap package’s default has shifted toward interventional over time; do not assume, check your version. And cluster your correlated features (hierarchical clustering, shap.utils partitioning) before you draw conclusions about any one of them.
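The two questions can be separated on paper with a two-feature toy. The sketch below assumes a model that reads only x0, two standard-normal features with correlation `rho`, and exact conditional expectations for the observational case. That last part is an idealization: shap's tree_path_dependent mode approximates the conditioning from the trees' own statistics, so its numbers will differ from these.

```python
def shapley_two_features(v_empty, v_0, v_1, v_both):
    # Exact Shapley values for a two-player game: average each feature's
    # marginal contribution over both possible orderings.
    phi_0 = 0.5 * (v_0 - v_empty) + 0.5 * (v_both - v_1)
    phi_1 = 0.5 * (v_1 - v_empty) + 0.5 * (v_both - v_0)
    return phi_0, phi_1


def model(a, b):
    # Hypothetical model that uses only the first feature.
    return a


rho = 0.9          # assumed correlation between x0 and x1
x0, x1 = 2.0, 1.8  # the row being explained

# Interventional: an absent feature is replaced by its marginal mean (0),
# ignoring what the present feature implies about it.
interventional = shapley_two_features(
    model(0.0, 0.0), model(x0, 0.0), model(0.0, x1), model(x0, x1)
)

# Observational: an absent feature is replaced by its conditional mean
# given the present one, E[x0 | x1] = rho * x1 for standard normals.
# (Valid shortcut here because the model is linear in its input.)
observational = shapley_two_features(
    model(0.0, 0.0), model(x0, rho * x0), model(rho * x1, x1), model(x0, x1)
)

print("interventional:", interventional)
print("observational: ", observational)
```

Both pairs sum to the same prediction. Only the split changes: the interventional view gives x1 nothing, while the observational view hands it a large share because knowing x1 tells you a lot about x0.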

2. It shows association, not causation

This is the most common and most dangerous error, and SHAP’s own documentation warns about it in bold: “making correlations transparent does not make them causal.”

Their canonical worked example deserves repeating because it is so clean. A churn model learns that customers who report more software bugs are more likely to renew, and SHAP dutifully shows a positive attribution for bug reports. Read causally, you would conclude you should ship more bugs. The truth is an unmeasured confounder: customers with high product need both use the product harder (hitting more bugs) and renew more. The model captured a real predictive association and none of the causal effect. Intervene on bugs and renewals will move in the opposite direction from what SHAP’s sign suggests.

SHAP attributions are predictive associations the model learned, full stop. The academic critique is sharp here too — Kumar et al.’s Problems with Shapley-value-based explanations as feature importance measures (ICML 2020) argues the mathematical properties don’t necessarily align with human explanation goals, and that fixing it requires causal machinery the base method simply lacks. Acting on SHAP values as policy levers — “feature X is important, so let’s change X” — is the single most seductive mistake in the entire field.
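The confounding story is easy to reproduce in simulation. The sketch below invents a data-generating process (the coefficients and the `need` variable are assumptions chosen for illustration) in which bugs truly hurt renewal, then fits a one-feature linear model that never sees `need`. For a single-feature linear model, the SHAP value is just slope times (value minus mean), so the sign of the slope is the sign of the attribution.

```python
import random

random.seed(7)
n = 5000

TRUE_BUG_EFFECT = -0.5  # assumed ground truth: each bug lowers renewal score

# Unmeasured confounder: product need drives both bug reports and renewal.
need = [random.gauss(0, 1) for _ in range(n)]
bugs = [2.0 * a + random.gauss(0, 1) for a in need]
renewal = [
    3.0 * a + TRUE_BUG_EFFECT * b + random.gauss(0, 1)
    for a, b in zip(need, bugs)
]


def ols_slope(xs, ys):
    # Ordinary least-squares slope for a single predictor.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# The model only sees bug counts.
slope = ols_slope(bugs, renewal)
mean_bugs = sum(bugs) / n

# Attribution for a customer who reported many bugs (value 3.0).
attribution_for_heavy_reporter = slope * (3.0 - mean_bugs)

print(f"learned slope on bugs:        {slope:+.2f}")
print(f"attribution at bugs=3.0:      {attribution_for_heavy_reporter:+.2f}")
print(f"true causal effect per bug:   {TRUE_BUG_EFFECT:+.2f}")
```

The attribution is a faithful description of what the model learned and the wrong sign for anyone deciding whether to ship more bugs.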

3. Base-value and unit confusion

The base value E[f(X)] is not a constant of nature. It depends entirely on the background dataset you handed the explainer; change the background, change the baseline, change every attribution. And for classifiers, SHAP values usually live in model-output units — log-odds — not probability. People read a log-odds waterfall as if the bars were percentage points, or compare SHAP magnitudes across two models with different backgrounds as if the numbers were commensurable. They are not. Always know your units and your background before you quote a SHAP number to anyone.
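A quick worked example of the unit problem, using made-up log-odds contributions for a hypothetical classifier. It shows that contributions add in log-odds space, and that the same log-odds shift is worth very different amounts of probability depending on where you start.

```python
from math import exp


def sigmoid(z):
    # Convert log-odds to probability.
    return 1.0 / (1.0 + exp(-z))


# Hypothetical waterfall in log-odds units.
base_log_odds = -2.0
contributions = {"utilization": 1.5, "recent_inquiries": 0.5, "tenure": -0.3}

final_log_odds = base_log_odds + sum(contributions.values())
p_base = sigmoid(base_log_odds)
p_final = sigmoid(final_log_odds)

print(f"base:  log-odds {base_log_odds:+.2f} -> probability {p_base:.3f}")
print(f"final: log-odds {final_log_odds:+.2f} -> probability {p_final:.3f}")


def probability_shift(start_log_odds, delta):
    # Probability change caused by adding `delta` log-odds at a given start.
    return sigmoid(start_log_odds + delta) - sigmoid(start_log_odds)


shift_near_middle = probability_shift(0.0, 1.0)
shift_in_tail = probability_shift(-5.0, 1.0)

print(f"+1.0 log-odds starting at  0.0: {shift_near_middle:+.3f} probability")
print(f"+1.0 log-odds starting at -5.0: {shift_in_tail:+.3f} probability")
```

A bar of length 1.5 on a log-odds waterfall is not "15 points" or "150%" of anything. If stakeholders need probabilities, convert the totals, and be careful about presenting individual bars in probability terms at all.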

4. Misreading the beeswarm plot

The summary/beeswarm plot is the most screenshotted and most misread object in the library. Two independent things are encoded: the x-position is the SHAP value (signed impact on output), and the color is the feature value (high/low). People conflate them constantly. And the feature ranking — mean(|SHAP|) — is magnitude of impact on this model’s output on this dataset, not real-world importance and not direction. A feature high on the list may have its effect split positive and negative across the population, washing out to “important but directionless.”

Where SHAP genuinely earns its keep

After all that, you might think I’m down on SHAP. I am not. I reach for it constantly — but for the jobs it’s actually good at.

Model debugging is where SHAP is at its absolute best. When a SHAP plot shows your model leaning hard on account_id or application_timestamp, you have just caught target leakage in seconds. When a feature you expected to dominate sits near zero, you have a data-pipeline bug or a spurious proxy. SHAP’s honest job is fidelity to the model’s behavior on your data — and for catching leakage, surprising splits, and proxies, that fidelity is exactly what you want. This is also the one place it touches production monitoring: when a model’s quality degrades, SHAP-on-the-drifted-slice tells you which features shifted the behavior, a natural companion to the drift detection covered in model monitoring in 2026.

Stakeholder trust is real and underrated. A product manager who sees a waterfall explaining one prediction trusts the system more than one handed an AUC. Used honestly — as “here is what the model did,” never “here is what’s true” — SHAP is a superb communication tool.

Regulatory reason codes are where the additivity guarantee becomes a business asset. Under ECOA and Regulation B (12 CFR 1002.9), a creditor must give specific principal reasons for adverse action within 30 days; the official commentary says disclosing more than four reasons is “not likely to be helpful,” and vague boilerplate like “failed to achieve a qualifying score” is explicitly insufficient. The CFPB closed the AI loophole hard with Circular 2022-03: complex and machine-learning models do not escape the duty to give accurate, specific reasons.

This is the perfect job for SHAP. Because contributions provably sum to (prediction − base value), you can rank the genuine top drivers of a single denial and defend the ranking with math a regulator can audit — not a heatmap, an accounting identity. Vendors live here: Zest AI markets adverse-action key-factor generation for ML underwriting (positioned for ECOA/FCRA/SR 11-7), and notably patented its own attribution method, Generalized Integrated Gradients, citing SHAP’s limitations for credit; FICO ships explainable-ML reason codes in its platform. Note that “SHAP-style reason codes” is the honest phrasing — these stacks aren’t always the open-source library.
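As a rough sketch of the ranking step only: given one applicant's SHAP values, pick the features that pushed hardest toward denial and map them to reason text, capped at four. Everything here is hypothetical, including the feature names, the reason wording, and the sign convention (the model output is assumed to be an approval score, so negative contributions are adverse). A real adverse-action pipeline needs legal review well beyond this.

```python
def top_adverse_reasons(shap_values, reason_text, max_reasons=4):
    # Keep only contributions that pushed the score toward denial
    # (negative, under the assumed approval-score convention).
    adverse = [(name, value) for name, value in shap_values.items() if value < 0]
    # Most negative first: the strongest drivers of the denial.
    adverse.sort(key=lambda item: item[1])
    # Fall back to the raw feature name if no reason text is registered.
    return [reason_text.get(name, name) for name, _ in adverse[:max_reasons]]


# Hypothetical attributions for one denied applicant.
applicant_shap = {
    "credit_utilization": -0.90,
    "months_since_delinquency": -0.40,
    "recent_inquiries": -0.25,
    "income": 0.30,
    "account_age": -0.10,
    "num_open_accounts": -0.05,
}

# Hypothetical mapping from feature to plain-English reason.
reason_text = {
    "credit_utilization": "Proportion of balances to credit limits is too high",
    "months_since_delinquency": "Time since most recent delinquency is too short",
    "recent_inquiries": "Too many recent credit inquiries",
    "account_age": "Length of credit history is too short",
    "num_open_accounts": "Too many open accounts",
}

reasons = top_adverse_reasons(applicant_shap, reason_text)
for rank, reason in enumerate(reasons, start=1):
    print(f"{rank}. {reason}")
```

The ranking is only as trustworthy as the attributions feeding it, which is why the explainer mode and the handling of correlated features have to be settled before this step, not after.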

The honest cost-benefit

SHAP is cheap to add, high-value for debugging and compliance, and genuinely defensible where it counts. It is also not free at scale, easy to misread in at least four distinct ways, and — this is the crux — it answers “what did the model do?” and not “what should we change?” Deploy it enthusiastically for the former and refuse it for the latter.

The frontier has already moved past per-feature attribution anyway. The interesting truth is that credit between correlated and interacting features is genuinely shared, and tools like shapiq (NeurIPS 2024) now compute any-order Shapley interactions, with a TreeSHAP-IQ fast path for boosted trees. If you’ve ever felt that “feature A got the credit but really it’s A-and-B together,” interaction values are the principled answer.

What to take away

Four lines, earned the hard way:

The additivity guarantee is the whole value proposition. Attributions sum exactly to prediction minus base value. That’s what you show a regulator, and it’s the one thing no competing method matches. Everything else SHAP tells you is softer than it looks.
Match the explainer to the model. TreeSHAP is exact and polynomial for tree ensembles — use it. Save KernelSHAP for genuinely black-box models and pay for it knowingly.
SHAP explains the model, not the world. It is excellent for debugging and trust, defensible for reason codes, and dangerous for causal or policy questions unless you bolt on a causal framework. Association is not causation, no matter how clean the waterfall.
Know which question you’re asking before you read the answer. Local or global, interventional or observational, log-odds or probability — pick deliberately. The lender in the opening didn’t, and produced reason codes that were defensible-looking and quietly wrong.

SHAP is one of the best tools we have for understanding our models. Respect what it guarantees, distrust what you’ve projected onto it, and it will earn its place. Confuse “what the model did” with “what is true,” and it will hand you a confident, beautifully-rendered mistake.

Further reading: Lundberg & Lee’s original SHAP paper (NIPS 2017) and the tree paper (Nature MI 2020) are the primary sources. Christoph Molnar’s SHAP chapter in Interpretable ML is the best free reference, and Aidan Cooper’s non-technical guide to interpreting SHAP is the clearest plain-English walkthrough of the plots.