datarekha
MLOps Medium

What goes in a model card, and how do you provide explainability for production decisions?

The short answer

A model card documents a model's intended use, training data, evaluation results broken down by relevant subgroups, known limitations, and ethical considerations, so stakeholders can judge whether and where it should be used. Explainability is provided through methods like SHAP or LIME for feature attributions, plus logging the inputs and reasons behind each decision so it can be audited or contested. Together they support transparency, oversight, and regulatory requirements for high-risk systems.

How to think about it

What goes in a model card

  • Intended use & out-of-scope use: what it’s for and explicitly not for.
  • Training data: source, time range, provenance, known biases.
  • Evaluation: metrics overall and broken down by relevant subgroups (the disaggregation is what surfaces fairness issues).
  • Limitations & ethical considerations: failure modes, populations where it underperforms.
  • Versioning/owner: tied to the registry version and a responsible team.
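
One way to keep these sections from drifting apart is to store the card as structured data next to the model. The sketch below is a minimal illustration, not a standard schema: the `ModelCard` class, every field name, and all example values (including the metric numbers) are assumptions made up for this example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    # Hypothetical schema: field names mirror the bullets above, nothing more.
    model_name: str
    registry_version: str          # ties the card to one registry version
    owner: str                     # responsible team
    intended_use: str
    out_of_scope_use: list[str]
    training_data: dict[str, str]  # source, time range, provenance, known biases
    evaluation: dict[str, dict[str, float]]  # subgroup -> metric -> value
    limitations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the card so it can be versioned alongside the model."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; the numbers are placeholders, not real results.
card = ModelCard(
    model_name="loan-approval",
    registry_version="3.2.0",
    owner="credit-risk-ml",
    intended_use="Score consumer loan applications for human review.",
    out_of_scope_use=["Fully automated declines", "Business lending"],
    training_data={
        "source": "internal applications table",
        "known_biases": "thin-credit-file applicants are under-represented",
    },
    evaluation={
        "overall": {"accuracy": 0.90},
        "thin_credit_file": {"accuracy": 0.70},
    },
    limitations=["Underperforms on applicants with thin credit files."],
)

print(card.to_json())
```

Keeping `evaluation` keyed by subgroup means a card cannot be filled in with a single headline metric without that gap being obvious.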

Disaggregated evaluation is the key part: a model can look great on average while failing a subgroup, and the card forces that failure to be visible.
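
A small sketch of what "disaggregated" means in practice, using only the standard library. The helper name `accuracy_by_group`, the subgroup labels, and the toy labels and predictions are all made up for illustration; a real evaluation would use the metrics and subgroups relevant to the model.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return accuracy overall and per subgroup.

    Assumes no subgroup is literally named "overall".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        # Count each example once for the aggregate and once for its subgroup.
        for key in ("overall", group):
            total[key] += 1
            correct[key] += int(truth == pred)
    return {key: correct[key] / total[key] for key in total}


# Toy data: 8 thick-file applicants followed by 2 thin-file applicants.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
groups = ["thick_file"] * 8 + ["thin_file"] * 2

report = accuracy_by_group(y_true, y_pred, groups)
print(report)
# {'overall': 0.8, 'thick_file': 1.0, 'thin_file': 0.0}
```

The aggregate of 0.8 hides the fact that every thin-file applicant in this toy set is misclassified, which is the pattern the card is meant to expose.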

Providing explainability

  • Global: feature importance to understand overall behavior.
  • Local / per-decision: SHAP or LIME to explain why this specific prediction came out the way it did. This is essential for high-risk decisions a user can contest (loans, hiring).
  • Logging: persist the inputs, model version, and attribution for each decision so it’s auditable after the fact. This supports the EU AI Act’s record-keeping and human-oversight requirements for high-risk systems.
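
The local-attribution and logging steps can be sketched together without any XAI library. The example assumes a linear scoring model with independent features, for which SHAP values reduce to the coefficient times the feature's deviation from its background mean; a non-linear model would need a real explainer such as SHAP or LIME. The function names, feature names, weights, baseline means, and threshold are all hypothetical.

```python
import json
from datetime import datetime, timezone


def linear_attributions(weights, baseline, inputs):
    """Per-feature contribution w_i * (x_i - baseline_i) for a linear model."""
    return {name: weights[name] * (inputs[name] - baseline[name]) for name in weights}


def build_decision_record(decision_id, model_version, inputs, score, threshold, attributions):
    """Bundle everything needed to audit or contest one decision."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {
        "decision_id": decision_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "score": score,
        "decision": "approve" if score >= threshold else "decline",
        "attributions": attributions,
        "top_factors": [name for name, _ in ranked[:3]],
    }


# Hypothetical model: coefficients, background means, and the expected score.
weights = {"debt_to_income": -2.0, "years_of_history": 0.5, "late_payments": -1.0}
baseline = {"debt_to_income": 0.25, "years_of_history": 8.0, "late_payments": 1.0}
base_score = 1.0  # model output at the baseline means

inputs = {"debt_to_income": 0.5, "years_of_history": 2.0, "late_payments": 3.0}
attributions = linear_attributions(weights, baseline, inputs)
score = base_score + sum(attributions.values())

record = build_decision_record("app-001", "3.2.0", inputs, score, 0.0, attributions)
log_line = json.dumps(record)  # one JSON object per line, ready for an append-only log
print(record["decision"], record["top_factors"])
```

Because the attributions sum to the gap between the score and the baseline score, the logged record is enough to reconstruct why this applicant landed below the threshold, provided the model version it names is still retrievable from the registry.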

Concrete example

A loan applicant is declined. The system returns SHAP attributions showing the top factors, logs them with the model version, and a reviewer can inspect and override. The model card already documented that the model underperforms on a thin-credit-file subgroup, so reviewers apply extra scrutiny there.
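
The review step in this example could be wired up roughly as follows. This is a sketch under assumptions: the tier names, the `flagged_subgroups` set (imagined as derived from the model card's documented limitations), and the record fields are all hypothetical.

```python
def review_priority(decision, subgroup, flagged_subgroups):
    """Pick a review tier; subgroups flagged in the model card get extra scrutiny."""
    if subgroup in flagged_subgroups:
        return "extra_scrutiny"
    if decision == "decline":
        return "standard_review"
    return "spot_check"


def apply_override(record, reviewer, new_decision, reason):
    """Return a copy of the record with the override added.

    The model's original decision is kept so the audit trail shows both.
    """
    updated = dict(record)
    updated["override"] = {"reviewer": reviewer, "reason": reason}
    updated["final_decision"] = new_decision
    return updated


# Subgroups the model card documents as underperforming (assumed here).
flagged = {"thin_credit_file"}

logged = {"decision_id": "app-001", "model_version": "3.2.0", "decision": "decline"}
tier = review_priority(logged["decision"], "thin_credit_file", flagged)
final = apply_override(logged, "reviewer-17", "approve", "Verified income documents")

print(tier, final["decision"], final["final_decision"])
# extra_scrutiny decline approve
```

Recording the override next to the original decision, rather than overwriting it, keeps the log usable for both contesting individual decisions and spotting where the model and reviewers systematically disagree.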

Common follow-up / trap

A common probe: “Is SHAP enough to be ‘explainable’ for regulators?” Attributions help but aren’t sufficient alone — you also need documented intended use, subgroup evaluation, human oversight, and decision logging. The trap is equating a single XAI library with responsible AI; the model card plus logged, contestable decisions is the fuller answer.
