How would you detect whether a metric is being gamed or is otherwise artificially inflated?
Gaming is detectable by looking for statistical signatures: abnormal distribution tails, sudden regime changes that correlate with incentive changes rather than product changes, divergence between the metric and correlated downstream outcomes, and segment-level anomalies that cancel out in the aggregate. The detective work combines anomaly detection with causal reasoning about who benefits from inflating the number.
How to think about it
Why metrics get gamed
Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. Once a metric is tied to an OKR, promotion criteria, or an A/B test outcome, engineers and product managers (and sometimes users) have an incentive to optimise the number rather than the underlying behaviour.
Detection signals
1. Distribution shape changes. Plot the metric distribution, not just its mean. If the daily session-count distribution develops a spike at exactly 1 session (the threshold for counting a user as “active”), someone may be sending push notifications designed to get users to open the app once and then leave immediately.
2. Metric-outcome divergence. A gamed metric diverges from the downstream outcome it is supposed to predict. If DAU rises 10 % but D7 retention, revenue, and NPS are flat or declining, the DAU growth is likely not real engagement growth. Plot the metric vs its downstream correlates over time.
3. Regime change correlated with incentive change. If a metric inflects exactly when a team started being measured against it (an OKR quarter boundary, a bonus cycle), the timing is suspicious: the shift more likely reflects the incentive than a product improvement.
4. Segment-level cancellation. Aggregate improvements that vanish, or turn out to come from a single segment, when you break the metric down by geography or device suggest that different user sub-populations are being treated differently to hit a number.
5. Velocity anomalies. Use statistical process control: flag weeks where the metric moves more than 3 standard deviations from the historical mean without a corresponding product change or external event.
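The velocity check in item 5 can be sketched in a few lines of Python. This is a minimal illustration, not a full control-chart implementation: the function name `flag_velocity_anomalies`, the 8-week trailing baseline, and the sample values are all assumptions made for the example. It also takes "historical mean" to mean the mean of the preceding weeks, which is one reasonable reading among several.

```python
from statistics import mean, stdev


def flag_velocity_anomalies(weekly_values, baseline_size=8, threshold=3.0):
    """Return indices of weeks that lie more than `threshold` sample
    standard deviations from the mean of the preceding `baseline_size` weeks.

    The trailing-window baseline is an assumption of this sketch.
    """
    flagged = []
    for i in range(baseline_size, len(weekly_values)):
        baseline = weekly_values[i - baseline_size:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline has no defined z-score; skip it.
            continue
        z = (weekly_values[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly CTR values in percent; only the final week jumps.
weekly_ctr = [2.3, 2.4, 2.2, 2.3, 2.5, 2.3, 2.4, 2.2, 2.3, 4.1]
print(flag_velocity_anomalies(weekly_ctr))
```

A flagged week is only a prompt for investigation. Checking it against product launches and external events remains a manual step that this sketch does not cover.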
Worked example. A notification team’s OKR is “notification CTR”. CTR rises from 2.3 % to 4.1 % in one quarter. Investigation: the D7 opt-out rate for notifications rose from 8 % to 19 % in the same period. Segment analysis: the CTR improvement was driven entirely by Android users whose opt-out flow had been altered by a UX change. The metric was gamed by removing friction on opt-out, so that uninterested users dropped out of the CTR denominator, not by improving notification quality.
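The segment analysis in the worked example might look like the sketch below. The per-platform send and click counts are hypothetical, chosen only so that the aggregate CTR moves from 2.3 % to 4.1 %; the helper names `ctr_table` and `ctr_change` and the row layout are likewise assumptions made for illustration.

```python
from collections import defaultdict


def ctr_table(rows):
    """Compute CTR per (period, segment), plus an "ALL" aggregate per period.

    `rows` is an iterable of (period, segment, sends, clicks) tuples.
    """
    sends = defaultdict(int)
    clicks = defaultdict(int)
    for period, segment, n_sent, n_clicked in rows:
        for key in ((period, segment), (period, "ALL")):
            sends[key] += n_sent
            clicks[key] += n_clicked
    return {key: clicks[key] / sends[key] for key in sends if sends[key] > 0}


def ctr_change(table, before, after):
    """Return {segment: CTR(after) - CTR(before)} for segments in both periods."""
    segments = sorted({segment for _, segment in table})
    return {
        segment: table[(after, segment)] - table[(before, segment)]
        for segment in segments
        if (before, segment) in table and (after, segment) in table
    }


# Hypothetical counts: (period, segment, sends, clicks).
rows = [
    ("Q1", "android", 100_000, 2_300),
    ("Q1", "ios", 100_000, 2_300),
    ("Q2", "android", 100_000, 5_900),
    ("Q2", "ios", 100_000, 2_300),
]

changes = ctr_change(ctr_table(rows), "Q1", "Q2")
for segment, delta in changes.items():
    print(f"{segment}: {delta * 100:+.1f} percentage points")
```

With these made-up counts the whole aggregate gain comes from the Android segment while iOS is unchanged. That pattern is what should prompt a closer look at what changed for Android users.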