What are the key regression metrics — MAE, RMSE, MAPE, R² — and what are their failure modes?
MAE, RMSE, MAPE, and R² each measure a different aspect of regression quality, and each has a regime where it misleads. RMSE is dominated by outliers; MAE is robust but hides large-error tails; MAPE is undefined at zero and penalises over-prediction more heavily than under-prediction; R² can appear high even when absolute errors are large, can be negative, and is still commonly misread as a percentage-correct score. Choosing the right metric requires knowing the cost structure of the prediction task.
How to think about it
Cover each metric’s formula, intuition, failure mode, and best use case in a scannable format.
MAE — Mean Absolute Error
MAE = (1/n) * sum |y_i - y_hat_i|
Intuition. Average absolute deviation in the same units as the target. An MAE of $500 on a house price model means predictions are off by $500 on average.
Best use case. When you want errors to be understandable to non-technical stakeholders and when large errors should not dominate the metric.
Failure mode. MAE hides the distribution of errors. A model with 999 errors of $1 and 1 error of $10,000 has an MAE of about $11 (10,999 / 1,000), which looks excellent while being catastrophic in one case.
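The arithmetic in this failure mode can be checked with a minimal sketch. The house prices below are made up for illustration, and the `mae` helper is a hypothetical name, not a library function.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative data: 999 predictions off by $1 and one off by $10,000.
y_true = [200_000.0] * 1000
y_pred = [200_001.0] * 999 + [210_000.0]

abs_errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
mae_value = mae(y_true, y_pred)                   # 10.999
median_error = abs_errors[len(abs_errors) // 2]   # 1.0
worst_error = abs_errors[-1]                      # 10000.0

print(f"MAE={mae_value:.3f}  median={median_error}  worst={worst_error}")
```

Reporting the worst-case error (or a high percentile) next to MAE is one way to surface the tail that the average hides.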
RMSE — Root Mean Squared Error
RMSE = sqrt((1/n) * sum (y_i - y_hat_i)²)
Intuition. Same units as MAE, but large errors are weighted quadratically. RMSE >= MAE always, with equality only when every absolute error is the same; the gap widens as the errors become more uneven, for example when a few outliers dominate.
Best use case. When large prediction errors are disproportionately costly — demand planning, energy load forecasting, financial risk.
Failure mode. A single extreme outlier can make RMSE look terrible even if the model is excellent on 99% of cases. Always inspect the error distribution, not just RMSE.
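To make the outlier sensitivity concrete, here is a small sketch that compares MAE and RMSE on two illustrative sets of errors. The numbers are invented, and the helper names are assumptions for this example only.

```python
import math

def mae_from_errors(errors):
    """Mean of absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    """Square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Illustrative errors: identical everywhere, versus one extreme outlier.
uniform_errors = [1.0] * 1000
outlier_errors = [1.0] * 999 + [10_000.0]

uniform_mae = mae_from_errors(uniform_errors)    # 1.0
uniform_rmse = rmse_from_errors(uniform_errors)  # 1.0 (equal errors: RMSE == MAE)
outlier_mae = mae_from_errors(outlier_errors)    # about 11
outlier_rmse = rmse_from_errors(outlier_errors)  # about 316

print(f"uniform: MAE={uniform_mae:.2f} RMSE={uniform_rmse:.2f}")
print(f"outlier: MAE={outlier_mae:.2f} RMSE={outlier_rmse:.2f}")
```

One bad case out of a thousand moves MAE from 1 to about 11 but moves RMSE from 1 to about 316, which is why the two are worth reading together.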
MAPE — Mean Absolute Percentage Error
MAPE = (100/n) * sum |(y_i - y_hat_i) / y_i|
Intuition. Scale-free error as a percentage of actual value. Easy to communicate: “Our forecast is off by 8% on average.”
Best use case. Comparing forecast quality across products or targets with very different scales (e.g., forecasting sales for both a $1 and a $10,000 item).
Failure modes:
- Undefined when any y_i = 0, and unstable when y_i is close to zero.
- Asymmetric: a 50% under-prediction contributes 50% and a 50% over-prediction also contributes 50%, but a 200% over-prediction contributes 200%, while a non-negative forecast can never under-predict by more than 100%.
- Biases optimised models toward under-predicting, because under-prediction errors are capped at 100% while over-prediction errors are unbounded.
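The failure modes above can be illustrated with a short sketch. The `mape` helper and the toy actuals are assumptions for illustration; raising on zero actuals is one possible policy, not a standard.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; rejects zero actuals."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when any actual value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Asymmetry: with an actual of 100, the worst non-negative under-prediction
# costs 100%, while an over-prediction of 300 already costs 200%.
worst_under = mape([100.0], [0.0])    # 100.0
big_over = mape([100.0], [300.0])     # 200.0

# Bias: pick the single constant forecast that minimises each metric
# on skewed toy actuals. MAPE prefers the lower constant.
actuals = [10.0, 100.0, 1000.0]
candidates = [10.0, 100.0, 1000.0]
best_for_mape = min(candidates, key=lambda c: mape(actuals, [c] * len(actuals)))
best_for_mae = min(candidates, key=lambda c: mae(actuals, [c] * len(actuals)))

print(worst_under, big_over, best_for_mape, best_for_mae)
```

On this toy data the MAPE-optimal constant is 10 while the MAE-optimal constant is 100, showing how optimising MAPE can pull forecasts downward.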
Alternatives. sMAPE (symmetric MAPE) and MASE (Mean Absolute Scaled Error) address these issues; MASE is generally preferred for time series.
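As a rough sketch of these alternatives, the code below uses one common sMAPE variant (bounded between 0% and 200%) and a non-seasonal MASE scaled by the in-sample one-step naive forecast. Several sMAPE definitions exist, so treat the exact formula, the 0/0 handling, and the function names as assumptions.

```python
def smape(y_true, y_pred):
    """One common sMAPE variant: 100/n * sum |y - y_hat| / ((|y| + |y_hat|) / 2)."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # Assumption: count a 0/0 term as zero error.
        terms.append(0.0 if denom == 0 else abs(t - p) / denom)
    return 100.0 * sum(terms) / len(terms)

def mase(y_train, y_true, y_pred):
    """MAE of the forecast divided by the in-sample MAE of the naive forecast."""
    naive_mae = sum(abs(b - a) for a, b in zip(y_train, y_train[1:])) / (len(y_train) - 1)
    if naive_mae == 0:
        raise ValueError("training series is constant; MASE is undefined")
    forecast_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return forecast_mae / naive_mae

# sMAPE stays defined when an actual value is zero (illustrative numbers).
smape_value = smape([0.0, 100.0], [5.0, 100.0])   # 100.0

# MASE below 1 means the forecast beats the naive "repeat last value" baseline.
train = [10.0, 12.0, 11.0, 13.0, 12.0]
mase_value = mase(train, [14.0, 13.0], [13.0, 13.0])  # 0.5 / 1.5

print(f"sMAPE={smape_value:.1f}%  MASE={mase_value:.3f}")
```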
R² — Coefficient of Determination
R² = 1 - SS_res / SS_tot, where SS_res = sum(y - y_hat)², SS_tot = sum(y - y_bar)²
Intuition. Fraction of target variance explained by the model. Baseline (always predict mean) = 0.0; perfect = 1.0.
Failure modes:
- High R² does not mean small errors: R² is measured relative to the target's variance, so a high-variance target can give a high R² alongside large absolute errors.
- Adding features to an ordinary least squares regression can never decrease R² on training data, even for noise features. Use adjusted R² for model comparison.
- R² can be negative on test data (model worse than the mean).
- R² is not the “percentage of variance explained” in a causal sense — it is a comparison to a specific baseline (the mean).
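A minimal sketch of the first and third failure modes, using the R² formula given above. The data is invented for illustration and `r2` is a hypothetical helper, not a library call.

```python
def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0, 5.0]

# Always predicting the mean scores exactly 0.
r2_mean_baseline = r2(y, [3.0] * 5)                   # 0.0

# Predictions that are worse than the mean give a negative R².
r2_reversed = r2(y, [5.0, 4.0, 3.0, 2.0, 1.0])        # -3.0

# High R² with large absolute errors: each prediction is off by 50,000,
# but the target spans a million, so R² is still about 0.99.
r2_wide_target = r2([0.0, 1_000_000.0], [50_000.0, 950_000.0])

print(r2_mean_baseline, r2_reversed, round(r2_wide_target, 4))
```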