Machine Learning · Medium · Asked at Amazon, Google, Uber, Lyft

What are the key regression metrics — MAE, RMSE, MAPE, R² — and what are their failure modes?

For Data Scientist, ML Engineer, Data Analyst

The short answer

MAE, RMSE, MAPE, and R² each measure a different aspect of regression quality, and each has a regime where it misleads. RMSE is dominated by outliers; MAE is robust but hides large-error tails; MAPE is undefined at zero and penalises over-prediction more heavily than under-prediction; R² can look high even when absolute errors are large, can be negative, and is still commonly misread as a percentage-correct score. Choosing the right metric requires knowing the cost structure of the prediction task.

How to think about it

Cover each metric’s formula, intuition, failure mode, and best use case in a scannable format.

MAE — Mean Absolute Error

MAE = (1/n) * sum |y_i - y_hat_i|

Intuition. Average absolute deviation in the same units as the target. An MAE of $500 on a house price model means predictions miss by $500 on average.

Best use case. When you want errors to be understandable to non-technical stakeholders and when large errors should not dominate the metric.

Failure mode. MAE hides the distribution of errors. A model with 999 errors of $1 and 1 error of $10,000 has MAE of $11 — looking excellent while being catastrophic in one case.
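The arithmetic in this failure mode can be checked with a minimal sketch. The house prices and the helper name `mae` are hypothetical, chosen only to reproduce the 999-small-errors-plus-one-large-error scenario; reporting the median and maximum error next to MAE is one simple way to expose the tail.

```python
import statistics

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: 999 predictions off by $1, one off by $10,000.
y_true = [100_000.0] * 1000
y_pred = [100_001.0] * 999 + [110_000.0]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

mae_value = mae(y_true, y_pred)                # 10.999, i.e. about $11
median_error = statistics.median(abs_errors)   # 1.0
max_error = max(abs_errors)                    # 10000.0

print(f"MAE={mae_value:.3f}  median={median_error}  max={max_error}")
```

MAE alone suggests a model that is off by about $11; the maximum error shows the single $10,000 miss that the average dilutes.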

RMSE — Root Mean Squared Error

RMSE = sqrt((1/n) * sum (y_i - y_hat_i)²)

Intuition. Same units as MAE, but large errors are weighted quadratically. RMSE >= MAE always, with equality only when every absolute error is the same size; the gap grows as a few large errors come to dominate the error distribution.

Best use case. When large prediction errors are disproportionately costly — demand planning, energy load forecasting, financial risk.

Failure mode. A single extreme outlier can drive RMSE to look terrible even if the model is excellent on 99% of cases. Always inspect the error distribution, not just RMSE.
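A small sketch of the outlier effect, using made-up residuals (100 predictions, one of which is badly wrong). The function names `mae` and `rmse` are illustrative helpers, not taken from any particular library.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

y_true = [0.0] * 100

# Hypothetical case 1: every prediction is off by exactly 1.
pred_clean = [1.0] * 100
# Hypothetical case 2: 99 predictions off by 1, one off by 100.
pred_outlier = [1.0] * 99 + [100.0]

mae_clean, rmse_clean = mae(y_true, pred_clean), rmse(y_true, pred_clean)
mae_outlier, rmse_outlier = mae(y_true, pred_outlier), rmse(y_true, pred_outlier)

print(f"clean:   MAE={mae_clean:.2f}  RMSE={rmse_clean:.2f}")      # 1.00, 1.00
print(f"outlier: MAE={mae_outlier:.2f}  RMSE={rmse_outlier:.2f}")  # 1.99, 10.05
```

With identical errors the two metrics agree; one outlier roughly doubles MAE but multiplies RMSE by about ten, even though 99% of the predictions are unchanged.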

MAPE — Mean Absolute Percentage Error

MAPE = (100/n) * sum |(y_i - y_hat_i) / y_i|

Intuition. Scale-free error as a percentage of actual value. Easy to communicate: “Our forecast is off by 8% on average.”

Best use case. Comparing forecast quality across products or targets with very different scales (e.g., forecasting sales for both a $1 and a $10,000 item).

Failure modes:

Undefined when any y_i = 0.
Asymmetric: a 50% under-prediction and a 50% over-prediction both contribute 50%, but the ranges differ. With non-negative predictions you can never under-predict by more than 100% (predicting 0), while over-prediction is unbounded: a 200% over-prediction contributes 200%.
Biases optimised models toward under-predicting, because under-prediction errors are capped at 100% while over-prediction errors are unbounded.
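The asymmetry and the resulting bias can be illustrated with a rough sketch. The target values are hypothetical, and the "model" here is just the single best constant prediction found by brute-force search, which is enough to show where each metric pulls the optimum.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Bounded vs unbounded penalty for an actual value of 100.
worst_under = mape([100.0], [0.0])    # 100.0: the cap for non-negative predictions
big_over = mape([100.0], [300.0])     # 200.0: over-prediction has no cap

# Hypothetical targets; search for the best constant prediction under each metric.
y = [10.0, 20.0, 30.0, 40.0, 100.0]
candidates = range(1, 101)
best_for_mape = min(candidates, key=lambda c: mape(y, [c] * len(y)))
best_for_mae = min(candidates, key=lambda c: mae(y, [c] * len(y)))

print(best_for_mape, best_for_mae)    # 20 30
```

The MAE-optimal constant is the median (30), while the MAPE-optimal constant sits lower (20), because errors on small actual values are weighted more heavily.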

Alternatives. sMAPE (symmetric MAPE) bounds the penalty for over-prediction, and MASE (Mean Absolute Scaled Error) avoids dividing by the actual value altogether; MASE is generally preferred for time series.
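A minimal sketch of both alternatives, under stated assumptions: sMAPE has several definitions in circulation, and this one uses 2|y - y_hat| / (|y| + |y_hat|) with a 0/0 term treated as zero error; MASE here scales the forecast MAE by the in-sample MAE of a one-step naive forecast. The series values and function names are hypothetical.

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent (one common definition; bounded at 200)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # Assumption: when actual and prediction are both 0, count zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(t - p) / denom
    return 100.0 * total / len(y_true)

def mase(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of the naive forecast y[t-1]."""
    naive_mae = sum(
        abs(y_train[i] - y_train[i - 1]) for i in range(1, len(y_train))
    ) / (len(y_train) - 1)
    forecast_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return forecast_mae / naive_mae

smape_over = smape([100.0], [300.0])   # 100.0
smape_under = smape([100.0], [0.0])    # 200.0

# Hypothetical series: training history, then two forecasted points.
y_train = [10.0, 12.0, 11.0, 13.0, 12.0]
mase_value = mase([14.0, 13.0], [13.0, 13.0], y_train)

print(smape_over, smape_under, round(mase_value, 3))
```

Note that this sMAPE is bounded but not truly symmetric: predicting 0 for an actual of 100 scores worse than predicting 300. A MASE below 1 means the forecast beats the naive last-value baseline on average.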

R² — Coefficient of Determination

R² = 1 - SS_res / SS_tot, where SS_res = sum(y - y_hat)², SS_tot = sum(y - y_bar)²

Intuition. Fraction of target variance explained by the model. Baseline (always predict mean) = 0.0; perfect = 1.0.

Failure modes:

High R² does not mean small errors — depends entirely on target variance magnitude.
Adding features to a linear regression never decreases R² on training data, even when the added features are pure noise. Use adjusted R² for model comparison.
R² can be negative on test data (model worse than the mean).
R² is not the “percentage of variance explained” in a causal sense — it is a comparison to a specific baseline (the mean).
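Two of these failure modes are easy to reproduce in a short sketch. The data are invented for illustration, and `r2` is a hypothetical helper implementing the formula above directly.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# High R² with large absolute errors: every prediction is off by 10,000,
# but the target varies over a much wider range.
prices = [100_000.0, 200_000.0, 300_000.0, 400_000.0, 500_000.0]
price_preds = [110_000.0, 190_000.0, 310_000.0, 390_000.0, 510_000.0]
r2_wide = r2(prices, price_preds)         # about 0.995

# Negative R²: predictions that are worse than always predicting the mean.
y = [1.0, 2.0, 3.0]
r2_negative = r2(y, [3.0, 2.0, 1.0])      # -3.0

# The baseline itself: always predicting the mean gives exactly 0.
r2_baseline = r2(y, [2.0, 2.0, 2.0])      # 0.0

print(round(r2_wide, 3), r2_negative, r2_baseline)
```

An R² of 0.995 coexists with a $10,000 miss on every prediction, which is why R² should be reported alongside an error metric in the target's own units.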
