datarekha
Machine Learning · Medium · Asked at Google, Amazon, Airbnb, Uber

When should you use RMSE versus MAE for regression evaluation, and what does R-squared actually tell you?

The short answer

RMSE (Root Mean Squared Error) penalises large errors quadratically, making it sensitive to outliers and appropriate when big deviations are disproportionately costly. MAE (Mean Absolute Error) weights all errors linearly, is more robust to outliers, and is easier to interpret in the units of the target. R-squared measures the proportion of target variance explained by the model. A value near 1 is desirable, but R-squared can be high even when absolute errors are large if the target variance is high, and on its own it says nothing about prediction error magnitude.

How to think about it

Cover the four main regression metrics — RMSE, MAE, MAPE, R² — with the decision rule for choosing between them.

The metrics

MAE (Mean Absolute Error) = (1/n) * sum |y - y_hat|

Linear penalty. Robust to outliers. Units match the target (e.g., dollars, kilograms). The constant prediction that minimises MAE is the median, whereas the mean minimises MSE.
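To make the median-versus-mean point concrete, here is a minimal sketch using only the Python standard library. The data values and the helper names `mae_of_constant` and `mse_of_constant` are made up for illustration.

```python
from statistics import mean, median


def mae_of_constant(y, c):
    """MAE when every prediction is the constant c."""
    return sum(abs(v - c) for v in y) / len(y)


def mse_of_constant(y, c):
    """MSE when every prediction is the constant c."""
    return sum((v - c) ** 2 for v in y) / len(y)


# Hypothetical target values with a single large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

y_mean = mean(y)      # 22.0, pulled towards the outlier
y_median = median(y)  # 3.0, unaffected by the outlier

# The median gives the lower MAE; the mean gives the lower MSE.
print("MAE at median:", mae_of_constant(y, y_median))
print("MAE at mean:  ", mae_of_constant(y, y_mean))
print("MSE at mean:  ", mse_of_constant(y, y_mean))
print("MSE at median:", mse_of_constant(y, y_median))
```

On this toy data the mean sits far from most points because of the one outlier, which is the same mechanism that makes squared-error metrics outlier-sensitive.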

RMSE (Root Mean Squared Error) = sqrt((1/n) * sum (y - y_hat)²)

Quadratic penalty. Dominated by the largest errors. Units match the target. Use when large errors are especially costly — e.g., demand forecasting where a 10x overstock is far worse than a 2x overstock.
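The following sketch illustrates how RMSE is dominated by the largest errors. Two hypothetical sets of predictions have the same MAE, but one spreads the error evenly and the other concentrates it in a single miss. All numbers are invented for the example.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [100.0, 100.0, 100.0, 100.0, 100.0]

# Every prediction is off by 10.
pred_even = [90.0, 90.0, 90.0, 90.0, 90.0]
# Four perfect predictions and one miss of 50.
pred_spiky = [100.0, 100.0, 100.0, 100.0, 50.0]

print("even:  MAE", mae(y_true, pred_even), "RMSE", rmse(y_true, pred_even))
print("spiky: MAE", mae(y_true, pred_spiky), "RMSE", rmse(y_true, pred_spiky))
```

Both cases have an MAE of 10, but the RMSE of the spiky case is about 22.4 versus 10 for the even case. If a single large miss really is worse than many small ones, RMSE reflects that; if not, MAE is the fairer summary.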

MAPE (Mean Absolute Percentage Error) = (100/n) * sum |(y - y_hat) / y|

Scale-independent, making it useful for comparing models across different targets. Undefined when y = 0, and asymmetric: for non-negative predictions an under-prediction can cost at most 100%, but an over-prediction is unbounded. Use with caution on targets that can be zero or near-zero.
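A small sketch of the two MAPE pitfalls, assuming a non-negative target. The `mape` helper is illustrative, and raising `ValueError` on a zero target is a design choice made for this example rather than a standard behaviour.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Raises ValueError if any true value is zero, because the metric
    is undefined there (illustrative choice for this sketch).
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Worst possible non-negative under-prediction: predict 0 for a true value of 100.
print("under-prediction:", mape([100.0], [0.0]))    # capped at 100%

# Over-prediction has no cap: predicting 400 for a true value of 100.
print("over-prediction: ", mape([100.0], [400.0]))  # 300%
```

Because under-predictions are capped and over-predictions are not, a model selected on MAPE can be nudged towards predicting low.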

R² (Coefficient of Determination) = 1 - SS_res / SS_tot

Where SS_res = sum (y - y_hat)² and SS_tot = sum (y - y_bar)².

R² is the fraction of variance in the target explained by the model. R² = 1 is a perfect fit; R² = 0 means the model is no better than always predicting the mean; R² can be negative (meaning the model is worse than predicting the mean).
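The three reference points for R² (1, 0, and negative) can be checked directly from the definition. This is a minimal sketch with invented data; the function name `r2` is illustrative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r2(y, y)                             # predictions equal the targets
mean_only = r2(y, [3.0] * 5)                   # always predict the mean of y
reversed_ = r2(y, [5.0, 4.0, 3.0, 2.0, 1.0])   # anti-correlated predictions

print(perfect, mean_only, reversed_)  # 1.0 0.0 -3.0
```

The last case shows that R² is not a squared quantity despite its name: predictions that are systematically wrong push SS_res above SS_tot and the score below zero.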

Choosing between RMSE and MAE

Situation | Prefer
Outliers are common and should not drive evaluation | MAE
Large errors are disproportionately costly | RMSE
Communicating error to non-technical stakeholders | MAE (intuitive units)
Training loss for gradient-based models | MSE (differentiable everywhere)
Comparing across datasets with different target scales | MAPE or normalised variants

R² gotchas

  • Adding any feature, even a random noise feature, to a linear regression cannot decrease R² on training data. Use adjusted R² when comparing models with different numbers of features.
  • A high R² does not guarantee small absolute errors. If target variance is enormous (e.g., house prices that vary by ±$1M), R² = 0.95 can still imply RMSE = $50,000, because on the same data RMSE equals the target's standard deviation times sqrt(1 - R²).
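  • The "fraction of variance explained" reading of R² relies on the decomposition of total variance that holds for OLS with an intercept, evaluated against the mean. R² can still be computed for non-linear models or highly skewed targets, but it is less interpretable there.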
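Two of the gotchas above can be made numeric. The sketch below uses the identity RMSE = sd(y) * sqrt(1 - R²), which follows from the definitions when both are computed on the same data with the population standard deviation, and the standard adjusted R² formula 1 - (1 - R²)(n - 1)/(n - p - 1). The standard deviation of $223,607 and the sample sizes are assumptions chosen for illustration.

```python
import math


def rmse_from_r2(target_std, r2):
    """RMSE implied by R² given the target's (population) standard deviation."""
    return target_std * math.sqrt(1.0 - r2)


def adjusted_r2(r2, n, p):
    """Adjusted R² for n samples and p features (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical house-price target with a standard deviation of about $223,607.
implied_rmse = rmse_from_r2(223_607.0, 0.95)
print(f"implied RMSE at R² = 0.95: ${implied_rmse:,.0f}")

# Same raw R², but more features on the same 50 rows earn a larger penalty.
few_features = adjusted_r2(0.95, n=50, p=1)
many_features = adjusted_r2(0.95, n=50, p=10)
print(f"adjusted R², 1 feature:   {few_features:.4f}")
print(f"adjusted R², 10 features: {many_features:.4f}")
```

With these assumed numbers the implied RMSE is roughly $50,000, which is why it helps to report an error metric in target units alongside R².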