datarekha
Machine Learning · Easy · Asked at Amazon, Walmart, and Capital One

Why does R-squared always increase when you add features, and when should you use adjusted R-squared instead?

The short answer

R-squared measures the proportion of variance explained by the model and can only increase or stay the same as features are added, even if those features are pure noise. Adjusted R-squared penalizes for the number of predictors, making it the right metric for comparing models with different numbers of features.

How to think about it

R-squared (coefficient of determination):

R² = 1 - SS_res / SS_tot

where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)².
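
As a small worked example of the formula, the sketch below computes R² by hand for four made-up observations. The data values and the helper name `r_squared` are illustrative assumptions, not part of the original question.

```python
def r_squared(y, y_hat):
    """Compute R² = 1 - SS_res / SS_tot for two equal-length sequences."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ss_res / ss_tot

# Made-up data, chosen only to keep the arithmetic easy to follow.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y, y_hat)
print(round(r2, 4))  # SS_res = 0.10, SS_tot = 5.0, so R² = 0.98
```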

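Adding any predictor, even a random noise column, can only reduce SS_res or leave it unchanged: OLS could always set the new coefficient to zero and reproduce the old fit, so the minimized SS_res can never get worse. In practice a noise column picks up a small nonzero coefficient by chance and shaves a little off SS_res. Since SS_tot does not depend on the model, R² is monotonically non-decreasing in the number of features.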
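
One way to see this empirically is to fit nested models on synthetic data, appending one pure-noise column at a time. This is a minimal sketch using NumPy's least-squares solver; the data, the coefficients, and the helper `ols_r2` are assumptions made up for the demonstration.

```python
import numpy as np

def ols_r2(X, y):
    """In-sample R² of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # coefficients minimizing SS_res
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                       # two genuinely useful features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
noise = rng.normal(size=(n, 5))                   # five columns unrelated to y

# Nested models: the k-th model uses X plus the first k noise columns.
r2_values = [ols_r2(np.hstack([X, noise[:, :k]]), y) for k in range(6)]
for k, r2 in enumerate(r2_values):
    print(f"{k} noise columns: R² = {r2:.6f}")
```

The models must be nested (each one contains all the columns of the previous one) for the guarantee to hold; swapping in a different noise column at each step would not give a monotone sequence.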

Adjusted R-squared corrects for this:

R²_adj = 1 - (1 - R²) * (n - 1) / (n - p - 1)

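where n is the sample size and p is the number of predictors (not counting the intercept). Each added predictor shrinks n - p - 1 and so inflates the penalty factor (n - 1) / (n - p - 1); R²_adj rises only if the gain in R² outweighs that. Adding a predictor that contributes less than its “fair share” of variance explanation will decrease R²_adj. Concretely, R²_adj increases only when the new predictor's t-statistic exceeds 1 in absolute value.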
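
To make the penalty concrete, the sketch below plugs hypothetical numbers into the formula: a sixth predictor nudges R² from 0.800 to 0.801 on n = 50 samples. The figures and the helper name `adjusted_r2` are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n, p):
    """R²_adj = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R² to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 50
before = adjusted_r2(0.800, n, p=5)  # five predictors
after = adjusted_r2(0.801, n, p=6)   # sixth predictor barely helps

print(round(before, 4), round(after, 4))  # 0.7773 0.7732
```

R² went up by 0.001, but the adjusted value fell because the extra predictor cost more in degrees of freedom than it bought in explained variance.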

Practical decision guide:

  • Single model evaluation: R² is fine for a quick summary.
  • Model selection / feature addition: always compare R²_adj, AIC, or BIC rather than raw R².
  • Out-of-sample performance: prefer cross-validated RMSE over either, since both R² variants are in-sample metrics.
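
For the last point in the guide, cross-validated RMSE can be sketched with a hand-rolled k-fold loop. This is a minimal NumPy version under stated assumptions: the synthetic data, the 30 noise columns, and the helper `cv_rmse` are all made up for the example, and a library routine such as scikit-learn's `cross_val_score` would normally be used in practice.

```python
import numpy as np

def cv_rmse(X, y, k=5, seed=0):
    """Mean out-of-fold RMSE of an OLS fit with intercept, using k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # shuffled index folds
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False                     # hold this fold out
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        resid = y[test_idx] - A_test @ beta              # errors on unseen rows
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
n = 200
X_signal = rng.normal(size=(n, 2))
y = 3.0 * X_signal[:, 0] - 2.0 * X_signal[:, 1] + rng.normal(size=n)
X_padded = np.hstack([X_signal, rng.normal(size=(n, 30))])  # add 30 noise columns

rmse_signal = cv_rmse(X_signal, y)
rmse_padded = cv_rmse(X_padded, y)
print(f"2 features: {rmse_signal:.3f}, 32 features: {rmse_padded:.3f}")
```

Unlike in-sample R², the held-out error has no reason to improve when noise columns are added, and it will typically get worse as the model starts fitting chance patterns in the training folds.
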
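
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for your training data (assumption: 100 rows, 3 features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_train, model.predict(X_train))  # in-sample R²
n, p = X_train.shape  # n samples, p predictors (intercept not counted)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R² = {r2:.3f}, adjusted R² = {r2_adj:.3f}")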