Why does R-squared always increase when you add features, and when should you use adjusted R-squared instead?
R-squared measures the proportion of variance in the target that the model explains. For an OLS fit evaluated on its own training data, it can only increase or stay the same as features are added, even if those features are pure noise. Adjusted R-squared penalizes the number of predictors, which makes it a better choice for comparing models with different numbers of features.
How to think about it
R-squared (coefficient of determination):
R² = 1 - SS_res / SS_tot
where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)².
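Adding any predictor, even a random noise column, can only reduce SS_res or keep it the same. OLS minimizes SS_res, and the larger model can always reproduce the smaller model's fit by setting the new coefficient to zero, so its minimum cannot be worse. In practice a noise column picks up a small nonzero coefficient that fits some chance correlation, and SS_res drops slightly. Since SS_tot does not depend on the predictors, R² is monotonically non-decreasing in the number of features.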
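To see this behaviour concretely, here is a minimal sketch on synthetic data. The helper `r_squared`, the sample size, and the coefficients are all made up for illustration. It fits OLS with an intercept via `np.linalg.lstsq` and appends one pure-noise column at a time:

```python
import numpy as np

def r_squared(X, y):
    """In-sample R² of an OLS fit with an intercept (illustrative helper)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Synthetic data: two informative features plus Gaussian noise (assumed setup).
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

r2_values = [r_squared(X, y)]
for _ in range(5):
    X = np.column_stack([X, rng.normal(size=n)])   # append a pure-noise column
    r2_values.append(r_squared(X, y))

for k, r2 in enumerate(r2_values):
    print(f"{k} noise columns: R² = {r2:.6f}")
```

Each printed value should be at least as large as the one before it, up to floating-point error, even though none of the added columns carries any signal.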
Adjusted R-squared corrects for this:
R²_adj = 1 - (1 - R²) * (n - 1) / (n - p - 1)
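where n is the sample size and p is the number of predictors (not counting the intercept). The factor (n - 1) / (n - p - 1) grows with p, so a new predictor has to lower 1 - R² by enough to offset it. For a single added predictor in OLS, R²_adj increases only if that predictor's t-statistic exceeds 1 in absolute value; otherwise R²_adj decreases.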
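As a small worked example of the penalty, the sketch below applies the formula to hypothetical numbers: n = 50, with R² rising from 0.800 to 0.802 when a sixth predictor is added. The function name `adjusted_r2` and these values are assumptions chosen for illustration only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n samples and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R² to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 50
before = adjusted_r2(0.800, n, p=5)   # model with 5 predictors
after = adjusted_r2(0.802, n, p=6)    # one weak predictor added

print(f"adjusted R² with 5 predictors: {before:.4f}")
print(f"adjusted R² with 6 predictors: {after:.4f}")
```

Here R² goes up by 0.002, but adjusted R² falls from about 0.777 to about 0.774, because the gain is too small to offset the lost degree of freedom.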
Practical decision guide:
- Single model evaluation: R² is fine for a quick summary.
- Model selection / feature addition: always compare R²_adj, AIC, or BIC.
- Out-of-sample performance: prefer cross-validated RMSE over either, since both R² variants are in-sample metrics.
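import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Placeholder data so the snippet runs on its own; substitute your own X_train, y_train.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_train, model.predict(X_train))  # in-sample R²
n, p = X_train.shape  # n samples, p predictors (intercept not counted)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R² = {r2:.4f}, adjusted R² = {r2_adj:.4f}")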
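For the out-of-sample case in the decision guide, a rough sketch of comparing two feature sets by cross-validated RMSE might look like the following. The data are synthetic, and the helper `summarize`, the fold count, and the 30 noise columns are illustrative assumptions. It uses scikit-learn's `cross_val_score` with the `"neg_root_mean_squared_error"` scorer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: 3 informative features, then the same features plus 30 noise columns.
rng = np.random.default_rng(0)
n = 60
X_base = rng.normal(size=(n, 3))
y = X_base @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
X_noisy = np.column_stack([X_base, rng.normal(size=(n, 30))])

cv = KFold(n_splits=5, shuffle=True, random_state=0)

def summarize(X, y):
    """Return (in-sample R², mean cross-validated RMSE) for plain OLS."""
    model = LinearRegression()
    train_r2 = model.fit(X, y).score(X, y)          # R² on the training data
    neg_rmse = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )                                               # scorer returns negated RMSE
    return train_r2, -neg_rmse.mean()

r2_base, rmse_base = summarize(X_base, y)
r2_noisy, rmse_noisy = summarize(X_noisy, y)

print(f"base : train R² = {r2_base:.3f}, CV RMSE = {rmse_base:.3f}")
print(f"noisy: train R² = {r2_noisy:.3f}, CV RMSE = {rmse_noisy:.3f}")
```

In a setup like this, the noisy model should show the higher training R² but the worse cross-validated RMSE, which is the gap that in-sample metrics cannot reveal.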