Machine Learning Medium Asked at GoogleAsked at AmazonAsked at Apple

When should you use gradient descent over the normal equation to fit a linear regression?

For Data Scientist ML Engineer AI / LLM Engineer

The short answer

The normal equation gives an exact closed-form solution in O(p³) time but becomes impractical when the number of features p is large (typically above ~10,000) because matrix inversion is cubic. Gradient descent scales as O(np) per iteration, making it the only viable option for large feature spaces or online learning.

How to think about it

Normal equation: β = (XᵀX)⁻¹Xᵀy

Exact solution, no learning rate, no iterations. The cost is computing (XᵀX)⁻¹, which is O(p³). For p = 1,000 that is trivial; for p = 100,000 it is already 10¹⁵ operations — infeasible.

Gradient descent: iteratively updates β ← β - α * (2/n) Xᵀ(Xβ - y)

Each step costs O(np). Batch gradient descent sees all n samples per step; stochastic (SGD) processes one sample; mini-batch is the practical default.

Head-to-head comparison:

Criterion	Normal Equation	Gradient Descent
Time complexity	O(p³ + np²)	O(np) per step
Convergence	Exact, single pass	Iterative, needs tuning
Hyperparameters	None	Learning rate, epochs
Works with regularization	Yes (Ridge adds λI)	Yes
Online / streaming	No	Yes (SGD)
Large p (features)	Impractical	Preferred
Large n (samples)	Bottleneck in np²	Efficient (mini-batch)

Regularized normal equation (Ridge):

β = (XᵀX + λI)⁻¹Xᵀy

Adding λI also regularizes the inversion — the matrix is always positive definite, so no rank-deficiency issues even under multicollinearity.

from sklearn.linear_model import LinearRegression, SGDRegressor

# Normal equation path (via LAPACK, used for small p)
lr = LinearRegression().fit(X_train, y_train)

# Gradient descent (explicit, scales to large p)
sgd = SGDRegressor(max_iter=1000, learning_rate="adaptive", eta0=0.01)
sgd.fit(X_train, y_train)

Learn it properly Gradient descent

When should you use gradient descent over the normal equation to fit a linear regression?

Keep practising

Explore further