datarekha
Machine Learning · Medium · Asked at Google · Asked at Jane Street · Asked at Two Sigma

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

The short answer

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, they have the unique closed-form solution β = (XᵀX)⁻¹Xᵀy, and the fitted values ŷ = Xβ are the orthogonal projection of y onto the column space of X.

How to think about it

OLS solves: minimize over β the loss L = ||y - Xβ||².

Derivation in four steps:

  1. Expand: L = (y - Xβ)ᵀ(y - Xβ) = yᵀy - 2βᵀXᵀy + βᵀXᵀXβ
  2. Differentiate with respect to β and set to zero: ∂L/∂β = -2Xᵀy + 2XᵀXβ = 0
  3. Rearrange to the normal equations: XᵀXβ = Xᵀy
  4. Solve (when XᵀX is invertible): β = (XᵀX)⁻¹Xᵀy
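The four steps above can be checked numerically. The following is a minimal sketch on synthetic data; the sizes, seed, noise level, and the names beta_true, XtX, and Xty are illustrative assumptions, not part of the original answer. It solves the normal equations with np.linalg.solve instead of forming the inverse, then confirms that the gradient from step 2 vanishes at the solution.

```python
import numpy as np

# Synthetic data (assumed for illustration): intercept column plus two features
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Step 3: form the normal equations XᵀX β = Xᵀy
XtX = X.T @ X
Xty = X.T @ y

# Step 4: solve the linear system rather than computing (XᵀX)⁻¹ explicitly
beta_hat = np.linalg.solve(XtX, Xty)

# Step 2 check: the gradient -2Xᵀy + 2XᵀXβ should be numerically zero at beta_hat
grad = -2 * Xty + 2 * XtX @ beta_hat
print(beta_hat)
print(np.abs(grad).max())
```

Solving the system gives the same β as the inverse formula in exact arithmetic and avoids one extra source of rounding error.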

The matrix H = X(XᵀX)⁻¹Xᵀ is the hat matrix — it projects y onto the column space of X. Fitted values are ŷ = Hy.
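The projection claim can be illustrated with a small sketch, again on assumed random data. It builds H and checks the standard properties of an orthogonal projection: H is symmetric, applying it twice changes nothing, and the residual y - ŷ is orthogonal to every column of X.

```python
import numpy as np

# Random design matrix and targets (assumed for illustration)
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# H = X (XᵀX)⁻¹ Xᵀ, built with solve() instead of an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
residual = y - y_hat

is_symmetric = np.allclose(H, H.T)                    # H equals its transpose
is_idempotent = np.allclose(H @ H, H)                 # projecting twice is the same as once
is_orthogonal = np.allclose(X.T @ residual, 0)        # residual is orthogonal to the columns of X
trace_is_p = np.isclose(np.trace(H), p)               # trace of H equals the number of columns
print(is_symmetric, is_idempotent, is_orthogonal, trace_is_p)
```

Building the full n × n matrix H is only sensible for small n; in practice ŷ is computed as Xβ.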

When to use the normal equation vs gradient descent:

                   Normal Equation      Gradient Descent
Complexity         O(p³ + np²)          O(np) per step
n, p regime        small p (≤ ~10k)     large p or sparse
Requires tuning    No                   Yes (learning rate)

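import numpy as np

# Least-squares fit: gives the same β as the normal equation when X has full column rank
# X is the (n, p) design matrix and y the (n,) target vector, assumed already defined
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # numerically stable via SVD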

np.linalg.lstsq uses SVD internally rather than explicitly inverting XᵀX. This is numerically safer because forming XᵀX squares the condition number of X.
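For contrast with the gradient descent column of the comparison, here is a minimal sketch of batch gradient descent on the same loss. The function name ols_gradient_descent, the synthetic data, the step count, and the step-size rule are assumptions made for illustration; real problems usually need the learning rate tuned or a line search.

```python
import numpy as np

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(2)
n, p = 500, 4
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.0, 3.0]) + 0.1 * rng.normal(size=n)

def ols_gradient_descent(X, y, lr, n_steps):
    """Hypothetical helper: plain batch gradient descent on ||y - Xβ||²."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Gradient of the loss: 2Xᵀ(Xβ - y); two matrix-vector products, O(np) per step
        grad = 2 * (X.T @ (X @ beta - y))
        beta -= lr * grad
    return beta

# Step size set to 1 / (largest eigenvalue of 2XᵀX), one safe choice among many
lr = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)
beta_gd = ols_gradient_descent(X, y, lr, n_steps=500)

# Reference solution from the SVD-based solver
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.abs(beta_gd - beta_ls).max())
```

On a well-conditioned problem like this one the iterates converge quickly to the least-squares solution; on ill-conditioned data the same loop can need many more steps.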
