Machine Learning | Medium | Asked at Google | Asked at Jane Street | Asked at Two Sigma

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, XᵀX is invertible and the solution is unique, with closed form β = (XᵀX)⁻¹Xᵀy. The fitted values ŷ = Xβ are then the orthogonal projection of y onto the column space of X.

How to think about it

OLS solves: minimize over β the loss L = ||y - Xβ||².

Derivation in four steps:

Expand: L = (y - Xβ)ᵀ(y - Xβ) = yᵀy - 2βᵀXᵀy + βᵀXᵀXβ
Differentiate with respect to β and set to zero: ∂L/∂β = -2Xᵀy + 2XᵀXβ = 0
Rearrange to the normal equations: XᵀXβ = Xᵀy
Solve (when XᵀX is invertible): β = (XᵀX)⁻¹Xᵀy
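The four steps can be checked numerically. The sketch below uses synthetic data (the sizes, seed, and `beta_true` are illustrative assumptions, not from the original), solves the normal equations with `np.linalg.solve` instead of an explicit inverse, and confirms that the gradient from step 2 vanishes at the solution.

```python
import numpy as np

# Synthetic data, purely illustrative (sizes, seed and beta_true are assumptions).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Step 3-4: solve the normal equations XᵀXβ = Xᵀy as a linear system.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Step 2: the gradient -2Xᵀy + 2XᵀXβ should be (numerically) zero at beta_hat.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta_hat

print("beta_hat:", beta_hat)
print("max |gradient|:", np.abs(grad).max())
```

With this little noise, `beta_hat` should land close to `beta_true`, and the gradient should be zero up to floating-point error.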

The matrix H = X(XᵀX)⁻¹Xᵀ is the hat matrix: it orthogonally projects y onto the column space of X, so the fitted values are ŷ = Xβ = Hy.
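A short sketch of the projection properties, assuming a random full-column-rank X (the data and variable names are illustrative): H should be symmetric and idempotent, the residual y - Hy should be orthogonal to every column of X, and the trace of H should equal p.

```python
import numpy as np

# Random full-column-rank design matrix; purely illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# H = X (XᵀX)⁻¹ Xᵀ, built with solve() rather than an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
residual = y - y_hat

is_symmetric = np.allclose(H, H.T)                    # Hᵀ = H
is_idempotent = np.allclose(H @ H, H)                 # projecting twice changes nothing
residual_orthogonal = np.allclose(X.T @ residual, 0)  # Xᵀ(y - ŷ) = 0
trace_H = np.trace(H)                                 # equals p at full column rank

print(is_symmetric, is_idempotent, residual_orthogonal, trace_H)
```

Forming the full n×n matrix H is only sensible for small n; in practice ŷ is computed as Xβ.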

When to use the normal equation vs gradient descent:

	Normal Equation	Gradient Descent
Complexity	`O(p³ + np²)`	`O(np)` per step
n, p regime	small p (≤ ~10k)	large p or sparse
Requires tuning	No	Yes (learning rate)
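To make the right-hand column concrete, here is a minimal batch gradient descent sketch on synthetic data (sizes, seed, and iteration count are assumptions chosen for illustration). The step size 1/(2σ_max(X)²) is one safe choice for this quadratic loss, not the only one; each iteration costs O(np).

```python
import numpy as np

# Synthetic regression problem; purely illustrative.
rng = np.random.default_rng(2)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Reference answer from a direct least-squares solve.
beta_closed = np.linalg.lstsq(X, y, rcond=None)[0]

# The gradient of ||y - Xβ||² is 2Xᵀ(Xβ - y); its Lipschitz constant is 2·σ_max(X)².
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)

beta_gd = np.zeros(p)
for _ in range(500):
    grad = 2.0 * X.T @ (X @ beta_gd - y)  # two matrix-vector products: O(np)
    beta_gd -= step * grad

print("max difference:", np.abs(beta_gd - beta_closed).max())
```

On a well-conditioned problem like this one the iterates converge quickly to the closed-form answer; on ill-conditioned data convergence can be much slower.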

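import numpy as np

# Illustrative data: X is an (n, p) design matrix, y an (n,) target vector
X = np.random.default_rng(0).normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0])

# Least-squares solution; equals (XᵀX)⁻¹Xᵀy when X has full column rank
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # numerically stable via SVD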

np.linalg.lstsq uses an SVD-based solver internally rather than forming and inverting XᵀX. This is numerically safer because forming XᵀX squares the condition number of X.
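The numerical risk of working with XᵀX can be seen from condition numbers. The sketch below builds two nearly collinear columns (the 1e-4 perturbation is an arbitrary illustrative choice) and compares the condition number of X with that of XᵀX, which should be roughly its square.

```python
import numpy as np

# Two nearly collinear columns; the 1e-4 perturbation is an illustrative assumption.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
X = np.column_stack([x, x + 1e-4 * rng.normal(size=n)])

cond_X = np.linalg.cond(X)          # ratio of largest to smallest singular value
cond_XtX = np.linalg.cond(X.T @ X)  # approximately cond_X squared

print(f"cond(X)    = {cond_X:.3e}")
print(f"cond(XᵀX)  = {cond_XtX:.3e}")
print(f"cond(X)^2  = {cond_X**2:.3e}")
```

A larger condition number means more digits of precision can be lost in the solve, which is why SVD- or QR-based routines that work on X directly are usually preferred.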
