Quadratic Forms & Definiteness
A quadratic form xᵀAx measures a matrix's curvature. Its sign is set by the eigenvalues, and its extreme values on the unit sphere are the largest and smallest eigenvalues, which is the heart of PCA.
What you'll learn
- A quadratic form is xᵀAx with A symmetric
- Positive definite means all eigenvalues > 0; positive semidefinite means all eigenvalues ≥ 0
- The max of xᵀAx over the unit sphere is the largest eigenvalue, and the min is the smallest eigenvalue
- A Gram matrix XᵀX is always positive semidefinite
Before you start
A symmetric matrix has a hidden landscape inside it. Plug a direction x into
the expression xᵀAx and out pops a single number — the “height” in that
direction. Sweep x around and a surface emerges: maybe a bowl, maybe an inverted
bowl, maybe a saddle. That expression is a quadratic form, and its shape is
set entirely by the eigenvalues of A. Reading the curvature off the eigenvalues
is what definiteness means, and it’s the engine behind PCA and behind every
“is this minimum really a minimum?” question in optimisation.
The quadratic form and its curvature
For a symmetric matrix A, the quadratic form is the scalar xᵀAx. In two
dimensions with A = [[a, b], [b, c]],
xᵀAx = a·x₁² + 2b·x₁x₂ + c·x₂²
a pure degree-2 expression, since every term has total degree 2. Its geometry (bowl, inverted bowl or saddle) depends on the signs of the eigenvalues, and definiteness classifies the form by exactly those signs (A is always assumed symmetric):
- Positive definite: xᵀAx > 0 for all x ≠ 0 ⇔ all eigenvalues > 0.
- Positive semidefinite (PSD): xᵀAx ≥ 0 for all x ⇔ all eigenvalues ≥ 0.
- Negative definite / semidefinite: flip the inequalities (all eigenvalues < 0 / all ≤ 0).
- Indefinite: eigenvalues of both signs, so the surface is a saddle.
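You can run the eigenvalue test directly. The sketch below is a minimal illustration and assumes NumPy is available. `classify` and `quad_form_2x2` are hypothetical helper names. The tolerance `tol` is an assumed cutoff for treating tiny eigenvalues as zero, and the matrices are made-up examples. The sketch also checks the 2×2 expansion a·x₁² + 2b·x₁x₂ + c·x₂² against xᵀAx computed directly.

```python
import numpy as np

def classify(A, tol=1e-10):
    """Hypothetical helper: label a symmetric matrix by the signs of its eigenvalues."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    eig = np.linalg.eigvalsh(A)  # real eigenvalues of a symmetric matrix, ascending
    # tol is an assumed cutoff: an eigenvalue with |value| <= tol is treated as zero
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semidefinite"
    return "indefinite"

def quad_form_2x2(a, b, c, x1, x2):
    """Hypothetical helper: expanded x^T A x for A = [[a, b], [b, c]]."""
    return a * x1**2 + 2 * b * x1 * x2 + c * x2**2

# The expansion agrees with x^T A x computed by matrix products
A = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.5, -1.5])
assert np.isclose(x @ A @ x, quad_form_2x2(2.0, 1.0, 3.0, 0.5, -1.5))

print(classify([[2, 1], [1, 3]]))   # positive definite (eigenvalues (5 ± √5)/2)
print(classify([[1, 1], [1, 1]]))   # positive semidefinite (eigenvalues 0 and 2)
print(classify([[1, 0], [0, -1]]))  # indefinite (eigenvalues 1 and -1)
```

Compare against a small tolerance rather than exactly against 0. An eigenvalue that is mathematically 0 often comes back from floating-point arithmetic as something like 1e-16.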
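A Gram matrix XᵀX is always PSD, because xᵀ(XᵀX)x = (Xx)ᵀ(Xx) = ‖Xx‖² ≥ 0
for every x. A sample covariance matrix is a positive multiple of a Gram matrix, Σ = X_cᵀX_c / (n − 1) with X_c the mean-centred n-row data matrix, which is why covariance matrices are PSD.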
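Here is a quick numerical check of the Gram-matrix argument, sketched with NumPy. The random matrix `X` is arbitrary example data. The check confirms three things:
- XᵀX has no negative eigenvalues, up to rounding.
- xᵀ(XᵀX)x equals ‖Xx‖².
- NumPy's sample covariance equals X_cᵀX_c / (n − 1).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))   # arbitrary example data: 50 rows, 4 columns
G = X.T @ X                        # Gram matrix, 4 x 4 and symmetric

# Every eigenvalue is >= 0, up to floating-point rounding
eigs = np.linalg.eigvalsh(G)
print(eigs.min() >= -1e-9)

# The identity x^T (X^T X) x = ||Xx||^2 holds for any x
x = rng.standard_normal(4)
print(np.isclose(x @ G @ x, np.linalg.norm(X @ x) ** 2))

# The sample covariance is a positive multiple of the Gram matrix of the centred data
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)
print(np.allclose(cov, np.cov(X, rowvar=False)))  # rowvar=False: columns are variables
```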
The optimisation fact behind PCA
Here is the result GATE tests most. Constrain x to the unit sphere (‖x‖ = 1)
and ask how big xᵀAx can get. Diagonalise A = QΛQᵀ and write x in the
orthonormal eigenbasis with coordinates y = Qᵀx (so ‖y‖ = ‖x‖ = 1). Then
xᵀAx = yᵀΛy = λ₁y₁² + λ₂y₂² + … + λₙyₙ², with y₁² + … + yₙ² = 1
This is a weighted average of the eigenvalues, with non-negative weights yᵢ² that sum to 1. A
weighted average is maximised by dumping all the weight on the largest eigenvalue
and minimised by dumping it all on the smallest.
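So the maximum of xᵀAx over ‖x‖ = 1 is λ_max (achieved at the top eigenvector) and the minimum is
λ_min (achieved at the bottom eigenvector). PCA is precisely this: the variance of centred data projected onto a unit vector x
is xᵀΣx, so the first principal direction, which maximises that variance on the unit sphere, is the top eigenvector of the covariance Σ.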
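You can check the weighted-average argument by brute force. This sketch assumes NumPy and uses an arbitrary random symmetric 3×3 matrix as the example. Sampling random unit vectors is only a sanity check; in practice the eigenvalues give the maximum exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                  # arbitrary symmetric example matrix

lam, Q = np.linalg.eigh(A)         # eigenvalues ascending; eigenvectors are the columns of Q

# Evaluate x^T A x on many random unit vectors
xs = rng.standard_normal((20_000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
values = np.einsum("ij,jk,ik->i", xs, A, xs)   # values[i] = xs[i] @ A @ xs[i]

# Every sample lies in [lambda_min, lambda_max] (small slack for rounding)
print(values.min() >= lam[0] - 1e-12, values.max() <= lam[-1] + 1e-12)

# The bounds are attained exactly at the bottom and top eigenvectors
q_bot, q_top = Q[:, 0], Q[:, -1]
print(np.isclose(q_bot @ A @ q_bot, lam[0]), np.isclose(q_top @ A @ q_top, lam[-1]))

# Weighted-average view: with y = Q^T x, x^T A x = sum(lam * y**2) and sum(y**2) = 1
x = xs[0]
y = Q.T @ x
print(np.isclose(x @ A @ x, np.sum(lam * y**2)), np.isclose(np.sum(y**2), 1.0))
```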
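For PCA, the same fact reads: the variance of the centred data projected onto a unit vector u equals uᵀΣu. The minimal sketch below assumes NumPy and a made-up stretched 2-D cloud; `data` and `projected_variance` are hypothetical names. It scans directions around the unit circle and finds none that beats the top eigenvector of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up 2-D cloud: standard normal points stretched and sheared by a fixed matrix
data = rng.standard_normal((2000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
centred = data - data.mean(axis=0)
Sigma = centred.T @ centred / (len(data) - 1)   # sample covariance

lam, Q = np.linalg.eigh(Sigma)
top = Q[:, -1]                                  # first principal direction

def projected_variance(u):
    """Hypothetical helper: sample variance of the data projected onto unit vector u."""
    return np.var(centred @ u, ddof=1)

# Projected variance equals the quadratic form u^T Sigma u
print(np.isclose(projected_variance(top), top @ Sigma @ top))

# Scan unit vectors over half the circle (u and -u give the same variance)
angles = np.linspace(0.0, np.pi, 721)
scan = np.array([projected_variance(np.array([np.cos(t), np.sin(t)])) for t in angles])
print(scan.max() <= lam[-1] + 1e-9, np.isclose(projected_variance(top), lam[-1]))
```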
How GATE asks this
Usually a NAT or MCQ: “Maximise xᵀAx subject to ‖x‖ = 1” — the answer is the
largest eigenvalue of A, full stop. Sometimes the matrix is given numerically
and you compute its eigenvalues first; sometimes the eigenvalues are handed to you.
The companion question asks for the minimum, which is the smallest eigenvalue.
GATE DA 2025 and 2026 both ran this pattern, sometimes phrasing it as the variance
maximised by PCA.
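When the matrix is given numerically and is 2×2, the eigenvalues have a closed form. For A = [[a, b], [b, c]], the trace is a + c and the determinant is ac − b², so λ = (a + c)/2 ± √(((a − c)/2)² + b²). The helper below (a hypothetical name, plain Python, no libraries) returns the minimum and maximum of xᵀAx on the unit circle directly.

```python
import math

def extremes_2x2(a, b, c):
    """Hypothetical helper: (min, max) of x^T A x over ||x|| = 1 for A = [[a, b], [b, c]].

    These are the eigenvalues (a + c)/2 -/+ sqrt(((a - c)/2)**2 + b**2).
    """
    mean = (a + c) / 2                   # half the trace
    radius = math.hypot((a - c) / 2, b)  # sqrt(((a - c)/2)**2 + b**2)
    return mean - radius, mean + radius

print(extremes_2x2(4, 1, 4))  # (3.0, 5.0)
print(extremes_2x2(2, 1, 3))  # ((5 - √5)/2, (5 + √5)/2), about (1.382, 3.618)
```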
Worked example — the 2026 maximisation question
Let A be a symmetric matrix with eigenvalues 5 and 2. Find the maximum and minimum of xᵀAx subject to ‖x‖ = 1.
Set up the eigenbasis. Write x in A’s orthonormal eigenvectors, with
coordinates y₁, y₂ and y₁² + y₂² = 1. Then
xᵀAx = 5·y₁² + 2·y₂², with y₁² + y₂² = 1
Maximise. Substitute y₂² = 1 − y₁²:
xᵀAx = 5y₁² + 2(1 − y₁²) = 2 + 3y₁²
This grows with y₁², which is largest (= 1) when x is the eigenvector for
λ = 5. So the maximum is 5, achieved at the top eigenvector.
Minimise. The same expression 2 + 3y₁² is smallest when y₁² = 0, i.e. x is
the eigenvector for λ = 2. So the minimum is 2.
The constrained range of xᵀAx is exactly [2, 5] — the smallest to the largest
eigenvalue. No calculus and no matrix entries needed; only the eigenvalues matter.
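Here is a numerical check of the worked example, sketched with NumPy. Only the eigenvalues matter, so the sketch builds one particular symmetric A = Q·diag(5, 2)·Qᵀ from an arbitrary rotation angle `theta` (an assumed value). It then sweeps unit vectors around the circle.

```python
import numpy as np

theta = 0.7   # assumed rotation angle; any orthogonal Q gives the same eigenvalues
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = Q @ np.diag([5.0, 2.0]) @ Q.T   # symmetric, eigenvalues 5 and 2; column 0 of Q is the λ = 5 eigenvector

# Unit vectors x = (cos t, sin t) around the circle
ts = np.linspace(0.0, 2 * np.pi, 3601)
xs = np.stack([np.cos(ts), np.sin(ts)], axis=1)
values = np.einsum("ij,jk,ik->i", xs, A, xs)   # x^T A x for each row x

print(values.min(), values.max())   # close to 2 and 5 (grid resolution limits the match)

# The reduced formula: x^T A x = 2 + 3*y1**2 with y1 the coordinate along the λ = 5 eigenvector
y1 = xs @ Q[:, 0]
print(np.allclose(values, 2 + 3 * y1**2))   # True
```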
Practice this in an interview
PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.
OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank the solution is unique, β = (XᵀX)⁻¹Xᵀy, and the fitted values Xβ are the orthogonal projection of y onto the column space of X. The hat matrix H = X(XᵀX)⁻¹Xᵀ is that projection, so ŷ = Hy.
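Here is a short NumPy sketch of the normal equations on made-up data; `X`, `y` and the true coefficients are illustrative assumptions. The sketch does four things:
- Solves XᵀXβ = Xᵀy.
- Compares the result against `np.linalg.lstsq`.
- Checks that the hat matrix is an idempotent projection.
- Checks that the residuals are orthogonal to the columns of X.

```python
import numpy as np

rng = np.random.default_rng(3)
# Made-up design: intercept column plus two random features
X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.standard_normal(100)

# Solve the normal equations X^T X beta = X^T y (solve is preferred to forming the inverse)
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))

# Hat matrix H = X (X^T X)^{-1} X^T: y_hat = H y = X beta, and H is idempotent
H = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(H @ y, X @ beta), np.allclose(H @ H, H))

# Residuals are orthogonal to every column of X
print(np.allclose(X.T @ (y - X @ beta), 0))
```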
Skewness measures the asymmetry of a distribution's tails — positive skew means a longer right tail, negative skew a longer left tail. Kurtosis measures the heaviness of the tails relative to a normal distribution; excess kurtosis above zero indicates more probability mass in the tails and peak than a Gaussian, which matters for risk and outlier frequency.
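The moment definitions fit in a few lines. This sketch assumes NumPy and uses the simple moment ratios m₃/m₂^{3/2} and m₄/m₂² − 3, with no small-sample bias correction; libraries may offer adjusted versions. The helper name is hypothetical. A normal sample should give values near 0. An exponential sample, whose population skewness is 2 and excess kurtosis is 6, should land near those values.

```python
import numpy as np

def skew_and_excess_kurtosis(x):
    """Hypothetical helper: moment-ratio skewness m3/m2**1.5 and excess kurtosis m4/m2**2 - 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)  # central moments (divide by n)
    return m3 / m2**1.5, m4 / m2**2 - 3.0

rng = np.random.default_rng(4)
normal_stats = skew_and_excess_kurtosis(rng.standard_normal(200_000))
expo_stats = skew_and_excess_kurtosis(rng.exponential(size=200_000))
print(normal_stats)   # both close to 0
print(expo_stats)     # close to the population values 2 and 6 (sampling noise applies)
```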
The ACF measures correlation between a series and its own lags including indirect effects; the PACF strips out those indirect effects to show direct correlation at each lag. A cut-off in the PACF after lag p signals an AR(p) process; a cut-off in the ACF after lag q signals an MA(q) process.
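Here is a sketch of both functions on a simulated AR(1) series, assuming NumPy. The ACF is the standard sample autocorrelation. The PACF uses one of several possible methods: regress xₜ on its first k lags by least squares and keep the lag-k coefficient. Yule-Walker or Durbin-Levinson recursions are common alternatives. The helper names and the AR coefficient 0.7 are hypothetical choices. For an AR(1) process the PACF should cut off after lag 1, while the ACF decays gradually.

```python
import numpy as np

def acf(x, max_lag):
    """Hypothetical helper: sample autocorrelations r_0, ..., r_max_lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d**2)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom for k in range(max_lag + 1)])

def pacf_ols(x, max_lag):
    """Hypothetical helper: PACF at lags 1..max_lag via least-squares regressions.

    The lag-k value is the coefficient on x_{t-k} when x_t is regressed on x_{t-1}, ..., x_{t-k}.
    """
    d = np.asarray(x, dtype=float) - np.mean(x)
    out = []
    for k in range(1, max_lag + 1):
        # Column j-1 holds x_{t-j} for targets t = k, ..., n-1
        lags = np.column_stack([d[k - j:len(d) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, d[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Simulated AR(1) series x_t = 0.7 x_{t-1} + e_t (parameters are illustrative)
rng = np.random.default_rng(5)
n, phi = 5000, 0.7
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

print(np.round(acf(x, 4), 2))       # decays gradually, roughly like 0.7**k
print(np.round(pacf_ols(x, 4), 2))  # about 0.7 at lag 1, near 0 afterwards
```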