What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.

What are the assumptions and limitations of PCA, and when would it hurt your model?

PCA assumes linear relationships, that variance equals importance, and that components should be orthogonal. It can hurt when the predictive signal lives in low-variance directions, when relationships are nonlinear, or when interpretability matters, since components mix original features. It's also sensitive to scaling and outliers and is unsupervised, so it ignores the target.
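The first failure mode can be illustrated with a minimal NumPy sketch. The data is synthetic and invented for this illustration, and every name in it (`noise`, `signal`, `pc1_scores`, and so on) is hypothetical. The label depends only on a low-variance feature, so keeping just the top principal component discards the predictive direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: a high-variance feature carrying no signal,
# and a low-variance feature that fully determines the label.
noise = rng.normal(scale=10.0, size=n)   # large variance, irrelevant
signal = rng.normal(scale=0.5, size=n)   # small variance, predictive
X = np.column_stack([noise, signal])
y = (signal > 0).astype(float)

# PCA via eigendecomposition of the covariance matrix (unscaled on purpose).
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # top principal direction (kept)
pc2 = eigvecs[:, 0]                      # low-variance direction (dropped)

pc1_scores = Xc @ pc1
pc2_scores = Xc @ pc2

# Absolute correlation of each component's scores with the label.
corr_pc1 = abs(np.corrcoef(pc1_scores, y)[0, 1])
corr_pc2 = abs(np.corrcoef(pc2_scores, y)[0, 1])
print(f"|corr(PC1, y)| = {corr_pc1:.3f}, |corr(PC2, y)| = {corr_pc2:.3f}")
```

In this setup the retained component is nearly uncorrelated with the label while the dropped one carries almost all of the signal, which is the situation where a supervised method or no reduction at all may serve better.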

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

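OLS minimizes the sum of squared residuals ‖y − Xβ‖². Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, XᵀX is invertible and the unique solution is the closed form β = (XᵀX)⁻¹Xᵀy. The fitted values ŷ = Xβ are the orthogonal projection of y onto the column space of X, and the matrix that performs that projection, H = X(XᵀX)⁻¹Xᵀ, is the hat matrix.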
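As a rough sketch of the closed form, the snippet below solves the normal equations on synthetic data and compares the result with NumPy's least-squares routine. The data, the coefficients in `beta_true`, and all variable names are assumptions made for illustration, and the sketch assumes X has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: an intercept column plus two features.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations: (X^T X) beta = X^T y.
# Solving the linear system avoids forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted values X @ beta are the projection of y onto col(X),
# so the residual is orthogonal to every column of X.
residual = y - X @ beta_normal
print(beta_normal)
print(np.abs(X.T @ residual).max())
```

In practice `np.linalg.lstsq` (or a QR-based solver) is usually preferred over the explicit normal equations, because forming XᵀX squares the condition number of the problem.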

How does PCA work, and how do you choose the number of components?

PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix, then projects data onto the top components. Choose the number of components by the cumulative explained variance ratio (e.g. enough to retain 95%), a scree-plot elbow, or downstream task performance. Standardize features first when they are on different scales: PCA is variance-driven, so a feature measured in large units would otherwise dominate the components.
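The steps above can be sketched in plain NumPy. The data is synthetic (six features driven by two hypothetical latent factors), the 95% threshold is taken from the text, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples, 6 features driven by 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 6))

# Standardize each feature (zero mean, unit standard deviation).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the covariance matrix, sorted largest first.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative explained variance ratio and the smallest k reaching 95%.
ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(ratio)
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project the standardized data onto the top-k components.
scores = Z @ eigvecs[:, :k]
print(k, cumulative.round(3))
```

The 95% rule is only one heuristic; when a downstream model exists, treating k as a hyperparameter and validating it against task performance is the more direct check.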

Quadratic Forms & Definiteness — GATE DA

Feed a direction into xᵀAx and one number pops out — a warped 'size'. Sweep the direction around and a surface appears: a bowl, a saddle, an upside-down bowl. The eigenvalue signs set the shape, and the extreme values on the unit sphere are the largest and smallest eigenvalues — the heart of PCA.

8 min read Advanced GATE DA Lesson 29 of 122

The last lesson kept running into symmetric matrices and promised a payoff: feed a direction x into the expression xᵀAx and one number pops out — a warped “size” of x in that direction. A symmetric matrix hides a whole landscape this way. Plug in a direction, read off a height; sweep the direction around and a surface emerges — maybe a bowl, maybe an upside-down bowl, maybe a saddle. That expression is a quadratic form, its shape set entirely by the eigenvalues of A, and reading the curvature off them is what definiteness means — the engine behind PCA and behind every “is this minimum really a minimum?” in optimisation.

The form and its curvature

For a symmetric A, the quadratic form is the scalar xᵀAx. With A = [[a, b], [b, c]] in two dimensions it expands to a pure degree-2 expression:

xᵀAx = a·x₁² + 2b·x₁x₂ + c·x₂²

and its geometry is fixed by the signs of the eigenvalues:

Eigenvalue signs set the shape: all positive gives a bowl, all negative gives an upside-down bowl, mixed signs give a saddle.

Definiteness names the form by those signs (always with A symmetric):

Positive definite: xᵀAx > 0 for all x ≠ 0 ⇔ all eigenvalues > 0 (a bowl).
Positive semidefinite (PSD): xᵀAx ≥ 0 for all x ⇔ all eigenvalues ≥ 0.
Negative definite / semidefinite: flip the inequalities (all < 0 / ≤ 0).
Indefinite: eigenvalues of both signs — a saddle.
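The classification above translates almost directly into code. This is a minimal sketch: the function name `classify_definiteness` and the tolerance `tol` are hypothetical choices made for illustration, and the tolerance is needed only because computed eigenvalues that should be exactly zero come out as tiny floating-point numbers.

```python
import numpy as np

def classify_definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric")
    eigvals = np.linalg.eigvalsh(A)  # real eigenvalues, ascending order
    pos = eigvals > tol
    neg = eigvals < -tol
    if pos.all():
        return "positive definite"
    if neg.all():
        return "negative definite"
    if pos.any() and neg.any():
        return "indefinite"
    if not neg.any():
        # No negative eigenvalues, at least one zero.
        # (The zero matrix lands here, though it is also negative semidefinite.)
        return "positive semidefinite"
    return "negative semidefinite"

print(classify_definiteness([[2, -1], [-1, 2]]))   # eigenvalues 1 and 3
print(classify_definiteness([[1, 0], [0, -1]]))    # eigenvalues 1 and -1
print(classify_definiteness([[1, 1], [1, 1]]))     # eigenvalues 0 and 2
```

The first call shows that the entries alone do not decide the answer: the matrix has negative off-diagonal entries but both eigenvalues are positive.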

A Gram matrix XᵀX is always PSD, because xᵀ(XᵀX)x = (Xx)ᵀ(Xx) = ‖Xx‖² ≥ 0 for every x. A sample covariance matrix is a scaled Gram matrix of the mean-centred data, Σ = X_cᵀX_c / (n − 1), which is exactly why it comes out PSD.
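A quick numerical check of this argument, as a sketch on random data (the matrix `X` and the test vector `v` are arbitrary and hypothetical): the quadratic form of a Gram matrix matches a squared norm, and the sample covariance is the Gram matrix of the centred data divided by n − 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: 50 samples, 4 features.
X = rng.normal(size=(50, 4))
G = X.T @ X  # Gram matrix

# The quadratic form equals a squared norm for any test vector v.
v = rng.normal(size=4)
quad = v @ G @ v
sq_norm = np.linalg.norm(X @ v) ** 2

# Sample covariance: a scaled Gram matrix of the centred data.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)

print(np.isclose(quad, sq_norm))
print(np.linalg.eigvalsh(G).min() >= -1e-10)
print(np.allclose(cov, np.cov(X, rowvar=False)))
```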

The optimisation fact behind PCA

Here is the result GATE tests most. Constrain x to the unit sphere (‖x‖ = 1) and ask how large xᵀAx can grow. Diagonalise A = QΛQᵀ and write x in the orthonormal eigenbasis with coordinates y = Qᵀx (so ‖y‖ = 1):

xᵀAx = λ₁y₁² + λ₂y₂² + … + λₙyₙ²,   with  y₁² + … + yₙ² = 1

That is a weighted average of the eigenvalues, weights yᵢ² summing to 1. A weighted average is biggest when all the weight piles onto the largest eigenvalue, smallest when it piles onto the smallest:

The constrained range of the quadratic form is exactly the eigenvalue interval.

So max₍‖x‖=1₎ xᵀAx = λ_max (at the top eigenvector) and the minimum is λ_min (at the bottom eigenvector). PCA is exactly this: the variance of the data projected onto a unit direction x is xᵀΣx, and the first principal direction maximises it on the unit sphere, so it is the top eigenvector of the covariance Σ. However a point cloud is reshaped, its longest axis, the direction of maximum spread, lands along the top eigenvector, and the variance along it is the maximum of xᵀΣx, namely λ_max.
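The eigenvalue bounds can be checked empirically with a short sketch. The symmetric matrix here is random and hypothetical, and sampling unit vectors only illustrates the bound rather than proving it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric matrix, built by symmetrizing a random one.
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

# Sample random unit vectors and evaluate x^T A x on each row.
xs = rng.normal(size=(10000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
values = np.einsum("ij,jk,ik->i", xs, A, xs)

# Every sampled value sits inside [lam_min, lam_max] ...
print(lam_min <= values.min(), values.max() <= lam_max)

# ... and the endpoints are attained at the matching eigenvectors.
top = eigvecs[:, -1]
bottom = eigvecs[:, 0]
print(np.isclose(top @ A @ top, lam_max),
      np.isclose(bottom @ A @ bottom, lam_min))
```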

[Interactive figure: PCA axes on a draggable 2D point cloud. PC1 (solid) points along maximum variance; PC2 (dashed) is perpendicular. Projecting onto PC1 collapses the cloud to 1D, which is the 2D-to-1D reduction in action. Values shown: PC1 explains 91.7% of the variance, PC2 8.3%; mean (0.4, 0.3).]

A worked example — the 2026 maximisation question

Let A be symmetric with eigenvalues 5 and 2. Find the maximum and minimum of xᵀAx subject to ‖x‖ = 1.

Write x in A’s orthonormal eigenvectors, coordinates y₁, y₂ with y₁² + y₂² = 1, then substitute y₂² = 1 − y₁²:

xᵀAx = 5·y₁² + 2·y₂² = 5y₁² + 2(1 − y₁²) = 2 + 3y₁²

This grows with y₁². It is largest when y₁² = 1, at the eigenvector for λ = 5, giving the maximum 5, and smallest when y₁² = 0, at the eigenvector for λ = 2, giving the minimum 2. The constrained range is exactly [2, 5], smallest to largest eigenvalue, with no calculus and no matrix entries needed.
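The worked example can be verified numerically with a small sketch. The question gives only the eigenvalues, so the specific matrix below (diag(5, 2) rotated by an arbitrary 30 degrees) is an assumption made for illustration; any symmetric matrix with eigenvalues 5 and 2 would behave the same way.

```python
import numpy as np

# Hypothetical A with eigenvalues 5 and 2: rotate diag(5, 2) by 30 degrees.
phi = np.deg2rad(30.0)
Q = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
A = Q @ np.diag([5.0, 2.0]) @ Q.T

# Sweep unit vectors x = (cos t, sin t) around the circle.
t = np.linspace(0.0, 2.0 * np.pi, 3601)
xs = np.column_stack([np.cos(t), np.sin(t)])
values = np.einsum("ij,jk,ik->i", xs, A, xs)

# y1 is the coordinate of x along the eigenvector for lambda = 5.
y1 = xs @ Q[:, 0]
print(np.allclose(values, 2.0 + 3.0 * y1**2))
print(values.max(), values.min())
```

The sweep reproduces the formula 2 + 3y₁² at every angle, and its extremes agree with the eigenvalues regardless of how the eigenvectors are oriented.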

A question to carry forward

To prove this we leaned on writing A as a product, A = QΛQᵀ — a decomposition that laid its action bare. Factoring a matrix into a useful product is a recurring trick. Here is the thread onward: the row-elimination you ran back in lesson four can itself be packaged as a product of two triangular matrices, so that the expensive part is done once and reused. What is that factorisation, and why would you ever want it?

In one breath

A quadratic form xᵀAx (symmetric A) is a warped “size” — sweep x and you trace a bowl, saddle, or dome set by eigenvalue signs.
Definiteness: positive definite ⇔ all λ > 0; PSD ⇔ all λ ≥ 0; indefinite ⇔ mixed signs (saddle).
Gram matrix XᵀX is always PSD (xᵀXᵀXx = ‖Xx‖² ≥ 0) — why covariance matrices are PSD.
On ‖x‖ = 1: max xᵀAx = λ_max, min = λ_min (at the matching eigenvectors). PCA’s top direction = top eigenvector of Σ.
Trap: it’s the eigenvalues, not the entries — [[2,−1],[−1,2]] is positive definite despite a negative entry.

Practice

Quick check

Q1. Recall: a symmetric matrix A is positive definite exactly when…

Q2. Trace: a symmetric A has eigenvalues 7 and 3. What is the maximum of xᵀAx on ‖x‖ = 1? (Numerical answer.)

Q3. Trace: same matrix (eigenvalues 7 and 3). What is the minimum of xᵀAx on ‖x‖ = 1? (Numerical answer.)

Q4. Apply: a symmetric matrix has eigenvalues 4, 0, and −2. How is its quadratic form classified?

Q5. Apply: which statements about a Gram matrix G = XᵀX are true? (Select all that apply.)

Q6. Create: A = [[3, 0], [0, 6]]. State the max of xᵀAx on ‖x‖ = 1, the unit vector achieving it, and the definiteness, with reasoning.
