What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto the lower-dimensional subspace spanned by the top few of them. This reduces the number of features while retaining most of the variance. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to feature scale, and an assumption of linear structure.

How does PCA work, and how do you choose the number of components?

PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix of the centered data, sorted by eigenvalue. It then projects the data onto the top components. Choose the number of components by the cumulative explained-variance ratio (e.g. enough to retain 95%), the elbow of a scree plot, or downstream task performance. Standardize features first when they are measured on different scales. PCA is variance-driven, so a feature with large raw units would otherwise dominate the leading components.
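Here is a minimal numpy sketch of that recipe. It assumes a plain eigendecomposition of the covariance matrix rather than an SVD. It centers (and optionally standardizes) the features, eigendecomposes the covariance, sorts by eigenvalue, and keeps the smallest k that reaches a 95% cumulative explained-variance ratio. The function names `pca_fit` and `pca_transform` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_fit(X, var_target=0.95, standardize=True):
    """Fit PCA by eigendecomposing the sample covariance matrix.

    Returns (mean, scale, components, ratio): the per-feature mean and scale
    used for preprocessing, the top-k eigenvectors as columns, and the
    explained-variance ratio of every component. k is the smallest number of
    components whose cumulative ratio reaches var_target.
    """
    mean = X.mean(axis=0)
    scale = X.std(axis=0) if standardize else np.ones(X.shape[1])
    Z = (X - mean) / scale                       # center (and optionally standardize)
    cov = np.cov(Z, rowvar=False)                # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric input: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()              # explained-variance ratio per component
    k = int(np.searchsorted(np.cumsum(ratio), var_target)) + 1
    k = min(k, len(ratio))                       # guard against round-off at the top end
    return mean, scale, eigvecs[:, :k], ratio

def pca_transform(X, mean, scale, components):
    """Project preprocessed rows of X onto the retained components."""
    return ((X - mean) / scale) @ components

# Hypothetical synthetic data: the third feature is almost a copy of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + 0.01 * rng.normal(size=500)])

mean, scale, components, ratio = pca_fit(X)
scores = pca_transform(X, mean, scale, components)
print(components.shape[1], np.round(ratio, 3))
```

Because two of the three features are nearly collinear, the 95% rule retains two components here.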

What are the assumptions and limitations of PCA, and when would it hurt your model?

PCA assumes linear relationships, that variance equals importance, and that components should be orthogonal. It can hurt when the predictive signal lives in low-variance directions, when relationships are nonlinear, or when interpretability matters, since components mix original features. It's also sensitive to scaling and outliers and is unsupervised, so it ignores the target.
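The low-variance failure mode can be made concrete with a small synthetic sketch (numpy assumed, data-generating process hypothetical). Two standardized features share a large common factor, and the target depends only on their small difference. A 95% explained-variance rule keeps just the first component and throws the signal away.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                 # shared factor: dominates the variance
e = rng.normal(size=n)                 # independent factor, scaled down below
X = np.column_stack([z + 0.1 * e, z - 0.1 * e])
y = e                                  # target lives only in the x1 - x2 direction

Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize both features
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
ratio = eigvals / eigvals.sum()                    # PC1 carries roughly 99% of the variance

def r_squared(features, target):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return 1 - resid @ resid / np.sum((target - target.mean()) ** 2)

r2_top_pc = r_squared(Xs @ eigvecs[:, :1], y)      # keep only PC1: near zero
r2_all = r_squared(Xs, y)                          # keep both features: near one
```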

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals ‖y − Xβ‖². Setting the gradient of this loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, XᵀX is invertible and the unique solution is β̂ = (XᵀX)⁻¹Xᵀy. The fitted values Xβ̂ = Hy are the orthogonal projection of y onto the column space of X, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix.
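Here is a short numpy sketch on hypothetical synthetic data. It solves the normal equations directly, which is numerically preferable to forming the inverse, and cross-checks the result against `np.linalg.lstsq`. It also confirms that the hat matrix H = X(XᵀX)⁻¹Xᵀ is symmetric and idempotent, i.e. an orthogonal projection.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -0.5])                        # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Normal equations X^T X beta = X^T y, solved without an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X^T X)^{-1} X^T maps y to the fitted values.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_fit = H @ y
```

The residual y − Hy is orthogonal to every column of X, which is the "perpendicular drop" picture in matrix form.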

Projections & Idempotent Matrices — GATE DA


A projection casts a shadow — it squashes every vector onto a subspace and leaves vectors already there alone, so the shadow of a shadow is the same shadow: P² = P. From that one fact the eigenvalues, rank, and nullity all fall out, and the centering matrix behind PCA is exactly this.

9 min read Advanced GATE DA Lesson 28 of 122

What you'll learn

A projection matrix is idempotent: P² = P, with eigenvalues in {0, 1}

An orthogonal projection is also symmetric (P = Pᵀ)

For a projection onto a subspace U of Rⁿ: rank(P) = dim(U), nullity(P) = n − dim(U)

The centering matrix C = I − (1/n)·11ᵀ is symmetric and idempotent — PCA's mean-subtraction

The last lesson’s matrix preserved everything, turning space rigidly. Here is its opposite — a matrix that throws information away on purpose. Shine a light straight down on a vector and look at the shadow it casts on the floor. That shadow is a projection — the vector squashed onto a lower-dimensional subspace. And shadows have a quirk that is obvious once you say it aloud: the shadow of a shadow is the same shadow. A point already on the floor projects to itself, so applying the projection a second time changes nothing.

That “twice changes nothing” is exactly the equation P² = P. Matrices with this property are called idempotent, and from that single fact the rank, the nullity, and the allowed eigenvalues all fall out — which is why GATE keeps reaching for it.

Projection onto a subspace

A projection P maps every vector into some subspace U and leaves vectors already living in U untouched. The orthogonal projection sends each vector to the closest point in U. Picture dropping a perpendicular onto a line:

The dashed perpendicular is the part removed; Px is what survives on the subspace U.

Three properties follow, and they are the whole exam syllabus here:

Idempotent: P² = P (so P³ = P·P² = P, and every higher power too).
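Eigenvalues in {0, 1}: if Pv = λv with v ≠ 0, then λv = Pv = P²v = P(λv) = λ²v, so λ² = λ, giving λ = 0 or 1. Vectors in U are left unchanged (eigenvalue 1). Vectors in the null space are sent to zero (eigenvalue 0), and for an orthogonal projection these are exactly the vectors perpendicular to U.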
Rank and nullity: the output of P fills exactly U, so rank(P) = dim(U), and rank-nullity gives nullity(P) = n − dim(U).
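All three properties can be checked numerically. This numpy sketch assumes U is spanned by the columns of a random 5 × 2 matrix A, which has full column rank with probability 1. It forms the orthogonal projection P = A(AᵀA)⁻¹Aᵀ and reads off the eigenvalues, rank, and nullity. Since the eigenvalues are 0 or 1, the rank is simply the count of eigenvalues equal to 1. The helper name `orthogonal_projection` is hypothetical.

```python
import numpy as np

def orthogonal_projection(A):
    """Matrix of the orthogonal projection onto the column space of A.

    Assumes A has full column rank, so A^T A is invertible.
    """
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(2)
n, k = 5, 2
A = rng.normal(size=(n, k))              # U = span of two random columns (hypothetical)
P = orthogonal_projection(A)

eigvals = np.sort(np.linalg.eigvalsh(P))  # P is symmetric, so eigvalsh applies
rank = int(np.sum(eigvals > 0.5))         # eigenvalues are 0 or 1: count the 1s
nullity = n - rank                        # rank-nullity
```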

An orthogonal projection (the perpendicular kind, as drawn) carries one extra property: it is symmetric, P = Pᵀ. Every orthogonal projection is symmetric, but not every idempotent matrix is; oblique projections drop the shadow at a slant. Take P = [[1, 0], [0, 0]], the projection onto the x-axis. The unit square collapses to a line segment, points already on the x-axis stay put, and applying P again gives the identical result. That is P² = P in pictures.
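The difference between orthogonal and oblique projections shows up with two 2 × 2 matrices. This numpy sketch uses P_orth = [[1, 0], [0, 0]] and a hypothetical oblique example, P_obl = [[1, 1], [0, 0]]. The oblique matrix also lands every vector on the x-axis, but along the direction (1, −1). Both are idempotent, only the first is symmetric, and only the first sends x to the closest point on the axis.

```python
import numpy as np

P_orth = np.array([[1.0, 0.0], [0.0, 0.0]])  # onto the x-axis, straight down
P_obl = np.array([[1.0, 1.0], [0.0, 0.0]])   # onto the x-axis, along the direction (1, -1)

x = np.array([0.0, 1.0])
shadow_orth = P_orth @ x    # [0, 0]: the closest point on the x-axis
shadow_obl = P_obl @ x      # [1, 0]: a slanted shadow, not the closest point
```

Here x − P_obl·x = (−1, 1) is not perpendicular to the x-axis. That slant is exactly what symmetry rules out.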


The centering matrix — PCA’s engine

The projection that shows up most often in data analysis subtracts the mean from a data vector. With the all-ones vector 1 ∈ Rⁿ, the centering matrix is

C = I − (1/n)·11ᵀ

Here 11ᵀ is the n × n all-ones matrix, so (1/n)·11ᵀx is the vector whose every entry equals mean(x), and Cx = x − mean(x)·1 removes that average. C is symmetric and idempotent: it is the orthogonal projection onto the mean-zero subspace {x : 1ᵀx = 0}. PCA begins by centering every feature, which for an n × d data matrix X is exactly the product CX. That is why this matrix turns up in GATE DA papers.
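Here is a quick numpy sketch of the centering matrix, using an arbitrary illustrative vector and a hypothetical random 5 × 3 data matrix. C·x matches x − mean(x)·1, and C·X centers every column at once. C is symmetric, idempotent, and of rank n − 1. The rank is computed here as the trace, which equals the rank for any idempotent matrix.

```python
import numpy as np

def centering_matrix(n):
    """C = I - (1/n) 1 1^T, the n x n centering matrix."""
    ones = np.ones((n, 1))
    return np.eye(n) - (ones @ ones.T) / n

n = 5
C = centering_matrix(n)
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])           # one feature, mean 2.8
X = np.random.default_rng(3).normal(size=(n, 3))  # hypothetical 5 x 3 data matrix

centered_x = C @ x                 # same as x - x.mean()
centered_X = C @ X                 # centers every column (feature) at once
rank_C = int(round(np.trace(C)))   # idempotent, so rank = trace = n - 1
```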

A worked example — the 2024 and 2026 questions

Let M be the orthogonal projection of R³ onto a 2-dimensional subspace U. Evaluate M², M³, rank(M), and nullity(M).

A projection satisfies M² = M, so M³ = M·M² = M as well — every positive power equals itself. Its output fills exactly U, so rank(M) = dim(U) = 2; and rank-nullity in R³ gives nullity(M) = 3 − 2 = 1 — the one direction perpendicular to U, the part the shadow discards.
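The worked example can be checked numerically. This numpy sketch assumes a particular U, spanned by (1, 0, 1) and (0, 1, 1), which is an arbitrary illustrative choice. It builds M = QQᵀ from an orthonormal basis Q obtained by QR and confirms M² = M³ = M, rank 2, and nullity 1. The cross product gives the one perpendicular direction the shadow discards.

```python
import numpy as np

# Hypothetical choice of U: the span of the two columns of B in R^3.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Q, _ = np.linalg.qr(B)        # orthonormal basis for U (3 x 2)
M = Q @ Q.T                   # orthogonal projection onto U

M2 = M @ M
M3 = np.linalg.matrix_power(M, 3)
rank_M = np.linalg.matrix_rank(M, tol=1e-10)
nullity_M = M.shape[0] - rank_M
normal = np.cross(B[:, 0], B[:, 1])   # (-1, -1, 1): perpendicular to U
```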

Now the 2026 centering-matrix part. Show C² = C for C = I − (1/n)11ᵀ, using the one key fact 1ᵀ1 = n (summing n ones):

C² = (I − (1/n)11ᵀ)(I − (1/n)11ᵀ)
   = I − (1/n)11ᵀ − (1/n)11ᵀ + (1/n²)·1(1ᵀ1)1ᵀ
   = I − (2/n)11ᵀ + (1/n²)·1·(n)·1ᵀ        [1ᵀ1 = n]
   = I − (2/n)11ᵀ + (1/n)11ᵀ
   = I − (1/n)11ᵀ = C   ✓

The two −(1/n)11ᵀ terms add to −(2/n)11ᵀ, and the +(1/n)11ᵀ recovered from the last term cancels half of that, leaving C itself. This proves C is idempotent.
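Since the algebra holds for every n, an exact spot check is cheap. This sketch uses Python's standard `fractions` module, so there is no floating-point round-off. The helper names `centering` and `matmul` are hypothetical, and n from 1 to 8 is an arbitrary range.

```python
from fractions import Fraction

def centering(n):
    """C = I - (1/n) 1 1^T as a list of lists of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# True for each n exactly when C^2 == C holds entry by entry.
results = {n: matmul(centering(n), centering(n)) == centering(n) for n in range(1, 9)}
```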

A question to carry forward

Symmetric matrices keep appearing: orthogonal projections are symmetric, and so are covariance matrices. They clearly have a special structure. Here is the thread onward. Feed a vector into the expression xᵀAx and you get a single number, a kind of warped "size" of x. For a symmetric A, when is that number always positive, and why would anyone care?

In one breath

A projection casts a shadow onto a subspace U; doing it twice changes nothing → idempotent P² = P (so Pᵏ = P).
Eigenvalues ∈ {0, 1} (λ² = λ): 1 for vectors kept (in U), 0 for vectors killed (in the null space; perpendicular to U when the projection is orthogonal).
rank(P) = dim(U), nullity(P) = n − dim(U) (rank-nullity).
Orthogonal projection ⇒ symmetric (P=Pᵀ); oblique ones are idempotent but not symmetric. Only the identity is an invertible projection.
Centering matrix C = I − (1/n)11ᵀ: symmetric, idempotent (1ᵀ1=n makes C²=C), subtracts the mean — PCA’s first step. rank = n−1.

Practice

Quick check


Q1 (Recall): A matrix P with P² = P is called…

Q2 (Trace): M is a projection of R⁵ onto a 3-dimensional subspace. What is the nullity of M? (numerical answer)

Q3 (Trace): For the centering matrix C = I − (1/n)11ᵀ on n = 6 points, what is rank(C)? (numerical answer)

Q4 (Apply): A matrix P satisfies P² = P. Which statements MUST be true? (select all that apply)

Q5 (Apply): An idempotent matrix P (P² = P) is also invertible. What is P?

Q6 (Create): M projects R⁴ onto a 2-D subspace U. State M³, rank(M), nullity(M), and the full list of eigenvalues, with reasoning.
