datarekha

Projections & Idempotent Matrices

A projection matrix maps every vector onto a subspace and fixes vectors already in it, so applying it twice changes nothing: P² = P. This idempotence underlies PCA's centering matrix.

9 min read Advanced GATE DA Lesson 28 of 122

What you'll learn

  • A projection matrix is idempotent: P² = P, and its eigenvalues lie in {0, 1}
  • An orthogonal projection is also symmetric (P = Pᵀ)
  • For a projection onto a subspace U of Rⁿ, rank(P) = dim(U) and nullity(P) = n − dim(U)
  • The centering matrix C = I − (1/n)·11ᵀ is symmetric and idempotent; it is the engine behind PCA's mean-subtraction

Before you start

Shine a light straight down on a vector and look at the shadow it casts on the floor. That shadow is a projection: the vector squashed onto a lower-dimensional subspace. The neat thing about shadows is obvious once you say it out loud: the shadow of a shadow is the same shadow. A point already on the floor projects to itself. Apply the projection twice and nothing extra happens.

That “applying it twice changes nothing” idea is exactly the equation P² = P. Matrices with this property get a name — idempotent — and from that one fact the rank, the nullity, and the allowed eigenvalues all fall out. Which is why GATE keeps reaching for it.

Projection onto a subspace

A projection P sends each vector to the closest point in some subspace U and leaves vectors that already live in U untouched. Picture dropping a perpendicular onto a line:

[Figure: a vector x drawn from the origin, the subspace U, and Px, the foot of the perpendicular from x onto U.]
The dashed perpendicular is the part removed; Px is what survives on the subspace U.

Three properties follow, and they are the whole exam syllabus here:

  • Idempotent: P² = P. (And therefore P³ = P·P² = P·P = P, and so on.)
  • Eigenvalues in {0, 1}: if Pv = λv with v ≠ 0, then λv = Pv = P²v = λ²v, so λ² = λ, giving λ = 0 or λ = 1. Nonzero vectors in U have eigenvalue 1 (kept); for an orthogonal projection, nonzero vectors perpendicular to U have eigenvalue 0 (killed).
  • Rank and nullity: the range of P is U, so rank(P) = dim(U). By rank–nullity, nullity(P) = n − dim(U).

An orthogonal projection (the perpendicular kind, as drawn) has the extra property that it is symmetric: P = Pᵀ. Not every idempotent matrix is symmetric — oblique projections drop the shadow at a slant — but the orthogonal ones are.
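The contrast between orthogonal and oblique projections can be checked numerically. The sketch below is a minimal illustration, not a library routine: it assumes the standard formula P = uuᵀ/(uᵀu) for the orthogonal projection onto the line spanned by u, and the direction (1, 2) and the oblique matrix Q are hypothetical examples chosen for this check. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def line_projection(u):
    """Orthogonal projection onto span{u}: P = u u^T / (u^T u)."""
    norm_sq = sum(c * c for c in u)
    return [[Fraction(a * b, norm_sq) for b in u] for a in u]

# Orthogonal projection onto the line through (1, 2) (hypothetical direction).
P = line_projection([1, 2])

# Oblique projection onto the x-axis (hypothetical example): idempotent, not symmetric.
Q = [[Fraction(1), Fraction(1)],
     [Fraction(0), Fraction(0)]]

orth_idempotent = matmul(P, P) == P
orth_symmetric = transpose(P) == P
oblique_idempotent = matmul(Q, Q) == Q
oblique_symmetric = transpose(Q) == Q

print(orth_idempotent, orth_symmetric, oblique_idempotent, oblique_symmetric)
```

Both matrices pass the P² = P test, but only the orthogonal one equals its own transpose.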

Want to feel the squash directly? In the playground below, set the matrix to [[1, 0], [0, 0]] — the orthogonal projection onto the x-axis. The whole unit shape collapses down to a line segment, and any point already on that line is left untouched. Apply the matrix again and the result is identical: that’s P² = P in pictures.

The centering matrix — PCA’s engine

The single most important projection in data analysis subtracts the mean from a data vector. With the all-ones vector 1, the centering matrix is

C = I − (1/n)·11ᵀ

Here 11ᵀ is the n × n matrix of all ones, so (1/n)·11ᵀ averages the entries, and Cx = x − mean(x)·1 removes that average. C is symmetric and idempotent — it is the orthogonal projection onto the subspace of mean-zero vectors. PCA begins by applying C to every feature, which is why this matrix shows up in DA papers.
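A small sketch of the centering matrix in action, assuming nothing beyond the definition C = I − (1/n)·11ᵀ. The data vector x and the helper names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def centering_matrix(n):
    """Build C = I - (1/n) * 1 1^T as an n x n list of rows with exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

x = [2, 4, 9]                      # hypothetical data vector with mean 5
C = centering_matrix(len(x))

centered = matvec(C, x)            # should equal x - mean(x) * 1
mean_x = Fraction(sum(x), len(x))
by_hand = [xi - mean_x for xi in x]

is_idempotent = matmul(C, C) == C
is_symmetric = [list(col) for col in zip(*C)] == C
```

Here `centered` and `by_hand` agree entry by entry, and the centered entries sum to zero, which is what "projecting onto the mean-zero subspace" means in practice.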

How GATE asks this

Two flavours. As an MSQ: a matrix M is described as a projection onto a k-dimensional subspace, and you select the true statements: M² = M, M³ = M, rank(M) = k, eigenvalues in {0, 1}. As a NAT: you are given the subspace dimension and asked for the nullity (n − k) or the rank. GATE DA 2024 used a projection onto a 2-D subspace of R³; GATE DA 2026 returned to the theme via the centering matrix and its C² = C property.

Worked example — the 2024 and 2026 questions

Let M be the orthogonal projection of R³ onto a 2-dimensional subspace U. Evaluate M², M³, rank(M), and nullity(M).

Idempotence. A projection satisfies M² = M. Then M³ = M·M² = M·M = M as well — every positive power of a projection equals itself.

Rank. The output of M fills exactly U, so rank(M) = dim(U) = 2.

Nullity. Rank–nullity in R³: nullity(M) = 3 − rank(M) = 3 − 2 = 1. The null space is the 1-D direction perpendicular to U (the part the shadow discards).
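The three answers above can be confirmed on a concrete plane. This is a sketch under stated assumptions: the plane is taken to be the set of vectors perpendicular to a hypothetical normal (1, 2, 2), the projection is built with the standard formula M = I − nnᵀ/(nᵀn), and the rank is read off the trace, which equals the rank for any idempotent matrix because its eigenvalues are all 0 or 1.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def plane_projection(normal):
    """Orthogonal projection onto the plane perpendicular to `normal`:
    M = I - n n^T / (n^T n)."""
    d = len(normal)
    norm_sq = sum(c * c for c in normal)
    return [[Fraction(int(i == j)) - Fraction(normal[i] * normal[j], norm_sq)
             for j in range(d)] for i in range(d)]

normal = [1, 2, 2]                 # hypothetical normal vector; U is the plane perpendicular to it
M = plane_projection(normal)

M2 = matmul(M, M)
M3 = matmul(M2, M)

rank = sum(M[i][i] for i in range(3))   # trace = rank for an idempotent matrix
nullity = 3 - rank

# The normal direction is the null space: M sends it to the zero vector.
killed = [sum(a * b for a, b in zip(row, normal)) for row in M]

print(M2 == M, M3 == M, rank, nullity, killed)
```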

Now the centering matrix part used in 2026. Show C² = C for C = I − (1/n)11ᵀ. The key fact is 1ᵀ1 = n (summing n ones):

C² = (I − (1/n)11ᵀ)(I − (1/n)11ᵀ)
   = I − (1/n)11ᵀ − (1/n)11ᵀ + (1/n²)·1(1ᵀ1)1ᵀ
   = I − (2/n)11ᵀ + (1/n²)·1·(n)·1ᵀ          [since 1ᵀ1 = n]
   = I − (2/n)11ᵀ + (1/n)11ᵀ
   = I − (1/n)11ᵀ  =  C   ✓

The two −(1/n)11ᵀ terms and the recovered +(1/n)11ᵀ collapse back to a single −(1/n)11ᵀ, proving C is idempotent. Concretely for n = 3, (1/3)11ᵀ is the 3-by-3 matrix of all 1/3s; squaring it returns itself, so C² = C numerically too.
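For n = 3 the arithmetic can be written out in full. In the worked instance below, the symbol J for (1/3)11ᵀ is introduced only for this illustration:

```latex
\begin{aligned}
J &= \tfrac{1}{3}\,\mathbf{1}\mathbf{1}^{\top}
   = \tfrac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},
\qquad
(J^{2})_{ij} = \sum_{k=1}^{3} \tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{1}{3} = J_{ij}, \\[4pt]
C &= I - J
   = \tfrac{1}{3}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix},
\qquad
C^{2} = I - 2J + J^{2} = I - 2J + J = I - J = C .
\end{aligned}
```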

Quick check

Q1. M is a projection of R⁵ onto a 3-dimensional subspace. What is the nullity of M? (numerical answer)
Q2. A matrix P satisfies P² = P. Which statements must be true? (select all that apply)
Q3. For the centering matrix C = I − (1/n)11ᵀ on n = 6 points, what is rank(C)? (numerical answer)
Q4. Which properties does the centering matrix C = I − (1/n)11ᵀ have? (select all that apply)
Q5. An idempotent matrix P (P² = P) is also invertible. What can you conclude about P?
Q6. M projects R⁴ onto a 2-D subspace U. Which are correct? (select all that apply)

Practice this in an interview

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.

How does Ordinary Least Squares derive the coefficient vector, and what is the closed-form solution?

OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy. When X has full column rank, their unique solution is the closed form β = (XᵀX)⁻¹Xᵀy. The fitted values Xβ = Hy are the orthogonal projection of y onto the column space of X, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix.
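The projection reading of OLS can be checked on a tiny example. The sketch below assumes a hypothetical design matrix with an intercept column and one feature, and uses a hand-written 2×2 inverse, so it only illustrates the two-coefficient case. It builds the hat matrix H = X(XᵀX)⁻¹Xᵀ and tests the properties this lesson associates with an orthogonal projection.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    """Invert a 2 x 2 matrix via the adjugate formula (assumes a nonzero determinant)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: intercept column plus one feature, three observations.
X = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]
y = [[Fraction(1)], [Fraction(2)], [Fraction(4)]]

Xt = transpose(X)
pinv = matmul(inverse_2x2(matmul(Xt, X)), Xt)   # (X^T X)^{-1} X^T
beta = matmul(pinv, y)                          # OLS coefficient vector
H = matmul(X, pinv)                             # hat matrix X (X^T X)^{-1} X^T
fitted = matmul(H, y)                           # projection of y onto col(X)

hat_idempotent = matmul(H, H) == H
hat_symmetric = transpose(H) == H
hat_rank = sum(H[i][i] for i in range(3))       # trace = rank = number of columns of X
```

Because H is symmetric and idempotent with trace 2, it is the orthogonal projection onto the 2-dimensional column space of X, and `fitted` equals X times `beta`.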

What are t-SNE and UMAP, how do they differ from PCA, and what are their limitations for ML workflows?

t-SNE and UMAP are nonlinear dimensionality reduction algorithms designed primarily for 2D/3D visualization of high-dimensional data. Unlike PCA, they preserve local neighborhood structure rather than global variance, producing cleaner cluster separations in plots. Using them as a preprocessing step for a supervised model is risky: t-SNE is transductive (it offers no mapping for unseen points), and the output of both methods varies across runs and hyperparameter settings.

What is a 1x1 convolution and why is it useful?

A 1x1 convolution applies a learned linear combination across channels at each spatial position, without looking at any spatial neighbourhood. It is used to change the number of channels cheaply, to add non-linearity when followed by an activation, and to build the bottleneck blocks at the core of Inception and ResNet-50+.
