Projections & Idempotent Matrices
A projection matrix maps every vector onto a subspace and fixes vectors already in it, so applying it twice changes nothing: P² = P. This idempotence underlies PCA's centering matrix.
What you'll learn
- A projection matrix is idempotent: P² = P, and its eigenvalues lie in the set {0, 1}
- An orthogonal projection is also symmetric (P = Pᵀ)
- For a projection onto a subspace U of Rⁿ, rank(P) = dim(U) and nullity(P) = n − dim(U)
- The centering matrix C = I − (1/n)·11ᵀ is symmetric and idempotent; it is the engine behind PCA's mean-subtraction
Before you start
Shine a light straight down on a vector and look at the shadow it casts on the floor. That shadow is a projection — the vector squashed onto a lower-dim subspace. The neat thing about shadows is obvious once you say it out loud: the shadow of a shadow is the same shadow. A point already on the floor projects to itself. Apply the projection twice and nothing extra happens.
That “applying it twice changes nothing” idea is exactly the equation P² = P.
Matrices with this property get a name — idempotent — and from that one fact
the rank, the nullity, and the allowed eigenvalues all fall out. Which is why GATE
keeps reaching for it.
Projection onto a subspace
A projection P sends each vector into some subspace U and leaves vectors that
already live in U untouched. An orthogonal projection sends each vector to the
closest point in U: picture dropping a perpendicular onto a line.
Three properties follow, and they are the whole exam syllabus here:
- Idempotent: P² = P. (And therefore P³ = P·P² = P·P = P, and so on.)
- Eigenvalues in {0, 1}: if Pv = λv with v ≠ 0, then λv = Pv = P²v = λ²v, so λ² = λ, giving λ = 0 or λ = 1. Vectors in U have eigenvalue 1 (kept); for an orthogonal projection, vectors perpendicular to U have eigenvalue 0 (killed).
- Rank and nullity: the range of P is U, so rank(P) = dim(U). By rank–nullity, nullity(P) = n − dim(U).
An orthogonal projection (the perpendicular kind, as drawn) has the extra
property that it is symmetric: P = Pᵀ. Not every idempotent matrix is symmetric
— oblique projections drop the shadow at a slant — but the orthogonal ones are.
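To see the difference between the two kinds in numbers, here is a minimal NumPy sketch. The two matrices are illustrative choices, not taken from any exam question: `P_orth` projects orthogonally onto the line through (1, 1), and `P_obl` is an oblique projection onto the x-axis. Both should come out idempotent with eigenvalues 0 and 1 and rank 1, but only the orthogonal one should be symmetric.

```python
import numpy as np

# Orthogonal projection onto the line spanned by u = (1, 1): P = u u^T / (u^T u).
u = np.array([1.0, 1.0])
P_orth = np.outer(u, u) / np.dot(u, u)

# Oblique projection onto the x-axis: it sends (x, y) to (x + y, 0),
# sliding points along the slanted direction (1, -1) instead of straight down.
P_obl = np.array([[1.0, 1.0],
                  [0.0, 0.0]])

for name, P in [("orthogonal", P_orth), ("oblique", P_obl)]:
    idempotent = np.allclose(P @ P, P)          # P^2 == P
    symmetric = np.allclose(P, P.T)             # P == P^T
    eigenvalues = np.sort(np.linalg.eigvals(P).real)
    rank = np.linalg.matrix_rank(P)
    print(name, "| idempotent:", idempotent, "| symmetric:", symmetric,
          "| eigenvalues:", eigenvalues, "| rank:", rank)
```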
Want to feel the squash directly? In the playground below, set the matrix to
[[1, 0], [0, 0]] — the orthogonal projection onto the x-axis. The whole unit
shape collapses down to a line segment, and any point already on that line is
left untouched. Apply the matrix again and the result is identical: that’s
P² = P in pictures.
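The same experiment can be run offline. The sketch below assumes the "unit shape" is the unit square and stores its four corners as columns (the names `square` and `on_axis` are just for illustration). It applies the matrix once, then again, and compares the results.

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # orthogonal projection onto the x-axis

# Corners of the unit square, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

once = P @ square    # every y-coordinate is squashed to 0
twice = P @ once     # projecting the shadow again

on_axis = np.array([3.0, 0.0])   # a point already on the x-axis

print(once)
print("second application changes nothing:", np.array_equal(once, twice))
print("point on the axis is fixed:", np.array_equal(P @ on_axis, on_axis))
```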
The centering matrix — PCA’s engine
The single most important projection in data analysis subtracts the mean from a data
vector. With the all-ones vector 1, the centering matrix is
C = I − (1/n)·11ᵀ
Here 11ᵀ is the n × n matrix of all ones, so (1/n)·11ᵀ averages the entries,
and Cx = x − mean(x)·1 removes that average. C is symmetric and idempotent
— it is the orthogonal projection onto the subspace of mean-zero vectors. PCA begins
by applying C to every feature, which is why this matrix shows up in DA papers.
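A short NumPy sketch makes the action of C concrete. The vector `x` and the data matrix `X` are made-up numbers for illustration; rows of `X` are treated as samples and columns as features, which is an assumption about layout rather than something the formula dictates.

```python
import numpy as np

n = 4
ones = np.ones((n, 1))
C = np.eye(n) - (ones @ ones.T) / n   # C = I - (1/n) * 1 1^T

x = np.array([2.0, 4.0, 6.0, 8.0])    # mean(x) = 5
print(C @ x)                          # same as x - x.mean()

# Centering a small made-up data matrix: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [6.0, 40.0]])
X_centered = C @ X
print(X_centered.mean(axis=0))        # every column mean is now (numerically) zero
print(np.allclose(C, C.T))            # C is symmetric
```

In practice you would usually write `X - X.mean(axis=0)` rather than build the n × n matrix C; the matrix form is mainly useful for proofs and exam questions.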
How GATE asks this
Two flavours. As an MSQ: a matrix M is described as a projection onto a
k-dimensional subspace, and you select the true statements — M² = M,
M³ = M, rank(M) = k, eigenvalues in {0, 1}. As a NAT: you are given the
subspace dimension and asked for the nullity (n − k) or the rank. GATE DA 2024
used a projection onto a 2-D subspace of R³; GATE DA 2026 returned to the theme via
the centering matrix and its C² = C property.
Worked example — the 2024 and 2026 questions
Let M be the orthogonal projection of R³ onto a 2-dimensional subspace U. Evaluate M², M³, rank(M), and nullity(M).
Idempotence. A projection satisfies M² = M. Then M³ = M·M² = M·M = M as
well — every positive power of a projection equals itself.
Rank. The output of M fills exactly U, so rank(M) = dim(U) = 2.
Nullity. Rank–nullity in R³: nullity(M) = 3 − rank(M) = 3 − 2 = 1. The null
space is the 1-D direction perpendicular to U (the part the shadow discards).
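The question never names U, so to check these answers numerically we have to pick one. The sketch below assumes U is the plane z = x + y, spanned by the columns of a hypothetical matrix `A`, and builds M with the standard formula M = A(AᵀA)⁻¹Aᵀ for orthogonal projection onto the column space of a full-column-rank A. Any other 2-D subspace should give the same rank and nullity.

```python
import numpy as np

# Assumed example subspace: U is the plane z = x + y, spanned by the columns of A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Orthogonal projection onto the column space of A: M = A (A^T A)^{-1} A^T.
M = A @ np.linalg.solve(A.T @ A, A.T)

n = M.shape[0]
rank = np.linalg.matrix_rank(M)
nullity = n - rank                      # rank-nullity theorem

normal = np.array([1.0, 1.0, -1.0])     # perpendicular to both columns of A

print("M^2 == M:", np.allclose(M @ M, M))
print("M^3 == M:", np.allclose(np.linalg.matrix_power(M, 3), M))
print("rank:", rank, "nullity:", nullity)
print("M @ normal is zero:", np.allclose(M @ normal, 0.0))
```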
Now the centering matrix part used in 2026. Show C² = C for C = I − (1/n)11ᵀ.
The key fact is 1ᵀ1 = n (summing n ones):
C² = (I − (1/n)11ᵀ)(I − (1/n)11ᵀ)
= I − (1/n)11ᵀ − (1/n)11ᵀ + (1/n²)·1(1ᵀ1)1ᵀ
= I − (2/n)11ᵀ + (1/n²)·1·(n)·1ᵀ [since 1ᵀ1 = n]
= I − (2/n)11ᵀ + (1/n)11ᵀ
= I − (1/n)11ᵀ = C ✓
The two −(1/n)11ᵀ terms and the recovered +(1/n)11ᵀ collapse back to a single
−(1/n)11ᵀ, proving C is idempotent. Concretely for n = 3,
(1/3)11ᵀ is the 3-by-3 matrix of all 1/3s; squaring it returns itself, so
C² = C numerically too.
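The n = 3 case is small enough to check by machine. This is a minimal NumPy sketch; `J` is just a local name for the averaging matrix (1/3)·11ᵀ. Because C projects onto the mean-zero vectors, which form an (n − 1)-dimensional subspace, its rank should come out as 2, and the all-ones vector should be sent to zero.

```python
import numpy as np

n = 3
J = np.full((n, n), 1.0 / n)   # (1/3) * 1 1^T: every entry is 1/3
C = np.eye(n) - J

print(np.allclose(J @ J, J))              # the averaging matrix is idempotent
print(np.allclose(C @ C, C))              # so the centering matrix is too
print(np.allclose(C @ np.ones(n), 0.0))   # the all-ones vector is sent to zero
print(np.linalg.matrix_rank(C))           # rank of C
```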
Quick check
Practice this in an interview
PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.
OLS minimizes the sum of squared residuals. Setting the gradient of the loss to zero yields the normal equations XᵀXβ = Xᵀy, whose unique solution (when X has full column rank) is β = (XᵀX)⁻¹Xᵀy. The fitted values Xβ = Hy are the orthogonal projection of y onto the column space of X, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix.
t-SNE and UMAP are nonlinear dimensionality reduction algorithms designed primarily for 2D/3D visualization of high-dimensional data. Unlike PCA, they preserve local neighborhood structure rather than global variance, producing cleaner cluster separations in plots. They are generally a poor choice as a preprocessing step for training a supervised model: t-SNE is transductive (it has no built-in way to embed new points), and the output of both methods varies across runs and hyperparameter settings.
A 1x1 convolution applies a learned linear combination across channels at each spatial position, without looking at any spatial neighbourhood. It is used to change the number of channels cheaply, add non-linearity between pointwise operations, and build the bottleneck blocks at the core of Inception and ResNet-50+.