PCA & Dimensionality Reduction
Rotate to the axes of maximum variance: PCA reads off the eigenvectors of the covariance matrix as new orthogonal components, keeping the top few to shrink dimensions.
What you'll learn
- PCA finds orthogonal directions (principal components) of maximum variance
- The recipe: centre the data, form the covariance matrix, take its eigenvectors and eigenvalues
- Variance explained by a component = its eigenvalue divided by the sum of all eigenvalues
- Principal components are mutually orthogonal; PCA is unsupervised (unlike LDA)
Before you start
A dataset with many features is hard to see and slow to model, yet most of its variation often lies along just a couple of directions. PCA (Principal Component Analysis) finds those directions. It rotates the axes so the first new axis points along the direction of maximum variance (spread), the next along the direction of most remaining variance, and so on. Keep the first few of these axes and you have fewer dimensions while retaining most of the variance, provided those first few directions carry most of the spread.
From covariance to components
PCA’s new axes are the principal components. The recipe to find them is short:
- Centre the data: subtract the mean of each feature. If the features are on different scales, standardise them as well, because PCA is sensitive to scale.
- Form the covariance matrix of the features.
- Take its eigenvectors (these are the principal components, the new axis directions) and eigenvalues (the variance along each component).
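The three steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the helper name `pca_eig` and the toy data are assumptions made for the example, and the data is only centred, not standardised.

```python
import numpy as np


def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Step 1: centre each feature (column) on its mean.
    X_centred = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (rowvar=False: columns are features).
    cov = np.cov(X_centred, rowvar=False)
    # Step 3: eigh is intended for symmetric matrices and returns
    # eigenvalues in ascending order, so reorder to put the largest first.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]  # columns are the principal components
    # Project the centred data onto the top components to reduce dimensions.
    scores = X_centred @ components[:, :n_components]
    return scores, components, eigenvalues


# Made-up toy data: 200 points stretched mostly along the direction (1, 1).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

scores, components, eigenvalues = pca_eig(X, n_components=1)
```

Here `scores` holds the data reduced from two features to one, and the sample variance of that single column equals the largest eigenvalue, which is the sense in which an eigenvalue is "the variance along its component".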
Two facts fall straight out of this and are exactly what GATE tests:
- The components are mutually orthogonal. The covariance matrix is symmetric, so eigenvectors belonging to distinct eigenvalues are perpendicular, and eigenvectors that share a repeated eigenvalue can always be chosen perpendicular. PC1 ⟂ PC2 ⟂ PC3 …, every pair at 90°.
- The variance explained by a component is its eigenvalue as a fraction of the total: variance explained by PC_i = λ_i / (λ_1 + λ_2 + … + λ_d), where λ_i is the eigenvalue of the i-th component and d is the number of features.
How GATE asks this
Two question shapes recur. The first is an MCQ on the geometry: because the components are orthonormal, the angle between any two distinct components is 90°; GATE DA 2026 asked precisely this. The second is a NAT on variance explained: given the eigenvalues, compute the fraction one component captures. The arithmetic is always the eigenvalue over the sum of all eigenvalues.
Worked example — real GATE DA questions
(1) Orthogonality — a real 2026 question. Principal components are orthonormal, so any two distinct components are perpendicular. The angle between PC1 and PC10 is therefore 90° — no calculation needed, it follows from orthogonality alone.
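As a numerical sanity check of this argument, the sketch below computes the angle between PC1 and PC10 on made-up data with 10 correlated features. The data and the random mixing matrix are assumptions for illustration only; the exam question itself needs no computation.

```python
import numpy as np

# Made-up data: 500 samples of 10 features, correlated via a random mixing matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))

# np.cov centres the data internally; rowvar=False treats columns as features.
cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending eigenvalue order
components = eigenvectors[:, ::-1]  # column 0 is PC1, column 9 is PC10

pc1, pc10 = components[:, 0], components[:, 9]
# Both are unit vectors, so their dot product is the cosine of the angle.
cos_angle = np.clip(pc1 @ pc10, -1.0, 1.0)
angle_deg = np.degrees(np.arccos(cos_angle))

# Orthonormality of the whole set: this should be the 10x10 identity matrix.
gram = components.T @ components
```

Up to floating-point error, `angle_deg` comes out as 90 and `gram` as the identity, whichever pair of distinct components is picked.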
(2) Variance explained. Suppose the covariance matrix has eigenvalues [12, 3, 1]. The fraction of total variance captured by the first component is its eigenvalue over the sum:
total variance = 12 + 3 + 1 = 16
fraction for PC1 = 12 / 16 = 0.75 → 75%
So PC1 alone explains 0.75 (75%) of the variance — keeping just that one component would retain three-quarters of the spread.
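The same arithmetic, extended to every component and to the running total, can be written in a few lines of plain Python. The helper `components_needed` and the 0.9 threshold are assumptions added for illustration; they are not part of the question.

```python
eigenvalues = [12, 3, 1]

total = sum(eigenvalues)  # total variance = 16
ratios = [ev / total for ev in eigenvalues]  # share explained by each component

# Running total: variance retained when keeping the first k components.
cumulative = []
running = 0.0
for r in ratios:
    running += r
    cumulative.append(running)


def components_needed(cumulative, threshold):
    """Smallest k whose first k components reach the threshold (hypothetical helper)."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(cumulative)


k_for_90 = components_needed(cumulative, 0.9)
```

With these eigenvalues the shares are 0.75, 0.1875 and 0.0625, so the first two components together retain 0.9375 of the variance and `k_for_90` is 2.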
Quick check
Practice this in an interview
PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing features while retaining most information. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.
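Sensitivity to scale is easy to demonstrate on made-up data. In the sketch below, the two features (`height_m` and `income`) and their distributions are hypothetical and generated independently; the point is only that the feature measured in larger units dominates PC1 until the data is standardised.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
height_m = rng.normal(1.7, 0.1, size=n)      # spread of about 0.1
income = rng.normal(50_000, 10_000, size=n)  # spread of about 10,000
X = np.column_stack([height_m, income])


def first_component(X):
    """Return PC1 and its explained-variance share (illustrative helper)."""
    cov = np.cov(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    return eigenvectors[:, -1], eigenvalues[-1] / eigenvalues.sum()


# Raw units: the large-scale feature swamps the covariance matrix.
pc1_raw, share_raw = first_component(X)

# Standardised: each feature has zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc1_std, share_std = first_component(X_std)
```

On the raw data PC1 points almost exactly along the income axis and appears to explain nearly all the variance. After standardising, the two independent features share the variance roughly equally, which is the more honest picture.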
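t-SNE and UMAP are nonlinear dimensionality reduction algorithms designed primarily for 2D/3D visualization of high-dimensional data. Unlike PCA, they preserve local neighborhood structure rather than global variance, often producing cleaner cluster separations in plots. Neither is a good default preprocessing step for training a supervised model: the embeddings are stochastic and can change between runs, distances between clusters are not reliably meaningful, and standard t-SNE is transductive, so it cannot embed unseen points (UMAP implementations can transform new data, but the result still depends on the fitted embedding).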
As the number of features grows, the volume of the feature space increases exponentially, so a fixed amount of training data becomes increasingly sparse. Distance-based algorithms degrade because points become approximately equidistant; density estimation needs a sample size that grows exponentially with the dimension; and overfitting risk rises for any fixed training set size.
In high-dimensional spaces all pairwise distances concentrate around the same value, so the concept of a 'nearest' neighbour breaks down — the k-th nearest neighbour is almost as far as every other point. KNN's accuracy degrades sharply as dimensionality increases unless the data has much lower intrinsic dimensionality.
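The concentration effect can be seen in a small simulation. The sketch below assumes points drawn uniformly from the unit hypercube and uses a hypothetical helper, `relative_contrast`, that measures how much farther the farthest point is than the nearest one, relative to the nearest distance. Real datasets with low intrinsic dimensionality will behave less badly than this.

```python
import numpy as np

rng = np.random.default_rng(3)


def relative_contrast(dim, n_points=200):
    """(farthest - nearest) / nearest distance from one random query point."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return (dists.max() - dists.min()) / dists.min()


# Compare a low-dimensional space with two high-dimensional ones.
contrast = {dim: relative_contrast(dim) for dim in (2, 100, 1000)}
```

The contrast is large in 2 dimensions and shrinks steadily as the dimension grows, so in 1000 dimensions the nearest neighbour is only slightly closer than the farthest point.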