How does PCA work, and how do you choose the number of components?

PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix, then projects data onto the top components. Choose the number of components by the cumulative explained variance ratio (e.g. enough to retain 95%), a scree-plot elbow, or downstream task performance. Standardize features first whenever they are on different scales: PCA is variance-driven, so large-scale features would otherwise dominate the components.
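The cumulative-explained-variance rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a library API: `choose_k` and its `threshold` parameter are hypothetical names, and the input is assumed to be the eigenvalues of the covariance matrix.

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.95):
    """Return the smallest k whose top-k eigenvalues keep >= `threshold` of the total variance."""
    # Rank eigenvalues largest first, as PCA ranks its components.
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Cumulative explained-variance ratio after keeping 1, 2, ..., d components.
    cumulative = np.cumsum(lam) / lam.sum()
    # Index of the first ratio >= threshold; +1 turns the index into a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(lam))  # guard against float round-off when threshold is 1.0

# Cumulative ratios for [12, 3, 1] are 0.75, 0.9375, 1.0.
print(choose_k([12, 3, 1]))                 # 3  (0.9375 falls short of 0.95)
print(choose_k([12, 3, 1], threshold=0.9))  # 2
```

In practice, scikit-learn's `PCA` exposes `explained_variance_ratio_` and also accepts a fractional `n_components` (such as `0.95`) to apply a similar cutoff.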

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing the number of features while retaining most of the variance. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to scale, and an assumption of linear structure.

What are the assumptions and limitations of PCA, and when would it hurt your model?

PCA assumes linear relationships, that variance equals importance, and that components should be orthogonal. It can hurt when the predictive signal lives in low-variance directions, when relationships are nonlinear, or when interpretability matters, since components mix original features. It's also sensitive to scaling and outliers and is unsupervised, so it ignores the target.
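To see the scaling sensitivity concretely, here is a sketch on synthetic data, assuming two independent, equally informative features where the second is hypothetically recorded in units 100 times smaller (so its values and variance are much larger). `pc1_and_ratio` is an illustrative helper, not a library function.

```python
import numpy as np

def pc1_and_ratio(X):
    """Return the first principal direction and its explained-variance ratio."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
# Hypothetical setup: same kind of signal, but column 2 is in units 100x smaller.
X_raw = np.column_stack([a, 100 * b])

# Standardize: z-score each column so every feature has unit variance.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

pc1_raw, ratio_raw = pc1_and_ratio(X_raw)
pc1_std, ratio_std = pc1_and_ratio(X_std)
print(np.round(np.abs(pc1_raw), 3), round(ratio_raw, 4))  # PC1 almost exactly follows column 2
print(round(ratio_std, 2))  # near 0.5: neither feature dominates once standardized
```

Without standardization, PC1 simply tracks whichever column has the biggest numbers; after z-scoring, the two features share the variance roughly equally.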

What's the difference between feature selection and dimensionality reduction like PCA?

Feature selection keeps a subset of the original features and discards the rest, so the surviving features stay interpretable. Dimensionality reduction like PCA creates new features that are combinations of the originals, compressing information but losing direct interpretability. Choose feature selection when you need to explain which inputs matter, and PCA when you mainly need a compact representation and don't need named features.
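A small sketch of the contrast, on hypothetical random data with four made-up feature names: selection slices columns and keeps their names, while each PCA score is a weighted sum of every centred original feature. The column choice `keep = [0, 2]` is arbitrary, standing in for whatever selection method you would actually use.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))          # hypothetical data: 200 rows, 4 features
feature_names = ["f0", "f1", "f2", "f3"]

# Feature selection: keep a subset of the original columns (chosen arbitrarily here).
keep = [0, 2]
X_selected = X[:, keep]
selected_names = [feature_names[i] for i in keep]  # the names survive

# PCA: new features are weighted combinations of ALL original columns.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = eigvecs[:, ::-1][:, :2]            # top-2 principal directions, largest variance first
X_pca = Xc @ W                         # score k = sum over j of W[j, k] * centred feature j

print(selected_names)                  # ['f0', 'f2']
print(np.round(W[:, 0], 2))            # PC1 loadings: one weight per original feature
```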

PCA & Dimensionality Reduction — GATE DA

What you'll learn

PCA finds orthogonal directions (principal components) of maximum variance

The recipe: centre the data, form the covariance matrix, take its eigenvectors and eigenvalues

Variance explained by a component = its eigenvalue divided by the sum of all eigenvalues

Principal components are mutually orthogonal; PCA is unsupervised (unlike LDA)

Last lesson left clustering stranded in high dimensions, where distances blur and no plot can show you the data — and pointed at the cure: stop shrinking the rows and start shrinking the columns. A table with fifty features is hard to see and slow to model, yet the data often really lives along just two or three directions, the rest being noise or redundancy. PCA (Principal Component Analysis) finds those directions. And it is, at last, the very method LDA was forever measured against.

PCA rotates the axes so the first new axis points along the direction of maximum variance — the way the cloud is most spread out — the next along the most variance left over, perpendicular to the first, and so on. Keep the first few of these axes and you have fewer dimensions carrying almost all the information. Where LDA used the labels to pull classes apart, PCA ignores labels entirely and chases raw spread; that single difference is the whole contrast the earlier lesson promised.

From covariance to components

PCA’s new axes are the principal components, and the recipe to find them is short:

PCA = eigendecomposition of the covariance matrix. Eigenvectors are the directions (components); eigenvalues are the variance captured along each.

Centre the data — subtract the mean of each feature (usually standardise too).
Form the covariance matrix of the features.
Take its eigenvectors (these are the principal components, the new axis directions) and eigenvalues (the variance along each component).
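The three steps translate almost line for line into NumPy. This is a teaching sketch under stated assumptions: `pca` is a hypothetical helper, it centres but does not standardise (add that step if features are on different scales), and the sign of each eigenvector is arbitrary, so components may come out flipped. `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix (teaching sketch).

    X: array of shape (n_samples, n_features).
    Returns (components, eigenvalues, scores); components holds one principal
    direction per column, largest variance first.
    """
    # Step 1: centre the data (subtract each feature's mean).
    Xc = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features, shape (n_features, n_features).
    cov = np.cov(Xc, rowvar=False)
    # Step 3: eigenvectors are the directions, eigenvalues the variance along each.
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # re-sort so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the top-k components and project the centred data onto them.
    components = eigvecs[:, :k]
    scores = Xc @ components
    return components, eigvals, scores

# Usage on hypothetical correlated 3-feature data.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing
components, eigvals, scores = pca(X, k=2)
print(scores.shape)  # (500, 2)
```

Production libraries often compute the same directions from an SVD of the centred data instead of forming the covariance matrix; the result agrees up to sign.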

Two facts fall straight out of this, and they are exactly what GATE tests:

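The components are mutually orthogonal: the covariance matrix is symmetric, so its eigenvectors can always be chosen perpendicular to one another (eigenvectors for distinct eigenvalues are automatically perpendicular). PC1 ⟂ PC2 ⟂ PC3 …, every pair at 90°.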
The variance explained by a component is its eigenvalue as a fraction of the total:

$$\text{variance explained by PC}_i = \frac{\lambda_i}{\sum_j \lambda_j}$$

Bigger eigenvalue = more spread captured. Keep the top-k components to retain most of the total variance with fewer dimensions.
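Both facts can be checked numerically. A minimal sketch, assuming hypothetical random data with four features of deliberately unequal spread: orthonormality shows up as V^T V equal to the identity matrix, and the explained-variance ratios are the eigenvalues divided by their sum.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 4 independent features with unequal spreads.
X = rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

eigvals, V = np.linalg.eigh(cov)
eigvals, V = eigvals[::-1], V[:, ::-1]   # largest eigenvalue (PC1) first

# Fact 1: the components are orthonormal, so V^T V is the identity.
print(np.allclose(V.T @ V, np.eye(4)))   # True

# Fact 2: variance explained by PC_i is lambda_i divided by the sum of all eigenvalues.
ratios = eigvals / eigvals.sum()
print(np.round(ratios, 3))               # largest first; the ratios sum to 1
```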

[Interactive figure: PCA axes on a 2D point cloud. PC1 (solid) is the long axis, along maximum variance; PC2 (dashed) is the short, perpendicular one. In the configuration shown, PC1 explains 91.7% and PC2 8.3% of the variance. Projecting onto PC1 collapses the cloud to 1D, the 2D-to-1D reduction in action.]
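Projecting onto PC1, which is the 2D-to-1D reduction, can be sketched in NumPy on a hypothetical point cloud stretched along the diagonal (the mean offset of (0.4, 0.3) is arbitrary). Each point becomes a single coordinate `z` along PC1; mapping `z` back gives its position on the PC1 line, and what is lost is exactly the component along PC2.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical 2D cloud: long spread along the diagonal, small perpendicular noise.
t = rng.normal(size=200)
noise = 0.2 * rng.normal(size=200)
X = np.column_stack([t + noise, t - noise]) + np.array([0.4, 0.3])

mean = X.mean(axis=0)
Xc = X - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]                 # largest-variance direction (eigh sorts ascending)

z = Xc @ pc1                         # 1D representation: one number per point
X_back = mean + np.outer(z, pc1)     # those points placed back on the PC1 line in 2D

print(z.shape)                       # (200,)
print(eigvals[-1] / eigvals.sum())   # share of the variance the 1D version keeps
```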

How GATE asks this

Two recurring shapes. An MCQ on the geometry: because components are orthonormal, the angle between any two of them is 90° — GATE DA 2026 asked precisely this. And a NAT on variance explained: given the eigenvalues, compute the fraction one component captures. The arithmetic is always eigenvalue over the sum.

Worked example — real GATE DA questions

(1) Orthogonality — a real GATE DA 2026 question. Principal components are orthonormal, so any two distinct components are perpendicular. The angle between PC1 and PC10 is therefore 90° — no calculation needed; it follows from orthogonality alone.
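Orthogonality can also be confirmed by computing the angle directly from the dot product, cos θ = (u · v) / (‖u‖ ‖v‖). A sketch on hypothetical random data with ten features, so that a PC10 exists:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))            # hypothetical data with 10 features
Xc = X - X.mean(axis=0)
eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
V = V[:, ::-1]                            # column 0 = PC1, ..., column 9 = PC10

pc1, pc10 = V[:, 0], V[:, 9]
cos_theta = pc1 @ pc10 / (np.linalg.norm(pc1) * np.linalg.norm(pc10))
# clip keeps arccos inside its domain despite floating-point noise
angle = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(round(angle, 6))                    # 90.0
```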

(2) Variance explained. Suppose the covariance matrix has eigenvalues [12, 3, 1]. The fraction of total variance captured by the first component is its eigenvalue over the sum:

total variance = 12 + 3 + 1 = 16

fraction for PC1 = 12 / 16 = 0.75   →   75%

So PC1 alone explains 0.75 (75%) of the variance — keeping just that one component retains three-quarters of the spread, exactly as the lopsided eigenvalues suggested.
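The same arithmetic as a tiny helper. `explained_fraction` is a hypothetical name; passing `k` greater than 1 sums the top-k eigenvalues, giving the cumulative share kept by the top-k components.

```python
def explained_fraction(eigenvalues, k=1):
    """Fraction of total variance kept by the k largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)  # components are ranked by eigenvalue, largest first
    return sum(lam[:k]) / sum(lam)           # lambda_1 + ... + lambda_k over the sum of all

print(explained_fraction([12, 3, 1]))        # 0.75    (12 / 16)
print(explained_fraction([12, 3, 1], k=2))   # 0.9375  (15 / 16)
```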

In one breath

PCA shrinks dimensions by rotating to new orthogonal axes of greatest variance: centre the data, form the covariance matrix, and take its eigenvectors (the principal components, every pair at 90° because the matrix is symmetric) and eigenvalues (the variance along each), so the variance explained by a component is λᵢ / Σλ and keeping the top-k components retains most of the spread in fewer dimensions — and it is unsupervised, using only feature variance and ignoring the labels that its supervised rival LDA exploits.

Practice

Quick check


Q1 (Recall): Which statements about PCA are TRUE? (select all that apply)

Q2 (Recall): How does PCA differ from LDA? (select all that apply)

Q3 (Recall): Why must data be centred (mean-subtracted), and usually standardised, before PCA?

Q4 (Trace): The covariance matrix of a dataset has eigenvalues [12, 3, 1]. What fraction of the total variance is explained by the first principal component? (numerical answer, 2 decimals)

Q5 (Trace): Principal components are orthonormal. What is the angle (in degrees) between PC1 and PC10? (the 2026 PYQ; numerical answer)

Q6 (Apply): Eigenvalues of the covariance matrix are [5, 3, 2]. What fraction of variance do the TOP TWO components together explain? (numerical answer, 2 decimals)

A question to carry forward

That closes the machine-learning chapter — and it is worth seeing what every single method in it shared. Regression, the classifiers, the neural net, clustering, PCA: each one learned from data. Hand it enough examples and it generalised, fitting weights or finding structure that the data itself revealed. The data was always the teacher.

But picture a problem with no data to learn from at all. Solve this maze. Win this game of chess. Find the cheapest flights from Delhi to Lisbon. There is no dataset of “solved mazes” to train on; there are only rules and a goal, and the answer must be reasoned out by exploring possibilities, not generalised from examples. This is the other great branch of artificial intelligence, and the next chapter opens it. Here is the thread onward: how do you turn a maze, a puzzle, a route map, or a game into something a computer can explore systematically — what are the handful of pieces every such problem reduces to, and what does the space of all possibilities look like once you lay it out?
