How does PCA work, and how do you choose the number of components?
PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix, then projects data onto the top components. Choose the number of components by the cumulative explained variance ratio (e.g. enough to retain 95%), a scree-plot elbow, or downstream task performance. Standardize features first unless they already share a scale, since PCA is variance-driven and large-magnitude features would otherwise dominate.
How to think about it
The crisp answer
PCA is a linear dimensionality-reduction method that finds new orthogonal axes — principal components — along which the data varies most, then keeps only the top few. Each component is a linear combination of the original features, ordered by how much variance it captures.
How it works
Standardize the data, compute the covariance matrix, and take its eigenvectors and eigenvalues (or, equivalently, take the SVD of the centered data matrix). The Analytics Vidhya PCA question set notes that each eigenvalue equals the variance explained by its component; the eigenvector with the largest eigenvalue is the first principal component. You project the data onto the top components to get a lower-dimensional representation that preserves most of the variance.
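The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the function name `pca_eig` and the synthetic data are assumptions made for the example, and it assumes no feature is constant (otherwise standardization would divide by zero).

```python
import numpy as np

def pca_eig(X, n_components):
    """Illustrative PCA via eigendecomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize: zero mean, unit variance per feature (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of the standardized data (this is the correlation matrix of X).
    cov = np.cov(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the largest eigenvalue (first principal component) comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # columns are principal directions
    scores = Z @ components                  # projected, lower-dimensional data
    return scores, components, eigvals

# Synthetic data (hypothetical): three features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]
scores, components, eigvals = pca_eig(X, n_components=2)
```

The variance of each column of `scores` matches the corresponding eigenvalue, which is the sense in which an eigenvalue is the variance explained by its component.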
Choosing the number of components
- Cumulative explained variance: keep enough components to retain a target like 95% of total variance.
- Scree plot: look for the elbow where eigenvalues level off.
- Downstream performance: if PCA feeds a classifier, pick the count that maximizes validation accuracy.
- Kaiser rule (standardized data): keep components with eigenvalue > 1, i.e. those that explain more variance than a single standardized feature.
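Two of these criteria, cumulative explained variance and the Kaiser rule, reduce to a few lines once the eigenvalues are known. The sketch below uses hypothetical helper names (`n_components_for_variance`, `kaiser_count`) and made-up eigenvalues chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def n_components_for_variance(eigvals, target=0.95):
    """Smallest number of components whose cumulative variance ratio reaches target."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratio = eigvals / eigvals.sum()          # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # Index of the first cumulative value >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target) + 1)
    # Guard against floating-point rounding when target is 1.0.
    return min(k, len(eigvals)), cumulative

def kaiser_count(eigvals):
    """Number of components with eigenvalue > 1 (meaningful for standardized data)."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))

# Made-up eigenvalues for four standardized features (they sum to 4).
eigvals = [2.5, 1.2, 0.2, 0.1]
k, cumulative = n_components_for_variance(eigvals, target=0.95)
kaiser_k = kaiser_count(eigvals)
```

Here the cumulative ratios are 0.625, 0.925, 0.975, and 1.0, so the 95% target needs three components while the Kaiser rule keeps two. The criteria can disagree, which is one reason to check downstream performance as well.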
The common trap
Forgetting to standardize first: PCA maximizes variance, so an unscaled large-magnitude feature dominates the first component regardless of importance. Other gotchas: principal components are harder to interpret than the original features, PCA captures only linear structure, and it’s unsupervised, so it can discard low-variance directions that are actually predictive. Expected follow-up: “PCA vs t-SNE/UMAP?” PCA is linear, fast, and deterministic (up to the sign of each component), which suits compression; t-SNE and UMAP are nonlinear and used mainly for visualization rather than feature compression.
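The scaling trap is easy to reproduce on synthetic data. In this sketch (the helper `first_component` and the data are assumptions for illustration), two features carry related signal but one is expressed in units 1000 times larger.

```python
import numpy as np

def first_component(X):
    """First principal direction of X via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Synthetic data (hypothetical): 'large' is correlated with 'small' but in bigger units.
rng = np.random.default_rng(0)
small = rng.normal(size=500)
large = 1000.0 * (small + rng.normal(size=500))
X = np.column_stack([small, large])

# Without scaling, the first component points almost entirely along 'large'.
raw_pc1 = first_component(X)

# After standardization, both features contribute equally to the first component.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scaled_pc1 = first_component(Z)
```

On the raw data the loading on the large-unit feature is close to 1 in absolute value; after standardization the two loadings have equal magnitude, because the first eigenvector of a 2x2 correlation matrix with nonzero correlation always weights both features equally.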