datarekha

How does PCA work, and how do you choose the number of components?

The short answer

PCA finds orthogonal directions (principal components) of maximum variance by computing the eigenvectors of the covariance matrix, then projects the data onto the top components. Choose the number of components by the cumulative explained variance ratio (e.g. enough to retain 95%), a scree-plot elbow, or downstream task performance. Standardize features first whenever they are on different scales or units: PCA is variance-driven, so unscaled large-magnitude features would otherwise dominate.

How to think about it

The crisp answer

PCA is a linear dimensionality-reduction method that finds new orthogonal axes — principal components — along which the data varies most, then keeps only the top few. Each component is a linear combination of the original features, ordered by how much variance it captures.

How it works

Standardize the data (center each feature to zero mean and scale it to unit variance), compute the covariance matrix, and take its eigenvectors and eigenvalues. Equivalently, take the SVD of the centered data matrix, which is what most libraries do in practice. The Analytics Vidhya PCA question set notes that each eigenvalue equals the variance explained by its component; the eigenvector with the largest eigenvalue is the first principal component. Projecting the data onto the top components gives a lower-dimensional representation that preserves most of the variance.
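The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name pca_eig and the synthetic data are assumptions made up for the example.

```python
import numpy as np

def pca_eig(X, n_components):
    """Illustrative PCA via eigendecomposition of the covariance matrix."""
    # Standardize: zero mean and unit (sample) variance per feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data (features x features).
    cov = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project onto the top components.
    scores = Z @ eigvecs[:, :n_components]
    explained_ratio = eigvals / eigvals.sum()
    return scores, eigvals, explained_ratio

# Synthetic data (assumption): 4 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

scores, eigvals, ratio = pca_eig(X, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio)          # share of total variance per component, descending
```

The variance of the data along each projected axis equals the corresponding eigenvalue, which is why the eigenvalues can be read directly as "variance explained".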

Choosing the number of components

  • Cumulative explained variance: keep enough components to retain a target like 95% of total variance.
  • Scree plot: look for the elbow where eigenvalues level off.
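  • Downstream performance: if PCA feeds a classifier, treat the component count as a hyperparameter and pick the value with the best cross-validated score on the downstream metric.
  • Kaiser rule (standardized data): keep components with eigenvalue > 1, i.e. those that explain more variance than a single standardized feature.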
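Two of these rules, cumulative explained variance and the Kaiser rule, reduce to simple arithmetic on the eigenvalues. The sketch below uses a hypothetical helper choose_k and made-up eigenvalues purely for illustration.

```python
import numpy as np

def choose_k(eigvals, target=0.95):
    """Return (k by cumulative variance, k by Kaiser rule, cumulative ratios)."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative ratio reaches the target (clamped to n).
    k_var = min(int(np.searchsorted(cum, target)) + 1, len(eigvals))
    # Kaiser rule: only meaningful when the features were standardized.
    k_kaiser = int(np.sum(eigvals > 1.0))
    return k_var, k_kaiser, cum

# Made-up eigenvalues of a 4-feature standardized dataset (they sum to 4).
eigvals = [2.5, 1.2, 0.2, 0.1]
k_var, k_kaiser, cum = choose_k(eigvals, target=0.95)
print(k_var)     # 3: cumulative ratios are roughly 0.625, 0.925, 0.975, 1.0
print(k_kaiser)  # 2: only 2.5 and 1.2 exceed 1
```

The two rules disagree here (3 versus 2 components), which is typical; they are heuristics, and downstream performance is usually the tie-breaker. If you use scikit-learn, passing a float between 0 and 1 as n_components to PCA applies the cumulative-variance rule for you.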

The common trap

Forgetting to standardize first. PCA maximizes variance, so an unscaled large-magnitude feature dominates the first component regardless of its importance. Other gotchas: components are linear mixtures of all the original features and so are harder to interpret, PCA captures only linear structure, and it is unsupervised, so it can discard low-variance directions that are actually predictive. Expected follow-up: “PCA vs t-SNE/UMAP?” PCA is linear, fast, and deterministic, which makes it a good fit for compression and preprocessing; t-SNE and UMAP are nonlinear and stochastic and are used mainly for visualization, and t-SNE in particular is a poor choice for feature compression.
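A small experiment makes the scaling trap concrete. This is a sketch on synthetic data: the helper first_component and the factor of 1000 (think metres versus millimetres) are assumptions chosen for the example.

```python
import numpy as np

def first_component(X):
    """First principal direction via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(42)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.5, size=500)      # correlated with a
X = np.column_stack([a, b * 1000.0])         # same signal, much larger units

pc1_raw = first_component(X)                 # no scaling
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized
pc1_std = first_component(Z)

print(np.abs(pc1_raw))  # almost all weight on the second (large-unit) feature
print(np.abs(pc1_std))  # equal weights on both features
```

Without scaling, the first component is essentially the large-unit feature by itself. After standardization both features contribute equally, which reflects the actual correlation between them.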
