Machine Learning · Medium · Asked at Google, Amazon, Microsoft, Netflix

What is PCA, when should you use it, and what are its key limitations?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

PCA finds the orthogonal directions of maximum variance in the data and projects the data onto the lower-dimensional subspace they span, reducing the number of features while retaining most of the variance. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limitations are loss of interpretability, sensitivity to feature scale, and an assumption of linear structure.

How to think about it

PCA is often described as “dimensionality reduction,” but the precise goal is variance preservation: keep the directions that explain the most spread in the data.

What PCA does

Center the data (subtract the mean of each feature).
Compute the covariance matrix.
Decompose it into eigenvectors (principal components) and eigenvalues (variance explained per component).
Project the data onto the top-k eigenvectors.
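The four steps above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation: the function name `pca_fit_transform` and the synthetic two-feature dataset are assumptions made for the example.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)            # step 1: center each feature
    cov = np.cov(X_centered, rowvar=False)     # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 3: eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]          # eigh returns ascending order, so reverse it
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = X_centered @ eigvecs[:, :k]            # step 4: project onto the top-k eigenvectors
    return Z, eigvecs[:, :k], eigvals

rng = np.random.default_rng(0)
# Illustrative data: two strongly correlated features plus a little noise
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

Z, components, eigvals = pca_fit_transform(X, k=1)
explained = eigvals / eigvals.sum()            # share of variance per component
print(Z.shape, explained)
```

Because the two features are almost perfectly correlated, nearly all of the variance should land on the first component, and the variance of the projected data equals the first eigenvalue.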

A scree plot (explained variance per component) or a cumulative explained-variance curve helps choose k; a common heuristic is to retain enough components to explain 95% of the variance.
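The 95% heuristic can be computed from the cumulative explained-variance curve. The sketch below assumes scikit-learn's bundled digits dataset purely for illustration; the threshold and the dataset are both stand-ins for your own choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)              # 64 pixel features, illustrative only
X_scaled = StandardScaler().fit_transform(X)

pca = PCA().fit(X_scaled)                        # fit with all components kept
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance reaches the threshold
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k, cumulative[k - 1])
```

Plotting `cumulative` against the component index gives the curve described above; `k` marks where it first crosses the threshold.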

Each point is projected (dashed lines) onto PC1, the direction of maximum variance. The 2D dataset becomes 1D.

When to use PCA

High-dimensional, dense numeric data (images, spectral data, sensor arrays) fed to distance-based models where the curse of dimensionality degrades performance.
Multicollinearity in linear models: PCA produces orthogonal components whose scores are uncorrelated, which removes the correlation among inputs.
Training speed bottleneck: fewer features means faster matrix operations.
Visualization of high-dimensional data (project to 2–3 components for scatter plots).
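To illustrate the multicollinearity point, the following sketch builds a small synthetic dataset (an assumption for the example) in which two features are nearly collinear, then compares correlations before and after PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=500)
# Hypothetical data: three features, the first two nearly collinear
X = np.column_stack([
    base,
    base + rng.normal(scale=0.05, size=500),
    rng.normal(size=500),
])

corr_before = np.corrcoef(X, rowvar=False)
scores = PCA().fit_transform(X)                  # data expressed in component coordinates
corr_after = np.corrcoef(scores, rowvar=False)

print(corr_before[0, 1])                         # close to 1: strongly correlated inputs
print(np.abs(corr_after - np.eye(3)).max())      # near 0: component scores are uncorrelated
```

The component scores can then be fed to a linear model in place of the original features, at the cost of the interpretability discussed in the next section.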

When not to use PCA

When features are categorical (PCA operates on covariance, which is meaningless for binary/nominal columns).
When interpretability is required: principal components are linear combinations of all original features and have no direct business meaning.
When the signal is nonlinear — use t-SNE or UMAP for exploration instead.
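The nonlinear case can be seen on a toy example. The sketch below assumes two concentric rings generated with scikit-learn's `make_circles`; the class depends on radius, which no single linear projection can recover. The 0.65 radius cutoff is a value chosen by hand for this synthetic data.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

# Illustrative data: label 0 is the outer ring, label 1 the inner ring
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

pca = PCA(n_components=1).fit(X)
z = pca.transform(X)[:, 0]                       # 1-D linear projection onto PC1
pc1_ratio = pca.explained_variance_ratio_[0]     # no dominant direction exists here

# Share of outer-ring points whose projection lands inside the inner ring's range
inner_min, inner_max = z[y == 1].min(), z[y == 1].max()
overlap = np.mean((z[y == 0] >= inner_min) & (z[y == 0] <= inner_max))

# A nonlinear feature (distance from the origin) separates the rings
radius = np.linalg.norm(X, axis=1)
radius_acc = np.mean((radius < 0.65) == (y == 1))

print(pc1_ratio, overlap, radius_acc)
```

The first component captures only about half of the variance and the projected classes overlap, while a simple nonlinear feature separates them cleanly.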

Practical code

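from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data (an assumption, not part of the original answer); substitute your own X, y
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),   # PCA is scale-sensitive, so standardize first
    ("pca",   PCA(n_components=0.95, random_state=42)),  # keep enough components for 95% of variance
    ("clf",   SVC()),
])
pipe.fit(X_train, y_train)
print(pipe.named_steps["pca"].n_components_)   # number of components retained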
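To see why the scaling step matters, the sketch below fits PCA with and without standardization. The two features and their units are hypothetical, chosen so that they are independent but differ in variance by many orders of magnitude.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: independent, but recorded in very different units
height_m = rng.normal(1.7, 0.1, size=300)        # metres
income = rng.normal(50_000, 10_000, size=300)    # currency units
X = np.column_stack([height_m, income])

raw_ratio = PCA().fit(X).explained_variance_ratio_
scaled_ratio = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print(raw_ratio)      # first component is dominated by the large-unit feature
print(scaled_ratio)   # roughly balanced once features share a scale
```

Without scaling, the first component simply tracks whichever feature has the largest numeric range, regardless of how informative it is.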
