What are the assumptions and limitations of PCA, and when would it hurt your model?

For: Data Scientist, ML Engineer, Research Engineer

The short answer

PCA assumes linear relationships, that variance equals importance, and that components should be orthogonal. It can hurt when the predictive signal lives in low-variance directions, when relationships are nonlinear, or when interpretability matters, since components mix original features. It's also sensitive to scaling and outliers and is unsupervised, so it ignores the target.

How to think about it

The crisp answer

PCA rests on three assumptions: relationships are linear, high variance equals importance, and the meaningful axes are orthogonal. When any of these break, PCA can discard exactly the signal you need.

Why these matter

PCA captures directions of maximum variance, but variance and predictive power aren’t the same thing. As discussed in the Analytics Vidhya PCA interview questions, a low-variance direction can carry the class-separating signal. Because PCA is unsupervised and never looks at the target, it will discard that direction along with the noise.

When it hurts

Signal in low-variance directions: discarding the smallest components drops the discriminative feature.
Nonlinear structure: PCA only finds linear axes; use kernel PCA, autoencoders, or UMAP instead.
Interpretability needs: each component blends all original features, so you lose “feature X drove this.”
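Heavy outliers or unscaled features: a few extreme points or large-unit features dominate the variance, so the leading components track them instead of the underlying structure. Standardize features and check for outliers first.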
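
To make the scaling point concrete, here is a minimal sketch using scikit-learn on synthetic data. The two features (an income in dollars and an age in years) and their distributions are assumptions made up for illustration; they are independent by construction, so neither direction is more informative than the other.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: two independent features on very different scales.
income = rng.normal(60_000, 20_000, size=n)  # dollars, std around 20,000
age = rng.normal(40, 10, size=n)             # years, std around 10
X = np.column_stack([income, age])

# Without scaling, the first component is essentially the income axis,
# simply because income has the larger numeric spread.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# After standardization both features have unit variance,
# so neither dominates the components.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("explained variance ratio, raw:   ", raw_ratio)
print("explained variance ratio, scaled:", scaled_ratio)
```

On the raw data the first component takes nearly all of the variance purely because of units; after standardization the split is roughly even. Outliers cause a similar distortion, which is why a robust scaler or an outlier check is worth considering before PCA.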

Concrete example

In a fraud dataset where fraud differs subtly along a low-variance axis, projecting onto the top components can erase the fraud signal and tank recall. In that case, dropping PCA improves the model.
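
The scenario above can be sketched on synthetic data. Everything here is an assumption for illustration, not a real fraud dataset: five correlated "bulk" features that share one latent factor and carry no label information, one "signal" feature whose mean shifts for fraud, a 5% fraud rate, and a single retained component.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 4000

# Hypothetical labels: roughly 5% of rows are fraud.
y = (rng.random(n) < 0.05).astype(int)

# Five correlated features driven by one latent factor. They hold most of
# the variance after standardization but are unrelated to the label.
latent = rng.normal(size=n)
bulk = latent[:, None] + 0.3 * rng.normal(size=(n, 5))

# One feature that separates the classes but is a small share of total variance.
signal = 2.0 * y + 0.3 * rng.normal(size=n)
X = np.column_stack([bulk, signal])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Same classifier, with and without a PCA step that keeps only the top component.
with_pca = make_pipeline(StandardScaler(), PCA(n_components=1), LogisticRegression())
without_pca = make_pipeline(StandardScaler(), LogisticRegression())

recall_with_pca = recall_score(y_test, with_pca.fit(X_train, y_train).predict(X_test))
recall_without_pca = recall_score(y_test, without_pca.fit(X_train, y_train).predict(X_test))

print("fraud recall with PCA:   ", recall_with_pca)
print("fraud recall without PCA:", recall_without_pca)
```

In this setup the top component follows the shared latent factor, so the classifier behind PCA sees almost nothing about the label and fraud recall collapses. The model trained on the standardized features keeps the signal. Retaining more components would recover it here, which is the point: how many components are "enough" depends on where the signal lives, not on variance alone.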

The common trap

The trap is using PCA reflexively on the belief that dimensionality reduction always makes a model better. Often regularization or supervised feature selection generalizes better while staying interpretable. Also, fit PCA on the training data only, then apply that fitted transform to the test data; fitting on the full dataset leaks test information into the model. Follow-up: “How would you handle nonlinearity?” Use kernel PCA or an autoencoder; for visualization, t-SNE or UMAP.
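
Both points, fitting on training data only and handling nonlinearity, can be sketched with a scikit-learn pipeline. The dataset (concentric circles from `make_circles`), the RBF kernel, `gamma=10`, and the component counts are assumptions chosen for illustration and would need tuning on real data.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Two classes arranged as concentric circles: no straight line separates them.
X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting the projection inside the pipeline means it is fit on the
# training split only; the test split is merely transformed at score time.
linear_model = make_pipeline(PCA(n_components=2), LogisticRegression())
kernel_model = make_pipeline(
    KernelPCA(n_components=10, kernel="rbf", gamma=10),  # illustrative settings
    LogisticRegression(),
)

linear_acc = linear_model.fit(X_train, y_train).score(X_test, y_test)
kernel_acc = kernel_model.fit(X_train, y_train).score(X_test, y_test)

print("test accuracy, linear PCA:", linear_acc)
print("test accuracy, kernel PCA:", kernel_acc)
```

Linear PCA can only rotate these two features, so the linear classifier behind it stays far from separating the circles, while the RBF kernel projection makes them linearly separable. Here kernel PCA acts as a nonlinear feature map rather than a reduction in feature count. The same pipeline object can be passed to `cross_val_score` so the projection is refit inside each fold.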
