What are the assumptions and limitations of PCA, and when would it hurt your model?
PCA assumes linear relationships, that variance equals importance, and that components should be orthogonal. It can hurt when the predictive signal lives in low-variance directions, when relationships are nonlinear, or when interpretability matters, since components mix original features. It's also sensitive to scaling and outliers and is unsupervised, so it ignores the target.
How to think about it
The crisp answer
PCA rests on three assumptions: relationships are linear, high variance equals importance, and the meaningful axes are orthogonal. When any of these break, PCA can discard exactly the signal you need.
Why these matter
PCA captures directions of maximum variance, but variance and predictive power aren’t the same thing. As discussed in the Analytics Vidhya PCA interview questions, a low-variance direction can carry the class-separating signal, and PCA — being unsupervised — will happily throw it away because it never looks at the target.
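A small synthetic check makes the point concrete. This is a minimal sketch, not a benchmark: the data, the variable names, and the `threshold_accuracy` helper are all made up for illustration. One feature is pure high-variance noise, and the other has low variance but carries the class difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # samples per class (arbitrary choice for this sketch)

# Feature 0: high variance, no class information.
# Feature 1: low variance, but the class means differ along it.
noise = rng.normal(0.0, 10.0, size=2 * n)
signal = np.concatenate([rng.normal(-1.0, 0.3, size=n),
                         rng.normal(+1.0, 0.3, size=n)])
X = np.column_stack([noise, signal])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# PCA via eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                      # column k = scores on component k


def threshold_accuracy(z, y):
    """Accuracy of one threshold at the midpoint of the class means,
    ignoring the (arbitrary) sign of the component."""
    cut = (z[y == 0].mean() + z[y == 1].mean()) / 2
    acc = np.mean((z > cut) == (y == 1))
    return max(acc, 1 - acc)


explained = eigvals / eigvals.sum()
acc_pc1 = threshold_accuracy(scores[:, 0], y)
acc_pc2 = threshold_accuracy(scores[:, 1], y)
print(f"PC1: {explained[0]:.1%} of variance, accuracy {acc_pc1:.2f}")
print(f"PC2: {explained[1]:.1%} of variance, accuracy {acc_pc2:.2f}")
```

In this setup the first component explains nearly all the variance yet separates the classes about as well as a coin flip, while the second component, the one a "keep the top component" rule would drop, separates them almost perfectly.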
When it hurts
- Signal in low-variance directions: if the class-separating direction has low variance, keeping only the top components discards it.
- Nonlinear structure: PCA only finds linear axes; use kernel PCA, autoencoders, or UMAP instead.
- Interpretability needs: each component blends all original features, so you lose “feature X drove this.”
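- Heavy outliers or unscaled features: PCA is variance-based, so a feature on a large scale or a few extreme points can dominate the leading components. Standardize features and deal with outliers first.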
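The scaling point in the last bullet is easy to see numerically. The sketch below uses made-up data (a hypothetical income in dollars and age in years) and a small `first_component` helper written for this example. It covers scaling only, not outliers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two correlated measurements in very different units
# (hypothetical: income in dollars, age in years).
age = rng.normal(40.0, 10.0, size=n)
income = 50_000.0 + 1_000.0 * (age - 40.0) + rng.normal(0.0, 15_000.0, size=n)
X = np.column_stack([income, age])


def first_component(X):
    """Unit-length direction of maximum variance (first principal component)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


raw_pc1 = first_component(X)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance
std_pc1 = first_component(X_std)

print("PC1 loadings, raw [income, age]:         ", np.round(np.abs(raw_pc1), 3))
print("PC1 loadings, standardized [income, age]:", np.round(np.abs(std_pc1), 3))
```

On the raw data the first component is essentially the income axis, purely because dollars produce larger numbers than years. After standardizing, both features load equally on the first component, which reflects their correlation rather than their units.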
Concrete example
In a fraud dataset where fraud differs subtly along a low-variance axis, projecting onto the top components can erase the fraud signal and tank recall — a case where dropping PCA improves the model.
The common trap
Reaching for PCA reflexively on the assumption that dimensionality reduction always helps. Regularization or supervised feature selection often generalizes better and keeps the model interpretable. To avoid leakage, fit PCA on the training data only, then apply that fitted transform to the test data. Follow-up: “How would you handle nonlinearity?” Kernel PCA or an autoencoder; for visualization, t-SNE or UMAP.
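The leakage rule can be sketched as a fit step and a separate apply step. This is an illustrative NumPy version on synthetic data. The helpers `fit_pca` and `apply_pca` are hypothetical names for this example, not a library API.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic correlated features, for illustration only.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Split first, so the test rows never influence the fitted transform.
idx = rng.permutation(len(X))
train, test = X[idx[:150]], X[idx[150:]]


def fit_pca(X, k):
    """Learn the centering vector and top-k principal directions from X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def apply_pca(X, mean, components):
    """Project X using a mean and components learned elsewhere."""
    return (X - mean) @ components.T


mean, components = fit_pca(train, k=2)        # fit on training rows only
Z_train = apply_pca(train, mean, components)
Z_test = apply_pca(test, mean, components)    # reuse the training statistics

print(Z_train.shape, Z_test.shape)
```

The key detail is that the test rows are centered with the training mean and projected onto the training components. With cross-validation the same applies per fold, which is why PCA is usually wrapped in a pipeline (for example scikit-learn's `Pipeline`) rather than run once on the full dataset.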