What's the difference between feature selection and dimensionality reduction like PCA?

For: Data Scientist, ML Engineer, Data Analyst

The short answer

Feature selection keeps a subset of the original features and discards the rest, so the surviving features stay interpretable. Dimensionality reduction like PCA creates new features that are combinations of the originals, compressing information but losing direct interpretability. Choose feature selection when you need to explain which inputs matter, and PCA when you mainly need a compact representation and don't need named features.

How to think about it

The crisp answer

Both shrink the input space, but differently. Feature selection picks a subset of the original features and throws the rest away — the features you keep are the real, named ones. Dimensionality reduction like PCA constructs new features (linear combinations of the originals) that compress the information into fewer dimensions.
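A minimal sketch of that difference, using NumPy only. The feature names, the synthetic data, and the choice of a plain variance filter as the selector are all assumptions made for illustration; any selection method would show the same contrast.

```python
import numpy as np

# Synthetic data with hypothetical feature names (illustrative assumption).
rng = np.random.default_rng(0)
names = ["income", "tenure", "noise"]
X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 0.1])

k = 2

# Feature selection: keep the k highest-variance columns, untouched.
# (A variance filter is only a stand-in for whatever selector you use.)
variances = X.var(axis=0)
keep = np.sort(np.argsort(variances)[-k:])
X_selected = X[:, keep]
selected_names = [names[i] for i in keep]

# PCA: centre the data, then project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]          # each row holds weights over ALL original features
X_pca = Xc @ components.T    # new columns are blends, not named features

print("selected:", selected_names)
print("PCA weights per component:\n", components.round(3))
```

The selected matrix is a column slice of the original, so its columns keep their names. The PCA matrix has the same shape, but each column is defined only by a row of weights over every input.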

Why the distinction matters

The key consequence is interpretability. After feature selection you can still say “income and tenure drive the prediction.” After PCA you have “component 1,” a blend of all features, which is hard to explain to a stakeholder or regulator. As the Analytics Vidhya PCA questions note, PCA’s components are linear combinations of the original variables, ordered by how much variance each explains; they are not the original variables themselves.

How they handle redundancy

Feature selection removes irrelevant or redundant features but can’t merge correlated ones into a single signal.
PCA explicitly decorrelates: correlated features collapse into shared components, which is great for multicollinearity.
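The redundancy point can be sketched with two nearly duplicate features. The data is synthetic, the variable names are hypothetical, and PCA is computed directly with NumPy's SVD rather than through any particular library.

```python
import numpy as np

# Two synthetic features that are noisy copies of the same underlying signal.
rng = np.random.default_rng(1)
n = 500
base = rng.normal(size=n)
x1 = base + 0.1 * rng.normal(size=n)
x2 = base + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

corr_before = np.corrcoef(X, rowvar=False)[0, 1]

# Feature selection can only drop one of the pair; the kept column is unchanged.
X_dropped = X[:, [0]]

# PCA rotates the centred data so the resulting scores are uncorrelated.
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]

# Share of total variance captured by each component.
explained_ratio = S**2 / np.sum(S**2)

print("correlation of original features:", round(corr_before, 3))
print("correlation of PCA scores:", round(corr_after, 3))
print("explained variance ratio:", explained_ratio.round(3))
```

In this setup the first component absorbs the signal both features share, which is the "merge" that selection cannot do; selection simply keeps one of the two near-duplicates.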

When to use which

Feature selection: interpretability or regulatory requirements, reducing data collection cost, removing noise features.
PCA / dimensionality reduction: many correlated numeric features, you need a compact dense representation, or you’re feeding a downstream model and don’t need named inputs.

The common trap

The trap is using PCA when stakeholders need explanations, or using it on data whose predictive signal lies in low-variance directions. PCA ranks components by variance without looking at the target, so keeping only the top components can discard exactly that signal. Both techniques must be fit on training data only to avoid leakage.

Follow-up: “Can you combine them?” Yes: use selection to drop junk features, then optionally run PCA on the survivors. Just watch that you don’t reintroduce a non-interpretable representation if explainability was the goal.
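Here is a rough sketch of the low-variance trap and of train-only fitting, again with NumPy only. The two synthetic features, the label rule, and the 800/200 split are assumptions chosen to make the effect obvious, not a realistic dataset.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
noise_feature = rng.normal(scale=10.0, size=n)   # high variance, no signal
signal_feature = rng.normal(scale=1.0, size=n)   # low variance, defines the label
X = np.column_stack([noise_feature, signal_feature])
y = (signal_feature > 0).astype(float)

# Simple hold-out split (rows are already in random order).
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Fit PCA on the training rows only: both the mean and the directions.
train_mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - train_mean, full_matrices=False)
top_component = Vt[:1]       # keep a single component

# Apply the training-set mean and direction to the test rows.
pc1_test = (X_test - train_mean) @ top_component.T

# How much does each representation tell us about the label?
corr_pc1 = abs(np.corrcoef(pc1_test[:, 0], y_test)[0, 1])
corr_signal = abs(np.corrcoef(X_test[:, 1], y_test)[0, 1])

print("top component weights:", top_component.round(3))
print("|corr(PC1, y)|:", round(corr_pc1, 3))
print("|corr(signal_feature, y)|:", round(corr_signal, 3))
```

The retained component points almost entirely along the noisy feature, so the one-dimensional PCA representation carries little information about the label, while a selector that scores features against the target would have kept the low-variance one.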
