Machine Learning | Medium | Asked at Google, Amazon, Microsoft

What are filter, wrapper, and embedded feature selection methods, and when do you use each?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

Filter methods score each feature independently of the model using statistics such as mutual information or correlation; they are fast but ignore feature interactions. Wrapper methods search over feature subsets by actually training the model, which often finds better subsets but at high computational cost. Embedded methods perform selection during training (LASSO and tree-based feature importances are the most common), offering a balance of quality and speed.

How to think about it

Feature selection can reduce overfitting, shorten training time, and improve interpretability. Which method to reach for depends on the dataset size, the number of features, and whether you can afford repeated model training.

Filter methods

Score each feature independently, then keep the top k or those above a threshold. Common statistics:

Variance threshold: drop near-constant features.
Mutual information: measures nonlinear dependence between a feature and the target.
ANOVA F-statistic: tests whether a feature's mean differs across classes; captures linear association in classification tasks.
Pearson correlation: linear dependence; misses nonlinear relationships.

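from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 20 features with the highest estimated mutual information with the target.
selector = SelectKBest(mutual_info_classif, k=20)
# Fit on the training split only, then reuse selector.transform on validation/test data.
X_selected = selector.fit_transform(X_train, y_train)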
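
To see why the choice of statistic matters, here is a minimal sketch on synthetic data. The data, the variable names, and the use of a continuous target with mutual_info_regression are assumptions made for illustration: one feature drives the target through a square, the other is pure noise.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

# Hypothetical data: x_quadratic determines the target nonlinearly, x_noise is unrelated.
x_quadratic = rng.uniform(-1.0, 1.0, size=2000)
x_noise = rng.uniform(-1.0, 1.0, size=2000)
y = x_quadratic ** 2 + rng.normal(scale=0.01, size=2000)

X = np.column_stack([x_quadratic, x_noise])

# Pearson correlation of each column with the target (linear dependence only).
pearson = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Mutual information of each column with the target (nearest-neighbour estimate).
mi = mutual_info_regression(X, y, random_state=0)

print("abs Pearson:", np.abs(pearson).round(3))
print("mutual info:", mi.round(3))
```

Because the relationship is symmetric around zero, the Pearson score of the informative feature should sit close to zero, while its mutual information should clearly exceed that of the noise feature. A correlation-only filter would likely discard it.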

Use when: the dataset is large and model-training loops would be prohibitive; as a fast first pass before wrapper or embedded methods.
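
The "fast first pass" idea can be sketched as a scikit-learn Pipeline in which a cheap filter trims the feature set before a more expensive wrapper runs. The synthetic data, step names, and the values 40 and 10 are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): 200 features, 10 of them informative.
X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Cheap filter: keep the 40 features with the highest ANOVA F-score.
    ("filter", SelectKBest(f_classif, k=40)),
    # Costlier wrapper: RFE only has to search the 40 survivors.
    ("wrapper", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Selection is refit inside each training fold, so held-out folds never influence it.
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", scores.mean().round(3))

pipe.fit(X, y)
n_after_filter = pipe.named_steps["filter"].get_support().sum()
n_after_wrapper = pipe.named_steps["wrapper"].get_support().sum()
print(n_after_filter, "features after the filter,", n_after_wrapper, "after RFE")
```

Keeping the selectors inside the pipeline is the design choice worth mentioning in an interview: selecting features on the full dataset before cross-validation leaks information from the validation folds into the selection.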

Wrapper methods

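Evaluate candidate subsets by training the model on each one. Exhaustive search over all subsets is exponential in the number of features, so practical strategies are greedy:

Recursive Feature Elimination (RFE): train the model, remove the feature with the smallest coefficient or importance, repeat.
Forward / backward selection: greedily add or remove one feature at a time based on validation score.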

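from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# RFE ranks features by coefficient magnitude, so X_train should be on comparable scales.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X_train, y_train)
# rfe.support_ is a boolean mask of the 10 retained features.
X_reduced = rfe.transform(X_train)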

Use when: the feature count is moderate (roughly under 100) and you can afford many model fits. Wrappers tailor the subset to the chosen model, but greedy search does not guarantee the best subset, and they are slow and prone to overfitting the selection on small datasets.
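
Forward selection can be sketched with scikit-learn's SequentialFeatureSelector. Unlike RFE, which ranks features by coefficients or importances, it chooses each step by cross-validated score. The synthetic data and the target of 4 features are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption): 15 features, 4 of them informative.
X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,
    direction="forward",   # "backward" starts from all features and removes one at a time
    cv=5,                  # each candidate feature is scored by 5-fold cross-validation
)
sfs.fit(X, y)

selected = sfs.get_support(indices=True)   # column indices of the chosen features
X_reduced = sfs.transform(X)
print("selected columns:", selected)
```

The cost is visible in the loop structure: each step refits the model once per remaining candidate and per fold, which is why wrappers are usually reserved for moderate feature counts.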

Embedded methods

Selection happens as part of fitting the model:

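LASSO (L1 regularization): drives the coefficients of weak features exactly to zero as the penalty grows.
Tree feature importances: impurity-based importances from Random Forest or gradient-boosted trees come for free with training; permutation importance is computed after fitting and is a common cross-check.
ElasticNet: combines L1 and L2; useful when features are correlated, where LASSO tends to keep one feature from a correlated group and drop the rest.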

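from sklearn.linear_model import LassoCV

# LassoCV is a regressor: y_train is assumed to be a continuous target.
# Standardize features beforehand so the L1 penalty treats them equally.
lasso = LassoCV(cv=5).fit(X_train, y_train)
# X_train is assumed to be a pandas DataFrame; keep columns with nonzero coefficients.
important = X_train.columns[lasso.coef_ != 0]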

Use when: you want selection and modeling in one step; scales to high-dimensional data.
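
For the tree-based variant, a minimal sketch using SelectFromModel with a Random Forest might look like the following. The synthetic data, the forest size, and the choice to keep the top 10 features are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (an assumption). With shuffle=False the 5 informative
# features are columns 0-4 and the remaining 45 columns are noise.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)

# threshold=-np.inf disables the importance cutoff, so max_features alone
# decides: keep the 10 features with the highest impurity-based importance.
selector = SelectFromModel(forest, max_features=10, threshold=-np.inf)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print("kept columns:", kept)
```

Impurity-based importances are known to favour high-cardinality features, so checking the ranking against permutation importance on held-out data is a reasonable follow-up.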

Comparison

Method	Speed	Interaction-aware	Model-dependent
Filter	Fast	No	No
Wrapper	Slow	Yes	Yes
Embedded	Medium	Partially	Yes
