Machine Learning | Easy | Asked at Google, Amazon, Meta, Microsoft

What is the difference between standardization and normalization, and which models require feature scaling?

For: Data Scientist, ML Engineer, Data Analyst, AI / LLM Engineer

The short answer

Standardization rescales each feature to zero mean and unit variance; min-max normalization maps values linearly into a fixed range, usually [0, 1]. Distance-based and gradient-based models are sensitive to feature scale and generally need one of these; tree-based models split on threshold comparisons that depend only on the ordering of values, so they are unaffected by scaling.

How to think about it

Answer the why before the how: features on wildly different scales make some algorithms treat large-valued columns as artificially more important.

The two transforms

Standardization (Z-score scaling) subtracts the mean and divides by the standard deviation so each feature has mean = 0 and std = 1. It does not bound the output, so outliers stay as outliers — just in standard-deviation units.
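As a minimal sketch of the formula above, the z-score can be computed by hand with NumPy and checked against scikit-learn's StandardScaler. The height values are made up for illustration. Note that StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature (heights in cm), shaped (n_samples, 1).
x = np.array([[150.0], [160.0], [170.0], [180.0], [190.0]])

# Manual z-score: subtract the mean, divide by the standard deviation.
# np.std defaults to ddof=0, which matches StandardScaler.
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# Same transform via scikit-learn.
z_sklearn = StandardScaler().fit_transform(x)

print(z_manual.ravel())
print(np.allclose(z_manual, z_sklearn))
```

The transformed column has mean 0 and standard deviation 1, but its values are not confined to any fixed interval.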

Min-max normalization maps each value linearly into [0, 1] using (x − min) / (max − min). The range is fixed, but a single extreme outlier compresses all other values into a small slice of that range.
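The outlier effect described above can be seen in a small sketch. The data here are invented: four ordinary values plus one extreme value of 1000.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature with one extreme outlier (1000).
x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# Manual min-max scaling: (x - min) / (max - min).
x_manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Same transform via scikit-learn (default feature_range is (0, 1)).
x_sklearn = MinMaxScaler().fit_transform(x)

# The four ordinary values are squeezed near 0; the outlier maps to 1.
print(x_manual.ravel())
```

With this data the four ordinary values all fall below 0.01, so nearly the whole [0, 1] range is spent on the gap between them and the outlier.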

Which models need it

Model family	Needs scaling?	Why
Linear / logistic regression	Yes	Gradient steps and L1/L2 penalties are scale-dependent; unregularized closed-form least squares is not affected
SVM, KNN, K-means	Yes	Distance or kernel functions use raw magnitudes
Neural networks	Yes	Activations saturate; convergence slows without it
PCA	Yes	Variance is the criterion; large-scale features dominate
Decision trees, Random Forest, XGBoost	No	Splits use rank order, not magnitude
Naive Bayes	No	Probabilities are computed per feature independently
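To make the distance-based rows of the table concrete, here is a small sketch with invented age and income values. It finds the nearest neighbour of a query point by Euclidean distance, first on raw features and then on standardized ones. The helper `nearest` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: age (years), income (dollars).
X = np.array([
    [25.0, 50_000.0],   # similar age and similar income to the query
    [60.0, 51_000.0],   # very different age, slightly closer income
    [26.0, 90_000.0],   # similar age, very different income
])
query = np.array([[27.0, 52_000.0]])

def nearest(points, q):
    """Index of the row in `points` closest to `q` by Euclidean distance."""
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))

# On raw features, income differences (thousands) swamp age differences (tens).
raw_nearest = nearest(X, query)

# After standardization, both features contribute on a comparable scale.
scaler = StandardScaler().fit(X)
scaled_nearest = nearest(scaler.transform(X), scaler.transform(query))

print(raw_nearest, scaled_nearest)  # prints: 1 0
```

On raw values the 60-year-old is "nearest" purely because of a 1,000-dollar income gap; after scaling, the 25-year-old with a similar income wins, which matches intuition.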

Practical code

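from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Example data (an assumption for illustration); substitute your own X and y.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),   # swap for MinMaxScaler() if needed
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)           # scaler statistics are learned from X_train only
score = pipe.score(X_test, y_test)   # X_test is transformed with those same statistics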

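Wrapping the scaler in a Pipeline means it is fit only on the data passed to fit (here, the training split), and the same learned statistics are then applied to test and inference data. It also keeps cross-validation free of leakage, because the scaler is refit inside each fold.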
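One way to check this behaviour is to inspect the fitted scaler. The sketch below uses synthetic data from make_classification as a stand-in for a real dataset (an assumption for illustration) and compares the scaler's learned mean with the mean of the training rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# The scaler's statistics come from the training rows only,
# not from the full dataset.
fitted_mean = pipe.named_steps["scaler"].mean_
print(np.allclose(fitted_mean, X_train.mean(axis=0)))

# In cross-validation the whole pipeline is refit on each training fold,
# so each fold's scaler never sees that fold's held-out rows.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.shape)
```

Passing the pipeline, rather than pre-scaled data, to cross_val_score is what keeps held-out rows out of the scaling statistics.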

When to prefer which

Use standardization as your default: a single outlier distorts it less than it distorts min-max scaling, and it is the usual choice before PCA, where features measured in larger units would otherwise dominate the variance. Use min-max normalization when the algorithm explicitly expects bounded inputs (e.g., image pixel values fed to a CNN, or when you want outputs in a known range for a neural network’s final layer).
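For the bounded-input case, here is a sketch with made-up 8-bit pixel values. Because the valid range 0 to 255 is known in advance, dividing by 255 maps the image into [0, 1] without fitting anything to the data. This is an illustration only, not a prescribed preprocessing step for any particular CNN.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with 8-bit pixel intensities.
image = np.array([[0, 64, 128],
                  [191, 255, 32]], dtype=np.uint8)

# Convert to float first, then divide by the known maximum value.
scaled = image.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())
```

Using the fixed bound rather than the observed minimum and maximum means every image is scaled the same way, regardless of its own brightest or darkest pixel.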
