datarekha
Machine Learning Easy Asked at GoogleAsked at AmazonAsked at MetaAsked at Microsoft

What is the difference between standardization and normalization, and which models require feature scaling?

The short answer

Standardization rescales features to zero mean and unit variance; normalization squashes values into a fixed range, usually [0, 1]. Distance-based and gradient-based models are sensitive to scale and require one of these; tree-based models split on rank order and are scale-invariant.

How to think about it

Answer the why before the how: features on wildly different scales make some algorithms treat large-valued columns as artificially more important.

The two transforms

Standardization (Z-score scaling) subtracts the mean and divides by the standard deviation so each feature has mean = 0 and std = 1. It does not bound the output, so outliers stay as outliers — just in standard-deviation units.

Min-max normalization maps each value linearly into [0, 1] using (x − min) / (max − min). The range is fixed, but a single extreme outlier compresses all other values into a small slice of that range.

Which models need it

Model familyNeeds scaling?Why
Linear / logistic regressionYesGradient steps are scale-dependent
SVM, KNN, K-meansYesDistance or kernel functions use raw magnitudes
Neural networksYesActivations saturate; convergence slows without it
PCAYesVariance is the criterion; large-scale features dominate
Decision trees, Random Forest, XGBoostNoSplits use rank order, not magnitude
Naive BayesNoProbabilities are computed per feature independently

Practical code

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scaler", StandardScaler()),   # swap for MinMaxScaler() if needed
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

Wrapping the scaler in a Pipeline guarantees it is fit only on training data and applied consistently to test and inference data.

When to prefer which

Use standardization as your default — it handles outliers more gracefully and is required before PCA. Use min-max normalization when the algorithm explicitly expects bounded inputs (e.g., image pixel values fed to a CNN, or when you want outputs in a known range for a neural network’s final layer).

Learn it properly The scikit-learn API

Keep practising

All Machine Learning questions

Explore further

Skip to content