What is the difference between standardization and normalization, and which models require feature scaling?
Standardization rescales features to zero mean and unit variance; normalization squashes values into a fixed range, usually [0, 1]. Distance-based and gradient-based models are sensitive to scale and require one of these; tree-based models split on rank order and are scale-invariant.
How to think about it
Answer the why before the how: features on wildly different scales make some algorithms treat large-valued columns as artificially more important.
The two transforms
Standardization (Z-score scaling) subtracts the mean and divides by the standard deviation so each feature has mean = 0 and std = 1. It does not bound the output, so outliers stay as outliers — just in standard-deviation units.
Min-max normalization maps each value linearly into [0, 1] using (x − min) / (max − min). The range is fixed, but a single extreme outlier compresses all other values into a small slice of that range.
Which models need it
| Model family | Needs scaling? | Why |
|---|---|---|
| Linear / logistic regression | Yes | Gradient steps are scale-dependent |
| SVM, KNN, K-means | Yes | Distance or kernel functions use raw magnitudes |
| Neural networks | Yes | Activations saturate; convergence slows without it |
| PCA | Yes | Variance is the criterion; large-scale features dominate |
| Decision trees, Random Forest, XGBoost | No | Splits use rank order, not magnitude |
| Naive Bayes | No | Probabilities are computed per feature independently |
Practical code
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
pipe = Pipeline([
("scaler", StandardScaler()), # swap for MinMaxScaler() if needed
("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
Wrapping the scaler in a Pipeline guarantees it is fit only on training data and applied consistently to test and inference data.
When to prefer which
Use standardization as your default — it handles outliers more gracefully and is required before PCA. Use min-max normalization when the algorithm explicitly expects bounded inputs (e.g., image pixel values fed to a CNN, or when you want outputs in a known range for a neural network’s final layer).