Which models require feature scaling and which don't, and why?
Models that depend on distances, variances, or gradient descent over weighted sums (KNN, K-means, SVM, PCA, regularized linear/logistic regression, neural networks) need scaling because feature magnitude directly affects their results. Tree-based models (decision trees, random forests, gradient boosting) are unaffected by scale because each split compares a single feature against a threshold. Standardization and min-max scaling are the usual choices; fit them on training data only.
How to think about it
The crisp answer
Models that rely on distances or on gradient descent over weighted sums need feature scaling; tree-based models do not. The reason is whether the algorithm treats feature magnitude as meaningful.
Why some models need it
KNN, K-means, and SVM (especially with RBF kernels) compute distances between points. If one feature ranges 0–1 and another 0–100,000, the large-magnitude feature dominates the distance and the small one is effectively ignored. PCA finds the directions of maximum variance, so an unscaled feature with large variance dominates the leading components. Regularized linear models penalize coefficient size, and a coefficient's size depends on its feature's units, so unscaled features are penalized unevenly. Models trained by gradient descent, neural nets included, typically converge faster when features share a scale.
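The distance argument can be checked with a few lines of plain Python. This is a minimal sketch on made-up data: the points, the feature ranges in RANGES, and the helper names are all assumptions chosen for illustration, not anything from a particular library.

```python
import math

# Hypothetical toy points: (ratio in [0, 1], income in [0, 100_000]).
query = (0.10, 50_000.0)
candidates = {
    "a": (0.90, 50_500.0),  # far on the ratio, close on income
    "b": (0.12, 52_000.0),  # close on the ratio, slightly farther on income
}

# Assumed known feature ranges, used for min-max scaling to [0, 1].
RANGES = [(0.0, 1.0), (0.0, 100_000.0)]

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def min_max(point):
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, RANGES))

def nearest(q, cands, transform=lambda p: p):
    # Name of the candidate closest to q after applying `transform` to both.
    return min(cands, key=lambda name: euclidean(transform(q), transform(cands[name])))

raw_nearest = nearest(query, candidates)              # income alone decides
scaled_nearest = nearest(query, candidates, min_max)  # both features count

print(raw_nearest, scaled_nearest)
```

On the raw values the neighbour is chosen almost entirely by income, so a point that differs sharply on the ratio still wins; after scaling, the ratio contributes and the choice flips.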
Why trees don’t need it
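Decision trees, random forests, and gradient-boosted trees (XGBoost, LightGBM) split one feature at a time on a threshold like age > 30. Standardization and min-max scaling are strictly increasing transformations, so they preserve the ordering of values: the tree chooses the same partition of the training data and only the numeric value of the threshold changes. Scaling therefore has no effect on the model's predictions.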
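A toy single-split "stump" makes the invariance concrete. This is a simplified sketch, not how any tree library is implemented: it scores splits by misclassification count (real libraries usually use Gini or entropy), and the data and the name best_split are hypothetical.

```python
import math

def best_split(xs, ys):
    """Return (threshold, left_mask) for the one-feature split with the fewest
    misclassifications, predicting the majority class on each side.
    Assumes binary labels and at least two distinct feature values."""
    order = sorted(set(xs))
    best = None
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        errors = min(left.count(0), left.count(1)) + min(right.count(0), right.count(1))
        if best is None or errors < best[0]:
            best = (errors, thr)
    thr = best[1]
    return thr, [x <= thr for x in xs]

# Hypothetical data: age and a binary label.
ages = [22, 25, 29, 31, 40, 55]
bought = [0, 0, 0, 1, 1, 1]

thr_raw, mask_raw = best_split(ages, bought)
thr_scaled, mask_scaled = best_split([a * 1000 for a in ages], bought)
thr_log, mask_log = best_split([math.log(a) for a in ages], bought)

print(thr_raw, thr_scaled)                    # the threshold value changes
print(mask_raw == mask_scaled == mask_log)    # the partition does not
```

The threshold moves with the units, but the same training points land on each side of the split, which is all the tree uses.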
Concrete example
The SVM interview discussion on Analytics Vidhya stresses that talking about tuning C and gamma for an SVM without mentioning scaling is a red flag, because both the margin and the kernel depend on distances.
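To see why gamma cannot be discussed apart from scaling, it helps to evaluate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) by hand. The sketch below uses made-up points, an arbitrary gamma, and an assumed income range of 0 to 100,000; the function names are illustrative only.

```python
import math

def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical points: (ratio in [0, 1], income in dollars).
x = (0.10, 50_000.0)
y = (0.12, 52_000.0)

gamma = 0.1  # arbitrary illustrative value

def scale(p):
    # Assumed known income range: divide by 100_000 to map it to [0, 1].
    return (p[0], p[1] / 100_000.0)

k_raw = rbf_kernel(x, y, gamma)                   # underflows to 0.0
k_scaled = rbf_kernel(scale(x), scale(y), gamma)  # close to 1.0

print(k_raw, k_scaled)
```

On raw units the squared distance is in the millions, so the kernel reports two similar points as completely unrelated for any ordinary gamma. A gamma small enough to compensate would make the ratio feature irrelevant, which is why scaling comes before tuning.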
The common trap
Fit the scaler on the training set only, then transform validation and test data with those same statistics. Fitting on all the data leaks information about the test distribution into training. Use a pipeline so this happens automatically inside cross-validation. Follow-up to expect: standardization (zero mean, unit variance) versus min-max (bounded to [0, 1]). Use standardization when the data is roughly Gaussian or outliers have already been handled, and min-max when you need bounded inputs (e.g. some neural nets).
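The fit-on-train rule is easy to state and easy to get wrong, so here is a minimal stdlib-only sketch of both scalers. The classes StandardScaler1D and MinMaxScaler1D are hypothetical single-column stand-ins written for this example; in practice scikit-learn's StandardScaler and MinMaxScaler inside a Pipeline do the same job.

```python
from statistics import mean, pstdev

class StandardScaler1D:
    """Hypothetical minimal z-score scaler for one feature column."""
    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # guard against a constant column
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

class MinMaxScaler1D:
    """Hypothetical minimal min-max scaler for one feature column."""
    def fit(self, values):
        self.min_ = min(values)
        self.span_ = (max(values) - self.min_) or 1.0
        return self

    def transform(self, values):
        return [(v - self.min_) / self.span_ for v in values]

train = [10.0, 20.0, 30.0, 40.0]
test = [25.0, 50.0]

# Statistics come from the training data only...
std = StandardScaler1D().fit(train)
mm = MinMaxScaler1D().fit(train)

# ...and are reused unchanged on the test data.
train_z = std.transform(train)
test_z = std.transform(test)
test_mm = mm.transform(test)

print(test_z, test_mm)
```

Note that a test value larger than the training maximum maps above 1 under min-max. That is expected when the scaler is fit correctly, and it is a reason to check whether a model that needs bounded inputs can tolerate it.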