datarekha
Machine Learning Medium Asked at GoogleAsked at FacebookAsked at MicrosoftAsked at Apple

What is the curse of dimensionality, and how does it affect machine learning models?

The short answer

As the number of features grows, the volume of the feature space increases exponentially, so training data becomes exponentially sparse. Distance-based algorithms degrade because points become approximately equidistant; density estimation requires data that grows exponentially; and overfitting risk rises for any fixed training set size.

How to think about it

Coined by Bellman (1957), the curse of dimensionality refers to a cluster of related phenomena that emerge as feature dimensionality d grows.

Exponential volume growth. A unit hypercube in d dimensions has volume 1 regardless of d, but a ball of radius r inside it has volume proportional to r^d. To cover 10% of the input space with a ball, the ball radius must be r = 0.1^(1/d). At d = 10, r ≈ 0.79 — the “local” neighborhood is nearly the entire space.

Distance concentration. In high dimensions, the ratio (max distance - min distance) / min distance approaches 0. All pairwise distances become nearly equal, so k-NN’s notion of “nearest” neighbor loses discriminative power. This affects any kernel method or metric-learning approach.

Data sparsity. To maintain the same sampling density as a 1D problem with 10 points, a 10D problem needs 10^10 points. With a fixed training set of size n, the effective data density drops exponentially with d.

Overfitting. With many features and few samples, models can find spurious correlations. A linear model with d > n is under-determined; tree-based models can construct splits that perfectly separate training noise.

Practical consequences and mitigations:

ProblemMitigation
Sparse neighborhoodsPCA, t-SNE, autoencoders for dimensionality reduction
Spurious correlationsRegularization (L1 drops irrelevant features), feature selection
Slow distance computationApproximate nearest neighbors (FAISS, HNSW)
OverfittingCross-validation, regularization, more data

L1 regularization (Lasso) is particularly powerful: it produces sparse solutions that zero out irrelevant dimensions, effectively performing implicit feature selection.

Learn it properly Curse of Dimensionality

Keep practising

All Machine Learning questions

Explore further

Skip to content