What's the difference between t-SNE and UMAP, and what are the pitfalls of interpreting their plots?

For: Data Scientist, ML Engineer, Research Engineer

The short answer

Both are nonlinear dimensionality-reduction methods for visualization that preserve local neighborhood structure. UMAP is faster, scales better, and tends to preserve more global structure; t-SNE emphasizes tight local clusters. The main pitfall is over-interpreting the plots. Cluster sizes, densities, and distances between clusters are not reliably meaningful, and the picture depends heavily on hyperparameters such as perplexity (t-SNE) or n_neighbors (UMAP). Neither embedding is meant to supply features for a downstream model.

How to think about it

The crisp answer

t-SNE and UMAP are both nonlinear techniques that project high-dimensional data to 2D or 3D for visualization while preserving which points are neighbors. The practical differences: UMAP is faster, scales to larger datasets, and tends to preserve more global structure. t-SNE produces very tight, well-separated local clusters but is slower and tends to distort the global layout.

How they differ

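The PCA vs t-SNE vs UMAP guide summarizes it this way:

- t-SNE converts pairwise distances into neighbor probabilities. It uses Gaussian kernels in the original space and a heavy-tailed Student-t kernel in the embedding. It then minimizes the KL divergence between the two distributions, which emphasizes local structure.
- UMAP builds a weighted k-nearest-neighbor graph (a "fuzzy topological" representation) and optimizes a low-dimensional layout whose graph matches it under a cross-entropy loss. This is faster and tends to keep relative cluster positions more meaningful.
- PCA, by contrast, is linear and is used for compression and preprocessing, not just visualization.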
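
To make the contrast concrete, here is a minimal NumPy sketch of the high-dimensional side of each method. It computes:

- t-SNE's symmetric affinities P, using per-point Gaussian bandwidths found by binary search on perplexity;
- t-SNE's Student-t affinities Q for a candidate layout, and the KL divergence between P and Q;
- a simplified version of UMAP's fuzzy k-nearest-neighbor graph.

It rests on stated assumptions:

- Distances are exact and O(n²), and there is no gradient-descent layout step.
- The function names are hypothetical, not scikit-learn or umap-learn APIs.
- umap-learn's real graph construction differs in details such as approximate neighbors and its local_connectivity setting.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (n x n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def tsne_input_affinities(X, perplexity=5.0, tol=1e-8, max_iter=200):
    """Symmetric t-SNE affinities P: a Gaussian kernel per point, with its
    bandwidth set by binary search so each row has the requested perplexity."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    target = np.log(perplexity)              # target entropy in nats
    P_cond = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        d = D[i, mask] - D[i, mask].min()    # shift cancels after normalising
        lo, hi, beta = 0.0, np.inf, 1.0      # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            p = np.exp(-beta * d)
            p /= p.sum()
            H = -np.sum(p[p > 0] * np.log(p[p > 0]))
            if abs(H - target) < tol:
                break
            if H > target:                   # too spread out: narrow the kernel
                lo = beta
                beta = beta * 2 if np.isinf(hi) else (lo + hi) / 2
            else:                            # too peaked: widen the kernel
                hi = beta
                beta = (lo + hi) / 2
        P_cond[i, mask] = p
    return (P_cond + P_cond.T) / (2 * n)     # symmetrise; entries sum to 1


def tsne_output_affinities(Y):
    """Embedding affinities Q: Student-t kernel with one degree of freedom."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the quantity t-SNE's optimiser minimises over the layout."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))


def umap_graph(X, k=5, n_iter=100):
    """Simplified UMAP fuzzy k-NN graph (hypothetical helper): rho_i is the
    nearest-neighbour distance, and sigma_i is chosen so that
    sum_j exp(-(d_ij - rho_i) / sigma_i) = log2(k). The graph is then
    symmetrised with the fuzzy union a + b - a*b."""
    n = X.shape[0]
    D = np.sqrt(pairwise_sq_dists(X))
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbour
    W = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]
        d = D[i, nbrs]
        rho = d.min()
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            s = np.sum(np.exp(-(d - rho) / sigma))
            if s > target:                   # too much mass: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2
            else:                            # too little mass: grow sigma
                lo = sigma
                sigma = sigma * 2 if np.isinf(hi) else (lo + hi) / 2
        W[i, nbrs] = np.exp(-(d - rho) / sigma)
    return W + W.T - W * W.T


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                 # toy high-dimensional data
Y = rng.normal(scale=1e-2, size=(40, 2))     # a random starting layout
P = tsne_input_affinities(X, perplexity=5.0)
Q = tsne_output_affinities(Y)
W = umap_graph(X, k=5)
print(f"KL(P || Q) for the random layout: {kl_divergence(P, Q):.3f}")
```

This exact t-SNE version costs O(n²) per step. Practical implementations such as scikit-learn's default Barnes-Hut variant approximate it to scale to larger data.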

The big pitfalls

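Cluster sizes and densities are not meaningful. t-SNE in particular equalizes density: it adapts each point's kernel bandwidth to the perplexity, which expands dense clusters and shrinks sparse ones. As a result, a cluster's apparent size or tightness says little about its real spread.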
Distances between clusters mostly don’t mean anything in t-SNE (UMAP is somewhat better but still unreliable).
Hyperparameters dominate: perplexity (t-SNE) and n_neighbors/min_dist (UMAP) change the picture dramatically; always try several.
Both are stochastic: different runs produce different layouts unless you fix the random seed.
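
The density-equalizing effect behind the first pitfall comes from how t-SNE builds its input. Each point i gets its own Gaussian bandwidth σ_i, tuned so that its neighbor distribution has the chosen perplexity. The minimal NumPy sketch below assumes exact distances and a binary search on the bandwidth. conditional_affinities is a hypothetical name, not scikit-learn's internals. The sketch builds a compact cluster and a copy of it stretched 20×. The bandwidths scale by 20, and the within-cluster affinities come out identical, so the optimizer gets no signal that one cluster is wider.

```python
import numpy as np


def conditional_affinities(X, perplexity=5.0, tol=1e-10, max_iter=200):
    """t-SNE conditional probabilities p_{j|i} and per-point bandwidths sigma_i."""
    n = X.shape[0]
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    target = np.log(perplexity)               # target entropy in nats
    P = np.zeros((n, n))
    sigmas = np.zeros(n)
    for i in range(n):
        mask = np.arange(n) != i
        d = D[i, mask] - D[i, mask].min()     # shift cancels after normalising
        lo, hi, beta = 0.0, np.inf, 1.0       # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            p = np.exp(-beta * d)
            p /= p.sum()
            nz = p > 0
            H = -np.sum(p[nz] * np.log(p[nz]))
            if abs(H - target) < tol:
                break
            if H > target:                    # too spread out: narrow the kernel
                lo = beta
                beta = beta * 2 if np.isinf(hi) else (lo + hi) / 2
            else:                             # too peaked: widen the kernel
                hi = beta
                beta = (lo + hi) / 2
        p = np.exp(-beta * d)                 # recompute so p matches the final beta
        P[i, mask] = p / p.sum()
        sigmas[i] = np.sqrt(1.0 / (2.0 * beta))
    return P, sigmas


rng = np.random.default_rng(0)
m = 30
blob = rng.normal(size=(m, 2))                # a compact cluster
X = np.vstack([blob, 20.0 * blob + 1000.0])   # plus a 20x wider copy, far away

P, sigmas = conditional_affinities(X, perplexity=5.0)
same_affinities = np.allclose(P[:m, :m], P[m:, m:], atol=1e-6)
bandwidth_ratio = np.median(sigmas[m:] / sigmas[:m])
print(same_affinities, round(float(bandwidth_ratio), 3))
```

t-SNE fits its layout only to these affinities, so the two clusters would be drawn at roughly the same size. The real 20× difference in spread is lost before the optimization even starts.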

The common trap

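The trap is reading these plots as ground truth and drawing conclusions from gaps, cluster sizes, or inter-cluster distances. Even the arXiv critique "Stop Misusing t-SNE and UMAP for Visual Analytics" warns against this kind of over-interpretation.

Also avoid feeding t-SNE/UMAP embeddings into a downstream model as features. They are built for exploration, not as a stable, generalizable mapping:

- scikit-learn's TSNE has no transform method (only fit and fit_transform), so it cannot place new points at all.
- umap-learn's UMAP does offer transform, but its coordinates still shift with the seed and hyperparameters.

Follow-up: "PCA vs these?" Use PCA for compression and preprocessing: it is linear, deterministic, fast, and has a proper transform for new data. Use t-SNE/UMAP for visual exploration only.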
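
The follow-up's division of labor can be sketched with scikit-learn. The sketch assumes the bundled digits dataset, 30 PCA components, and a logistic-regression classifier; these are illustrative choices, not prescribed by the text. PCA lives inside the model pipeline and is applied to unseen data with transform(). t-SNE is run once, seeded, purely to draw the training set.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model features: PCA is a fitted linear map, so held-out rows get the same
# projection via transform(), and the whole pipeline is reusable at predict time.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=30, random_state=0),
    LogisticRegression(max_iter=2000),
)
model.fit(X_tr, y_tr)
test_acc = model.score(X_te, y_te)

# Picture only: t-SNE on the PCA-reduced training rows, seeded so the layout
# is reproducible. It returns coordinates for these rows and nothing else.
Z_tr = model[:-1].transform(X_tr)
emb = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0).fit_transform(Z_tr)

# scikit-learn's TSNE exposes fit/fit_transform but no transform(), so the
# test rows cannot be placed on this map without fitting a new one.
tsne_can_embed_new_rows = hasattr(TSNE, "transform")

print(f"PCA + logistic regression test accuracy: {test_acc:.3f}")
print(f"t-SNE embedding shape: {emb.shape}; TSNE has transform(): {tsne_can_embed_new_rows}")
```

The same split holds if UMAP replaces t-SNE in the plotting step: the classifier still consumes the PCA features, and the 2D embedding stays a picture.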
