datarekha

t-SNE & UMAP

See the structure in high-dimensional data by projecting it to 2D — the right way. How nonlinear embeddings reveal clusters PCA misses, and the traps that make their plots easy to misread.

6 min read Intermediate Machine Learning Lesson 29 of 33

What you'll learn

  • Why nonlinear projections reveal clusters that linear PCA flattens
  • How t-SNE and UMAP differ (and when to use each)
  • The traps — cluster sizes, distances, and random seeds can mislead

Before you start

PCA is linear — it can only rotate and project, so curved structure gets flattened and overlapping clusters smear together. t-SNE and UMAP are nonlinear projections built for one job: making a 2D picture where similar high-dimensional points land near each other, so you can see clusters that PCA hides. They’re the standard tools for visualizing embeddings, gene expression, and any wide dataset.

The idea: preserve neighborhoods, not distances

Both methods optimize a 2D layout so that points which were close in high-dimensional space stay close in the picture. They don’t try to preserve global distances — only local neighborhoods. That’s exactly why they reveal clusters: tight high-dimensional groups become visually separated blobs.
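One way to make "preserving neighborhoods" concrete is to score how many of each point's 2D neighbors were also neighbors in the original space. The sketch below is illustrative only: it assumes scikit-learn is installed, uses its bundled digits dataset as a stand-in for "wide" data, and the subset size (500), `perplexity=30`, and `n_neighbors=10` are arbitrary choices, not values from this lesson.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Assumption: 64-dimensional digit images stand in for any high-dimensional data.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # small subset so t-SNE finishes quickly

# Linear projection: rotate, then keep the two highest-variance directions.
pca_2d = PCA(n_components=2, random_state=0).fit_transform(X)

# Nonlinear projection: optimize a 2D layout that keeps neighbors together.
tsne_2d = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X)

# Trustworthiness lies in [0, 1]; higher means the 2D neighbors of a point
# were more often genuine neighbors in the original space.
pca_trust = trustworthiness(X, pca_2d, n_neighbors=10)
tsne_trust = trustworthiness(X, tsne_2d, n_neighbors=10)

print(f"PCA   trustworthiness: {pca_trust:.3f}")
print(f"t-SNE trustworthiness: {tsne_trust:.3f}")
```

The score measures local neighborhoods only. It says nothing about whether distances between clusters are faithful, which is the part these methods do not promise.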

[Figure: two panels. Left, "PCA (linear): overlap", groups smeared together. Right, "t-SNE / UMAP: separated", three clean clusters emerge.]
Linear PCA can leave groups overlapping; nonlinear t-SNE/UMAP pulls neighborhoods apart into visible clusters.

t-SNE vs UMAP

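  • t-SNE: the original. Beautiful local structure, but slow on large datasets, and it does not preserve global structure, so distances between clusters and the apparent sizes of clusters should not be read as meaningful. Tuned by perplexity (roughly, the effective number of neighbors each point considers).
  • UMAP: newer, much faster, scales to millions of points, and preserves somewhat more global structure. It’s now a common default for embedding visualization. Tuned by n_neighbors (how large a neighborhood counts as "local") and min_dist (how tightly points may pack together in the plot).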
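The tuning knobs named above map directly onto constructor arguments. This is a minimal sketch, not a recommended configuration: `embed_2d` is a hypothetical helper, the parameter values are illustrative, the t-SNE branch assumes scikit-learn, and the UMAP branch assumes the separate `umap-learn` package (imported only if that branch is used). It also shows the random-seed trap: the same data with two seeds gives two different pictures.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE


def embed_2d(X, method="tsne", seed=0):
    """Hypothetical helper: project X to 2D with t-SNE or UMAP."""
    if method == "tsne":
        # perplexity: roughly the effective number of neighbors per point.
        model = TSNE(n_components=2, perplexity=30, init="random",
                     random_state=seed)
        return model.fit_transform(X)
    if method == "umap":
        import umap  # assumption: the optional umap-learn package is installed
        # n_neighbors: neighborhood size; min_dist: how tightly points pack.
        model = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=seed)
        return model.fit_transform(X)
    raise ValueError(f"unknown method: {method!r}")


X, _ = load_digits(return_X_y=True)
X = X[:300]  # small subset to keep the run short

layout_a = embed_2d(X, "tsne", seed=0)
layout_b = embed_2d(X, "tsne", seed=1)

# Same data, same settings, different seed: the coordinates are not the same.
same_layout = bool(np.allclose(layout_a, layout_b))
print("identical layouts across seeds:", same_layout)
```

Because the layout depends on the seed, it is worth fixing `random_state` for reproducible figures and re-running with a few seeds and a few perplexity or n_neighbors values before trusting any one plot.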

Quick check

Q1. Why do t-SNE and UMAP reveal clusters that PCA can miss?
Q2. In a t-SNE plot, two clusters appear very far apart. What can you conclude?
Q3. Should you use t-SNE/UMAP output as input features for a classifier?

Next

You’ve now seen the full unsupervised toolkit. The last practical lesson: AutoML — when to let a tool do the search for you.

Practice this in an interview

What are t-SNE and UMAP, how do they differ from PCA, and what are their limitations for ML workflows?

t-SNE and UMAP are nonlinear dimensionality reduction algorithms designed primarily for 2D/3D visualization of high-dimensional data. Unlike PCA, they preserve local neighborhood structure rather than global variance, producing cleaner cluster separations in plots. Neither is a good default preprocessing step for training a supervised model: standard t-SNE is transductive (it cannot embed unseen points), both methods are stochastic so their output changes across runs unless the seed is fixed, and both distort distances.

What's the difference between t-SNE and UMAP, and what are the pitfalls of interpreting their plots?

Both are nonlinear dimensionality-reduction methods for visualization that preserve local neighborhood structure, but UMAP is faster, scales better, and tends to preserve more global structure, while t-SNE emphasizes tight local clusters. The main pitfall is over-interpreting the plots: cluster sizes, densities, and distances between clusters are not meaningful, and results depend heavily on hyperparameters like perplexity or n_neighbors. Neither should be used as features for a downstream model.

Why shouldn't you use t-SNE output as features for a downstream model, and what would you use instead?

t-SNE is a visualization method that optimizes a non-parametric 2D/3D embedding preserving local neighborhoods; it has no stable transform, distorts global structure and distances, and is stochastic, so its coordinates aren't reliable predictive features. It also can't naturally project new (test) points. For feature compression use PCA, autoencoders, or supervised embeddings, which provide a consistent, reusable mapping.
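The "consistent, reusable mapping" point can be seen directly in scikit-learn's API. The sketch below is an illustration under assumptions: the digits dataset, the 16 components, and the logistic-regression classifier are arbitrary choices for the example, not part of the answer above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# PCA learns a fixed linear map on the training set and reuses it on new data,
# so it can sit inside a pipeline as a feature-compression step.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=16, random_state=0),  # 16 is an illustrative choice
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy with PCA features: {accuracy:.3f}")

# scikit-learn's TSNE only offers fit / fit_transform: there is no transform()
# to project held-out points with an already-fitted embedding.
tsne_has_transform = hasattr(TSNE(), "transform")
print("TSNE has transform():", tsne_has_transform)
```

Without a `transform` step, test points would have to be embedded together with the training points, which leaks test data into the features.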

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto a lower-dimensional subspace, reducing the number of features while retaining as much variance as possible. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability, sensitivity to feature scale, and an assumption of linear structure.
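The scale sensitivity is easy to demonstrate with synthetic data. In this sketch (all values are made up for illustration), two independent features differ only in their units, and the share of variance PCA assigns to the first component changes completely once the features are standardized.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent features with very different spreads (think different units).
X = np.column_stack([
    rng.normal(0, 1, 1000),     # small-scale feature
    rng.normal(0, 1000, 1000),  # large-scale feature
])

# Unscaled: the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# Standardized: each feature has unit variance, so neither dominates.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("unscaled explained variance ratio:", raw_ratio)
print("scaled   explained variance ratio:", scaled_ratio)
```

This is why standardizing features before PCA is the usual practice when they are measured in different units.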
