What's the difference between t-SNE and UMAP, and what are the pitfalls of interpreting their plots?

Both are nonlinear dimensionality-reduction methods for visualization that preserve local neighborhood structure. UMAP is usually faster, scales better, and tends to preserve somewhat more global structure, while t-SNE emphasizes tight local clusters. The main pitfall is over-interpreting the plots: apparent cluster sizes, densities, and distances between clusters are not reliable, and the layout depends heavily on the random seed and on hyperparameters such as perplexity (t-SNE) or n_neighbors (UMAP). Treat both as exploratory tools, not as a source of features for a downstream model.

Why shouldn't you use t-SNE output as features for a downstream model, and what would you use instead?

t-SNE is a visualization method that optimizes a non-parametric 2D/3D embedding preserving local neighborhoods; it has no stable transform, distorts global structure and distances, and is stochastic, so its coordinates aren't reliable predictive features. It also can't naturally project new (test) points. For feature compression use PCA, autoencoders, or supervised embeddings, which provide a consistent, reusable mapping.
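
The point about a reusable mapping can be checked directly. The sketch below is illustrative only: the digits dataset, the 20-component PCA, and the logistic-regression classifier are assumptions chosen for the example, not recommendations from the lesson. It shows that scikit-learn's TSNE offers no transform() for held-out data, while PCA fits on the training split and is then reapplied unchanged to the test split.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# scikit-learn's TSNE exposes fit / fit_transform only, so there is
# no fitted mapping that could be applied to X_test afterwards.
tsne_has_transform = hasattr(TSNE(), "transform")
print("TSNE has transform():", tsne_has_transform)

# PCA learns its projection on the training split and reuses the same
# projection on the test split inside the pipeline.
# n_components=20 is an arbitrary choice for this sketch.
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=20, random_state=0),
    LogisticRegression(max_iter=2000),
)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"PCA + logistic regression test accuracy: {test_accuracy:.2f}")
```

Putting the reduction inside the pipeline also keeps the test split out of the fit, which is the property a per-run t-SNE layout cannot offer.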

What are t-SNE and UMAP, how do they differ from PCA, and what are their limitations for ML workflows?

t-SNE and UMAP are nonlinear dimensionality reduction algorithms designed primarily for 2D/3D visualization of high-dimensional data. Unlike PCA, they preserve local neighborhood structure rather than global variance, which often produces cleaner cluster separation in plots. Neither is a good default preprocessing step for a supervised model: standard t-SNE is transductive, with no mapping for unseen points, and both methods produce layouts that change with the random seed and hyperparameters.

What is PCA, when should you use it, and what are its key limitations?

PCA finds the orthogonal directions of maximum variance in the data and projects onto the lower-dimensional subspace they span, reducing the number of features while retaining as much variance as possible. It is most useful before distance-based models or when training is bottlenecked by dimensionality. Its main limits are loss of interpretability (each component mixes all the original features), sensitivity to feature scale (so standardize first), and an assumption of linear structure.
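
Sensitivity to scale is easy to see on toy data. In this minimal sketch the two synthetic features are an assumption made for illustration: they are independent and equally informative, but one is measured on a scale 1000 times larger. Without standardization the first principal component is almost entirely the large-scale feature.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumption for this sketch): two independent features,
# the second on a scale 1000x larger than the first.
X = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 1000, n)])

raw = PCA(n_components=2).fit(X)
scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

raw_ratio = raw.explained_variance_ratio_
scaled_ratio = scaled.explained_variance_ratio_

print("explained variance ratio, raw:   ", raw_ratio.round(4))
print("explained variance ratio, scaled:", scaled_ratio.round(4))
```

On the raw data nearly all the variance lands on one component purely because of units; after standardization the two components share it roughly evenly.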

t-SNE & UMAP — Machine Learning

PCA is linear — it can only rotate and project, so curved structure gets flattened and overlapping clusters smear together. t-SNE and UMAP are nonlinear projections built for one job: making a 2D picture where similar high-dimensional points land near each other, so you can see clusters that PCA hides. They’re the standard tools for visualizing embeddings, gene expression, and any wide dataset.

The idea: preserve neighborhoods, not distances

Both methods optimize a 2D layout so that points which were close in high-dimensional space stay close in the picture. They don’t try to preserve global distances — only local neighborhoods. That’s exactly why they reveal clusters: tight high-dimensional groups become visually separated blobs.
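
One way to put a number on "neighbors stay neighbors" is scikit-learn's trustworthiness score, which penalizes points that appear as close neighbors in the 2D layout but were not close in the original space. The sketch below is an illustration, not part of the lesson's method: the 500-point subset and n_neighbors=10 are arbitrary choices made to keep it quick.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X, _ = load_digits(return_X_y=True)
X = X[:500]  # subset chosen only to keep the runtime short

pca2 = PCA(n_components=2, random_state=0).fit_transform(X)
tsne2 = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Trustworthiness lies in [0, 1]; higher means the 2D neighbors of each
# point were also neighbors in the original 64-dimensional space.
t_pca = trustworthiness(X, pca2, n_neighbors=10)
t_tsne = trustworthiness(X, tsne2, n_neighbors=10)

print(f"PCA   trustworthiness: {t_pca:.3f}")
print(f"t-SNE trustworthiness: {t_tsne:.3f}")
```

The score only looks at local neighborhoods, so a high value says nothing about whether distances between far-apart groups were kept.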

Linear PCA can leave groups overlapping; nonlinear t-SNE/UMAP pulls neighborhoods apart into visible clusters.

t-SNE vs UMAP

t-SNE: the original. Beautiful local structure, but slow, and it largely discards global structure, so distances between clusters carry little meaning. Tuned by perplexity (roughly, the effective number of neighbors each point considers).
UMAP: newer, much faster, scales to millions of points, and preserves somewhat more global structure. It’s now the default for most embedding visualization. Tuned by n_neighbors (neighborhood size) and min_dist (how tightly points may pack in the plot).
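
To get a feel for how much the tuning knob matters, the following sketch refits t-SNE on the same points at three perplexity values and reports the final KL divergence and the extent of each layout. The specific values (5, 30, 100) and the 300-point subset are assumptions picked for illustration. UMAP's n_neighbors plays a broadly similar role, but it lives in the separate umap-learn package and is not shown here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]  # subset chosen only to keep the runtime short

embeddings = {}
for perplexity in (5, 30, 100):  # perplexity must be smaller than n_samples
    model = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    emb = model.fit_transform(X)
    embeddings[perplexity] = emb
    # Width and height of the layout in embedding units
    extent = emb.max(axis=0) - emb.min(axis=0)
    print(
        f"perplexity={perplexity:>3}  "
        f"KL={model.kl_divergence_:.3f}  "
        f"extent={extent.round(1)}"
    )

layouts_differ = not np.allclose(embeddings[5], embeddings[100])
print("layouts differ between perplexity 5 and 100:", layouts_differ)
```

Plotting the three embeddings side by side is usually more informative than the printed numbers; a structure that appears at only one setting deserves suspicion.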

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)   # 64-dim handwritten digits

# PCA to 2D: fast, but classes overlap. t-SNE: slower, classes separate.
pca2 = PCA(n_components=2).fit_transform(X)
tsne2 = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# A rough "how separated are the classes" score: the fraction of points whose
# nearest neighbor in the embedding carries the same label.
def sep(emb):
    # n_neighbors=2 because each point's closest match is itself (column 0)
    nn = NearestNeighbors(n_neighbors=2).fit(emb)
    idx = nn.kneighbors(emb, return_distance=False)[:, 1]
    return (y[idx] == y).mean()

# On this dataset the t-SNE score should come out much higher than the PCA score.
print(f"PCA   neighbor-purity: {sep(pca2):.2f}")
print(f"t-SNE neighbor-purity: {sep(tsne2):.2f}")

In one breath

PCA is linear, so curved or overlapping structure flattens; t-SNE and UMAP are nonlinear projections built to make clusters visible in 2D.
They preserve local neighborhoods, not global distances — that’s why tight high-dimensional groups become separate blobs.
t-SNE is the slower original (tuned by perplexity); UMAP is faster, scales to millions, and keeps somewhat more global structure — now the default.
Read the plots with care: cluster sizes and between-cluster distances are not meaningful, and the result shifts with the seed and hyperparameters.
Use them to see and generate hypotheses, never as proof or as features for a downstream model — for reduction-as-preprocessing, use PCA.
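
The warning about seeds and between-cluster distances can be made concrete with a small experiment. In this sketch the two seeds, the 500-point subset, and the choice of digits 0 and 1 are all assumptions for illustration. It uses init="random" on purpose so the seed effect is visible; recent scikit-learn versions default to a PCA initialization, which tends to make runs more alike.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # subset chosen only to keep the runtime short

def neighbor_purity(emb, labels):
    """Fraction of points whose nearest 2D neighbor shares their label."""
    nn = NearestNeighbors(n_neighbors=2).fit(emb)
    idx = nn.kneighbors(emb, return_distance=False)[:, 1]
    return float((labels[idx] == labels).mean())

def centroid_gap(emb, labels, a, b):
    """Distance between the 2D centroids of classes a and b."""
    gap = emb[labels == a].mean(axis=0) - emb[labels == b].mean(axis=0)
    return float(np.linalg.norm(gap))

runs = {}
for seed in (0, 1):
    emb = TSNE(
        n_components=2, perplexity=30, init="random", random_state=seed
    ).fit_transform(X)
    runs[seed] = emb
    print(
        f"seed={seed}  purity={neighbor_purity(emb, y):.2f}  "
        f"gap between digit 0 and digit 1 blobs={centroid_gap(emb, y, 0, 1):.1f}"
    )

coords_differ = not np.allclose(runs[0], runs[1])
print("coordinates differ across seeds:", coords_differ)
```

Local purity is the kind of quantity that tends to hold up across runs, while raw coordinates and the gap between two blobs are free to change, which is why only the former is worth reading off a plot.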

Quick check

Q1. Why do t-SNE and UMAP reveal clusters that PCA can miss?

Q2. In a t-SNE plot, two clusters appear very far apart. What can you conclude?

Q3. Should you use t-SNE/UMAP output as input features for a classifier?

You’ve now seen the full unsupervised toolkit. The last practical lesson: AutoML — when to let a tool do the search for you.
