Why shouldn't you use t-SNE output as features for a downstream model, and what would you use instead?
t-SNE is a visualization method that optimizes a non-parametric 2D/3D embedding preserving local neighborhoods; it has no stable transform, distorts global structure and distances, and is stochastic, so its coordinates aren't reliable predictive features. It also can't naturally project new (test) points. For feature compression use PCA, autoencoders, or supervised embeddings, which provide a consistent, reusable mapping.
How to think about it
The crisp answer
t-SNE is built for visualization, not feature engineering. Its output coordinates aren’t a stable, generalizable representation: it distorts global structure and distances, it’s stochastic, and classic t-SNE has no parametric transform to map new points consistently. So using its 2D coordinates as model inputs leaks visualization artifacts into your features and won’t generalize.
Why it breaks as a feature extractor
- No out-of-sample mapping: standard t-SNE re-optimizes the whole embedding; you can’t cleanly transform() a test set the way you can with PCA, so train and test embeddings live in incompatible coordinate systems, a direct path to leakage or inconsistency.
- Distances and global layout are unreliable: as the bugfree.ai dimensionality-reduction guide notes, t-SNE preserves local neighborhoods but not global geometry, so the coordinates don’t encode meaningful magnitudes for a model to learn from.
- Stochastic: different seeds/hyperparameters give different embeddings, so features aren’t reproducible.
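A quick way to see two of these points for yourself, sketched with scikit-learn (an assumption: the answer above doesn't name a library). The digits dataset, the 200-sample subset, and the variable names are illustrative choices. init="random" is set explicitly so the seed dependence is easy to observe; other initializations may vary less between runs.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Small subset of the digits data (illustrative choice) to keep t-SNE fast.
X, _ = load_digits(return_X_y=True)
X = X[:200]

# scikit-learn's TSNE exposes fit / fit_transform but no transform(),
# so there is no way to project unseen points into a fitted embedding.
has_tsne_transform = hasattr(TSNE, "transform")
has_pca_transform = hasattr(PCA, "transform")

# Same data, two seeds: the resulting coordinates differ.
emb_a = TSNE(n_components=2, init="random", random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, init="random", random_state=1).fit_transform(X)
max_diff = float(np.abs(emb_a - emb_b).max())

print("TSNE has transform():", has_tsne_transform)
print("PCA has transform():", has_pca_transform)
print("Largest coordinate difference between seeds:", max_diff)
```

Fixing random_state makes a single run repeatable, but it doesn't remove the underlying problem: the coordinates are still an artifact of one particular optimization run, not a mapping you can reuse.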
What to use instead
- PCA: linear, deterministic, has a stable transform for new data; the default for feature compression.
- Autoencoders: nonlinear compression with a reusable encoder, good for complex data.
- Supervised/learned embeddings (e.g. from a neural net trained on the task) when you want representations tuned to the target.
- UMAP is somewhat better than t-SNE here (it can transform new points), but it’s still primarily a visualization tool and risky as features.
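A minimal sketch of the PCA option, assuming scikit-learn and its bundled digits dataset. The 16 components, the logistic-regression classifier, and the split settings are illustrative choices, not recommendations. The point is that the scaler and PCA are fit on the training split only, and the same learned mapping is then applied to the test split.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaler and PCA are fit on the training data only when the pipeline is fit.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=16, random_state=0),  # 16 is an arbitrary illustrative size
    LogisticRegression(max_iter=2000),
)
model.fit(X_train, y_train)

# The fitted preprocessing steps map unseen rows into the same feature space.
test_features = model[:-1].transform(X_test)
accuracy = model.score(X_test, y_test)

print("Compressed test features:", test_features.shape)
print("Test accuracy:", accuracy)
```

Putting the reducer inside the pipeline also means cross-validation refits it per fold, which avoids the leakage you would get from embedding the full dataset first.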
The common trap
Seeing nicely separated clusters in a t-SNE plot and assuming those coordinates will help a classifier: the separation is often a low-dimensional artifact, not stable signal.
Follow-up: “Which method has a proper transform for new data?” PCA and autoencoders do; t-SNE does not, UMAP partially does.