datarekha
NLP & LLMs | Easy | Asked at Google, Meta, Amazon

Why do dense word embeddings outperform one-hot vectors?

The short answer

One-hot vectors are high-dimensional and sparse, and every pair of distinct words is equally far apart, so they carry no semantic information. Dense embeddings place words that appear in similar contexts close together in a low-dimensional space, which lets a model generalize from words it saw in training to related words it did not.

How to think about it

One-hot encoding represents each word as a vector of zeros with a single 1 at the word’s index. For a vocabulary of size V, every word is a V-dimensional sparse vector.
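
A minimal sketch of the encoding just described, assuming a hypothetical four-word vocabulary (the word list and the `one_hot` helper are illustrative names, not part of any library):

```python
import numpy as np

# Hypothetical toy vocabulary; a real one would have tens of thousands of entries.
vocab = ["cat", "kitten", "dog", "galaxy"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a V-dimensional vector of zeros with a single 1 at the word's index."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("kitten"))  # [0. 1. 0. 0.]
```

The vector length always equals the vocabulary size, so adding a word to the vocabulary widens every vector.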

Problems:

  1. Dimensionality: a 50,000-word vocabulary means every word is a 50,000-dimensional input, so memory and compute grow linearly with vocabulary size.
  2. No similarity: the dot product of any two distinct one-hot vectors is 0, so “cat” and “kitten” are exactly as far apart as “cat” and “galaxy”.
  3. No generalization: each word occupies its own independent dimension, so what a model learns about “cat” does not transfer to “kitten”.
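
These problems can be checked numerically. The sketch below assumes a hypothetical four-word vocabulary for the similarity check, and float32 vectors stored densely for the memory estimate; both are illustrative choices.

```python
import numpy as np

vocab = ["cat", "kitten", "dog", "galaxy"]  # hypothetical toy vocabulary
one_hots = np.eye(len(vocab))               # row i is the one-hot vector for vocab[i]
cat, kitten, _, galaxy = one_hots

# No similarity: distinct one-hot vectors are orthogonal and equidistant.
dot_cat_kitten = float(cat @ kitten)
dot_cat_galaxy = float(cat @ galaxy)
dist_cat_kitten = float(np.linalg.norm(cat - kitten))
dist_cat_galaxy = float(np.linalg.norm(cat - galaxy))
print(dot_cat_kitten, dot_cat_galaxy)    # 0.0 0.0
print(dist_cat_kitten, dist_cat_galaxy)  # both sqrt(2), about 1.414

# Dimensionality: bytes per token when each vector is stored densely as float32.
V, d = 50_000, 300
bytes_one_hot = V * 4   # 200,000 bytes per word
bytes_dense = d * 4     # 1,200 bytes per word
print(bytes_one_hot, bytes_dense)
```

Sparse storage can shrink the one-hot footprint, but it does nothing for the similarity problem: the geometry stays the same.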

Dense embeddings (Word2Vec, GloVe, fastText) compress each word into a real-valued vector of roughly 50 to 300 dimensions, learned from co-occurrence statistics so that words used in similar contexts end up with similar vectors:

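import numpy as np
from gensim.models import Word2Vec

# Toy corpus for illustration only; useful embeddings need a far larger corpus.
sentences = [["the","cat","sat"],["the","kitten","slept"],["a","dog","ran"]]
# seed=0 and workers=1 reduce run-to-run variation
# (full determinism also needs a fixed PYTHONHASHSEED).
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1,
                 epochs=200, seed=0, workers=1)

cat = model.wv["cat"]
kitten = model.wv["kitten"]
dog = model.wv["dog"]

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 means orthogonal."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# On a three-sentence corpus these numbers are mostly noise.
# Trained on a large corpus, related words score well above unrelated pairs.
print(cosine(cat, kitten))
print(cosine(cat, dog))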
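
One way to see how the two representations relate: looking up a word's dense vector gives the same result as multiplying its one-hot vector by an embedding matrix. A minimal sketch, assuming a randomly initialised matrix `E` standing in for trained weights (the sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                      # toy vocabulary size and embedding dimension
E = rng.normal(size=(V, d))      # stand-in for a trained embedding matrix

index = 2                        # position of some word in the vocabulary
one_hot = np.zeros(V)
one_hot[index] = 1.0

via_matmul = one_hot @ E         # multiplies by V - 1 zeros for every output value
via_lookup = E[index]            # plain row lookup, no arithmetic

print(np.allclose(via_matmul, via_lookup))  # True
```

The row lookup is why embedding layers avoid materialising one-hot vectors at all: the index alone is enough.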

Comparison summary

| Property | One-hot | Dense embedding |
| --- | --- | --- |
| Dimensions | V (50k+) | 50-300 |
| Sparse | Yes | No |
| Semantic similarity | None | Encoded |
| Generalization | None | Strong |

Practical impact: classifiers trained on pretrained embeddings often need fewer labelled examples, because the embedding already encodes prior knowledge about word relationships. A model trained on “cat” examples can carry over much of what it learned to “kitten” examples, since the two vectors lie close together.
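
The transfer effect can be sketched with a nearest-centroid classifier. The 2-D vectors below are hypothetical, hand-picked values chosen so that related words sit close together; they are not real trained embeddings, and the labels and the `predict` helper are illustrative.

```python
import numpy as np

# Hypothetical 2-D embeddings, hand-picked for illustration.
emb = {
    "cat":    np.array([0.90, 0.10]),
    "dog":    np.array([0.80, 0.30]),
    "kitten": np.array([0.85, 0.05]),   # never appears in the training set
    "galaxy": np.array([0.10, 0.90]),
    "planet": np.array([0.20, 0.80]),
    "comet":  np.array([0.15, 0.95]),   # never appears in the training set
}

train = [("cat", "animal"), ("dog", "animal"),
         ("galaxy", "space"), ("planet", "space")]

# Nearest-centroid classifier: average the training vectors of each label.
labels = sorted({label for _, label in train})
centroids = {
    label: np.mean([emb[w] for w, lab in train if lab == label], axis=0)
    for label in labels
}

def predict(word):
    """Assign the label whose centroid is closest to the word's embedding."""
    return min(labels, key=lambda label: np.linalg.norm(emb[word] - centroids[label]))

print(predict("kitten"))  # animal
print(predict("comet"))   # space
```

With one-hot inputs the same classifier would have no basis for choosing: an unseen word is equally far from both centroids.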
