NLP & LLMs · Medium · Asked at Google, Meta, and Amazon

How does Word2Vec work, and what is the difference between Skip-gram and CBOW?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

Word2Vec trains a shallow neural network to predict context from a target word (Skip-gram) or a target word from its context (CBOW), learning dense vector representations as a by-product. Skip-gram works better for rare words; CBOW is faster and suits large corpora.

How to think about it

Word2Vec (Mikolov et al., 2013) exploits the distributional hypothesis — words appearing in similar contexts have similar meanings — to learn vector embeddings from unlabelled text.

Skip-gram: given a center word, predict each surrounding word within a window. Objective: maximize P(context | center). Each (center, context) pair is its own training example, so even a rare word receives a separate update for every context word it appears with. This is why Skip-gram tends to represent rare words well.
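To make the Skip-gram training signal concrete, here is a minimal sketch of how one tokenized sentence could be turned into (center, context) pairs. The helper name `skipgram_pairs` is hypothetical, and the fixed window is a simplification: real implementations typically also subsample frequent words and vary the window size.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) training pairs for one tokenized sentence."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context positions lie within `window` steps of the center,
        # clipped to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "king", "rules", "the", "land"], window=2)
for center, context in pairs[:5]:
    print(center, "->", context)
```

Every pair becomes one prediction task, so a word that occurs only once still contributes up to `2 * window` training examples.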

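CBOW (Continuous Bag of Words): average the context word vectors and predict the center word. It trains faster because each window yields one prediction rather than one per context word, and the averaging smooths the signal, which tends to favor frequent words.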
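The averaging step in CBOW can be sketched in a few lines. The function name `cbow_hidden` and the 2-dimensional `toy_embeddings` values are assumptions chosen only for illustration; a trained model would use learned vectors of much higher dimension.

```python
def cbow_hidden(context_words, embeddings):
    """Average the context vectors into a single hidden vector."""
    vectors = [embeddings[w] for w in context_words]
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Hypothetical embeddings, not learned values.
toy_embeddings = {
    "the": [0.0, 2.0],
    "rules": [4.0, 0.0],
    "land": [2.0, 4.0],
}

# Context of the center word "king" in "the king rules the land", window=2
# (using all remaining words here to keep the arithmetic easy to follow).
hidden = cbow_hidden(["the", "rules", "the", "land"], toy_embeddings)
print(hidden)
```

Because the vectors are averaged, the order of the context words has no effect on the result, which is where the "bag of words" in the name comes from.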

Both architectures use a single hidden (projection) layer with no nonlinearity. The learned input-to-hidden weight matrix is the embedding table: row i holds the vector for word i in the vocabulary.
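One way to see why the input weight matrix is the embedding table: multiplying a one-hot input by that matrix simply selects one row. The sketch below uses a made-up three-word vocabulary and made-up weights (`W_in` is a hypothetical name), with plain lists instead of a tensor library.

```python
def one_hot(index, size):
    """Row vector with a 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_times_matrix(x, W):
    """Compute x @ W for a row vector x and a matrix W stored as a list of rows."""
    cols = len(W[0])
    return [sum(x[i] * W[i][k] for i in range(len(W))) for k in range(cols)]


vocab = ["king", "queen", "land"]
# Hypothetical input-to-hidden weights: one row per vocabulary word.
W_in = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

idx = vocab.index("queen")
hidden = vec_times_matrix(one_hot(idx, len(vocab)), W_in)
print(hidden == W_in[idx])  # the matrix product is just a row lookup
```

In practice implementations skip the multiplication and index the row directly, which is why the layer is often described as a lookup table.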

from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "land"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["man", "and", "woman", "are", "equal"],
]

# Skip-gram (sg=1); CBOW is sg=0
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1, epochs=100)

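# Nearest neighbours of "king" by cosine similarity
print(model.wv.most_similar("king", topn=3))
# Classic analogy test: king - man + woman. On a three-sentence toy corpus the
# result is essentially noise; it takes a large corpus to land near 'queen'.
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"])
print(result[0])  # (word, similarity) pair; ideally close to 'queen'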

king - man + woman ≈ queen: parallel vector offsets encode gender

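Negative sampling (NS) makes training tractable: instead of computing a full softmax over the vocabulary, the model learns to distinguish the true context word from a small set of randomly sampled “noise” words, so each update touches only a handful of output vectors rather than one per vocabulary word.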
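As a rough sketch of the idea, the per-pair negative-sampling loss rewards a high dot product with the true context vector and a low one with each noise vector. The function names below are hypothetical. The 0.75 exponent on word counts follows the noise distribution reported in the original paper; the toy counts are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (center, context) pair."""
    # Pull the true context word towards the center word...
    loss = -math.log(sigmoid(dot(center_vec, context_vec)))
    # ...and push each sampled noise word away from it.
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(center_vec, neg)))
    return loss


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power`, normalised to sum to 1."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


# Hypothetical word counts; the exponent flattens the distribution so rare
# words are drawn as negatives more often than their raw frequency suggests.
probs = noise_distribution({"the": 16, "queen": 1})
negatives = random.choices(list(probs), weights=list(probs.values()), k=5)
print(probs, negatives)
```

The cost of one update grows with the number of negatives rather than with the vocabulary size, which is the source of the speed-up over a full softmax.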
