Deep Learning Easy Asked at GoogleAsked at OpenAIAsked at MetaAsked at Spotify

What are embeddings and why are they central to modern deep learning?

For ML Engineer AI / LLM Engineer Data Scientist

The short answer

An embedding is a dense, learned vector representation of a discrete or high-dimensional object — a word, image, user, product — in a continuous low-dimensional space. Proximity in embedding space reflects semantic or behavioural similarity, making embeddings a universal interface between raw data and neural networks.

How to think about it

Before embeddings, discrete inputs like words were one-hot encoded — a vocabulary of 50 000 words becomes a 50 000-dimensional sparse vector where 49 999 entries are zero. This is memory-inefficient and contains zero semantic information: “king” and “queen” are as far apart as “king” and “banana”.

What an embedding layer does

It is a learned lookup table: integer index → dense vector of dimension d. During training the vectors shift so that similar items end up close in d-dimensional space.

import torch.nn as nn

# Map a vocabulary of 10,000 tokens to 256-dimensional vectors
embed = nn.Embedding(num_embeddings=10_000, embedding_dim=256)

token_ids = torch.tensor([42, 7, 1337])  # batch of token indices
vecs = embed(token_ids)                  # shape: (3, 256)

Why they matter

Compression — 256 floats instead of 50 000 binary values.
Generalisation — the model can reason about unseen token combinations because nearby embeddings share gradient updates during training.
Reuse — pretrained embeddings (Word2Vec, GloVe, BERT’s token embeddings) transfer across tasks. This is the backbone of transfer learning in NLP.
Multimodal alignment — systems like CLIP train image and text encoders jointly so that embed("a dog") ≈ embed(photo_of_dog). This enables zero-shot retrieval.

Applications beyond NLP

Recommender systems embed users and items (Netflix, Spotify). Knowledge graphs embed entities and relations. Molecule encoders embed atoms and bonds for drug discovery. Anywhere you have discrete or complex objects, an embedding gives the network something to work with.

The dot product or cosine similarity between two embedding vectors is a fast, differentiable proxy for semantic similarity — which is why vector databases and approximate nearest-neighbour search are now infrastructure primitives.

What are embeddings and why are they central to modern deep learning?

Keep practising

Explore further