Deep Learning · Easy · Asked at Google, OpenAI, Meta, Anthropic

What do the query, key, and value vectors represent in attention?

For: ML Engineer, AI / LLM Engineer, Data Scientist

The short answer

The query represents what a token is looking for, the key represents what a token is advertising about itself, and the value is the content it contributes if selected. Attention scores measure query-key compatibility, and the output is a soft retrieval: a weighted sum of values where the weights come from those compatibility scores.

How to think about it

The query/key/value abstraction comes from the idea of a differentiable associative lookup:

Query (Q): “What kind of information do I need?” — projected from the current token.
Key (K): “What kind of information do I provide?” — projected from every token in the sequence.
Value (V): “What is the actual content I provide?” — also projected from every token.
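The three roles above can be sketched as three matrix multiplications applied to the same token embeddings. This is a minimal illustration in plain Python, not a reference implementation: the sizes, the random stand-in weights, and the helper names `matmul` and `random_matrix` are assumptions made for the example. In self-attention every token takes a turn as "the current token", so Q ends up with one row per token as well.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a matrix of uniform random entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d_model, d_k, d_v = 4, 8, 3, 5  # illustrative sizes, not taken from the text

X = random_matrix(n, d_model, rng)      # one embedding row per token
W_Q = random_matrix(d_model, d_k, rng)  # random stand-ins for learned weights
W_K = random_matrix(d_model, d_k, rng)
W_V = random_matrix(d_model, d_v, rng)

Q = matmul(X, W_Q)  # (n, d_k): what each token is looking for
K = matmul(X, W_K)  # (n, d_k): what each token advertises
V = matmul(X, W_V)  # (n, d_v): the content each token contributes
```

Q and K share the width d_k because they are compared by dot product; V is free to use a different width d_v.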

The dot product Q_i · K_j measures how well token j can answer the information need of token i. After dividing by sqrt(d_k) and applying softmax over all positions j, this score becomes a weight:

a_{ij} = softmax_j(Q_i · K_j / sqrt(d_k))

output_i = sum_j a_{ij} V_j

Think of it as a soft database query: instead of retrieving exactly one matching row (hard lookup), attention blends all rows weighted by their relevance to the query.
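The contrast between a hard lookup and a soft one can be made concrete with a toy example. The sketch below uses made-up numbers for one query, three keys, and three one-dimensional values; the names `hard_result` and `soft_result` are chosen for this illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data (assumed for illustration): one query, three key/value rows
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = [[10.0], [20.0], [30.0]]

d_k = len(query)
scores = [dot(query, k) / math.sqrt(d_k) for k in keys]
weights = softmax(scores)

# Hard lookup: return only the row whose key matches best
best = max(range(len(keys)), key=lambda j: scores[j])
hard_result = values[best]

# Soft lookup (attention): blend every row, weighted by relevance
soft_result = [
    sum(w * v[c] for w, v in zip(weights, values))
    for c in range(len(values[0]))
]
```

The hard lookup returns the first row's value unchanged, while the soft result lies between the smallest and largest values because every row contributes in proportion to its weight.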

Why three separate projections instead of one?

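If Q = K = V = X (raw input), the model has no freedom to learn separate “what I need” versus “what I offer” representations. The raw score matrix X X^T is also symmetric, so token i would score token j exactly as token j scores token i. Three independent weight matrices W_Q, W_K, W_V let the model project each role into a subspace suited to it. Q and K must share a dimension d_k so that their dot product is defined, but V can have a different dimension d_v, and in practice the learned Q and K spaces often differ substantially from V.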
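One consequence of sharing a single representation is easy to check by hand: with Q = K = X the pre-softmax scores are symmetric. The sketch below uses hand-picked toy matrices (hypothetical, not learned weights) to show that separate W_Q and W_K can break that symmetry.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(rows, W):
    """Multiply each row vector by the weight matrix W (list of rows)."""
    return [[dot(row, col) for col in zip(*W)] for row in rows]

def score_matrix(queries, keys):
    """Raw (pre-softmax, unscaled) dot-product scores."""
    return [[dot(q, k) for k in keys] for q in queries]

def is_symmetric(m):
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings

# Shared representation: Q = K = X
shared = score_matrix(X, X)

# Hypothetical projections: queries read feature 0, keys expose feature 1
W_Q = [[1.0, 0.0], [0.0, 0.0]]
W_K = [[0.0, 0.0], [1.0, 0.0]]
separate = score_matrix(project(X, W_Q), project(X, W_K))

print(is_symmetric(shared))    # True
print(is_symmetric(separate))  # False
```

With the separate projections, token 0 scores token 1 at 1.0 while token 1 scores token 0 at 0.0, which is the kind of directed relationship a shared representation cannot express.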

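import torch
import torch.nn.functional as F

# Example shapes (illustrative): n = 5 tokens, d_k = d_v = 64
n, d_k, d_v = 5, 64, 64
Q, K, V = torch.randn(n, d_k), torch.randn(n, d_k), torch.randn(n, d_v)

# Compatibility of every query with every key, divided by sqrt(d_k)
scores = (Q @ K.transpose(-2, -1)) / (d_k ** 0.5)  # (n, n)
weights = F.softmax(scores, dim=-1)                # each row sums to 1
output = weights @ V                               # (n, d_v)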
