datarekha
Deep Learning Easy Asked at GoogleAsked at OpenAIAsked at MetaAsked at Anthropic

What do the query, key, and value vectors represent in attention?

The short answer

The query represents what a token is looking for, the key represents what a token is advertising about itself, and the value is the content it contributes if selected. Attention scores measure query-key compatibility, and the output is a soft retrieval: a weighted sum of values where the weights come from those compatibility scores.

How to think about it

The query/key/value abstraction comes from the idea of a differentiable associative lookup:

  • Query (Q): “What kind of information do I need?” — projected from the current token.
  • Key (K): “What kind of information do I provide?” — projected from every token in the sequence.
  • Value (V): “What is the actual content I provide?” — also projected from every token.

The dot product Q_i · K_j measures how well token j can answer the information need of token i. After scaling by sqrt(d_k) and softmax, this score becomes a weight:

a_{ij} = softmax(Q_i K_j^T / sqrt(d_k))

output_i = sum_j a_{ij} V_j

Think of it as a soft database query: instead of retrieving exactly one matching row (hard lookup), attention blends all rows weighted by their relevance to the query.

Why three separate projections instead of one?

If Q = K = V = X (raw input), the model has no freedom to learn separate “what I need” versus “what I offer” representations. Having three independent weight matrices W_Q, W_K, W_V lets the model project each notion into a task-optimal subspace. In practice the learned Q and K spaces often differ substantially from V.

import torch.nn.functional as F

# d_k = 64, n = sequence length
scores = (Q @ K.transpose(-2, -1)) / (64 ** 0.5)  # (n, n)
weights = F.softmax(scores, dim=-1)                  # rows sum to 1
output = weights @ V                                  # (n, d_v)
Learn it properly Self-attention

Keep practising

All Deep Learning questions

Explore further

Skip to content