What are positional encodings and why are they needed in transformers?

For ML Engineer research-engineer AI / LLM Engineer

The short answer

Self-attention is permutation-invariant and has no inherent notion of token order, so positional encodings inject information about each token's position. They can be fixed sinusoidal functions or learned embeddings added to inputs, or relative schemes like RoPE that modulate attention by relative distance.

How to think about it

Self-attention is permutation-invariant and has no inherent notion of token order, so positional encodings inject information about each token’s position. They can be fixed sinusoidal functions or learned embeddings added to inputs, or relative schemes like RoPE that modulate attention by relative distance.

Learn it properly Positional encodings & RoPE

What are positional encodings and why are they needed in transformers?

Keep practising

Explore further