datarekha

What are positional encodings and why are they needed in transformers?

The short answer

Self-attention by itself has no inherent notion of token order: it is permutation-equivariant, so reordering the input tokens merely reorders the outputs. Positional encodings inject each token's position. They can be fixed sinusoidal functions or learned embeddings added to the input embeddings, or relative schemes such as RoPE that make attention scores depend on the relative distance between tokens.

How to think about it

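Think of attention without positions as operating on a bag of tokens: every token attends to every other token by content alone, so "dog bites man" and "man bites dog" yield the same set of token representations, just in a different order. Positional encodings break that symmetry. Absolute schemes give each position its own vector, either computed from sinusoids of different frequencies or learned as an embedding table, and add it to the token embedding before the first layer. Relative schemes such as RoPE instead rotate the query and key vectors by an angle proportional to position, so the attention score between two tokens depends on how far apart they are rather than on their absolute positions.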
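To make this concrete, here is a minimal NumPy sketch, not a production implementation. It assumes a toy single-head attention with identity projections (no learned weights), the common base of 10000 for the frequencies, and an even feature dimension; the function names `attention`, `sinusoidal_encoding`, `rope` and `rope_score` are illustrative and do not come from any particular library.

```python
import numpy as np

def attention(x):
    """Toy single-head self-attention with identity Q/K/V projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Fixed sinusoidal position vectors, shape (seq_len, d_model); d_model even."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    freqs = 1.0 / (base ** (np.arange(0, d_model, 2) / d_model))
    angles = positions * freqs                               # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                             # even dims: sine
    pe[:, 1::2] = np.cos(angles)                             # odd dims: cosine
    return pe

def rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x by angles proportional to position."""
    d = x.shape[-1]
    freqs = 1.0 / (base ** (np.arange(0, d, 2) / d))
    angles = positions[:, None] * freqs                      # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))            # 5 tokens, 8 features each
perm = np.array([2, 0, 4, 1, 3])       # an arbitrary reordering of the tokens

# Without positions, permuting the inputs just permutes the outputs.
equivariant = np.allclose(attention(x[perm]), attention(x)[perm])

# With sinusoidal vectors added, the same reordering changes the result.
pe = sinusoidal_encoding(5, 8)
order_sensitive = not np.allclose(attention(x[perm] + pe), attention(x + pe)[perm])

# RoPE: the query-key score depends only on the offset between positions.
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))

def rope_score(m, n):
    """Dot product of q rotated to position m and k rotated to position n."""
    return (rope(q, np.array([m])) @ rope(k, np.array([n])).T).item()

same_offset = np.isclose(rope_score(3, 1), rope_score(10, 8))  # both offsets are 2
print(equivariant, order_sensitive, same_offset)
```

The first check shows why plain attention cannot tell orderings apart, the second shows that adding absolute position vectors makes the output depend on order, and the third shows the relative property of RoPE: shifting both positions by the same amount leaves the score unchanged.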
