Deep Learning Medium
What are positional encodings and why are they needed in transformers?
The short answer
Self-attention is permutation-invariant and has no inherent notion of token order, so positional encodings inject information about each token's position. They can be fixed sinusoidal functions or learned embeddings added to inputs, or relative schemes like RoPE that modulate attention by relative distance.
How to think about it
Self-attention is permutation-invariant and has no inherent notion of token order, so positional encodings inject information about each token’s position. They can be fixed sinusoidal functions or learned embeddings added to inputs, or relative schemes like RoPE that modulate attention by relative distance.