Visualization of how Rotary Position Embeddings (RoPE) encode relative positions through vector rotation
RoPE Rotation shows how position is encoded as an angle. Each 2D pair in the embedding is rotated by a position-dependent angle. The key insight: the dot product between two rotated vectors depends only on their relative position – a perfect fit for attention.
This animation makes the mathematical idea of RoPE tangible. It complements the theoretical RoPE-ALiBi overview with an interactive visualization.
RoPE is the position encoding used in Llama, Mistral, Qwen, and most open-source models. Understanding how it works explains why these models can be extended to context lengths of 128K+ tokens.
Rotary Position Embedding (RoPE) rotates vectors in 2D subspaces based on their position. The key: Relative position between two tokens corresponds to the rotation difference of their embeddings. This enables zero-shot length extrapolation and is used in Llama, PaLM and GPT-NeoX.
m: Position of the token
θᵢ: Rotation frequency (e.g., θᵢ = 10000^(−2i/d))
x₀, x₁: 2D subspace of the vector
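The rotation described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the code behind the animation; the function name `rope_rotate` and the default `base=10000.0` are assumptions matching the formula θᵢ = 10000^(−2i/d).

```python
import numpy as np

def rope_rotate(x: np.ndarray, m: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each 2D pair (x[2i], x[2i+1]) of x by the angle m * theta_i,
    where theta_i = base**(-2*i/d). Illustrative sketch only."""
    d = x.shape[0]
    assert d % 2 == 0, "embedding dimension must be even"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)   # per-pair rotation frequencies
    angles = m * theta               # position-dependent rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin   # standard 2D rotation
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out
```

Because each 2D pair is only rotated, the vector's norm is unchanged; position m = 0 leaves the vector as-is (all angles are zero).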
The dot product between the Query at position m and the Key at position n satisfies ⟨R_m q, R_n k⟩ = ⟨R_{m−n} q, k⟩: it depends only on the relative position (m−n), not on the absolute positions. This enables length extrapolation: if the model was trained on 2K tokens, it often still works at 8K+ tokens.
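This relative-position property can be checked numerically: rotating a query/key pair at two different absolute positions with the same offset yields the same dot product. A minimal sketch (the helper `rope_rotate` is the same illustrative function as the rotation formula above, not a library API):

```python
import numpy as np

def rope_rotate(x: np.ndarray, m: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each 2D pair of x by m * theta_i with theta_i = base**(-2*i/d)."""
    d = x.shape[0]
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)
    cos, sin = np.cos(m * theta), np.sin(m * theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# Same relative offset (m - n = 3) at two very different absolute positions:
d1 = rope_rotate(q, 5) @ rope_rotate(k, 2)
d2 = rope_rotate(q, 105) @ rope_rotate(k, 102)
print(np.isclose(d1, d2))  # True: the score depends only on m - n
```

Since 2D rotations commute, ⟨R_m q, R_n k⟩ collapses to ⟨R_{m−n} q, k⟩, which is why the two dot products agree despite the shift of 100 positions.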