Complete visualization of the Positional Encoding Matrix as a heatmap – revealing the periodic patterns
The positional encoding matrix shows all position vectors at a glance. Its characteristic stripes reveal how different frequencies encode different position scales.
This visualization complements the Sine/Cosine visualization (1.3) with a holistic perspective on the entire encoding matrix.
The heatmap reveals why the encoding works: high frequencies (left) distinguish neighboring positions, while low frequencies (right) encode the global position in the sequence.
Each position receives a unique vector through sine and cosine functions of different frequencies. Low dimensions change quickly (high frequency), high dimensions slowly (low frequency). This allows the model to distinguish both local and global positions.
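Written out, the sinusoidal encoding from the original Transformer paper takes the following form, using the symbols explained in the list that follows. The base 10000 is the constant chosen in that paper; even dimensions carry the sine and odd dimensions the cosine of the same angle.

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{\,2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{\,2i/d_{\text{model}}}}\right)
\end{aligned}
```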
pos: Position of the token (0 to n-1)
i: Index of the sine/cosine dimension pair (0 to d_model/2 - 1)
d_model: Embedding dimension (e.g., 512)
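The definitions above can be turned into a small matrix builder. This is a minimal sketch in plain Python, assuming the standard sinusoidal formula with base 10000; the function name positional_encoding and the parameter name n_positions are illustrative choices, not names taken from the visualization.

```python
import math

def positional_encoding(n_positions, d_model, base=10000.0):
    """Build an n_positions x d_model sinusoidal encoding matrix (list of rows)."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sine/cosine pairs line up")
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for i in range(d_model // 2):
            # One angle per dimension pair; it grows more slowly as i increases.
            angle = pos / base ** (2 * i / d_model)
            pe[pos][2 * i] = math.sin(angle)      # even dimension
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension
    return pe

pe = positional_encoding(n_positions=50, d_model=8)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each row of pe is one position vector, so plotting the whole matrix with positions on one axis and dimensions on the other gives the heatmap described here.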
Vertical stripes: Show the periodic nature of sine/cosine functions. Low dimensions have narrow stripes (high frequency), high dimensions wide stripes (low frequency).
Horizontal variation: Shows how encoding changes across positions. Each position has a unique fingerprint.
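The stripe widths can be quantified by the wavelength of each dimension pair, that is, how many positions pass before its sine/cosine repeats. The sketch below assumes the standard sinusoidal formula with base 10000; wavelength and step_change are hypothetical helper names introduced only for this illustration.

```python
import math

def wavelength(i, d_model, base=10000.0):
    """Positions per full sine/cosine cycle for dimension pair i."""
    return 2 * math.pi * base ** (2 * i / d_model)

def step_change(i, d_model, base=10000.0):
    """Absolute change of the sine component of pair i between positions 0 and 1."""
    freq = 1.0 / base ** (2 * i / d_model)
    return abs(math.sin(freq) - math.sin(0.0))

d_model = 512
for i in (0, 64, 128, 255):
    print(f"pair {i:3d}: cycle = {wavelength(i, d_model):10.1f} positions, "
          f"change from pos 0 to 1 = {step_change(i, d_model):.6f}")
```

The first pair completes a cycle roughly every 6.3 positions and changes strongly between neighbors, while the last pair needs tens of thousands of positions per cycle and barely moves from one position to the next.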
The positional encoding is added element-wise to the token embeddings:
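The addition can be sketched as follows: each token's embedding vector and the encoding vector for its position are summed dimension by dimension. This is a minimal plain-Python illustration under the assumption of the standard sinusoidal formula; add_positional_encoding and the toy embedding values are made up for the example.

```python
import math

def positional_encoding(n_positions, d_model, base=10000.0):
    """Sinusoidal encoding matrix as a list of n_positions rows of length d_model."""
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for i in range(d_model // 2):
            angle = pos / base ** (2 * i / d_model)
            pe[pos][2 * i] = math.sin(angle)
            pe[pos][2 * i + 1] = math.cos(angle)
    return pe

def add_positional_encoding(token_embeddings):
    """Return token_embeddings + PE, added element-wise per position."""
    n_positions = len(token_embeddings)
    d_model = len(token_embeddings[0])
    pe = positional_encoding(n_positions, d_model)
    return [[e + p for e, p in zip(emb_row, pe_row)]
            for emb_row, pe_row in zip(token_embeddings, pe)]

# Toy input: the same token embedding at positions 0 and 1 (d_model = 4).
embeddings = [[0.5, -0.5, 0.25, 0.0],
              [0.5, -0.5, 0.25, 0.0]]
x = add_positional_encoding(embeddings)
print(x[0])  # [0.5, 0.5, 0.25, 1.0]
print(x[0] != x[1])  # True: identical tokens become distinguishable by position
```

Because the encoding is summed rather than concatenated, the model input keeps the same dimension d_model as the token embeddings.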
The original Transformer (2017) uses sinusoidal PE. Modern models often use: