[Interactive demo: set the sequence length n and key dimension dₖ, choose a playback speed, and step through the 16-step QKᵀ calculation with "Next Step" or "Play".]
Query Matrix Q (n × dₖ) × Transposed Key Matrix Kᵀ (dₖ × n) = Attention Scores (n × n)
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) · V
Scaling by √dₖ prevents large dot products from saturating the softmax.
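For reference, here is a minimal NumPy sketch of this formula for a single attention head (the function name and the shapes in the comments are illustrative assumptions, not part of the demo):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q: (n, d_k) queries, K: (n, d_k) keys, V: (n, d_v) values.
    """
    d_k = Q.shape[-1]
    # Raw scores: entry [i, j] is the dot product of query row i and key row j.
    scores = Q @ K.T                  # shape (n, n)
    scores = scores / np.sqrt(d_k)    # scale so the softmax does not saturate
    # Row-wise softmax turns each row of scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                # weighted sum of value rows
```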
💡 What's happening here?

Each cell in the attention score matrix is the dot product of one Query row with one column of Kᵀ (equivalently, one row of K). The value Score[i, j] measures how strongly Token i should "attend" to Token j: higher scores mean higher relevance. After the row-wise softmax, each row of scores becomes a set of normalized weights that determines how much information flows from every token into Token i.
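A small, hypothetical example makes the "normalized weights" point concrete: after the row-wise softmax, every row of the weight matrix sums to 1, so each token distributes exactly one unit of attention across all tokens (the sizes below are arbitrary):

```python
import numpy as np

n, d_k = 4, 8                               # 4 tokens, head dimension 8 (arbitrary)
rng = np.random.default_rng(0)
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))

scores = (Q @ K.T) / np.sqrt(d_k)           # (4, 4) scaled attention scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(weights.sum(axis=-1))                 # -> [1. 1. 1. 1.]: each row is a distribution
```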