[Interactive visualization, step 1: Query Token is activated. Panels: Sequence (Tokens), Index Scores, Top-K Selection, Sparse Attention.]
Step 1: The Query token (highlighted in red) is activated. The Lightning Indexer begins calculating which Key tokens are relevant.
Lightning Indexer
The indexer quickly computes relevance scores for all tokens without calculating all attention weights.
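As a rough illustration of what "relevance scores for all tokens" can look like, here is a minimal sketch in plain Python. It assumes the score for each key token is a weighted sum of ReLU-activated dot products over a few small indexer heads; that form, and the names `index_scores`, `query_heads`, `head_weights`, and `index_keys`, are assumptions made for this example rather than details taken from the text above.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def index_scores(query_heads, head_weights, index_keys):
    """Return one relevance score per key token (hypothetical helper).

    query_heads : one small query vector per indexer head, for the current token
    head_weights: one scalar weight per indexer head
    index_keys  : one small key vector per token in the sequence
    """
    scores = []
    for key in index_keys:
        # Assumed scoring form: weighted sum of ReLU(dot(query_head, key)).
        score = sum(w * relu(dot(q, key))
                    for q, w in zip(query_heads, head_weights))
        scores.append(score)
    return scores


# Toy input: 2 indexer heads, 4 tokens, 2-dimensional indexer vectors.
query_heads = [[1.0, 0.0], [0.0, 1.0]]
head_weights = [0.5, 2.0]
index_keys = [[1.0, 1.0], [-1.0, 0.5], [2.0, -3.0], [0.0, 0.0]]

scores = index_scores(query_heads, head_weights, index_keys)
print(scores)  # [2.5, 1.0, 1.0, 0.0]
```

The indexer still touches every token, but each score is a handful of small dot products instead of a full softmax attention weight.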
🎯
Top-K Selection
Only the top-K most relevant tokens are selected. The rest are skipped in the attention step, which substantially reduces memory and compute.
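The selection step can be sketched for a single query as follows. This is an illustrative example under simple assumptions (one query, one attention head, ordinary scaled dot-product softmax over the selected positions); `top_k_indices` and `sparse_attention` are hypothetical helper names, and the toy scores are made up.

```python
import math


def top_k_indices(scores, k):
    """Indices of the k largest scores, returned in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Softmax attention for one query, restricted to the selected positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i]))
              for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * values[i][d] for w, i in zip(weights, selected))
            for d in range(dim)]


# Toy input: 5 tokens; the index scores are illustrative values only.
toy_scores = [0.1, 2.0, 0.3, 1.5, 0.05]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [2.0, 2.0]]
values = [[9.0, 9.0], [1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [9.0, 9.0]]
query = [1.0, 1.0]

selected = top_k_indices(toy_scores, k=2)   # tokens 1 and 3
output = sparse_attention(query, keys, values, selected)
```

Tokens outside `selected` never enter the softmax, so their keys and values are not read for this query.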
⚙️
Efficient Computation
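Full attention costs O(n²) over a sequence of n tokens, because every query attends to every key. With top-K selection, each query attends to only k keys, so the attention step costs O(k) per query and O(nk) overall, where k << n. This is the key to scalability.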
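To make the saving concrete, the sketch below counts the query-key pairs evaluated in the attention step, assuming causal masking (token t can see only the t tokens up to and including itself). The function name `attention_pairs` is hypothetical, and the count deliberately leaves out the indexer's own scoring pass, which still visits every token but with much cheaper operations.

```python
def attention_pairs(n, k=None):
    """Count query-key pairs in the attention step under causal masking.

    n: sequence length
    k: number of keys kept per query; None means dense (full) attention
    """
    if k is None:
        return n * (n + 1) // 2          # token t attends to t tokens
    return sum(min(t, k) for t in range(1, n + 1))


dense = attention_pairs(8)        # 36 pairs
sparse = attention_pairs(8, k=2)  # 15 pairs
print(dense, sparse)
```

The dense count grows quadratically with n, while the sparse count grows roughly as n times k once n exceeds k.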
🚀
Production Ready
DSA with Lightning Indexer is the first production-ready sparse attention method with zero accuracy regression.