Lightning Indexer Animation

Step by step: how the Lightning Indexer works, from query token through index score to the final sparse attention.

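The Lightning Indexer is the heart of DSA (DeepSeek Sparse Attention): before the actual attention is computed, it assigns each query a cheap relevance score for every earlier token and keeps only the highest-scoring ones. The indexer itself still scores every query-key pair, so it is O(n²), but with a much smaller constant than full attention; the expensive main attention then runs on only k tokens per query. Pre-filtering instead of post-filtering.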

📖 Learning Context

🎯 Learning Objectives

Understand indexing strategy (pre-filtering)
Follow top-k token selection
Weigh approximation vs. exactness

🧭 Context

Step 4/5 in Chapter 2 "Modern Architecture Variants"

A detailed view of the DSA implementation: it shows how the indexer decides which tokens are relevant.

💡 Why It Matters

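The indexer is a small learned scoring module, not a hashing scheme such as Locality-Sensitive Hashing (LSH): a few low-dimensional heads compute ReLU-activated dot products between the query and each earlier token, and a weighted sum of those products gives the index score. Scoring remains O(n²), but it is cheap per pair; the main attention drops from O(n²) to O(n·k).

🔑 Key Takeaways

Pre-Filtering: Saves compute through early selection
Learned scoring: A lightweight indexer approximates which tokens full attention would weight most
Two-stage: Compute index → Attend on top-k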

⚡

Lightning Indexer

The indexer quickly computes relevance scores for all tokens without calculating all attention weights.
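
The scoring step can be sketched as follows. This is a minimal illustration, assuming the scoring form publicly described for DSA: for one query, each candidate token's score is a weighted sum, over a few indexer heads, of ReLU-activated dot products. The function and variable names are hypothetical, and the toy vectors stand in for what would be learned projections in a real model.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def index_scores(query_heads, head_weights, index_keys):
    """Score every candidate token for a single query.

    query_heads:  one small indexer query vector per indexer head
    head_weights: one scalar weight per indexer head
    index_keys:   one small indexer key vector per candidate token
                  (assumed to be shared across heads)
    """
    scores = []
    for key in index_keys:
        score = sum(
            w * relu(dot(q, key)) for q, w in zip(query_heads, head_weights)
        )
        scores.append(score)
    return scores


# Toy example: 2 indexer heads, 2-dimensional index vectors, 4 candidate tokens.
query_heads = [[1.0, 0.0], [0.0, 1.0]]
head_weights = [0.5, 2.0]
index_keys = [[1.0, 1.0], [2.0, -1.0], [-1.0, 3.0], [0.0, 0.0]]

scores = index_scores(query_heads, head_weights, index_keys)
print(scores)  # [2.5, 1.0, 6.0, 0.0]
```

The work per candidate is only a handful of small dot products, which is why scoring all tokens stays cheap even though every token is visited.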

🎯

Top-K Selection

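For each query, only the top-K highest-scoring tokens are selected. The rest are skipped in the main attention, so its memory traffic and compute per query scale with K rather than with the sequence length n.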
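
A minimal sketch of the selection step, assuming the index scores for one query are already available as a plain list. The helper name `top_k_indices` is hypothetical; `heapq.nlargest` from the Python standard library avoids sorting the full score list.

```python
import heapq


def top_k_indices(scores, k):
    """Return the positions of the k highest scores, in ascending position order."""
    k = min(k, len(scores))  # short sequences simply keep every token
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(best)


scores = [2.5, 1.0, 6.0, 0.0, 4.2]
selected = top_k_indices(scores, k=3)
print(selected)  # [0, 2, 4]
```

Returning positions rather than scores is a deliberate choice here: the main attention only needs to know which keys and values to gather.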

⚙️

Efficient Computation

Instead of O(n²) for dense attention, the main attention costs only O(k) per query, or O(n·k) for the whole sequence, where k << n. This is the key to scalability.
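
The saving can be made concrete with a small sketch: softmax attention for one query restricted to the selected positions, plus a count of query-key pairs in the main attention. The names, the sequence length, and the value of k are illustrative assumptions; the count ignores causal masking and does not include the cost of the indexer itself.

```python
import math


def sparse_attention(query, keys, values, selected):
    """Softmax attention for one query, restricted to the selected positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]


def pair_counts(n, k):
    """Query-key pairs in the main attention: (dense, top-k sparse)."""
    return n * n, n * min(k, n)


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

# A zero query gives equal weight to both selected tokens; tokens 2 and 3 are never read.
output = sparse_attention([0.0, 0.0], keys, values, selected=[0, 1])
print(output)  # [0.5, 0.5]

dense, sparse = pair_counts(n=131072, k=2048)
print(dense // sparse)  # 64
```

With these illustrative numbers the main attention touches 64 times fewer pairs, and the ratio n/k keeps growing as the sequence gets longer while k stays fixed.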

🚀

Production Ready

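DSA with the Lightning Indexer is deployed in production in DeepSeek-V3.2-Exp, where DeepSeek reports benchmark quality largely on par with its dense-attention predecessor. Top-k selection remains an approximation, so accuracy should still be validated for each workload.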