Dense vs Sparse Retrieval

Comparison of BM25 keyword search and embedding-based approaches

Dense vs. sparse retrieval compares two paradigms: embedding-based search (dense) and token matching (sparse, e.g., BM25). Many modern RAG systems combine both, because each catches relevant documents the other misses.

📖 Learning Context

🎯 Learning Objectives

Understand dense retrieval (embeddings, semantic search)
Understand sparse retrieval (BM25, lexical matching)
Know hybrid approaches (combination of both)

🧭 Context

Step 4/5 in Chapter 2 "Modern Architecture Variants"

Applies sparse concepts to RAG. Shows how retrieval quality affects the quality of the generated answers.

💡 Why It Matters

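Hybrid retrieval combines BM25 and embeddings, often followed by a reranker (e.g., Cohere Rerank) that reorders the merged candidates. A weighting of 30% BM25 + 70% dense is a common starting point in production, but the best split depends on the corpus and query mix and should be tuned.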

🔑 Key Takeaways

Dense = semantic: Finds paraphrases and similar meanings
Sparse = lexical: Finds exact word matches quickly
Hybrid + Reranking: Combines strengths of both approaches

BM25: Fast and Simple

Keyword-based approach that scores documents by term frequency, inverse document frequency, and document length. Word positions are not used. No ML training needed, extremely fast.
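
To make the scoring concrete, here is a minimal sketch of one common BM25 variant (smoothed IDF, with the frequently used defaults k1 = 1.5 and b = 0.75). The whitespace tokenizer, the function names, and the three sample documents are assumptions for illustration, not part of any particular library.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample corpus.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "bm25 ranks documents by term overlap",
]
scores = bm25_scores("cat mat", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Note that the second document scores zero even though it contains "cats": without stemming, lexical matching treats "cat" and "cats" as different tokens, which is the kind of gap dense retrieval is meant to close.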

Dense: Semantically Intelligent

Embedding-based, understands meaning. Better at paraphrases and semantically similar documents.
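
The ranking step of dense retrieval can be sketched as a cosine-similarity comparison between a query vector and document vectors. The 3-dimensional vectors below are hand-written placeholders, not the output of a real model; in practice the embeddings would come from a trained encoder and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for illustration only.
doc_vectors = {
    "How to fix a flat tire": [0.9, 0.1, 0.0],
    "Best pasta recipes": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of "repairing a punctured wheel",
# a query that shares no keywords with either document title.
query_vector = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    doc_vectors,
    key=lambda doc: cosine_similarity(query_vector, doc_vectors[doc]),
    reverse=True,
)
```

With these placeholder vectors the tire document ranks first despite having no words in common with the query, which is the paraphrase-matching behavior described above.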

Trade-off: Speed vs Quality

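BM25 is typically much faster and cheaper to run (the 10-100× figure depends heavily on hardware, corpus size, and index type), but dense retrieval gives better semantic quality. Choose based on use case.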

Hybrid Approach

Combination: for example 30% BM25 + 70% dense as a starting weight, tuned per corpus. Balances exact-match precision against semantic recall in production.
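
One simple way to apply such a weighting is to normalize each retriever's scores to the range 0 to 1 and take a weighted sum. The sketch below assumes min-max normalization and made-up score lists; other fusion schemes (such as reciprocal rank fusion) are also used in practice, and the function names are illustrative.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; BM25 and cosine scores live on different scales."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(bm25, dense, bm25_weight=0.3):
    """Weighted sum of normalized sparse and dense scores, one per document."""
    bm25_n = min_max(bm25)
    dense_n = min_max(dense)
    return [
        bm25_weight * s + (1 - bm25_weight) * d
        for s, d in zip(bm25_n, dense_n)
    ]


# Hypothetical scores for the same three documents from each retriever.
bm25 = [12.0, 0.0, 6.0]
dense = [0.20, 0.90, 0.55]
fused = hybrid_scores(bm25, dense)
top = max(range(len(fused)), key=fused.__getitem__)
```

Normalizing first matters: raw BM25 scores are unbounded while cosine similarities are at most 1, so an unnormalized sum would let BM25 dominate regardless of the weights.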

Scaling

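BM25 runs on an inverted index whose size grows roughly linearly with the corpus. Dense retrieval needs a vector index or vector database (e.g., the FAISS library or Milvus), usually with approximate nearest-neighbor search. For large corpora: hybrid or dense-only retrieval.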
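
The scaling difference comes from the data structures involved. A sparse retriever can use an inverted index to look up only the documents that contain a query term, whereas exhaustive dense search compares the query against every stored vector unless an approximate index is used. A minimal inverted-index sketch, with hypothetical helper names and sample documents:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates(index, query):
    """Return ids of documents containing at least one query term."""
    result = set()
    for term in query.lower().split():
        # .get avoids inserting empty entries for unseen terms.
        result |= index.get(term, set())
    return result


# Hypothetical sample corpus.
docs = [
    "sparse retrieval uses keywords",
    "dense retrieval uses embeddings",
    "vector indexes approximate nearest neighbors",
]
index = build_inverted_index(docs)
hits = candidates(index, "dense embeddings")
```

Only the candidate documents need to be scored, so query cost depends on how many documents share terms with the query rather than on the full corpus size.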

Real-World Usage

Web search engines: a fast lexical stage (BM25-style) as a candidate filter, then a learned ranker. RAG systems: dense retrieval, with BM25 as a fallback.