BM25 (Sparse/Keyword)

Latency: 1-5 ms
Precision@5: 72%
Storage: 100 MB

Dense (Embedding-based)

Latency: 50-200 ms
Precision@5: 88%
Storage: 5 GB

BM25: Fast and Simple

Keyword-based ranking built on term frequency, inverse document frequency, and document-length normalization (word positions are not used). No ML training needed, and extremely fast.
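To make the scoring concrete, here is a minimal sketch of BM25 over pre-tokenized documents. The parameter values k1=1.5 and b=0.75 are commonly used defaults rather than values from this comparison, the IDF variant is the non-negative one used by Lucene, and the function name `bm25_scores` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (assumed for illustration).
docs = [
    "neural networks learn representations".split(),
    "probability distributions in data science".split(),
    "reinforcement learning with neural networks".split(),
]
scores = bm25_scores("neural networks".split(), docs)
```

The second document shares no term with the query and scores zero. The first outranks the third because it contains the same query terms in a shorter text.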

Dense: Semantically Intelligent

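Embedding-based: queries and documents are encoded as vectors and compared by similarity, so matches reflect meaning rather than exact wording. Better at paraphrases and semantically similar documents.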
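The idea can be sketched with cosine similarity over embedding vectors. The three-dimensional vectors below are hand-made stand-ins (a real system would obtain them from an encoder model), and `dense_search` is a hypothetical brute-force helper, not the API of any particular library.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dense_search(query_vec, doc_vecs, top_k=5):
    """Brute-force nearest neighbours: return (doc_id, score) pairs, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical embeddings, invented for illustration.
doc_vecs = {
    "car_repair": [0.9, 0.1, 0.0],
    "automobile_maintenance": [0.8, 0.2, 0.1],
    "baking_bread": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # stands in for the embedding of "fixing a vehicle"
results = dense_search(query_vec, doc_vecs, top_k=2)
```

Neither of the two returned documents needs to share a keyword with the query. They are retrieved because their vectors point in nearly the same direction as the query vector.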

Trade-off: Speed vs Quality

BM25 is roughly 10-100× faster, while dense retrieval delivers better semantic quality (Precision@5 of 88% versus 72% in the figures above). Choose based on the use case.
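The quality side of this trade-off is measured with Precision@5: the fraction of the top five results that are relevant. A minimal sketch follows, with made-up document IDs and relevance judgments.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the first k ranked results that appear in relevant_ids."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Hypothetical ranking and relevance labels.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d4", "d8"}
p5 = precision_at_k(ranked, relevant, k=5)  # 3 of the top 5 are relevant
```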

Hybrid Approach

A weighted combination of the two scores, here 30% BM25 + 70% dense. Often a good balance between speed and accuracy in production.
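One way to apply the 30/70 weighting is sketched below. BM25 scores and embedding similarities live on different scales, so this sketch first rescales each to [0, 1] with min-max normalization. That normalization step is an assumption, not something stated in the comparison, and other fusion schemes exist.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_scores(bm25, dense, w_bm25=0.3, w_dense=0.7):
    """Weighted sum of normalized scores; a document missing from one list gets 0 there."""
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    doc_ids = set(bm25_n) | set(dense_n)
    return {
        doc_id: w_bm25 * bm25_n.get(doc_id, 0.0) + w_dense * dense_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }

# Hypothetical raw scores from the two retrievers.
bm25 = {"a": 12.0, "b": 4.0, "c": 0.0}
dense = {"a": 0.55, "b": 0.91, "c": 0.20}
fused = hybrid_scores(bm25, dense)
ranking = sorted(fused, key=fused.get, reverse=True)
```

Document "a" wins on keywords, but "b" ranks first after fusion because the dense score carries 70% of the weight.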

Scaling

BM25 scales linearly with corpus size, while dense retrieval requires a vector index or vector database (e.g. FAISS, Milvus). For large corpora: hybrid or dense only.
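The storage cost of dense retrieval can be estimated with simple arithmetic: an uncompressed index holds one float per dimension per vector. The corpus size and embedding dimension below are assumptions chosen for illustration and are not given in the comparison.

```python
def dense_index_bytes(n_vectors, dim, bytes_per_value=4):
    """Raw size of a flat, uncompressed vector index (float32 by default), excluding metadata."""
    return n_vectors * dim * bytes_per_value

# Hypothetical corpus: 1.6 million passages with 768-dimensional float32 embeddings.
size_gb = dense_index_bytes(1_600_000, 768) / 1e9  # about 4.9 GB
```

Under these assumptions the index lands in the same range as the 5 GB storage figure quoted for dense retrieval. Quantized or compressed indexes would be smaller.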

Real-World Usage

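Large-scale web search such as Google Search: a cheap keyword stage (BM25-style) as a first-pass filter, followed by a more expensive ranker. RAG systems: dense retrieval as the primary retriever, with BM25 as a fallback.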
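The RAG pattern above can be sketched as a small routing function. Everything here is hypothetical: the `min_score` threshold, the function names, and the stub retrievers simply illustrate falling back to BM25 when the best dense hit looks weak.

```python
def retrieve_with_fallback(query, dense_search, bm25_search, min_score=0.5, top_k=5):
    """Return (source, hits); hits are (doc_id, score) pairs, best first."""
    hits = dense_search(query, top_k)
    # Trust the dense results only if the best match is confident enough.
    if hits and hits[0][1] >= min_score:
        return "dense", hits
    return "bm25", bm25_search(query, top_k)

# Stub retrievers standing in for real indexes.
def fake_dense(query, top_k):
    return [("doc_a", 0.82)] if "neural" in query else [("doc_z", 0.12)]

def fake_bm25(query, top_k):
    return [("doc_b", 7.4)]

confident = retrieve_with_fallback("neural networks", fake_dense, fake_bm25)
weak = retrieve_with_fallback("error code E1234", fake_dense, fake_bm25)
```

Exact identifiers such as error codes are a typical case where keyword matching can rescue a query that the embedding model handles poorly.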