Compare costs and performance for Dense vs Sparse Attention Inference
The Inference Cost Calculator shows the real costs of LLM API calls: Input/Output token pricing, Sparse vs. Dense Attention savings, and how context length affects your bill.
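To make the core calculation concrete, here is a minimal sketch of the kind of per-request math the calculator performs. The model name and per-million-token prices are illustrative placeholders, not real quotes; only the idea that input and output tokens are billed separately comes from the text.

```python
# Illustrative prices in USD per 1M tokens (placeholders, not real quotes).
PRICES = {
    "example-model": {"input": 3.00, "output": 15.00},  # 1:5 input/output ratio
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single API call: input and output tokens are billed separately."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# A 100K-token context with a 1K-token answer: the input side dominates the bill.
print(round(request_cost("example-model", 100_000, 1_000), 4))  # 0.315
```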
Calculator 3 of 5 in the series; it helps you understand the practical costs of LLM usage.
LLM costs can explode quickly. The calculator shows that Sparse Attention saves 30-70% on long contexts, a game-changer for production systems.
Input tokens are cheaper than output tokens (the context is read once, while output is generated token by token). Claude 4.5: roughly a 1:5 input-to-output price ratio. GPT-5.1: 1:3. DeepSeek: nearly equal (training is the cost-intensive part).
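A hedged sketch of how those ratios play out for one and the same request. The base input price and model labels are placeholders; only the 1:5, 1:3, and roughly 1:1 ratios are taken from the text above.

```python
# Same request (10K input, 2K output tokens) under the three price ratios mentioned above.
BASE_INPUT = 1.00  # USD per 1M input tokens (illustrative placeholder)
RATIOS = {"claude-4.5-like": 5, "gpt-5.1-like": 3, "deepseek-like": 1}

for name, ratio in RATIOS.items():
    cost = (10_000 / 1e6) * BASE_INPUT + (2_000 / 1e6) * BASE_INPUT * ratio
    print(f"{name}: ${cost:.4f}")  # output-heavy workloads feel the ratio the most
```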
Sparse Attention reduces KV-cache compute by about 60%, but adds overhead for the Lightning Indexer. Net effect: 40-60% cost savings on long contexts (256K+).
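Back-of-the-envelope version of that net saving. The 60% KV-cache reduction comes from the text; the indexer overhead fraction is an assumption chosen only to land inside the stated 40-60% range.

```python
def sparse_net_saving(dense_cost: float,
                      kv_reduction: float = 0.60,      # from the text
                      indexer_overhead: float = 0.10   # assumed Lightning Indexer overhead
                      ) -> float:
    """Net saving as a fraction of the dense-attention cost."""
    sparse_cost = dense_cost * (1 - kv_reduction) + dense_cost * indexer_overhead
    return 1 - sparse_cost / dense_cost

print(f"{sparse_net_saving(1.0):.0%}")  # ~50% net saving under these assumptions
```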
Under 32K tokens: Sparse is not worth it (indexer overhead dominates). 32K-256K: Sparse wins. Over 256K: Sparse is a must (Dense becomes prohibitively expensive).
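The same rule of thumb as a tiny helper, using the thresholds from the line above; the function name is illustrative.

```python
def attention_mode(context_tokens: int) -> str:
    """Pick an attention regime from context length (thresholds from the text)."""
    if context_tokens < 32_000:
        return "dense"             # sparse overhead not worth it
    if context_tokens <= 256_000:
        return "sparse"            # sparse wins
    return "sparse (required)"     # dense becomes prohibitively expensive

for n in (8_000, 64_000, 500_000):
    print(n, "->", attention_mode(n))
```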
The Effort parameter scales thinking tokens linearly: Effort 1 = 100 tokens, Effort 10 = 1,000 tokens. Each additional level adds roughly +10% cost for about +5% quality.
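A sketch of that linear model; the 100-token base, the +10% cost, and the +5% quality per level are taken from the text, the function names are illustrative.

```python
def thinking_tokens(effort: int, base: int = 100) -> int:
    """Thinking tokens grow linearly with effort: Effort 1 -> 100, Effort 10 -> 1000."""
    return base * effort

def relative_cost_and_quality(effort: int) -> tuple[float, float]:
    """Cost and quality relative to Effort 1 (+10% cost, +5% quality per level)."""
    return 1 + 0.10 * (effort - 1), 1 + 0.05 * (effort - 1)

print(thinking_tokens(10))            # 1000
print(relative_cost_and_quality(5))   # ~ (1.4, 1.2)
```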
DeepSeek-V3.2: cheaper, but weaker reasoning. Claude 4.5: more expensive, but better effort control. GPT-5.1: adaptive (auto-selects thinking). ROI depends on the task.
Smart caching: the first request pays full cost; a second request with the same context pays only for output tokens. Hybrid setup: Sparse retrieval plus a Dense rerank of the top 10. Prompt caching: roughly -50% on input cost.
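A self-contained sketch of how those caching effects change the bill. The "output only on a repeated context" rule and the -50% prompt-caching discount come from the text; the prices and function name are placeholders.

```python
# Illustrative prices in USD per 1M tokens (1:5 input/output ratio, not real quotes).
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00

def billed_cost(input_tokens: int, output_tokens: int,
                context_already_cached: bool = False,
                prompt_cache_discount: float = 0.0) -> float:
    if context_already_cached:
        # Request 2+ with an identical context: only the generated output is billed.
        return (output_tokens / 1e6) * OUTPUT_PRICE
    input_cost = (input_tokens / 1e6) * INPUT_PRICE * (1 - prompt_cache_discount)
    return input_cost + (output_tokens / 1e6) * OUTPUT_PRICE

print(billed_cost(100_000, 1_000))                               # request 1: full cost (0.315)
print(billed_cost(100_000, 1_000, context_already_cached=True))  # request 2: output only (0.015)
print(billed_cost(100_000, 1_000, prompt_cache_discount=0.5))    # prompt caching: -50% input (0.165)
```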