

📊 Visualizations: Model Evolution

(Charts: Parameter vs. Context Window · Model Timeline 2024–2025 · Feature Adoption 2025)

📌 Insights: LLM Trends 2024-2025

🚀
Reasoning Emergence

DeepSeek-R1 (Jan 2025) demonstrated that chain-of-thought reasoning can emerge during GRPO training. All major labs now follow this reasoning-first approach.
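The core of GRPO can be sketched in a few lines: each sampled completion is scored against the mean and standard deviation of its own group, so no learned value model is needed. Function name and reward values below are illustrative, not from any real training code.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each completion's reward
    by the mean and std of its own sampled group (no learned critic)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero on uniform groups
    return [(r - mu) / sigma for r in rewards]

# e.g. 4 completions for one prompt, rewarded 0/1 for a correct final answer
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their group average get positive advantage and are reinforced; the rest are pushed down.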

💭
Effort Parameter

Claude 4.5 (Nov 2025) introduces an "effort" parameter: users directly trade off thinking time against accuracy, enabling dual-mode operation (fast + deep) within a single model.
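A request with such a knob might look like the following sketch. This is a hypothetical request shape for illustration only; the field names are not the real API.

```python
# Hypothetical request shape — "effort" and its values are illustrative,
# not the documented parameter of any real API.
request = {
    "model": "claude-4.5",
    "effort": "high",  # hypothetical knob: more thinking tokens, higher accuracy
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
}
print(request["effort"])  # high
```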

🎨
Early Fusion Multimodal

Llama 4 and Claude 4.5 use early fusion: text and vision tokens are processed together in the same LLM backbone, enabling true cross-modal reasoning rather than a separate image→text pipeline.
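The idea can be sketched at the token level: image patch embeddings are spliced directly into the text token stream, so one transformer attends across both modalities. The boundary markers and patch IDs below are illustrative placeholders, not any model's real vocabulary.

```python
def early_fusion_sequence(text_tokens, image_patch_tokens):
    """Early fusion sketch: splice image patch tokens into the text stream
    so a single transformer attends over both modalities at once.
    (Illustrative string tokens; real models use learned embeddings.)"""
    BOI, EOI = "<img>", "</img>"  # hypothetical image-boundary markers
    return text_tokens[:1] + [BOI] + image_patch_tokens + [EOI] + text_tokens[1:]

seq = early_fusion_sequence(["Describe", "this", "scene"], ["p0", "p1", "p2", "p3"])
print(seq)
```

Because the patches sit inside the same sequence, attention can flow text→image and image→text in every layer, not just at a final captioning step.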

⚡
Sparse Attention Production

DeepSeek-V3.2 (Dec 2025) deploys sparse attention in production: roughly 60% memory savings and 4–5× faster inference at comparable quality, for contexts of 1M+ tokens.
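A minimal sketch of one common sparse-attention scheme (top-k key selection for a single query) shows where the savings come from: the softmax and value aggregation touch only k keys instead of the full context. This is a generic illustration, not DeepSeek's actual mechanism.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Single-query sketch: score all keys, keep only the top-k,
    softmax over that subset. Value aggregation scales with k,
    not with the full context length n."""
    scores = K @ q / np.sqrt(q.shape[0])    # (n,) similarity scores
    idx = np.argpartition(scores, -k)[-k:]  # indices of the k largest scores
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]                       # weighted sum over k values only

rng = np.random.default_rng(0)
n, d = 1024, 64
out = topk_sparse_attention(rng.normal(size=d), rng.normal(size=(n, d)),
                            rng.normal(size=(n, d)), k=32)
print(out.shape)  # (64,)
```

With k fixed, the attention output cost per query is O(k·d) rather than O(n·d), which is why quality can hold while memory and latency drop at very long contexts.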

📋
Specialized Benchmarks

New benchmarks (ThinkBench, ELAIPBench) show that reasoning ability is distinct from knowledge: some models excel only at reasoning.

💰
Cost-Performance Tradeoff

DeepSeek-V3.2 breaks the prevailing pricing model: roughly 75% cheaper than Claude/GPT at comparable performance, with sparse attention and MoE routing enabling the cost reduction.
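The "75% cheaper" claim is simple arithmetic on per-token pricing; the baseline figure below is a hypothetical placeholder, not a quoted rate.

```python
# Illustrative arithmetic for a "75% cheaper" claim — the $10/1M-token
# baseline is a hypothetical placeholder, not any vendor's real price.
baseline_per_mtok = 10.00
discounted = baseline_per_mtok * (1 - 0.75)
print(f"${discounted:.2f} per 1M tokens")  # $2.50 per 1M tokens
```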