2024: Era of Hidden Reasoning

OpenAI o1 and DeepSeek-R1 showed that LLMs can carry out complex internal reasoning. Reasoning is not a prompt-engineering trick but an emergent capability, trainable via RL methods such as GRPO (sketched below).
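
A minimal sketch of GRPO's core idea, group-relative advantage estimation. The group size, rewards, and function name here are illustrative; the full algorithm adds a PPO-style clipped objective and a KL penalty against a reference model.

```python
# Minimal sketch of GRPO advantage computation, assuming one scalar
# reward per sampled response (e.g. from a correctness verifier).
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: normalize each response's reward
    against the mean/std of its own sampling group (no value network)."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# One prompt, a group of G = 4 sampled responses, graded right/wrong.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
print(grpo_advantages(rewards))  # -> [ 1., -1.,  1., -1.]
```

The key design choice: the group itself serves as the baseline, so no separate value network is needed, which is what makes large-scale reasoning RL cheap enough to run.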

2025: Era of Visible Thinking

Claude 4.5's effort parameter, GPT-5.1's adaptive thinking, Qwen3's thinking budget: users explicitly control reasoning depth. The shift is from hidden reasoning to user-controlled reasoning.

Effort & Budget as UI Primitives

No longer "prompt X to get more CoT." Instead: an effort slider from 1 to 10. The model itself decides how many thinking tokens the task needs. The result is a more intuitive API (see the sketch below).
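
A hypothetical client call showing effort and budget as first-class request parameters. The endpoint, model name, and parameter names (`effort`, `thinking_budget`) are placeholders, not any vendor's actual API; real APIs expose similar knobs under different names.

```python
# Hypothetical API sketch: reasoning depth as a request parameter
# instead of a prompt-engineering trick. All names are illustrative.
import requests

def ask(prompt: str, effort: int = 5, thinking_budget: int | None = None) -> str:
    assert 1 <= effort <= 10, "effort is a 1-10 slider"
    payload = {
        "model": "reasoning-model",   # placeholder model name
        "input": prompt,
        "effort": effort,             # coarse dial: how hard to think
    }
    if thinking_budget is not None:
        payload["thinking_budget"] = thinking_budget  # hard token cap
    resp = requests.post("https://api.example.com/v1/responses", json=payload)
    resp.raise_for_status()
    return resp.json()["output_text"]

# Simple question: low effort. Hard proof: crank the slider up.
# ask("What is 2 + 2?", effort=1)
# ask("Prove the AM-GM inequality.", effort=9, thinking_budget=32_000)
```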

Sparse Attention Production-Ready

DeepSeek DSA (Dec 2025): 60% lower cost, 3.5x faster, 70% less memory. Sparse attention is no longer a research topic; it is now standard for long contexts (1M+ tokens).
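
To illustrate the general idea (not DeepSeek's actual DSA kernel or its indexer), a minimal top-k sparse attention in PyTorch: each query attends only to its k best-scoring keys instead of all of them.

```python
# Generic top-k sparse attention sketch; illustrative only.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k: int = 64):
    """q, k, v: (seq, dim). Each query keeps only its top_k
    highest-scoring keys; the rest are masked to -inf."""
    scores = q @ k.T / k.shape[-1] ** 0.5            # (seq, seq) full scores
    kth = scores.topk(top_k, dim=-1).values[:, -1:]  # per-query k-th best score
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

seq, dim = 1024, 64
q, k, v = (torch.randn(seq, dim) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=64)       # (1024, 64)
```

This toy version still materializes the full score matrix for clarity; a production kernel selects keys with a cheap indexer first, so cost scales with seq × k rather than seq², which is where the reported savings come from.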

Multimodal Early Fusion

Llama 4 and Gemini 3: vision and text in the same token sequence instead of separate pipelines. Cross-modal reasoning beats late fusion. The unified architecture wins (see the sketch below).
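
A toy early-fusion forward pass: image patches are projected into the text embedding space and concatenated into one sequence, so a single transformer attends across both modalities. Shapes, patch size, and modules are illustrative, not Llama 4's or Gemini 3's internals.

```python
# Early-fusion sketch: one sequence, one attention stack, two modalities.
import torch
import torch.nn as nn

d_model, vocab = 512, 32_000
embed = nn.Embedding(vocab, d_model)          # text tokens -> vectors
patch_proj = nn.Linear(3 * 16 * 16, d_model)  # 16x16 RGB patch -> vector
block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

text_ids = torch.randint(0, vocab, (1, 12))   # "Describe this image: ..."
patches = torch.randn(1, 64, 3 * 16 * 16)     # 64 flattened image patches

seq = torch.cat([embed(text_ids), patch_proj(patches)], dim=1)  # (1, 76, 512)
out = block(seq)  # text and image attend to each other jointly
```

Contrast with late fusion, where a separate vision encoder runs to completion and only its summary vector reaches the language model; early fusion lets every layer reason across both modalities.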

Dual-Mode Models are the Future

One model, two modes: fast (instant answers) and deep (extended thinking). Users choose per task: efficient for simple questions, capable for complex ones. The sketch below shows the dispatch idea.
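
A dual-mode dispatch sketch: the same weights serve both paths, and only the thinking budget differs per request. The `mode` flag and `generate` stub are hypothetical; real APIs expose this as e.g. a thinking toggle or a reasoning-effort setting.

```python
# Dual-mode dispatch: one model, a per-request mode flag.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    mode: str = "fast"  # "fast": answer now; "deep": think first

def generate(prompt: str, thinking_tokens: int) -> str:
    """Stub for the model call; a real backend would stream the answer
    after spending up to `thinking_tokens` on hidden reasoning."""
    label = "deep" if thinking_tokens else "fast"
    return f"[{label} answer to: {prompt!r}]"

def serve(req: Request) -> str:
    # Same weights in both branches; only the thinking budget differs.
    budget = 16_000 if req.mode == "deep" else 0
    return generate(req.prompt, thinking_tokens=budget)

print(serve(Request("What is 2 + 2?")))                      # fast path
print(serve(Request("Plan a 10-step proof.", mode="deep")))  # deep path
```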