Reasoning & Test-Time Compute

You now understand how modern LLMs solve complex problems: how Chain-of-Thought prompting and hidden reasoning work, and how test-time compute improves performance on difficult tasks.

Chain-of-Thought (CoT), o1/o3 Hidden Reasoning, DeepSeek R1 & GRPO, Compute Allocation, Test-Time Scaling, Effort Parameter, Thinking Budget, Dual-Mode Models

Continue with Chapter 4

Optimizations & Memory

Learn the techniques that make LLMs fast and memory-efficient: KV-Cache, RoPE, ALiBi, Sliding Window Attention, Ring Attention, Paged Attention, and RAG pipelines.

Progress: Chapter 3 of 8