Chapter 2 Complete
You have learned:

Modern Architecture Variants

You now know the advanced architectures that make modern LLMs more efficient and powerful: from Mixture of Experts to Flash Attention, Sparse Attention, and native multimodality.

Mixture of Experts (MoE), Load Balancing, Grouped Query Attention, Flash Attention, Sparse Attention (DSA), Dense vs. Sparse Retrieval, Native Multimodal Early Fusion
Continue with Chapter 3

Reasoning & Test-Time Compute

Learn how LLMs learn to "think": Chain-of-Thought reasoning, the hidden reasoning in o1/o3, DeepSeek R1, and how spending more compute at inference time improves performance on complex tasks.

Progress: Chapter 2 of 8