Interactive Tutorial

How Modern Large Language Models Work

From input to output – explore the architecture behind GPT-4, Claude, and Llama.

📝
Input
Text / Prompt
Tokenizer
BPE / ~100k-token vocabulary
🎯
Embeddings
d = 8,192 dim
🔮
Transformer Stack
x80 Blocks (Llama 3 70B)
Transformer Block Structure
RMSNorm + Grouped-Query Attention + Residual
RMSNorm + SwiGLU Feedforward + Residual
📊
Output Layer
🎲
Sampling
Temperature / Top-p
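The block structure above (pre-norm residual: attention sublayer, then SwiGLU feedforward) can be sketched in NumPy. This is a toy illustration, not a real implementation: it uses single-head attention instead of grouped-query attention, tiny illustrative dimensions, and random weights.

```python
import numpy as np

def rms_norm(x, g, eps=1e-5):
    # RMSNorm: rescale by the root-mean-square of activations (no mean-centering)
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps) * g

def swiglu(x, W_gate, W_up, W_down):
    # SwiGLU feedforward: SiLU-gated linear unit, as in Llama-style blocks
    gate = x @ W_gate
    silu = gate / (1.0 + np.exp(-gate))   # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ W_up)) @ W_down

def attention(x, W_q, W_k, W_v, W_o):
    # Single-head causal self-attention (real models use multi-head / grouped-query)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.full(scores.shape, -np.inf), k=1)  # causal: no peeking ahead
    w = np.exp(scores + mask)
    w /= w.sum(axis=-1, keepdims=True)
    return (w @ v) @ W_o

def block(x, p):
    # Pre-norm residual block: x + Attn(Norm(x)), then x + FFN(Norm(x))
    x = x + attention(rms_norm(x, p["g1"]), p["Wq"], p["Wk"], p["Wv"], p["Wo"])
    x = x + swiglu(rms_norm(x, p["g2"]), p["Wg"], p["Wu"], p["Wd"])
    return x

# Toy dimensions; a 70B-class model uses d = 8,192 and 80 such blocks
rng = np.random.default_rng(0)
d, d_ff, T = 16, 64, 5
p = {"g1": np.ones(d), "g2": np.ones(d),
     "Wq": rng.normal(0, 0.02, (d, d)), "Wk": rng.normal(0, 0.02, (d, d)),
     "Wv": rng.normal(0, 0.02, (d, d)), "Wo": rng.normal(0, 0.02, (d, d)),
     "Wg": rng.normal(0, 0.02, (d, d_ff)), "Wu": rng.normal(0, 0.02, (d, d_ff)),
     "Wd": rng.normal(0, 0.02, (d_ff, d))}
h = rng.normal(size=(T, d))   # T token embeddings enter the block
out = block(h, p)
print(out.shape)              # (5, 16): same shape out as in, so blocks stack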
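The final sampling step (Temperature / Top-p) can also be sketched. Temperature rescales the logits before softmax; top-p (nucleus) sampling then keeps only the smallest set of tokens whose cumulative probability reaches p. The logits below are made up for illustration.

```python
import numpy as np

def sample(logits, temperature=0.8, top_p=0.9, rng=None):
    # Temperature: < 1 sharpens the distribution, > 1 flattens it
    z = logits / temperature
    probs = np.exp(z - np.max(z))        # softmax, shifted for stability
    probs /= probs.sum()
    # Top-p (nucleus): smallest set of tokens whose mass reaches top_p
    order = np.argsort(probs)[::-1]      # tokens sorted by probability, descending
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, top_p) + 1   # how many tokens to keep
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()  # renormalize over the nucleus
    rng = rng or np.random.default_rng()
    return int(rng.choice(keep, p=p))

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])  # toy logits over a 5-token vocab
token = sample(logits, temperature=0.7, top_p=0.9, rng=np.random.default_rng(0))
print(token)  # a token id from the high-probability nucleus
```

Greedy decoding (always picking the argmax) is the temperature-to-zero limit of this procedure; raising the temperature or top-p widens the set of tokens the model can emit.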
📝 Input
Chapter 1

What happens here?

The user enters a text prompt. This can be a question, an instruction, or context.


Technical Details