From input to response, explore the architecture behind GPT-4, Claude, and Llama. Click on a building block to find out more.
The user enters a text prompt. This can be a question, an instruction, or context.
Tokenization, embeddings, self-attention and the mathematical foundations of modern LLMs.
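As an illustration of the core mechanism, here is a minimal sketch of single-head scaled dot-product self-attention with toy shapes and random weights; all sizes and values are assumptions, not taken from any particular model.

```python
# Minimal sketch of scaled dot-product self-attention (single head, toy sizes).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model) token embeddings; W*: (d_model, d_head) projections
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v                         # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
out = self_attention(x, *(rng.normal(size=(d_model, d_head)) for _ in range(3)))
print(out.shape)  # (4, 8)
```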
MoE, Grouped Query Attention, Flash Attention, Sparse Attention (DSA) and Native Multimodal.
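To give a feel for Mixture-of-Experts, here is a minimal sketch of top-k routing for a single token; the expert count, k, and dimensions are illustrative assumptions rather than any specific model's configuration.

```python
# Minimal sketch of Mixture-of-Experts top-k routing for one token.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def moe_forward(token, experts, router_W, k=2):
    # token: (d_model,); experts: list of (W1, W2) feed-forward weights
    logits = token @ router_W                 # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    gate = softmax(logits[top])               # renormalise gates over the top-k
    out = np.zeros_like(token)
    for g, idx in zip(gate, top):
        W1, W2 = experts[idx]
        out += g * (np.maximum(token @ W1, 0.0) @ W2)  # ReLU feed-forward expert
    return out

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
router_W = rng.normal(size=(d_model, n_experts))
print(moe_forward(rng.normal(size=d_model), experts, router_W).shape)  # (8,)
```

Only the k selected experts are evaluated per token, which is how MoE models keep compute per token far below their total parameter count.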
Chain-of-Thought, o1/o3 Hidden Reasoning, DeepSeek R1, Effort Parameter and Thinking Budget.
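As a small illustration of chain-of-thought prompting, the sketch below contrasts a direct question with a prompt that asks the model to reason first; the exact wording is an assumption, not a format required by any particular model.

```python
# Illustrative chain-of-thought prompt; wording is an assumption.
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

direct_prompt = f"{question}\nAnswer:"

cot_prompt = (
    f"{question}\n"
    "Think step by step before giving the final answer.\n"
    "Reasoning:"
)
print(cot_prompt)
```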
KV-Cache, RoPE, Sliding Window Attention, Paged Attention and RAG Pipeline.
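Here is a minimal sketch of what a KV cache does during autoregressive decoding: keys and values of past tokens are stored so each step only projects the newest token. All shapes and weights are toy assumptions.

```python
# Minimal KV-cache sketch for autoregressive decoding.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(new_token, cache_k, cache_v, Wq, Wk, Wv):
    # new_token: (d_model,) embedding of the most recent token
    q = new_token @ Wq
    cache_k.append(new_token @ Wk)        # extend the cache instead of
    cache_v.append(new_token @ Wv)        # recomputing K/V for the whole prefix
    K, V = np.stack(cache_k), np.stack(cache_v)
    weights = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache_k, cache_v = [], []
for _ in range(5):                         # five decoding steps
    out = decode_step(rng.normal(size=d), cache_k, cache_v, Wq, Wk, Wv)
print(len(cache_k), out.shape)             # 5 (8,)
```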
In-Context Learning, System Prompts, Lost-in-the-Middle and Few-Shot Patterns.
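The sketch below shows one common way a system prompt and few-shot examples are laid out in a chat-style context; the role names follow a widespread chat convention and are assumptions, not a specific API.

```python
# Illustrative few-shot prompt structure with a system message.
messages = [
    {"role": "system", "content": "You are a sentiment classifier. Answer with 'positive' or 'negative'."},
    # Few-shot examples: input/output pairs shown in context
    {"role": "user", "content": "The food was wonderful."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "The service was painfully slow."},
    {"role": "assistant", "content": "negative"},
    # The actual query the model should complete
    {"role": "user", "content": "Great atmosphere, terrible coffee."},
]
for m in messages:
    print(f"{m['role']}: {m['content']}")
```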
RLHF, DPO, sampling strategies, quantization and speculative decoding.
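As an example of sampling strategies, here is a minimal sketch of temperature plus top-p (nucleus) sampling over a toy vocabulary; the logits and vocabulary are made-up assumptions.

```python
# Minimal sketch of temperature + top-p (nucleus) sampling.
import numpy as np

def sample(logits, temperature=0.8, top_p=0.9, rng=np.random.default_rng(0)):
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most likely tokens first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                            # smallest set covering top_p mass
    kept = probs[keep] / probs[keep].sum()           # renormalise over kept tokens
    return rng.choice(keep, p=kept)

vocab = ["the", "cat", "sat", "mat", "on"]
logits = [2.0, 1.0, 0.5, 0.2, 1.5]
print(vocab[sample(logits)])
```

Lower temperature sharpens the distribution; a smaller top_p restricts sampling to fewer high-probability tokens.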
Benchmark evolution, emergence timeline and attention scaling in historical context.
Parameter calculator, model database, vocabulary explorer and interactive tools.
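For the parameter calculator, a common back-of-the-envelope estimate for a dense decoder-only Transformer is sketched below; the 12·n_layers·d_model² term assumes an FFN expansion factor of 4, and real models differ in detail.

```python
# Rough parameter-count estimate for a dense decoder-only Transformer.
def estimate_params(n_layers, d_model, vocab_size):
    block = 12 * d_model ** 2            # ~4*d^2 attention + ~8*d^2 feed-forward
    embeddings = vocab_size * d_model    # token embedding (often tied with output)
    return n_layers * block + embeddings

# Example: GPT-2 small style configuration (illustrative numbers, ~124M)
print(f"{estimate_params(n_layers=12, d_model=768, vocab_size=50257):,}")
```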