From input to response, explore the architecture behind GPT-4, Claude, and Llama. Click on a building block to find out more.
The user enters a text prompt. This can be a question, an instruction, or context.
Tokenization, embeddings, self-attention and the mathematical foundations of modern LLMs.
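As an illustration of the core mechanism, here is a minimal sketch of single-head scaled dot-product self-attention with toy shapes and random weights; all sizes and values are assumptions, not taken from any particular model.

```python
# Minimal sketch of scaled dot-product self-attention (single head, toy sizes).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model) token embeddings; W*: (d_model, d_head) projections
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v                         # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
out = self_attention(x, *(rng.normal(size=(d_model, d_head)) for _ in range(3)))
print(out.shape)  # (4, 8)
```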
MoE, Grouped Query Attention, Flash Attention, Sparse Attention (DSA) and Native Multimodal.
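To give a feel for Mixture-of-Experts, here is a minimal sketch of top-k routing for a single token; the expert count, k, and dimensions are illustrative assumptions rather than any specific model's configuration.

```python
# Minimal sketch of Mixture-of-Experts top-k routing for one token.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def moe_forward(token, experts, router_W, k=2):
    # token: (d_model,); experts: list of (W1, W2) feed-forward weights
    logits = token @ router_W                 # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    gate = softmax(logits[top])               # renormalise gates over the top-k
    out = np.zeros_like(token)
    for g, idx in zip(gate, top):
        W1, W2 = experts[idx]
        out += g * (np.maximum(token @ W1, 0.0) @ W2)  # ReLU feed-forward expert
    return out

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
router_W = rng.normal(size=(d_model, n_experts))
print(moe_forward(rng.normal(size=d_model), experts, router_W).shape)  # (8,)
```

Only the k selected experts are evaluated per token, which is how MoE models keep compute per token far below their total parameter count.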
Chain-of-Thought, o1/o3 Hidden Reasoning, DeepSeek R1, Effort Parameter and Thinking Budget.
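As a small illustration of chain-of-thought prompting, the sketch below contrasts a direct question with a prompt that asks the model to reason first; the exact wording is an assumption, not a format required by any particular model.

```python
# Illustrative chain-of-thought prompt; wording is an assumption.
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

direct_prompt = f"{question}\nAnswer:"

cot_prompt = (
    f"{question}\n"
    "Think step by step before giving the final answer.\n"
    "Reasoning:"
)
print(cot_prompt)
```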
KV-Cache, RoPE, Sliding Window Attention, Paged Attention and RAG Pipeline.
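Here is a minimal sketch of what a KV cache does during autoregressive decoding: keys and values of past tokens are stored so each step only projects the newest token. All shapes and weights are toy assumptions.

```python
# Minimal KV-cache sketch for autoregressive decoding.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(new_token, cache_k, cache_v, Wq, Wk, Wv):
    # new_token: (d_model,) embedding of the most recent token
    q = new_token @ Wq
    cache_k.append(new_token @ Wk)        # extend the cache instead of
    cache_v.append(new_token @ Wv)        # recomputing K/V for the whole prefix
    K, V = np.stack(cache_k), np.stack(cache_v)
    weights = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache_k, cache_v = [], []
for _ in range(5):                         # five decoding steps
    out = decode_step(rng.normal(size=d), cache_k, cache_v, Wq, Wk, Wv)
print(len(cache_k), out.shape)             # 5 (8,)
```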
In-Context Learning, System Prompts, Lost-in-the-Middle and Few-Shot Patterns.
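The sketch below shows one common way a system prompt and few-shot examples are laid out in a chat-style context; the role names follow a widespread chat convention and are assumptions, not a specific API.

```python
# Illustrative few-shot prompt structure with a system message.
messages = [
    {"role": "system", "content": "You are a sentiment classifier. Answer with 'positive' or 'negative'."},
    # Few-shot examples: input/output pairs shown in context
    {"role": "user", "content": "The food was wonderful."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "The service was painfully slow."},
    {"role": "assistant", "content": "negative"},
    # The actual query the model should complete
    {"role": "user", "content": "Great atmosphere, terrible coffee."},
]
for m in messages:
    print(f"{m['role']}: {m['content']}")
```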
RLHF, DPO, sampling strategies, quantization and speculative decoding.
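As an example of sampling strategies, here is a minimal sketch of temperature plus top-p (nucleus) sampling over a toy vocabulary; the logits and vocabulary are made-up assumptions.

```python
# Minimal sketch of temperature + top-p (nucleus) sampling.
import numpy as np

def sample(logits, temperature=0.8, top_p=0.9, rng=np.random.default_rng(0)):
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most likely tokens first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                            # smallest set covering top_p mass
    kept = probs[keep] / probs[keep].sum()           # renormalise over kept tokens
    return rng.choice(keep, p=kept)

vocab = ["the", "cat", "sat", "mat", "on"]
logits = [2.0, 1.0, 0.5, 0.2, 1.5]
print(vocab[sample(logits)])
```

Lower temperature sharpens the distribution; a smaller top_p restricts sampling to fewer high-probability tokens.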
Benchmark evolution, emergence timeline and attention scaling in historical context.
Parameter calculator, model database, vocabulary explorer and interactive tools.
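For the parameter calculator, a common back-of-the-envelope estimate for a dense decoder-only Transformer is sketched below; the 12·n_layers·d_model² term assumes an FFN expansion factor of 4, and real models differ in detail.

```python
# Rough parameter-count estimate for a dense decoder-only Transformer.
def estimate_params(n_layers, d_model, vocab_size):
    block = 12 * d_model ** 2            # ~4*d^2 attention + ~8*d^2 feed-forward
    embeddings = vocab_size * d_model    # token embedding (often tied with output)
    return n_layers * block + embeddings

# Example: GPT-2 small style configuration (illustrative numbers, ~124M)
print(f"{estimate_params(n_layers=12, d_model=768, vocab_size=50257):,}")
```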