From input to output – explore the architecture behind GPT-4, Claude, and Llama. Click on any component to learn more.
The user enters a text prompt. This can be a question, an instruction, or context.
Tokenization, Embeddings, Self-Attention, and the mathematical foundations of modern LLMs.
MoE, Grouped Query Attention, Flash Attention, DeepSeek Sparse Attention (DSA), and Native Multimodality.
Chain-of-Thought, o1/o3 Hidden Reasoning, DeepSeek R1, Effort Parameter, and Thinking Budget.
KV-Cache, RoPE, Sliding Window Attention, Paged Attention, and RAG Pipeline.
In-Context Learning, System Prompts, Lost-in-the-Middle, and Few-Shot Patterns.
RLHF, DPO, Sampling Strategies, Quantization, and Speculative Decoding.
Benchmark Evolution, Emergence Timeline, and Attention Scaling in historical context.
Parameter Calculator, Model Database, Vocabulary Explorer, and other interactive tools.
How familiar are you with Large Language Models?
Choose what interests you most
Based on your answers, we recommend:
Learn how text is split into tokens – the first step in any LLM pipeline.
Alternatively: