Chapter 1 Complete
You have learned
Transformer Basics
You now understand the fundamental building blocks of modern LLMs: from tokenization
through embeddings and positional encoding to the complete transformer block with
self-attention, multi-head attention, and feedforward networks (see the sketch after the topic list below).
Tokenization (BPE)
Embeddings
Positional Encoding
Self-Attention
Multi-Head Attention
Feedforward Networks
Residual & LayerNorm
Transformer Block
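To recap how these pieces fit together, here is a minimal sketch of a single transformer block. It assumes PyTorch and arbitrary illustrative hyperparameters (d_model=512, n_heads=8, d_ff=2048); it is not the chapter's exact code, just one plausible way to wire up attention, the feedforward network, residual connections, and LayerNorm.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: multi-head self-attention plus a
    feedforward network, each wrapped in a residual connection and LayerNorm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(           # position-wise feedforward network
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask=None):
        # Self-attention sub-layer: normalize, attend, add back the residual.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        # Feedforward sub-layer with its own residual connection.
        x = x + self.ff(self.ln2(x))
        return x

# Token embeddings plus positional information would feed a stack of these blocks.
block = TransformerBlock()
tokens = torch.randn(2, 16, 512)   # (batch, sequence length, d_model)
out = block(tokens)
print(out.shape)                   # torch.Size([2, 16, 512])
```

The sketch uses the pre-norm arrangement (LayerNorm applied before each sub-layer), which most modern LLMs adopt; the original transformer instead applied LayerNorm after each residual addition.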
Continue with Chapter 2
Modern Architecture Variants
Discover advanced architectures such as Mixture of Experts (MoE), Grouped Query
Attention, Flash Attention, and Sparse Attention, as well as native multimodality
with Early Fusion.