Step 1: Identify Token ID
Select a token from the vocabulary. Each token has a unique ID between 0 and V-1, where V is the vocabulary size.
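A minimal sketch of this mapping, using a toy made-up vocabulary (the tokens and IDs here are illustrative only, not from a real tokenizer):

vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}   # toy vocabulary, V = 4
V = len(vocab)

token = "cat"
token_id = vocab.get(token, vocab["<unk>"])   # every ID lies in [0, V-1]
print(token_id)   # 1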
Embedding Lookup:
E ∈ ℝ^(V×d), the embedding matrix, where:
V = Vocabulary Size (e.g., 50,000)
d = Embedding Dimension (e.g., 512)

Lookup Operation:
embedding = E[token_id, :]

This is a simple row selection – no matrix multiplication needed!
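A minimal NumPy sketch of the lookup, with made-up sizes (a real model would use its learned matrix, not random values):

import numpy as np

V, d = 50_000, 512                             # vocabulary size, embedding dimension
E = np.random.randn(V, d).astype(np.float32)   # stands in for the learned embedding matrix

token_id = 42                                  # illustrative token ID
embedding = E[token_id, :]                     # row selection, no matrix multiplication
print(embedding.shape)                         # (512,)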
Why Lookup?
The embedding matrix is a trainable lookup table. Each row corresponds to a token and contains its learned vector. The lookup is an O(1) operation.
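The row selection is mathematically the same as multiplying a one-hot vector by E, which is why no matrix multiplication is needed; a small sketch with illustrative sizes:

import numpy as np

V, d = 6, 4
E = np.arange(V * d, dtype=np.float32).reshape(V, d)

token_id = 3
one_hot = np.zeros(V, dtype=np.float32)
one_hot[token_id] = 1.0

# Same result, but the direct lookup skips the O(V·d) multiply.
assert np.allclose(one_hot @ E, E[token_id, :])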
Dimensions in Practice
Original Transformer: 512
BERT: 768
GPT-3: 12,288
Llama 2 7B: 4,096
Llama 3 70B: 8,192
Larger dimensions mean more capacity, but also more parameters.
Parameter Count
The embedding matrix has V × d parameters. For GPT-4-scale settings (~100K vocabulary, an estimated 12K embedding dimension), that is roughly 1.2 billion parameters for the embeddings alone!
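A quick check of that arithmetic (the GPT-4 figures are estimates, as noted above):

def embedding_params(vocab_size, dim):
    return vocab_size * dim

print(embedding_params(50_000, 512))       # 25,600,000 for the smaller example above
print(embedding_params(100_000, 12_000))   # 1,200,000,000 ≈ 1.2 billion (GPT-4 estimate)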
Training
Embedding vectors are learned during pretraining through backpropagation. Semantically similar tokens develop similar vectors (see embedding-space-2d.html).
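A minimal PyTorch sketch (toy sizes, not a real training loop) of how backpropagation reaches the embedding matrix: only the rows whose tokens appear in a batch receive gradient.

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)   # toy V = 10, d = 4
token_ids = torch.tensor([2, 5])                          # tokens seen in this batch

loss = emb(token_ids).sum()   # stand-in for a real language-modeling loss
loss.backward()

print(emb.weight.grad[2])     # nonzero: row 2 was looked up
print(emb.weight.grad[0])     # all zeros: row 0 was not used this step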