Words as points in semantic space – similar meanings are close together. This t-SNE projection reduces ~8,000 dimensions to 2D.
After tokenization (Step 1), we have discrete token IDs. Embeddings transform these IDs into continuous vectors that downstream layers can compute with. These vectors then flow through Positional Encoding (Step 3) into the Attention calculation.
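To make the lookup concrete, here is a minimal sketch in NumPy. The vocabulary size, dimension, and token IDs are illustrative assumptions, not values from any particular model; in a real LLM the embedding matrix is learned during training rather than randomly initialized like this.

```python
import numpy as np

# Illustrative sizes, not those of any specific model
vocab_size, d_model = 50_000, 512

# The embedding matrix: one row vector per token ID
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(scale=0.02, size=(vocab_size, d_model))

# Token IDs coming out of Step 1 (tokenization) -- values are made up
token_ids = [314, 1020, 42]

# Embedding is simply a row lookup: each discrete ID becomes a dense vector
token_vectors = embedding_matrix[token_ids]   # shape: (3, d_model)
print(token_vectors.shape)                    # (3, 512)
```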
The embedding dimension (d_model) defines the model's capacity to represent meaning. Larger dimensions (GPT-3: 12,288, Llama 3 70B: 8,192) enable finer semantic distinctions but require more compute. The embedding matrix alone can account for roughly a quarter of all model parameters, especially in smaller models.
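A rough back-of-the-envelope calculation shows the scale: the embedding matrix has vocab_size × d_model entries. The vocabulary size below is an assumption (about 128K tokens, in the range of modern tokenizers), not a published figure.

```python
# Parameter count of the embedding matrix: vocab_size * d_model
vocab_size = 128_000   # assumed vocabulary size
d_model = 8_192        # embedding dimension in the Llama 3 70B range
embedding_params = vocab_size * d_model
print(f"{embedding_params:,}")   # 1,048,576,000 -> roughly 1B parameters
```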
Every word in an LLM is represented by a high-dimensional vector (e.g., d = 8,192 for Llama 3 70B). These vectors capture semantic relationships: words with similar meanings have similar vectors and lie close together in space. The t-SNE projection makes this structure visible in 2D – notice the clear clusters for animals, countries, verbs, and adjectives.
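"Close together" is usually measured with cosine similarity. The sketch below uses tiny 4-dimensional toy vectors invented purely for illustration (real embeddings have thousands of dimensions), but it shows the pattern the projection reveals: words from the same semantic cluster score higher than words from different clusters.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, invented for illustration only
cat   = np.array([0.8, 0.1, 0.6, 0.2])
dog   = np.array([0.7, 0.2, 0.5, 0.3])
paris = np.array([0.1, 0.9, 0.2, 0.7])

print(cosine_similarity(cat, dog))    # high: both in the "animals" cluster
print(cosine_similarity(cat, paris))  # lower: different semantic cluster
```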