In-Context Learning Demo

Interactive demonstration: how the model uses a few examples in context to solve a new one

In-Context Learning (ICL) is the mechanism behind GPT-3's "few examples, big impact". Without fine-tuning, the model picks up new tasks solely from examples in the prompt. This demo shows how attention patterns recognize the structure of the examples and transfer it to new inputs.

📖 Learning Context

Understand how LLMs learn from examples without parameter updates
Recognize the role of Induction Heads in pattern matching
Follow the difference between Zero-Shot, Few-Shot, and Fine-Tuning

Step 1/4 In-Context Learning & Prompting

ICL is the foundation for Prompt Engineering. All further techniques – System Prompts, Few-Shot, Chain-of-Thought – build on this mechanism.

ICL enables using a single model for thousands of different tasks without retraining. This makes LLMs "general purpose" tools instead of specialists.

No Gradients: Weights remain unchanged, only the context changes
Induction Heads: Special attention patterns copy patterns from examples
Emergent Ability: ICL becomes reliable only once models are large enough; few-shot performance improves markedly with scale

Sentiment Analysis Example

Few-Shot Examples (in context):

Input: "The product is great!" → Label: Positive

Input: "Terrible, totally disappointed." → Label: Negative

Input: "It's okay, nothing special." → Label: Neutral

New example to classify:

How ICL Works

Pattern Recognition: The model recognizes the format: "Input → Label". It looks for recurring patterns in the sequence and applies them to new inputs.
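The "Input → Label" format can be made concrete with a small prompt builder. This is a minimal sketch: the helper name `build_few_shot_prompt` and the new input sentence are illustrative assumptions, not part of the demo itself. The examples are the three sentiment demonstrations shown above.

```python
# Hypothetical helper (not from the demo): formats demonstrations
# in the "Input → Label" pattern and leaves the last label open.
EXAMPLES = [
    ("The product is great!", "Positive"),
    ("Terrible, totally disappointed.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]

def build_few_shot_prompt(examples, new_input):
    """Return a prompt whose final line stops right before the label."""
    lines = [f'Input: "{text}" → Label: {label}' for text, label in examples]
    # The last line repeats the format without a label, so a label in the
    # same style is the natural continuation for the model.
    lines.append(f'Input: "{new_input}" → Label:')
    return "\n".join(lines)

# The new input below is an invented example sentence.
prompt = build_few_shot_prompt(EXAMPLES, "Fast shipping, works perfectly.")
print(prompt)
```

The prompt is plain text; no model weights are touched. Passing an empty example list yields the zero-shot variant of the same prompt.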

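Induction Heads Circuit: Research shows that special attention heads (Induction Heads) implement this mechanism. They search the context for an earlier occurrence of the current token and copy the token that followed it: after seeing [A][B] earlier, a later [A] leads to the prediction [B].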
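The copy rule of an induction head can be imitated with a few lines of ordinary code. The sketch below is a deliberately idealized toy, assuming exact token matches and a hard lookup; a real attention head does this softly over learned representations. The function name `induction_predict` is hypothetical.

```python
def induction_predict(tokens):
    """Predict the next token from repetition: [A][B] ... [A] -> [B].

    Finds the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

sequence = ["Mr", "Dursley", "was", "proud", ".", "Mr"]
print(induction_predict(sequence))  # prints Dursley
```

Pure copying like this only covers literal repetition. Classifying a new sentence from few-shot examples requires matching on more abstract similarity, which this toy does not attempt.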

Non-parametric Learning: Unlike traditional machine learning, the model is not retrained. Instead, it uses the context window (up to 128K tokens in some models) to "program" new tasks.

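Min et al. Discovery (2022): How demonstrations are presented matters more than whether their labels are correct. Replacing the labels with random ones reduces performance only slightly, while the format, the label space, and the distribution of the inputs carry most of the signal. The model learns mainly from the format.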
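One way to probe this finding yourself is to keep the inputs and the format fixed while replacing the labels with random draws from the same label space, then compare a model's answers on both prompts. The sketch below only builds the two prompts; the helper names and the new input sentence are assumptions for illustration, and no model call is included.

```python
import random

EXAMPLES = [
    ("The product is great!", "Positive"),
    ("Terrible, totally disappointed.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]
LABEL_SPACE = ["Positive", "Negative", "Neutral"]

def randomize_labels(examples, label_space, seed=0):
    """Keep inputs and order, but draw every label at random."""
    rng = random.Random(seed)  # seeded so the prompt is reproducible
    return [(text, rng.choice(label_space)) for text, _ in examples]

def format_prompt(examples, new_input):
    """Same "Input → Label" format for gold and randomized demonstrations."""
    lines = [f'Input: "{text}" → Label: {label}' for text, label in examples]
    lines.append(f'Input: "{new_input}" → Label:')
    return "\n".join(lines)

# The query sentence is an invented example.
query = "Arrived late, but it works."
gold_prompt = format_prompt(EXAMPLES, query)
random_prompt = format_prompt(randomize_labels(EXAMPLES, LABEL_SPACE), query)
print(random_prompt)
```

Both prompts share the format, the inputs, and the label space; only the input-to-label mapping differs, which is the variable this kind of comparison isolates.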

Best Practices: Use XML/Markdown tags to provide structure (<text>Example</text> helps more than plain text). Choose examples that are relevant to the new input. More than 5-10 examples usually bring no further improvement.
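A tagged prompt along these lines could be assembled as follows. This is a sketch under assumptions: only the `<text>` tag appears in the demo, while the `<example>` and `<label>` tag names and the helper `build_tagged_prompt` are illustrative choices. Escaping uses `xml.sax.saxutils.escape` from the Python standard library so that characters such as `&` or `<` in the input cannot break the tag structure.

```python
from xml.sax.saxutils import escape

EXAMPLES = [
    ("The product is great!", "Positive"),
    ("Terrible, totally disappointed.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]

def build_tagged_prompt(examples, new_input, max_examples=5):
    """Wrap each demonstration in tags and leave the final label open."""
    blocks = []
    # Cap the number of demonstrations; max_examples is an assumed default.
    for text, label in examples[:max_examples]:
        blocks.append(
            "<example>\n"
            f"  <text>{escape(text)}</text>\n"
            f"  <label>{escape(label)}</label>\n"
            "</example>"
        )
    # The open <label> tag marks where the model should continue.
    blocks.append(f"<text>{escape(new_input)}</text>\n<label>")
    return "\n".join(blocks)

# The new input is an invented example sentence.
print(build_tagged_prompt(EXAMPLES, "Good & cheap, would buy again."))
```

The `max_examples` cap mirrors the observation that additional demonstrations tend to help less and less; the right value depends on the task and the model.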

Practical Limits: ICL strength grows with model size. Large models (100B+) show strong ICL, while small models (7B-13B) tend to show weaker ICL. This is often described as a form of "emergence": the ability only becomes reliable beyond a certain model size.
