In-Context Learning: How a few examples steer model behavior
Few-shot prompting is the practical application of ICL: it complements the theoretical ICL demo with experimental insights on example selection and example count.
The GPT-3 paper was titled "Language Models are Few-Shot Learners" – few-shot learning was its core discovery. Knowing how many examples are optimal saves tokens and improves results.
Few-Shot Learning means the model learns to recognize a pattern through a few input-output examples in the prompt and applies it to new inputs – without updating any parameters.
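This can be sketched as a prompt that concatenates a few input-output demonstrations and then presents a new input; the model completes the pattern without any weight update. The sentiment task, the example data, and the `build_prompt` helper below are illustrative assumptions, not from any specific library.

```python
# Hypothetical few-shot demonstrations for a sentiment task.
EXAMPLES = [
    ("The battery died after one day.", "negative"),
    ("Absolutely love the new camera!", "positive"),
    ("Shipping was fast and painless.", "positive"),
]

def build_prompt(examples, new_input):
    """Concatenate input-output demonstrations, then append the new input.

    The model is expected to continue the prompt with the missing label,
    learning the pattern purely from context."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The new input ends with the output cue, so the model fills in the label.
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n".join(lines)

prompt = build_prompt(EXAMPLES, "The screen cracked immediately.")
print(prompt)
```

The key point: nothing about the model changes; the pattern lives entirely in the prompt.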
Attention recognizes the structure of examples and applies it to new inputs.
The structure of examples is more important than content correctness (Min et al., 2022).
Performance rises quickly with 1-5 examples, then plateaus.
Comparison: Same format with correct labels vs. random labels
Observation: Accuracy rises steeply from 0-Shot → 1-Shot → 5-Shot, then plateaus. After ~8-10 examples, each additional example brings little improvement (Diminishing Returns).
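The shot-count sweep behind this observation can be sketched as a small loop: build a prompt with k demonstrations, query the model, and measure accuracy per k. Here `query_model` is a stub standing in for a real LLM API call, and the pool, test set, and helper names are assumptions for illustration.

```python
# Stub for a real LLM call (e.g. an API client); always answers "positive"
# so the sweep runs end to end without network access.
def query_model(prompt: str) -> str:
    return "positive"

def build_prompt(examples, new_input):
    """Format k demonstrations plus the new input in a consistent layout."""
    demos = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    tail = f"Input: {new_input}\nLabel:"
    return f"{demos}\n\n{tail}" if demos else tail

def accuracy_at_k(pool, test_set, k):
    """Accuracy when the first k pool examples are used as demonstrations."""
    correct = 0
    for text, gold in test_set:
        pred = query_model(build_prompt(pool[:k], text)).strip()
        correct += (pred == gold)
    return correct / len(test_set)

# Hypothetical data: demonstration pool and labeled test set.
pool = [
    ("Great phone, very happy.", "positive"),
    ("Broke after a week.", "negative"),
    ("Does exactly what it promises.", "positive"),
]
test_set = [
    ("Works flawlessly.", "positive"),
    ("Terrible support experience.", "negative"),
]

for k in [0, 1, 3]:
    print(k, accuracy_at_k(pool, test_set, k))
```

With a real model, plotting accuracy over k typically shows the steep 0→1→5 rise and the plateau described above.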
All examples must have the same format (XML tags, JSON, Markdown).
Examples should cover the variety of expected inputs.
Use XML/JSON for clear demarcation of input and output.
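A minimal sketch of XML demarcation, assuming illustrative tag names (`example`, `input`, `output`) – any consistent tags work, as long as every demonstration uses the same ones:

```python
def format_example(text: str, label: str) -> str:
    """Wrap one demonstration in XML tags so input and output are unambiguous."""
    return (
        "<example>\n"
        f"  <input>{text}</input>\n"
        f"  <output>{label}</output>\n"
        "</example>"
    )

# Hypothetical demonstrations in a uniform format.
examples = [
    ("Refund took three weeks.", "negative"),
    ("Setup was effortless.", "positive"),
]

prompt = "\n".join(format_example(t, l) for t, l in examples)
# The final example is left open at <output> so the model completes the label.
prompt += "\n<example>\n  <input>Arrived damaged.</input>\n  <output>"
print(prompt)
```

The open `<output>` tag at the end doubles as the completion cue and tells the model exactly where its answer belongs.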
Start with 1-3 examples; test up to a maximum of 10.
Although format matters more, labels should still be correct.
Place high-quality examples preferably at the beginning.