[Interactive figure readout: panels "Simple: saturation at ~2 examples (85%→88%)", "Medium: saturation at ~5 examples (70%→85%)", "Complex: saturation at ~10 examples (50%→75%)"; at the current optimal point (n = 5): accuracy 85%, gain per example 3.2%, total token cost ~650.]
Fig. 1 | Few-shot scaling curves for three task difficulties. Each curve has a different saturation point: simple tasks need only ~2 examples, complex tasks up to ~10; beyond that, gains stagnate (diminishing returns).
📉
Plateau After 5-10 Examples
The gain curve follows a consistent pattern: fast gains at the beginning → gradual flattening → stagnation. After 10 examples, the marginal gain is often <1% per additional example.
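A minimal sketch of how this cutoff could be detected from measured accuracies; the accuracy values and the 1-point threshold below are illustrative assumptions, not benchmark results:

```python
# Illustrative sketch: find the point where the marginal gain per added
# example drops below a threshold (e.g. 1 percentage point).
# The accuracy values are made up for demonstration.

def find_plateau(acc_by_n: list[float], min_gain: float = 1.0) -> int:
    """acc_by_n[n] = accuracy (%) with n examples. Return the smallest n
    at which the gain from adding the n-th example falls below min_gain."""
    for n in range(1, len(acc_by_n)):
        if acc_by_n[n] - acc_by_n[n - 1] < min_gain:
            return n
    return len(acc_by_n)

# hypothetical accuracies with 0..8 examples
acc = [60.0, 72.0, 78.0, 81.5, 83.5, 84.5, 85.0, 85.3, 85.5]
print(find_plateau(acc))  # -> 6: the 6th example adds < 1 pp, so ~5 examples suffice
```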
⏱️
Task Difficulty Determines Plateau
Simple tasks: Plateau at 2-3 examples. Medium: at 5-7. Complex: at 10-15. The task complexity limits how much the model can learn from examples.
💰
Token Costs Outweigh Gains
15+ examples add roughly 1,500 tokens plus a latency penalty, while the accuracy gain is minimal. Optimal: 5-10 examples, depending on task complexity.
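One way to operationalize this trade-off is to stop adding examples once the marginal token cost per percentage point of gain exceeds a budget. A rough sketch, where the per-example token count, the budget, and the accuracy values are all assumed for illustration:

```python
# Cost-benefit sketch: stop adding few-shot examples once the marginal gain
# no longer justifies the marginal token cost. All numbers are illustrative.

TOKENS_PER_EXAMPLE = 100       # assumed average length of one example
MAX_TOKENS_PER_POINT = 150     # budget: at most 150 extra tokens per +1 pp accuracy

# hypothetical accuracy (%) with 0, 1, 2, ... examples
acc_by_n = [60.0, 72.0, 78.0, 81.5, 83.5, 84.5, 85.0, 85.3]

def choose_n_examples(acc_by_n, tokens_per_example, max_tokens_per_point):
    best_n = 0
    for n in range(1, len(acc_by_n)):
        gain = acc_by_n[n] - acc_by_n[n - 1]           # pp gained by the n-th example
        if gain <= 0 or tokens_per_example / gain > max_tokens_per_point:
            break                                      # too expensive per point gained
        best_n = n
    return best_n

print(choose_n_examples(acc_by_n, TOKENS_PER_EXAMPLE, MAX_TOKENS_PER_POINT))
# -> 5: the 6th example would cost 200 tokens per pp, above the 150-token budget
```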
🎯
Format Is Learned Early, Semantics Late
Examples 1-2: the model learns the output format. Examples 3-5: semantic patterns. Examples 6+: nuances. The saturation threshold itself, however, is predetermined by the task's nature and complexity.
📊
Power-Law Scaling
Accuracy follows a power law: A(n) = A_∞ − c·n^(−α), with α ≈ 0.4-0.6. This explains the plateau: the marginal gain per example scales as n^(−α−1), so each additional example contributes less and less.
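A quick numerical illustration of what that exponent implies; the constants A_∞ = 90, c = 30, α = 0.5 below are assumed for the example, not fitted to real data:

```python
# Numerical illustration of the power-law plateau A(n) = A_inf - c * n**(-alpha).
# The constants are assumptions chosen for illustration, not fitted values.

A_INF, C, ALPHA = 90.0, 30.0, 0.5

def accuracy(n: int) -> float:
    return A_INF - C * n ** (-ALPHA)

for n in (1, 2, 5, 10, 20):
    marginal = accuracy(n) - accuracy(n - 1) if n > 1 else float("nan")
    print(f"n={n:2d}  A(n)={accuracy(n):5.1f}%  gain vs. n-1 = {marginal:4.1f} pp")

# Output (rounded):
# n= 1  A(n)= 60.0%  gain vs. n-1 =  nan pp
# n= 2  A(n)= 68.8%  gain vs. n-1 =  8.8 pp
# n= 5  A(n)= 76.6%  gain vs. n-1 =  1.6 pp
# n=10  A(n)= 80.5%  gain vs. n-1 =  0.5 pp
# n=20  A(n)= 83.3%  gain vs. n-1 =  0.2 pp
```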
🚀
Larger Models Plateau Earlier
70B models plateau at 3-5 examples; 7B models at 8-12. Larger models have stronger prior knowledge and need fewer examples to define the structure of the task.
| Task Type | Baseline | Optimal N | Accuracy @ Optimal | Gain | Recommendation |
|---|---|---|---|---|---|
| Simple | 70% | 2 | 88% | +18 pp | 2-3 examples; more yields <1% gain |
| Medium | 60% | 5 | 85% | +25 pp | 5-7 examples; cost-benefit sweet spot |
| Complex | 45% | 10 | 75% | +30 pp | 8-12 examples; over 12, gains are marginal |
| Very Complex | 30% | 15 | 68% | +38 pp | Reconsider: prompt engineering or fine-tuning? |
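These recommendations can be encoded as a small lookup, sketched below; classifying a task as simple, medium, complex, or very complex is assumed to happen elsewhere, and the ranges simply mirror the table:

```python
# Sketch: recommended few-shot counts per task type, mirroring the table above.
# How a task gets classified into one of these buckets is assumed to be
# handled elsewhere.

RECOMMENDATIONS = {
    "simple":       ((2, 3),   "more than 3 examples yields <1% gain"),
    "medium":       ((5, 7),   "cost-benefit sweet spot"),
    "complex":      ((8, 12),  "over 12, gains are marginal"),
    "very_complex": ((15, 15), "reconsider: prompt engineering or fine-tuning"),
}

def recommended_examples(task_type: str) -> tuple[int, int]:
    """Return the (min, max) recommended number of few-shot examples."""
    n_range, _note = RECOMMENDATIONS[task_type]
    return n_range

print(recommended_examples("medium"))  # -> (5, 7)
```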