[Figure 1: two panels. Left: "Accuracy by Label Variant", comparing Correct, Random, Inverted, and No Labels. Right: "Accuracy vs. #Examples".]
Fig. 1 | Even random labels (orange) help the model more (87% accuracy) than no labels at all (60%), suggesting that format and structure matter more than label content. Inverted labels (purple) show that semantics still play a role, but format dominates.
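To make the four conditions concrete, here is a minimal sketch of how such demonstrations could be constructed. The binary sentiment labels and the `Input:`/`Label:` template are illustrative assumptions, not the experiment's actual code:

```python
import random

LABELS = ["positive", "negative"]

def build_demos(examples, condition, rng=None):
    """Render (text, gold_label) pairs as few-shot demonstrations
    under one of the four label conditions from Fig. 1."""
    rng = rng or random.Random(0)
    lines = []
    for text, gold in examples:
        if condition == "none":
            lines.append(f"Input: {text}")          # keep the input, drop the label line
            continue
        if condition == "correct":
            label = gold
        elif condition == "random":
            label = rng.choice(LABELS)              # sampled uniformly, ignores the gold label
        elif condition == "inverted":
            label = LABELS[1 - LABELS.index(gold)]  # systematically flipped
        else:
            raise ValueError(f"unknown condition: {condition}")
        lines.append(f"Input: {text}\nLabel: {label}")
    return "\n\n".join(lines)

demos = [("Great movie, loved it.", "positive"),
         ("Terrible plot, fell asleep.", "negative")]
print(build_demos(demos, "random"))
```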
Key Findings (Current Experiment)
📋
Format > Content
The model primarily learns the format (the Input → Label structure), not the semantics. Random labels: 87%, no labels: 60%. Format alone accounts for +27 points.
🎯
Correct Labels Add Only 8 Points
Correct (95%) vs. random (87%) = only an 8-point difference. This shows that semantic correctness has surprisingly little impact in few-shot prompting.
🔀
Inverted Labels Almost as Good as Random
Inverted (82%) vs. random (87%): only a 5-point gap. The model exploits the format, not the semantic consistency of the labels themselves.
📊
Diminishing Returns After 5-10 Examples
Accuracy rises quickly up to ~5 examples, then the curve flattens; more than 10 examples bring minimal additional gain (saturation is visible).
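A minimal loop for measuring this curve could look as follows; `query_model` is a hypothetical stand-in for whatever completion API the experiment uses, and `build_demos` is the sketch from above:

```python
def accuracy_vs_k(train, test, ks=(1, 2, 5, 10, 20)):
    """Few-shot accuracy as a function of the number of demonstrations k."""
    results = {}
    for k in ks:
        prefix = build_demos(train[:k], "correct")
        hits = 0
        for text, gold in test:
            prompt = f"{prefix}\n\nInput: {text}\nLabel:"
            hits += query_model(prompt).strip() == gold  # hypothetical model call
        results[k] = hits / len(test)
    return results
```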
🏗️
Structured Prompts Are Critical
The model uses visual/syntactic structures (XML tags, line breaks, indentation) for pattern matching. Same structure, different semantics = strong performance.
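As an illustration, a demonstration can be wrapped in explicit XML-style tags; the tag names here are arbitrary and chosen purely for illustration:

```python
def build_xml_demo(text, label):
    """Wrap one demonstration in explicit tags so the model can latch
    onto the syntactic scaffold rather than the surface wording."""
    return (
        "<example>\n"
        f"  <input>{text}</input>\n"
        f"  <label>{label}</label>\n"
        "</example>"
    )

print(build_xml_demo("Great movie, loved it.", "positive"))
```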
⚠️
Larger Models Are Less Format-Dependent
For models above ~100B parameters, the advantage of random labels shrinks, because larger models extract more semantics from the demonstrations. Small models depend heavily on format; large models can additionally exploit label semantics.
| Condition | Accuracy (Sentiment) | Accuracy (NER) | Accuracy (Topic) | Insight |
|---|---|---|---|---|
| No Labels | 58% | 62% | 61% | Baseline without structure |
| Correct Labels | 95% | 93% | 92% | Format + semantics optimal |
| Random Labels | 87% | 85% | 86% | Format alone very helpful |
| Inverted Labels | 82% | 80% | 81% | Weak semantic usage |
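The per-condition averages and the gains quoted in the cards above can be recomputed directly from this table:

```python
# (sentiment, NER, topic) accuracies per condition, from the table above
results = {
    "no_labels":       (0.58, 0.62, 0.61),
    "correct_labels":  (0.95, 0.93, 0.92),
    "random_labels":   (0.87, 0.85, 0.86),
    "inverted_labels": (0.82, 0.80, 0.81),
}

means = {cond: sum(v) / len(v) for cond, v in results.items()}
format_gain = means["random_labels"] - means["no_labels"]         # format alone: ~+26 points
semantic_gain = means["correct_labels"] - means["random_labels"]  # correct semantics: ~+7 points

for cond, m in means.items():
    print(f"{cond:>16}: {m:.1%}")
print(f"format-only gain: {format_gain:+.1%}, semantic gain: {semantic_gain:+.1%}")
```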