Why even random labels help more than no labels – The surprising finding from Min et al. (2022)
The format-vs.-content experiments reveal a fundamental insight about ICL: the model learns less from the label assignments themselves than from the format of the examples. Even random labels help – as long as the input-output format is clear.
After ICL basics (1/4), system prompts (2/4), and attention distribution (3/4), we now examine the mechanisms behind format-vs.-content learning.
This insight has direct consequences for prompt engineering: a consistent format matters more than perfectly correct examples. It also explains why few-shot prompting often works better than zero-shot – the model learns the expected response format from the demonstrations.
| Condition | Accuracy (Sentiment) | Accuracy (NER) | Accuracy (Topic) | Insight |
|---|---|---|---|---|
| No Labels | 58% | 62% | 61% | Baseline without structure |
| Correct Labels | 95% | 93% | 92% | Format + Semantics optimal |
| Random Labels | 87% | 85% | 86% | Format alone very helpful |
| Inverted Labels | 82% | 80% | 81% | Weak semantic usage |