[Figure 1: two panels. Left: "Accuracy by Label Variant", comparing Correct, Random, Inverted, and No Labels. Right: "Accuracy vs. #Examples".]
Fig. 1 | Even random labels (orange) help the model more (87% accuracy) than no labels at all (60%), suggesting that format and structure matter more than label content. Inverted labels (purple) show that semantics still play a role, but format dominates.
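To make the four conditions concrete, here is a minimal sketch of how such demonstrations could be constructed. The binary sentiment labels and the `Input:`/`Label:` template are illustrative assumptions, not the experiment's actual code:

```python
import random

LABELS = ["positive", "negative"]

def build_demos(examples, condition, rng=None):
    """Render (text, gold_label) pairs as few-shot demonstrations
    under one of the four label conditions from Fig. 1."""
    rng = rng or random.Random(0)
    lines = []
    for text, gold in examples:
        if condition == "none":
            lines.append(f"Input: {text}")          # keep the input, drop the label line
            continue
        if condition == "correct":
            label = gold
        elif condition == "random":
            label = rng.choice(LABELS)              # sampled uniformly, ignores the gold label
        elif condition == "inverted":
            label = LABELS[1 - LABELS.index(gold)]  # systematically flipped
        else:
            raise ValueError(f"unknown condition: {condition}")
        lines.append(f"Input: {text}\nLabel: {label}")
    return "\n\n".join(lines)

demos = [("Great movie, loved it.", "positive"),
         ("Terrible plot, fell asleep.", "negative")]
print(build_demos(demos, "random"))
```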
Key Findings (Current Experiment)
📋
Format > Content
The model primarily learns the format (the Input → Label structure), not the semantics. Random labels: 87%, no labels: 60%. Format alone accounts for +27 points.
🎯
Correct Labels Add Only 8 Points
Correct (95%) vs. random (87%) = only an 8-point difference. This shows that semantic correctness has surprisingly little impact in few-shot prompting.
🔀
Inverted Labels Almost as Good as Random
Inverted (82%) vs. random (87%): only a 5-point gap. The model exploits the format, not the semantic consistency of the labels themselves.
📊
Diminishing Returns After 5-10 Examples
Accuracy rises quickly up to ~5 examples, then the curve flattens; more than 10 examples bring minimal additional gain (saturation is visible).
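A minimal loop for measuring this curve could look as follows; `query_model` is a hypothetical stand-in for whatever completion API the experiment uses, and `build_demos` is the sketch from above:

```python
def accuracy_vs_k(train, test, ks=(1, 2, 5, 10, 20)):
    """Few-shot accuracy as a function of the number of demonstrations k."""
    results = {}
    for k in ks:
        prefix = build_demos(train[:k], "correct")
        hits = 0
        for text, gold in test:
            prompt = f"{prefix}\n\nInput: {text}\nLabel:"
            hits += query_model(prompt).strip() == gold  # hypothetical model call
        results[k] = hits / len(test)
    return results
```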
🏗️
Structured Prompts Are Critical
The model uses visual/syntactic structures (XML tags, line breaks, indentation) for pattern matching. Same structure, different semantics = strong performance.
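As an illustration, a demonstration can be wrapped in explicit XML-style tags; the tag names here are arbitrary and chosen purely for illustration:

```python
def build_xml_demo(text, label):
    """Wrap one demonstration in explicit tags so the model can latch
    onto the syntactic scaffold rather than the surface wording."""
    return (
        "<example>\n"
        f"  <input>{text}</input>\n"
        f"  <label>{label}</label>\n"
        "</example>"
    )

print(build_xml_demo("Great movie, loved it.", "positive"))
```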
⚠️
Larger Models Are Less Format-Dependent
For models above ~100B parameters, the advantage of random labels shrinks, because larger models extract more semantics from the demonstrations. Small models depend heavily on format; large models can additionally exploit label semantics.
| Condition | Accuracy (Sentiment) | Accuracy (NER) | Accuracy (Topic) | Insight |
|---|---|---|---|---|
| No Labels | 58% | 62% | 61% | Baseline without structure |
| Correct Labels | 95% | 93% | 92% | Format + semantics optimal |
| Random Labels | 87% | 85% | 86% | Format alone very helpful |
| Inverted Labels | 82% | 80% | 81% | Weak semantic usage |
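The per-condition averages and the gains quoted in the cards above can be recomputed directly from this table:

```python
# (sentiment, NER, topic) accuracies per condition, from the table above
results = {
    "no_labels":       (0.58, 0.62, 0.61),
    "correct_labels":  (0.95, 0.93, 0.92),
    "random_labels":   (0.87, 0.85, 0.86),
    "inverted_labels": (0.82, 0.80, 0.81),
}

means = {cond: sum(v) / len(v) for cond, v in results.items()}
format_gain = means["random_labels"] - means["no_labels"]         # format alone: ~+26 points
semantic_gain = means["correct_labels"] - means["random_labels"]  # correct semantics: ~+7 points

for cond, m in means.items():
    print(f"{cond:>16}: {m:.1%}")
print(f"format-only gain: {format_gain:+.1%}, semantic gain: {semantic_gain:+.1%}")
```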