Why LLMs use information at the beginning and end of the context but overlook the middle: the Lost-in-the-Middle phenomenon
The Attention Distribution view shows the attention heatmap behind the Lost-in-the-Middle effect. Attention patterns differ by layer: early layers attend mostly to nearby tokens, while late layers attend across the whole context. The visualization shows where in the context, and at which depth, information gets lost.
This detailed view of the attention distribution complements the high-level explanation of Lost-in-the-Middle with a layer-by-layer analysis.
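One way to put a number on "local" versus "global" attention is the mean attention distance of a layer: the attention-weighted average gap between a query position and the positions it attends to. The sketch below is only an illustration under stated assumptions. The function names and the two toy attention matrices are hypothetical and are not taken from any real model.

```python
def mean_attention_distance(attn):
    """Attention-weighted average of |query_pos - key_pos| over all queries.

    `attn` is a square list of rows; row i holds the weights that query
    position i places on each key position and is assumed to sum to 1.
    """
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / len(attn)


def toy_local_layer(n):
    # Hypothetical "early layer": each token splits its attention between
    # itself and the token directly before it.
    rows = []
    for i in range(n):
        row = [0.0] * n
        if i == 0:
            row[0] = 1.0
        else:
            row[i] = 0.5
            row[i - 1] = 0.5
        rows.append(row)
    return rows


def toy_global_layer(n):
    # Hypothetical "late layer": causal attention spread uniformly over
    # every position up to and including the current one.
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


local_score = mean_attention_distance(toy_local_layer(4))
global_score = mean_attention_distance(toy_global_layer(4))
print(local_score, global_score)
```

With real models, the same function could be applied to per-layer attention matrices averaged over heads. A low score suggests a locally focused layer and a high score a globally focused one.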
Knowing in which layers information gets lost helps in designing mitigation strategies, ranging from sparse attention to position interpolation.
| Model | Context window (tokens) | U-curve effect (score) | Mitigation |
|---|---|---|---|
| GPT-4 | 128K | Strong (6.8) | Place documents at front |
| Claude 3.5 | 200K | Medium-weak (5.5) | Question answering format |
| Llama 3 70B | 128K | Strong (7.0) | Hybrid position engineering |
| Mixtral 8×7B | 32K | Weak (4.2) | Less susceptible due to sliding-window attention (SWA) |
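The position-based mitigations in the table can be illustrated with a small reordering step applied to retrieved documents before they are placed in the prompt. The sketch below is one possible approach, not a method prescribed by the table. It assumes the input list is already sorted from most to least relevant, and the function name `reorder_for_edges` is hypothetical.

```python
def reorder_for_edges(docs_by_relevance):
    """Move the most relevant documents to the edges of the context.

    The input is assumed to be sorted from most to least relevant.
    Documents are dealt alternately to the front and the back, so the
    least relevant ones end up in the middle of the returned list.
    """
    front, back = [], []
    for rank, doc in enumerate(docs_by_relevance):
        if rank % 2 == 0:
            front.append(doc)   # ranks 0, 2, 4, ... fill the start
        else:
            back.append(doc)    # ranks 1, 3, 5, ... fill the end
    return front + back[::-1]


ranked = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the most relevant
print(reorder_for_edges(ranked))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The top-ranked document opens the context and the second-ranked one closes it, which are the two positions where a U-shaped retrieval curve is strongest. A strict "documents at front" variant would simply keep the original relevance order.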