The Phenomenon: The U-Curve of Attention

Despite large context windows (32K, 100K+ tokens), LLMs show a surprising behavior: they tend to neglect information in the middle of the context and focus on the beginning and the end.

This leads to a characteristic U-shaped attention distribution (the U-curve): information at the beginning is processed well, information in the middle is effectively forgotten, and information at the end is attended to again.

Fig. 1 | The U-curve: attention and information processing as a function of position in the context. Beginning ✓, middle ✗, end ✓.

What does this mean in practice?

Why is this a problem?

In RAG pipelines or long-context QA, critical information can be in the middle of a document – exactly where the model doesn't look.

System Prompts Benefit

System prompts at the beginning are processed well. This is one of the reasons why placing instructions at the beginning is important.

Causes of the U-Curve

The U-shaped attention arises from two factors:

1. Attention Masking Techniques

Transformers use causal attention masking: each token can attend only to preceding tokens. Because the first tokens are visible to every later token, they accumulate attention across the whole sequence (they act as "attention sinks"), while the most recent tokens benefit from recency. Both effects bias attention structurally toward the beginning and the end.
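Causal masking is easy to visualize as a lower-triangular boolean matrix over positions. A minimal NumPy sketch of why early positions receive attention from the whole sequence:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: entry [i, j] is True iff token i may attend to token j."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
# Token 0 is visible to every later token, so early positions can
# accumulate attention from the whole sequence.
visibility = mask.sum(axis=0)  # how many tokens can see each position
print(visibility)  # → [4 3 2 1]
```

The first position is visible to all four tokens, the last only to itself: the structural asymmetry the text describes.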

2. Training Data Biases

Training data has biased patterns: in natural documents, the most important information tends to sit at the beginning (titles, abstracts, lead paragraphs) or at the end (summaries, conclusions), while the middle carries supporting detail.

The model implicitly learns that beginning and end are more important. This trained bias manifests as the U-curve.

The Problem Mechanistically
Causal Masking + Training Bias
→ Structural bias in attention patterns
→ U-shaped attention distribution
→ Middle information gets "lost"

Result: Long-context capability is an illusion.
Models can process long contexts,
but only actively use beginning/end.

Practical Demonstration: Document Retrieval

Fig. 2 | RAG scenario: a relevant document placed at the beginning is processed correctly; in the middle, the model ignores it; at the end, the model attends to it again.

Scenario: Question Answering via Retrieval

Prompt Structure:
1. Multiple documents (from Retrieval)
2. User question at the end

Problem: When the relevant document is in the middle:
→ Model doesn't find the answer
→ "I don't know" or hallucinations

Solution: Arrange documents strategically
→ Most important at beginning/end
→ Less important in the middle
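The strategic arrangement above can be sketched as a simple reordering, assuming the documents arrive already ranked by relevance (the function and variable names here are illustrative, not from any specific library):

```python
def arrange_for_u_curve(docs_by_relevance: list) -> list:
    """Place the most relevant documents at the beginning and end of the
    context and the least relevant in the middle, matching the U-curve."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate: best doc to the front, second best to the back, etc.
        (front if i % 2 == 0 else back).append(doc)
    # Reverse the back half so relevance rises again toward the end.
    return front + back[::-1]

docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 = most relevant
print(arrange_for_u_curve(docs))
# → ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The least relevant document ends up in the middle, the two most relevant at the two high-attention ends.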

Impact on RAG and Long-Context Systems

The RAG Problem

In Retrieval-Augmented Generation (RAG) pipelines, the U-curve becomes particularly problematic:

Scenario                  Document Position   Success Rate   Implication
Document at beginning     Position 0%         ~95%           Is attended to and processed
Document in the middle    Position 50%        ~50%           Is often ignored
Document at the end       Position 100%       ~90%           Is attended to (just before the question)

Problem: Naive Ranking

Standard retrieval ranks documents by relevance alone. But the top-k documents should end up at the beginning or end of the context, not in the middle!

Solution: Position-Aware Ranking

Found-in-the-Middle Calibration: Rank by relevance AND position. Consider the U-curve.
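One way to sketch ranking by relevance and position: assign the most relevant documents to the context slots with the highest attention weight under a U-shaped profile. The profile values and all names below are illustrative assumptions, not the published calibration method:

```python
def assign_to_slots(docs_with_scores: list, attention_profile: list) -> list:
    """Place the most relevant documents into the slots the model attends
    to most. `attention_profile` is an assumed U-shaped weight per slot."""
    ranked_docs = sorted(docs_with_scores, key=lambda x: -x[1])
    # Slot indices ordered from most-attended to least-attended.
    slot_order = sorted(range(len(attention_profile)),
                        key=lambda i: -attention_profile[i])
    layout = [None] * len(attention_profile)
    for (doc, _score), slot in zip(ranked_docs, slot_order):
        layout[slot] = doc
    return layout

profile = [1.0, 0.5, 0.2, 0.5, 0.9]  # high at both ends, low in the middle
docs = [("a", 0.9), ("b", 0.7), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
print(assign_to_slots(docs, profile))
# → ['a', 'c', 'e', 'd', 'b']
```

The least relevant document lands in the least-attended middle slot; the two best documents occupy the two ends.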

Found-in-the-Middle Calibration

Approach: Arrange retrieval results so that important documents don't end up in the middle.

Found-in-the-Middle Strategy
Traditional RAG:
Ranking: Top-1 (relevant) → Middle (next best) → Bottom
→ Middle documents end up in LLM context middle!

Found-in-the-Middle:
Position: Beginning (Top-1) + End (Top-2-5) + Middle (Less important)
→ Best relevance is positioned where LLM looks

Result: ~15% improvement in answer quality with the same retrieved documents; only their order changes.
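The layout described above (Top-1 at the beginning, Top-2 to Top-5 at the end, the rest in the middle) can be sketched directly. This is a sketch of the positional reordering only, not of the full calibration method:

```python
def found_in_the_middle_layout(ranked_docs: list, k: int = 5) -> list:
    """Top-1 at the beginning, Top-2..Top-k at the end,
    everything less important in the middle."""
    top, rest = ranked_docs[:k], ranked_docs[k:]
    return top[:1] + rest + top[1:]

ranked = [f"d{i}" for i in range(1, 9)]  # d1 = most relevant
print(found_in_the_middle_layout(ranked))
# → ['d1', 'd6', 'd7', 'd8', 'd2', 'd3', 'd4', 'd5']
```

The less relevant d6 to d8 absorb the low-attention middle, while all top-5 hits sit where the model actually looks.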

Solutions for Lost-in-the-Middle

1. Position-Aware Ranking (RAG)

Rerank retrieved documents by relevance and position so the most relevant ones land at the beginning or end of the context.

2. Prompt Design Strategies

Put instructions at the beginning and the question at the end; keep critical information out of the middle.

3. Alternative: Position Shuffling

Run the same query over several document orderings and aggregate the answers, so no document is always stuck in the middle.

4. Architecture Improvements (Future)

New training strategies and attention mechanisms that flatten the U-curve (still at the research stage).
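The position-shuffling idea (option 3 above) can be sketched as querying the model over several random document orderings and majority-voting the answers. `answer_fn` stands in for a hypothetical LLM call and is not a real API:

```python
import random
from collections import Counter

def shuffled_vote(answer_fn, docs: list, question: str,
                  n_runs: int = 3, seed: int = 0) -> str:
    """Ask the same question over several random document orderings and
    majority-vote the answers, so no document is always in the middle.
    `answer_fn(docs, question)` is a hypothetical LLM call."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_runs):
        order = docs[:]
        rng.shuffle(order)
        answers.append(answer_fn(order, question))
    return Counter(answers).most_common(1)[0][0]
```

The cost is n_runs model calls per question, which is why this remains an alternative rather than the default.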

Short-term (Practical)

Position-aware RAG and prompt design. Avoid critical information in the middle.

Long-term (Research)

New training strategies and architectures can reduce or eliminate the U-curve.

System Prompts and the U-Curve

A practical reason why system prompts are positioned at the beginning: They fall in the high-attention beginning region of the U-curve!

Why does this work?

Prompt Structure in Practice:

[SYSTEM PROMPT] ← High attention (beginning)
[User Context/Documents] ← Mixed
[User Question] ← High attention (end)

This structure optimally utilizes the U-curve!
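A minimal sketch of assembling a prompt in this structure (names are illustrative):

```python
def build_prompt(system: str, documents: list, question: str) -> str:
    """Assemble the prompt so the high-attention regions of the U-curve
    hold the instructions (beginning) and the question (end)."""
    parts = [system]          # high attention: beginning
    parts.extend(documents)   # mixed attention: middle
    parts.append(question)    # high attention: end
    return "\n\n".join(parts)

print(build_prompt("You are a helpful assistant.",
                   ["[Doc 1] ...", "[Doc 2] ..."],
                   "Question: What does Doc 1 say?"))
```

Whatever ends up in `documents` should itself be ordered position-aware, as described above.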

Key Insights

1️⃣ U-Shaped Attention

LLMs show structurally higher attention at the beginning and end, not in the middle – despite large context windows.

2️⃣ Training and Architecture Effect

Combination of causal masking and training data biases creates the U-curve. Not easy to fix.

3️⃣ Practical Consequences

Long contexts are less useful than they appear. Only beginning and end are actively used.

4️⃣ RAG Problem

Standard retrieval ranking ignores position. Found-in-the-Middle Calibration: +15% through better positioning.

5️⃣ Design Implication

System prompts at top, question at end = best position. Critical info not in the middle.

6️⃣ Future: Fixable?

Research on position shuffling and new architectures. But not yet standard in production.