CHAPTER 5.2b · PROMPT ENGINEERING

System Prompts

How System Prompts control model behavior: Token sequence, Attention weights and practical examples

System Prompts are the invisible hand behind ChatGPT and Claude. This demo shows how different System Prompt styles (short vs. long, vague vs. precise) influence model behavior, from tone to factual accuracy.

Learning Context

Understand the structure of effective System Prompts
Learn best practices for role instructions, constraints and examples
Be able to balance token efficiency vs. precision

Step 2/4 In-Context Learning & Prompting

This demo is the practical complement to the Attention heatmap: it shows how theory (attention distribution) translates into practice (prompt formulation).

Anthropic publishes Claude's System Prompt; OpenAI does not publish GPT-4's. In both cases the System Prompt is a core part of the product, and a well-written one can noticeably improve output quality.

Structure: Role → Context → Constraints → Examples → Output Format
Length: Longer prompts give more control, but cost more input tokens on every request
Jailbreak Resistance: Clear boundaries reduce unwanted behavior
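
The Role → Context → Constraints → Examples → Output Format ordering can be sketched as a small builder. This is only an illustration: `SystemPromptSpec`, `build_system_prompt`, and the section labels are hypothetical names chosen for this example, not part of any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class SystemPromptSpec:
    """Hypothetical container for the five parts named in the structure above."""
    role: str
    context: str = ""
    constraints: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)  # (input, output) pairs
    output_format: str = ""


def build_system_prompt(spec: SystemPromptSpec) -> str:
    """Join the non-empty parts in the order Role, Context, Constraints, Examples, Output Format."""
    sections = [spec.role]
    if spec.context:
        sections.append("Context: " + spec.context)
    if spec.constraints:
        bullet_lines = "\n".join(f"- {c}" for c in spec.constraints)
        sections.append("Constraints:\n" + bullet_lines)
    if spec.examples:
        example_lines = "\n".join(f"Input: {q}\nOutput: {a}" for q, a in spec.examples)
        sections.append("Examples:\n" + example_lines)
    if spec.output_format:
        sections.append("Output format: " + spec.output_format)
    return "\n\n".join(sections)


if __name__ == "__main__":
    spec = SystemPromptSpec(
        role="You are a support assistant for a billing product.",
        context="Users are existing customers asking about invoices.",
        constraints=["Do not give legal advice.", "Ask before assuming an account ID."],
        examples=[("Where is my invoice?", "You can find it under Billing > Invoices.")],
        output_format="Plain text, at most three sentences.",
    )
    print(build_system_prompt(spec))
```

Keeping each part optional makes it easy to compare a short prompt (role only) against a long one (all five parts) on the same questions.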

Prompt Structure (Token Sequence)

<|system|> System Prompt Start

Token: You are a helpful assistant. Answer questions accurately.

<|/system|> System Prompt End

<|user|> User Message Start

Token: What is machine learning?

<|/user|> User Message End

<|assist|> Model Output Tokens (generated)

System Prompt (stores instructions)

User Message (current request)

Assistant Output (generated)

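Important: The System Prompt is an ordinary token sequence delimited by special marker tokens. There is no "magic" internal treatment: it passes through the same layers as every other token. Any extra weight it carries comes from how the model was trained, not from a separate processing path.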
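
A minimal sketch of that idea: the system and user messages are concatenated into one flat string using the markers shown above, then split into tokens. The markers are this demo's illustrative ones, and `toy_tokenize` is a hypothetical regex splitter, not a real model tokenizer (real tokenizers use subword vocabularies and model-specific markers).

```python
import re

# Marker strings as shown in the demo above (illustrative, not a real model's).
SYSTEM_OPEN, SYSTEM_CLOSE = "<|system|>", "<|/system|>"
USER_OPEN, USER_CLOSE = "<|user|>", "<|/user|>"
ASSISTANT_OPEN = "<|assist|>"


def render_prompt(system: str, user: str) -> str:
    """Concatenate system and user text into the single sequence the model sees."""
    return (
        f"{SYSTEM_OPEN}{system}{SYSTEM_CLOSE}"
        f"{USER_OPEN}{user}{USER_CLOSE}"
        f"{ASSISTANT_OPEN}"
    )


def toy_tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: each marker, word, or punctuation mark is one token."""
    return re.findall(r"<\|/?\w+\|>|\w+|[^\w\s]", text)


prompt = render_prompt(
    "You are a helpful assistant. Answer questions accurately.",
    "What is machine learning?",
)
tokens = toy_tokenize(prompt)

if __name__ == "__main__":
    # System and user tokens end up in one list; only the markers tell them apart.
    print(tokens)
```

The model continues generating from the final `<|assist|>` marker, so everything before it, system and user alike, is simply context.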

Attention on System Prompt

How much does the model attend to different positions (System vs. User)?

[Figure: attention heatmap. Vertical axis: User (new) ↕ System (old). Horizontal axis: System → User.]

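Observation: The System Prompt (top) receives more attention weight. Because of Causal Masking, User Message tokens (bottom) can attend back to the System Prompt, while System Prompt tokens cannot attend forward to the User Message. The strong attention to the beginning of the sequence is a position (primacy) effect rather than Recency Bias, which would favor the most recent tokens instead.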
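
The structural part of this observation, the Causal Masking, can be reproduced with a toy calculation. The sketch below uses random attention scores rather than a real model, so it shows only which positions are allowed to attend to which, not how much weight a trained model would actually assign. The sizes `n_system` and `n_user` are arbitrary assumptions.

```python
import math
import random


def causal_attention(scores: list[list[float]]) -> list[list[float]]:
    """Row-wise softmax under a causal mask: position i only sees positions j <= i."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Masked (future) positions get exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


random.seed(0)
n_system, n_user = 4, 3  # assumed toy sizes: 4 system tokens followed by 3 user tokens
n = n_system + n_user
scores = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
weights = causal_attention(scores)

# Attention flowing from system positions to user positions (always masked out).
system_to_user = sum(
    weights[i][j] for i in range(n_system) for j in range(n_system, n)
)
# Share of each user token's attention that lands on the system prompt.
user_share_on_system = [sum(weights[i][:n_system]) for i in range(n_system, n)]

if __name__ == "__main__":
    print("system -> user attention:", system_to_user)
    print("user attention share on system tokens:", user_share_on_system)
```

Every user token has the whole System Prompt in view, while no system token can see the user message; that asymmetry comes from the mask alone.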

Claude (Anthropic)

System Prompt Size: ~16,739 words

Purpose: Tool definitions, guidelines

Example Hotfix: "Be more helpful"

Control: Detailed

GPT-4 (OpenAI)

System Prompt Size: ~2,218 words

Purpose: Minimal instructions

Example Hotfix: "Assistant is helpful"

Control: Minimal

Key Insights

Token Sequence: The System Prompt is an ordinary token sequence; the architecture gives it no special treatment
Positioning Effect: Placed at the beginning, it is visible to every later token (Causal Masking) and benefits from position effects, not recency
Design Difference: Claude's System Prompt is detailed; GPT-4's is minimal
Hotfixes: Small wording changes can have large effects on behavior
Length Implication: Longer System Prompts mean higher input costs on every request, but finer control
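
The length trade-off can be put into rough numbers. The sketch below uses the word counts quoted above; the tokens-per-word ratio and the price are assumptions made for illustration (a common rule of thumb and a hypothetical rate), not measured or published figures, and the calculation ignores any caching or discounts.

```python
TOKENS_PER_WORD = 1.3                  # assumption: rough heuristic for English text
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumption: hypothetical price in USD


def estimate_prompt_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate the token count of a prompt from its word count."""
    return round(word_count * tokens_per_word)


def system_prompt_cost(
    word_count: int,
    requests: int,
    price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Estimated cost of resending the same System Prompt with every request."""
    tokens = estimate_prompt_tokens(word_count)
    return tokens * requests * price_per_million / 1_000_000


if __name__ == "__main__":
    # Word counts taken from the comparison above.
    for name, words in [("long prompt", 16_739), ("short prompt", 2_218)]:
        tokens = estimate_prompt_tokens(words)
        cost = system_prompt_cost(words, requests=10_000)
        print(f"{name}: ~{tokens} tokens per request, ~${cost:.2f} per 10,000 requests")
```

Because the System Prompt is resent with every request, its cost scales with traffic, which is why length is a budget decision as well as a quality one.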
