[Interactive demo: sliders for Temperature (low = deterministic, high = random), Top-K (0 = disabled), and Top-P (nucleus), driving a live chart of the probability distribution for the next token.]
Fig. 1 | Green bars show the tokens the model considers. At high temperature the distribution flattens (more diversity); with strict Top-K/Top-P limits, weak candidates are filtered out.
📋 Current Recommendation
For QA & Facts:
Use Temperature 0.1-0.3 (probability mass concentrated on the top logits), Top-K = 0 (disabled), Top-P = 0.9 (optional; usually unnecessary at low temperature). Result: accurate, consistent answers, ideal for knowledge-intensive tasks.
| Task Type | Temperature | Top-K | Top-P | Use Case | Output Style |
|---|---|---|---|---|---|
| QA & Facts | 0.1-0.3 | 0 | 0.9 | News, Wikipedia-style answers | Precise, deterministic |
| General Chat | 0.7-0.9 | 50 | 0.95 | Everyday conversation, balanced | Natural, varied |
| Creative Writing | 1.2-1.5 | 100 | 0.98 | Storytelling, brainstorming | Creative, surprising |
| Coding | 0.2-0.5 | 20 | 0.95 | Code generation, debugging | Correct, syntactic |
| Summarization | 0.3-0.6 | 0 | 0.9 | Text summarization | Concise, focused |
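To make the table concrete, here is a minimal sketch of how the presets map onto Hugging Face transformers' `generate()` call. The preset names, model ("gpt2"), and prompt are placeholder choices, not part of the table; `do_sample=True` is required for any of these knobs to take effect.

```python
# Sketch: the table's presets as generate() kwargs (ranges collapsed to midpoints).
from transformers import AutoModelForCausalLM, AutoTokenizer

PRESETS = {
    "qa":       dict(temperature=0.2, top_k=0,   top_p=0.9),   # top_k=0 disables Top-K
    "chat":     dict(temperature=0.8, top_k=50,  top_p=0.95),
    "creative": dict(temperature=1.3, top_k=100, top_p=0.98),
    "coding":   dict(temperature=0.3, top_k=20,  top_p=0.95),
    "summary":  dict(temperature=0.4, top_k=0,   top_p=0.9),
}

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(
    **inputs,
    do_sample=True,          # without this, temperature/top_k/top_p are ignored
    max_new_tokens=30,
    **PRESETS["qa"],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```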
🌡️
Temperature Scales Logits
P(x_i) = exp(z_i / T) / Σ_j exp(z_j / T). As T→0, the distribution converges to the argmax (greedy decoding). At T>1 it flattens (more randomness). A common default is T=0.7 for balance.
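A minimal NumPy sketch of the formula above (function name and toy logits are my own):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: P(x_i) = exp(z_i/T) / sum_j exp(z_j/T)."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [4.0, 2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, T=0.3))   # sharply peaked -> near-greedy
print(softmax_with_temperature(logits, T=1.5))   # flattened -> more randomness
```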
🔪
Top-K is Simple but Harsh
Keeps only the k most probable tokens. Problem: a fixed k ignores the shape of the distribution, so k=50 can be good (lots of choice when many tokens are plausible) or bad (too much noise when the model is confident). Adaptive alternative: Top-P.
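A sketch of the Top-K cut in NumPy, applied to an already-softmaxed distribution (the function name is illustrative):

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    if k <= 0:                       # convention used here: k = 0 means "disabled"
        return probs
    k = min(k, probs.size)
    cutoff = np.sort(probs)[-k]      # probability of the k-th most likely token
    kept = np.where(probs >= cutoff, probs, 0.0)   # ties may keep a few extra
    return kept / kept.sum()
```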
📊
Top-P (Nucleus) is Adaptive
Keeps the smallest set of tokens whose cumulative probability reaches P. At high confidence: small nucleus (1-2 tokens). At uncertainty: larger nucleus (10+ tokens). Usually better than Top-K.
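The same kind of sketch for nucleus sampling; note that the cut point is computed from the distribution itself rather than fixed in advance:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]                  # token ids, most probable first
    cum = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cum, p) + 1]   # adaptive nucleus size
    kept = np.zeros_like(probs)
    kept[nucleus] = probs[nucleus]
    return kept / kept.sum()
```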
⚖️
Combination Matters
Top-K and Top-P are rarely used together (the filters are redundant). Standard setup: Temperature plus one truncation method (Top-P OR Top-K). Top-P is the modern recommendation.
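Putting it together, a self-contained sketch of the standard Temperature + Top-P pipeline (temperature scaling first, then a single truncation step, then sampling; names are illustrative):

```python
import numpy as np

def sample_next_token(logits, T=0.7, p=0.9, rng=None):
    """Typical decode step: temperature softmax -> Top-P truncation -> sample."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / T
    probs = np.exp(z - z.max())
    probs /= probs.sum()                             # temperature-scaled softmax
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cum, p) + 1]   # Top-P truncation
    kept = np.zeros_like(probs)
    kept[nucleus] = probs[nucleus]
    kept /= kept.sum()
    return rng.choice(len(kept), p=kept)             # token id from the filtered dist
```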
🎯
Low T Doesn't Need Top-K
At T=0.3 the softmax is already concentrated, so Top-K/Top-P are usually unnecessary. At T=1.0 and above, Top-P is needed to filter out noise (see the quick check below).
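A toy check of this claim: with the same smoothly decaying logits (invented values, not from any real model), count how many tokens the 0.9 nucleus contains at low versus high temperature:

```python
import numpy as np

def nucleus_size(logits, T, p=0.9):
    """Number of tokens needed to cover probability mass p at temperature T."""
    z = np.asarray(logits, dtype=np.float64) / T
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    cum = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cum, p)) + 1

toy_logits = np.linspace(5.0, 0.0, 50)   # 50 smoothly decaying toy logits
print(nucleus_size(toy_logits, T=0.3))   # small nucleus: Top-P has little to do
print(nucleus_size(toy_logits, T=1.2))   # large nucleus: Top-P actively filters
```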
🧪
No Universal Values
Different models produce differently scaled logits, so good settings differ: GPT-4 is often run at T=0.5, Llama at T=0.8. Test your combination with real prompts.