[Interactive demo: sliders for Temperature (low = deterministic, high = random), Top-K (0 = disabled), and Top-P (nucleus), driving a live chart of the probability distribution for the next token.]
Fig. 1 | Green bars show the tokens the model considers. At high temperature the distribution flattens (more diversity); with strict Top-K/Top-P limits, weak candidates are filtered out.
📋 Current Recommendation
For QA & Facts:
Use Temperature 0.1-0.3 (probability mass concentrated on the top logits), Top-K = 0 (disabled), Top-P = 0.9 (optional; usually unnecessary at low temperature). Result: accurate, consistent answers, ideal for knowledge-intensive tasks.
| Task Type | Temperature | Top-K | Top-P | Use Case | Output Style |
|---|---|---|---|---|---|
| QA & Facts | 0.1-0.3 | 0 | 0.9 | News, Wikipedia-style answers | Precise, deterministic |
| General Chat | 0.7-0.9 | 50 | 0.95 | Everyday conversation, balanced | Natural, varied |
| Creative Writing | 1.2-1.5 | 100 | 0.98 | Storytelling, brainstorming | Creative, surprising |
| Coding | 0.2-0.5 | 20 | 0.95 | Code generation, debugging | Correct, syntactic |
| Summarization | 0.3-0.6 | 0 | 0.9 | Text summarization | Concise, focused |
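To make the table concrete, here is a minimal sketch of how the presets map onto Hugging Face transformers' `generate()` call. The preset names, model ("gpt2"), and prompt are placeholder choices, not part of the table; `do_sample=True` is required for any of these knobs to take effect.

```python
# Sketch: the table's presets as generate() kwargs (ranges collapsed to midpoints).
from transformers import AutoModelForCausalLM, AutoTokenizer

PRESETS = {
    "qa":       dict(temperature=0.2, top_k=0,   top_p=0.9),   # top_k=0 disables Top-K
    "chat":     dict(temperature=0.8, top_k=50,  top_p=0.95),
    "creative": dict(temperature=1.3, top_k=100, top_p=0.98),
    "coding":   dict(temperature=0.3, top_k=20,  top_p=0.95),
    "summary":  dict(temperature=0.4, top_k=0,   top_p=0.9),
}

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(
    **inputs,
    do_sample=True,          # without this, temperature/top_k/top_p are ignored
    max_new_tokens=30,
    **PRESETS["qa"],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```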
🌡️
Temperature Scales Logits
P(x_i) = exp(z_i / T) / Σ_j exp(z_j / T). As T→0, the distribution converges to the argmax (greedy decoding). At T>1 it flattens (more randomness). A common default is T=0.7 for balance.
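A minimal NumPy sketch of the formula above (function name and toy logits are my own):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: P(x_i) = exp(z_i/T) / sum_j exp(z_j/T)."""
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [4.0, 2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, T=0.3))   # sharply peaked -> near-greedy
print(softmax_with_temperature(logits, T=1.5))   # flattened -> more randomness
```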
🔪
Top-K is Simple but Harsh
Keeps only the k most probable tokens. Problem: a fixed k ignores the shape of the distribution, so k=50 can be good (lots of choice when many tokens are plausible) or bad (too much noise when the model is confident). Adaptive alternative: Top-P.
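A sketch of the Top-K cut in NumPy, applied to an already-softmaxed distribution (the function name is illustrative):

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    if k <= 0:                       # convention used here: k = 0 means "disabled"
        return probs
    k = min(k, probs.size)
    cutoff = np.sort(probs)[-k]      # probability of the k-th most likely token
    kept = np.where(probs >= cutoff, probs, 0.0)   # ties may keep a few extra
    return kept / kept.sum()
```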
📊
Top-P (Nucleus) is Adaptive
Keeps the smallest set of tokens whose cumulative probability reaches P. At high confidence: small nucleus (1-2 tokens). At uncertainty: larger nucleus (10+ tokens). Usually better than Top-K.
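The same kind of sketch for nucleus sampling; note that the cut point is computed from the distribution itself rather than fixed in advance:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]                  # token ids, most probable first
    cum = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cum, p) + 1]   # adaptive nucleus size
    kept = np.zeros_like(probs)
    kept[nucleus] = probs[nucleus]
    return kept / kept.sum()
```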
⚖️
Combination Matters
Top-K and Top-P are rarely used together (the filters are redundant). Standard setup: Temperature plus one truncation method (Top-P OR Top-K). Top-P is the modern recommendation.
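Putting it together, a self-contained sketch of the standard Temperature + Top-P pipeline (temperature scaling first, then a single truncation step, then sampling; names are illustrative):

```python
import numpy as np

def sample_next_token(logits, T=0.7, p=0.9, rng=None):
    """Typical decode step: temperature softmax -> Top-P truncation -> sample."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / T
    probs = np.exp(z - z.max())
    probs /= probs.sum()                             # temperature-scaled softmax
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cum, p) + 1]   # Top-P truncation
    kept = np.zeros_like(probs)
    kept[nucleus] = probs[nucleus]
    kept /= kept.sum()
    return rng.choice(len(kept), p=kept)             # token id from the filtered dist
```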
🎯
Low T Doesn't Need Top-K
At T=0.3 the softmax is already concentrated, so Top-K/Top-P are usually unnecessary. At T=1.0 and above, Top-P is needed to filter out noise (see the quick check below).
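A toy check of this claim: with the same smoothly decaying logits (invented values, not from any real model), count how many tokens the 0.9 nucleus contains at low versus high temperature:

```python
import numpy as np

def nucleus_size(logits, T, p=0.9):
    """Number of tokens needed to cover probability mass p at temperature T."""
    z = np.asarray(logits, dtype=np.float64) / T
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    cum = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cum, p)) + 1

toy_logits = np.linspace(5.0, 0.0, 50)   # 50 smoothly decaying toy logits
print(nucleus_size(toy_logits, T=0.3))   # small nucleus: Top-P has little to do
print(nucleus_size(toy_logits, T=1.2))   # large nucleus: Top-P actively filters
```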
🧪
No Universal Values
Different models produce differently scaled logits, so good settings differ: GPT-4 is often run at T=0.5, Llama at T=0.8. Test your combination with real prompts.