The simplest decoding method is Greedy Decoding: Choose the token with the highest probability. This is deterministic and leads to consistent but often boring outputs.
Stochastic Sampling allows variability: Sample tokens according to their probability distribution. This increases creativity but can lead to "illogical nonsense."
To find this balance, modern LLMs use filtering strategies that limit the number of candidate tokens: