How do temperature, top-k, and top-p sampling control LLM generation?
Temperature rescales the logits before softmax: low values sharpen the distribution toward greedy deterministic output and high values flatten it for more randomness. Top-k restricts sampling to the k most likely tokens, and top-p or nucleus sampling restricts it to the smallest set of tokens whose cumulative probability exceeds p, both trimming the unlikely tail.
How to think about it
Temperature rescales the logits before softmax: low values sharpen the distribution toward greedy deterministic output and high values flatten it for more randomness. Top-k restricts sampling to the k most likely tokens, and top-p or nucleus sampling restricts it to the smallest set of tokens whose cumulative probability exceeds p, both trimming the unlikely tail.