datarekha

How do temperature, top-k, and top-p sampling control LLM generation?

The short answer

Temperature rescales the logits before softmax: low values sharpen the distribution toward greedy, near-deterministic output, and high values flatten it for more randomness. Top-k restricts sampling to the k most likely tokens, and top-p (nucleus) sampling restricts it to the smallest set of tokens whose cumulative probability reaches p. Both trim the unlikely tail before a token is drawn.
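The temperature step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not any particular library's API: `softmax_with_temperature` is a hypothetical helper, and the three logit values are made up.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide each logit by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # T = 0 would divide by zero; greedy (argmax) decoding is the limiting case.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

With these logits the top token's probability is about 0.86 at T = 0.5, 0.66 at T = 1.0 and 0.50 at T = 2.0, which is the sharpening and flattening described above. The sketch rejects T = 0 rather than special-casing it; many decoding interfaces instead treat a zero temperature as a request for plain argmax.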

How to think about it

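Picture one decoding step. The model emits a logit z_i for every token in the vocabulary, and temperature T divides each logit before the softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). At T = 1 the model's distribution is unchanged. As T falls toward 0 the probability mass piles onto the highest-scoring token, approaching greedy decoding. Above 1 the distribution moves toward uniform, so lower-ranked tokens are picked more often.

Top-k and top-p then act as filters on that distribution. Top-k always keeps exactly k candidates, however confident the model is. Top-p keeps a variable number: only a handful when one token dominates, many when the probabilities are spread out. The surviving probabilities are renormalized to sum to 1 and the next token is sampled from them. In short, temperature reshapes the distribution, while top-k and top-p decide which tokens are eligible at all.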
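The top-k and top-p restrictions can be sketched as filters over an already-computed probability distribution (for example, one produced after temperature scaling). This is a standard-library sketch under stated assumptions: the distribution is held in a plain dict from token to probability, the helper names and token probabilities are hypothetical, and applying top-k before top-p is one common ordering rather than a fixed rule.

```python
import random

def _renormalize(kept):
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return _renormalize(dict(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest high-probability prefix whose cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete; drop the remaining tail
    return _renormalize(kept)

def sample(probs, rng):
    """Draw one token in proportion to its (renormalized) probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distributions.
spread = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
peaked = {"the": 0.95, "a": 0.03, "an": 0.02}

print(sorted(top_k_filter(spread, 2)))      # ['a', 'the']
print(sorted(top_p_filter(spread, 0.9)))    # ['a', 'an', 'the']
print(sorted(top_p_filter(peaked, 0.9)))    # ['the']

rng = random.Random(0)  # fixed seed so the draw is reproducible
print(sample(top_p_filter(top_k_filter(spread, 3), 0.9), rng))
```

With p = 0.9, top-p keeps three tokens from the spread distribution (cumulative mass 0.95) but only one from the peaked distribution, whereas a top-k filter would keep exactly k tokens in both cases. That adaptivity is the main practical difference between the two filters.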
