datarekha

What are reasoning models, and what is test-time compute?

The short answer

Reasoning models are trained, often via reinforcement learning, to produce an extended chain of thought before answering, so they spend more computation deliberating on hard problems. Test-time compute is the idea of improving answer quality by allocating more compute at inference time (for example, longer reasoning chains, sampling multiple solutions, or self-verification) rather than only scaling model parameters.
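Two of the test-time strategies named above, sampling multiple solutions and verifying them, can be illustrated with a minimal Python sketch. Everything in it is an illustrative assumption rather than a real model or library API: `make_sampler` is a hypothetical stand-in for drawing one reasoning chain (it returns the right answer 40% of the time), the toy task is solving 3x + 7 = 52, and `verify` is an exact checker that stands in for self-verification (in practice that role might be played by the model itself or a learned verifier). `majority_vote` is self-consistency-style voting; `best_of_n` keeps the first candidate that passes the check.

```python
import random
from collections import Counter
from typing import Callable, Optional

# Toy task (an illustrative assumption): solve 3*x + 7 == 52, so the answer is 15.
CORRECT = 15


def make_sampler(p_correct: float, seed: int) -> Callable[[], int]:
    """Hypothetical stand-in for sampling one reasoning chain from a model.

    Returns the correct answer with probability p_correct; otherwise returns
    one of four nearby wrong answers, chosen uniformly at random.
    """
    rng = random.Random(seed)

    def sample() -> int:
        if rng.random() < p_correct:
            return CORRECT
        return CORRECT + rng.choice([-2, -1, 1, 2])

    return sample


def verify(x: int) -> bool:
    """Cheap checker: substitute the candidate back into the equation."""
    return 3 * x + 7 == 52


def majority_vote(sample: Callable[[], int], n: int) -> int:
    """Sample n answers and return the most frequent one (self-consistency)."""
    votes = Counter(sample() for _ in range(n))
    return votes.most_common(1)[0][0]


def best_of_n(sample: Callable[[], int], n: int) -> Optional[int]:
    """Sample up to n answers and return the first one that passes the checker."""
    for _ in range(n):
        x = sample()
        if verify(x):
            return x
    return None  # every candidate failed verification


def accuracy(strategy: Callable[[Callable[[], int], int], Optional[int]],
             n: int, p_correct: float = 0.4, trials: int = 500) -> float:
    """Fraction of independent trials in which the strategy returns CORRECT."""
    hits = sum(strategy(make_sampler(p_correct, seed), n) == CORRECT
               for seed in range(trials))
    return hits / trials


if __name__ == "__main__":
    # More samples per question means more inference-time compute.
    for n in (1, 5, 15):
        print(f"n={n:2d}  majority={accuracy(majority_vote, n):.2f}  "
              f"verified={accuracy(best_of_n, n):.2f}")
```

In this toy setup, voting helps even though a single sample is right only 40% of the time, because the wrong answers are spread over four values; if errors concentrated on one wrong answer that was more likely than the right one, voting would amplify the mistake instead. A reliable checker is stronger still, but it only applies where candidate answers can be verified cheaply.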
