What are reasoning models, and what is test-time compute?

For AI / LLM Engineer research-engineer ML Engineer

The short answer

Reasoning models are trained to produce an extended chain of thought before answering, often via reinforcement learning, so they spend more computation deliberating on hard problems. Test-time compute is the idea of improving answer quality by allocating more inference-time compute, for example longer reasoning chains, sampling multiple solutions, or self-verification, rather than only scaling parameters.

How to think about it

Reasoning models are trained to produce an extended chain of thought before answering, often via reinforcement learning, so they spend more computation deliberating on hard problems. Test-time compute is the idea of improving answer quality by allocating more inference-time compute, for example longer reasoning chains, sampling multiple solutions, or self-verification, rather than only scaling parameters.

Learn it properly Reasoning models & test-time compute

Keep practising

What is Chain-of-Thought prompting and how does it aid reasoning? What is chain-of-thought prompting and when does it help? How do you choose between batch and real-time inference for a model? When and how should you trigger model retraining — scheduled vs. event-driven? What goes in a model card, and how do you provide explainability for production decisions?

All NLP & LLMs questions

Explore further

Tree of Thoughts ReWOO: plan-execute without observation Speculative Decoding

Reasoning Model Test-Time Compute Chain-of-Thought ReAct