What is chain-of-thought prompting and when does it help?
Chain-of-thought (CoT) prompting instructs the model to write out intermediate reasoning steps before producing a final answer, which tends to improve accuracy on multi-step arithmetic, logic puzzles, and compositional questions. The gains are usually largest on bigger models (roughly 10B parameters and up) and on tasks where the answer space is large enough that guessing rarely succeeds.
How to think about it
CoT is a prompting technique in which the model is guided, via worked examples or a simple instruction, to produce a step-by-step reasoning trace before its final answer. The trace acts as a scratchpad: each intermediate result is written down, so later steps of a multi-hop inference can build on it instead of recomputing it.
Variants
| Variant | How it works | When to use |
|---|---|---|
| Few-shot CoT | Provide 2–8 examples with full reasoning traces | When you have labeled examples and need reproducibility |
| Zero-shot CoT | Append “Let’s think step by step.” to the prompt | Quick uplift with no examples |
| Self-consistency | Sample N CoT paths at nonzero temperature, take a majority vote over the final answers | High-stakes tasks where roughly N times the cost and latency is acceptable |
| Least-to-most | Decompose problem into sub-problems, solve sequentially | Compositional tasks (code, planning) |
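The first two rows of the table differ only in how the prompt is assembled, which can be sketched with plain string formatting. This is a minimal illustration, not a prescribed format: the `Q:`/`A:` layout, the function names `zero_shot_cot` and `few_shot_cot`, and the apple example are all assumptions made here.

```python
from typing import List, Tuple

# Trigger phrase from the zero-shot row of the table.
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the trigger phrase after the question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(examples: List[Tuple[str, str, str]], question: str) -> str:
    """Few-shot CoT: prepend worked (question, reasoning, answer) examples."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # The new question ends with an open "A:" for the model to continue.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical worked example, written only for this illustration.
demo = [(
    "A shop has 5 apples and receives 2 crates of 3 apples each. "
    "How many apples does it have now?",
    "It starts with 5. Two crates of 3 is 6 more. 5 + 6 = 11.",
    "11",
)]

print(zero_shot_cot("What is 17 * 6?"))
print()
print(few_shot_cot(demo, "What is 17 * 6?"))
```

Either string can be sent as the user message. In practice the few-shot list would hold several examples drawn from the target task rather than one toy problem.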
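from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# The system prompt fixes an output format so the final answer is easy to locate.
SYSTEM = """You are a precise reasoning assistant.
Always reason step-by-step before giving your final answer.
Format:
Reasoning: <steps>
Answer: <final answer>"""


def cot_answer(question: str) -> str:
    """Return the model's reasoning trace and final answer for one question."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
        temperature=0,  # keep the output as repeatable as possible
    )
    # The API types content as optional, so fall back to an empty string.
    return response.choices[0].message.content or ""


print(cot_answer(
    "Two trains start 300 miles apart and travel toward each other. "
    "One leaves at 9 AM going 60 mph; the other leaves at 11 AM going 90 mph. "
    "When do they meet?"
))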
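The self-consistency row of the table can be layered on top of the `Reasoning:` / `Answer:` format requested by the system prompt. The sketch below is one possible shape, not a reference implementation: `extract_answer` and `self_consistency` are hypothetical helpers, and the model is abstracted as a zero-argument `sample` callable so the voting logic runs without network access. A real sampler would have to call the API with a nonzero temperature, because at `temperature=0` the samples would be nearly identical and the vote would add nothing.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Matches a line of the form "Answer: <final answer>".
ANSWER_RE = re.compile(r"^Answer:\s*(.+)$", re.MULTILINE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def self_consistency(sample: Callable[[], str], n: int = 5) -> Optional[str]:
    """Draw n completions and return the most common extracted answer."""
    answers = [extract_answer(sample()) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


# Canned completions stand in for real model samples (hypothetical data,
# assuming two trains 300 miles apart that travel toward each other).
canned = iter([
    "Reasoning: 120 miles covered by 11 AM, 180 left at 150 mph.\nAnswer: 12:12 PM",
    "Reasoning: 300 / 150 = 2 hours after 11 AM.\nAnswer: 1:00 PM",
    "Reasoning: 180 / 150 = 1.2 hours after 11 AM.\nAnswer: 12:12 PM",
])
print(self_consistency(lambda: next(canned), n=3))  # prints "12:12 PM"
```

Exact string matching is the simplest voting rule. Answers that differ only in formatting (for example "12:12 PM" versus "12:12 pm") would need normalizing before they are counted.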
Why it works
Autoregressive models condition each token on all prior tokens. Writing out intermediate steps places intermediate results in the context window, where the model can condition on them when generating the next step. Without CoT, the model must carry all of the reasoning implicitly in its hidden states while emitting the answer, which is a much harder task. The trace is only as useful as it is correct, since an early mistake is conditioned on just like a correct step.
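One way to write this down, as a sketch: let $q$ be the question, $r_1, \dots, r_m$ the reasoning tokens, and $a$ the final answer. These symbols are introduced here for illustration and do not come from any particular paper's notation.

```latex
p(r_{1:m}, a \mid q)
  = \left[ \prod_{i=1}^{m} p(r_i \mid q, r_{<i}) \right] p(a \mid q, r_{1:m}),
\qquad
p(a \mid q) = \sum_{r_{1:m}} p(r_{1:m} \mid q)\, p(a \mid q, r_{1:m}).
```

The first identity is the usual autoregressive factorization: each reasoning token is predicted given everything before it, and the answer is predicted given the whole trace. The second shows that the answer distribution is a sum over possible traces, which is the quantity that sampling several traces and voting on the answer approximates.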
Limits
CoT does not help when the model genuinely lacks the required knowledge, when the task is purely perceptual, or when the output is a short lookup, where the extra tokens add latency and cost without improving accuracy. Smaller models (roughly sub-7B) tend to benefit less because they often generate plausible-sounding but incorrect reasoning chains.