What is chain-of-thought prompting and when does it help?

For: AI / LLM Engineer, ML Engineer, Data Scientist

The short answer

Chain-of-thought (CoT) prompting instructs the model to write out intermediate reasoning steps before producing a final answer, which tends to improve accuracy on multi-step arithmetic, logic puzzles, and compositional questions. The gains are largest on larger models (roughly 10B parameters and up) and on tasks where the answer space is large enough that guessing is hard.

How to think about it

CoT can be triggered in two ways: with worked examples that include their reasoning, or with a simple instruction. Either way, the model produces a step-by-step reasoning trace before its final answer, and the trace acts as a scratchpad that keeps the model on track across multi-hop inferences.

Variants

Variant	How it works	When to use
Few-shot CoT	Provide 2–8 examples with full reasoning traces	When you have labeled examples and need reproducibility
Zero-shot CoT	Append “Let’s think step by step.” to the prompt	Quick uplift with no examples
Self-consistency	Sample N CoT paths, take majority vote	High-stakes tasks, extra latency acceptable
Least-to-most	Decompose problem into sub-problems, solve sequentially	Compositional tasks (code, planning)
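
As a rough illustration of the first two rows, the sketch below builds zero-shot and few-shot CoT prompts as plain strings. The function names, the `Q:`/`A:` layout, and the demonstration example are assumptions made for illustration, not a prescribed format; any consistent layout should work.

```python
from typing import Sequence, Tuple

# Trigger phrase from the zero-shot row of the table.
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Append the zero-shot trigger phrase to the question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(examples: Sequence[Tuple[str, str, str]], question: str) -> str:
    """Build a prompt from (question, reasoning, answer) triples.

    Each demonstration shows its full reasoning before the answer, so the
    model is nudged to do the same for the final, unanswered question.
    """
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical demonstration; in practice use 2-8 examples from your own task.
EXAMPLES = [
    (
        "Tom has 3 boxes of 4 pens. How many pens does he have?",
        "Each box holds 4 pens and there are 3 boxes, so 3 * 4 = 12.",
        "12",
    ),
]

print(few_shot_cot(EXAMPLES, "A shelf holds 5 rows of 6 books. How many books?"))
```

Either string can be sent as the user message of a chat request. Keeping the demonstrations in a fixed order makes runs easier to compare.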

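The example below requests step-by-step reasoning through the system prompt, using the OpenAI Python client:

# Requires the `openai` package; the client reads OPENAI_API_KEY from the environment.
from openai import OpenAI

client = OpenAI()

# Ask for the reasoning first, then a clearly marked final answer.
SYSTEM = """You are a precise reasoning assistant.
Always reason step-by-step before giving your final answer.
Format:
Reasoning: <steps>
Answer: <final answer>"""

def cot_answer(question: str) -> str:
    """Return the full response text: the reasoning trace followed by the answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
        temperature=0,  # keep the output as repeatable as possible
    )
    return response.choices[0].message.content

print(cot_answer("Two trains start 300 miles apart and travel toward each other. One leaves at 9 AM going 60 mph; the other leaves at 11 AM going 90 mph. At what time do they meet?"))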
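
The self-consistency row of the table can be sketched in the same style. The helper names below (`extract_answer`, `self_consistent_answer`) and the `sample` callable are hypothetical; `sample` stands in for any function that returns one completion in the `Reasoning:` / `Answer:` format. It must sample with a temperature above 0, otherwise the paths will be nearly identical and the vote adds nothing.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Matches a line of the form "Answer: <final answer>".
ANSWER_RE = re.compile(r"^Answer:[ \t]*(.+)$", re.MULTILINE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text of the last 'Answer:' line, or None if there is none."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def self_consistent_answer(
    sample: Callable[[str], str], question: str, n: int = 5
) -> Optional[str]:
    """Sample n reasoning paths and return the most frequent final answer."""
    votes: Counter = Counter()
    for _ in range(n):
        answer = extract_answer(sample(question))
        if answer is not None:  # skip completions that ignored the format
            votes[answer] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

The vote compares answers as exact strings, so in practice it helps to normalize them first (for example, parse numbers or times) before counting. Latency and cost grow linearly with `n`.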

Why it works

Autoregressive models condition each token on all prior tokens. Writing out intermediate steps places intermediate results in the context window, where the model can condition on them when generating the next step. Without CoT, the model must carry out all of the reasoning in its hidden states while producing the answer tokens, which is a much harder task. The trace is not guaranteed to be correct, though: an error in an early step is conditioned on just like a correct one.
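
One way to write this down, as a sketch with notation that is not in the original (q for the question, r for the reasoning trace with tokens r_1 ... r_T, a for the final answer, and p_θ for the model's distribution):

```latex
% With CoT, the model generates the trace token by token, then the answer:
p_\theta(r, a \mid q) \;=\; \Bigl[\prod_{t=1}^{T} p_\theta(r_t \mid q, r_{<t})\Bigr]\; p_\theta(a \mid q, r)

% The implied distribution over answers sums over all possible traces:
p_\theta(a \mid q) \;=\; \sum_{r} p_\theta(r \mid q)\, p_\theta(a \mid q, r)
```

Each factor in the product is a single next-token prediction, which is an easier step than jumping straight from q to a. The sum in the second line is what self-consistency approximates by sampling several traces and voting on the answer.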

Limits

CoT does not help when the model genuinely lacks the required knowledge, when the task is purely perceptual, or when the output is a short lookup, where the extra tokens add latency and cost without improving accuracy. Smaller models (sub-7B) benefit less because they often generate plausible-sounding but incorrect reasoning chains.
