datarekha

What is chain-of-thought prompting and when does it help?

The short answer

Chain-of-thought (CoT) prompting instructs the model to write out intermediate reasoning steps before producing a final answer, which typically improves accuracy on multi-step arithmetic, logic puzzles, and compositional questions. As a rule of thumb, it helps most on models with roughly 10B parameters or more and on tasks where the answer space is large enough that guessing is hard.

How to think about it

Chain-of-thought (CoT) is a prompting technique in which the model is guided — via examples or a simple instruction — to produce a step-by-step reasoning trace before its final answer. The trace acts as a scratchpad that keeps the model on track across multi-hop inferences.

Variants

| Variant | How | When |
|---|---|---|
| Few-shot CoT | Provide 2–8 examples with full reasoning traces | When you have labeled examples and need reproducibility |
| Zero-shot CoT | Append “Let’s think step by step.” to the prompt | Quick uplift with no examples |
| Self-consistency | Sample N CoT paths, take majority vote | High-stakes tasks, extra latency acceptable |
| Least-to-most | Decompose problem into sub-problems, solve sequentially | Compositional tasks (code, planning) |

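
The first two variants differ only in how the prompt is assembled, which a small sketch can make concrete. The worked examples, the function names, and the "Reasoning:/Answer:" layout below are illustrative assumptions, not part of any library; the resulting message list follows the common chat format of role/content dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical worked examples: (question, reasoning trace, final answer).
EXAMPLES: List[Tuple[str, str, str]] = [
    (
        "Tom has 3 boxes with 4 pens each. He gives away 5 pens. How many are left?",
        "3 boxes * 4 pens = 12 pens. 12 - 5 = 7.",
        "7",
    ),
    (
        "A shirt costs $20 and is discounted by 25%. What is the sale price?",
        "25% of 20 is 5. 20 - 5 = 15.",
        "$15",
    ),
]

ZERO_SHOT_TRIGGER = "Let's think step by step."


def few_shot_cot_messages(
    question: str, examples: List[Tuple[str, str, str]] = EXAMPLES
) -> List[Dict[str, str]]:
    """Few-shot CoT: show each example with its full reasoning, then ask the new question."""
    messages: List[Dict[str, str]] = []
    for example_question, reasoning, answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append(
            {"role": "assistant", "content": f"Reasoning: {reasoning}\nAnswer: {answer}"}
        )
    messages.append({"role": "user", "content": question})
    return messages


def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no examples, only the trigger phrase appended to the question."""
    return f"{question}\n{ZERO_SHOT_TRIGGER}"


if __name__ == "__main__":
    for message in few_shot_cot_messages("What is 6 * 7 - 2?"):
        print(message["role"], "->", message["content"])
    print(zero_shot_cot_prompt("What is 6 * 7 - 2?"))
```

The few-shot version costs more prompt tokens per call, but the examples pin down both the reasoning style and the output format, which is why the table associates it with reproducibility.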
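# Zero-shot CoT through a system prompt (needs the `openai` package and an API key).
from openai import OpenAI

client = OpenAI()  # reads the key from the OPENAI_API_KEY environment variable

# Ask for the reasoning trace first, then a clearly marked final answer.
SYSTEM = """You are a precise reasoning assistant.
Always reason step-by-step before giving your final answer.
Format:
Reasoning: <steps>
Answer: <final answer>"""

def cot_answer(question: str) -> str:
    """Return the model's reasoning trace followed by its final answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
        temperature=0,  # reduce sampling randomness between runs
    )
    # `content` is typed as optional in the SDK, so fall back to an empty string.
    return response.choices[0].message.content or ""

# The question states the direction of travel so that it has a single answer.
print(cot_answer("Two trains start 300 miles apart and travel toward each other. One leaves at 9 AM going 60 mph; the other leaves at 11 AM going 90 mph. At what time do they meet?"))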
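
Self-consistency from the table above can be sketched on top of the same "Answer:" output format. This is a minimal illustration under stated assumptions: `extract_answer` and `self_consistent_answer` are hypothetical helper names, and `sample` stands for any zero-argument callable that returns one completion. For the vote to be meaningful, the completions must be sampled with a temperature above 0, since otherwise the paths will be nearly identical.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last line-initial 'Answer:' marker, or None if absent."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> Optional[str]:
    """Draw n completions via sample() and return the most common final answer."""
    answers: List[str] = []
    for _ in range(n):
        answer = extract_answer(sample())
        if answer is not None:  # skip completions that ignored the format
            answers.append(answer)
    if not answers:
        return None
    # On a tie, Counter returns the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Canned completions stand in for real model calls in this demo.
    canned = iter([
        "Reasoning: path one\nAnswer: 12:12 PM",
        "Reasoning: path two\nAnswer: 1:00 PM",
        "Reasoning: path three\nAnswer: 12:12 PM",
    ])
    print(self_consistent_answer(lambda: next(canned), n=3))
```

Voting on the extracted answer rather than the whole completion matters: different reasoning paths rarely match word for word, but correct ones tend to converge on the same final answer. Exact string matching is a simplification; real answers usually need normalization before they are counted.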

Why it works

Autoregressive models condition each token on all prior tokens. Writing out intermediate steps places intermediate results in the context window, so the model can condition on them explicitly when generating the next step. The written steps are not guaranteed to be correct, but each one only has to extend the previous one. Without CoT, the model must do all of the reasoning inside its hidden states while producing the answer tokens, which is a much harder task.
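
As a loose analogy in ordinary code (not a model of what happens inside a network), the scratchpad idea resembles naming each intermediate value instead of evaluating one opaque expression. The sketch below works through a two-train problem under an explicit assumption: the trains start 300 miles apart and travel toward each other, one leaving at 9 AM at 60 mph and the other at 11 AM at 90 mph. The function name `train_scratchpad` is hypothetical.

```python
from typing import List, Tuple


def train_scratchpad() -> Tuple[List[str], str]:
    """Solve the two-train problem, recording every intermediate step as text."""
    # Assumption: the trains head toward each other on the same line.
    gap_miles = 300
    speed_a_mph, speed_b_mph = 60, 90
    steps: List[str] = []

    # Step 1: distance the first train covers alone between 9 AM and 11 AM.
    head_start_hours = 11 - 9
    covered_miles = speed_a_mph * head_start_hours
    steps.append(f"By 11 AM the first train has covered {covered_miles} miles.")

    # Step 2: gap left once both trains are moving (builds on step 1).
    remaining_miles = gap_miles - covered_miles
    steps.append(f"Remaining gap at 11 AM: {remaining_miles} miles.")

    # Step 3: the gap shrinks at the sum of the two speeds.
    closing_speed_mph = speed_a_mph + speed_b_mph
    steps.append(f"Closing speed: {closing_speed_mph} mph.")

    # Step 4: minutes needed after 11 AM (integer arithmetic, exact here).
    minutes_after_11 = remaining_miles * 60 // closing_speed_mph
    steps.append(f"Time to meet after 11 AM: {minutes_after_11} minutes.")

    # Step 5: convert to a 24-hour clock time.
    hour, minute = divmod(11 * 60 + minutes_after_11, 60)
    answer = f"{hour}:{minute:02d}"
    return steps, answer


if __name__ == "__main__":
    trace, final = train_scratchpad()
    for line in trace:
        print("Reasoning:", line)
    print("Answer:", final)
```

Each step reads only values that earlier steps already wrote down, which mirrors how a generated reasoning trace lets later tokens condition on earlier results.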

Limits

CoT does not help when the model genuinely lacks the required knowledge, when the task is purely perceptual, or when the output is a short lookup, where the extra tokens and latency are not justified. Smaller models (sub-7B) benefit less because they often generate plausible-sounding but incorrect reasoning chains.
