datarekha

What chunking strategies exist for RAG and how do you choose between them?

The short answer

Chunking splits source documents into retrievable units before embedding. The right strategy depends on document structure, query style, and the model's context window. Fixed-size chunks are simple but can split mid-sentence; semantic or structural chunking preserves coherence; hierarchical chunking enables parent-document retrieval for richer context.

How to think about it

Chunking is one of the highest-leverage decisions in a RAG pipeline. The retriever returns chunks, not full documents, so each chunk must be independently meaningful yet small enough that irrelevant text does not dilute the signal.

Common strategies

| Strategy | Description | Best for |
| --- | --- | --- |
| Fixed-size (token) | Split every N tokens, overlap M | Fast baseline; homogeneous text |
| Sentence / paragraph | Split on natural boundaries | Prose documents, FAQs |
| Recursive character | LangChain default: tries paragraph, then sentence, then word | General-purpose |
| Semantic chunking | Split where cosine similarity between adjacent sentences drops | Long, heterogeneous docs |
| Structural (Markdown / HTML) | Split on headers, sections | Technical docs, wikis |
| Hierarchical (parent-child) | Index small child chunks, retrieve parent for context | Dense knowledge bases |
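
Semantic chunking is the least obvious strategy in the list, so a minimal sketch may help. It assumes a toy bag-of-words vector in place of a real sentence-embedding model, and the names `embed`, `cosine`, `semantic_chunks` and the `threshold` value of 0.1 are illustrative choices, not part of any library.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.1) -> list[str]:
    """Start a new chunk wherever adjacent-sentence similarity falls below threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # similarity dropped: topic boundary
        else:
            chunks[-1].append(cur)    # still the same topic
    return [" ".join(c) for c in chunks]


text = (
    "Cats sleep most of the day. Cats also hunt at night. "
    "Interest rates rose sharply. Higher rates slow borrowing."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

With this input the two cat sentences end up in one chunk and the two interest-rate sentences in another. With a real embedding model the threshold would need tuning, for example by splitting at the lowest-similarity percentile rather than at a fixed value.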

Choosing chunk size

  • Too large: retrieval returns verbose, noisy chunks; irrelevant sentences hurt answer quality.
  • Too small: the chunk lacks sufficient context for the model to synthesize a good answer.
  • Rule of thumb: target 256–512 tokens with 10–20% overlap for general prose.
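
from langchain_text_splitters import RecursiveCharacterTextSplitter

document_text = "..."  # placeholder: the raw text of one source document

# chunk_size and chunk_overlap are measured in characters by default
# (length_function=len). At roughly 4 characters per English token,
# 1600 characters is about 400 tokens and 240 is about 60 (15% overlap).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1600,
    chunk_overlap=240,
    separators=["\n\n", "\n", ". ", " ", ""],  # paragraph, line, sentence, word, character
)
chunks = splitter.split_text(document_text)  # returns a list of strings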
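
To make the size and overlap arithmetic concrete, here is a dependency-free sketch of fixed-size chunking with a sliding window. The function name `fixed_size_chunks` is hypothetical, and the synthetic `w0 … w999` list stands in for real tokens; an actual pipeline would count tokens with the embedding model's own tokenizer.

```python
def fixed_size_chunks(
    tokens: list[str], size: int = 400, overlap: int = 60
) -> list[list[str]]:
    """Slide a window of `size` tokens, advancing by size - overlap each step."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Synthetic stand-in for a tokenized 1000-token document.
tokens = [f"w{i}" for i in range(1000)]
chunks = fixed_size_chunks(tokens, size=400, overlap=60)
print([len(c) for c in chunks])  # → [400, 400, 320]
```

With a 400-token window and 60-token overlap (15%), the window advances 340 tokens at a time, so the last 60 tokens of each chunk reappear at the start of the next.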

Hierarchical retrieval (parent-document)

Index small child chunks (128 tokens) for precise matching; when a child is retrieved, look up its parent chunk (512 tokens) and send the parent to the LLM. This keeps retrieval precise while giving the model enough context to reason.

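# Sketch: sizes are in characters (about 4 per token), so ~512-token parents and ~128-token children
from langchain_text_splitters import RecursiveCharacterTextSplitter

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=0)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
parent_chunks = parent_splitter.split_text(doc)  # doc: the document text as a string
# Split each parent separately so every child nests inside exactly one parent
children = [(child, parent_id) for parent_id, parent in enumerate(parent_chunks)
            for child in child_splitter.split_text(parent)]
# At inference: retrieve the best-matching child, then send parent_chunks[parent_id] to the LLM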
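
The full retrieve-child, return-parent loop can also be sketched without any library. This is a toy illustration under stated assumptions: chunk sizes are counted in whitespace-separated words rather than tokens, `overlap_score` (shared-word count) stands in for vector similarity, and all function names are hypothetical.

```python
import re
from collections import Counter


def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    q = Counter(re.findall(r"[a-z0-9]+", query.lower()))
    c = Counter(re.findall(r"[a-z0-9]+", chunk.lower()))
    return sum((q & c).values())


def build_index(doc: str, parent_size: int = 512, child_size: int = 128):
    """Return (parents, children), where children holds (child_text, parent_id) pairs."""
    parents = split_words(doc, parent_size)
    children = []
    for parent_id, parent in enumerate(parents):
        # Children are cut from their parent, so each child has exactly one parent.
        for child in split_words(parent, child_size):
            children.append((child, parent_id))
    return parents, children


def retrieve_parent(query: str, parents: list[str], children: list[tuple[str, int]]) -> str:
    """Match against the small child chunks, then return the enclosing parent text."""
    _, parent_id = max(children, key=lambda item: overlap_score(query, item[0]))
    return parents[parent_id]


doc = (
    "Refunds are issued within five days. Contact billing with your order number. "
    "Shipping takes two weeks overseas. Tracking links arrive by email."
)
parents, children = build_index(doc, parent_size=12, child_size=6)
print(retrieve_parent("how long do refunds take", parents, children))
```

The query matches only the short "Refunds are issued within five days." child, but the model receives the whole parent, which also includes the billing instruction. In a real system the `max` over children would be a vector-store similarity search, and the child-to-parent mapping would live in chunk metadata.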
