What is hybrid search and when should you use semantic vs keyword retrieval?
Keyword search (BM25) excels at exact term matching: product codes, proper nouns, rare abbreviations. Semantic search (dense embeddings) captures meaning and handles paraphrases. Hybrid search runs both in parallel and merges the two ranked lists, commonly with Reciprocal Rank Fusion (RRF), so a document can surface through either an exact-term match or a semantic match. That makes it a sensible default for most production RAG systems.
How to think about it
Keyword search (BM25 / TF-IDF)
Scores documents by how often the query terms occur in them, weighting rare terms more heavily (inverse document frequency) and normalizing for document length. Fast, interpretable, and zero-shot: it needs no training data. Excels when queries contain rare tokens, serial numbers, or jargon that an embedding model may place near unrelated text.
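As a rough illustration of the scoring described above, here is a minimal sketch of Okapi BM25 in pure Python. The function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the toy documents are all assumptions chosen for the example; production systems normally rely on a search engine or a dedicated library rather than hand-rolled code.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (illustrative sketch)."""
    # Naive lowercase + whitespace tokenization (an assumption; real analyzers do more).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher IDF; the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by relative length (b).
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (made up for this example).
docs = [
    "replacement filter for vacuum model XR-2200",
    "how to clean a vacuum filter",
    "vacuum cleaner buying guide",
]
scores = bm25_scores("XR-2200 filter", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the document that contains the exact model number
```

The first document wins because it is the only one containing the rare token `xr-2200`, which is exactly the kind of query where keyword retrieval is hard to beat.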
Semantic search (dense retrieval)
Embeds both query and documents into a shared vector space and finds nearest neighbors, typically by cosine similarity. Handles synonymy (“automobile” matches “car”), paraphrases, and, given a multilingual embedding model, cross-lingual queries. It can fail on rare or out-of-vocabulary tokens such as part numbers, whose embeddings may land near unrelated neighbors.
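The nearest-neighbor step can be sketched without any embedding model. In the snippet below the three-dimensional vectors are hand-made stand-ins for real embeddings, and the names `cosine_similarity`, `nearest_neighbors`, and the document IDs are hypothetical; a real system would get its vectors from an embedding model and usually search them with an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    top_k: int = 2,
) -> list[str]:
    """Return the IDs of the top_k documents most similar to the query (brute force)."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hand-made toy vectors standing in for real embeddings (assumed values).
toy_doc_vecs = {
    "car_review": [0.9, 0.1, 0.0],
    "automobile_recall": [0.8, 0.2, 0.1],
    "banana_bread": [0.0, 0.1, 0.9],
}
toy_query_vec = [0.85, 0.15, 0.05]  # pretend this is the embedding of "automobile"

top = nearest_neighbors(toy_query_vec, toy_doc_vecs, top_k=2)
print(top)  # both vehicle-related documents, regardless of the exact word used
```

Because similarity is computed on vectors rather than tokens, the "car" and "automobile" documents both rank above the unrelated one even though they share no query term.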
Hybrid search with Reciprocal Rank Fusion (RRF)
RRF combines ranked lists without requiring score normalization:
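RRF_score(d) = Σ_i 1 / (k + rank_i(d))
Here rank_i(d) is the 1-based rank of document d in result list i (a list that does not contain d contributes nothing), and k = 60 is a common default.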
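```python
def reciprocal_rank_fusion(
    bm25_results: list[str],
    dense_results: list[str],
    k: int = 60,
) -> list[str]:
    """Fuse two ranked lists of document IDs; returns IDs, best first."""
    scores: dict[str, float] = {}
    # Each list contributes 1 / (k + rank) for every document it contains.
    for rank, doc_id in enumerate(bm25_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    for rank, doc_id in enumerate(dense_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    # Documents ranked well in both lists accumulate the highest score.
    return sorted(scores, key=scores.get, reverse=True)
```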
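To make the arithmetic concrete, here is a small worked example. It uses a compact variant, `rrf` (a hypothetical name), that accepts any number of ranked lists and returns the raw fused scores so they can be inspected; the document IDs are made up.

```python
def rrf(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """Return the fused RRF score of every document seen in any list."""
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    return scores


bm25_results = ["doc_a", "doc_b", "doc_c"]   # keyword ranking (toy data)
dense_results = ["doc_b", "doc_d", "doc_a"]  # semantic ranking (toy data)

fused = rrf([bm25_results, dense_results])
ranking = sorted(fused, key=fused.get, reverse=True)

# doc_b: 1/62 + 1/61 ≈ 0.03252   (2nd in BM25, 1st in dense)
# doc_a: 1/61 + 1/63 ≈ 0.03227   (1st in BM25, 3rd in dense)
# doc_d: 1/62        ≈ 0.01613   (dense only)
# doc_c: 1/63        ≈ 0.01587   (BM25 only)
print(ranking)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists score roughly twice as high as those found by only one retriever, which is why agreement between the two methods dominates the fused ranking.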
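Many vector databases and search engines (Qdrant, Weaviate, Elasticsearch, OpenSearch) expose hybrid search natively, so you often do not need to implement the fusion step yourself. Check which fusion method your version provides, since some offer score-based fusion in addition to or instead of RRF.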
Decision guide
| Scenario | Recommended |
|---|---|
| Product catalog, serial numbers, legal citations | BM25 or hybrid |
| Customer support, paraphrase-heavy queries | Dense |
| General enterprise knowledge base | Hybrid |
| Multilingual queries | Dense with multilingual model |
Re-ranking
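After hybrid retrieval, a cross-encoder re-ranker (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2) re-scores the top candidates (for example, the top 20) with full query-document attention, and only the best few (for example, the top 5) are passed to the LLM. This adds latency (~50 ms is a reasonable ballpark, though it varies with hardware, model size, and passage length) but often improves answer quality, because the cross-encoder reads the query and document together instead of comparing precomputed vectors.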
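The re-ranking step can be sketched as a small function that takes the scoring model as a parameter. Everything named here (`rerank`, `overlap_scorer`, the sample passages) is hypothetical; the word-overlap scorer is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Sequence

# A scorer takes (query, passage) pairs and returns one relevance score per pair.
ScoreFn = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_fn: ScoreFn, top_k: int = 5) -> list[str]:
    """Re-score retrieved candidates and keep the top_k highest-scoring ones."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stand-in scorer (NOT a cross-encoder): counts words shared by query and passage."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


candidates = [
    "banana bread recipe",
    "router firmware update steps",
    "reset your router password",
]
top = rerank("how to reset router password", candidates, overlap_scorer, top_k=2)
print(top)  # ['reset your router password', 'router firmware update steps']

# With the sentence-transformers package installed, a real cross-encoder could be
# plugged in instead, since its predict() method also accepts (query, passage) pairs:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   top = rerank(query, candidates, model.predict, top_k=5)
```

Passing the scorer in as an argument keeps the pipeline logic separate from the model, so the re-ranker can be swapped or disabled without touching retrieval.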