What is hybrid search and when should you use semantic vs keyword retrieval?

For: AI / LLM Engineer, ML Engineer, Data Scientist

The short answer

Keyword search (BM25) excels at exact term matching: product codes, proper nouns, rare abbreviations. Semantic search (dense embeddings) captures meaning and handles paraphrases. Hybrid search runs both in parallel and merges the two ranked lists, commonly with Reciprocal Rank Fusion (RRF), which makes it a strong default for most production RAG systems.

How to think about it

Keyword search (BM25 / TF-IDF)

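Scores documents by how often the query terms appear in them, weighting rare terms more heavily (inverse document frequency). BM25 additionally saturates term frequency and normalizes for document length. Fast, interpretable, and zero-shot: it needs no training data. Excels when queries contain rare tokens, serial numbers, or jargon that an embedding model may place near unrelated text.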
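To make the scoring concrete, here is a minimal sketch of Okapi BM25 over pre-tokenized documents. The function name bm25_scores, the toy corpus, and the parameter values k1 = 1.5 and b = 0.75 are illustrative assumptions (common defaults, not values from this article). Production systems would normally rely on a search engine's built-in implementation.

```python
import math
from collections import Counter


def bm25_scores(
    query: list[str],
    docs: list[list[str]],
    k1: float = 1.5,   # assumed default: controls term-frequency saturation
    b: float = 0.75,   # assumed default: controls length normalization
) -> list[float]:
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" keeps it positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus: only the first document contains the rare code.
docs = [
    "replace filter cartridge model xk-4471".split(),
    "how to clean the water filter".split(),
    "warranty terms for kitchen appliances".split(),
]
scores = bm25_scores("xk-4471 filter".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

The rare token xk-4471 appears in one document only, so it carries a high IDF weight and pushes that document to the top, which is the behavior that makes keyword search strong on serial numbers and codes.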

Semantic search (dense retrieval)

Embeds both query and documents into a shared vector space and finds nearest neighbors, typically by cosine similarity. Handles synonymy (“automobile” matches “car”), paraphrases, and, with a multilingual model, cross-lingual queries. Can fail on rare or out-of-vocabulary tokens, such as part numbers, whose embeddings may land near unrelated neighbors.
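The nearest-neighbor step can be sketched in a few lines. The vectors below are hand-picked 3-dimensional stand-ins, not the output of a real embedding model, and the names cosine_similarity and nearest_neighbors are hypothetical helpers. Real systems use vectors with hundreds of dimensions and an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    top_k: int = 2,
) -> list[str]:
    """Brute-force search: rank every document by similarity to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy vectors chosen for illustration only (assumption, not model output).
doc_vecs = {
    "car_review": [0.9, 0.1, 0.0],
    "bike_repair": [0.3, 0.8, 0.1],
    "pasta_recipe": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the query "automobile"
top = nearest_neighbors(query_vec, doc_vecs, top_k=2)
# top == ["car_review", "bike_repair"]
```

The query shares no literal token with "car_review", yet it ranks first because the two vectors point in nearly the same direction. That is the property a trained embedding model is meant to provide for synonyms and paraphrases.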

Hybrid search with Reciprocal Rank Fusion (RRF)

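RRF combines ranked lists using only each document's rank, so BM25 scores and cosine similarities, which live on different scales, never need to be normalized:

RRF_score(d) = Σ_i 1 / (k + rank_i(d))

Here rank_i(d) is the 1-based position of document d in ranked list i, and a list that does not contain d contributes nothing. k = 60 is a common default.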

def reciprocal_rank_fusion(
    bm25_results: list[str],
    dense_results: list[str],
    k: int = 60,
) -> list[str]:
    """Fuse two ranked lists of document IDs (best first) into one ranking."""
    scores: dict[str, float] = {}
    # Each list contributes 1 / (k + rank), with ranks starting at 1.
    for results in (bm25_results, dense_results):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; documents found by both retrievers rise.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
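As a worked example of the fusion step, the sketch below applies the same formula to two short, made-up result lists and keeps the fused scores so the arithmetic is visible. The helper name rrf and the document IDs are hypothetical. Accepting any number of ranked lists is a small generalization for illustration, not something the function above does.

```python
def rrf(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """Return the fused RRF score of every document seen in any list."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


# Made-up retriever outputs, best match first.
bm25_results = ["doc_a", "doc_b", "doc_c"]
dense_results = ["doc_c", "doc_a", "doc_d"]

scores = rrf([bm25_results, dense_results])
fused = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# doc_a: 1/61 + 1/62 ≈ 0.03252   (rank 1 in BM25, rank 2 in dense)
# doc_c: 1/63 + 1/61 ≈ 0.03227   (rank 3 in BM25, rank 1 in dense)
# doc_b: 1/62        ≈ 0.01613   (BM25 only)
# doc_d: 1/63        ≈ 0.01587   (dense only)
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Documents returned by both retrievers score roughly twice as high as documents returned by only one, and with k = 60 the difference between adjacent ranks is small, so agreement between retrievers matters more than exact position.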

Many vector databases and search engines (Qdrant, Weaviate, Elasticsearch, OpenSearch) expose hybrid search natively, so you often do not need to implement RRF yourself.

Decision guide

Scenario	Recommended
Product catalog, serial numbers, legal citations	BM25 or hybrid
Customer support, paraphrase-heavy queries	Dense
General enterprise knowledge base	Hybrid
Multilingual queries	Dense with multilingual model

Re-ranking

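After hybrid retrieval, a cross-encoder re-ranker (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2) re-scores the top 20 candidates by passing each query-document pair through the model together, so attention spans both texts. You then pass the top 5 to the LLM. This adds latency (on the order of 50 ms, depending on hardware, model size, and candidate count) but often improves answer quality noticeably.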
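The re-ranking step can be sketched as a function that takes any pairwise scorer. The names rerank, PairScorer, and overlap_scorer are hypothetical, and the token-overlap scorer is only a toy stand-in so the example runs without downloading a model. The commented lines show how a sentence-transformers CrossEncoder could be plugged in, assuming that library is installed.

```python
from typing import Callable

# A scorer maps (query, document) pairs to one relevance score per pair.
PairScorer = Callable[[list[tuple[str, str]]], list[float]]


def rerank(
    query: str,
    candidates: list[str],
    score_pairs: PairScorer,
    top_k: int = 5,
) -> list[str]:
    """Re-score retrieved candidates and keep the top_k highest-scoring ones."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in scorer: counts shared lowercase tokens. Not a cross-encoder."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


# With sentence-transformers installed, a real cross-encoder could replace it:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   score_pairs = lambda pairs: model.predict(pairs).tolist()

candidates = [
    "Shipping times for international orders",
    "How to reset your account password",
    "Password rules and reset policy",
]
top = rerank("reset password", candidates, overlap_scorer, top_k=2)
```

Keeping the scorer as a parameter means the retrieval pipeline stays testable without a model, and the candidate count and top_k can be tuned against the latency budget.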
