datarekha

What is hybrid search and when should you use semantic vs keyword retrieval?

The short answer

Keyword search (BM25) excels at exact term matching: product codes, proper nouns, rare abbreviations. Semantic search (dense embeddings) captures meaning and handles paraphrases. Hybrid search runs both retrievers in parallel and merges their ranked result lists, typically with Reciprocal Rank Fusion (RRF). A query therefore benefits from exact and semantic matching at once, which makes hybrid a strong default for most production RAG systems.

How to think about it

Keyword search (BM25 / TF-IDF)

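Scores each document by how often the query terms appear in it. Term frequency has diminishing returns and is normalized for document length. Each term is also weighted by how rare it is across the corpus (inverse document frequency). The method is fast, interpretable, and zero-shot: it needs no training data. It excels when queries contain rare tokens, serial numbers, or jargon that an embedding model may place poorly in its vector space.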
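The sketch below makes the scoring concrete with a from-scratch Okapi BM25 implementation. It assumes lowercase whitespace tokenization, a Lucene-style IDF, and the parameters k1 = 1.5 and b = 0.75. These are typical choices; Lucene-based engines such as Elasticsearch default to k1 = 1.2. `bm25_scores` is a hypothetical helper, and in production you would rely on a search engine's built-in BM25.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF: rare terms get a large weight; the +1 keeps it positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "replacement filter for model XK-200 vacuum",
    "how to clean a vacuum filter",
    "vacuum cleaner buying guide",
]
print(bm25_scores("XK-200 filter", docs))
```

The third document shares no query term, so it scores 0. The first document outranks the second because it also contains the rare token "xk-200", which carries a high IDF weight.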

Semantic search (dense retrieval)

Embeds the query and the documents into a shared vector space and retrieves the nearest neighbors, usually by cosine similarity. It handles synonymy (“automobile” matches “car”) and paraphrases. With a multilingual embedding model, it also handles cross-lingual queries. It can fail on rare or out-of-domain tokens such as part numbers, because their embeddings may land near unrelated documents.
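The sketch below shows nearest-neighbor retrieval by cosine similarity. The tiny hand-made vectors are hypothetical stand-ins for real embeddings. A real system would obtain them from an embedding model and would replace this brute-force scan with an approximate-nearest-neighbor index. `cosine` and `dense_search` are illustrative names, not a library API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dense_search(query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Brute-force nearest neighbors: rank every document by cosine similarity."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:top_k]


# Toy 3-d "embeddings" (hypothetical values): the two vehicle documents point in
# similar directions even though their texts share no words.
doc_vecs = {
    "car-review": [0.9, 0.1, 0.0],
    "automobile-insurance": [0.8, 0.3, 0.1],
    "pasta-recipe": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.2, 0.05]  # hypothetical embedding of a query like "automobile prices"
print(dense_search(query_vec, doc_vecs))
```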

Hybrid search with Reciprocal Rank Fusion (RRF)

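RRF combines ranked lists without requiring score normalization. It uses only each document's position in each list, never the raw BM25 or cosine scores, which live on incompatible scales:

$$\mathrm{RRF}(d) = \sum_{i=1}^{n} \frac{1}{k + \mathrm{rank}_i(d)}$$

Here n is the number of ranked lists (two here: BM25 and dense), and rank_i(d) is the 1-based position of document d in list i; a list that does not contain d contributes nothing. The constant k damps the advantage of the very top ranks, and k = 60 is a common default.
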
```python
def reciprocal_rank_fusion(
    bm25_results: list[str],
    dense_results: list[str],
    k: int = 60,
) -> list[str]:
    """Fuse two ranked lists of document IDs (best first) with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for results in (bm25_results, dense_results):
        # Ranks are 1-based, so the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    # Documents returned by both retrievers accumulate two contributions.
    return sorted(scores, key=scores.get, reverse=True)
```
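The same formula works for any number of ranked lists. `rrf_fuse` below is a hypothetical variant of the function above (not a library API). It accepts a variable number of lists and also returns the fused scores, so a small example can be checked by hand with k = 60.

```python
def rrf_fuse(*ranked_lists: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over any number of ranked lists (best first).

    Returns (doc_id, fused_score) pairs sorted by descending score.
    """
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25 = ["A", "B", "C"]
dense = ["B", "D", "A"]
for doc_id, score in rrf_fuse(bm25, dense):
    print(doc_id, round(score, 4))
# B 0.0325  (1/62 + 1/61: found by both retrievers, near the top of each)
# A 0.0323  (1/61 + 1/63)
# D 0.0161  (1/62: dense only)
# C 0.0159  (1/63: BM25 only)
```

A document found by both retrievers outranks one that only a single retriever placed high, which is exactly the consensus behavior that makes RRF robust.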

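Many vector databases and search engines (Qdrant, Weaviate, Elasticsearch, OpenSearch) offer hybrid search natively, so you often do not need to implement fusion yourself. Check which fusion method each one applies, since not all of them use RRF by default.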

Decision guide

| Scenario | Recommended retrieval |
|---|---|
| Product catalog, serial numbers, legal citations | BM25 or hybrid |
| Customer support, paraphrase-heavy queries | Dense |
| General enterprise knowledge base | Hybrid |
| Multilingual queries | Dense with a multilingual embedding model |

Re-ranking

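After hybrid retrieval, a cross-encoder re-ranker (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2) re-scores the top 20 candidates. It reads each query and passage together with full cross-attention, and you then pass only the top 5 to the LLM. This adds latency, roughly 50 ms for a small model like this one, though the figure depends on hardware and passage length. It often improves answer quality noticeably, because the cross-encoder judges each query-passage pair jointly instead of comparing two independently computed embeddings.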
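The retrieve-then-rerank step is sketched below. `rerank` and `toy_overlap_scorer` are hypothetical names. The toy scorer (query-word overlap) only stands in for a real cross-encoder so the example runs offline. With the sentence-transformers library, you could instead pass the `predict` method of `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")`, which scores a list of (query, passage) pairs.

```python
from typing import Callable, Sequence

# A scorer takes (query, passage) pairs and returns one relevance score per pair,
# the same shape a cross-encoder's batch prediction produces.
Scorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_fn: Scorer,
           pool_size: int = 20, top_n: int = 5) -> list[str]:
    """Re-score the first `pool_size` fused candidates and keep the best `top_n`."""
    pool = candidates[:pool_size]
    scores = score_fn([(query, passage) for passage in pool])
    ranked = sorted(zip(pool, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]


def toy_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Placeholder for a cross-encoder: fraction of query words found in the passage."""
    out = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        out.append(len(q_words & p_words) / len(q_words))
    return out


candidates = [
    "shipping times for international orders",
    "how to reset your account password",
    "password requirements and reset steps",
]
print(rerank("reset password", candidates, toy_overlap_scorer, top_n=2))
```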
