datarekha

What is a vector database and how does it enable semantic retrieval?

The short answer

A vector database stores dense numerical embeddings alongside their source documents. It uses approximate nearest-neighbor (ANN) algorithms to find the entries most semantically similar to a query vector, typically in milliseconds. A keyword index matches exact terms. A vector database instead compares vectors by their distance in embedding space, so synonyms and paraphrases can match even when they share no words. Common choices include Pinecone, Weaviate, Qdrant, and pgvector for Postgres.

How to think about it

A vector database is a storage and retrieval system optimized for high-dimensional float arrays (embeddings). An embedding model encodes every document chunk into a vector, typically of 768 to 3,072 dimensions. The database indexes these vectors so that at query time it can return the (approximate) k nearest neighbors with sub-second latency, even across hundreds of millions of vectors.
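An ANN index approximates exhaustive search, which scores the query against every stored vector. The following is a minimal sketch of that exact baseline in plain Python, using cosine similarity. The three-dimensional vectors and the helper names `cosine` and `exact_top_k` are hypothetical and chosen for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query, vectors, k):
    """Exhaustive k-nearest-neighbor search: score every stored vector.

    The cost is O(N * d) per query, which is the work ANN indexes avoid at scale.
    """
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional "embeddings" (made-up values; real models emit 768-3,072 dims).
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}

print(exact_top_k([0.88, 0.12, 0.02], store, k=2))  # ['cat', 'kitten']
```

An ANN index aims to return nearly the same top-k while scoring only a small fraction of the stored vectors.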

How ANN indexing works

Most production systems use one of two index families:

  • HNSW (Hierarchical Navigable Small World): graph-based, with fast queries and high recall but a higher memory cost. Used by Qdrant, Weaviate, and pgvector.
  • IVF-PQ (Inverted File + Product Quantization): clusters vectors into buckets, then compresses the vectors in each bucket into short codes. It uses less memory but gives somewhat lower recall. A common index type in Faiss.
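The IVF half of IVF-PQ fits in a few dozen lines of plain Python. This is a toy sketch with several assumptions: the hypothetical class `ToyIVF`, random Gaussian data, and L2 distance. It trains coarse centroids with k-means, keeps one inverted list per centroid, and at query time scans only the `nprobe` lists whose centroids are closest. It stores raw vectors (IVF-Flat) and omits the product-quantization step, so it shows the recall-versus-work trade-off but not the memory savings. It does not reproduce how Faiss implements IVF internally.

```python
import math
import random


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, n_clusters, iters=10, seed=0):
    """Plain Lloyd's k-means; the centroids act as the coarse quantizer."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_clusters)
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        # New centroid = mean of its bucket; keep the old one if the bucket is empty.
        centroids = [
            [sum(col) / len(b) for col in zip(*b)] if b else centroids[c]
            for c, b in enumerate(buckets)
        ]
    return centroids


class ToyIVF:
    """IVF-Flat: one inverted list of vector ids per centroid, no compression."""

    def __init__(self, vectors, n_lists, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        self.lists = [[] for _ in range(n_lists)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_lists(v, 1)[0]].append(i)

    def _nearest_lists(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: l2(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe):
        """Scan only the `nprobe` lists closest to the query; return (ids, #scanned)."""
        candidates = [i for c in self._nearest_lists(query, nprobe) for i in self.lists[c]]
        candidates.sort(key=lambda i: l2(query, self.vectors[i]))
        return candidates[:k], len(candidates)


def exact_search(vectors, query, k):
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = [rng.gauss(0, 1) for _ in range(8)]
index = ToyIVF(data, n_lists=16)
truth = set(exact_search(data, query, k=10))

for nprobe in (1, 4, 16):
    hits, scanned = index.search(query, k=10, nprobe=nprobe)
    recall = len(truth & set(hits)) / 10
    print(f"nprobe={nprobe:2d}  scanned={scanned:4d}/{len(data)}  recall@10={recall:.1f}")
```

Raising `nprobe` scans more lists and pushes recall toward 1.0, but it costs more distance computations. Probing every list reduces to exact search.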

The trade-off is among recall, latency, and RAM. HNSW is the usual choice for online RAG, where query latency matters most. IVF-PQ is often preferred for billion-scale offline search, where memory is the bottleneck.
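A back-of-the-envelope memory estimate shows why RAM drives this choice. The formulas below are simplifying assumptions, not vendor figures:

  • Vectors are stored as float32.
  • Each HNSW node has roughly 2 × `m` four-byte neighbor ids on the bottom layer.
  • IVF-PQ stores one 8-bit code per subquantizer plus an 8-byte id per vector.

The helper names are hypothetical, and real systems add overhead that these formulas ignore.

```python
def flat_bytes(n, dim):
    """Raw float32 vectors: 4 bytes per dimension."""
    return n * dim * 4


def hnsw_bytes(n, dim, m=16):
    """Rough HNSW estimate: raw vectors plus ~2*m int32 neighbor ids per node on the
    bottom layer. Upper layers and allocator overhead are ignored."""
    return n * (dim * 4 + 2 * m * 4)


def ivfpq_bytes(n, m_subquantizers=64, nbits=8, id_bytes=8):
    """IVF-PQ: each vector becomes m_subquantizers codes of nbits bits plus a stored id.
    Coarse centroids and PQ codebooks are small by comparison and are ignored."""
    return n * (m_subquantizers * nbits // 8 + id_bytes)


n, dim = 1_000_000_000, 768
for name, size in [
    ("flat float32", flat_bytes(n, dim)),
    ("HNSW (m=16)", hnsw_bytes(n, dim)),
    ("IVF-PQ (64 x 8-bit)", ivfpq_bytes(n)),
]:
    print(f"{name:<20} {size / 1e9:>8,.0f} GB")
```

Under these assumptions, one billion 768-dimensional vectors take about 3,072 GB stored flat and about 3,200 GB as an HNSW graph. As IVF-PQ codes with 64 subquantizers they take about 72 GB, which is why compressed indexes become attractive at that scale.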

Similarity metrics

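Metric              When to use
Cosine similarity   Text embeddings; compares direction only, so vector length is ignored
Dot product         Cheaper than cosine and gives the same ranking when vectors are unit-normalized; also the native metric for models trained with dot-product similarity
L2 (Euclidean)      Embeddings where magnitude carries meaning, such as some image or audio models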
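A small sketch in plain Python makes the table concrete. With raw vectors, dot product can favor a long vector that points in a different direction, while cosine looks only at direction. After unit-normalizing, all three metrics give the same ranking. This happens because the dot product then equals cosine, and squared L2 distance equals 2 - 2*cos. The two-dimensional vectors are made up for illustration, and L2 is negated so that "higher is better" for every metric.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def neg_l2(a, b):
    # Negated Euclidean distance, so larger means more similar, like the others.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


def rank(query, docs, score):
    """Return document ids ordered from most to least similar under `score`."""
    return sorted(docs, key=lambda k: score(query, docs[k]), reverse=True)


query = [1.0, 0.0]
docs = {
    "a": [1.0, 0.1],  # almost the same direction as the query, short vector
    "b": [3.0, 3.0],  # 45 degrees away, but a much longer vector
}

print(rank(query, docs, dot))     # ['b', 'a']: magnitude wins
print(rank(query, docs, cosine))  # ['a', 'b']: direction wins

# After unit-normalizing, every metric agrees.
q_n = normalize(query)
docs_n = {k: normalize(v) for k, v in docs.items()}
for metric in (dot, cosine, neg_l2):
    print(metric.__name__, rank(q_n, docs_n, metric))  # ['a', 'b'] each time
```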
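A minimal end-to-end example with the Qdrant Python client, using an in-memory instance. The `embed` function is a placeholder for a real embedding model.

```python
import random

import qdrant_client as qc
from qdrant_client.models import Distance, PointStruct, VectorParams

DIM = 1536  # must match the output size of the embedding model


def embed(text: str) -> list[float]:
    # Placeholder: replace with a real embedding model call.
    # Seeded random vectors keep the snippet runnable but carry no meaning.
    rng = random.Random(text)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


chunks = ["Vector databases store embeddings.", "HNSW is a graph-based ANN index."]

client = qc.QdrantClient(":memory:")  # local in-process mode, no server needed
client.create_collection(
    "docs",
    vectors_config=VectorParams(size=DIM, distance=Distance.COSINE),
)

# Upsert embeddings, keeping the source text as payload
client.upsert("docs", points=[
    PointStruct(id=i, vector=embed(chunk), payload={"text": chunk})
    for i, chunk in enumerate(chunks, start=1)
])

# Retrieve top-3 (newer clients use query_points in place of the deprecated search)
hits = client.query_points("docs", query=embed("What is HNSW?"), limit=3).points
contexts = [h.payload["text"] for h in hits]
```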
