What is a vector database and how does it enable semantic retrieval?
A vector database stores dense numerical embeddings alongside their source documents and uses approximate nearest-neighbor (ANN) algorithms to find the entries most semantically similar to a query vector in milliseconds. Unlike a keyword index, it measures similarity in geometric space, so synonyms and paraphrases match naturally. Common choices include Pinecone, Weaviate, Qdrant, and pgvector for Postgres.
How to think about it
A vector database is a storage and retrieval system optimized for high-dimensional float arrays (embeddings). Every document chunk is encoded into a vector, typically 768 to 3,072 dimensions, by an embedding model. The database indexes these vectors so that at query time it can return the k closest neighbors with sub-second latency, even over hundreds of millions of vectors.
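
To make "return the k closest neighbors" concrete, here is a minimal sketch of exact (brute-force) search in plain Python. It scores every stored vector against the query, which is the baseline that ANN indexes approximate. The three-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Exact nearest neighbors: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy 3-dimensional "embeddings" (illustrative values, not model output).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
}
nearest = top_k([1.0, 0.1, 0.0], vectors, k=2)
print(nearest)  # ['cat', 'kitten']
```

The cost of this scan grows linearly with the number of stored vectors, which is why large collections rely on an index instead.
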
How ANN indexing works
Most production systems use one of two index families:
- HNSW (Hierarchical Navigable Small World): graph-based; low query latency with high recall, at a higher memory cost. Used by Qdrant, Weaviate, and pgvector.
- IVF-PQ (Inverted File + Product Quantization): clusters vectors into buckets, then compresses the vectors within each bucket; lower memory, slightly lower recall. A common index type in Faiss.
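
The inverted-file idea can be sketched in a few lines of plain Python. This is a toy illustration only: it implements the IVF part (coarse clustering plus probing a few buckets) and leaves out product quantization entirely. The function names, the `nprobe` parameter name, and the random data are assumptions chosen for the example, not any library's API.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Put each vector index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets

def build_ivf(vectors, n_lists, iters=10, seed=0):
    """Run a few k-means rounds, then return (centroids, buckets)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # an empty bucket keeps its previous centroid
                cols = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in cols]
    return centroids, assign(vectors, centroids)

def search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]

rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

centroids, buckets = build_ivf(vectors, n_lists=10)
exact = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:5]
approx = search(query, vectors, centroids, buckets, k=5, nprobe=2)
full_probe = search(query, vectors, centroids, buckets, k=5, nprobe=10)

recall = len(set(approx) & set(exact)) / len(exact)
print(f"recall@5 with nprobe=2: {recall:.2f}")
```

Raising `nprobe` scans more buckets, which increases both recall and query time. Probing every bucket degenerates into an exact scan.
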
The trade-off is recall vs. latency vs. RAM. HNSW dominates for online RAG; IVF-PQ is preferred for billion-scale offline search.
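
A back-of-the-envelope calculation shows why RAM drives this choice. The sketch below rests on loudly simplified assumptions: float32 storage, an HNSW graph costing roughly `2 * m` four-byte neighbor ids per vector (upper layers ignored), and PQ codes of one byte per subvector (codebooks and centroids ignored). Actual footprints depend on the implementation and its settings, and the parameter values `m=16` and `n_subvectors=64` are illustrative.

```python
def raw_bytes(n, dim, bytes_per_float=4):
    """Uncompressed float32 vectors."""
    return n * dim * bytes_per_float

def hnsw_bytes(n, dim, m=16):
    """Full vectors plus graph links (assumed ~2*m 4-byte ids per vector)."""
    return raw_bytes(n, dim) + n * 2 * m * 4

def pq_bytes(n, n_subvectors=64):
    """Product-quantized codes: one 8-bit code per subvector."""
    return n * n_subvectors

n, dim = 100_000_000, 768
GB = 10**9

print(f"raw float32: {raw_bytes(n, dim) / GB:.1f} GB")   # 307.2 GB
print(f"HNSW (m=16): {hnsw_bytes(n, dim) / GB:.1f} GB")  # 320.0 GB
print(f"PQ (64 B):   {pq_bytes(n) / GB:.1f} GB")         # 6.4 GB
```

Under these assumptions the compressed codes are about 50 times smaller than the HNSW index, which is the memory saving that IVF-PQ trades recall for.
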
Similarity metrics
| Metric | When to use |
|---|---|
| Cosine similarity | Text embeddings; compares direction only and ignores magnitude |
| Dot product | Cheaper to compute; equivalent to cosine similarity only when vectors are unit-normalized |
| L2 (Euclidean) | Image/audio embeddings where magnitude carries meaning |
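
The relationship between the three metrics is easy to check by hand. The following sketch uses two made-up 2-D vectors that point in the same direction but differ in length; the helper names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(cosine(a, b))  # direction only: identical direction scores ~1.0
print(dot(a, b))     # 50.0, grows with magnitude
print(l2(a, b))      # 5.0, nonzero because the lengths differ

# After unit-normalization, dot product and cosine similarity agree.
a_unit, b_unit = normalize(a), normalize(b)
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))  # True
```

This is why the metric configured on a collection should match how the embedding model's vectors were trained and normalized.
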
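
A minimal Qdrant example follows. The embeddings here are random placeholder vectors; in practice they come from an embedding model.

```python
import random

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

DIM = 1536  # must match the embedding model's output size

# Placeholder data: random vectors stand in for real embeddings.
chunk_1, chunk_2 = "First document chunk.", "Second document chunk."
embedding_1, embedding_2, query_embedding = (
    [random.random() for _ in range(DIM)] for _ in range(3)
)

client = QdrantClient(":memory:")  # in-process instance, no server needed
client.create_collection(
    "docs",
    vectors_config=VectorParams(size=DIM, distance=Distance.COSINE),
)

# Upsert embeddings with their source text as payload
client.upsert("docs", points=[
    PointStruct(id=1, vector=embedding_1, payload={"text": chunk_1}),
    PointStruct(id=2, vector=embedding_2, payload={"text": chunk_2}),
])

# Retrieve the top 3; query_points replaces search() in recent client releases
hits = client.query_points("docs", query=query_embedding, limit=3).points
contexts = [h.payload["text"] for h in hits]
```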