How would you design a metric to evaluate the relevance of a content recommendation feed?

Feed relevance has no single ground-truth label, so it needs a tiered metric system. An implicit behavioural signal (long dwell time, saves, shares) serves as the primary online metric. An explicit user-satisfaction signal (thumbs-up/down, surveys) provides periodic validation. An offline ranking metric (NDCG over logged sessions, treating historically high-engagement items as relevant) supports fast model iteration. Trust a change only when all three tiers move in the same direction. If offline NDCG improves while online engagement or satisfaction does not, the offline proxy is not tracking what users value.
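Here is a minimal sketch of the offline tier. It assumes each logged impression already carries a graded relevance label (for example 1 for a high-engagement item, 0 otherwise). It uses the linear-gain form of DCG, rel / log2(rank + 1); some systems use 2^rel − 1 as the gain instead. The function names are illustrative.

```python
import numpy as np


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), ranks i starting at 1."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(2), log2(3), ...
    return float(np.sum(rel / discounts))


def ndcg_at_k(relevances, k):
    """DCG of the ranking as served, divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant items in this list: define NDCG as 0 by convention
    return dcg_at_k(relevances, k) / ideal


# Relevance labels in the order the feed showed them (1 = high-engagement item)
served = [1, 0, 1, 0, 0]
print(round(ndcg_at_k(served, k=5), 4))
```

Handling the zero-ideal case by returning 0 is a convention choice. Lists with no relevant items could also be dropped from the average.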

Compare filter, wrapper, and embedded feature-selection methods. When would you use each?

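Filter methods score each feature by its statistical relevance to the target (correlation, mutual information, chi-squared) independently of any model, so they are fast but ignore feature interactions. Wrapper methods (such as recursive feature elimination) search over feature subsets by training a model and evaluating its performance, which is accurate but computationally expensive. Embedded methods select features as part of model training (such as lasso's L1 penalty or tree-based importances), giving a good balance of accuracy and efficiency. In practice, use filters as a cheap first pass when there are very many features. Use wrappers when the feature count is modest and you can afford repeated training. Use embedded methods as the default when your model family already provides them.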
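The sketch below takes one representative from each family in scikit-learn: `SelectKBest` with an ANOVA F-test as the filter, `RFE` around a logistic regression as the wrapper, and an L1-penalized logistic regression inside `SelectFromModel` as the embedded method. The synthetic data, `k=5`, and the regularization strength `C=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 of which carry signal
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Filter: score each feature on its own with an ANOVA F-test, keep the top 5
filter_sel = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper: repeatedly fit a model and drop the weakest feature until 5 remain
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: the L1 penalty drives some coefficients to exactly zero during training
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
embedded_sel = SelectFromModel(l1_model).fit(X, y)

for name, sel in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    print(f"{name:<9} keeps features {np.flatnonzero(sel.get_support()).tolist()}")
```

The cost difference is visible in the structure. The filter computes one score per feature. `RFE` refits the whole model once for every feature it eliminates.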

What is hybrid search and why is it often better than pure vector search?

Hybrid search combines dense vector similarity with sparse keyword search such as BM25, then fuses the rankings. Dense retrieval captures semantic meaning while keyword search nails exact terms, identifiers, and rare tokens, so combining them improves recall and precision over either alone.
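The fusion step can be as simple as reciprocal rank fusion (RRF), one common choice among several (weighted score blending is another). Each document's fused score is the sum over rankers of 1 / (k + rank), and `k = 60` is a widely used default. The document IDs below are hypothetical stand-ins for a BM25 ranking and a vector-search ranking.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; rank positions start at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_ranking = ["doc_c", "doc_a", "doc_d"]  # hypothetical dense-retrieval results
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

RRF looks only at rank positions. This sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.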

What is the difference between retrieval and reranking in a RAG pipeline?

Retrieval cheaply searches a large corpus and returns a candidate set, prioritizing recall. Reranking applies a more expensive query-document model to that small set and improves precision and ordering at the top. A reranker cannot recover relevant documents absent from the retrieved candidates, so evaluate first-stage recall separately.
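This toy example shows why first-stage recall should be measured on its own. The candidate list, relevance judgments, and reranker scores are all hypothetical. In a real pipeline the scores would come from a cross-encoder or a similar query-document model.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


relevant = {"d2", "d7"}                 # hypothetical ground truth
first_stage = ["d1", "d3", "d2", "d4"]  # retriever's candidates; d7 was never retrieved

# Hypothetical reranker scores for the retrieved candidates only
rerank_scores = {"d1": 0.20, "d3": 0.10, "d2": 0.95, "d4": 0.40}
reranked = sorted(first_stage, key=rerank_scores.get, reverse=True)

print(reranked)                                  # d2 moves to the top
print(recall_at_k(first_stage, relevant, k=1), recall_at_k(reranked, relevant, k=1))
print(recall_at_k(first_stage, relevant, k=4), recall_at_k(reranked, relevant, k=4))
```

Reranking raises recall@1 from 0 to 0.5 by promoting d2. Recall@4 stays at 0.5 because d7 never reached the reranker.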

Content-based filtering — Recommender Systems

The last lesson left us staring at an empty column — Oppenheimer, rated by nobody — and the question of how to recommend an item with zero interaction history. The answer is to stop looking at the ratings grid entirely and look at the item itself: its description, its genre, its tags. That is content-based filtering.

Why content-based filtering exists

Collaborative filtering (CF) is powerful, but it has a fundamental dependency: it needs overlap — users who have rated the same items. A new item has no overlap with anything. This is the item cold-start problem.

Content-based filtering sidesteps it entirely by ignoring other users. Instead, it asks: “What are the properties of this item, and which users have consistently liked items with those same properties?” No ratings on the new item are needed — only its description, genre tags, or metadata.

Building blocks

Item profiles

An item profile is a numeric vector that encodes the properties of one item. For a movie, the features might be its genre tags (action, sci-fi, thriller). For a product, they might be category, price tier, and keywords from the description. For a document, they might be word frequencies.

The most practical way to turn free-text descriptions into item profiles is TF-IDF (Term Frequency-Inverse Document Frequency). TF-IDF assigns a high weight to words that appear often in this document but rarely across all documents — words that actually distinguish the item. Common words like “the” or “movie” get near-zero weight; distinctive words like “cyberpunk” or “heist” score high.
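To see what the weighting actually computes, here is TF-IDF by hand, checked against scikit-learn's `TfidfVectorizer` with its defaults. Those defaults are raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The three one-line documents are made up for illustration. Under this smoothed variant, a word that appears in every document gets the minimum IDF of 1 rather than exactly zero. Its weight is still small next to distinctive words, and stop-word removal usually discards it anyway.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the astronaut survives in space",
    "the crew travels through space",
    "the cop battles terrorists",
]

# Raw term counts; columns follow the (alphabetically sorted) vocabulary
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)               # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + df)) + 1   # smoothed IDF, as in scikit-learn's default
manual = counts * idf                       # term frequency x inverse document frequency
manual /= np.linalg.norm(manual, axis=1, keepdims=True)  # L2-normalize each row

sk = TfidfVectorizer().fit(docs)
print(np.allclose(manual, sk.transform(docs).toarray()))  # True
for term in ["the", "space", "astronaut"]:
    print(f"idf({term!r}) = {sk.idf_[sk.vocabulary_[term]]:.3f}")
```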

User profiles

A user profile is built by aggregating the item profiles of everything the user has liked. In the simplest form: average the feature vectors of all positively-rated items. A user who liked three sci-fi films will have a profile with a strong sci-fi signal. A user who liked romantic comedies will have a strong romance-comedy signal. The profile encodes taste without ever consulting other users.

Cosine similarity

Once both items and users live in the same feature space, recommendation is a nearest-neighbor search. The similarity measure of choice is cosine similarity, the cosine of the angle between two vectors. It ranges from -1 (opposite directions) to 1 (same direction). With non-negative features such as TF-IDF weights it never falls below 0, and a score of 0 means the two vectors share no nonzero feature. Two vectors pointing in the same direction are fully similar even if one is much longer than the other. That matters because profile length varies for reasons unrelated to taste. A profile built by summing item vectors grows with every rating, so a user who rated 200 films would dwarf one who rated 5, yet it is the direction of their taste we want to compare.
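Here is a from-scratch sketch of cosine similarity with made-up vectors. It shows the two properties the paragraph relies on. Scaling a vector does not change its cosine with anything. Non-negative vectors (like TF-IDF rows) never score below 0.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between u and v: dot product over the product of lengths."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


profile = np.array([0.8, 0.2, 0.0])  # hypothetical taste profile
scaled_profile = 25 * profile        # same direction, 25x the length
item = np.array([0.6, 0.3, 0.1])

print(cosine(profile, item), cosine(scaled_profile, item))  # same value up to rounding
print(cosine([1, 0], [0, 1]))   # 0.0: non-negative vectors with no shared features
print(cosine([1, 0], [-1, 0]))  # -1.0: only possible once negative weights appear
```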

Vector geometry — item profiles and the user profile

The diagram below shows five item vectors in a two-dimensional feature space (imagine “sci-fi intensity” on the horizontal axis and “action intensity” on the vertical). The user profile is the average of the items the user liked (filled dots). Among the items the user has not yet liked, the recommendation algorithm picks the one whose angle to the user-profile vector is smallest, which is the one with the highest cosine.

Items A and B were liked; their average becomes the user profile vector (amber dashed). Item C has the smallest angle to that vector, so it is the top recommendation. D and E point in very different directions.
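Here is the figure's procedure in a few lines of NumPy. The coordinates for A to E are hypothetical because the figure's exact positions are not given. They were chosen so that A and B are the liked items and C lies closest in angle to their average.

```python
import numpy as np

# Hypothetical 2-D item vectors: (sci-fi intensity, action intensity)
items = {
    "A": np.array([0.9, 0.3]),
    "B": np.array([0.7, 0.6]),
    "C": np.array([0.8, 0.5]),
    "D": np.array([0.2, 0.9]),
    "E": np.array([0.1, 0.6]),
}
liked = ["A", "B"]

# User profile = average of the liked item vectors
profile = np.mean([items[name] for name in liked], axis=0)


def angle_deg(u, v):
    """Angle between two vectors in degrees, computed from their cosine."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Score only items the user has not liked yet; smallest angle = highest cosine
angles = {name: angle_deg(profile, vec) for name, vec in items.items() if name not in liked}
recommendation = min(angles, key=angles.get)
print({name: round(a, 1) for name, a in angles.items()}, "->", recommendation)
```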

Strengths

Handles item cold-start. A brand-new item with zero ratings can be recommended as soon as it has metadata. No other users needed.

Explainable recommendations. Because the system knows why an item was recommended — it matched specific features — you can tell the user: “Because you liked sci-fi films with ensemble casts.” Collaborative filtering rarely offers this.

No popularity bias. Every item is judged by its features, not how many people have rated it. Niche items get a fair shot.

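Weaknesses

Filter bubble. The system only recommends items that resemble what the user already liked, so it rarely surfaces anything outside their established taste.

Needs good features. If the item metadata is sparse, wrong, or missing, the system can’t do its job. Garbage features in, garbage recommendations out.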

Code: TF-IDF item profiles and cosine similarity

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Small movie corpus: title + description
movies = [
    {"title": "Gravity",         "desc": "An astronaut stranded in space must survive alone after debris destroys her shuttle."},
    {"title": "Interstellar",    "desc": "A crew of astronauts travels through a wormhole in space searching for a new home for humanity."},
    {"title": "The Martian",     "desc": "An astronaut is left behind on Mars and must survive alone using science and ingenuity."},
    {"title": "Arrival",         "desc": "A linguist deciphers an alien language to prevent a global crisis."},
    {"title": "Die Hard",        "desc": "A cop battles terrorists who have taken over a skyscraper on Christmas Eve."},
    {"title": "Titanic",         "desc": "A love story unfolds aboard a doomed ocean liner on its ill-fated maiden voyage."},
]

titles = [m["title"] for m in movies]
descs  = [m["desc"]  for m in movies]

# Build TF-IDF matrix: rows = movies, cols = vocab terms
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(descs)

# Query: user liked "Gravity" — its row is the user profile
query_idx = titles.index("Gravity")
query_vec = tfidf_matrix[query_idx]

# Cosine similarity between query and every other movie
sims = cosine_similarity(query_vec, tfidf_matrix).flatten()
sims[query_idx] = -1  # exclude the query itself

# Rank and print
ranked = np.argsort(sims)[::-1]
print(f"Recommendations if you liked '{titles[query_idx]}':\n")
for rank, idx in enumerate(ranked, 1):
    print(f"  {rank}. {titles[idx]:<18}  cosine={sims[idx]:.3f}")

Recommendations if you liked 'Gravity':

  1. The Martian         cosine=0.218
  2. Interstellar        cosine=0.093
  3. Titanic             cosine=0.000
  4. Die Hard            cosine=0.000
  5. Arrival             cosine=0.000
  6. Gravity             cosine=-1.000

The scores confirm the intuition: The Martian (0.218) and Interstellar (0.093) rise to the top because they share distinctive words with “Gravity.” The Martian shares two (“astronaut”, “survive”). Interstellar shares only “space”, because without stemming its “astronauts” is a different token from “astronaut.” The rest score exactly 0.000, since no shared vocabulary means a zero dot product. (Gravity itself sits last at −1.000 because we masked it out.) Note the subtle trap: Arrival is sci-fi too, but its words (“linguist”, “alien language”) overlap none of Gravity’s, so content-based filtering scores it zero. The model only knows the features you feed it; it never saw “genre.”
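One way to let the model see genre is to append a one-hot genre block to the TF-IDF vectors before computing cosine similarity. In this sketch the genre labels and the genre weight of 0.5 are hypothetical choices. The weight controls how much genre counts relative to the unit-length text vector.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = ["Gravity", "Arrival", "Die Hard"]
descs = [
    "An astronaut stranded in space must survive alone after debris destroys her shuttle.",
    "A linguist deciphers an alien language to prevent a global crisis.",
    "A cop battles terrorists who have taken over a skyscraper on Christmas Eve.",
]
# Hypothetical one-hot genre columns: [sci-fi, action]
genres = csr_matrix([[1, 0], [1, 0], [0, 1]], dtype=float)
GENRE_WEIGHT = 0.5  # hypothetical: how much genre counts next to the text

text = TfidfVectorizer(stop_words="english").fit_transform(descs)
combined = hstack([text, GENRE_WEIGHT * genres], format="csr")

# Similarity of Gravity (row 0) to every movie, with and without the genre block
text_sims = cosine_similarity(text[0], text).ravel()
combined_sims = cosine_similarity(combined[0], combined).ravel()
for title, t, c in zip(titles, text_sims, combined_sims):
    print(f"{title:<9} text-only={t:.3f}  text+genre={c:.3f}")
```

Arrival moves from 0 to about 0.2, while Die Hard stays at 0. The weight is a tuning knob. Set it too low and genre barely matters; set it too high and every sci-fi film looks alike regardless of its plot.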

From item similarity to user profiles

In the snippet above, a single liked item serves as the query. In a real system, the user profile is the mean (or weighted mean) of the TF-IDF vectors of all items the user has rated positively. This aggregated vector is compared against every candidate item in the catalog. Items the user has already rated are excluded, and the top-K by cosine similarity are surfaced as recommendations.

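# Continues the snippet above (reuses tfidf_matrix, titles, np, cosine_similarity)
liked_indices = [0, 1]   # user liked Gravity and Interstellar
# .mean() on a sparse matrix returns np.matrix; convert it to a plain 2-D array
user_profile  = np.asarray(tfidf_matrix[liked_indices].mean(axis=0))
sims          = cosine_similarity(user_profile, tfidf_matrix).flatten()
sims[liked_indices] = -1   # never re-recommend items the user already liked
top_k = np.argsort(sims)[::-1][:3]
print([(titles[i], round(float(sims[i]), 3)) for i in top_k])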

In one breath

Content-based filtering ignores other users and matches items to taste by their features. Turn each item into a vector. TF-IDF over its description is the strong default, up-weighting words distinctive to this item (cyberpunk) and down-weighting common ones (the). Build a user profile by averaging the vectors of items the user liked. Then rank every unseen candidate by cosine similarity, which measures the angle between vectors and ignores magnitude, so profiles are compared by taste direction rather than size. Its wins: it solves item cold-start (a new item is recommendable the moment it has metadata) and it is explainable (“because you liked sci-fi survival stories”). Its costs: a filter bubble (it only echoes what you already like) and total dependence on good features. It scored Gravity and Arrival, both sci-fi, at zero similarity because their descriptions shared no words.

Practice

Quick check

Q1. A streaming platform launches 50 new documentaries overnight. None has a single user rating yet. Which recommendation approach can immediately recommend relevant ones to the right viewers?

Q2. A user has rated 40 horror films highly and nothing else. After months of use, the recommender keeps suggesting only horror. Which weakness of content-based filtering does this illustrate?

Q3. Two movies share no words in their TF-IDF descriptions, but both belong to the 'sci-fi' genre encoded as a binary feature. If the TF-IDF vectorizer only sees the description text and ignores the genre field, what happens to their cosine similarity?

A question to carry forward

Sit with the filter bubble for a moment, because it’s the deep limitation here. Content-based filtering can only recommend things that look like your past — same words, same tags. It will never tell you “people whose taste is uncannily like yours adored this one weird film that shares nothing with your history.” That kind of serendipitous, cross-genre surprise is precisely what makes a recommender feel magical — and content alone cannot produce it.

So the question to carry forward is: where does that magic come from? Not from the items, but from other people. If a hundred users who rated films exactly as you did all loved something you haven’t seen, that’s a powerful signal no feature vector contains. The next lesson, user-based collaborative filtering, throws away item features entirely and recommends from the crowd — finding the users most similar to you and borrowing what they loved.
