The cold-start problem
A new user just signed up and a new item just launched — neither has any interaction history. What on earth do I recommend?
What you'll learn
- The three cold-start cases: new user, new item, and new system — and why each breaks collaborative filtering
- Remedies: content-based filtering, preference elicitation, popularity fallbacks, and hybrid strategies
- Exploration vs. exploitation: using multi-armed bandits to gather signal on new items deliberately
The root cause: collaborative filtering needs overlap
Collaborative filtering (CF) — whether user-based or item-based — works by finding overlap: users who have rated the same items, or items that have been rated by the same users. Given enough overlap, CF can uncover latent taste patterns that no hand-crafted feature would capture.
But overlap requires history. When history is absent, CF has nothing to compute: it either returns empty results or, worse, silently produces rankings that are effectively arbitrary. This failure mode has a name: the cold-start problem.
Three distinct cold-start cases
The term “cold start” actually covers three separate situations, each with different remedies.
Figure: The three cold-start cases and the primary remedies for each.
Case 1 — New user
A user who just registered has made no clicks, ratings, or purchases. CF cannot place them in the taste space because there is no signal to anchor on.
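Why CF fails: CF computes similarity between users based on shared item interactions. With zero interactions, the new user's vector is all zeros, so similarity to every other user is either undefined (cosine similarity divides by a zero norm) or trivially zero. Either way, no neighbor is closer than any other, and there is nothing to rank by.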
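To make the failure concrete, here is a minimal sketch of cosine similarity over toy rating vectors. The user names and ratings are invented for illustration, and returning None for the undefined case is one possible convention, not a standard one.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length rating vectors.

    Returns None when either vector is all zeros, because the
    formula would divide by a zero norm.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return None  # no interactions, so similarity is undefined
    return dot / (norm_u * norm_v)

# Hypothetical ratings over the same four items (0 = no interaction).
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
new_user = [0, 0, 0, 0]

print(cosine_similarity(alice, bob))       # a real number between 0 and 1
print(cosine_similarity(new_user, alice))  # None
print(cosine_similarity(new_user, bob))    # None
```

A library implementation may return 0.0 or NaN instead of None here, which is part of why the failure is easy to miss.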
Remedies:
- Preference elicitation. Show the user a short onboarding screen: “Pick any genres you like” or “Rate a few of these titles.” Even three or four explicit signals dramatically shrink the cold zone. Spotify, Netflix, and YouTube all do this in some form.
- Demographic and context priors. If you know the user’s region, device type, referral source, or time of day, you can look up what similar contextual cohorts tend to prefer and use that as a prior.
- Popularity and trending fallbacks. In the absence of individual signal, show what is broadly popular right now. It is not personalized, but it is better than random — and it is honest about what it is.
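The first and third remedies combine naturally. The sketch below ranks items by how many onboarding genre picks they match and breaks ties by popularity, so a user who skips onboarding gets a pure popularity list. The catalog, its keys ('title', 'genres', 'plays'), and the function name are all hypothetical.

```python
def seed_recommendations(picked_genres, catalog, k=3):
    """Rank items for a user with no interaction history.

    Sort key: (number of picked genres the item matches, play count).
    With no picks, every overlap is 0 and the order is by popularity alone.
    """
    picked = set(picked_genres)

    def sort_key(item):
        overlap = len(picked & set(item["genres"]))
        return (overlap, item["plays"])

    ranked = sorted(catalog, key=sort_key, reverse=True)
    return [item["title"] for item in ranked[:k]]

# Invented catalog for illustration only.
catalog = [
    {"title": "Space Saga", "genres": ["sci-fi", "action"], "plays": 900},
    {"title": "Quiet Rivers", "genres": ["documentary"], "plays": 1200},
    {"title": "Robot Dawn", "genres": ["sci-fi"], "plays": 150},
    {"title": "Laugh Track", "genres": ["comedy"], "plays": 700},
]

print(seed_recommendations([], catalog, k=2))          # popularity only
print(seed_recommendations(["sci-fi"], catalog, k=2))  # elicited preference first
```

A single genre pick is enough to move a low-popularity title like "Robot Dawn" into the top two, which is the point of asking.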
Case 2 — New item
A new product, article, or song has just been added to the catalog. It has no ratings, no clicks, and no purchase history. CF cannot recommend it to anyone because no user vector includes it.
Why CF fails: Item-based CF finds items that co-occur in user histories. A new item has never co-occurred with anything.
Remedies:
- Content-based filtering on item metadata. If you have the item’s features — genre, author, price range, ingredient list, audio tempo — you can find users whose preference profiles match those features. No ratings are needed, only the item description. This is exactly what content-based filtering was designed for.
- Deliberate exploration. Surface the item to a small fraction of users and observe their reactions. This is the exploration half of the exploration/exploitation trade-off, discussed in detail in the bandit section below.
Case 3 — New system
A brand-new platform has no users, no items with ratings, and no interaction history at all. This is the hardest case.
Remedies:
- Bootstrap with content-based filtering and editorial curation. Start with hand-crafted recommendations from domain experts (editors, curators) combined with content-based matching on item metadata.
- Import or synthesize prior signal. Some platforms import aggregate popularity data from public sources (chart positions, review aggregators) to seed the system before real interactions accumulate.
- Accept the ramp-up period. A new system will be less accurate than a mature one. Design the UX to set expectations and to collect feedback aggressively so the system ramps up as fast as possible.
Why pure collaborative filtering fails silently
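The core issue is that CF is defined entirely in terms of the interaction matrix. Any row (user) or column (item) with no non-zero entries is structurally invisible to the algorithm. Matrix factorization techniques (SVD, ALS) cannot learn a meaningful latent embedding for an entity with no observations to fit against. The failure is silent because nothing raises an error: the entity's embedding simply stays at its initialization or is shrunk toward zero by regularization, and the scores computed from it look like any other scores.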
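Because nothing fails loudly, it can help to check for empty rows and columns before trusting CF output. The following is a minimal sketch over a small dense matrix of invented ratings; a production system would more likely count interactions in a sparse store, and the function name is an assumption.

```python
def find_cold_entities(matrix):
    """Return (cold_users, cold_items): indices of rows and columns
    that contain no non-zero entries."""
    cold_users = [u for u, row in enumerate(matrix) if not any(row)]
    n_items = len(matrix[0]) if matrix else 0
    cold_items = [
        i for i in range(n_items)
        if not any(row[i] for row in matrix)
    ]
    return cold_users, cold_items

# Hypothetical ratings: rows are users, columns are items, 0 = no rating.
ratings = [
    [5, 0, 3, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 0],  # a user who just signed up
    [2, 0, 0, 0],
]

cold_users, cold_items = find_cold_entities(ratings)
print("cold users:", cold_users)
print("cold items:", cold_items)
```

Entities flagged this way are the ones to route to content-based or popularity logic instead of CF.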
Hybrid systems: lean on content early, shift to CF over time
The practical engineering answer to cold start is a hybrid system that routes recommendations based on how much history is available.
A simple routing strategy looks like this:
- Zero to a few interactions: use content-based filtering and popularity fallbacks exclusively.
- Moderate history (roughly tens of interactions): blend content-based scores with CF scores, weighting CF lightly.
- Rich history: trust CF more heavily; content signals become a regularizer rather than the primary driver.
This shift can be implemented as a weighted interpolation. If cf_score and cb_score are both normalized to [0, 1], the blended score is alpha * cf_score + (1 - alpha) * cb_score, where the blending weight alpha is set from the number of interactions a user or item has accumulated. As alpha rises from 0 to 1, the system transitions from pure content-based to pure CF. The exact schedule for alpha is a tunable hyperparameter.
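As an illustration, the sketch below uses a linear ramp for alpha. The ramp length of 50 interactions and the function names are assumptions chosen for the example; a sigmoid or a learned schedule would fit the same interface.

```python
def blend_weight(n_interactions, ramp=50):
    """Linear schedule: alpha grows from 0 to 1 over `ramp` interactions.

    `ramp` is a hypothetical hyperparameter to tune for your data.
    """
    return min(1.0, n_interactions / ramp)

def hybrid_score(cf_score, cb_score, n_interactions, ramp=50):
    """Interpolate between content-based and CF scores, both in [0, 1]."""
    alpha = blend_weight(n_interactions, ramp)
    return alpha * cf_score + (1 - alpha) * cb_score

# Same pair of scores at three stages of a user's history.
for n in (0, 25, 200):
    print(n, hybrid_score(cf_score=0.9, cb_score=0.4, n_interactions=n))
```

With no history the result equals the content-based score, and once the ramp is exceeded it equals the CF score.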
Exploration vs. exploitation: bandits as a cold-start tool
When a new item enters the catalog, you face a decision problem: do you exploit what you already know (recommend things you are confident about), or do you explore (show the new item to learn how people react)?
This trade-off is formalized in the multi-armed bandit framework. Each item is an “arm” of the bandit. Pulling an arm means recommending the item and observing the reward (a click, a purchase, a watch-through). The goal is to maximize total reward over time while still gathering information about new arms.
Epsilon-greedy is the simplest bandit strategy:
- With probability epsilon (the exploration rate), pick a random item, possibly a new one you know nothing about yet.
- With probability 1 - epsilon, pick the item with the best current estimated reward (exploitation).
Over time, the new item accumulates enough observations to get a reliable reward estimate, after which it competes on its own merits. Epsilon is typically annealed (gradually reduced) as the system matures and fewer truly unknown items remain.
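A minimal sketch of epsilon-greedy follows, assuming binary click rewards and a running mean as the reward estimate. The class name, the item names, and the simulated click rates are all invented for illustration.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over a growing set of items."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = {item: 0 for item in items}    # times each item was shown
        self.values = {item: 0.0 for item in items}  # running mean reward
        self.rng = random.Random(seed)

    def add_item(self, item):
        """Register a new item with no observations yet."""
        self.counts.setdefault(item, 0)
        self.values.setdefault(item, 0.0)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))     # explore
        return max(self.values, key=self.values.get)      # exploit

    def update(self, item, reward):
        """Fold one observed reward into the item's running mean."""
        self.counts[item] += 1
        n = self.counts[item]
        self.values[item] += (reward - self.values[item]) / n

# Simulation with made-up click-through rates.
true_ctr = {"old_hit": 0.10, "new_item": 0.30}
bandit = EpsilonGreedy(["old_hit"], epsilon=0.2, seed=42)
bandit.add_item("new_item")
clicks = random.Random(7)
for _ in range(2000):
    item = bandit.select()
    reward = 1.0 if clicks.random() < true_ctr[item] else 0.0
    bandit.update(item, reward)

print(bandit.counts)
print(bandit.values)
```

Annealing can be added by lowering bandit.epsilon between rounds; the schedule is left out here to keep the sketch short.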
More sophisticated strategies — UCB (Upper Confidence Bound) and Thompson Sampling — are theoretically better at balancing exploration and exploitation, but epsilon-greedy is often the right starting point because it is easy to debug and reason about.
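For comparison, here is a sketch of the selection rule from UCB1, one common member of the UCB family. It adds an uncertainty bonus that shrinks as an item is shown more often, so new items are favored until their estimates firm up. The function name and the example counts are assumptions.

```python
import math

def ucb1_select(counts, values):
    """Pick the item with the highest upper confidence bound.

    counts: item -> number of times shown
    values: item -> mean observed reward
    Items never shown are selected first, since their bound is unbounded.
    """
    for item, n in counts.items():
        if n == 0:
            return item
    total = sum(counts.values())

    def upper_bound(item):
        bonus = math.sqrt(2 * math.log(total) / counts[item])
        return values[item] + bonus

    return max(counts, key=upper_bound)

# A brand-new item is tried immediately.
print(ucb1_select({"old": 1000, "new": 0}, {"old": 0.10, "new": 0.0}))
# A barely-seen item still wins on its uncertainty bonus.
print(ucb1_select({"old": 1000, "new": 5}, {"old": 0.10, "new": 0.08}))
# With equal evidence, the higher mean wins.
print(ucb1_select({"old": 1000, "new": 1000}, {"old": 0.10, "new": 0.08}))
```

Unlike epsilon-greedy, exploration here is directed at the items with the least evidence rather than spread uniformly at random.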
A runnable example: popularity fallback plus content scoring
The code below demonstrates the two most practical cold-start remedies working together: a popularity fallback for new users, and a content-based score for new items based on metadata features.
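What follows is a minimal sketch of such an example, not a definitive implementation. Only items A, D, and E and item D's five interactions come from the text. Items B and C, every other interaction count, the genre features, the new user's onboarding profile, and the 50/50 weighting are invented for illustration.

```python
import math

# Genre feature order: [sci-fi, action, comedy, documentary] (hypothetical).
CATALOG = {
    "A": {"title": "Sci-fi epic",     "features": [1, 1, 0, 0], "interactions": 5000},
    "B": {"title": "Romantic comedy", "features": [0, 0, 1, 0], "interactions": 3000},
    "C": {"title": "Action thriller", "features": [0, 1, 0, 0], "interactions": 1200},
    "D": {"title": "Sci-fi short",    "features": [1, 0, 0, 0], "interactions": 5},
    "E": {"title": "Documentary",     "features": [0, 0, 0, 1], "interactions": 40},
}

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def popularity_scores(catalog):
    """Log-scaled interaction counts, normalized so the top item scores 1."""
    top = max(math.log1p(item["interactions"]) for item in catalog.values())
    return {
        key: math.log1p(item["interactions"]) / top
        for key, item in catalog.items()
    }

def rank_items(profile, catalog, content_weight=0.5):
    """Blend content match and popularity; return rows sorted best first.

    Each row is (item_id, title, content_score, popularity_score, combined).
    """
    popularity = popularity_scores(catalog)
    rows = []
    for key, item in catalog.items():
        content = cosine(profile, item["features"])
        combined = content_weight * content + (1 - content_weight) * popularity[key]
        rows.append((key, item["title"], content, popularity[key], combined))
    return sorted(rows, key=lambda row: row[4], reverse=True)

# A new user who picked sci-fi (and some action) during onboarding.
new_user_profile = [1.0, 0.5, 0.0, 0.0]

for key, title, content, pop, combined in rank_items(new_user_profile, CATALOG):
    print(f"{key} {title:<16} content={content:.2f} pop={pop:.2f} score={combined:.2f}")
```

Setting content_weight to 0 reduces this to the pure popularity fallback, which is the appropriate behavior for a user who supplied no onboarding signal at all.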
Notice that item D (“Sci-fi short”) scores high on content despite having only five interactions, while item E (“Documentary”) appears near the bottom on both dimensions. Item A (“Sci-fi epic”) ranks first — it combines strong content match with high popularity, exactly what you want for a new user.
Summary
The cold-start problem is not one problem — it is three, and each needs its own toolbox:
- New user: elicit preferences explicitly, use demographic priors, fall back to popularity.
- New item: use content-based filtering on metadata, and explore deliberately with bandit strategies.
- New system: bootstrap with content-based and editorial signals; accept the ramp-up.
The deeper principle is that collaborative filtering assumes a rich interaction matrix. Anything that violates that assumption — newness, sparsity, niche items — requires a different approach. Hybrid systems that blend content and CF, weighted by available evidence, are the standard production answer.