datarekha

Implicit vs Explicit Feedback

Almost no one leaves star ratings, but everyone clicks and watches. Learn how to build recommenders from behavioral signals alone using preference-confidence modeling.

8 min read Intermediate Recommender Systems Lesson 8 of 11

What you'll learn

  • The difference between explicit feedback (ratings) and implicit feedback (clicks, watch time, purchases)
  • How to reframe implicit data as preference (0/1) weighted by confidence rather than a rating
  • Why treating all non-interactions as negatives is wrong, and how negative sampling fixes it

Two very different kinds of signal

Every recommender system runs on feedback — some measure of how much a user liked an item. But not all feedback is created equal.

Explicit feedback is what you ask users to provide directly: star ratings, thumbs up/down, written reviews, or a “save to favorites” action. The signal is unambiguous — a 5-star rating is a clear statement of preference.

The problem is that almost no one gives it. Studies across streaming, e-commerce, and news consistently show that fewer than 1% of users rate anything. The data you collect is sparse, and the users who do rate are a self-selected group who may not represent your audience.

Implicit feedback is everything you observe passively from user behavior: clicks, streams, purchases, dwell time on a page, search queries, scroll depth, and add-to-cart events. You never have to ask for it — it accumulates automatically at scale.

Implicit data is abundant. It is also far noisier, and that asymmetry is the central challenge this lesson addresses.

The asymmetry problem: positives only

With explicit ratings you get a full range of signal — a user can tell you they loved something (5 stars) or hated it (1 star). With implicit data you only observe interactions. You know when someone clicked; you do not know why they did not click on everything else.

Consider three reasons a user might not have clicked on an item:

  1. They saw it and genuinely disliked it.
  2. They never saw it — the UI never surfaced it.
  3. They saw it but were not in the right context (wrong time of day, wrong device, distracted).

Only reason 1 is a true negative signal. Reasons 2 and 3 say nothing about dislike: the item was either never seen or seen at the wrong moment. This is called the missing-data problem, and it is the cardinal trap in implicit feedback systems.

The right reframing: preference plus confidence

The solution used by many production implicit systems, including the widely cited implicit ALS algorithm (Hu, Koren, and Volinsky, 2008), is to split the implicit signal into two separate matrices:

Preference is binary: did the user interact with this item at all? A single click means preference = 1. No interaction means preference = 0. This encodes your best guess about whether the user likes the item.

Confidence is a weight that expresses how much you trust that preference estimate. One click gives you weak evidence. Twenty streams of the same song give you strong evidence. The standard formulation is:

confidence(u, i) = 1 + alpha * count(u, i)

Here count(u, i) is the raw number of interactions (plays, clicks, views) user u has had with item i, and alpha is a scaling hyperparameter; the original paper reports good results with alpha = 40.

The + 1 ensures that even items with zero interactions get a small baseline confidence. For those entries the preference is 0, but the weight on that guess is minimal: you are not certain they are negatives, so the model is only lightly penalized for predicting otherwise. Items with many interactions get high confidence, so the model pays close attention to getting those right during training.

This reframing lets you use all the data — including the zeros — without claiming certainty you do not have.

Counts to preference and confidence: the transform

Here is what the transform looks like visually before you run the code.

Interaction counts
          item1   item2   item3
user A      0       3       0
user B      1       0       8
user C      0       0       2

Preference (0 / 1)
          item1   item2   item3
user A      0       1       0
user B      1       0       1
user C      0       0       1

Confidence (alpha = 40)
          item1   item2   item3
user A     1.0     121     1.0
user B      41     1.0     321
user C     1.0     1.0      81

A count matrix splits into a binary preference matrix and a confidence matrix. High counts produce high confidence; zeros yield the baseline confidence of 1.0.

Notice that user A’s three plays of item2 produce a preference of 1 and a confidence of 121 (1 + 40 × 3). User B’s eight plays of item3 produce preference 1 and confidence 321 (1 + 40 × 8). Every zero entry stays preference 0 with baseline confidence 1.0: low weight, not a hard negative.

Runnable: build the preference and confidence matrices
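
A minimal sketch of the transform, using the three-user count matrix from this lesson. It uses plain Python lists to stay dependency-free; a real system would more likely use NumPy or a sparse matrix library. The function name to_preference_confidence is an illustrative choice, not part of any library.

```python
ALPHA = 40  # scaling hyperparameter; the value used throughout this lesson

# Interaction counts: rows are users, columns are item1..item3.
counts = [
    [0, 3, 0],  # user A
    [1, 0, 8],  # user B
    [0, 0, 2],  # user C
]

def to_preference_confidence(counts, alpha=ALPHA):
    """Split a count matrix into binary preference and confidence matrices."""
    # preference(u, i) = 1 if the user interacted at least once, else 0
    preference = [[1 if c > 0 else 0 for c in row] for row in counts]
    # confidence(u, i) = 1 + alpha * count(u, i)
    confidence = [[1.0 + alpha * c for c in row] for row in counts]
    return preference, confidence

preference, confidence = to_preference_confidence(counts)
for name, p_row, c_row in zip("ABC", preference, confidence):
    print(f"user {name}: preference={p_row} confidence={c_row}")
```

Every zero count keeps preference 0 and the baseline confidence 1.0, while each positive count maps to preference 1 and a confidence that grows linearly with the count.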

The model that learns from these two matrices — implicit ALS — minimizes a weighted least-squares objective where each entry’s contribution is scaled by its confidence. High-confidence entries (many interactions) pull the latent factors strongly; low-confidence zero entries barely affect training.
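
To make the weighting concrete, here is a rough sketch of the objective's value for given factor vectors: the sum over all user-item pairs of confidence × (preference − predicted score)², plus L2 regularization. The function name, the reg parameter, and the toy factor values are assumptions for illustration. This only evaluates the loss; it does not implement the alternating least-squares updates.

```python
def weighted_squared_loss(preference, confidence, user_factors, item_factors, reg=0.0):
    """Sum of c_ui * (p_ui - x_u . y_i)^2 plus reg * (squared norms of all factors)."""
    loss = 0.0
    for u, x_u in enumerate(user_factors):
        for i, y_i in enumerate(item_factors):
            prediction = sum(a * b for a, b in zip(x_u, y_i))  # dot product x_u . y_i
            loss += confidence[u][i] * (preference[u][i] - prediction) ** 2
    l2 = sum(v * v for row in user_factors for v in row)
    l2 += sum(v * v for row in item_factors for v in row)
    return loss + reg * l2

# Toy case: one user, two items, one latent dimension (hypothetical values).
# The model predicts 0.5 for both items, so both errors have magnitude 0.5.
pref = [[1, 0]]
conf = [[121.0, 1.0]]  # item 0 was played 3 times (alpha = 40), item 1 never
loss = weighted_squared_loss(pref, conf, user_factors=[[1.0]], item_factors=[[0.5], [0.5]])
print(loss)  # 121 * 0.25 + 1 * 0.25 = 30.5
```

The same error of 0.5 costs 30.25 on the high-confidence positive and only 0.25 on the zero entry, which is why the zeros barely move the factors.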

Negative sampling: an alternative approach

Implicit ALS handles the missing-data problem through confidence weighting. A different family of models — particularly those based on pairwise ranking like BPR (Bayesian Personalized Ranking) — handles it through negative sampling.

Negative sampling means that during training, for each observed positive interaction, you randomly draw a small number of items the user has not interacted with and treat those as negatives for that training step. You are not claiming they are true negatives — you are just giving the model some contrast signal. Because the negatives are sampled randomly, the model never over-commits to the idea that any particular item is disliked.
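
The sampling step described above can be sketched as follows, assuming uniform sampling over items the user has not interacted with. The function names, the toy interactions dictionary, and the choice of two negatives per positive are all illustrative assumptions.

```python
import random

def sample_negatives(user_positives, num_items, num_negatives, rng):
    """Draw item ids the user has not interacted with, uniformly at random."""
    candidates = [i for i in range(num_items) if i not in user_positives]
    k = min(num_negatives, len(candidates))
    return rng.sample(candidates, k)

def build_training_triples(interactions, num_items, negatives_per_positive=2, seed=0):
    """Pair each observed positive with freshly sampled negatives."""
    rng = random.Random(seed)
    triples = []
    for user, positives in interactions.items():
        for pos in sorted(positives):
            for neg in sample_negatives(positives, num_items, negatives_per_positive, rng):
                triples.append((user, pos, neg))
    return triples

# Hypothetical data: user -> set of item ids they interacted with, in a 5-item catalog.
interactions = {"A": {1}, "B": {0, 2}, "C": {2}}
triples = build_training_triples(interactions, num_items=5)
for user, pos, neg in triples:
    print(user, pos, neg)
```

Building the full candidate list is fine for a toy catalog. With millions of items, a common alternative is to draw random item ids and reject the ones the user has already interacted with. Resampling the negatives every epoch keeps any single unobserved item from being treated as a permanent negative.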

Negative sampling is especially common in neural recommender architectures (two-tower models, sequence models) where you need an efficient way to provide contrast without the full weighted least-squares objective.
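
For pairwise models such as BPR, each (user, positive item, sampled negative item) triple feeds a loss that only asks the positive to score higher than the negative. Below is a minimal sketch of that per-triple loss, assuming dot-product scoring and omitting the regularization term. The vectors are made-up values, and dot and bpr_loss are illustrative helper names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """-log(sigmoid(score_pos - score_neg)) for one (user, positive, negative) triple."""
    margin = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

user = [1.0, 0.0]
clicked = [2.0, 0.0]   # hypothetical embedding of an item the user interacted with
sampled = [0.5, 0.0]   # hypothetical embedding of a randomly sampled negative

good_order = bpr_loss(user, clicked, sampled)  # positive scored above negative
bad_order = bpr_loss(user, sampled, clicked)   # positive scored below negative
print(good_order < bad_order)  # True
```

The loss never pushes the sampled item's score toward a fixed target such as 0. It only penalizes the ordering, which matches the idea that a sampled negative is contrast rather than a confirmed dislike.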

Choosing between explicit and implicit

In practice the choice is usually made for you by the product:

  • If you have star ratings or thumbs up/down and they are dense enough to be useful, treat them as explicit feedback. Clean signals are valuable.
  • If you are building on clicks, streams, purchases, or dwell time — which is nearly every consumer product at scale — you are working with implicit feedback. Use preference + confidence or pairwise ranking with negative sampling.

Some systems blend both: explicit ratings anchor the latent factors for the small set of users who provide them, while implicit signals fill in the vast majority.

The key insight to carry forward is that implicit data is not a degraded version of explicit data — it is a different kind of signal with its own structure. Model it correctly and it is often more powerful than ratings simply because there is so much more of it.

Quick check

Q1. A user has streamed a song 12 times. With alpha=40, what is their confidence score for that song?
Q2. A user has never clicked on item X. In an implicit ALS model, how should item X be treated?
Q3. You are building a recommender for a recipe app. Users rarely rate recipes, but you log every recipe page they view, every ingredient list they expand, and every recipe they save. Which approach fits best?
