What are chunking strategies in RAG, and how do you choose chunk size?

Chunking splits documents into retrievable units; strategies include fixed-size windows, overlapping windows, and semantic or structure-aware splitting on sentences or sections. Smaller chunks improve retrieval precision but risk losing context, while larger chunks preserve context but dilute relevance, so chunk size and overlap are tuned to the content and the embedding model's context length.
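To make the size and overlap trade-off concrete, here is a minimal sketch of fixed-size chunking with overlapping windows. The function name chunk_fixed and the whitespace tokenization are assumptions for illustration; a real pipeline would usually count tokens with the embedding model's own tokenizer.

```python
def chunk_fixed(tokens, size, overlap=0):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Whitespace splitting stands in for a real tokenizer (an assumption).
words = "the quick brown fox jumps over the lazy dog".split()
for chunk in chunk_fixed(words, size=4, overlap=1):
    print(" ".join(chunk))
```

With an overlap of 1, the last word of each window is repeated as the first word of the next, so a phrase that straddles a boundary still appears whole in at least one chunk when it is no longer than the overlap plus one token.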

What chunking strategies exist for RAG and how do you choose between them?

Chunking splits source documents into retrievable units before embedding. The right strategy depends on document structure, query style, and the model's context window. Fixed-size chunks are simple but break mid-sentence; semantic or structural chunking preserves coherence; hierarchical chunking enables parent-document retrieval for richer context.
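One way to avoid mid-sentence breaks is to pack whole sentences into each chunk. The sketch below is a simplified illustration, not a production splitter: the name chunk_by_sentence is hypothetical, the character budget stands in for a token budget, and the regular expression is a naive sentence boundary rule that will mis-split abbreviations such as "e.g.".

```python
import re


def chunk_by_sentence(text, max_chars):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole in its own chunk.
    """
    # Naive boundary rule (an assumption): split after ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "One two. Three four. Five six."
print(chunk_by_sentence(sample, max_chars=20))
```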

How would you choose a north-star metric for a product, and what makes a metric a good north-star?

A north-star metric must satisfy three properties: it reflects the core value delivered to users, it correlates with long-term business outcomes (retention and revenue), and it is actionable — meaning teams can run experiments that move it. Choosing one requires articulating the product's value exchange and then stress-testing candidate metrics against those three criteria.

Estimate the number of Uber rides taken in New York City on a typical weekday.

Market-sizing questions test whether you can decompose a complex number into estimable sub-components, make explicit and reasonable assumptions, sanity-check against known benchmarks, and communicate uncertainty without losing structure. The answer itself matters less than the reasoning chain.
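The decomposition can be sketched as a product of sub-components evaluated under low, base, and high scenarios. Every input below is a placeholder assumption chosen only to show the structure, not a measured figure, and the function name estimate_daily_rides is hypothetical. In an interview you would state your own assumptions and justify each one.

```python
def estimate_daily_rides(population, daily_rider_share, rides_per_rider, uber_share):
    """Multiply the sub-components of a top-down ride estimate."""
    return population * daily_rider_share * rides_per_rider * uber_share


# Placeholder assumptions for illustration only, not measured values.
scenarios = {
    "low": dict(population=8_000_000, daily_rider_share=0.03,
                rides_per_rider=1.2, uber_share=0.6),
    "base": dict(population=8_000_000, daily_rider_share=0.05,
                 rides_per_rider=1.5, uber_share=0.7),
    "high": dict(population=8_000_000, daily_rider_share=0.08,
                 rides_per_rider=2.0, uber_share=0.8),
}

estimates = {name: estimate_daily_rides(**inputs) for name, inputs in scenarios.items()}
for name, rides in estimates.items():
    print(f"{name}: about {rides:,.0f} rides")
```

Reporting a range rather than a single number is one way to communicate uncertainty while keeping the reasoning chain visible.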

Customer Segmentation and RFM — Business Analytics

The last lesson left us with a warning: sorting customers by a single number — how much they spend — is too blunt, because two people who spent the same amount can be opposites, one buying yesterday and one gone for months. This lesson fixes that by sorting on three axes at once.

Averaging across all 50,000 customers gives you a number like “average order value: $42.” That tells you almost nothing useful. Your top 200 customers spend $400 each; your bottom 10,000 made one impulse buy and vanished. Treating them identically wastes money on the wrong people and sends the wrong message to the right ones.

The fix is segmentation.

What Is Segmentation?

Segmentation means splitting customers (or any group) into smaller groups whose members behave similarly to each other but differently from other groups — so you can treat each group differently instead of using one message for everyone.

The key test of a good segment: does it change what you do? If two segments would receive the same email, same offer, and same budget, they are not meaningfully different segments — they are just one segment with a fancier label.

A segment that changes your action is called actionable.

Why RFM?

Dozens of segmentation methods exist, but most require rich data: browsing history, demographic surveys, complex models. RFM (first popularized in direct-mail marketing in the 1980s and still widely used) needs only three numbers per customer, all derivable from a basic transaction table:

Dimension	What it measures	Direction
R — Recency	Days since the customer’s last purchase	Lower = better (they are engaged right now)
F — Frequency	Number of times they have purchased	Higher = better (habit has formed)
M — Monetary	Total amount they have spent	Higher = better (real dollar value)

That’s it. Three columns. No surveys, no tracking pixels, no ML model required at the start.
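As a rough sketch of how those three columns fall out of a transaction table, the function below aggregates rows of (customer, order date, amount). The name compute_rfm, the row layout, and the sample data are all assumptions made for illustration; only the Python standard library is used.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer_id, order_date, amount) rows.
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in transactions:
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date  # keep the most recent order
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((as_of - last_order[cid]).days, frequency[cid], monetary[cid])
        for cid in last_order
    }


# Hypothetical sample rows, invented for illustration.
transactions = [
    ("alice", date(2024, 5, 1), 40.0),
    ("alice", date(2024, 6, 10), 60.0),
    ("bob", date(2024, 1, 15), 25.0),
]
rfm = compute_rfm(transactions, as_of=date(2024, 6, 30))
print(rfm)
```

Recency is measured against an explicit as_of date rather than today's date, so the same table always yields the same scores.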

The Four Segments

You pick a threshold for each dimension — for example, “recent” means ordered within the last 30 days; “frequent” means at least 5 orders — and every customer falls into one of four quadrants:

Champions — recent AND frequent (usually high M too). These customers love you. Reward them, ask for referrals, offer early access, upsell.

Promising / New — recent but NOT yet frequent. They just discovered you and bought once or twice. Your job: onboard them well, give them a reason to come back, and nurture them toward loyalty before they forget you exist.

At-risk — NOT recent, but historically frequent (and usually high M). These were your best customers. They used to buy regularly — and then they went quiet. This is the most valuable segment to act on urgently.

Lost — neither recent nor frequent. Long gone, low engagement. Worth a low-cost reactivation ping occasionally, but not your primary budget.
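The four quadrants above reduce to two yes/no tests. Here is a minimal sketch using the example thresholds from this lesson (30 days, 5 orders) as defaults; the function name rfm_segment is hypothetical, and treating exactly 30 days as "recent" is an assumption about where the boundary sits.

```python
def rfm_segment(recency_days, frequency, recent_within=30, frequent_at_least=5):
    """Assign one of the four segments from Recency and Frequency thresholds."""
    is_recent = recency_days <= recent_within
    is_frequent = frequency >= frequent_at_least
    if is_recent and is_frequent:
        return "Champions"
    if is_recent:
        return "Promising"
    if is_frequent:
        return "At-risk"
    return "Lost"


# Hypothetical customers: (days since last order, number of orders).
for recency_days, frequency in [(3, 9), (10, 1), (120, 7), (300, 2)]:
    print(recency_days, frequency, rfm_segment(recency_days, frequency))
```

Monetary does not appear in the rule itself. In this scheme it is used afterwards, to prioritize customers within a segment.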

Try It: RFM Explorer

The chart below plots 16 sample customers. The x-axis is Recency (recent buyers are on the left). The y-axis is Frequency (more orders toward the top). Bubble size represents Monetary value (total spend).

Drag the two sliders to move the Recency and Frequency thresholds. Watch customers jump between segments as the cut-offs shift. Notice how the shaded top-right quadrant — At-risk — contains large bubbles: high spenders who have gone quiet.

Try: RFM segmentation

Cut customers into segments you can act on

Bubble size is spend (Monetary). Move the lines; the top-right corner is the money slipping away.

Default thresholds: recent if last order within 30 days; frequent if orders at least 5.

Segment	Customers	Definition and action
Champions	5	recent + frequent: reward & upsell
Promising	3	recent but new: nurture into loyal
At-risk	5	were loyal, gone quiet: win back now
Lost	3	gone & infrequent: low-cost reactivation

The thresholds you choose are a business judgment call, not a mathematical truth. A luxury retailer might define “recent” as within 90 days; a grocery app might use 7 days. RFM is a framework, not a fixed formula.
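To see how much the choice of threshold matters, the sketch below counts segments for the same customers under a 90-day and a 7-day definition of "recent". The customer list is invented for illustration, the function name segment_counts is hypothetical, and keeping the Frequency threshold at 5 for both businesses is an assumption.

```python
from collections import Counter


def segment_counts(customers, recent_within, frequent_at_least):
    """Count customers per segment for (recency_days, frequency) pairs."""
    counts = Counter()
    for recency_days, frequency in customers:
        recent = recency_days <= recent_within
        frequent = frequency >= frequent_at_least
        if recent and frequent:
            label = "Champions"
        elif recent:
            label = "Promising"
        elif frequent:
            label = "At-risk"
        else:
            label = "Lost"
        counts[label] += 1
    return counts


# Hypothetical customers: (days since last order, number of orders).
customers = [(5, 8), (20, 2), (45, 6), (60, 9), (200, 1)]
luxury = segment_counts(customers, recent_within=90, frequent_at_least=5)
grocery = segment_counts(customers, recent_within=7, frequent_at_least=5)
print("luxury: ", dict(luxury))
print("grocery:", dict(grocery))
```

The same five people produce three Champions under the 90-day rule and only one under the 7-day rule, which is the effect you see when dragging the Recency slider.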

From Segments to Actions

The same $5,000 email budget produces very different returns depending on who receives it:

Blast to all 50,000: most recipients are either already loyal (wasted effort) or long-lost (low response rate). Average return.
Target At-risk, high-M customers only: you are reaching people who have demonstrated they will spend, with a message specifically about winning them back (“We miss you — here’s 15% off your next order”). Much higher return per dollar.

This is why segmentation is not just a data exercise — it directly changes the economics of your marketing spend.
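The comparison can be written as simple arithmetic: expected orders times average order value, divided by the budget. The sketch below uses the $5,000 budget, the 50,000 customers, and the $42 average order value from this lesson. The response rates, the targeted audience size, and the $120 order value are placeholder assumptions, not measured results, and the function name return_per_dollar is hypothetical.

```python
def return_per_dollar(audience, response_rate, avg_order_value, budget):
    """Expected revenue generated per dollar of campaign budget."""
    expected_orders = audience * response_rate
    return expected_orders * avg_order_value / budget


# Placeholder inputs for illustration only; real response rates must be measured.
blast = return_per_dollar(
    audience=50_000, response_rate=0.005, avg_order_value=42, budget=5_000
)
targeted = return_per_dollar(
    audience=2_000, response_rate=0.10, avg_order_value=120, budget=5_000
)
print(f"blast: ${blast:.2f} per dollar, targeted: ${targeted:.2f} per dollar")
```

Under these assumed inputs the targeted campaign earns more per dollar despite reaching far fewer people. The point is the shape of the calculation, so substitute rates from your own past campaigns before drawing conclusions.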

Limitations Worth Knowing

RFM is powerful but incomplete:

It is backward-looking. A customer who bought 10 times last year and zero times this year looks At-risk, but they may have moved cities, changed life stage, or simply found a better product. RFM cannot tell you why.
Thresholds are arbitrary. Moving the Recency slider from 30 to 45 days reclassifies real people. Treat the segments as useful approximations, not ground truth.
M can be misleading. One large gift purchase inflates Monetary without implying loyalty. Pairing RFM with product-category data helps.

These limitations do not disqualify RFM — they are a reminder to use it as a starting heuristic, then layer on richer data as you learn more.

In one breath

Segmentation splits customers into groups that behave alike, so you can treat each differently — and a segment only earns the name if it changes what you do. RFM does this with three cheap numbers from any transaction table: Recency (days since last purchase, lower is better), Frequency (how many orders, higher is better), and Monetary (total spend, higher is better). Pick a threshold on R and F and customers fall into four quadrants — Champions (recent + frequent: reward, ask for referrals), Promising (recent, not yet frequent: onboard and nurture), At-risk (frequent but gone quiet: the urgent, highest-ROI win-back), and Lost (neither: cheap reactivation at most). The counter-intuitive lesson: spend your win-back budget on At-risk, not Champions (already buying) or Lost (rarely return). Thresholds are judgment calls, and RFM is backward-looking — a starting heuristic, not ground truth.

Practice

Quick check


Q1. A customer last bought 8 days ago (recent) and has placed 12 orders total (frequent). Their total spend is $850. Which segment do they belong to, and what is the right action?

Q2. Your manager says: 'Our At-risk segment has 800 customers and our Lost segment has 8,000. We should focus win-back spend on Lost because the total audience is 10x bigger.' What is the flaw in this reasoning?

Q3. A subscription box company wants to use RFM. They define 'Recency' as months since last renewal, 'Frequency' as number of consecutive months subscribed, and 'Monetary' as average monthly plan value. A customer renewed 2 months ago, has been subscribed for 18 months, and pays $29/month. Their Recency is low (recent) and their Frequency is high. Which segment are they in, and which action applies?

A question to carry forward

RFM only works once someone is already a customer — they’re in your transaction table because they bought at least once. But step back: how did all 50,000 of them get there in the first place? Every Champion was once a stranger who saw an ad, landed on your site, signed up, and finally paid. Most strangers never finish that journey.

So the question to carry forward is: before a customer ever reaches your RFM table, what does the path to becoming one look like — and where do most people fall off it? The next lesson is funnel analysis: the ordered steps from first visit to first payment, why overall conversion is the product of step rates rather than their average, and how to find the single leak worth fixing before you spend a rupee on more traffic.
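As a small preview of why the product matters more than the average, here is a sketch with three invented step rates. The numbers are hypothetical and the function name overall_conversion is an assumption made for illustration.

```python
from math import prod


def overall_conversion(step_rates):
    """Overall conversion is the product of the per-step conversion rates."""
    return prod(step_rates)


# Hypothetical step rates: visit to signup, signup to cart, cart to payment.
step_rates = [0.40, 0.50, 0.25]
overall = overall_conversion(step_rates)
average = sum(step_rates) / len(step_rates)
print(f"overall: {overall:.1%}, average of steps: {average:.1%}")
```

With these assumed rates, only 5% of visitors pay even though the average step rate is above 38%, because each step filters the survivors of the one before it.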
