datarekha

Cohort Retention Analysis

50,000 active users and the count looks flat — are you healthy, or quietly dying? Cohort retention is the only way to know.

8 min read Intermediate Business Analytics Lesson 11 of 21

What you'll learn

  • What a cohort is and why blending everyone together hides the truth
  • How to read a retention curve and what flattening means
  • Why a flat total active-user count can mask a leaky bucket business
  • How to spot product-market fit from a retention curve shape

Before you start

Your dashboard shows 50,000 active users this month. Same as last month. Same as the month before. The number looks stable, so everything must be fine — right?

Not necessarily. That flat headline can hide two completely different realities: a thriving business with loyal users, or a leaking bucket where thousands of new signups are quietly replaced by thousands of churned users every single month. Cohort retention analysis is the tool that tells you which world you actually live in.


What is a Cohort?

A cohort is a group of customers defined by when they first started — for example, “everyone who signed up in January.” Instead of blending all your users together into one big pool, you track each cohort separately over time.

The “blending” approach is like weighing a bucket of water every day without noticing that half the water is leaking out and someone is refilling it. The weight stays the same, but the bucket is broken. Cohorts let you watch the original water — where it goes, and how fast.


Retention and Churn

Retention is the share of a cohort still active after a given number of months. If you start with 1,000 January signups and 600 are still using the product in February (month 1), your month-1 retention is 60%.

Churn is the flip side: the share of the cohort that has left. For any given month, retention and churn add up to 100%. If month-1 retention is 60%, month-1 churn is 40%.

These two numbers are always two ways of describing the same fact:

Retention rate  =  Active users in cohort / Original cohort size
Churn rate      =  1 - Retention rate
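
The two formulas translate directly into code. The following is a minimal sketch in Python; the function names are illustrative and do not come from any library.

```python
def retention_rate(active_users: int, cohort_size: int) -> float:
    """Share of the original cohort that is still active."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return active_users / cohort_size


def churn_rate(active_users: int, cohort_size: int) -> float:
    """Share of the original cohort that has left (1 - retention)."""
    return 1 - retention_rate(active_users, cohort_size)


# January example from the text: 1,000 signups, 600 still active in month 1.
print(f"{retention_rate(600, 1000):.0%}")  # 60%
print(f"{churn_rate(600, 1000):.0%}")      # 40%
```

Note that both rates are measured against the original cohort size, not against the previous month's survivors.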

The Retention Curve

Take a January cohort of 1,000 users. Here is what their retention looks like over five months:

Month since signup   Active users   Retention
0 (signup)           1,000          100%
1                    600            60%
2                    450            45%
3                    380            38%
4                    350            35%
5                    340            34%

Plot those numbers and you get a retention curve — a line that drops steeply in the first month or two, then bends and flattens out. The shape of that bend carries enormous information.
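
To make the bend concrete, the curve and its month-to-month drops can be computed from the active-user counts. This is a small sketch using the January numbers; the variable names are chosen here for illustration.

```python
# Active-user counts for the January cohort, indexed by months since signup.
active_by_month = [1000, 600, 450, 380, 350, 340]

cohort_size = active_by_month[0]
retention_curve = [active / cohort_size for active in active_by_month]

# Drop in percentage points from each month to the next.
drops = [
    (earlier - later) * 100 / cohort_size
    for earlier, later in zip(active_by_month, active_by_month[1:])
]

for month, share in enumerate(retention_curve):
    print(f"month {month}: {share:.0%}")

print(drops)  # [40.0, 15.0, 7.0, 3.0, 1.0]
```

The shrinking drops (40, 15, 7, 3, then 1 percentage point) are the numerical form of the curve bending and flattening.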

Is a steep early drop a bad sign? Not necessarily. The first-month drop is almost always the largest, because many users sign up out of curiosity and never come back. What matters more is what happens after that. If the curve flattens (as in the January example above, where retention settles around 34–35%), you have found a group of genuinely loyal users who keep coming back month after month. Practitioners call this a product-market fit signal: evidence that the product delivers real, repeatable value to at least a subset of users.

A curve that never stops falling — one that keeps dropping toward zero — tells the opposite story: every user eventually leaves, which means the product has not found a loyal core yet.
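
One rough way to turn "has the curve flattened?" into a check is to look at the most recent month-to-month drops. The sketch below is a heuristic only: the function name, the two-step window, and the 3-point tolerance are assumptions made for this example, not an industry standard. Retention is held as whole percentages to keep the comparison exact.

```python
def has_flattened(retention_pct, window=2, tolerance_points=3):
    """Return True if retention fell by at most `tolerance_points`
    percentage points in each of the last `window` monthly steps.

    `window` and `tolerance_points` are illustrative defaults.
    """
    if len(retention_pct) < window + 1:
        return False  # not enough history to judge
    recent = retention_pct[-(window + 1):]
    return all(
        earlier - later <= tolerance_points
        for earlier, later in zip(recent, recent[1:])
    )


january = [100, 60, 45, 38, 35, 34]        # January cohort from the lesson
still_falling = [100, 50, 35, 25, 15, 8]   # hypothetical product, made up here

print(has_flattened(january))        # True
print(has_flattened(still_falling))  # False
```

A real analysis would also consider how long the flat stretch lasts and how noisy small cohorts are, which this sketch ignores.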


Reading the Cohort Retention Table

Real analysts track multiple cohorts side by side in a triangle-shaped table. Rows are cohorts (by signup month); columns are months since signup.

Cohort   Month 0   Month 1   Month 2   Month 3
Jan      100%      60%       45%       38%
Feb      100%      57%       43%       36%
Mar      100%      62%       47%       -

Reading across a row shows you how a single cohort ages. Reading down a column shows you whether newer cohorts retain better or worse than older ones at the same age — a way to spot whether product improvements are actually helping.
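
Reading down a column can be sketched in a few lines. Here the table is stored as one list per cohort, and the helper `column` (a name invented for this example) pulls out every cohort's retention at the same age, skipping cohorts that have not reached that age yet.

```python
# Retention in percent, indexed by months since signup (values from the table).
retention = {
    "Jan": [100, 60, 45, 38],
    "Feb": [100, 57, 43, 36],
    "Mar": [100, 62, 47],      # March has not reached month 3 yet
}


def column(table, age):
    """Retention of every cohort that has reached `age` months."""
    return {
        cohort: row[age]
        for cohort, row in table.items()
        if age < len(row)
    }


print(column(retention, 1))  # {'Jan': 60, 'Feb': 57, 'Mar': 62}
print(column(retention, 3))  # {'Jan': 38, 'Feb': 36}
```

At month 1, March (62%) retains better than January (60%) and February (57%), which is the kind of comparison that hints at whether recent product changes are helping.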

The triangle shape (dashes at the bottom-right) just means those data points are in the future; the March cohort has not yet reached month 3.
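
In practice the table is usually derived from raw activity records. The sketch below assumes a hypothetical log of `(user_id, signup_month, active_month)` tuples with months numbered 0, 1, 2 and so on, and assumes every user has a record in their signup month. The data and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, signup_month, active_month).
# Months are calendar indexes, e.g. 0 = Jan, 1 = Feb, 2 = Mar.
activity = [
    ("u1", 0, 0), ("u1", 0, 1), ("u1", 0, 2),
    ("u2", 0, 0), ("u2", 0, 1),
    ("u3", 1, 1), ("u3", 1, 2),
    ("u4", 1, 1),
]


def cohort_table(records):
    """Map each signup month to {months_since_signup: retention share}."""
    active = defaultdict(set)  # (cohort, age) -> users active at that age
    for user, signup, month in records:
        active[(signup, month - signup)].add(user)

    table = {}
    for (cohort, age), users in sorted(active.items()):
        size = len(active.get((cohort, 0), ()))  # cohort size at signup
        if size == 0:
            continue  # no month-0 record, so the cohort size is unknown
        table.setdefault(cohort, {})[age] = len(users) / size
    return table


print(cohort_table(activity))
# {0: {0: 1.0, 1: 1.0, 2: 0.5}, 1: {0: 1.0, 1: 0.5}}
```

The later cohort has fewer entries simply because less time has passed since it signed up, which is where the triangle shape comes from.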


The Diagram: What the Curve Actually Looks Like

[Figure: retention curve. x-axis: months since signup (0 to 5); y-axis: retention %. Points: 100%, 60%, 45%, 38%, 35%, 34%. The flat tail is labeled "loyal core".]

January cohort (1,000 users): retention curve drops steeply then flattens — the highlighted tail is the loyal core.

The steep section (months 0–2) is normal; virtually every product sees it. The critical question is always: does the curve eventually flatten, or does it keep falling?


The Killer Insight: Why a Flat Total Can Lie

Now we can fully answer the opening question.

Imagine your product acquires 5,000 new users every month. But every month, it also loses 5,000 of its existing users — users who signed up in previous cohorts and eventually churned. The total active-user count stays locked at 50,000. The dashboard looks healthy.

Month N:   50,000 active users
+5,000 new signups this month
-5,000 churned from previous cohorts
= 50,000 active users next month

The headline is flat. The business is a leaky bucket. You will need to spend more and more on acquisition just to stand still — and the moment acquisition slows, total users collapse. Only cohort retention curves reveal this, because they show you whether the users you acquired six months ago are still around.
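
The ledger above can be replayed in code to see what happens when acquisition slows. This is a deliberately simplified sketch: it assumes churn stays at a constant 5,000 users per month, whereas in a real product churn depends on each cohort's retention curve. The function name and the 3,000-signup scenario are assumptions for this example.

```python
def simulate_total(start_total, new_per_month, churned_per_month, months):
    """Track total active users month by month with fixed inflow and outflow."""
    totals = [start_total]
    for _ in range(months):
        totals.append(totals[-1] + new_per_month - churned_per_month)
    return totals


# Scenario from the text: 5,000 in, 5,000 out, so the headline never moves.
flat = simulate_total(50_000, 5_000, 5_000, months=6)

# Hypothetical slowdown: signups fall to 3,000 while churn is unchanged.
slowdown = simulate_total(50_000, 3_000, 5_000, months=6)

print(flat)      # [50000, 50000, 50000, 50000, 50000, 50000, 50000]
print(slowdown)  # [50000, 48000, 46000, 44000, 42000, 40000, 38000]
```

The flat series says nothing about the 5,000 users replaced every month; the loss only shows up in the total once signups dip.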


Putting the Numbers Together

Using the January cohort (1,000 users, month-5 retention of 34%):

Active in month 5  =  1,000 x 0.34  =  340 users
Churned by month 5 =  1,000 - 340   =  660 users

If you have ten such cohorts of 1,000 each (all at month 5), you have 3,400 active users — but 6,600 users have already left. That attrition has to be offset by new cohorts just to hold the total flat.
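
The same arithmetic as a short Python sketch, using whole-number percentages so every result is exact. The variable names are chosen for this example.

```python
cohort_size = 1_000
month5_retention_pct = 34   # from the January cohort
num_cohorts = 10            # ten identical cohorts, all at month 5

active_per_cohort = cohort_size * month5_retention_pct // 100
churned_per_cohort = cohort_size - active_per_cohort

total_active = num_cohorts * active_per_cohort
total_churned = num_cohorts * churned_per_cohort

print(active_per_cohort, churned_per_cohort)  # 340 660
print(total_active, total_churned)            # 3400 6600
```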


Quick check

Q1. Your product has 80,000 monthly active users, and the number has been flat for six months. A colleague says 'retention must be great — the number never drops.' What is wrong with this reasoning?

Q2. A January cohort starts with 2,000 users. Month-3 retention is 38%. How many users from this cohort are still active in month 3?

Q3. Product A has a retention curve that drops to 30% by month 2 and then stays flat at 30% through month 6. Product B drops to 20% by month 2 and keeps falling, reaching 5% by month 6. Which product is healthier, and why?

Next

Once you know how many users from each cohort stick around — and for how long — you can assign a dollar value to each of them. The next lesson turns retention curves into customer lifetime value (CLV): the total revenue a single customer is expected to generate before churning.

Practice this in an interview

30-day retention dropped from 42% to 31% over the last two months. How do you diagnose the root cause?

A retention drop investigation requires distinguishing between an acquisition-mix shift (newer cohorts are lower quality) and a genuine product regression (existing cohorts are performing worse). The two look identical in aggregate retention but have completely different fixes. Cohort analysis — plotting the D30 survival curve for each weekly acquisition cohort — is the first move.

How would you measure user engagement for a mobile app — what metrics would you use and how would you structure them?

Engagement is multi-dimensional: breadth (how many users engage), depth (how much they do per session), and frequency (how often they return). A robust engagement framework stacks these three layers into a metric hierarchy and links them to retention curves, because engagement that does not predict long-term retention is usually noise.

DAU dropped 15% week-over-week with no planned changes. How do you diagnose it?

A metric drop investigation starts by confirming the drop is real — ruling out logging bugs and metric-definition changes — before hypothesising causes. Then segment by platform, geography, user cohort, and funnel step to isolate where the drop is concentrated, which points to the most likely root cause.

What is the difference between leading and lagging indicators, and how do you use both when building a metrics system?

Lagging indicators (revenue, annual retention, NPS) measure outcomes after they have occurred — they are accurate but slow. Leading indicators (D1 retention, feature adoption rate, time-to-value) correlate with future outcomes and are available faster, making them suitable for early experiment decisions. A robust metrics system pairs both, with the leading metric as the experiment signal and the lagging metric as the validation gate.
