What is the gaps-and-islands problem, and how do you solve it with window functions?
Gaps-and-islands is the problem of identifying contiguous ranges (islands) within ordered sequential data and the breaks (gaps) between them. The classic solution subtracts a dense sequential integer from the ordering column — equal differences belong to the same island.
How to think about it
This is a classic hard SQL problem that shows up in real work — sessionization, streak counting, finding outage windows, detecting login gaps. The trick once you see it is surprisingly elegant, but it takes a moment to internalize why it works.
The problem
Given a table of dates with some missing, group consecutive present dates into contiguous ranges (islands) and identify the breaks (gaps) between them.
For example: dates Jan 1, 2, 3, 5, 6, 9 should produce three islands: Jan 1-3, Jan 5-6, Jan 9.
Why the row_number trick works
Imagine numbering the dates 1, 2, 3, 4, 5, 6 in order. Now subtract the row number from the date value. For a consecutive sequence, each date increases by 1 and each row number increases by 1 — the difference stays constant. The moment there is a gap in the dates, the date jumps by more than 1 but the row number only jumps by 1, changing the difference. Same difference = same island.
Run it yourself
Variant: status-based islands (event sequences)
When you are not working with dates but with state changes — like consecutive rows where status = 'active' — use LAG to detect each boundary and SUM over a running window to assign group IDs:
WITH flagged AS (
SELECT
user_id, event_time, status,
CASE
WHEN status <> LAG(status) OVER (PARTITION BY user_id ORDER BY event_time)
OR LAG(status) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
THEN 1 ELSE 0
END AS is_new_group
FROM events
),
grouped AS (
SELECT *,
SUM(is_new_group) OVER (
PARTITION BY user_id ORDER BY event_time
ROWS UNBOUNDED PRECEDING
) AS grp
FROM flagged
)
SELECT user_id, status, MIN(event_time) AS period_start, MAX(event_time) AS period_end
FROM grouped
GROUP BY user_id, status, grp
ORDER BY user_id, period_start;
SUM(is_new_group) acts as a running counter that increments each time a new island begins.