datarekha
SQL Hard Asked at AmazonAsked at GoogleAsked at LinkedIn

What is the gaps-and-islands problem, and how do you solve it with window functions?

The short answer

Gaps-and-islands is the problem of identifying contiguous ranges (islands) within ordered sequential data and the breaks (gaps) between them. The classic solution subtracts a dense sequential integer from the ordering column — equal differences belong to the same island.

How to think about it

This is a classic hard SQL problem that shows up in real work — sessionization, streak counting, finding outage windows, detecting login gaps. The trick once you see it is surprisingly elegant, but it takes a moment to internalize why it works.

The problem

Given a table of dates with some missing, group consecutive present dates into contiguous ranges (islands) and identify the breaks (gaps) between them.

For example: dates Jan 1, 2, 3, 5, 6, 9 should produce three islands: Jan 1-3, Jan 5-6, Jan 9.

Why the row_number trick works

Imagine numbering the dates 1, 2, 3, 4, 5, 6 in order. Now subtract the row number from the date value. For a consecutive sequence, each date increases by 1 and each row number increases by 1 — the difference stays constant. The moment there is a gap in the dates, the date jumps by more than 1 but the row number only jumps by 1, changing the difference. Same difference = same island.

Run it yourself

Variant: status-based islands (event sequences)

When you are not working with dates but with state changes — like consecutive rows where status = 'active' — use LAG to detect each boundary and SUM over a running window to assign group IDs:

WITH flagged AS (
  SELECT
    user_id, event_time, status,
    CASE
      WHEN status <> LAG(status) OVER (PARTITION BY user_id ORDER BY event_time)
        OR LAG(status) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
      THEN 1 ELSE 0
    END AS is_new_group
  FROM events
),
grouped AS (
  SELECT *,
    SUM(is_new_group) OVER (
      PARTITION BY user_id ORDER BY event_time
      ROWS UNBOUNDED PRECEDING
    ) AS grp
  FROM flagged
)
SELECT user_id, status, MIN(event_time) AS period_start, MAX(event_time) AS period_end
FROM grouped
GROUP BY user_id, status, grp
ORDER BY user_id, period_start;

SUM(is_new_group) acts as a running counter that increments each time a new island begins.

Learn it properly Sessionization

Keep practising

All SQL questions

Explore further

Skip to content