Slowly Changing Dimensions

What you'll learn

The one question that defines SCD — keep the history, or overwrite it

Type 1 (overwrite), Type 2 (new versioned row), Type 3 (previous-value column)

How surrogate keys plus valid_from/valid_to/is_current make Type 2 work

Why a fact automatically links to the dimension version current at event time

A dimension is meant to describe a thing — a customer, a product, a sales rep. But things change. A customer moves city; a product is recategorised; a rep is reassigned to a new territory.

The question that defines slowly changing dimensions is deceptively small: when the description changes, do you overwrite the past, or remember it? And the answer changes the table’s shape, not just its data. Picture it concretely. Maria’s territory moves from “West” to “Mountain” on March 1, and in April someone runs the Q1-by-territory report. Should January’s closed deals credit “West” — where she was when they closed — or “Mountain”, where she is now? Both answers are defensible, and SCD is the machinery that lets you choose deliberately, instead of finding out by accident which one your schema happened to give you.

Three answers to “what happens to the old value?”

Type 1: overwrite, and forget. Just run UPDATE dim_customer SET city = 'Denver' against that customer's row, with a WHERE clause on the customer's key; without one, every row in the table is overwritten. One row, no extra columns, no history. The catch is that the past is rewritten: last quarter’s report, re-run today, now says Denver as though Maria had always lived there. That is exactly right for corrections, such as a misspelled name or a fat-fingered ZIP, where the old value was simply wrong. It is exactly wrong when the old value was true at the time and someone may study that history.
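A minimal sketch of a Type 1 overwrite, run against an in-memory SQLite database so it can be executed as-is. The column names customer_id and name are assumptions made for illustration; the lesson only names dim_customer and city.

```python
import sqlite3

# Hypothetical one-row dimension; customer_id and name are illustrative columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES ('C2', 'Maria', 'Austin')")

# Type 1: overwrite in place. The WHERE clause limits the change to one customer.
conn.execute(
    "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
    ("Denver", "C2"),
)

type1_rows = conn.execute(
    "SELECT customer_id, name, city FROM dim_customer"
).fetchall()
print(type1_rows)  # still one row; 'Austin' is no longer stored anywhere
```

After the update there is nothing left in the table that records the old city, which is why any report re-run later can only show the new value.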

Type 2: add a new row, and remember everything. This is the one you will use most. Instead of overwriting, you expire the old row and insert a new one: the existing row gets valid_to = '2024-03-01' and is_current = false, while a brand-new row arrives with a new surrogate key, the new city, valid_from = '2024-03-01', and is_current = true. Because the old row's valid_to equals the new row's valid_from, treat each validity interval as half-open (valid_from inclusive, valid_to exclusive) so that exactly one version matches any given date. The natural key (C2) stays the same across both; only the surrogate key differs.

Two rows, two surrogate keys, one natural key. Each fact already references the version that was current when it happened.
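The expire-and-insert step can be sketched with two statements against an in-memory SQLite table. The surrogate keys 2 and 4, the natural key C2, and the March 1 change date come from the lesson; the column names customer_key and customer_id, the original valid_from of 2023-01-01, and the use of NULL for an open-ended valid_to are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_id  TEXT NOT NULL,        -- natural key, shared by all versions
        city         TEXT NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT,                 -- NULL = still open (an assumption)
        is_current   INTEGER NOT NULL      -- 1 = true, 0 = false
    )
""")
# Starting state: one current version. The 2023-01-01 date is made up.
conn.execute(
    "INSERT INTO dim_customer VALUES (2, 'C2', 'Austin', '2023-01-01', NULL, 1)"
)

# Step 1: expire the current version of C2.
conn.execute(
    "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
    "WHERE customer_id = ? AND is_current = 1",
    ("2024-03-01", "C2"),
)
# Step 2: insert the new version under a new surrogate key.
conn.execute(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?, NULL, 1)",
    (4, "C2", "Denver", "2024-03-01"),
)
conn.commit()

type2_rows = conn.execute(
    "SELECT customer_key, customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
for row in type2_rows:
    print(row)
```

Both rows carry customer_id C2, but only the row with customer_key 4 has is_current = 1.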

Here is the elegant part. A sale recorded in January stored customer_key = 2, the version current then, so it permanently points at “Austin Maria”; an April sale stores customer_key = 4 and points at “Denver Maria.” History reconstructs itself: every fact already references the dimension version that was true when the event happened. That is the main reason a Type 2 dimension needs surrogate keys.
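One way to see where that link comes from: the surrogate key is resolved once, when the fact is loaded, by finding the version whose validity interval contains the event date. The sketch below assumes a hypothetical fact_sales table, made-up sale dates and amounts, and half-open intervals with NULL as the open end; key_at is an illustrative helper, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER);
    INSERT INTO dim_customer VALUES
        (2, 'C2', 'Austin', '2023-01-01', '2024-03-01', 0),
        (4, 'C2', 'Denver', '2024-03-01', NULL, 1);
    CREATE TABLE fact_sales (sale_date TEXT, customer_key INTEGER, amount REAL);
""")

def key_at(conn, customer_id, event_date):
    """Surrogate key of the version valid on event_date (valid_to exclusive)."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR ? < valid_to)",
        (customer_id, event_date, event_date),
    ).fetchone()
    return row[0]

# Load two hypothetical sales; each stores the key that was current on its date.
for sale_date, amount in [("2024-01-15", 100.0), ("2024-04-10", 250.0)]:
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (sale_date, key_at(conn, "C2", sale_date), amount),
    )

# Reporting needs only a plain key join; no date logic at query time.
sales_by_city = conn.execute(
    "SELECT d.city, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer d ON d.customer_key = f.customer_key "
    "GROUP BY d.city ORDER BY d.city"
).fetchall()
print(sales_by_city)
```

The January sale is credited to Austin and the April sale to Denver, even though both belong to the same natural key.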

Type 3: one column for one step back. Replace the single city column with current_city and previous_city, and you remember exactly one prior value. That is handy for “this year’s region versus last year’s”, but it can never hold a third city, and it does not record when the change happened. It is a niche tool; Type 2 is the general answer.
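A short sketch of Type 3 on SQLite shows both the mechanism and its limit. The helper move and the third city, Boise, are hypothetical and exist only to show what happens on a second change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_id TEXT PRIMARY KEY, current_city TEXT, previous_city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES ('C2', 'Austin', NULL)")

def move(conn, customer_id, new_city):
    # SET expressions are evaluated against the row's pre-update values,
    # so previous_city receives the old current_city.
    conn.execute(
        "UPDATE dim_customer SET previous_city = current_city, current_city = ? "
        "WHERE customer_id = ?",
        (new_city, customer_id),
    )

move(conn, "C2", "Denver")
after_first_move = conn.execute(
    "SELECT current_city, previous_city FROM dim_customer"
).fetchone()

move(conn, "C2", "Boise")  # hypothetical second move
after_second_move = conn.execute(
    "SELECT current_city, previous_city FROM dim_customer"
).fetchone()

print(after_first_move)
print(after_second_move)  # Austin has dropped out of the table entirely
```

After the second move the table holds only the latest two cities, and nothing in it says when either change took place.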

The change stream itself usually arrives via Change Data Capture (the next lesson), and you apply it with a single MERGE (upsert): match on the natural key, expire the matched current row, and insert the new version, all atomically. SCD Type 2 is that MERGE’s canonical use.
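SQLite has no MERGE statement, so the sketch below emulates the same match, expire, and insert logic with ordinary statements inside one transaction; on an engine that supports MERGE, the body of the loop would typically collapse into that statement. The function apply_changes, the customers C1 and C3, and their cities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT NOT NULL, city TEXT NOT NULL,
        valid_from TEXT NOT NULL, valid_to TEXT, is_current INTEGER NOT NULL);
    INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
    VALUES ('C1', 'Boston', '2023-01-01', NULL, 1),
           ('C2', 'Austin', '2023-01-01', NULL, 1);
""")

def apply_changes(conn, changes, change_date):
    """Apply (customer_id, city) pairs as Type 2 changes in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        for customer_id, city in changes:
            current = conn.execute(
                "SELECT customer_key, city FROM dim_customer "
                "WHERE customer_id = ? AND is_current = 1",
                (customer_id,),
            ).fetchone()
            if current is not None and current[1] == city:
                continue  # matched and unchanged: nothing to do
            if current is not None:
                # Matched and changed: expire the current version.
                conn.execute(
                    "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                    "WHERE customer_key = ?",
                    (change_date, current[0]),
                )
            # Changed or brand-new natural key: insert the new current version.
            conn.execute(
                "INSERT INTO dim_customer "
                "(customer_id, city, valid_from, valid_to, is_current) "
                "VALUES (?, ?, ?, NULL, 1)",
                (customer_id, city, change_date),
            )

apply_changes(
    conn, [("C1", "Boston"), ("C2", "Denver"), ("C3", "Miami")], "2024-03-01"
)
current_rows = conn.execute(
    "SELECT customer_id, city FROM dim_customer "
    "WHERE is_current = 1 ORDER BY customer_id"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
print(current_rows, total_rows)
```

Running the expire and the insert in a single transaction is what keeps a failed load from leaving a natural key with zero or two current rows.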

Practice

Quick check

Q1. A customer's address was entered with a typo and is simply wrong. Which SCD type fits?

Q2. In SCD Type 2, what makes a January fact still point at the customer's January attributes after they move in March?

Q3. TRANSFER: After a botched load, your Type 2 dim_customer has TWO rows for C2 both with is_current = true. What went wrong, and what's the symptom?

Questions about this lesson

What are slowly changing dimensions (SCD)?

Slowly changing dimensions are techniques for handling changes to dimension attributes over time — when a customer moves city or a product is recategorized. The core question is whether to overwrite the old value or preserve it. The common strategies are Type 1 (overwrite), Type 2 (add a new versioned row), and Type 3 (keep one previous value in an extra column).

What is the difference between SCD Type 1 and Type 2?

Type 1 overwrites the attribute in place, so history is lost and past reports change — fine for corrections. Type 2 expires the old row (setting valid_to and is_current = false) and inserts a new row with a new surrogate key, preserving full history so facts keep pointing at the version that was current when they happened. Type 2 is the warehouse workhorse.

Why do SCD Type 2 dimensions use a surrogate key?

Because Type 2 stores multiple rows for the same real-world entity (same natural key, different versions), each version needs its own unique key — the surrogate key. A fact recorded at a given time stores the surrogate key of the version current then, so historical facts automatically reference historical attributes without any rewriting.
