What is the difference between a star schema and a snowflake schema in dimensional modeling?

A star schema has a central fact table joined directly to denormalized dimension tables, so each dimension is one join away from the facts, queries stay simple, and reads are usually faster. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are generally preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.

What is the difference between a star schema and a snowflake schema, and which should you choose?

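A star schema has a central fact table joined directly to denormalized dimension tables, so every dimension lookup is a single fact-to-dimension join and queries run fast, at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries. Choose a star schema when read speed and query simplicity matter most; consider a snowflake schema when storage or keeping dimension values consistent matters more.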

Should you normalize or denormalize tables in a data warehouse, and why?

Data warehouses generally favor denormalization: wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is often small, so denormalized star schemas typically outperform normalized models for analytical workloads.

What are the differences between a data warehouse, a data lake, and a data lakehouse?

A data warehouse stores structured, schema-on-write data optimized for SQL analytics but is expensive for raw or unstructured data. A data lake stores any format cheaply on object storage but, on its own, lacks ACID transactions and offers weaker query performance. A lakehouse layers open table formats (Delta Lake, Iceberg, Hudi) on object storage, aiming to deliver warehouse-grade performance and ACID semantics at data lake costs; it is a widely adopted architecture as of 2026.

Star vs Snowflake Schemas — GATE DA

What you'll learn

Why warehouses (OLAP) are designed differently from transactional databases (OLTP)

Fact tables hold measurements; dimension tables hold the context

Star schema — denormalised dimensions, fewer joins, faster reads, more storage

Snowflake schema — normalised dimensions, more joins, slower reads, less redundancy

The query / storage trade-off and how GATE phrases it

Last lesson ended by asking where clean, analysis-ready data should live. Here is why the live operational database is the wrong answer. Your company runs an online store, and every sale lands in a transactional database — a row inserted, an inventory count nudged, a receipt printed. Now the analytics team wants “total sales by product category, by state, by month, for the last five years.” Asking the live database that question means a giant scan that elbows the checkout traffic out of the way.

So we copy the history into a separate store — a data warehouse — built for reads, not writes. And once a store exists only to answer big aggregate questions, the shape of its tables changes too. It abandons the tidy normalized form you spent two lessons perfecting, because normalization scatters data across many tables and every read then pays to join them back. Two table shapes dominate the warehouse, and they sit at opposite ends of one trade-off: the star schema and the snowflake schema.

OLTP vs OLAP — why the warehouse exists at all

OLTP (Online Transaction Processing) — the live database. Tuned for small, fast inserts and updates: “record one sale.” Its tables are heavily normalized to kill redundancy, exactly as the earlier lessons taught.
OLAP (Online Analytical Processing) — the warehouse. Tuned for big aggregate reads: “sum sales by region and quarter.” Its tables are laid out, on purpose, to minimise joins on read.

And the warehouse keeps history — old rows are retained, not overwritten — so this year can be compared with last.
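To make the two workloads concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for both kinds of store. The table, the `record_sale` helper, and the sample rows are all hypothetical; the point is only the shape of the work, one-row writes versus one scan-and-aggregate read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, state TEXT, month TEXT, amount REAL)"
)

def record_sale(state, month, amount):
    """OLTP-style work: record one sale, touching a single row."""
    conn.execute(
        "INSERT INTO sales (state, month, amount) VALUES (?, ?, ?)",
        (state, month, amount),
    )

# Hypothetical sample sales.
record_sale("KA", "2024-01", 120.0)
record_sale("KA", "2024-01", 80.0)
record_sale("MH", "2024-01", 200.0)
record_sale("KA", "2024-02", 50.0)

# OLAP-style work: one question that reads every row and aggregates.
totals = conn.execute(
    "SELECT state, month, SUM(amount) FROM sales "
    "GROUP BY state, month ORDER BY state, month"
).fetchall()
print(totals)
```

With four rows the aggregate is instant; the lesson's concern is what the same scan costs over years of history on a database that is also serving checkout traffic.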

Fact and dimension tables

Every warehouse schema sorts its data into two kinds of table, and the names say what each holds.

Fact table — one row per event you measure. It carries numeric measures (amount, quantity, profit) plus foreign keys pointing into the dimensions.
Dimension table — the descriptive context around those measures. The Date dimension carries day, month, quarter, year; the Product dimension carries name, category, brand; the Store dimension carries city, state, region.

The fact table is usually enormous (millions of rows, one per sale). The dimensions are small by comparison — thousands of products, a few hundred stores, about 3,650 dates for ten years.
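A minimal sketch of that split, again assuming Python's built-in sqlite3 module rather than a real warehouse engine. Table and column names follow the lesson; the sample rows are made up, and `"Date"` is quoted only to keep the identifier unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT, brand TEXT);
CREATE TABLE "Date"  (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE Sales (
    amount     REAL,
    quantity   INTEGER,
    product_id INTEGER REFERENCES Product(product_id),
    date_id    INTEGER REFERENCES "Date"(date_id),
    store_id   INTEGER REFERENCES Store(store_id)
);
""")

# Dimensions: a handful of descriptive rows (hypothetical values).
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    (1, "Trail Shoe", "Footwear", "Acme"),
    (2, "Rain Jacket", "Outerwear", "Acme"),
])
conn.executemany('INSERT INTO "Date" VALUES (?, ?, ?, ?)', [
    (1, 15, 3, 2024),
    (2, 16, 3, 2024),
])
conn.execute("INSERT INTO Store VALUES (1, 'Bengaluru', 'KA', 'South')")

# Facts: one row per sale, so this table grows with every event.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (100.0, 1, 1, 1, 1), (100.0, 1, 1, 2, 1), (200.0, 2, 1, 2, 1),
    (70.0, 1, 2, 1, 1), (70.0, 1, 2, 2, 1), (140.0, 2, 2, 2, 1),
])

row_counts = {
    table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for table in ("Sales", "Product", "Date", "Store")
}
print(row_counts)
```

Even in this toy the fact table is the largest, and it is the only one that grows with each sale; the dimensions change only when a new product, date, or store appears.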

The two layouts

Same data, two layouts. Snowflake’s Product fans out into Category and Brand sub-tables; star keeps them inline.

Star schema — the fact table sits at the centre, and each dimension is a single denormalised table (denormalised meaning you deliberately keep repeated values in one wide table instead of splitting them out). To read a product’s category, you simply read another column of the Product table.
Snowflake schema — the dimensions are further normalised into sub-tables. The Product table no longer carries category and brand names directly; it carries category_id and brand_id foreign keys pointing into separate ProductCategory and ProductBrand tables.
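One way to see how the two layouts relate is to derive the snowflake tables from a star-style Product dimension. The sketch below does that in plain Python; the `snowflake` helper, the sample products, and the sequential ids are assumptions for illustration, not how any particular warehouse tool works.

```python
# A star-style Product dimension: category and brand names sit inline.
star_product = [
    {"product_id": 1, "name": "Trail Shoe", "category": "Footwear", "brand": "Acme"},
    {"product_id": 2, "name": "Road Shoe", "category": "Footwear", "brand": "Zenith"},
    {"product_id": 3, "name": "Rain Jacket", "category": "Outerwear", "brand": "Acme"},
    {"product_id": 4, "name": "Wind Jacket", "category": "Outerwear", "brand": "Zenith"},
]

def snowflake(products):
    """Split inline category/brand names out into their own lookup tables."""
    categories, brands = {}, {}  # name -> id, assigned in order of first appearance
    slim = []
    for p in products:
        category_id = categories.setdefault(p["category"], len(categories) + 1)
        brand_id = brands.setdefault(p["brand"], len(brands) + 1)
        slim.append({
            "product_id": p["product_id"],
            "name": p["name"],
            "category_id": category_id,
            "brand_id": brand_id,
        })
    product_category = [
        {"category_id": i, "category_name": n} for n, i in categories.items()
    ]
    product_brand = [{"brand_id": i, "brand_name": n} for n, i in brands.items()]
    return slim, product_category, product_brand

product, product_category, product_brand = snowflake(star_product)
print(product_category)  # each category name now appears once
print(product[1])        # the product row carries ids instead of names
```

Reading a product's category in the snowflake form means looking up `category_id` in `product_category`, which is exactly the extra join the snowflake layout adds.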

The trade-off in one sentence

Star — fewer joins per query, so faster reads; but the same category name is duplicated across every product in that category, so more storage.
Snowflake — each category name lives exactly once, so less storage; but every query that needs the category name pays for extra joins, so slower reads.
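The storage side of the trade-off can be put into rough numbers. This is a back-of-the-envelope sketch only: the 12-byte average category name and 4-byte integer id are assumed values, the product and category counts are illustrative, and real engines add compression and per-row overhead that this ignores.

```python
def category_storage_bytes(n_products, n_categories, name_bytes=12, id_bytes=4):
    """Rough bytes spent storing category information under each layout."""
    # Star: every product row repeats the full category name.
    star = n_products * name_bytes
    # Snowflake: every product row holds a small id, and each name is
    # stored once in a lookup table next to its id.
    snowflake = n_products * id_bytes + n_categories * (id_bytes + name_bytes)
    return star, snowflake

star, snowflake = category_storage_bytes(n_products=1000, n_categories=20)
print(star, snowflake)  # 12000 4320
```

Under these assumptions the snowflake layout stores the category information in roughly a third of the space, but the saving is measured in kilobytes on a dimension table, which is why the extra join usually weighs more heavily than the storage.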

How GATE asks this

Usually an MCQ with a scenario (“the analytics team prizes query speed over disk usage — which schema?”) or a count-the-joins task on each layout. Sometimes an MSQ that lists characteristics and asks which belong to star and which to snowflake.

Worked example — joins in star vs snowflake

A query: “Total Sales amount by product Category for 2024.”

Star schema layout (1000 products, denormalised):

Sales(amount, product_id, date_id, store_id, customer_id)   -- ~10M rows
Product(product_id, name, category, brand)                  -- 1000 rows  (category sits inline)
Date(date_id, day, month, year)                             -- ~3650 rows

The query becomes Sales JOIN Product JOIN Date — 2 joins (Date for the year filter, Product for the category). Category is just a column in Product, so there is no extra hop.
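The star query can be sketched as runnable SQL, here driven from Python's built-in sqlite3 module. The layout follows the lesson; the handful of sample rows (including one 2023 sale that the filter should exclude) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT, brand TEXT);
CREATE TABLE "Date"  (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE Sales   (amount REAL, product_id INTEGER, date_id INTEGER,
                      store_id INTEGER, customer_id INTEGER);

INSERT INTO Product VALUES (1, 'Trail Shoe', 'Footwear', 'Acme'),
                           (2, 'Rain Jacket', 'Outerwear', 'Acme');
INSERT INTO "Date"  VALUES (1, 15, 3, 2024), (2, 20, 7, 2024), (3, 5, 1, 2023);
INSERT INTO Sales   VALUES (100.0, 1, 1, 1, 1), (50.0, 1, 2, 1, 2),
                           (70.0, 2, 2, 1, 1), (999.0, 2, 3, 1, 3);
""")

STAR_QUERY = """
SELECT p.category, SUM(s.amount) AS total
FROM Sales AS s
JOIN Product AS p ON p.product_id = s.product_id  -- hop 1: category is a column here
JOIN "Date"  AS d ON d.date_id    = s.date_id     -- hop 2: year for the filter
WHERE d.year = 2024
GROUP BY p.category
ORDER BY p.category
"""

star_totals = conn.execute(STAR_QUERY).fetchall()
print(star_totals)  # [('Footwear', 150.0), ('Outerwear', 70.0)]
```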

Snowflake schema layout (Product normalised further):

Sales(amount, product_id, date_id, store_id, customer_id)
Product(product_id, name, category_id, brand_id)            -- 1000 rows
ProductCategory(category_id, category_name)                 -- ~20 rows
ProductBrand(brand_id, brand_name)                          -- ~100 rows
Date(date_id, day, month, year)

The same query now reads Sales JOIN Product JOIN ProductCategory JOIN Date — 3 joins (Brand is not needed for this query, so ProductBrand is skipped). Storage wins, because each category name is stored once in the 20-row ProductCategory table instead of being repeated across 1000 product rows.
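The snowflake version of the query can be sketched the same way, with Python's built-in sqlite3 module standing in for the warehouse. The sample rows are hypothetical; note that ProductBrand exists in the schema but never appears in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductCategory (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE ProductBrand    (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT,
                      category_id INTEGER, brand_id INTEGER);
CREATE TABLE "Date"  (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE Sales   (amount REAL, product_id INTEGER, date_id INTEGER,
                      store_id INTEGER, customer_id INTEGER);

INSERT INTO ProductCategory VALUES (1, 'Footwear'), (2, 'Outerwear');
INSERT INTO ProductBrand    VALUES (1, 'Acme');
INSERT INTO Product VALUES (1, 'Trail Shoe', 1, 1), (2, 'Rain Jacket', 2, 1);
INSERT INTO "Date"  VALUES (1, 15, 3, 2024), (2, 20, 7, 2024), (3, 5, 1, 2023);
INSERT INTO Sales   VALUES (100.0, 1, 1, 1, 1), (50.0, 1, 2, 1, 2),
                           (70.0, 2, 2, 1, 1), (999.0, 2, 3, 1, 3);
""")

SNOWFLAKE_QUERY = """
SELECT c.category_name, SUM(s.amount) AS total
FROM Sales AS s
JOIN Product         AS p ON p.product_id  = s.product_id   -- hop 1: only ids live here
JOIN ProductCategory AS c ON c.category_id = p.category_id  -- hop 2: the extra hop for the name
JOIN "Date"          AS d ON d.date_id     = s.date_id      -- hop 3: year for the filter
WHERE d.year = 2024
GROUP BY c.category_name
ORDER BY c.category_name
"""

snowflake_totals = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(snowflake_totals)  # [('Footwear', 150.0), ('Outerwear', 70.0)]
```

The answer is the same as it would be with category stored inline in Product; the only difference is the additional hop through ProductCategory to fetch the name.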

In one breath

A data warehouse is an OLAP store built for big aggregate reads, holding history its OLTP source overwrites; it centres each subject on a huge fact table of numeric measures plus foreign keys into small dimension tables of context. A star schema keeps dimensions flat and denormalised — fewer joins and faster reads at the cost of repeated values and more storage — while a snowflake schema normalises dimensions into sub-tables, trimming redundancy but adding a join for every level it splits off.

Practice

Quick check

Q1 (Recall): Which best describes a FACT table in a warehouse?

Q2 (Recall): Which statements about a STAR schema are TRUE? (select all that apply)

Q3 (Recall): Which statements about a SNOWFLAKE schema are TRUE? (select all that apply)

Q4 (Trace): A star schema has a Sales fact joined to a single Product dimension. On a snowflake schema where Product is normalised into Product + ProductCategory + ProductBrand, how many joins does a query selecting amount, product name, category, AND brand need? (numerical answer)

Q5 (Apply): A team prizes query speed over disk usage for a dashboard run by hundreds of analysts. Which schema should they pick?

Q6 (Create): Which is the GREATEST reason a data warehouse uses OLAP-style schemas instead of a fully normalised OLTP schema?

A question to carry forward

You can now lay out a warehouse: a fact table of measures ringed by dimensions, flattened into a star or split into a snowflake. The stage is set — but a stage is not a performance. An analyst rarely asks one fixed question; they roam the data, and the same fact table has to answer at every zoom level.

They start at “total sales this year,” then zoom in — by quarter, by month — then pivot to “sales by state,” then fix on one product and sweep across time. Each move re-aggregates the very same measures at a different grain. Here is the thread onward: what are the named operations for roaming a warehouse this way — rolling up, drilling down, slicing — and is it always safe to add a measure up across a dimension, or are some measures treacherous to sum?
