datarekha

Star vs Snowflake Schemas

How a data warehouse organises facts and dimensions for fast analytics — and why one design has fewer joins but uses more storage.

7 min read Intermediate GATE DA Lesson 75 of 122

What you'll learn

  • Why warehouses (OLAP) are designed differently from transactional databases (OLTP)
  • Fact tables hold measurements; dimension tables hold the context
  • Star schema — denormalised dimensions, fewer joins, faster reads, more storage
  • Snowflake schema — normalised dimensions, more joins, slower reads, less redundancy
  • The query / storage trade-off and how GATE phrases it

Before you start

Your company runs an online store. Every sale goes into a transactional database — a row inserted, an inventory count updated, a receipt printed. Now the analytics team wants to know “total sales by product category, by state, by month, for the last five years.” Asking the live transactional database to answer that means a giant scan that competes with checkout traffic.

So we copy the history into a separate store — a data warehouse — that is designed for reads, not writes. The shape of the tables changes too. Two patterns dominate: the star schema and the snowflake schema.

OLTP vs OLAP — the warehouse exists for a reason

  • OLTP (Online Transaction Processing) — the live database. Optimised for small, fast inserts and updates: “record one sale.” Tables are highly normalised to avoid redundancy.
  • OLAP (Online Analytical Processing) — the warehouse. Optimised for big aggregate read queries: “sum sales by region and quarter.” Tables are deliberately laid out to minimise joins on read.

The warehouse stores history — old rows are kept, not overwritten — so you can compare this year with last year.

Fact and dimension tables

Every warehouse schema splits the data into two kinds of tables:

  • Fact table — one row per event you measure. Numeric measures (amount, quantity, profit) plus foreign keys that point into the dimension tables.
  • Dimension table — descriptive context. The Date dimension carries day, month, quarter, year; the Product dimension carries name, category, brand; the Store dimension carries city, state, region.

The fact table is usually huge (millions of rows). Dimensions are small (thousands of products, a few hundred stores, ~3650 dates for ten years).

The two layouts

Star schemaSnowflake schemaSales(fact)DateProductStoreCustomerSales(fact)DateProductCategoryBrandStoreCustomerflat dimensions → fewer joins, faster readsnormalised dimensions → more joins, less redundancy
Same data, two layouts. Snowflake’s Product fans out into Category and Brand sub-tables; star keeps them inline.
  • Star schema — the fact table sits at the centre; each dimension is a single denormalised table (denormalised = deliberately keeping repeated values in one wide table instead of splitting them out). To look up a product’s category, you just read another column of the Product table.
  • Snowflake schema — the dimensions are further normalised into sub-tables. The Product table no longer carries category and brand names directly; instead it carries category_id and brand_id foreign keys that point into separate ProductCategory and ProductBrand tables.

The trade-off in one sentence

  • Star: fewer joins per query → faster reads, but the same category name is duplicated across every product in that category → more storage.
  • Snowflake: each category name lives once → less storage, but every query that needs the category name pays for extra joinsslower reads.

How GATE asks this

Usually an MCQ giving a scenario (“the analytics team prizes query speed over disk usage — which schema?”) or asking you to count joins for a query on each schema. Sometimes an MSQ listing characteristics and asking which belong to star, which to snowflake.

Worked example — joins in star vs snowflake

A query: “Total Sales amount by product Category for 2024.”

Star schema layout (1000 products, denormalised):

Sales(amount, product_id, date_id, store_id, customer_id)   -- ~10M rows
Product(product_id, name, category, brand)                  -- 1000 rows  (category sits inline)
Date(date_id, day, month, year)                             -- ~3650 rows

The query: Sales JOIN Product JOIN Date2 joins (Date for the year filter, Product for the category). Category is just a column in Product, no extra hop.

Snowflake schema layout (Product normalised further):

Sales(amount, product_id, date_id, store_id, customer_id)
Product(product_id, name, category_id, brand_id)            -- 1000 rows
ProductCategory(category_id, category_name)                 -- ~20 rows
ProductBrand(brand_id, brand_name)                          -- ~100 rows
Date(date_id, day, month, year)

Same query now: Sales JOIN Product JOIN ProductCategory JOIN Date3 joins. (Brand is not needed for this query, so we skip ProductBrand.) Storage wins because each category name is stored exactly once in the 20-row ProductCategory table instead of being repeated across 1000 product rows.

Quick check

Quick check

0/6
Q1A star schema has a Sales fact joined to a single Product dimension. The same query on a snowflake schema where Product is normalised into Product + ProductCategory + ProductBrand sub-tables needs HOW MANY joins (assume the query selects amount, product name, category, AND brand)?numerical answer — type a number
Q2Which statements about a STAR schema are TRUE? (select all that apply)select all that apply
Q3Which statements about a SNOWFLAKE schema are TRUE? (select all that apply)select all that apply
Q4Which best describes a FACT table in a warehouse?
Q5A team prizes query speed over disk usage for a dashboard run by hundreds of analysts. Which schema should they pick?
Q6Which is the GREATEST reason a data warehouse uses OLAP-style schemas instead of a fully normalised OLTP schema?

Practice this in an interview

All questions
What is the difference between a star schema and a snowflake schema in dimensional modeling?

A star schema has a central fact table joined directly to denormalized dimension tables — one join hop per dimension, simple queries, better query performance. A snowflake schema normalizes dimension tables into sub-dimensions, reducing storage redundancy but requiring more joins. Star schemas are preferred for analytics workloads; snowflake schemas are sometimes used when a dimension is very large and has many redundant attribute values.

What is the difference between a star schema and a snowflake schema, and which should you choose?

A star schema has a central fact table joined directly to denormalized dimension tables, giving simple two-table joins and fast query performance at the cost of some data redundancy. A snowflake schema normalizes dimensions into sub-dimension tables, reducing storage and update anomalies but requiring more joins that can slow analytical queries.

Should you normalize or denormalize tables in a data warehouse, and why?

Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.

What are the differences between a data warehouse, a data lake, and a data lakehouse?

A data warehouse stores structured, schema-on-write data optimized for SQL analytics but is expensive for raw or unstructured data. A data lake stores any format cheaply on object storage but lacks ACID transactions and query performance. A lakehouse layers open table formats (Delta Lake, Iceberg, Hudi) on object storage to deliver warehouse-grade performance and ACID semantics at data lake costs — it is the dominant architecture in 2026.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content