What are ROLLUP, CUBE, and GROUPING SETS, and when would you choose each?

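All three are extensions to GROUP BY that produce multiple levels of aggregation in a single query. ROLLUP produces hierarchical subtotals: ROLLUP(a, b) groups by (a, b), then (a), then the grand total, so choose it when the columns form a hierarchy such as year and month. CUBE produces every combination of the listed columns (2^n grouping sets for n columns), so choose it when you need cross-tabulated subtotals along every dimension. GROUPING SETS lets you specify exactly which grouping combinations you want, so choose it when only a few specific combinations matter and computing the rest would be wasted work.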
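A minimal sketch of which grouping sets ROLLUP and CUBE expand to, written in Python rather than SQL. The helper names `rollup_sets` and `cube_sets` are illustrative and are not part of any database API; GROUPING SETS corresponds to writing such a list out by hand.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(a, b, c) expands to (a, b, c), (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(a, b, c) expands to every subset of the columns, 2**n in total."""
    return [combo
            for size in range(len(cols), -1, -1)
            for combo in combinations(cols, size)]

cols = ["year", "quarter", "month"]
for grouping in rollup_sets(cols):
    print("ROLLUP groups by:", grouping)
print("CUBE produces", len(cube_sets(cols)), "grouping sets")
```

Every ROLLUP grouping set also appears in the CUBE output, which is why CUBE costs more: it computes the hierarchical subtotals plus all the non-hierarchical ones.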

What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?

OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.

What is the difference between OLTP and OLAP systems, and why can't you run analytics on your production database?

OLTP (Online Transaction Processing) systems handle high-throughput, low-latency reads and writes for individual records, such as order placement or user authentication. OLAP (Online Analytical Processing) systems handle complex aggregations over millions of rows for business intelligence. Running heavy analytics directly on an OLTP database competes with application traffic for CPU, memory, and I/O, and long-running scans can hold locks or old row versions open, so the application queries that customers feel slow down.

How does columnar storage work, and how does partitioning improve query performance in a data warehouse?

Columnar storage colocates values from the same column on disk, so aggregation queries read only the columns they need rather than full rows, which sharply reduces I/O on wide tables. Partitioning physically separates data into separate files or directories (e.g., by date), allowing the query engine to skip every partition that the query's filter cannot match and cutting scan volume from the full table to just the relevant slice.
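A toy sketch of both ideas together, assuming a made-up in-memory layout: one partition per sale date, each holding its data column by column. The names `partitions` and `total_sales` are hypothetical; real engines work on files, but the skipping logic has the same shape.

```python
# Hypothetical layout: one partition per sale date, each stored column-wise.
partitions = {
    "2024-01-01": {"store": ["BLR", "DEL"], "sales": [100, 250], "notes": ["a", "b"]},
    "2024-01-02": {"store": ["BLR"], "sales": [300], "notes": ["c"]},
    "2024-02-01": {"store": ["DEL"], "sales": [50], "notes": ["d"]},
}

def total_sales(partitions, date_from, date_to):
    """Sum `sales` for dates in [date_from, date_to], touching as little data as possible."""
    scanned = 0
    total = 0
    for date_key, columns in partitions.items():
        # Partition pruning: skip partitions whose key cannot satisfy the filter.
        # ISO dates compare correctly as plain strings.
        if not (date_from <= date_key <= date_to):
            continue
        scanned += 1
        # Column projection: only `sales` is read; `store` and `notes` stay untouched.
        total += sum(columns["sales"])
    return total, scanned

total, scanned = total_sales(partitions, "2024-01-01", "2024-01-31")
print(total, "from", scanned, "of", len(partitions), "partitions")
```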

OLAP: Concept Hierarchies & Measures — GATE DA

What you'll learn

Concept hierarchies — day → month → quarter → year, city → state → country

OLAP operations: roll-up, drill-down, slice, dice

Distributive measures (SUM, COUNT, MIN, MAX) merge cleanly from sub-aggregates

Algebraic measures (AVG, stddev) reduce to a fixed set of distributive ones

Holistic measures (MEDIAN, MODE, percentiles) need every underlying row

Last lesson set the stage (a fact table ringed by dimensions) and warned that an analyst never asks just one fixed question. She roams. Watch her do it. She opens “total sales by day, this month.” She zooms out to “by month, this year.” Out once more to “by year, this decade.” Then she freezes on a single city, “only Bangalore,” and sweeps it back across time.

Every one of those moves has a name, and together they are the OLAP operations. They all lean on two quiet ideas: a concept hierarchy that tells the cube how to summarise (a day rolls up into a month, a city into a state), and a measure type that tells it which aggregates are safe to combine across sub-groups. That second idea matters far more than it first sounds — get it wrong and you compute an “average of averages” that simply is not the true average.

Concept hierarchies — levels of summary

A concept hierarchy is nothing more than a chain saying “this level rolls up into that level.” A few you already know by heart:

Time: day → month → quarter → year
Location: city → state → country
Product: SKU → brand → category

These chains let the warehouse pre-build totals at every level. “Sales by year” then becomes a one-row lookup instead of a scan over every transaction.
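A small sketch of pre-building totals along the time hierarchy, assuming hypothetical daily rows of (date, amount). The names `daily_sales`, `levels`, and `build_totals` are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily fact rows: (sale date, amount).
daily_sales = [
    (date(2023, 12, 31), 40),
    (date(2024, 1, 15), 100),
    (date(2024, 2, 3), 60),
    (date(2024, 5, 20), 90),
]

# Each level of the time hierarchy maps a day to the key of the group it rolls up into.
levels = {
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: d.year,
}

def build_totals(rows, levels):
    """Pre-build one {group key: total} table per hierarchy level."""
    totals = {name: defaultdict(int) for name in levels}
    for day, amount in rows:
        for name, key_of in levels.items():
            totals[name][key_of(day)] += amount
    return totals

totals = build_totals(daily_sales, levels)
# Answering "sales in 2024" is now a dictionary lookup, not a scan of daily rows.
print(totals["year"][2024])
```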

Figure: A four-level time hierarchy. Roll-up climbs (more aggregated, fewer rows); drill-down descends (more detail, more rows).

The four OLAP operations

Think of the hierarchy as a camera zoom, and these four moves as what your hands can do with it.

Roll-up — go up the hierarchy: aggregate sales from month to quarter to year. Coarser, fewer rows.
Drill-down — go down: split a year’s sales back into its months. Finer, more rows.
Slice — fix one dimension to a single value, “only 2024.” The cube loses a dimension.
Dice — apply filters on several dimensions at once, “2024 AND Bangalore AND Electronics.” The cube shrinks to a sub-cube.
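The four moves can be sketched on a tiny cube held as a Python dict. This is only an illustration under assumed data: the cell values, the `DIMS` tuple, and the function names `roll_up`, `slice_`, and `dice` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, month, city, category) -> sales.
DIMS = ("year", "month", "city", "category")
cube = {
    (2023, 12, "Bangalore", "Electronics"): 70,
    (2024, 1, "Bangalore", "Electronics"): 120,
    (2024, 1, "Bangalore", "Grocery"): 30,
    (2024, 2, "Mumbai", "Electronics"): 80,
    (2024, 2, "Bangalore", "Electronics"): 50,
}

def roll_up(cells, keep):
    """Sum away every dimension not listed in `keep` (coarser, fewer rows)."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dim, value):
    """Fix one dimension to a single value; that dimension drops out of the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice(cells, **filters):
    """Keep the cells that match every filter; the result is a smaller sub-cube."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] == want for d, want in filters.items())}

by_year = roll_up(cube, keep=("year",))            # roll-up to the year level
by_month = roll_up(cube, keep=("year", "month"))   # drill-down: the finer view of by_year
only_2024 = slice_(cube, "year", 2024)             # slice: keys now have three dimensions
sub_cube = dice(cube, year=2024, city="Bangalore", category="Electronics")
```

Drill-down is not a separate function here: it is simply asking `roll_up` to keep one more level, which is only possible because the finer cells were stored in the first place.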

Measure types — what you can pre-compute, and what you cannot

Here is the single most-tested idea in the lesson: not every aggregate can be rebuilt from sub-aggregates. Some can, some cannot, and three categories sort all of them.

Distributive — the aggregate over the whole equals an aggregate of the parts. The famous four are SUM, COUNT, MAX, MIN. Daily sums add into a monthly sum; monthly sums add into a yearly sum. Once the lower totals are stored, no raw rows are needed.
Algebraic — computable from a fixed number of distributive sub-aggregates. AVG = SUM ÷ COUNT, so store a SUM and a COUNT per month and you recover the year’s average exactly. Standard deviation is the same idea (it needs SUM, COUNT, and SUM of squares).
Holistic — needs every underlying row; no bounded set of sub-totals will do. MEDIAN, MODE, rank, percentiles. The median of twelve monthly medians is simply not the median of the year.

This is precisely why a warehouse pre-materialises SUMs and COUNTs at every level of the hierarchy — and never pre-materialises medians.
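A minimal sketch of why storing distributive pieces is enough for algebraic measures. It assumes each sub-group keeps three numbers (COUNT, SUM, SUM of squares); the names `partial`, `merge`, and `mean_and_stddev` are illustrative, and the standard deviation here is the population form.

```python
import math
import statistics

def partial(values):
    """Distributive pieces stored per sub-group: COUNT, SUM, SUM of squares."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Distributive pieces combine by plain addition (sub-counts are summed, not counted)."""
    return tuple(x + y for x, y in zip(a, b))

def mean_and_stddev(p):
    """Algebraic measures derived from the merged pieces (population stddev)."""
    n, s, ss = p
    mean = s / n
    variance = ss / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error

jan = [10.0, 20.0, 30.0]
feb = [40.0]
mean, std = mean_and_stddev(merge(partial(jan), partial(feb)))

# Cross-check against a direct computation over the raw rows.
print(mean, std, statistics.pstdev(jan + feb))
```

The sum-of-squares formula can lose precision on large values with small spread, so treat this as a demonstration of the idea rather than a numerically robust routine.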

How GATE asks this

Usually an MCQ: “Classify AVG as distributive, algebraic, or holistic.” Or an MSQ that lists measures and asks which are distributive — and the answer is always the SUM/COUNT/MIN/MAX subset. Sometimes an MSQ asks which OLAP operations move you up the hierarchy versus down it.

Worked example — classify three aggregates

A retail warehouse pre-computes monthly aggregates per store. You want annual figures per store (a total, an average, a median) without re-scanning the daily rows. Which measures can you recover from the monthly aggregates, and how?

Take them one at a time, watching what each needs.

SUM(sales)    → distributive.  Add the twelve monthly sums per store. Annual total, exact.

AVG(sales)    → algebraic.     Keep monthly SUM and monthly COUNT separately, then
                               annual AVG = (sum of 12 monthly sums) / (sum of 12 monthly counts).
                               Two distributive sub-aggregates suffice — so algebraic, not distributive.

MEDIAN(sales) → holistic.      Twelve monthly medians do NOT combine into the annual median.
                               You need the daily rows back. Nothing smaller will do.

Notice why AVG is not distributive: averaging the twelve monthly averages, (avg_jan + ... + avg_dec) / 12, gives the wrong answer whenever the months differ in how many rows they hold. The honest route keeps SUM and COUNT apart and divides only at the level you finally want.
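A quick numeric check of both traps, using two made-up months with very different row counts (the data in `months` is invented for illustration):

```python
from statistics import mean, median

# Hypothetical months with very different row counts.
months = {
    "jan": [10, 10, 10, 10],
    "feb": [100],
}
all_rows = [v for rows in months.values() for v in rows]

true_avg = mean(all_rows)                                      # 140 / 5 = 28
avg_of_avgs = mean([mean(rows) for rows in months.values()])   # (10 + 100) / 2 = 55
rebuilt_avg = (sum(sum(rows) for rows in months.values())
               / sum(len(rows) for rows in months.values()))   # 140 / 5 = 28

true_median = median(all_rows)                                 # middle of 10,10,10,10,100 = 10
median_of_medians = median([median(rows) for rows in months.values()])  # (10 + 100) / 2 = 55

print(true_avg, avg_of_avgs, rebuilt_avg)
print(true_median, median_of_medians)
```

Keeping SUM and COUNT apart rebuilds the true average exactly, while no arithmetic on the two monthly medians can recover the true median.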

In one breath

A concept hierarchy (day→month→quarter→year) lets a warehouse summarise at many grains, and the four OLAP operations roam those grains — roll-up climbs, drill-down descends, slice fixes one dimension, dice filters several; meanwhile a measure is distributive if the whole equals an aggregate of its parts (SUM/COUNT/MIN/MAX), algebraic if a fixed handful of distributive pieces rebuild it (AVG from SUM and COUNT), or holistic if nothing short of every raw row will do (MEDIAN, MODE, percentiles).

Practice

Quick check


Q1. Recall: Which measures are DISTRIBUTIVE? (select all that apply)

Q2. Recall: Which OLAP operations move you UP a concept hierarchy (to a coarser summary)? (select all that apply)

Q3. Apply: Classify MEDIAN(sales): distributive, algebraic, or holistic?

Q4. Apply: Classify AVG(sales): distributive, algebraic, or holistic?

Q5. Apply: Filtering on Year = 2024 AND City = Bangalore AND Category = Electronics simultaneously is which OLAP operation?

Q6. Trace: A warehouse pre-computes the monthly SUM and monthly COUNT of sales per store. From these alone, which aggregates can you exactly recover for the YEAR per store? (select all that apply)

Q7. Create: COUNT(DISTINCT customer), the number of unique customers, was not in the worked example. To get the year's distinct-customer count, can you just add up the twelve monthly distinct-customer counts? What does that make the measure?

A question to carry forward

That closes the database chapter. Step back and see what every lesson in it shared: you recorded facts, shaped them, stored them, and queried them — sum this, average that, roll the past up to the year. Every question, from a simple SELECT to a diced sub-cube, asked the same kind of thing: what already happened?

But a warehouse full of last decade’s sales is only worth so much if all it can do is report the past. The far more valuable question is the one no query can answer: what will happen next? Given a house’s size and locality, what price will it fetch — a figure that exists in no row anywhere? Answering that means learning a pattern from the data and turning it on a future you have never seen. Here is the thread onward into a new chapter: how does a machine learn such a pattern from examples, what are the two great families of that learning, and what one discipline keeps its predictions honest?
