OLAP: Concept Hierarchies & Measures
Roll up sales to the year, drill back down to the day, and learn which aggregates you can merge from sub-totals — and which need every raw row.
What you'll learn
- Concept hierarchies — day → month → quarter → year, city → state → country
- OLAP operations: roll-up, drill-down, slice, dice
- Distributive measures (SUM, COUNT, MIN, MAX) merge cleanly from sub-aggregates
- Algebraic measures (AVG, stddev) reduce to a fixed set of distributive ones
- Holistic measures (MEDIAN, MODE, percentiles) need every underlying row
Before you start
Picture a sales dashboard. You start at “total sales by day for the last month.” You zoom out — “by month for the last year.” Zoom out again — “by year for the last decade.” Then you fix one slice — “just the Bangalore store” — and compare it across years.
Every one of those moves is a named OLAP operation, and they all rely on two ideas: a concept hierarchy telling the cube how to summarise (day rolls up to month, city rolls up to state), and a measure type telling it which aggregations are safe to combine across sub-groups. That last point matters more than it sounds — get it wrong and you’ll compute an “average of averages” that does not equal the true average.
Concept hierarchies — levels of summary
A concept hierarchy is just a chain saying “this level rolls up to that level.” A few classics:
- Time: day → month → quarter → year
- Location: city → state → country
- Product: SKU → brand → category
These let the warehouse pre-build aggregates at every level. Asking for “sales by year” then becomes a one-row lookup, not a scan of every transaction.
The four OLAP operations
- Roll-up — go up the hierarchy: aggregate sales from month to quarter to year. Coarser, fewer rows.
- Drill-down — go down: split year sales back into months. Finer, more rows.
- Slice — fix one dimension to a single value: “only 2024.” The cube drops a dimension.
- Dice — apply filters on several dimensions at once: “2024 AND Bangalore AND Electronics.” The cube shrinks to a sub-cube.
Measure categorisation — the rule that decides what you can pre-compute
The single most-tested idea here: not every aggregate can be combined from sub-aggregates. Some can. Some cannot. Three categories cover all of them.
- Distributive — the aggregate over the whole equals an aggregate of the sub-aggregates. The big four: SUM, COUNT, MAX, MIN. Daily sales sums add up to the monthly sum; monthly sums add up to the annual sum. No raw rows needed once the lower-level totals are stored.
- Algebraic — computable from a fixed number of distributive sub-aggregates. AVG = SUM / COUNT — store the sum and the count per month, recover the year’s average exactly. Standard deviation is similar (needs SUM, COUNT, SUM of squares).
- Holistic — needs every underlying row; no bounded set of sub-totals is enough. MEDIAN, MODE, rank, percentiles. The median of twelve monthly medians is not the median of the year.
This is exactly why warehouses pre-materialise SUMs and COUNTs at every hierarchy level — and don’t pre-materialise medians.
How GATE asks this
Usually an MCQ: “Classify AVG as distributive, algebraic, or holistic.” Or an MSQ listing measures and asking which are distributive — the answer is always the SUM/COUNT/MIN/MAX subset. Occasionally an MSQ on which OLAP operations move you up vs down the hierarchy.
Worked example — classify three aggregates
A retail warehouse pre-computes monthly aggregates per store. You want annual totals per store without re-scanning the daily rows. Which measures can you recover, and how?
- SUM(sales) — distributive. Add the twelve monthly sums per store. Annual total recovered exactly.
- AVG(sales) — algebraic. Store monthly SUM and monthly COUNT separately;
then annual
AVG = (sum of twelve monthly sums) / (sum of twelve monthly counts). Two distributive sub-aggregates suffice, so it’s algebraic — not distributive on its own (because(avg_jan + avg_feb + ... + avg_dec) / 12generally does NOT equal the true annual average when month lengths differ). - MEDIAN(sales) — holistic. You cannot combine twelve monthly medians into the annual median; for that you need the daily rows. Holistic measures must be recomputed from raw data, which is why warehouses do not pre-store them per hierarchy level.
Quick check
Quick check
Practice this in an interview
All questionsAll three are extensions to GROUP BY that produce multiple levels of aggregation in a single query. ROLLUP produces hierarchical subtotals, CUBE produces all possible subtotal combinations, and GROUPING SETS lets you specify exactly which grouping combinations you want.
OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
OLTP (Online Transaction Processing) systems handle high-throughput, low-latency reads and writes for individual records — think order placement, user authentication. OLAP (Online Analytical Processing) systems handle complex aggregations over millions of rows for business intelligence. Running heavy analytics directly on an OLTP database locks rows, competes for I/O, and slows application queries that customers feel.
Columnar storage colocates values from the same column on disk, so aggregation queries read only the columns they need rather than full rows — dramatically reducing I/O on wide tables. Partitioning physically separates data into subdirectories (e.g., by date), allowing the query engine to skip entire partitions whose predicate cannot match, cutting scan volume from the full table to just the relevant slice.