datarekha

OLTP vs OLAP

Why you never point your dashboards at the production app database. The two database workload archetypes, the storage-layout mechanism that drives the whole split, and the ETL/CDC bridge between them.

8 min read Intermediate SQL Lesson 19 of 27

What you'll learn

  • The two workload archetypes — many tiny transactions vs few huge scans
  • Why storage layout (row vs column) is the root cause of everything else
  • Why analytics on the production box starves the live app
  • The ETL vs ELT bridge and zero-ETL / CDC replication
  • Where HTAP fits — and why it didn't replace the two-system pattern

Before you start

Almost every data system you will ever touch falls into one of two camps, defined entirely by the shape of the work it does.

OLTP — Online Transaction Processing. Thousands of tiny concurrent transactions: read one user, update one order, insert one row. This is the database behind the live application. It is normalized, indexed for point lookups, and fully transactional (ACID). Think PostgreSQL, MySQL, SQL Server, Oracle.

OLAP — Online Analytical Processing. A handful of enormous read-only queries that scan millions-to-billions of historical rows to produce an aggregate: total revenue by region, daily active users by cohort. This is the analytics / BI database. Think Snowflake, BigQuery, Amazon Redshift, ClickHouse, DuckDB.

OLTPOLAP
Workloadmany tiny reads/writesfew huge aggregate scans
Reads vs writesmixed, high write fractionread-mostly
Storage layoutrow-orientedcolumn-oriented
Schemanormalized (e.g. 3NF)denormalized / star schema
IndexingB-tree / hash for point lookupssparse + data-skipping, materialized views
Consistencyfull ACID, immediateACID at write, lagged reads often fine
Servesthe live appdashboards + BI
ExamplesPostgres, MySQL, OracleSnowflake, BigQuery, Redshift, ClickHouse, DuckDB

The one decision that drives everything: row vs column

Every difference in that table is downstream of a single physical choice: how the bytes are laid out on disk.

A row store keeps all the columns of a single row physically contiguous — one customer’s id, name, region, and balance sit together on one page. A column store keeps each column’s values contiguous in its own block — all the regions together, all the amounts together, across millions of rows.

Row storea row’s columns sit togetherid•region•amount•status ← row 1id•region•amount•status ← row 2id•region•amount•status ← row 3✓ fetch one whole row: cheap✗ sum one column: read everythingColumn storea column’s values sit togetheridregionamountstatus✓ sum one column: read just it✗ write one row: touch every block
Same table, two physical layouts. The layout — not the SQL — decides who is fast.

That one decision cascades:

Why the row store wins OLTP. Fetching one customer by primary key is a single B-tree walk plus one page read, because every field of that row is already adjacent. B-tree lookups are O(log n) — finding one row in ~100M records traverses only ~3-4 tree levels. Inserting, updating, or deleting one row touches one place. This is single-row locality + point-index seeks, and it is exactly what a live app does all day.

Why the column store wins OLAP. Three mechanisms compound:

  1. Column pruning (projection). An aggregate over one or two columns reads only those columns’ blocks and skips every other column entirely. A row store must read every row — all columns — just to reach the few it needs.
  2. Compression. A column holds values of a single type and domain, so same-type adjacency compresses far harder than mixed-type rows. Analytical compression is roughly 5-20x, and 30x+ on low-cardinality columns (order-of-magnitude, dataset-dependent). Fewer bytes on disk means fewer bytes to scan.
  3. Vectorized execution. Contiguous same-type arrays let the engine process batches of thousands of values with SIMD / CPU-vector instructions and tight cache locality, crushing SUM / COUNT / AVG / GROUP BY.

The aha from the widget: the SQL barely changes; the storage layout flips the winner. Columnar is not “just faster everywhere” — it is poor at single-row point lookups and high-frequency single-row writes, because one insert must touch every column’s separate structure (write fan-out across N column files), and updating compressed blocks usually means append/rewrite rather than in-place edits. Great scans, poor point writes.

Why analytics on the OLTP box is harmful

So why not just run the big aggregate on production? Even under MVCC — where Postgres and MySQL/InnoDB readers generally do not block writers — a long analytical query on the live database hurts in ways that have nothing to do with classic lock blocking:

One big scan on the production box• Full-scan costrow store reads every row; scan time grows with table size• Buffer-pool pollutionthe scan evicts the hot OLTP working set from cache• Resource starvationCPU / memory / IO spent on the query is stolen from transactions• Blast radiusone runaway query can degrade or take down production
The damage is mostly resource contention and cache eviction — not lock blocking.

State the harm precisely. Long readers under MVCC do not usually lock out writers, but they do pin resources, hold long snapshots (bloat pressure), and can block schema or maintenance operations. The buffer-pool eviction and CPU/IO starvation are what actually slow your users. The standard fix is structural: isolate analytics on a separate system.

The bridge: ETL, ELT, and zero-ETL

If analytics lives in a separate columnar warehouse, something has to move the data there. That pipeline has evolved:

  • ETL — Extract, Transform, Load. Transform the data on a separate processing tier before loading it into the warehouse. The older pattern; it fits when cleansing must precede load or the target compute is weak.
  • ELT — Extract, Load, Transform. Load raw data into the warehouse first, then transform in-warehouse using its massively-parallel (MPP) compute. This is the modern default for cloud warehouses (Snowflake, BigQuery, Redshift), with dbt as the dominant in-warehouse transform tool.
  • Zero-ETL / CDC. The newest evolution streams committed OLTP changes (change data capture) to the warehouse with minimal or no pipeline — for example AWS Aurora-to-Redshift zero-ETL, or ClickHouse’s ClickPipes / PeerDB.

The destination shapes the data into a star schema — the standard OLAP dimensional model. One (or a few) central fact table holds the quantitative measures (sales amount, quantity) plus foreign keys, surrounded by dimension tables holding descriptive attributes (time, product, customer, geography). It deliberately denormalizes dimension attributes into wide tables, trading redundancy for fewer joins and faster reads — the exact opposite philosophy from normalized OLTP design, which minimizes redundancy to keep writes safe. (A snowflake schema re-normalizes those dimensions; Kimball generally advises against it because the extra joins can hurt read performance.)

What about HTAP — one database for both?

HTAP — Hybrid Transactional/Analytical Processing is the convergence story: run OLTP and OLAP on the same system and data without moving it. The term was coined by Gartner in 2014. Real systems include TiDB (a row store, TiKV, plus a columnar replica, TiFlash), SingleStore, SAP HANA, Oracle Database In-Memory, and the columnar-augmented OLTP engines MySQL HeatWave and Google AlloyDB.

It is tempting to declare the divide solved. It is not — and the honest trade-offs matter:

  • The data-format dilemma. You cannot simultaneously maximize performance isolation, data freshness, and workload-specific optimization. Something gives.
  • Dual-format designs pay a write-amplification tax. Every transaction writes the row store and a columnar copy, roughly doubling write IO and adding format-conversion CPU.
  • Single-format designs degrade one side. Research has measured analytical queries on a row store degrading ~50% when more than 95% of scanned data is unused, transactions on a column store degrading >20%, and OLTP throughput dropping up to ~42% from sharing hardware with analytics. (Workload-specific benchmark figures, not universal constants.)

The strongest market signal is that the industry chose separate engines over convergence. The two leading analytics platforms acquired dedicated Postgres OLTP engines rather than extend their columnar engines into HTAP: Snowflake bought Crunchy Data (announced June 2025), and Databricks bought Neon (May 2025). DuckDB’s rise — an embedded columnar OLAP engine, the “SQLite for analytics” — pushed analytics local and in-process for many workloads. The dominant production pattern is explicitly not one converged database: a row-store OLTP system (overwhelmingly Postgres) for the live app, plus a columnar OLAP system for analytics, connected by CDC / zero-ETL. Treat HTAP as a real-but-niche convergence attempt with honest caveats, not the winner.

Quick check

Quick check

0/3
Q1A column store crushes a SUM over 100M rows mainly because…
Q2Under MVCC (Postgres / InnoDB), what is the MAIN harm of running a giant analytical scan on the production OLTP database?
Q3TRANSFER: A startup runs everything on one Postgres box. Analysts keep firing huge GROUP BY reports that scan the whole orders table, and the live checkout page has started timing out during business hours. You can't change the app's queries. What is the architecturally correct fix?

Next

You now know why analytics data lives somewhere else and what that somewhere looks like — columnar, denormalized, fed by CDC. Next time you write a heavy GROUP BY, you will know which kind of engine you actually want underneath it.

Practice this in an interview

All questions
What is the difference between OLTP and OLAP systems, and why can't you run analytics on your production database?

OLTP (Online Transaction Processing) systems handle high-throughput, low-latency reads and writes for individual records — think order placement, user authentication. OLAP (Online Analytical Processing) systems handle complex aggregations over millions of rows for business intelligence. Running heavy analytics directly on an OLTP database locks rows, competes for I/O, and slows application queries that customers feel.

What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?

OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.

What are the differences between a data warehouse, a data lake, and a data lakehouse?

A data warehouse stores structured, schema-on-write data optimized for SQL analytics but is expensive for raw or unstructured data. A data lake stores any format cheaply on object storage but lacks ACID transactions and query performance. A lakehouse layers open table formats (Delta Lake, Iceberg, Hudi) on object storage to deliver warehouse-grade performance and ACID semantics at data lake costs — it is the dominant architecture in 2026.

What is the difference between ETL and ELT, and when should you choose each?

ETL transforms data before loading it into the destination, which was necessary when warehouses were expensive and compute-constrained. ELT loads raw data first and transforms inside the warehouse, leveraging cheap cloud compute and making raw data available for reprocessing. ELT is the default in modern cloud stacks; ETL still makes sense when you must mask sensitive fields before they ever land in the warehouse.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content