SQL Easy Asked at AmazonAsked at GoogleAsked at MetaAsked at Microsoft

What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?

For Data Engineer Data Analyst Data Scientist

The short answer

OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.

How to think about it

What the question is really testing

This is a systems design question at heart. The interviewer wants to know whether you understand why different workloads call for different storage engines, schemas, and indexing strategies — and whether you can translate that into concrete architecture decisions.

The core contrast

Think of OLTP as the store’s cash register — many small, fast transactions happening all day. OLAP is the monthly financial report — infrequent but touching every transaction ever recorded.

Dimension	OLTP	OLAP
Query pattern	Point read/write, few rows	Aggregations over millions of rows
Concurrency	Thousands of simultaneous small transactions	Tens of large queries
Schema style	Normalized (3NF), row-oriented	Star/snowflake, columnar
Latency target	Milliseconds	Seconds to minutes
Index strategy	Many selective indexes	Fewer, but covering or bitmap
Storage layout	Row store (InnoDB, PostgreSQL heap)	Column store (BigQuery, Redshift, Parquet)
Key concern	Write throughput, ACID, lock contention	Read throughput, compression, join strategy

Why these design choices make sense

OLTP: row storage + normalization

An OLTP transaction typically touches one or a few rows. Row storage puts all columns of a row adjacent on disk — perfect for point lookups and updates.

Normalization protects write correctness: update a customer’s city once, in one row:

UPDATE customers SET city = 'Boston' WHERE id = 9901;

In a denormalized schema, city might be duplicated across millions of order rows — a single address change requires updating every one of them.

OLAP: column storage + denormalization

An OLAP query typically scans one or two columns across millions of rows:

SELECT region, SUM(revenue) FROM fact_sales GROUP BY region;

A row store reads every column of every qualifying row — customer_name, product_desc, address — even though the query only needs region and revenue. A column store reads only those two columns, typically 10–50x less I/O. Homogeneous data in each column also compresses extremely well (run-length encoding, dictionary encoding).

The mental model — data flow between OLTP and OLAP

PostgreSQL / MySQL     →  CDC or batch ETL/ELT  →  Snowflake / BigQuery
(OLTP source of truth)    (Fivetran, Debezium)      (OLAP analytical layer)

normalized rows              transform                  star schema,
row-oriented store           & load                     columnar store

The ETL/ELT bridge

Production systems almost always have both layers. Data flows from the OLTP source into the OLAP warehouse via Change Data Capture (CDC) or batch ETL/ELT pipelines. The warehouse holds a denormalized copy optimized for analytical read patterns.

Interview framing

The question is really asking whether you understand the read/write optimization trade-off. Show that you know the architectural implication: OLTP normalization protects data integrity under concurrent writes; OLAP denormalization and columnar storage optimize scan throughput for analytics. Then mention the standard pipeline that bridges them.

Learn it properly OLTP vs OLAP