What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?
OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
How to think about it
What the question is really testing
This is a systems design question at heart. The interviewer wants to know whether you understand why different workloads call for different storage engines, schemas, and indexing strategies — and whether you can translate that into concrete architecture decisions.
The core contrast
Think of OLTP as the store’s cash register — many small, fast transactions happening all day. OLAP is the monthly financial report — infrequent but touching every transaction ever recorded.
| Dimension | OLTP | OLAP |
|---|---|---|
| Query pattern | Point read/write, few rows | Aggregations over millions of rows |
| Concurrency | Thousands of simultaneous small transactions | Tens of large queries |
| Schema style | Normalized (3NF), row-oriented | Star/snowflake, columnar |
| Latency target | Milliseconds | Seconds to minutes |
| Index strategy | Many selective indexes | Fewer, but covering or bitmap |
| Storage layout | Row store (InnoDB, PostgreSQL heap) | Column store (BigQuery, Redshift, Parquet) |
| Key concern | Write throughput, ACID, lock contention | Read throughput, compression, join strategy |
Why these design choices make sense
OLTP: row storage + normalization
An OLTP transaction typically touches one or a few rows. Row storage puts all columns of a row adjacent on disk — perfect for point lookups and updates.
Normalization protects write correctness: update a customer’s city once, in one row:
UPDATE customers SET city = 'Boston' WHERE id = 9901;
In a denormalized schema, city might be duplicated across millions of order rows — a single address change requires updating every one of them.
OLAP: column storage + denormalization
An OLAP query typically scans one or two columns across millions of rows:
SELECT region, SUM(revenue) FROM fact_sales GROUP BY region;
A row store reads every column of every qualifying row — customer_name, product_desc, address — even though the query only needs region and revenue. A column store reads only those two columns, typically 10–50x less I/O. Homogeneous data in each column also compresses extremely well (run-length encoding, dictionary encoding).
The ETL/ELT bridge
Production systems almost always have both layers. Data flows from the OLTP source into the OLAP warehouse via Change Data Capture (CDC) or batch ETL/ELT pipelines. The warehouse holds a denormalized copy optimized for analytical read patterns.
Interview framing
The question is really asking whether you understand the read/write optimization trade-off. Show that you know the architectural implication: OLTP normalization protects data integrity under concurrent writes; OLAP denormalization and columnar storage optimize scan throughput for analytics. Then mention the standard pipeline that bridges them.