SQL Medium Asked at AmazonAsked at GoogleAsked at UberAsked at Spotify

What is table partitioning and when does it improve query performance?

The short answer

Partitioning divides a large table into smaller physical segments (partitions) based on a column value, so the planner can skip irrelevant partitions entirely — a technique called partition pruning. It improves performance for queries that filter on the partition key, and it simplifies bulk data management tasks like dropping old data by dropping a partition instead of issuing a slow DELETE.

How to think about it

Partitioning is a physical storage optimisation, not a logical schema change. From the application’s perspective the table looks and behaves identically; behind the scenes the engine reads only the partition(s) that can possibly contain the matching rows — a technique called partition pruning.

What is really being tested

The interviewer wants to know three things: what partitioning is, when it actually helps (and when it does not), and how you would manage data lifecycle with it. Candidates who only answer “it makes queries faster” miss two thirds of the question.

Partition strategies

Strategy	How rows are assigned	Best for
Range	Based on ranges of a column value	Time-series data (by month / year)
List	Based on discrete values	Region, status, tenant ID
Hash	By hash of the partition key	Uniform distribution when no natural range exists

How partition pruning works

Pruning visualised

Table: events (100 M rows across 12 monthly partitions)

Query: WHERE ts BETWEEN '2024-01-15' AND '2024-02-15'

Partitions evaluated:
  ✓  events_2024_jan   ← touched
  ✓  events_2024_feb   ← touched
  ✗  events_2024_mar   ← skipped
  ✗  events_2024_apr   ← skipped
  ... (8 more skipped)

Result: ~8 M rows read instead of 100 M

Range partitioning in practice (PostgreSQL)

CREATE TABLE events (
  id         BIGSERIAL,
  user_id    INT,
  event_type TEXT,
  ts         TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (ts);

CREATE TABLE events_2024_q1 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE events_2024_q2 PARTITION OF events
  FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- The planner reads only events_2024_q1; all other partitions are skipped
SELECT COUNT(*) FROM events
WHERE ts BETWEEN '2024-01-15' AND '2024-02-15';

You can confirm pruning with EXPLAIN:

EXPLAIN SELECT COUNT(*) FROM events
WHERE ts >= '2024-01-01' AND ts < '2024-04-01';
-- Plan output: "Partitions: events_2024_q1"

Dropping old data is instant

This is the operational win that interviewers love to hear about. Deleting millions of rows via DELETE is slow and write-logs every row. Dropping a partition is a metadata operation — effectively instant.

-- Slow: row-by-row delete with logging
DELETE FROM events WHERE ts < '2022-01-01';

-- Fast: drop the whole partition as a unit
DROP TABLE events_2022_q1;

-- Or detach it first (keeps data accessible, removes from parent)
ALTER TABLE events DETACH PARTITION events_2022_q1;

When partitioning does NOT help

Queries that do not filter on the partition key: all partitions are scanned regardless.
Tables with fewer than a few million rows: partition metadata overhead outweighs any benefit.
Cloud columnar warehouses with native clustering (BigQuery, Snowflake): they use clustering keys and micro-partition metadata for a similar effect; declarative partitioning is less critical there.

Indexes on partitioned tables

In PostgreSQL, indexes live on each partition individually. Creating an index on the parent table (CREATE INDEX ON events (user_id)) automatically cascades to all existing and future child partitions. If you add the index after some partitions already exist, those partitions are re-indexed automatically — but it locks them during the build.

Learn it properly Columnar Storage & Parquet