What is table partitioning and when does it improve query performance?
Partitioning divides a large table into smaller physical segments (partitions) based on a column value, so the planner can skip irrelevant partitions entirely — a technique called partition pruning. It improves performance for queries that filter on the partition key, and it simplifies bulk data management tasks like dropping old data by dropping a partition instead of issuing a slow DELETE.
How to think about it
Partitioning is a physical storage optimisation, not a logical schema change. From the application’s perspective the table looks and behaves identically; behind the scenes the engine reads only the partition(s) that can possibly contain the matching rows — a technique called partition pruning.
What is really being tested
The interviewer wants to know three things: what partitioning is, when it actually helps (and when it does not), and how you would manage data lifecycle with it. Candidates who only answer “it makes queries faster” miss two thirds of the question.
Partition strategies
| Strategy | How rows are assigned | Best for |
|---|---|---|
| Range | Based on ranges of a column value | Time-series data (by month / year) |
| List | Based on discrete values | Region, status, tenant ID |
| Hash | By hash of the partition key | Uniform distribution when no natural range exists |
How partition pruning works
Range partitioning in practice (PostgreSQL)
CREATE TABLE events (
id BIGSERIAL,
user_id INT,
event_type TEXT,
ts TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (ts);
CREATE TABLE events_2024_q1 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE events_2024_q2 PARTITION OF events
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- The planner reads only events_2024_q1; all other partitions are skipped
SELECT COUNT(*) FROM events
WHERE ts BETWEEN '2024-01-15' AND '2024-02-15';
You can confirm pruning with EXPLAIN:
EXPLAIN SELECT COUNT(*) FROM events
WHERE ts >= '2024-01-01' AND ts < '2024-04-01';
-- Plan output: "Partitions: events_2024_q1"
Dropping old data is instant
This is the operational win that interviewers love to hear about. Deleting millions of rows via DELETE is slow and write-logs every row. Dropping a partition is a metadata operation — effectively instant.
-- Slow: row-by-row delete with logging
DELETE FROM events WHERE ts < '2022-01-01';
-- Fast: drop the whole partition as a unit
DROP TABLE events_2022_q1;
-- Or detach it first (keeps data accessible, removes from parent)
ALTER TABLE events DETACH PARTITION events_2022_q1;
When partitioning does NOT help
- Queries that do not filter on the partition key: all partitions are scanned regardless.
- Tables with fewer than a few million rows: partition metadata overhead outweighs any benefit.
- Cloud columnar warehouses with native clustering (BigQuery, Snowflake): they use clustering keys and micro-partition metadata for a similar effect; declarative partitioning is less critical there.
Indexes on partitioned tables
In PostgreSQL, indexes live on each partition individually. Creating an index on the parent table (CREATE INDEX ON events (user_id)) automatically cascades to all existing and future child partitions. If you add the index after some partitions already exist, those partitions are re-indexed automatically — but it locks them during the build.