datarekha
SQL Medium Asked at AmazonAsked at GoogleAsked at UberAsked at Spotify

What is table partitioning and when does it improve query performance?

The short answer

Partitioning divides a large table into smaller physical segments (partitions) based on a column value, so the planner can skip irrelevant partitions entirely — a technique called partition pruning. It improves performance for queries that filter on the partition key, and it simplifies bulk data management tasks like dropping old data by dropping a partition instead of issuing a slow DELETE.

How to think about it

Partitioning is a physical storage optimisation, not a logical schema change. From the application’s perspective the table looks and behaves identically; behind the scenes the engine reads only the partition(s) that can possibly contain the matching rows — a technique called partition pruning.

What is really being tested

The interviewer wants to know three things: what partitioning is, when it actually helps (and when it does not), and how you would manage data lifecycle with it. Candidates who only answer “it makes queries faster” miss two thirds of the question.

Partition strategies

StrategyHow rows are assignedBest for
RangeBased on ranges of a column valueTime-series data (by month / year)
ListBased on discrete valuesRegion, status, tenant ID
HashBy hash of the partition keyUniform distribution when no natural range exists

How partition pruning works

Range partitioning in practice (PostgreSQL)

CREATE TABLE events (
  id         BIGSERIAL,
  user_id    INT,
  event_type TEXT,
  ts         TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (ts);

CREATE TABLE events_2024_q1 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE events_2024_q2 PARTITION OF events
  FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- The planner reads only events_2024_q1; all other partitions are skipped
SELECT COUNT(*) FROM events
WHERE ts BETWEEN '2024-01-15' AND '2024-02-15';

You can confirm pruning with EXPLAIN:

EXPLAIN SELECT COUNT(*) FROM events
WHERE ts >= '2024-01-01' AND ts < '2024-04-01';
-- Plan output: "Partitions: events_2024_q1"

Dropping old data is instant

This is the operational win that interviewers love to hear about. Deleting millions of rows via DELETE is slow and write-logs every row. Dropping a partition is a metadata operation — effectively instant.

-- Slow: row-by-row delete with logging
DELETE FROM events WHERE ts < '2022-01-01';

-- Fast: drop the whole partition as a unit
DROP TABLE events_2022_q1;

-- Or detach it first (keeps data accessible, removes from parent)
ALTER TABLE events DETACH PARTITION events_2022_q1;

When partitioning does NOT help

  • Queries that do not filter on the partition key: all partitions are scanned regardless.
  • Tables with fewer than a few million rows: partition metadata overhead outweighs any benefit.
  • Cloud columnar warehouses with native clustering (BigQuery, Snowflake): they use clustering keys and micro-partition metadata for a similar effect; declarative partitioning is less critical there.

Indexes on partitioned tables

In PostgreSQL, indexes live on each partition individually. Creating an index on the parent table (CREATE INDEX ON events (user_id)) automatically cascades to all existing and future child partitions. If you add the index after some partitions already exist, those partitions are re-indexed automatically — but it locks them during the build.

Learn it properly Columnar Storage & Parquet

Keep practising

All SQL questions

Explore further

Skip to content