Pandas & Data Wrangling | Medium | Asked at Amazon, Meta, Microsoft, Walmart

What is the difference between pivot, pivot_table, and melt in pandas, and when do you use each?

For: Data Analyst, Data Scientist, Data Engineer

The short answer

pivot reshapes long-format data to wide by spreading one column's values into new column headers. It requires every index/column combination to be unique and does no aggregation. pivot_table is the aggregating version: it handles duplicate combinations by combining them with aggfunc (the mean unless you specify otherwise). melt is the inverse: it takes wide-format data and collapses multiple columns into key-value rows (long format).

How to think about it

The interviewer is testing whether you treat data shape as a deliberate choice, not a side-effect of how the data happened to arrive. The mental model: wide format has one row per entity and one column per variable (for example, a column per month); long/tidy format has one row per (entity, variable) pair. melt goes wide → long; pivot and pivot_table go long → wide. Bonus points for naming up front that pivot raises a ValueError on duplicate keys: that signals you’ve hit it in practice.

A worked example — the round trip

melt collapses the month columns into rows, then pivot spreads them back:

import pandas as pd

wide = pd.DataFrame({"city": ["NYC", "LA", "Chicago"],
                     "jan": [3, 9, 5], "feb": [5, 8, 6], "mar": [4, 10, 7]})

long = wide.melt(id_vars="city", var_name="month", value_name="sales")
print(long)

      city month  sales
0      NYC   jan      3
1       LA   jan      9
2  Chicago   jan      5
3      NYC   feb      5
4       LA   feb      8
5  Chicago   feb      6
6      NYC   mar      4
7       LA   mar     10
8  Chicago   mar      7

Nine tidy rows — exactly the shape groupby, plotting, and ML pipelines expect. pivot reverses it, and because every (city, month) pair is unique it just succeeds:

back = long.pivot(index="city", columns="month", values="sales")
print(back)

month    feb  jan  mar
city
Chicago    6    5    7
LA         8    9   10
NYC        5    3    4
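As a sanity check on the round trip, the sketch below puts the pivoted frame back into the row and column order of the original and compares the two. The name `restored` and the hard-coded month list are illustrative choices, not part of the example above; this is one way to do the comparison, not the only one.

```python
import pandas as pd

wide = pd.DataFrame({"city": ["NYC", "LA", "Chicago"],
                     "jan": [3, 9, 5], "feb": [5, 8, 6], "mar": [4, 10, 7]})

long = wide.melt(id_vars="city", var_name="month", value_name="sales")
back = long.pivot(index="city", columns="month", values="sales")

# Re-impose the original city and month order on the pivoted frame.
restored = back.reindex(index=wide["city"].tolist(),
                        columns=["jan", "feb", "mar"])

# Drop the "month" label that pivot leaves on the column axis,
# then turn the city index back into an ordinary column.
restored.columns.name = None
restored = restored.reset_index()

# Raises AssertionError if the round trip changed anything.
pd.testing.assert_frame_equal(restored, wide)
print(restored)
```

The reindex step is needed only because the comparison is order-sensitive; the values themselves survive melt followed by pivot unchanged.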

(pandas sorts the resulting index and columns, so cities and months come back alphabetical.) The catch is duplicates: if two rows shared a (city, month) key, pivot couldn’t pick one value and would raise. That’s where pivot_table earns its place — it aggregates the collision:

sales = pd.DataFrame({"date": ["2024-Q1","2024-Q1","2024-Q1","2024-Q2","2024-Q2"],
                      "region": ["East","East","West","East","West"],
                      "amount": [100, 150, 200, 130, 210]})
print(sales.pivot_table(index="date", columns="region", values="amount", aggfunc="sum", fill_value=0))

region   East  West
date
2024-Q1   250   200
2024-Q2   130   210

The two Q1-East rows (100 + 150) collapse to 250 under aggfunc="sum". This is exactly the case where plain pivot would raise a ValueError instead of returning a result.
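To see the failure mode directly, the sketch below runs plain pivot on the same sales frame and catches the error, after first listing the colliding keys with `duplicated`. It also calls pivot_table without an aggfunc to show that the default is the mean, not the sum. The variable names `dupes`, `pivot_failed`, and `default_agg` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame({"date": ["2024-Q1", "2024-Q1", "2024-Q1", "2024-Q2", "2024-Q2"],
                      "region": ["East", "East", "West", "East", "West"],
                      "amount": [100, 150, 200, 130, 210]})

# List every row whose (date, region) key appears more than once.
dupes = sales[sales.duplicated(subset=["date", "region"], keep=False)]
print(dupes)

# Plain pivot refuses to reshape when keys collide.
try:
    sales.pivot(index="date", columns="region", values="amount")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# With no aggfunc, pivot_table averages the colliding rows,
# so Q1-East becomes (100 + 150) / 2 rather than 250.
default_agg = sales.pivot_table(index="date", columns="region", values="amount")
print(default_agg)
```

Checking for duplicate keys before reshaping is a cheap habit: it tells you whether pivot is safe, and if it is not, it makes you choose the aggregation deliberately instead of accepting the default.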
