datarekha
Pandas & Data Wrangling Medium Asked at GoogleAsked at MetaAsked at MicrosoftAsked at Amazon

What is the difference between merge, join, and concat in pandas?

The short answer

concat stacks DataFrames along an axis without matching keys; join aligns on the index (or a single key column) using a convenient shorthand; merge is the most general, joining on any column(s) with full SQL-style control over the join type, key names, and suffix handling.

How to think about it

What the interviewer wants to know

All three combine DataFrames, but they answer different questions. concat asks “stack these top-to-bottom (or side-by-side)”. merge asks “align rows where this column’s values match”. join is a convenience alias for merge when you want to align on the index. A strong answer names each one’s sweet spot and its failure mode.

The three tools, in plain English

concat — stack without matching. No key alignment. pandas lines up columns by name and fills gaps with NaN:

pd.concat([df1, df2])               # row-wise: stack vertically (axis=0)
pd.concat([df1, df2], axis=1)       # column-wise: side by side
pd.concat([df1, df2], ignore_index=True)  # reset to 0, 1, 2, ...

Use case: appending monthly export files, or adding a new set of features alongside existing ones.

merge — SQL-style join on any column(s). The most powerful and explicit option:

orders.merge(customers, on="customer_id", how="left")

# Different column names on each side
orders.merge(customers, left_on="cust_id", right_on="id")

# Multi-key join
df1.merge(df2, on=["year", "region"])

join — shorthand for index-on-index (or index-on-column) merge. Cleaner syntax when you have already set meaningful indexes:

df1.join(df2, how="left")   # aligns on index of both

# Join on a specific column of the right frame
df1.join(df2.set_index("customer_id"), on="cust_id")

join is implemented as a thin wrapper around merge — there is no performance difference.

Decision table

ScenarioTool
Append rows from multiple DataFramesconcat(axis=0)
Add columns side-by-sideconcat(axis=1)
Join on arbitrary column(s)merge
Join on indexjoin or merge(left_index=True, right_index=True)
Join where column names differmerge(left_on=..., right_on=...)

Playground — see all three in action

The key insight

The choice is almost always between concat (when you want to stack) and merge (when you want to align on keys). join is merge with a prettier syntax — use it when both DataFrames already have the right index set.

concat is deceptively powerful: it handles schema mismatches gracefully (columns not in both DataFrames get NaN), but that same flexibility can hide problems — a misnamed column in one batch will create two separate NaN-filled columns instead of merging cleanly.

Learn it properly Merge & join

Keep practising

All Pandas & Data Wrangling questions

Explore further

Skip to content