Pandas & Data Wrangling | Medium | Asked at Google, Meta, Microsoft, Amazon

What is the difference between merge, join, and concat in pandas?

For: Data Analyst, Data Scientist, ML Engineer, Data Engineer

The short answer

concat stacks DataFrames along an axis without matching on key columns; join aligns on the index (or on key columns of the calling frame against the other frame’s index) as a convenient shorthand; merge is the most general, joining on any column(s) or index with full SQL-style control over the join type, key names, and suffix handling.

How to think about it

All three combine DataFrames, but they answer different questions. concat asks “stack these top-to-bottom (or side-by-side)”, with no matching on key columns. merge asks “align rows where this column’s values match”, SQL-style. join is a convenience wrapper around merge that aligns on the index. Name each one’s sweet spot and you’ve answered well.
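One subtlety in the side-by-side case is worth a quick sketch: with axis=1, concat does not match on a key column, but it does line rows up by index label rather than by row position. The features and labels frames below are made-up illustrations with deliberately mismatched indexes.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
features = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
labels = pd.DataFrame({"y": ["a", "b", "c"]}, index=[1, 2, 3])

# axis=1 places columns side by side, aligning rows on index labels.
# The default join="outer" keeps every label and fills gaps with NaN.
side_by_side = pd.concat([features, labels], axis=1)
print(side_by_side)

# To glue strictly by row position, reset both indexes first.
by_position = pd.concat(
    [features.reset_index(drop=True), labels.reset_index(drop=True)],
    axis=1,
)
print(by_position)
```

The first result has four rows with missing values where a label exists in only one frame; the second has three fully populated rows. If the frames are meant to pair up row by row, checking that their indexes agree before concatenating avoids the silent NaN padding.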

A worked example — all three side by side

concat stacks without matching, lining up columns by name:

import pandas as pd

jan = pd.DataFrame({"month": ["Jan", "Jan"], "sales": [100, 200]})
feb = pd.DataFrame({"month": ["Feb", "Feb"], "sales": [150, 90]})
print(pd.concat([jan, feb], ignore_index=True))     # vertical stack

  month  sales
0   Jan    100
1   Jan    200
2   Feb    150
3   Feb     90

That’s appending monthly exports: pure stacking, no key. pd.concat([features, labels], axis=1) instead glues columns side-by-side, lining rows up by index label rather than by position, so the two frames should share an index. merge is the SQL join on a column, even when the key names differ:

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust_id": [10, 20, 10], "amount": [50, 200, 75]})
customers = pd.DataFrame({"id": [10, 20, 30], "name": ["Alice", "Bob", "Carol"]})
print(orders.merge(customers, left_on="cust_id", right_on="id", how="left"))

   order_id  cust_id  amount  id   name
0         1       10      50  10  Alice
1         2       20     200  20    Bob
2         3       10      75  10  Alice

Each order is matched to its customer by value (cust_id ↔ id), and Alice appears twice because she has two orders — that’s key alignment, not stacking. join is the same machine when both frames already carry the right index:

info  = pd.DataFrame({"city": ["NY", "LA", "SF"]}, index=[10, 20, 30])
score = pd.DataFrame({"score": [88, 72, 95]},     index=[10, 20, 30])
print(info.join(score))     # index-on-index shorthand for merge

   city  score
10   NY     88
20   LA     72
30   SF     95

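join aligns on the shared index with no on= needed. It is equivalent to merge(left_index=True, right_index=True, how="left"): join defaults to a left join while merge defaults to inner, so how must be spelled out for the two to match. The syntax is shorter and performance is essentially the same. So the real choice is almost always concat (stack) vs merge (align on keys); join is merge for pre-indexed frames.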
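The relationship between join and merge can be checked directly. This is a minimal sketch that reuses the info frame from the example above and pairs it with a trimmed score frame; the missing row is an assumption added here so that the join type visibly matters.

```python
import pandas as pd

info = pd.DataFrame({"city": ["NY", "LA", "SF"]}, index=[10, 20, 30])
# Illustrative variant of score: the row for index 30 is missing.
score = pd.DataFrame({"score": [88, 72]}, index=[10, 20])

# join uses how="left" unless told otherwise.
joined = info.join(score)

# The same operation spelled out with merge.
merged_left = info.merge(score, left_index=True, right_index=True, how="left")

# merge without how= performs an inner join and drops index 30.
merged_default = info.merge(score, left_index=True, right_index=True)

print(joined.equals(merged_left))         # do the two results match?
print(len(joined), len(merged_default))   # row counts under left vs inner
```

When the indexes match exactly, as in the worked example, the left and inner results coincide, which is why the difference in defaults is easy to overlook.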
