Pandas & Data Wrangling | Medium | Asked at Amazon, Uber, Airbnb

What is the difference between GroupBy transform and agg in pandas?

For: Data Analyst, Data Scientist, ML Engineer, Data Engineer

The short answer

agg reduces each group to a single value per aggregation, so the result has one row per group. transform returns a Series or DataFrame with the same index as the original, broadcasting the group-level result back to every row, which makes it ideal for adding derived columns without a merge.

How to think about it

This is an output-shape question, and the cleanest one-line answer is: agg reduces, transform preserves. agg collapses each group to a single value, so you end up with one row per group — a summary table. transform runs the same aggregation but then broadcasts the result back across every member of the group, so the output has the exact same shape and index as the input. That broadcast is the whole point: it is what lets you add a group statistic as a new column without ever doing a merge. Without transform, your only path is to agg down to one row per group and then merge that summary back on the group key — two steps and a join where transform does it in one.
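To make the two routes concrete, here is a minimal sketch comparing agg-then-merge with a single transform call. The four-row frame and the names summary, via_merge, and via_transform are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "HR", "HR"],
    "salary": [90_000, 110_000, 60_000, 80_000],
})

# Route 1: agg down to one row per group, then merge the summary back.
summary = (
    df.groupby("dept")["salary"]
      .agg("mean")
      .rename("dept_avg")
      .reset_index()
)
via_merge = df.merge(summary, on="dept", how="left")

# Route 2: transform broadcasts the group mean back in a single step.
via_transform = df.assign(
    dept_avg=df.groupby("dept")["salary"].transform("mean")
)

# Both routes produce the same dept_avg column.
print(via_merge["dept_avg"].equals(via_transform["dept_avg"]))
```

The merge route also resets the index on the way through, so on a frame with a meaningful index the transform route is the one that leaves it untouched.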

Same group, two shapes

Watch one dept grouping go both ways. agg returns three rows, one per department; transform returns six rows, the same count as the input, with each department’s average glued onto every employee in it.

import pandas as pd

df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "HR", "HR", "HR", "Sales"],
    "name":   ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"],
    "salary": [90_000, 110_000, 60_000, 70_000, 80_000, 55_000],
})

# agg — collapses each group to one row (3 rows out)
dept_summary = df.groupby("dept")["salary"].agg(
    avg_salary="mean", headcount="count",
).reset_index()
print("agg (one row per group):")
print(dept_summary)
print()

# transform — broadcasts the group average back to every row (6 rows out)
df["dept_avg"]   = df.groupby("dept")["salary"].transform("mean")
df["pct_of_avg"] = (df["salary"] / df["dept_avg"] * 100).round(1)
df["above_avg"]  = df["salary"] > df["dept_avg"]
print("transform (same shape, new columns added):")
print(df.to_string())

agg (one row per group):
    dept  avg_salary  headcount
0    Eng    100000.0          2
1     HR     70000.0          3
2  Sales     55000.0          1

transform (same shape, new columns added):
    dept   name  salary  dept_avg  pct_of_avg  above_avg
0    Eng  Alice   90000  100000.0        90.0      False
1    Eng    Bob  110000  100000.0       110.0       True
2     HR  Carol   60000   70000.0        85.7      False
3     HR   Dave   70000   70000.0       100.0      False
4     HR    Eve   80000   70000.0       114.3       True
5  Sales  Frank   55000   55000.0       100.0      False

The contrast is right there in the row counts. agg reduced HR’s three salaries to a single 70000.0, so the summary has three rows total. transform computed the same 70000.0 but wrote it onto all three HR rows, keeping the frame at six — which is exactly what makes pct_of_avg and above_avg line up row-for-row with each employee. Because the result is index-aligned, you can assign it straight into a new column.
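Index alignment is easiest to see when the index is not the default 0..n-1. The sketch below uses a small hypothetical frame with shuffled integer labels (an assumption made purely for illustration) and checks that transform hands back exactly those labels, while agg is indexed by the group key.

```python
import pandas as pd

# Hypothetical data with a non-default, unsorted index.
df = pd.DataFrame(
    {
        "dept":   ["HR", "Eng", "HR", "Eng"],
        "salary": [60_000, 90_000, 80_000, 110_000],
    },
    index=[40, 10, 30, 20],
)

dept_avg = df.groupby("dept")["salary"].transform("mean")

# transform keeps the original index labels in their original order.
print(dept_avg.index.equals(df.index))

# So plain assignment lines up row for row.
df["dept_avg"] = dept_avg

# agg, by contrast, is indexed by the group key.
agg_index = df.groupby("dept")["salary"].agg("mean").index.tolist()
print(agg_index)
```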

Common index-aligned patterns fall out of this directly:

# z-score within each department (NaN for one-person groups such as Sales,
# because the sample std of a single value is undefined)
df["z"] = df.groupby("dept")["salary"].transform(lambda s: (s - s.mean()) / s.std())

# rank within each department (1 = highest earner)
df["rank"] = df.groupby("dept")["salary"].transform("rank", ascending=False)

# share of the department's total
df["pct_dept"] = df["salary"] / df.groupby("dept")["salary"].transform("sum")

Goal	Use
Summary table — one row per group	`agg`
New column on the original frame (no merge)	`transform`
Drop entire groups by a condition	`filter`
Custom multi-column logic per group	`apply` (last resort)
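The last two rows of the table can be sketched as follows. This is a rough illustration rather than a recommended recipe: the bonus column and the names multi_person and top_comp are hypothetical, added only so that apply has more than one column to work with.

```python
import pandas as pd

df = pd.DataFrame({
    "dept":   ["Eng", "Eng", "HR", "HR", "HR", "Sales"],
    "salary": [90_000, 110_000, 60_000, 70_000, 80_000, 55_000],
    # Hypothetical column, not in the original example.
    "bonus":  [5_000, 20_000, 3_000, 4_000, 1_000, 2_000],
})

# filter: keep or drop whole groups. Here, only departments with
# at least two people survive, so the single Sales row is removed.
multi_person = df.groupby("dept").filter(lambda g: len(g) >= 2)

# apply: per-group logic that needs several columns at once.
# Here, the highest salary + bonus in each department.
top_comp = (
    df.groupby("dept")[["salary", "bonus"]]
      .apply(lambda g: (g["salary"] + g["bonus"]).max())
)

print(multi_person)
print(top_comp)
```

filter returns rows of the original frame (a subset, with the original index labels), whereas this apply call returns one value per department, so the two are not interchangeable with transform or with each other.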
