What is the difference between GroupBy transform and agg in pandas?
agg reduces each group to a single value per aggregated column, returning a result with one row per group. transform returns a Series or DataFrame with the same index as the original, either broadcasting a group-level result back to every row or applying a same-length function within each group, which makes it ideal for adding derived columns without a merge.
How to think about it
The output-shape question
The fastest way to answer this in an interview: “agg reduces — you get fewer rows. transform preserves — you get the same number of rows, with the group-level value broadcast to every member of that group.”
In practice, transform saves you from a merge. Without it, the usual way to add a group statistic back to the original DataFrame is to agg, then merge on the group key. transform does both in one step.
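As a minimal sketch of that equivalence, the snippet below computes a per-department mean both ways. The sample data and the `dept_mean` column name are made up for illustration; only `dept` and `salary` come from the examples in this article.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the article.
df = pd.DataFrame(
    {
        "dept": ["eng", "eng", "ops", "ops", "ops"],
        "salary": [100, 120, 60, 70, 80],
    }
)

# Two-step route: aggregate to one row per group, then merge back on the key.
dept_mean = (
    df.groupby("dept", as_index=False)["salary"]
    .mean()
    .rename(columns={"salary": "dept_mean"})
)
merged = df.merge(dept_mean, on="dept", how="left")

# One-step route: transform returns a Series aligned to df's index.
df["dept_mean"] = df.groupby("dept")["salary"].transform("mean")

# Compare the column produced by each route.
print(merged["dept_mean"].equals(df["dept_mean"]))
```

The merge route also works when you need several statistics at once, but for a single derived column the transform version avoids the intermediate table.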
See the shape difference clearly
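A small sketch, assuming a toy DataFrame with two departments and five rows (the data is invented for illustration), makes the difference in output shape concrete:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "dept": ["eng", "eng", "ops", "ops", "ops"],
        "salary": [100, 120, 60, 70, 80],
    }
)

grouped = df.groupby("dept")["salary"]

reduced = grouped.agg("mean")          # one row per group, indexed by dept
broadcast = grouped.transform("mean")  # one row per original row, indexed like df

print(reduced.shape)                     # (2,)
print(broadcast.shape)                   # (5,)
print(broadcast.index.equals(df.index))  # True
```

The agg result is indexed by the group key, so it cannot be assigned straight back to df as a column; the transform result can.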
Common transform patterns
Because transform is index-aligned, you can create derived columns directly:
# z-score within each department
df["z_score"] = df.groupby("dept")["salary"].transform(
    lambda s: (s - s.mean()) / s.std()
)
# rank within each department (1 = highest earner)
df["dept_rank"] = df.groupby("dept")["salary"].transform("rank", ascending=False)
# percentage of department total
df["pct_dept"] = df["salary"] / df.groupby("dept")["salary"].transform("sum")
When to use which
| Goal | Use |
|---|---|
| Summary table — one row per group | agg |
| New column on original DataFrame (no merge) | transform |
| Drop entire groups based on a condition | filter |
| Custom multi-column logic per group | apply (last resort) |
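To illustrate the filter row, here is a hedged sketch that drops departments with fewer than three members and then reproduces the same selection with a transform-based boolean mask. The data and the threshold of three are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "dept": ["eng", "eng", "ops", "ops", "ops"],
        "salary": [100, 120, 60, 70, 80],
    }
)

# filter: keep only the rows of groups that pass a group-level test.
big_depts = df.groupby("dept").filter(lambda g: len(g) >= 3)

# Same selection via transform: a boolean mask aligned to df's index.
# "count" ignores missing values; this toy data has none.
mask = df.groupby("dept")["salary"].transform("count") >= 3
big_depts_via_mask = df[mask]

print(big_depts.equals(big_depts_via_mask))
```

The mask version is often convenient when the group-level condition needs to be combined with row-level conditions in a single boolean expression.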