What is the split-apply-combine model in pandas GroupBy?
The short answer
GroupBy splits a DataFrame into groups by key, applies a function independently to each group, then combines the results into a single object. The main methods differ in what the apply step returns: agg collapses each group to one row, transform preserves the shape of the input, and filter removes entire groups. Knowing which of those outputs you want determines which API to reach for.
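To make the three steps concrete, here is a minimal sketch that performs them by hand and then compares the result with the built-in one-liner. The DataFrame, its column names (team, score), and the variable names are invented for illustration. This is not how pandas implements GroupBy internally, only a way to see the steps separately.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 30, 40, 50],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby("team")}

# Apply: compute a result independently for each piece.
means = {key: group["score"].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single object.
manual = pd.Series(means, name="score").rename_axis("team")

# The one-liner produces the same per-group means.
builtin = df.groupby("team")["score"].mean()
```

Both manual and builtin hold one mean per team, indexed by the group key.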
How to think about it
The mental model
Hadley Wickham formalized split-apply-combine, and once you internalize it, groupby stops being mysterious. The trick is to think about what you want the output shape to look like — that instantly tells you which method to use.
- Want one row per group (a summary table)? Use agg.
- Want to add a new column to the original DataFrame with a group-level stat? Use transform.
- Want to drop entire groups that fail a condition? Use filter.
All three methods in one playground
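The sketch below runs agg, transform, and filter on the same small DataFrame so the output shapes can be compared side by side. The data, the column names (region, sales, region_mean), and the 200 threshold are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "north"],
    "sales": [100, 300, 50, 150, 20],
})
grouped = df.groupby("region")["sales"]

# agg: collapses each group to a single row (one row per region).
summary = grouped.agg(["sum", "mean"])

# transform: returns a result aligned to the original rows,
# so it can be assigned directly as a new column.
df["region_mean"] = grouped.transform("mean")

# filter: keeps or drops whole groups based on a per-group condition.
# Here a region survives only if its total sales reach 200.
big_regions = df.groupby("region").filter(lambda g: g["sales"].sum() >= 200)

print(summary)
print(df)
print(big_regions)
```

In this sketch, summary has three rows (one per region), df keeps all five rows with the extra column, and big_regions keeps only the rows belonging to east and west.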
The three methods mapped to output shapes
| Phase | Method | Output shape | Typical use |
|---|---|---|---|
| Apply | agg | One row per group | Summary tables, dashboards |
| Apply | transform | Same shape as input | Adding derived columns (z-scores, ratios, ranks) |
| Apply | filter | Subset of original rows | Removing rare categories, outlier groups |
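Two of the typical uses in the table, removing rare categories and adding z-scores, can be combined in one short pipeline. This is a sketch under assumptions: the data, the column names (category, value, zscore), and the minimum group size of 2 are all hypothetical.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration.
df = pd.DataFrame({
    "category": ["x", "x", "x", "y", "y", "z"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 5.0],
})

# filter: drop categories with fewer than 2 rows (here, "z").
common = df.groupby("category").filter(lambda g: len(g) >= 2).copy()

by_cat = common.groupby("category")["value"]

# transform: per-group mean and standard deviation, broadcast back
# to every row, give a within-category z-score.
common["zscore"] = (common["value"] - by_cat.transform("mean")) / by_cat.transform("std")

# agg: one summary row per remaining category.
summary = by_cat.agg(["count", "mean"])
```

Filtering first matters here: a single-row group has an undefined sample standard deviation, so its z-score would be NaN.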