datarekha
Pandas & Data Wrangling · Easy · Asked at Amazon · Asked at Google · Asked at Microsoft

What is the split-apply-combine model in pandas GroupBy?

The short answer

GroupBy splits a DataFrame into groups by key, applies a function to each group independently, then combines the results into a single object. The three main methods differ in what the apply step returns: agg collapses each group to one row, transform returns a result with the same shape as the input, and filter keeps or drops entire groups. Knowing which output you need determines which API to reach for.

How to think about it

The mental model

Hadley Wickham formalized the split-apply-combine strategy, and once you internalize it, groupby stops being mysterious. The trick is to decide what shape the output should have, because that tells you which method to use:

  • Want one row per group (a summary table)? Use agg.
  • Want to add a new column to the original DataFrame with a group-level stat? Use transform.
  • Want to drop entire groups that fail a condition? Use filter.

[Diagram: Original DataFrame -> Split -> Group A, Group B -> Apply -> fn(A), fn(B) -> Combine -> Result]
Split-apply-combine: each group is processed independently before results are assembled.
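
To make the three phases concrete, here is a minimal sketch that performs them by hand and then compares the result with the built-in one-liner. The `team`/`score` data and the variable names are invented for illustration; pandas does not run a Python loop like this internally, so treat it as a mental model rather than a description of the implementation.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby("team")}

# Apply: run the same function on each piece independently.
applied = {key: group["score"].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single object.
manual = pd.Series(applied, name="score").rename_axis("team")

# The usual one-liner covers all three phases in a single call.
builtin = df.groupby("team")["score"].mean()

print(manual)
print(builtin)
```

Both Series should hold the same per-team means, which is the point: `groupby(...).mean()` is shorthand for the split, apply, and combine steps written out above.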

All three methods in one playground
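
A minimal sketch of agg, transform, and filter side by side, assuming pandas is installed. The `region`/`sales` data and the variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "north"],
    "sales": [100, 300, 200, 400, 50],
})

# agg: collapse each group to one row (a summary table).
summary = df.groupby("region")["sales"].agg(["sum", "mean"])

# transform: return a result aligned to the original rows,
# so it can be attached as a new column.
region_mean = df.groupby("region")["sales"].transform("mean")
with_mean = df.assign(region_mean=region_mean)

# filter: keep only the rows of groups that pass the predicate;
# here, regions with at least two rows.
kept = df.groupby("region").filter(lambda g: len(g) >= 2)

print(summary.shape)    # one row per region
print(with_mean.shape)  # same number of rows as df, one extra column
print(kept.shape)       # a subset of df's rows, same columns
```

Comparing the three printed shapes against `df.shape` is a quick way to check which method you actually want.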

The three phases mapped to methods

Phase   Method      Output shape              Typical use
Apply   agg         One row per group         Summary tables, dashboards
Apply   transform   Same shape as input       Adding derived columns (z-scores, ratios, ranks)
Apply   filter      Subset of original rows   Removing rare categories, outlier groups
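
The typical uses listed above can be sketched as follows. The `category`/`amount` data, the column names passed to agg, and the "fewer than 2 rows" threshold for a rare category are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical order data, invented for illustration.
orders = pd.DataFrame({
    "category": ["books", "books", "books", "toys", "toys", "garden"],
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0, 99.0],
})
by_cat = orders.groupby("category")["amount"]

# agg with named aggregation: one row per category, explicit column names.
summary = by_cat.agg(total="sum", average="mean", n="count")

# transform: per-row z-score relative to the row's own category.
# A single-row group has an undefined sample std, so its z-score is NaN.
z = (orders["amount"] - by_cat.transform("mean")) / by_cat.transform("std")

# filter: drop rare categories (fewer than 2 rows in this sketch).
common = orders.groupby("category").filter(lambda g: len(g) >= 2)

print(summary)
print(z)
print(common)
```

Note the NaN z-score for the single-row group: transform keeps every input row, so groups too small for the statistic show up as missing values rather than disappearing.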