datarekha
Pandas & Data Wrangling | Medium | Asked at Two Sigma, Goldman Sachs, Spotify

How do rolling and expanding windows work in pandas, and when do you use each?

The short answer

rolling() computes statistics over a fixed-size sliding window, ignoring data outside the window; expanding() grows the window from the first row to the current row, which makes it a cumulative calculation. By default, rolling(N) yields NaN until N observations are available, while expanding() produces a value from the first row. Both return window objects you chain .mean(), .sum(), .std(), or a custom .apply() onto.

How to think about it

The key distinction is what you’re forgetting versus what you’re remembering. A rolling window is like a sliding spotlight — it only looks at the most recent N rows and actively forgets older data. An expanding window is like a running total — it never forgets anything, growing wider with every new row.

When does that matter? Rolling is right for short-term trend detection: is this week’s sales higher than the 4-week average? Expanding is right for running metrics: what is the all-time average up to today?

Try rolling and expanding side by side
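
A minimal sketch of the comparison, assuming a small made-up price series (the values below are illustrative, not taken from any real data set). It puts a 3-row rolling mean next to an expanding mean so the "forgetting" versus "remembering" behaviour is visible row by row.

```python
import pandas as pd

# Illustrative values only; any numeric Series works the same way.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0])

side_by_side = pd.DataFrame({
    "price": prices,
    # Mean of the current row and the two rows before it;
    # the first two rows are NaN because the window is not yet full.
    "rolling_3": prices.rolling(window=3).mean(),
    # Mean of every row from the start up to the current row.
    "expanding": prices.expanding().mean(),
})
print(side_by_side)
```

The rolling column tracks recent moves closely, while the expanding column changes more slowly as each new row carries less weight in the cumulative average.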

Time-based windows on a DatetimeIndex

When your index is a DatetimeIndex, you can pass a time offset string instead of a row count. This handles irregular time series correctly: a “3D” window always spans the 3 days ending at the current row’s timestamp, regardless of how many rows fall in that span.

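import pandas as pd

# Illustrative prices, assumed here so the snippet runs on its own
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0])

ts = prices.copy()
ts.index = pd.date_range("2024-01-01", periods=len(ts), freq="D")

# 3-calendar-day rolling window; offset windows default to min_periods=1
ts.rolling("3D").mean()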
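
On evenly spaced daily data a "3D" window and a 3-row window cover the same rows, so the difference only shows up when there are gaps. The sketch below assumes a hypothetical series with a missing stretch between January 2 and January 6 and compares the two approaches.

```python
import pandas as pd

# Hypothetical irregular observations with a gap after Jan 2.
irregular = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-06", "2024-01-07"]),
)

# Row-count window: always the last 3 rows, however old they are.
by_rows = irregular.rolling(window=3, min_periods=1).mean()

# Time window: only rows within the 3 days ending at each timestamp.
by_time = irregular.rolling("3D").mean()

print(pd.DataFrame({"value": irregular, "by_rows": by_rows, "by_time": by_time}))
```

On January 6 the row-count window still averages in the values from January 1 and 2, while the time-based window sees only the January 6 observation.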

GroupBy + rolling for per-group moving averages

This combination is one of the most common feature engineering patterns in time series ML:

df["rolling_avg"] = (
    df.groupby("ticker")["close"]
      .transform(lambda s: s.rolling(window=5).mean())
)

The transform ensures the result is index-aligned with the original DataFrame, so you can assign it directly as a new column without a merge.
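
To see that the window does not leak across groups, here is a small sketch on a hypothetical two-ticker frame with interleaved rows (the tickers and prices are made up). It also shows one alternative, calling rolling() directly on the GroupBy object, which returns a result indexed by group key and original index, so the group level has to be dropped before assignment.

```python
import pandas as pd

# Hypothetical data: rows for tickers "A" and "B" are interleaved.
df = pd.DataFrame({
    "ticker": ["A", "B", "A", "B", "A", "B"],
    "close": [1.0, 100.0, 2.0, 200.0, 3.0, 300.0],
})

# transform keeps the original row order and index.
df["rolling_avg"] = (
    df.groupby("ticker")["close"]
      .transform(lambda s: s.rolling(window=2).mean())
)

# Alternative: groupby(...).rolling(...) yields a MultiIndex (ticker, row);
# dropping the ticker level lets pandas align on the original index.
alt = (
    df.groupby("ticker")["close"]
      .rolling(window=2)
      .mean()
      .reset_index(level=0, drop=True)
)
df["rolling_avg_alt"] = alt

print(df)
```

Each ticker's first row is NaN because its own 2-row window is not yet full, even though other rows precede it in the frame.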

Custom functions on a window

# 3-period price momentum (last minus first in window)
prices.rolling(3).apply(lambda x: x[-1] - x[0], raw=True)

raw=True passes each window to the lambda as a NumPy array instead of a Series, skipping the cost of building a Series per window. This is usually noticeably faster, especially on long series with many windows.
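
The choice of raw also changes how the function should index into the window. The sketch below, using made-up prices, shows the same momentum calculation both ways, plus a custom function on an expanding window.

```python
import pandas as pd

# Illustrative values only.
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0])

# raw=True: x is a NumPy array, so plain positional indexing works.
momentum_raw = prices.rolling(3).apply(lambda x: x[-1] - x[0], raw=True)

# raw=False (the default): x is a Series, so use .iloc for positions.
momentum_series = prices.rolling(3).apply(
    lambda x: x.iloc[-1] - x.iloc[0], raw=False
)

# expanding() accepts apply as well: running high-low range so far.
running_range = prices.expanding().apply(lambda x: x.max() - x.min(), raw=True)

print(pd.DataFrame({
    "price": prices,
    "momentum": momentum_raw,
    "running_range": running_range,
}))
```

Both momentum versions give the same numbers; the raw=True form is simply the cheaper one when the function only needs the values and not the index.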
