datarekha

SARIMA (Seasonal ARIMA)

Extend ARIMA with a second set of seasonal AR, differencing, and MA terms that operate at lag m — the season length — so the model captures yearly, weekly, or any repeating cycle.

10 min read Advanced Time Series Lesson 8 of 14

What you'll learn

  • SARIMA(p,d,q)(P,D,Q)m: what each of the seven parameters controls and how the seasonal block differs from the ordinary block
  • Seasonal differencing D: subtracting the value one full season ago to remove a periodic mean shift
  • Reading ACF/PACF spikes at multiples of m to choose P and Q

Why ARIMA is not enough for seasonal data

ARIMA(p, d, q) models short-range autocorrelation (observations a few lags apart) and trend (via ordinary differencing). It knows nothing about observations separated by exactly one season — 12 months for monthly-yearly data, 7 days for daily-weekly data, 4 quarters for quarterly-yearly data.

If your series has a strong seasonal cycle, residuals from a plain ARIMA model will still carry spikes at multiples of the season length m. Those spikes mean signal the model has not absorbed. SARIMA absorbs them by adding a seasonal block of terms that work at lag m, 2m, 3m, and so on.

The SARIMA notation

SARIMA(p, d, q)(P, D, Q)m

The notation stacks two ARIMA-style triplets:

Symbol | Block | Role
p, d, q | Ordinary (non-seasonal) | Short-range AR lags, trend differencing, short-range MA lags
P, D, Q | Seasonal | Seasonal AR lags at multiples of m, seasonal differencing, seasonal MA lags at multiples of m
m | Period | Number of time steps in one season (12 = monthly/yearly, 7 = daily/weekly, 4 = quarterly/yearly)

The seasonal block mirrors the ordinary block exactly — it just stretches the lag axis by a factor of m. A seasonal AR term at order P = 1 uses the observation from m steps ago, not just one step ago.
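Written with the backshift operator B (where B applied to y_t gives y_{t-1}), the standard multiplicative form makes this stretching explicit. The symbols for the AR polynomials (phi, Phi), the MA polynomials (theta, Theta), and the white-noise error (epsilon) are conventional notation assumed here; the lesson itself does not define them.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

% Example: SARIMA(1,0,0)(1,0,0)_{12}
(1 - \phi_1 B)(1 - \Phi_1 B^{12})\, y_t = \varepsilon_t
\;\Longrightarrow\;
y_t = \phi_1 y_{t-1} + \Phi_1 y_{t-12} - \phi_1 \Phi_1 y_{t-13} + \varepsilon_t
```

The cross term at lag 13 comes from multiplying the two polynomials, which is why the model is described as multiplicative rather than simply additive across the two blocks.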

Seasonal differencing D

Ordinary differencing (parameter d) subtracts the previous observation to remove a linear trend:

y'_t = y_t - y_{t-1}

Seasonal differencing (parameter D) subtracts the observation exactly one full season ago to remove a repeating seasonal mean shift:

y'_t = y_t - y_{t-m}

If your monthly sales always rise in summer and fall in winter, subtracting the value from the same month last year cancels that cycle. After D = 1 seasonal difference, the seasonal pattern is — in principle — gone, leaving only irregular variation that the AR and MA terms can model.

You can apply both ordinary and seasonal differencing together. With d = 1 and D = 1, the series is seasonally differenced and then ordinary-differenced; the two operations commute, so the order does not matter. In total, d + D differencing passes are applied, and in practice d + D rarely needs to exceed 2.
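The two differencing operations can be sketched with NumPy alone. The synthetic series (a linear trend plus a fixed sine-shaped seasonal pattern) and the helper name `difference` are illustrative assumptions, not part of the lesson's data.

```python
import numpy as np

def difference(y, lag=1):
    """Return y[t] - y[t - lag]; the first `lag` values are dropped."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

m = 12                                    # season length: monthly data, yearly cycle
t = np.arange(10 * m)                     # ten synthetic years
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / m)   # linear trend + repeating seasonal pattern

seasonal_diff = difference(y, lag=m)      # D = 1: y[t] - y[t - m]
both = difference(seasonal_diff, lag=1)   # D = 1 followed by d = 1
both_reversed = difference(difference(y, lag=1), lag=m)   # d = 1 followed by D = 1

# The seasonal pattern cancels; what remains is the trend's gain per season (0.5 * m).
print(seasonal_diff[:3])
# Applying the two differences in either order gives the same series.
print(np.allclose(both, both_reversed))
```

In this noise-free example a single seasonal difference already removes the linear trend as well, which illustrates why adding d = 1 on top of D = 1 is not always necessary.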

Identifying P and Q from the ACF and PACF

The same ACF/PACF logic that drives ordinary (p, q) selection applies to the seasonal block, but you read the plots at the seasonal lags m, 2m, 3m rather than lags 1, 2, 3.

  • Spike in the PACF at lag m (and perhaps 2m), ACF decaying slowly at seasonal lags — suggests a seasonal AR term (P = 1 or 2).
  • Spike in the ACF at lag m (and perhaps 2m), PACF decaying slowly at seasonal lags — suggests a seasonal MA term (Q = 1 or 2).
  • Both decay gradually at seasonal lags — a mixed seasonal ARMA may help, though parsimony favors keeping P and Q small.

For the ordinary block you read the same plots at lags 1, 2, 3 as you would for a plain ARIMA.
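To see what a seasonal MA signature looks like, the sketch below simulates a pure seasonal MA(1) process and reads its sample ACF at lags m, 2m, and 3m. The coefficient 0.8, the sample size, and the hand-written `sample_acf` helper are assumptions chosen for illustration; with real data you would more likely use the ACF and PACF plotting utilities in statsmodels.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[k:], x[:len(x) - k]) / denom for k in range(max_lag + 1)]
    )

m = 12
theta_seasonal = 0.8                      # hypothetical seasonal MA coefficient
rng = np.random.default_rng(42)
e = rng.standard_normal(6000 + m)
y = e[m:] + theta_seasonal * e[:-m]       # y[t] = e[t] + 0.8 * e[t - m]

acf = sample_acf(y, max_lag=3 * m)
band = 1.96 / np.sqrt(len(y))             # approximate 95% band under white noise

for lag in (m, 2 * m, 3 * m):
    flag = "spike" if abs(acf[lag]) > band else "inside band"
    print(f"lag {lag:2d}: acf = {acf[lag]: .3f} ({flag})")
```

For this process the theoretical autocorrelation at lag m is 0.8 / (1 + 0.8^2), roughly 0.49, and zero at 2m and beyond, so a single spike at lag m followed by a cutoff points toward Q = 1.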

The inline diagram

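[Figure: a seasonal series plotted against time, with the axis marked at t = 0, m, 2m, and 3m. The seasonal terms (P, D, Q) of SARIMA(p,d,q)(P,D,Q)m apply at lags m, 2m, and 3m.]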

A seasonal series peaks at every multiple of m. The seasonal block (P, D, Q) models the autocorrelation at those lags; the ordinary block (p, d, q) handles everything in between.

Fitting with statsmodels

statsmodels exposes SARIMA through its SARIMAX class (the X stands for optional exogenous regressors; you can ignore it for pure SARIMA). The order argument takes the ordinary triplet and seasonal_order takes the seasonal triplet plus the period m.

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# monthly_sales is a pandas Series with a DatetimeIndex, freq="MS"
model = SARIMAX(
    monthly_sales,
    order=(1, 1, 1),           # p=1, d=1, q=1
    seasonal_order=(1, 1, 1, 12),  # P=1, D=1, Q=1, m=12
    enforce_stationarity=False,
    enforce_invertibility=False,
)
result = model.fit(disp=False)

print(result.summary())

# 12-step-ahead forecast
forecast = result.get_forecast(steps=12)
mean_forecast = forecast.predicted_mean
conf_int = forecast.conf_int()

Reading the summary

The summary table from result.summary() lists coefficients for each AR, MA, seasonal AR, and seasonal MA term along with p-values. Coefficients that are not statistically significant (large p-values, typically above 0.05) suggest you may be over-parameterizing; try reducing P or Q by one and compare the fits. After fitting, inspect residuals with result.plot_diagnostics(): the residual correlogram should show no significant spikes (none clearly outside the confidence band) at any lag, including the seasonal ones.
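A numeric complement to the visual residual check is a portmanteau test such as Ljung-Box. The sketch below computes the Q statistic by hand with NumPy on a stand-in residual series that deliberately keeps lag-12 structure; the simulated series and the helper name `ljung_box_q` are assumptions for illustration. statsmodels ships its own version as `acorr_ljungbox` in `statsmodels.stats.diagnostic`.

```python
import numpy as np

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    lags = np.arange(1, max_lag + 1)
    rho = np.array([np.dot(x[k:], x[:n - k]) / denom for k in lags])
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))

# 95% critical value of a chi-square distribution with 12 degrees of freedom.
CHI2_95_DF12 = 21.026

rng = np.random.default_rng(7)
e = rng.standard_normal(600 + 12)
# Stand-in for residuals from a model that missed the seasonal MA term.
leftover_seasonal = e[12:] + 0.8 * e[:-12]

q = ljung_box_q(leftover_seasonal, max_lag=12)
print(f"Q = {q:.1f}, exceeds 95% critical value: {q > CHI2_95_DF12}")
```

When the test is applied to residuals from a fitted model, the degrees of freedom are usually reduced by the number of estimated AR and MA coefficients, so the critical value above is only appropriate for this raw simulated series.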

Choosing m

m must match the true periodicity in your data. Common choices:

  • m = 12 — monthly data with a yearly cycle
  • m = 7 — daily data with a weekly cycle
  • m = 4 — quarterly data with a yearly cycle
  • m = 24 — hourly data with a daily cycle

If you are unsure of m, plot the raw series and look for the distance between recurring peaks. A seasonal decomposition (STL or classical) can also reveal the dominant period before you commit to a value.
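One rough way to turn "look for the distance between recurring peaks" into a number is to find the strongest peak in the periodogram. This is a minimal sketch on synthetic data, assuming a single dominant cycle and evenly spaced observations; the function name `dominant_period` and the simulated series are hypothetical.

```python
import numpy as np

def dominant_period(y):
    """Estimate the dominant period (in time steps) from the periodogram peak."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    # Remove a straight-line trend so it does not swamp the low frequencies.
    slope, intercept = np.polyfit(t, y, 1)
    detrended = y - (slope * t + intercept)
    power = np.abs(np.fft.rfft(detrended)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0)
    peak = np.argmax(power[1:]) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[peak]

rng = np.random.default_rng(0)
t = np.arange(240)                        # 20 synthetic years of monthly data
y = 0.05 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, size=240)

m_estimate = round(dominant_period(y))
print(m_estimate)
```

Treat the result as a starting point: series with several overlapping cycles, or with a period that does not divide the sample length evenly, can produce a smeared or misleading peak, so domain knowledge about the calendar should still decide the final m.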

SARIMA vs plain ARIMA — a summary

Feature | ARIMA(p,d,q) | SARIMA(p,d,q)(P,D,Q)m
Handles trend | Yes (via d) | Yes (via d)
Handles seasonality | No | Yes (via D and seasonal terms)
ACF/PACF spikes at lag m | Left unmodeled | Absorbed by P or Q
Parameters to choose | 3 | 7
Typical use | Non-seasonal or weakly seasonal | Monthly, daily, quarterly with clear cycle

Quick check

Q1. In SARIMA(1,1,1)(1,1,1)12 fitted to monthly data, which parameter removes the repeating year-over-year mean shift?
Q2. Your ACF of a seasonally differenced series shows a single large spike at lag 7 and nothing notable elsewhere. What seasonal order does this suggest for weekly data?
Q3. A colleague models daily electricity demand with SARIMA(2,1,2)(1,1,1)7. They mention that residuals still show spikes at lags 24 and 48. What is the most likely explanation?

Practice this in an interview

What is the difference between ARIMA and SARIMA, and when do you use each?

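ARIMA(p,d,q) models non-seasonal series by combining autoregression, differencing, and a moving average of errors. SARIMA extends it with a second set of seasonal parameters (P,D,Q) and a season length m (written s in some texts) that operate at the seasonal lag m, handling periodic patterns that ARIMA alone cannot capture. Use ARIMA when the series shows no repeating cycle, and SARIMA when the ACF shows spikes at multiples of m.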

What is a VAR model, and when would you use it instead of a univariate ARIMA?

A Vector Autoregression (VAR) model extends the univariate autoregressive (AR) model to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series have mutual predictive relationships (Granger-causality) and you want to model those interactions; ARIMA is sufficient when one series can be forecast in isolation.

When would you choose Prophet over ARIMA for a forecasting problem?

Prophet is a curve-fitting model that decomposes the series into trend, seasonality, and holidays; it handles missing data, multiple seasonalities, and non-uniform time grids with minimal tuning and is accessible to non-statisticians. ARIMA is a statistical model based on autocorrelation structure; it is more appropriate when the series is short, noise is small, and you need principled uncertainty intervals from an explicit stochastic process.

How do you choose p, d, and q for an ARIMA model?

Choose d by differencing until a unit-root test such as ADF rejects the unit-root null, which is evidence that the differenced series is stationary; choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then confirm with AIC or BIC to guard against over-fitting. In practice, an automated grid search over a small range of candidates with information criteria is more reliable than visual inspection alone.
