ARIMA
Learn how autoregression, differencing, and moving-average error terms combine into one unified model you can fit, diagnose, and forecast with.
What you'll learn
- AR(p), differencing d, and MA(q): what each parameter controls and why they are combined
- The Box-Jenkins workflow: stationarity check, ACF/PACF identification, fitting, residual diagnostics, forecasting
- How to use statsmodels ARIMA, interpret AIC/BIC, and verify that residuals look like white noise
What ARIMA is
ARIMA(p, d, q) stands for AutoRegressive Integrated Moving Average. The three letters encode a recipe:
- Difference the raw series d times until it is stationary.
- Fit an AR(p) + MA(q) model on that differenced series.
- Integrate back (reverse the differencing) to get forecasts in the original units.
The word “integrated” here is the inverse of differencing: if you difference once to remove a trend, you integrate once (cumulative sum) on the way back out.
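The difference-then-integrate round trip can be sketched with NumPy alone. The series values and the "forecast" differences below are made up for illustration; nothing here is fitted.

```python
import numpy as np

# Hypothetical toy series with an upward trend (illustrative values only).
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# Differencing once (d = 1): each value minus the one before it.
dy = np.diff(y)

# Integrating back: cumulative sum of the differences, anchored at the
# first observed value, recovers the original series.
y_back = np.concatenate(([y[0]], y[0] + np.cumsum(dy)))

# The same step turns forecasts of the differences into forecasts of the level.
# These two numbers stand in for the output of an AR+MA fit (assumption).
dy_forecast = np.array([6.0, 7.0])
y_forecast = y[-1] + np.cumsum(dy_forecast)

print(dy)          # differenced series
print(y_back)      # matches y
print(y_forecast)  # forecasts in the original units
```

For d = 2 the same cumulative-sum step would be applied twice, each time anchored at the last known value of the corresponding level.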
The three parameters
| Symbol | Name | What it controls |
|---|---|---|
| p | AR order | How many of the series’s own past values feed into today’s prediction |
| d | Differencing degree | How many times to subtract consecutive values to achieve stationarity |
| q | MA order | How many past forecast errors feed into today’s prediction |
An ARIMA(1, 1, 1) model says: difference the series once, then predict with yesterday’s differenced value plus yesterday’s error.
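Written out, that sentence corresponds to the equations below. The symbols are introduced here for illustration: y'_t is the differenced series, ε_t is the one-step error, c is an optional constant (often fixed at zero once d > 0), and φ₁ and θ₁ are the AR and MA coefficients. The MA term is shown with a plus sign, which is the convention statsmodels follows; some textbooks flip that sign.

```latex
\begin{aligned}
y'_t &= y_t - y_{t-1} \\
y'_t &= c + \phi_1\, y'_{t-1} + \theta_1\, \varepsilon_{t-1} + \varepsilon_t
\end{aligned}
```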
Stationarity and the role of d
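A stationary series has a constant mean and variance over time, and autocorrelations that depend only on the lag, not on when you measure them. Trending or drifting series are not stationary, which breaks ordinary AR and MA fitting. Differencing once removes a linear trend; differencing twice removes a quadratic one. In practice d is almost always 0, 1, or 2. Choosing d too large is a common mistake: over-differencing inflates the variance and injects artificial negative autocorrelation that the MA terms then have to undo.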
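A quick NumPy check of what differencing does to trends, and of what one difference too many does to a series that was already stationary. The trends and the noise are synthetic, chosen only for illustration.

```python
import numpy as np

t = np.arange(8, dtype=float)

linear = 3.0 + 2.0 * t          # linear trend with slope 2
quadratic = 1.0 + 0.5 * t**2    # quadratic trend

d1_linear = np.diff(linear, n=1)        # constant: the slope
d1_quadratic = np.diff(quadratic, n=1)  # still trending (linear in t)
d2_quadratic = np.diff(quadratic, n=2)  # constant: twice the t**2 coefficient

print(d1_linear)
print(d1_quadratic)
print(d2_quadratic)

# Over-differencing: white noise is already stationary, so d should be 0.
# Differencing it anyway roughly doubles the variance.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
variance_ratio = np.var(np.diff(noise)) / np.var(noise)
print(f"variance ratio after needless differencing: {variance_ratio:.2f}")
```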
The Box-Jenkins workflow
George Box and Gwilym Jenkins formalized a five-step loop that remains the standard approach for fitting ARIMA models:
Step 1 — Make the series stationary (choose d).
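Plot the series. If it trends, compute the first difference. Run an Augmented Dickey-Fuller (ADF) test or a KPSS test, keeping in mind that their null hypotheses are opposite: ADF's null is a unit root (a small p-value suggests stationarity), while KPSS's null is stationarity (a small p-value suggests non-stationarity). Repeat until the tests indicate stationarity.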
Step 2 — Identify p and q from ACF and PACF.
- The autocorrelation function (ACF) measures correlation between the series and its own lags. For an MA(q) process, the ACF cuts off after lag q.
- The partial autocorrelation function (PACF) removes the influence of intermediate lags. For an AR(p) process, the PACF cuts off after lag p.
- In practice the patterns overlap, so treat ACF/PACF as a starting point, not a rule.
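To see the ACF cutoff concretely, the sketch below simulates an MA(1) series and computes its sample ACF by hand with NumPy. The helper `sample_acf`, the coefficient 0.8, and the sample size are choices made for this illustration, not part of any library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

rng = np.random.default_rng(42)
n = 20_000
eps = rng.standard_normal(n + 1)

theta = 0.8
ma1 = eps[1:] + theta * eps[:-1]   # MA(1): y_t = eps_t + theta * eps_{t-1}

acf = sample_acf(ma1, max_lag=5)
band = 1.96 / np.sqrt(n)           # approximate 95% band for white noise

print(np.round(acf, 3))
print(f"theoretical lag-1 ACF: {theta / (1 + theta**2):.3f}")
print(f"approximate 95% band: +/- {band:.3f}")
```

Lag 1 should sit near θ / (1 + θ²) and the higher lags near zero, which is the "cuts off after lag q" pattern for q = 1. With a real series of a few hundred points the picture is much noisier.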
Step 3 — Fit the model.
Pass (p, d, q) to your fitting routine and estimate the parameters by maximum likelihood.
Step 4 — Diagnose residuals.
This is the step most beginners skip, and it is the most important. If your model has captured all the signal, the residuals should look like white noise: zero mean, constant variance, no autocorrelation. Check:
- A residual ACF plot: no spikes outside the confidence band.
- The Ljung-Box test: the null hypothesis is no autocorrelation, so a small p-value (below 0.05, say) means autocorrelation is still present and your model is underfit.
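The Ljung-Box statistic itself is short enough to compute by hand, which helps demystify the test. The sketch below is a NumPy-only illustration on synthetic "residuals"; the helper name `ljung_box_q` and the simulated data are assumptions. The statistic is compared against 18.307, the 95% quantile of a chi-square distribution with 10 degrees of freedom.

```python
import numpy as np

def ljung_box_q(resid, lags):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..lags} r_k**2 / (n - k)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.dot(e, e)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = np.dot(e[:-k], e[k:]) / denom   # lag-k sample autocorrelation
        total += r_k**2 / (n - k)
    return n * (n + 2) * total

rng = np.random.default_rng(0)
n = 2000
white = rng.standard_normal(n)    # what good residuals should resemble

# "Residuals" with leftover AR(1) structure (phi = 0.5), as from an underfit model.
leftover = np.empty(n)
leftover[0] = white[0]
for t in range(1, n):
    leftover[t] = 0.5 * leftover[t - 1] + white[t]

CHI2_95_DF10 = 18.307             # 95% quantile of chi-square with 10 df

q_white = ljung_box_q(white, lags=10)
q_leftover = ljung_box_q(leftover, lags=10)
print(f"Q (white noise):      {q_white:.1f}")
print(f"Q (leftover AR(1)):   {q_leftover:.1f}")
print(f"5% critical value:    {CHI2_95_DF10}")
```

A Q far above the critical value is the hand-computed equivalent of a tiny Ljung-Box p-value. When the test is applied to residuals of a fitted ARMA model, the degrees of freedom are usually reduced by p + q.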
Step 5 — Forecast.
Call .forecast(steps=h) for an h-step-ahead forecast. Uncertainty widens with horizon.
Model selection: AIC and BIC
When comparing candidate orders, use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). Both penalize models for complexity. Lower is better. BIC applies a heavier penalty for extra parameters, so it tends to favor sparser models.
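Both criteria are simple functions of the maximized log-likelihood, the parameter count k, and (for BIC) the sample size n. The sketch below uses the standard formulas with made-up log-likelihoods for two hypothetical candidates; k is assumed to count the AR and MA coefficients plus the noise variance.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

n_obs = 120  # hypothetical number of observations

# Hypothetical fits: log-likelihoods are invented for illustration.
candidates = {
    "ARIMA(1,1,1)": {"llf": -310.0, "k": 3},  # phi1, theta1, sigma^2
    "ARIMA(3,1,2)": {"llf": -306.5, "k": 6},  # 3 AR + 2 MA + sigma^2
}

scores = {
    name: (aic(c["llf"], c["k"]), bic(c["llf"], c["k"], n_obs))
    for name, c in candidates.items()
}
for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.2f}  BIC={b:.2f}")

best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print("AIC prefers", best_aic)
print("BIC prefers", best_bic)
```

With these numbers the larger model wins on AIC while the smaller one wins on BIC, because ln(120) is about 4.79 per parameter against AIC's flat 2.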
A practical shortcut is pmdarima.auto_arima, which picks d with a stationarity test, searches over candidate (p, q) orders (stepwise by default rather than over a full grid), and returns the model with the lowest AIC among those it tried. It is a useful starting point, but always inspect the residuals of whatever it returns.
The ARIMA pipeline — diagram
The ARIMA pipeline: difference → AR+MA fit → integrate back → forecast (widening band = growing uncertainty).
Fitting ARIMA in Python
The code below shows the full Box-Jenkins pipeline using statsmodels. It is meant to be run locally, with pandas, matplotlib, and statsmodels installed, and it assumes a monthly_sales.csv file with date and sales columns.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
# --- Step 1: load and inspect ---
series = pd.read_csv("monthly_sales.csv", index_col="date", parse_dates=True)["sales"]
# ADF test for stationarity
adf_result = adfuller(series)
print(f"ADF p-value: {adf_result[1]:.4f}") # p < 0.05 => reject the unit-root null (evidence of stationarity)
# First difference if needed
diff1 = series.diff().dropna()
adf_diff = adfuller(diff1)
print(f"ADF p-value (diff=1): {adf_diff[1]:.4f}")
# --- Step 2: ACF / PACF plots to choose p and q ---
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(diff1, lags=20, ax=axes[0])
plot_pacf(diff1, lags=20, ax=axes[1])
plt.tight_layout()
plt.show()
# --- Step 3: fit ---
model = ARIMA(series, order=(2, 1, 1)) # p=2, d=1, q=1
result = model.fit()
print(result.summary()) # AIC, BIC, coefficient estimates
# --- Step 4: diagnose residuals ---
result.plot_diagnostics(figsize=(12, 8))
plt.show()
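# Drop the first d=1 residual: it reflects the model's initialization, not a real one-step error
lb = acorr_ljungbox(result.resid.iloc[1:], lags=[10], return_df=True)
print(lb) # lb_pvalue well above 0.05 => no evidence of leftover autocorrelation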
# --- Step 5: forecast ---
forecast = result.forecast(steps=12)
print(forecast)
conf_int = result.get_forecast(steps=12).conf_int()
print(conf_int)
Reading .summary()
The summary table includes:
- coef — estimated AR and MA coefficients.
- P>|z| — the p-value for each coefficient, indicating whether it is statistically significant.
- AIC / BIC — use these to compare competing orders; lower wins.
- Ljung-Box (Q) — the null is no autocorrelation; a large p-value here is what you want.
Auto-selection with pmdarima
from pmdarima import auto_arima
auto_model = auto_arima(
series,
seasonal=False,
information_criterion="aic",
stepwise=True,
trace=True,
)
print(auto_model.summary())
auto_arima chooses d with a unit-root test (KPSS by default, not ADF), then tries (p, q) combinations, stepwise when stepwise=True, and returns the lowest-AIC model among those it fitted. Always verify the residuals even when using auto-selection.
Seeing forecast extrapolation intuitively
The playground below uses only NumPy and Matplotlib. It fits a simple linear extrapolation on a synthetic trending series and plots the forecast as a naive intuition-builder — not a real ARIMA fit. The widening band represents how uncertainty grows with horizon.
Notice how the confidence band fans out as you go further into the future. A real ARIMA forecast behaves the same way: the further out you forecast, the less certain the model is.
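A minimal NumPy-only sketch of that intuition-builder is shown below, with the plotting left out so it stays self-contained. The synthetic series, the seed, and the band formula are assumptions: the band is made to grow like the square root of the horizon, which is how forecast uncertainty behaves for a random walk, and it is not what a fitted ARIMA model would report.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic trending series: intercept 10, slope 0.5, Gaussian noise (all made up).
n = 60
t = np.arange(n)
y = 10.0 + 0.5 * t + rng.normal(scale=2.0, size=n)

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(t, y, deg=1)
resid = y - (intercept + slope * t)
sigma = resid.std(ddof=2)            # residual spread, 2 fitted parameters

# Extrapolate 12 steps past the last observation.
h = np.arange(1, 13)
forecast = intercept + slope * (n - 1 + h)

# Illustrative band only: half-width grows like sqrt(h), random-walk style.
half_width = 1.96 * sigma * np.sqrt(h)
lower = forecast - half_width
upper = forecast + half_width

for step, f, lo, hi in zip(h, forecast, lower, upper):
    print(f"h={step:2d}  forecast={f:6.2f}  band=[{lo:6.2f}, {hi:6.2f}]")
```

To reproduce the fan-shaped picture, the three arrays could be passed to Matplotlib's `plot` and `fill_between`.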
Putting it all together
Here is the compact mental model to carry forward:
- ARIMA(p, d, q): difference d times to get stationarity, then fit AR lags on the past values and MA lags on the past errors, then integrate the forecasts back to original scale.
- Box-Jenkins in one sentence: stationarise → identify → fit → diagnose → forecast, and do not skip the diagnosis.
- AIC/BIC rank models; residual ACF and Ljung-Box confirm the model is done.
- Once you are comfortable with non-seasonal ARIMA, the natural next step is SARIMA, which adds seasonal AR and MA terms for periodic data.
Practice this in an interview
A Vector Autoregression (VAR) model extends autoregression, the AR part of ARIMA, to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series have mutual predictive relationships (Granger-causality) and you want to model those interactions; ARIMA is sufficient when one series can be forecast in isolation.
ARIMA(p,d,q) models non-seasonal series by combining autoregression, differencing, and a moving average of errors. SARIMA extends it with a second set of seasonal parameters (P,D,Q,s) that operate at the seasonal lag s, handling periodic patterns that ARIMA alone cannot capture.
Prophet is a curve-fitting model that decomposes the series into trend, seasonality, and holidays; it handles missing data, multiple seasonalities, and non-uniform time grids with minimal tuning and is accessible to non-statisticians. ARIMA is a statistical model based on autocorrelation structure; it is more appropriate when the series is short, noise is small, and you need principled uncertainty intervals from an explicit stochastic process.
Choose d by differencing until the ADF test confirms stationarity; choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then confirm with AIC or BIC to guard against over-fitting. In practice, an automated grid search over a small range of candidates with information criteria is more reliable than visual inspection alone.