datarekha

ARIMA

Learn how autoregression, differencing, and moving-average error terms combine into one unified model you can fit, diagnose, and forecast with.

10 min read Intermediate Time Series Lesson 7 of 14

What you'll learn

  • AR(p), differencing d, and MA(q): what each parameter controls and why they are combined
  • The Box-Jenkins workflow: stationarity check, ACF/PACF identification, fitting, residual diagnostics, forecasting
  • How to use statsmodels ARIMA, interpret AIC/BIC, and verify that residuals look like white noise

What ARIMA is

ARIMA(p, d, q) stands for AutoRegressive Integrated Moving Average. The three letters encode a recipe:

  1. Difference the raw series d times until it is stationary.
  2. Fit an AR(p) + MA(q) model on that differenced series.
  3. Integrate back (reverse the differencing) to get forecasts in the original units.

The word “integrated” here is the inverse of differencing: if you difference once to remove a trend, you integrate once (cumulative sum) on the way back out.
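The round trip between differencing and integrating can be sketched with NumPy alone. The toy values below are made up for illustration; `np.diff` performs the differencing and `np.cumsum` the integration.

```python
import numpy as np

# Hypothetical toy series with an upward trend (illustrative values only).
y = np.array([10.0, 12.5, 14.0, 17.0, 18.5, 21.0])

# Differencing once (d = 1): keep only the change between consecutive values.
dy = np.diff(y)

# Integrating once: cumulative sum of the changes, anchored at the first value.
y_rebuilt = np.concatenate(([y[0]], y[0] + np.cumsum(dy)))

print(dy)
print(np.allclose(y, y_rebuilt))
```

Note that integration needs the starting value `y[0]`: differencing discards the level, so the level has to be supplied again on the way back.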

The three parameters

Symbol   Name                  What it controls
p        AR order              How many of the series’s own past values feed into today’s prediction
d        Differencing degree   How many times to subtract consecutive values to achieve stationarity
q        MA order              How many past forecast errors feed into today’s prediction

An ARIMA(1, 1, 1) model says: difference the series once, then predict with yesterday’s differenced value plus yesterday’s error.
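Written out as an equation, and assuming no constant term and the plus-sign convention for the MA coefficient (some textbooks use a minus sign instead), ARIMA(1, 1, 1) can be expressed as follows, where w_t is the differenced series and ε_t is the forecast error at time t:

```latex
w_t = y_t - y_{t-1}, \qquad
w_t = \phi_1 \, w_{t-1} + \theta_1 \, \varepsilon_{t-1} + \varepsilon_t
```

Here φ₁ weights yesterday’s differenced value (the AR part) and θ₁ weights yesterday’s error (the MA part).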

Stationarity and the role of d

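A stationary series has a constant mean, a constant variance, and an autocorrelation structure that depends only on the lag, not on the point in time. Trending or drifting series are not stationary, which breaks ordinary AR and MA fitting. Differencing once removes a linear trend; differencing twice removes a quadratic one. In practice d is almost always 0, 1, or 2. Choosing d too large is a common mistake: over-differencing inflates the variance and introduces artificial negative autocorrelation (a lag-1 autocorrelation near −0.5 is the usual symptom), so stop at the smallest d that gives a stationary series.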
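The cost of over-differencing is easy to see on simulated data. The sketch below, which uses a made-up white-noise series and a small hypothetical helper `lag1_autocorr`, differences a series that was already stationary and compares the result with the original.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=5000)   # already stationary: d = 0 would be enough
over = np.diff(noise)           # differenced anyway (over-differencing)

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

r_noise = lag1_autocorr(noise)            # close to 0 for white noise
r_over = lag1_autocorr(over)              # pulled towards -0.5 by differencing
var_ratio = over.var() / noise.var()      # variance roughly doubles

print(r_noise, r_over, var_ratio)
```

A strongly negative lag-1 autocorrelation after differencing is therefore a hint to try a smaller d.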

The Box-Jenkins workflow

George Box and Gwilym Jenkins formalized a five-step loop that remains the standard approach for fitting ARIMA models:

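Step 1 — Make the series stationary (choose d). Plot the series. If it trends, compute the first difference. Run an Augmented Dickey-Fuller (ADF) test or a KPSS test, keeping in mind that their null hypotheses are opposite: the ADF null is a unit root (a small p-value supports stationarity), while the KPSS null is stationarity (a small p-value argues against it). Repeat until the test indicates stationarity.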

Step 2 — Identify p and q from ACF and PACF.

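  • The autocorrelation function (ACF) measures correlation between the series and its own lags. For an MA(q) process the ACF cuts off after lag q, while the PACF tails off gradually.
  • The partial autocorrelation function (PACF) measures the correlation at each lag after removing the influence of the intermediate lags. For an AR(p) process the PACF cuts off after lag p, while the ACF tails off gradually.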
  • In practice the patterns overlap, so treat ACF/PACF as a starting point, not a rule.
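The ACF cutoff for a moving-average process can be checked on simulated data. This is a minimal sketch with NumPy only; `sample_acf` is a hypothetical helper written for this example, and the coefficient 0.8 is an arbitrary choice.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)

rng = np.random.default_rng(1)
eps = rng.normal(size=20001)
theta = 0.8
ma1 = eps[1:] + theta * eps[:-1]      # MA(1): x_t = eps_t + theta * eps_{t-1}

acf = sample_acf(ma1, max_lag=5)
band = 1.96 / np.sqrt(len(ma1))       # approximate 95% band under white noise

print(np.round(acf, 3))
print(round(band, 3))
```

The lag-1 value should sit near the theoretical θ / (1 + θ²) ≈ 0.49, and lags 2 and beyond should be close to zero. On real, shorter series the cutoff is rarely this clean.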

Step 3 — Fit the model. Pass (p, d, q) to your fitting routine and estimate the parameters by maximum likelihood.

Step 4 — Diagnose residuals. This is the step most beginners skip — and the most important. If your model has captured all signal, the residuals should look like white noise: zero mean, constant variance, no autocorrelation. Check:

  • A residual ACF plot: no spikes outside the confidence band.
  • The Ljung-Box test: the null hypothesis is no autocorrelation, so a small p-value (for example, below 0.05) means autocorrelation is still present and the model has missed some structure.
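To make the Ljung-Box check less of a black box, here is a hand-rolled sketch of the statistic Q = n(n+2) Σ r_k² / (n−k) using NumPy only. The helper name `ljung_box_q` and the simulated series are assumptions for illustration; in practice you would use the statsmodels routine. The value 18.307 is the 5% critical value of a chi-square distribution with 10 degrees of freedom.

```python
import numpy as np

def ljung_box_q(resid, h):
    """Ljung-Box Q statistic over lags 1..h (hypothetical helper)."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    k = np.arange(1, h + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) / denom for lag in k])
    return float(n * (n + 2) * np.sum(r**2 / (n - k)))

rng = np.random.default_rng(2)
white = rng.normal(size=500)          # stands in for "good" residuals

ar1 = np.zeros(500)                   # stands in for residuals with leftover structure
for t in range(1, 500):
    ar1[t] = 0.6 * ar1[t - 1] + white[t]

CHI2_95_DF10 = 18.307                 # 5% critical value, chi-square with 10 df
q_white = ljung_box_q(white, h=10)
q_ar1 = ljung_box_q(ar1, h=10)

print(q_white, q_ar1, CHI2_95_DF10)
```

A Q well above the critical value rejects the white-noise null. When the test is applied to residuals of a fitted ARMA model, the degrees of freedom are usually reduced by p + q.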

Step 5 — Forecast. Call .forecast(steps=h) for an h-step-ahead forecast. Uncertainty widens with horizon.

Model selection: AIC and BIC

When comparing candidate orders, use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). Both penalize models for complexity. Lower is better. BIC applies a heavier penalty for extra parameters, so it tends to favor sparser models.
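The two criteria follow the standard formulas AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to two made-up candidates; the log-likelihoods and parameter counts are hypothetical numbers chosen so that the criteria disagree.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * loglik

n = 120                                   # hypothetical sample size
small = {"loglik": -250.0, "k": 3}        # hypothetical sparse model
large = {"loglik": -246.0, "k": 6}        # hypothetical richer model

aic_small, aic_large = aic(**small), aic(**large)
bic_small, bic_large = bic(n=n, **small), bic(n=n, **large)

print(aic_small, aic_large)
print(round(bic_small, 2), round(bic_large, 2))
```

With these numbers AIC prefers the richer model while BIC prefers the sparser one, which is the heavier complexity penalty at work.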

A practical shortcut is pmdarima.auto_arima, which searches over candidate (p, d, q) orders (stepwise by default rather than over an exhaustive grid), chooses d with a stationarity test, and returns a fitted model with the lowest AIC among the orders it tried. It is a useful starting point, but always inspect the residuals of whatever it returns.

The ARIMA pipeline — diagram

[Diagram: Raw series (y₁, y₂, …) → Difference d times → Fit AR(p) + MA(q) on differenced series → Integrate back → Forecast]

The ARIMA pipeline: difference → AR+MA fit → integrate back → forecast (widening band = growing uncertainty).

Fitting ARIMA in Python

The code below shows the full Box-Jenkins pipeline using statsmodels. It runs locally — not in the browser — because statsmodels is not available in Pyodide.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# --- Step 1: load and inspect ---
series = pd.read_csv("monthly_sales.csv", index_col="date", parse_dates=True)["sales"]

# ADF test for stationarity
adf_result = adfuller(series)
print(f"ADF p-value: {adf_result[1]:.4f}")   # p < 0.05 => stationary

# First difference if needed
diff1 = series.diff().dropna()
adf_diff = adfuller(diff1)
print(f"ADF p-value (diff=1): {adf_diff[1]:.4f}")

# --- Step 2: ACF / PACF plots to choose p and q ---
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(diff1, lags=20, ax=axes[0])
plot_pacf(diff1, lags=20, ax=axes[1])
plt.tight_layout()
plt.show()

# --- Step 3: fit ---
model = ARIMA(series, order=(2, 1, 1))   # p=2, d=1, q=1
result = model.fit()
print(result.summary())   # AIC, BIC, coefficient estimates

# --- Step 4: diagnose residuals ---
result.plot_diagnostics(figsize=(12, 8))
plt.show()

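# Drop the first d = 1 residual: with differencing it reflects initialization, not model fit
resid = result.resid.iloc[1:]
lb = acorr_ljungbox(resid, lags=[10], model_df=3, return_df=True)   # model_df = p + q
print(lb)   # p-value > 0.05 => no evidence of leftover autocorrelation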

# --- Step 5: forecast ---
forecast = result.forecast(steps=12)
print(forecast)

conf_int = result.get_forecast(steps=12).conf_int()
print(conf_int)

Reading .summary()

The summary table includes:

  • coef — estimated AR and MA coefficients.
  • P>|z| — the p-value for each coefficient; a small value indicates the coefficient is statistically distinguishable from zero.
  • AIC / BIC — use these to compare competing orders; lower wins.
  • Ljung-Box (Q) and Prob(Q) — the test statistic and its p-value; the null is no autocorrelation, so a large Prob(Q) is what you want.

Auto-selection with pmdarima

from pmdarima import auto_arima

auto_model = auto_arima(
    series,
    seasonal=False,
    information_criterion="aic",
    stepwise=True,
    trace=True,
)
print(auto_model.summary())

auto_arima runs a stationarity test internally to choose d (KPSS by default; pass test="adf" to use ADF instead), tries candidate (p, q) combinations stepwise, and returns the lowest-AIC model it found. Always verify the residuals even when using auto-selection.

Seeing forecast extrapolation intuitively

The playground below uses only NumPy and Matplotlib. It fits a simple linear extrapolation on a synthetic trending series and plots the forecast as a naive intuition-builder — not a real ARIMA fit. The widening band represents how uncertainty grows with horizon.
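A rough stand-in for that idea, using NumPy only, might look like the sketch below. It is not the page's original playground and not an ARIMA fit: the synthetic series, the straight-line fit, and especially the band formula are assumptions made for illustration. The band grows with the square root of the horizon, which is how forecast error accumulates for a random walk.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(60)
y = 5.0 + 0.4 * t + rng.normal(scale=1.5, size=t.size)   # synthetic trending series

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(t, y, deg=1)
sigma = np.std(y - (intercept + slope * t), ddof=2)       # residual spread

h = np.arange(1, 13)                                      # horizons 1..12
point = intercept + slope * (t[-1] + h)                   # extrapolated line

# ASSUMPTION: random-walk-style growth, half-width proportional to sqrt(h).
half_width = 1.96 * sigma * np.sqrt(h)
lower, upper = point - half_width, point + half_width

for step, lo, mid, hi in zip(h, lower, point, upper):
    print(f"h={step:2d}  {lo:7.2f}  {mid:7.2f}  {hi:7.2f}")
```

To see the fan shape, the three arrays can be drawn with Matplotlib, for example with `plt.plot` for the point forecast and `plt.fill_between` for the band.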

Notice how the confidence band fans out as you go further into the future. A real ARIMA forecast behaves the same way: the further out you forecast, the less certain the model is.

Putting it all together

Here is the compact mental model to carry forward:

  • ARIMA(p, d, q): difference d times to get stationarity, then fit AR lags on the past values and MA lags on the past errors, then integrate the forecasts back to original scale.
  • Box-Jenkins in one sentence: stationarise → identify → fit → diagnose → forecast — and do not skip the diagnosis.
  • AIC/BIC rank models; residual ACF and Ljung-Box confirm the model is done.
  • Once you are comfortable with non-seasonal ARIMA, the natural next step is SARIMA, which adds seasonal AR and MA terms for periodic data.

Quick check

Q1In ARIMA(2, 1, 3), what does the middle parameter '1' specify?
Q2After fitting an ARIMA model you inspect the residual ACF and see several spikes well outside the confidence bands. What does this indicate?
Q3A colleague fits ARIMA(1,1,1) to monthly electricity demand and gets low AIC and clean residuals. They then apply the same order to daily humidity readings from a weather station. The residual ACF shows a large spike at lag 7. What is the most likely explanation and best next step?

Practice this in an interview

What is a VAR model, and when would you use it instead of a univariate ARIMA?

A Vector Autoregression (VAR) model extends the autoregressive (AR) part of ARIMA to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series have mutual predictive relationships (Granger causality) and you want to model those interactions; ARIMA is sufficient when one series can be forecast in isolation.
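As a minimal illustration of "each variable regressed on the past of all variables", the sketch below simulates a two-variable VAR(1) and recovers its coefficient matrix by ordinary least squares with NumPy. The matrix `A_true` is a hypothetical choice; a real analysis would typically use a dedicated VAR implementation with lag selection and diagnostics.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical coefficients: lagged series 2 helps predict series 1 (0.2),
# but lagged series 1 does not help predict series 2 (0.0).
A_true = np.array([[0.5, 0.2],
                   [0.0, 0.7]])

n = 5000
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A_true @ Y[t - 1] + rng.normal(size=2)   # Y_t = A Y_{t-1} + noise

# Least squares on the stacked rows: Y[1:] is approximately Y[:-1] @ A.T
B, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
A_hat = B.T

print(np.round(A_hat, 2))
```

The off-diagonal entries of the estimate are what a univariate ARIMA cannot represent: they capture how one series' past feeds into another's future.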

What is the difference between ARIMA and SARIMA, and when do you use each?

ARIMA(p,d,q) models non-seasonal series by combining autoregression, differencing, and a moving average of errors. SARIMA extends it with a second set of seasonal parameters (P,D,Q,s) that operate at the seasonal lag s, handling periodic patterns that ARIMA alone cannot capture.
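The seasonal differencing that the D parameter refers to can be sketched with NumPy: subtract the value from s steps earlier instead of the previous value. The weekly sine pattern, the noise level, and the helper `autocorr` below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
s = 7                                        # seasonal period (weekly pattern in daily data)
t = np.arange(7 * 60)
y = 3.0 * np.sin(2 * np.pi * t / s) + rng.normal(scale=0.5, size=t.size)

def autocorr(x, lag):
    """Sample autocorrelation at a single lag (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

seasonal_diff = y[s:] - y[:-s]               # D = 1 at seasonal lag s

r7_before = autocorr(y, s)                   # strong positive spike at lag 7
r7_after = autocorr(seasonal_diff, s)        # positive spike is gone

print(r7_before, r7_after)
```

After seasonal differencing the large positive lag-7 correlation disappears. In this simulated setup a negative lag-7 correlation takes its place, which is the kind of structure a seasonal MA term (Q = 1) is meant to absorb.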

When would you choose Prophet over ARIMA for a forecasting problem?

Prophet is a curve-fitting model that decomposes the series into trend, seasonality, and holidays; it handles missing data, multiple seasonalities, and non-uniform time grids with minimal tuning and is accessible to non-statisticians. ARIMA is a statistical model based on autocorrelation structure; it is more appropriate when the series is short, noise is small, and you need principled uncertainty intervals from an explicit stochastic process.

How do you choose p, d, and q for an ARIMA model?

Choose d by differencing until the ADF test rejects a unit root (or the KPSS test no longer rejects stationarity); choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then confirm with AIC or BIC to guard against over-fitting. In practice, an automated search over a small range of candidates with information criteria is more reliable than visual inspection alone, provided the residuals of the chosen model are still checked.
