datarekha
Time Series · Medium · Asked at Amazon, JPMorgan, Microsoft

How do you choose p, d, and q for an ARIMA model?

The short answer

Choose d by differencing until the ADF test rejects a unit root (the series looks stationary); choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then confirm with AIC or BIC to guard against over-fitting. In practice, an automated grid search over a small range of candidates, scored with information criteria, is more reliable than visual inspection alone.

How to think about it

Walk through the three steps in order: d first (stationarity), then p and q (plots + AIC). Interviewers want a principled workflow, not “I just try a few values.”

Step 1 — choose d (differencing order)

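Run the ADF test on the raw series. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value above 0.05 means you cannot reject non-stationarity: apply first-order differencing and retest. Repeat until the test rejects the null. Most economic and business series need d=0 or d=1; d=2 is rare and risks over-differencing.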

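from statsmodels.tsa.stattools import adfuller

def find_d(series, max_d=2, alpha=0.05):
    """Return (d, differenced series) for the smallest d whose ADF test rejects a unit root."""
    current = series.dropna()
    for d in range(max_d + 1):
        # adfuller returns (statistic, p-value, ...); index 1 is the p-value
        if adfuller(current)[1] < alpha:
            return d, current
        if d < max_d:
            # Difference once more, but never beyond max_d
            current = current.diff().dropna()
    # Still non-stationary at max_d: return the max_d-times differenced series
    return max_d, current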

Step 2 — choose p and q (visual heuristic)

On the stationary (differenced) series:

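  • Plot the PACF: for an AR process it cuts off after lag p, so the last lag whose spike lies outside the confidence band before the cutoff gives a candidate p.
  • Plot the ACF: for an MA process it cuts off after lag q, so the last lag whose spike lies outside the confidence band before the cutoff gives a candidate q.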

These are starting candidates, not hard answers.

Step 3 — confirm with AIC/BIC

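Fit models over a small grid (e.g., p, q ∈ {0,1,2}) and pick the one with the lowest AIC or BIC. Both trade goodness of fit against the number of parameters, but BIC charges ln(n) per parameter where AIC charges 2, so BIC penalises complexity more and tends to select smaller models.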

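import itertools
import warnings

from statsmodels.tsa.arima.model import ARIMA

# Assumes `train` is the undifferenced training series and `d` comes from Step 1;
# ARIMA applies the differencing itself via the order argument.
best_aic, best_order = float("inf"), None
for p, q in itertools.product(range(3), repeat=2):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence convergence warnings during the search
            aic = ARIMA(train, order=(p, d, q)).fit().aic
    except Exception:
        continue  # skip orders that fail to fit
    if aic < best_aic:
        best_aic, best_order = aic, (p, d, q)
print("Best order:", best_order, "AIC:", round(best_aic, 2))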

Common pitfalls

| Mistake | Effect |
|---|---|
| Choosing d without testing | Under- or over-differencing; spurious or noisy series |
| Reading ACF/PACF on the raw (non-stationary) series | Misleading plots: all lags appear correlated |
| Using only AIC, ignoring residual diagnostics | Model fits history but residuals are still autocorrelated |