Time Series · Medium · Asked at Amazon, JPMorgan, Microsoft

How do you choose p, d, and q for an ARIMA model?

For: Data Scientist, ML Engineer, Data Analyst

The short answer

Choose d by differencing until the ADF test rejects a unit root (the series looks stationary); choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then compare candidates with AIC or BIC to guard against over-fitting. In practice, an automated grid search over a small range of candidates, scored with information criteria, is more reliable than visual inspection alone.

How to think about it

Walk through the three steps in order: d first (stationarity), then p and q (plots + AIC). Interviewers want a principled workflow, not “I just try a few values.”

Step 1 — choose d (differencing order)

Run the ADF test on the raw series. Its null hypothesis is a unit root (non-stationarity), so if the p-value is above 0.05 the null is not rejected: apply first-order differencing and retest. Repeat until the test rejects. Most economic and business series need d=0 or d=1; d=2 is rare and risks over-differencing.

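from statsmodels.tsa.stattools import adfuller

def find_d(series, max_d=2):
    """Return (d, differenced series) for the smallest d with ADF p-value < 0.05."""
    series = series.dropna()
    for d in range(max_d + 1):
        p_val = adfuller(series)[1]  # element 1 of the result is the p-value
        if p_val < 0.05 or d == max_d:
            # Either stationary, or max_d reached without rejecting the unit root
            return d, series
        series = series.diff().dropna()  # difference once more and retest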

Step 2 — choose p and q (visual heuristic)

On the stationary (differenced) series:

Plot the PACF: the last lag whose spike is still outside the confidence band, before the spikes drop inside it, gives a candidate p.
Plot the ACF: the last lag whose spike is still outside the confidence band, before the spikes drop inside it, gives a candidate q.

These are starting candidates, not hard answers.
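
To make the cutoff rule concrete, here is a minimal sketch in plain Python of what the two plots compute: sample autocorrelations, partial autocorrelations via the Durbin-Levinson recursion, and the approximate 95% band of ±1.96/√n. The helper names (`sample_acf`, `sample_pacf`, `last_significant_lag`) and the simulated AR(1) series are assumptions for illustration, not part of the original walkthrough; in practice you would more likely use `acf` and `pacf` from `statsmodels.tsa.stattools`, or the plotting helpers.

```python
import math
import random

def sample_acf(x, nlags):
    """Sample autocorrelations rho_0..rho_nlags of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(nlags + 1)]

def sample_pacf(rho):
    """Partial autocorrelations from autocorrelations (Durbin-Levinson)."""
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def last_significant_lag(values, n):
    """Last lag outside +/-1.96/sqrt(n) before the first lag inside it."""
    band = 1.96 / math.sqrt(n)
    last = 0
    for lag in range(1, len(values)):
        if abs(values[lag]) <= band:
            break
        last = lag
    return last

# Hypothetical stationary series: simulated AR(1) with coefficient 0.7
random.seed(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

acf_vals = sample_acf(series, nlags=10)
pacf_vals = sample_pacf(acf_vals)
print("candidate p:", last_significant_lag(pacf_vals, len(series)))
print("candidate q:", last_significant_lag(acf_vals, len(series)))
```

For an AR process like this one, the ACF decays gradually rather than cutting off, so the ACF-based candidate q will typically be inflated. That is one reason these values are only starting points for the AIC/BIC comparison.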

Step 3 — confirm with AIC/BIC

Fit models over a small grid (e.g., p, q ∈ {0,1,2}) and pick the one with the lowest AIC (lighter complexity penalty, so it tends to favour fit) or the lowest BIC (heavier penalty, so it tends to pick smaller models).

import itertools
import warnings

from statsmodels.tsa.arima.model import ARIMA

# Assumes `train` (the undifferenced training series) and `d` (from Step 1)
# are already defined; ARIMA applies the differencing itself.
best_aic, best_order = float("inf"), None
for p, q in itertools.product(range(3), repeat=2):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence convergence warnings
            aic = ARIMA(train, order=(p, d, q)).fit().aic
    except Exception:
        continue  # skip orders that fail to fit
    if aic < best_aic:
        best_aic, best_order = aic, (p, d, q)
print("Best order:", best_order, "AIC:", round(best_aic, 2))
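
The difference between the two criteria is easiest to see from their standard definitions. The symbols here are conventional notation rather than anything defined in the text above: k is the number of estimated parameters, n the number of observations, and L̂ the maximised likelihood.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Each extra parameter costs 2 under AIC and ln n under BIC, so BIC penalises complexity more whenever ln n > 2, that is, for n ≥ 8. To select by BIC in the loop above, read the fitted result's `.bic` attribute instead of `.aic`.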

Common pitfalls

Mistake	Effect
Choosing d without testing	Under- or over-differencing; spurious or noisy series
Reading ACF/PACF on the raw (non-stationary) series	Misleading plots — all lags appear correlated
Using only AIC, ignoring residual diagnostics	Model fits history but residuals are still autocorrelated
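
The last pitfall can be checked with a Ljung-Box test on the fitted model's residuals. The sketch below computes the Q statistic in plain Python, Q = n(n+2) Σ ρ_k² / (n − k), and compares it with the 5% chi-square critical value for 10 degrees of freedom (18.307). The function name and both simulated residual series are assumptions for illustration. With a real ARIMA(p, d, q) fit the degrees of freedom are usually reduced to lags − p − q, which `acorr_ljungbox` in `statsmodels.stats.diagnostic` supports through its `model_df` argument.

```python
import random

def ljung_box_q(residuals, lags=10):
    """Ljung-Box Q statistic over the first `lags` autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# 5% critical value of chi-square with 10 degrees of freedom
# (no adjustment for fitted parameters; an assumption of this sketch)
CHI2_CRIT_10DF_5PCT = 18.307

random.seed(1)
# Hypothetical residuals from a well-specified model: white noise
clean = [random.gauss(0, 1) for _ in range(500)]
# Hypothetical residuals from an under-specified model: leftover AR(1) structure
leftover = [0.0]
for _ in range(499):
    leftover.append(0.6 * leftover[-1] + random.gauss(0, 1))

for name, res in [("clean", clean), ("leftover", leftover)]:
    q_stat = ljung_box_q(res)
    verdict = "autocorrelated" if q_stat > CHI2_CRIT_10DF_5PCT else "no evidence of autocorrelation"
    print(name, round(q_stat, 1), verdict)
```

If Q exceeds the critical value, the residuals still carry structure the model missed, and a low AIC alone should not be trusted; revisit p and q.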
