datarekha

Autoregression (AR)

Learn how the AR(p) model explains tomorrow as a weighted blend of recent history, and why the PACF is the main guide to how many lags to include.

8 min read · Intermediate · Time Series · Lesson 5 of 14

What you'll learn

  • AR(p) model: y_t as a linear combination of its own past p values plus white-noise error
  • How phi controls persistence and mean-reversion in the AR(1) special case
  • Why the PACF cuts off after lag p: the key diagnostic for choosing model order

Why autoregression?

Many real-world time series carry momentum or persistence — a warm day tends to be followed by another warm day; a busy hour at a server often follows a busy hour. Autoregression (AR) is the workhorse model that captures exactly this structure. It forms the AR half of the famous ARIMA framework, so understanding it deeply pays dividends throughout forecasting work.

The core idea is disarmingly simple: treat the past values of the series itself as the predictors in an ordinary linear regression.

The AR(p) model

An AR(p) model says that the value at time t is a linear combination of its own p previous values plus a white-noise error term (a random shock with mean zero and constant variance):

y_t = c + phi_1·y_(t-1) + phi_2·y_(t-2) + ... + phi_p·y_(t-p) + e_t
  • c — a constant (the intercept, related to the series mean)
  • phi_1 … phi_p — the autoregressive coefficients (the weights given to each lagged value)
  • lag — a time-shifted copy of the series; y_(t-1) is the series lagged by one period
  • e_t: white noise, i.e. independent, identically distributed errors with mean zero
  • p — the order of the model, i.e. how many past values you look back

To fit an AR model you literally stack lagged columns of the series into a design matrix and run ordinary least squares. That is why the method is called autoregression — you regress the series on itself.
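The "stack lagged columns and run least squares" recipe can be sketched with the standard library alone. This is a minimal illustration, not how any particular library implements it: the helper names (`simulate_ar`, `lagged_design`, `solve_linear`, `fit_ar_ols`) and the simulated AR(2) with c = 1.0, phi_1 = 0.5, phi_2 = 0.3 are assumptions chosen for the example.

```python
import random

def simulate_ar(phis, c, n, sigma=1.0, burn_in=200, seed=0):
    """Simulate an AR(p) series with Gaussian white noise (illustrative helper)."""
    rng = random.Random(seed)
    y = [0.0] * len(phis)              # arbitrary start values, dropped with the burn-in
    for _ in range(n + burn_in):
        # phis[0] multiplies y_(t-1), phis[1] multiplies y_(t-2), and so on
        past = sum(phi * y[-(j + 1)] for j, phi in enumerate(phis))
        y.append(c + past + rng.gauss(0.0, sigma))
    return y[-n:]

def lagged_design(y, p):
    """Rows [1, y_(t-1), ..., y_(t-p)] and targets y_t for t = p .. len(y) - 1."""
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    return X, y[p:]

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_ar_ols(y, p):
    """Least-squares estimates [c, phi_1, ..., phi_p] via the normal equations."""
    X, target = lagged_design(y, p)
    k = p + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * t for row, t in zip(X, target)) for a in range(k)]
    return solve_linear(XtX, Xty)

series = simulate_ar(phis=[0.5, 0.3], c=1.0, n=5000)
c_hat, phi1_hat, phi2_hat = fit_ar_ols(series, p=2)
print(f"c={c_hat:.3f}  phi_1={phi1_hat:.3f}  phi_2={phi2_hat:.3f}")
```

With a few thousand observations the estimates should land close to the values used in the simulation. In real work a numerical library's least-squares routine is a better choice than hand-rolled elimination.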

The AR(1) case in depth

The simplest non-trivial case, AR(1), uses only the immediately preceding value:

y_t = c + phi·y_(t-1) + e_t

The single coefficient phi controls everything interesting:

phi value | Behaviour
phi close to +1 | Strong persistence: the series drifts slowly and stays near recent values for a long time
phi close to 0 | Weak memory: the series looks almost like random noise around its mean
phi negative | Mean reversion with oscillation: the series bounces above and below the mean each step

Stationarity requirement: for the series to be stationary (have a stable, finite mean and variance), you need |phi| < 1. When phi = 1 the model is a random walk (with drift if c is nonzero); phi = -1 is also non-stationary; when |phi| > 1 the series explodes.
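A short derivation shows where the |phi| < 1 condition comes from. It assumes the series is already stationary and that e_t has variance sigma^2 and is independent of y_(t-1); the symbols mu (the mean), gamma_0 (the variance) and sigma^2 are introduced here for illustration, and \phi is the lesson's phi.

```latex
\begin{aligned}
\mathbb{E}[y_t] &= c + \phi\,\mathbb{E}[y_{t-1}]
  \;\Rightarrow\; \mu = c + \phi\,\mu
  \;\Rightarrow\; \mu = \frac{c}{1-\phi}, \\[4pt]
\operatorname{Var}(y_t) &= \phi^{2}\operatorname{Var}(y_{t-1}) + \sigma^{2}
  \;\Rightarrow\; \gamma_0 = \phi^{2}\gamma_0 + \sigma^{2}
  \;\Rightarrow\; \gamma_0 = \frac{\sigma^{2}}{1-\phi^{2}}.
\end{aligned}
```

The variance is finite and positive only when phi^2 < 1. At phi = 1 neither equation has a finite solution, which matches the random-walk case.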

The PACF connection

In the previous lesson you learned that the Partial Autocorrelation Function (PACF) measures the correlation between y_t and y_(t-k) after removing the influence of all shorter lags. For a true AR(p) process, the PACF has a sharp cutoff after lag p: all partial autocorrelations beyond lag p are zero in theory, and within sampling noise in an estimated plot. This is the primary diagnostic for choosing p: plot the PACF, find the last lag whose bar lies outside the confidence band before the bars settle inside it, and take that lag as your candidate order.
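The cutoff can be checked numerically. The sketch below computes partial autocorrelations from sample autocorrelations with the Durbin-Levinson recursion, using only the standard library. The function names, the simulated AR(2) (phi_1 = 0.5, phi_2 = 0.3) and the approximate 95% band of 1.96 / sqrt(n) are assumptions made for the illustration.

```python
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    gamma0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / gamma0
            for k in range(max_lag + 1)]

def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..K from rho_0..rho_K (Durbin-Levinson)."""
    pacf = []
    prev = []                          # coefficients of the order-(k-1) fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den             # the partial autocorrelation at lag k
        # update the lower-order coefficients, then append the new last one
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulate an AR(2) series, discarding a burn-in of 200 steps
rng = random.Random(1)
phi1, phi2, n = 0.5, 0.3, 5000
y = [0.0, 0.0]
for _ in range(n + 200):
    y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, 1.0))
y = y[-n:]

pacf = pacf_from_acf(sample_acf(y, max_lag=6))
band = 1.96 / n ** 0.5                 # approximate 95% band
for lag, value in enumerate(pacf, start=1):
    flag = "significant" if abs(value) > band else "inside band"
    print(f"lag {lag}: {value:+.3f}  ({flag})")
```

For this process the lag-2 value should sit near phi_2 = 0.3 and the values at lags 3 and above should be small. Because the band is a 95% band, an occasional higher lag can still poke outside it by chance.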

The AR feedback loop

The diagram below shows AR(1) as a feedback loop: the current output y_t is fed back as input y_(t-1) on the next step, scaled by phi, and combined with fresh noise e_t.

[Figure: AR(1) as a feedback loop. Each output y_t becomes the next input, scaled by phi, before fresh noise e_t is added.]
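One consequence of the loop is that a single shock is multiplied by phi every time it goes around, so its effect after k steps is phi^k. A minimal sketch, with hypothetical helper names and with no further shocks after the first:

```python
import math

def impulse_response(phi, steps):
    """Remaining effect of a one-unit shock at step 0, for steps 0..steps."""
    effect = 1.0
    path = [effect]
    for _ in range(steps):
        effect *= phi                  # one more pass through the feedback loop
        path.append(effect)
    return path

def half_life(phi):
    """Steps until a shock has decayed to half its size (valid for 0 < phi < 1)."""
    return math.log(0.5) / math.log(phi)

for phi in (0.9, 0.3):
    path = [round(v, 3) for v in impulse_response(phi, 5)]
    print(f"phi={phi}: {path}  half-life ≈ {half_life(phi):.2f} steps")
```

The closer phi is to 1, the longer the half-life, which is the same persistence effect described in the AR(1) section.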

Simulate AR(1): see persistence change

The playground simulates two AR(1) series side by side — one with phi = 0.9 (high persistence) and one with phi = 0.3 (quick mean-reversion) — so you can see the difference directly.

What you should observe:

  • The phi = 0.9 series drifts in long, sweeping runs — a positive excursion lasts many steps before the series returns to zero.
  • The phi = 0.3 series bounces back toward zero rapidly; consecutive values look nearly independent.
  • Both series are stationary because |phi| < 1 in both cases.

Try editing phi to 0.99 and observe how the series starts to look like a slow random walk. Then try a negative value such as -0.7 to see the oscillating mean-reversion pattern.
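If you want to reproduce the comparison offline, here is a minimal standard-library sketch. It assumes c = 0, Gaussian noise with unit variance, and the same seed (so the same shocks) for both series; `simulate_ar1`, `lag1_autocorr` and `mean_run_length` are names made up for this example.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate y_t = phi * y_(t-1) + e_t with Gaussian white noise, starting from 0."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        y.append(prev)
    return y

def lag1_autocorr(y):
    """Sample correlation between consecutive values."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)

def mean_run_length(y):
    """Average number of consecutive steps spent on one side of zero."""
    runs, length = [], 1
    for a, b in zip(y, y[1:]):
        if (a >= 0) == (b >= 0):
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sum(runs) / len(runs)

for phi in (0.9, 0.3):
    y = simulate_ar1(phi, n=2000)
    print(f"phi={phi}: lag-1 autocorr={lag1_autocorr(y):.2f}, "
          f"mean run length={mean_run_length(y):.1f}")
```

The run length is a rough numeric stand-in for the "long, sweeping runs" you would see in a plot: it should be clearly larger for phi = 0.9 than for phi = 0.3. Changing the `phi` values in the loop to 0.99 or -0.7 covers the other experiments suggested above.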

Fitting AR with statsmodels

In practice you rarely simulate — you fit AR to observed data. The statsmodels library provides AutoReg:

import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Assume `series` is a pd.Series of your observed time series
model = AutoReg(series, lags=2)   # AR(2)
result = model.fit()

print(result.summary())           # shows phi_1, phi_2 and standard errors
print(result.params)              # constant, phi_1, phi_2

# One-step-ahead forecasts on the training period
fitted_values = result.fittedvalues

# Out-of-sample forecast for 5 steps ahead
forecast = result.forecast(steps=5)
print(forecast)
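
Conceptually, a multi-step AR forecast iterates the model equation forward with future shocks set to their mean of zero. The sketch below illustrates that recursion for a hypothetical fitted AR(2); `ar_forecast` and the coefficient values are assumptions for the example, and this is not a description of statsmodels internals.

```python
def ar_forecast(history, c, phis, steps):
    """Iterate the AR equation forward, replacing future shocks with zero."""
    values = list(history)
    out = []
    for _ in range(steps):
        # phis[0] multiplies the most recent value, phis[1] the one before, ...
        nxt = c + sum(phi * values[-(j + 1)] for j, phi in enumerate(phis))
        values.append(nxt)             # the forecast becomes an input for the next step
        out.append(nxt)
    return out

# Hypothetical coefficients: c = 1.0, phi_1 = 0.5, phi_2 = 0.25,
# so the long-run mean is 1.0 / (1 - 0.5 - 0.25) = 4.0
print(ar_forecast(history=[8.0, 6.0], c=1.0, phis=[0.5, 0.25], steps=5))
# prints [6.0, 5.5, 5.25, 5.0, 4.8125]
```

Each forecast sits closer to the long-run mean of 4.0 than the one before it, which is typical of a stationary AR model: the further ahead you forecast, the less the recent history matters.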

Choosing p in practice

  1. Plot the PACF of your (stationary) series. Find the last lag whose bar falls outside the 95% confidence band before the bars settle inside it. That lag is a good starting value for p.
  2. Information criteria (AIC, BIC) let you compare candidate orders more rigorously: fit several orders and prefer the one with the lowest value. The fitted AutoReg results report both.
  3. Start small. AR(1) or AR(2) often captures most of the structure. Higher-order models may overfit.
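To make step 2 concrete, the sketch below compares AR(0) through AR(6) on a simulated AR(2) series. It is an approximation under stated assumptions: the fits are Yule-Walker fits obtained from one Durbin-Levinson pass rather than least squares, and the criteria use the Gaussian form n·ln(residual variance) plus a penalty, dropping constants. The numbers will therefore differ from what a library reports, though the ranking of orders is usually similar. All function names are made up for the example.

```python
import math
import random

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    gamma0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / gamma0
            for k in range(max_lag + 1)]

def ar_information_criteria(y, max_order):
    """Rows (order, AIC, BIC) for AR(0)..AR(max_order), up to additive constants."""
    n = len(y)
    mean = sum(y) / n
    sigma2 = sum((v - mean) ** 2 for v in y) / n   # residual variance of AR(0)
    rho = sample_acf(y, max_order)
    prev, rows = [], []
    for k in range(max_order + 1):
        if k > 0:
            num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
            sigma2 *= 1.0 - phi_kk ** 2            # variance left after adding lag k
        n_params = k + 1                           # k coefficients plus the mean
        aic = n * math.log(sigma2) + 2 * n_params
        bic = n * math.log(sigma2) + math.log(n) * n_params
        rows.append((k, aic, bic))
    return rows

# Simulated AR(2) with phi_1 = 0.5, phi_2 = 0.3 (values chosen for the example)
rng = random.Random(2)
n = 5000
y = [0.0, 0.0]
for _ in range(n + 200):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + rng.gauss(0.0, 1.0))
y = y[-n:]

rows = ar_information_criteria(y, max_order=6)
for order, aic, bic in rows:
    print(f"AR({order}): AIC={aic:.1f}  BIC={bic:.1f}")
print("order with lowest BIC:", min(rows, key=lambda r: r[2])[0])
```

BIC penalises extra lags more heavily than AIC, so it tends to pick the smaller model when the two disagree, in line with the "start small" advice.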

Summary

  • An AR(p) model predicts y_t as a weighted sum of its last p values plus white noise: y_t = c + phi_1·y_(t-1) + ... + phi_p·y_(t-p) + e_t.
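  • In AR(1), phi controls persistence: values near +1 mean strong persistence; values near 0 mean rapid mean-reversion; negative values make the series oscillate around its mean.
  • Stationarity requires |phi| < 1 for AR(1). For higher orders, the analogous condition is that all roots of the characteristic polynomial 1 - phi_1·z - ... - phi_p·z^p lie outside the unit circle.
  • The PACF cuts off sharply after lag p, so use it to identify the order before fitting.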
  • Next up: Moving Average (MA) models, which blend past errors rather than past values — the complementary half of ARIMA.

Quick check

Q1. In an AR(1) model y_t = 0.85·y_(t-1) + e_t, you observe an unusually large spike at time t. How long would you expect the effect of that spike to persist compared with an AR(1) with phi = 0.2?
Q2. You plot the PACF of a stationary series and see significant spikes only at lags 1 and 2, with all higher lags inside the confidence band. Which model order is most appropriate?
Q3. A colleague fits an AR(1) model to monthly sales data and reports phi = 1.05. What is the key problem with this result?

Practice this in an interview

How do you read ACF and PACF plots, and what do they tell you about AR and MA orders?

The ACF measures correlation between a series and its own lags including indirect effects; the PACF strips out those indirect effects to show direct correlation at each lag. A cut-off in the PACF after lag p signals an AR(p) process; a cut-off in the ACF after lag q signals an MA(q) process.

What is a VAR model, and when would you use it instead of a univariate ARIMA?

A Vector Autoregression (VAR) model extends autoregression to multiple time series simultaneously: each variable is regressed on its own past values and the past values of all other variables in the system. Use VAR when the series have mutual predictive relationships (Granger-causality) and you want to model those interactions; ARIMA is sufficient when one series can be forecast in isolation.

How do you choose p, d, and q for an ARIMA model?

Choose d by differencing until a unit-root test such as the ADF test no longer indicates non-stationarity; choose p from the PACF cutoff and q from the ACF cutoff on the differenced series; then confirm with AIC or BIC to guard against over-fitting. In practice, an automated grid search over a small range of candidates with information criteria is often more reliable than visual inspection alone.

What is the difference between ARIMA and SARIMA, and when do you use each?

ARIMA(p,d,q) models non-seasonal series by combining autoregression, differencing, and a moving average of errors. SARIMA extends it with a second set of seasonal parameters (P,D,Q,s) that operate at the seasonal lag s, handling periodic patterns that ARIMA alone cannot capture.
