What is a VAR model, and when would you use it instead of a univariate ARIMA?
A Vector Autoregression (VAR) model generalizes the univariate autoregressive (AR) model to several time series at once: each variable is regressed on its own past values and on the past values of every other variable in the system. Use VAR when the series help predict one another (Granger causality) and you want to model those interactions; a univariate ARIMA is sufficient when one series can be forecast in isolation.
How to think about it
Explain the structure of the model, the stationarity requirement, lag selection, and Granger causality. Interviewers at quant or macro-forecasting shops expect you to know the full workflow.
The VAR(p) model
For a k-variable system y_t = [y1_t, y2_t, …, yk_t], the whole vector is modelled jointly as:
y_t = c + A1 y_(t-1) + A2 y_(t-2) + … + Ap y_(t-p) + ε_t
where c is a k × 1 vector of intercepts, each Ai is a k × k coefficient matrix, and ε_t is a k × 1 white-noise error vector. With k=2 variables and p=1 lag, you have 4 slope coefficients plus 2 intercepts. The slope count grows as k² × p, so large systems require long series.
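To make the k² × p growth concrete, here is a minimal sketch that counts the mean-equation coefficients of a VAR(p). The helper name `var_param_count` is hypothetical and not part of any library; error-covariance terms are deliberately left out of the count.

```python
def var_param_count(k: int, p: int, intercept: bool = True) -> int:
    """Count mean-equation coefficients in a VAR(p) with k variables.

    Hypothetical helper for illustration; error-covariance terms are ignored.
    """
    slopes = k * k * p                  # p coefficient matrices, each k x k
    intercepts = k if intercept else 0  # one intercept per equation
    return slopes + intercepts


for k, p in [(2, 1), (3, 4), (6, 4)]:
    print(f"k={k}, p={p}: {var_param_count(k, p)} coefficients")
# k=2, p=1: 6 coefficients
# k=3, p=4: 39 coefficients
# k=6, p=4: 150 coefficients
```

Doubling the number of variables from 3 to 6 roughly quadruples the coefficient count at the same lag order, which is why adding series is more expensive than adding lags.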
Requirements
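- All series must be stationary. Test each with the Augmented Dickey-Fuller (ADF) test; difference as needed. If the original levels are cointegrated, use a Vector Error Correction Model (VECM) instead.
- Select the lag order p by fitting VARs over a range of candidate lags and keeping the one that minimizes AIC, BIC, or HQIC. BIC penalizes extra parameters more heavily, so it tends to pick shorter lags than AIC.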
- Check stability: all eigenvalues of the companion matrix must lie inside the unit circle.
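The stability condition in the last bullet can be checked by hand. The sketch below builds the companion matrix from a list of coefficient matrices and tests whether every eigenvalue has modulus below 1. The function names `companion_matrix` and `is_stable` and the example matrices are assumptions made for illustration, not values from a fitted model.

```python
import numpy as np


def companion_matrix(coefs):
    """Stack VAR coefficient matrices A1..Ap (each k x k) into a kp x kp companion matrix."""
    coefs = [np.asarray(a, dtype=float) for a in coefs]
    p = len(coefs)
    k = coefs[0].shape[0]
    top = np.hstack(coefs)  # first block row: [A1 A2 ... Ap]
    if p == 1:
        return top
    # Lower block rows: identity shifted one block to the left, zeros in the last block
    bottom = np.hstack([np.eye(k * (p - 1)), np.zeros((k * (p - 1), k))])
    return np.vstack([top, bottom])


def is_stable(coefs):
    """Return True if all companion-matrix eigenvalues lie strictly inside the unit circle."""
    eigvals = np.linalg.eigvals(companion_matrix(coefs))
    return bool(np.all(np.abs(eigvals) < 1.0))


A_stable = np.array([[0.5, 0.1],
                     [0.2, 0.3]])   # illustrative stable VAR(1)
A_unit_root = np.array([[1.0, 0.0],
                        [0.0, 0.5]])  # first variable is a random walk

print(is_stable([A_stable]))     # True
print(is_stable([A_unit_root]))  # False
```

For a model fitted with statsmodels, the results object also exposes an `is_stable()` method, so the manual check is mainly useful for understanding what that method verifies.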
Code
```python
import pandas as pd
from statsmodels.tsa.api import VAR

df = pd.read_csv("macro.csv", parse_dates=["date"], index_col="date")
# Assume df has columns: gdp_growth, inflation, unemployment
# All three must already be stationary (differenced if needed)

model = VAR(df)
results = model.fit(maxlags=8, ic="aic")  # tries lags up to 8, keeps the order with the lowest AIC
print(results.summary())

# Forecast 4 steps ahead, seeded with the last k_ar observations
lag_order = results.k_ar
forecast_input = df.values[-lag_order:]
forecast = results.forecast(forecast_input, steps=4)  # array of shape (steps, n_variables)
```
Granger causality
Before fitting, test whether x Granger-causes y, that is, whether adding lags of x significantly improves the forecast of y beyond what y’s own lags already provide.
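```python
from statsmodels.tsa.stattools import grangercausalitytests

# Tests whether the second column (inflation) Granger-causes the first (gdp_growth)
grangercausalitytests(df[["gdp_growth", "inflation"]], maxlag=4)
# p < 0.05 at a given lag → evidence that inflation Granger-causes gdp_growth
```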
A significant result justifies including that variable; a non-significant result suggests ARIMA on y alone may be equally good and much simpler.
VAR vs ARIMA decision
| Factor | VAR | ARIMA |
|---|---|---|
| Multiple series with cross-effects | Yes | No |
| Single series | Overkill | Yes |
| Series length | Needs long data (k²×p params) | Works with shorter series |
| Interpretability of impulse responses | Native support | N/A |