What is walk-forward validation, and why is it the correct cross-validation strategy for time series?
Walk-forward validation (also called time-series cross-validation; the expanding-window form is its most common variant) creates successive train/test folds in which every test observation lies strictly after every training observation. It mimics real deployment: you fit on what you knew then and evaluate on what happened next. Random k-fold, by contrast, lets future data contaminate training.
How to think about it
Explain the two variants (expanding vs. sliding window), show the code, and articulate why random k-fold is wrong. This question comes up frequently in FAANG-level interviews.
Two variants
Expanding-window (recommended by default): training set grows with each fold; test set is the next fixed-size window. Mirrors real deployment where you retrain periodically on all available history.
Sliding-window (rolling): training set has a fixed maximum size; the oldest observations are dropped as new ones arrive. Useful when the process is non-stationary and recent data is more informative than old data.
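Both variants can be sketched with scikit-learn's TimeSeriesSplit: leaving max_train_size unset gives an expanding window, and setting it caps the training window so it slides. The series length and window sizes below are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_obs = 200  # hypothetical series length, rows assumed to be in time order
X = np.arange(n_obs).reshape(-1, 1)

# Expanding window: each fold trains on all history before its test window.
expanding = TimeSeriesSplit(n_splits=4, test_size=20)

# Sliding window: max_train_size keeps only the most recent 60 observations.
sliding = TimeSeriesSplit(n_splits=4, test_size=20, max_train_size=60)

expanding_sizes = [len(train) for train, _ in expanding.split(X)]
sliding_sizes = [len(train) for train, _ in sliding.split(X)]
sliding_starts = [int(train[0]) for train, _ in sliding.split(X)]

print("expanding train sizes:", expanding_sizes)  # [120, 140, 160, 180]
print("sliding train sizes:  ", sliding_sizes)    # [60, 60, 60, 60]
print("sliding train starts: ", sliding_starts)   # [60, 80, 100, 120]
```

The expanding splitter's training size grows by one test window per fold, while the sliding splitter keeps the size fixed and moves the start forward.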
Why random k-fold leaks
In shuffled k-fold, a fold's training data may contain observations from 2024 while its validation data contains observations from 2022. The model has been trained on the future of its own validation period, so any metric computed is optimistic and unreliable. Turning off shuffling does not fix this: standard k-fold still trains on the blocks that come after the validation block.
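To make the leak concrete, the sketch below counts folds in which some training index is later than the earliest test index. The helper leaky_folds and the toy data, where the row index doubles as a timestamp, are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # toy data: row index acts as the timestamp


def leaky_folds(splitter, X):
    """Count folds whose training set contains an index later than the earliest test index."""
    return sum(int(train.max() > test.min()) for train, test in splitter.split(X))


kfold_shuffled_leaks = leaky_folds(KFold(n_splits=5, shuffle=True, random_state=0), X)
kfold_ordered_leaks = leaky_folds(KFold(n_splits=5), X)
tscv_leaks = leaky_folds(TimeSeriesSplit(n_splits=5), X)

print("shuffled KFold leaky folds:  ", kfold_shuffled_leaks)
print("unshuffled KFold leaky folds:", kfold_ordered_leaks)  # 4 (only the last block is clean)
print("TimeSeriesSplit leaky folds: ", tscv_leaks)           # 0
```

Only TimeSeriesSplit guarantees that no fold trains on data from after its own test window.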
Code — TimeSeriesSplit
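import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Placeholder data and model (assumptions for illustration); rows must be in time order
rng = np.random.default_rng(0)
X = rng.normal(size=(365, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=365)
model = LinearRegression()

tscv = TimeSeriesSplit(n_splits=5, test_size=30)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # NumPy indexing; for pandas objects use .iloc[train_idx] instead
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mae = np.mean(np.abs(preds - y_test))
    print(f"Fold {fold+1} MAE: {mae:.4f}")
    # every index in test_idx is greater than every index in train_idx: no leakage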
Choosing test_size
Set test_size to the forecast horizon you care about, expressed as a number of observations rather than calendar time. If you deploy monthly forecasts from daily data, use test_size=30 so each fold measures the exact task you're solving in production.
Practical note on refitting
In production, retrain on the expanding window each time new data arrives. A model frozen months ago is exposed to distribution shift, and its accuracy typically degrades as the data drift away from what it was trained on.
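The effect of skipping retraining can be explored with a small experiment. The sketch below assumes a synthetic series whose slope drifts over time (an invented data-generating process, not real data) and compares a model fitted once on the first window against one refitted on the expanding window at every fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 1))
slope = np.linspace(1.0, 3.0, n)  # synthetic assumption: the relationship drifts over time
y = slope * X[:, 0] + rng.normal(scale=0.1, size=n)

tscv = TimeSeriesSplit(n_splits=5, test_size=40)
frozen = None
frozen_mae, refit_mae = [], []

for train_idx, test_idx in tscv.split(X):
    # Refit on all history available before this test window.
    refit = LinearRegression().fit(X[train_idx], y[train_idx])
    if frozen is None:
        frozen = refit  # fitted once on the first window, never updated afterwards
    for model, scores in ((frozen, frozen_mae), (refit, refit_mae)):
        preds = model.predict(X[test_idx])
        scores.append(float(np.mean(np.abs(preds - y[test_idx]))))

for fold, (f, r) in enumerate(zip(frozen_mae, refit_mae), start=1):
    print(f"Fold {fold}: frozen MAE {f:.3f} | refit MAE {r:.3f}")
```

Under this kind of drift, the frozen model's error would be expected to grow from fold to fold while the refitted model stays closer to the current relationship. Real series may behave differently, so treat this as a template for measuring the gap rather than as evidence of its size.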