Time Series | Medium | Asked at: Amazon, Airbnb, Uber, Meta

What is walk-forward validation, and why is it the correct cross-validation strategy for time series?

For: Data Scientist, ML Engineer, Data Analyst, AI / LLM Engineer

The short answer

Walk-forward validation (also called time-series cross-validation; its most common form is expanding-window CV) creates successive train/test folds in which each fold's test set lies strictly in the future relative to its training set. It mimics real deployment: you fit on what you knew then and evaluate on what happened next. Random k-fold, by contrast, lets future data contaminate training.
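The definition can be sketched without any library. The helper `expanding_window_splits` below is a hypothetical name used only for illustration; it assumes rows are already sorted oldest to newest and yields an expanding training window followed by the next fixed-size test block.

```python
import numpy as np

def expanding_window_splits(n_samples, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward CV.

    Hypothetical helper; assumes rows are sorted in time order.
    """
    start = initial_train
    while start + test_size <= n_samples:
        train_idx = np.arange(0, start)                 # everything seen so far
        test_idx = np.arange(start, start + test_size)  # the next block in time
        yield train_idx, test_idx
        start += test_size                              # walk forward one block

splits = list(expanding_window_splits(n_samples=100, initial_train=40, test_size=20))
for train_idx, test_idx in splits:
    print(f"train 0-{train_idx[-1]}  test {test_idx[0]}-{test_idx[-1]}")
```

Each fold's test block starts immediately after its training data ends, so no test row ever precedes a training row.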

How to think about it

Explain the two variants (expanding vs. sliding window), show the code, and articulate why random k-fold is wrong. This question comes up frequently in FAANG-level interviews.

Two variants

Expanding-window (recommended by default): training set grows with each fold; test set is the next fixed-size window. Mirrors real deployment where you retrain periodically on all available history.

Sliding-window (rolling): training set has a fixed maximum size; the oldest observations are dropped as new ones arrive. Useful when the process is non-stationary and recent data is more informative than old data.
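In scikit-learn the two variants differ by one argument: `TimeSeriesSplit` grows the training window by default, and its `max_train_size` parameter caps it, which turns it into a sliding window. The sketch below uses 120 illustrative rows and an arbitrarily chosen 40-row cap.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(120).reshape(-1, 1)  # 120 time-ordered rows (illustrative)

variants = {
    "expanding": TimeSeriesSplit(n_splits=4, test_size=20),                   # train grows each fold
    "sliding": TimeSeriesSplit(n_splits=4, test_size=20, max_train_size=40),  # train capped at 40 rows
}

train_sizes, train_starts = {}, {}
for name, cv in variants.items():
    train_sizes[name], train_starts[name] = [], []
    for train_idx, test_idx in cv.split(X):
        train_sizes[name].append(len(train_idx))
        train_starts[name].append(int(train_idx[0]))
        print(f"{name:9s} train {train_idx[0]:3d}-{train_idx[-1]:3d} (n={len(train_idx)}) "
              f"test {test_idx[0]}-{test_idx[-1]}")
```

Both variants use identical test blocks; only the left edge of the training window moves in the sliding version.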

Why random k-fold leaks

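In a shuffled k-fold split, fold 3's training data may contain observations from 2024 while its validation data contains observations from 2022. The model has effectively been trained on the future of its own validation period, and because neighboring observations are usually autocorrelated, it can interpolate between them instead of genuinely forecasting. The resulting metrics are optimistic and do not reflect production performance.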
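The leak can be measured directly on index arrays. This sketch uses 100 illustrative time-ordered rows and counts, for each splitter, the share of folds whose training set contains at least one row that comes after the earliest test row. The helper name `share_of_folds_with_future_rows` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # 100 time-ordered rows (illustrative)

def share_of_folds_with_future_rows(cv):
    """Fraction of folds whose training set includes a row later than the first test row."""
    leaky = [train_idx.max() > test_idx.min() for train_idx, test_idx in cv.split(X)]
    return sum(leaky) / len(leaky)

shuffled_share = share_of_folds_with_future_rows(KFold(n_splits=5, shuffle=True, random_state=0))
walk_forward_share = share_of_folds_with_future_rows(TimeSeriesSplit(n_splits=5))

print("shuffled k-fold:", shuffled_share)
print("walk-forward:   ", walk_forward_share)
```

By construction, `TimeSeriesSplit` never places a training row after a test row, so its share is always zero.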

Code — TimeSeriesSplit

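from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
import numpy as np

# Illustrative stand-ins (assumed; replace with your own X, y and model):
# a random walk where today's value is predicted from yesterday's.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=366))
X = series[:-1].reshape(-1, 1)
y = series[1:]
model = Ridge(alpha=1.0)

tscv = TimeSeriesSplit(n_splits=5, test_size=30)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    mae = np.mean(np.abs(preds - y_test))
    print(f"Fold {fold + 1} MAE: {mae:.4f}")
    # Every test index is later than every train index: no look-ahead across the split.
    assert test_idx.min() > train_idx.max()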
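The same loop can be delegated to scikit-learn, because any `cv=` argument accepts a `TimeSeriesSplit` object. This sketch runs on illustrative random-walk data; summarizing folds by their mean and spread is a common convention, not something prescribed above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=366))  # illustrative random walk
X, y = series[:-1].reshape(-1, 1), series[1:]

tscv = TimeSeriesSplit(n_splits=5, test_size=30)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=tscv,
                         scoring="neg_mean_absolute_error")
fold_mae = -scores  # scikit-learn scorers are "higher is better", so MAE comes back negated
print(fold_mae.round(4), f"mean={fold_mae.mean():.4f} std={fold_mae.std():.4f}")
```

`cross_val_score` fits a fresh clone of the estimator on each fold, which matches refitting inside the manual loop.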

Choosing test_size

Set test_size to the forecast horizon you care about. Note that test_size counts rows, not days: with one row per day and monthly forecasts, test_size=30 makes each fold measure exactly the task you're solving in production.
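To check that the folds cover the intended horizon, it helps to map the indices back to dates. This sketch assumes one row per calendar day for 2023 and a 30-day deployment horizon; `horizon_days` and `rows_per_day` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical setup: one row per calendar day for 2023 (365 rows).
dates = pd.date_range("2023-01-01", periods=365, freq="D")
horizon_days = 30  # assumed deployment horizon: monthly forecasts
rows_per_day = 1   # test_size counts rows, so scale by rows per day

tscv = TimeSeriesSplit(n_splits=5, test_size=horizon_days * rows_per_day)

folds = []
for train_idx, test_idx in tscv.split(np.zeros((len(dates), 1))):
    train_end = dates[train_idx[-1]].date()
    test_start, test_end = dates[test_idx[0]].date(), dates[test_idx[-1]].date()
    folds.append((train_end, test_start, test_end, len(test_idx)))
    print(f"train through {train_end} | test {test_start} to {test_end}")
```

With hourly data, `rows_per_day` would be 24 and the same 30-day horizon would need `test_size=720`.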

Practical note on refitting

In production, retrain on the expanding window whenever new data arrives. A model frozen months ago cannot adapt to distribution shift in newer data, and the accuracy lost to that drift is often larger than the model's inherent error.
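A retrain-as-you-go loop mirrors that production cadence. The helper `walk_forward_refit` is hypothetical, and `LinearRegression` stands in for whatever model you deploy. It refits on every row seen so far, then predicts the next `step` rows before walking forward.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def walk_forward_refit(X, y, initial_train, step):
    """Retrain on all rows seen so far, then predict the next `step` rows.

    Hypothetical helper mirroring a production loop that retrains on the
    expanding window each time a new batch of `step` rows arrives.
    """
    preds = np.empty(len(y) - initial_train)
    for start in range(initial_train, len(y), step):
        end = min(start + step, len(y))
        model = LinearRegression().fit(X[:start], y[:start])  # expanding window
        preds[start - initial_train:end - initial_train] = model.predict(X[start:end])
    return preds

# Usage on illustrative data: 100 rows, retrain every 10 rows after the first 50.
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
preds = walk_forward_refit(X, y, initial_train=50, step=10)
print(np.abs(preds - y[50:]).mean())
```

Setting `step` to your real retraining interval makes the offline evaluation measure the same staleness the deployed model will experience.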
