Why can't you use standard k-fold cross-validation on time-series data, and what should you use instead?
Standard k-fold assigns rows to folds without regard to time order (and often shuffles them first), so a validation fold can contain timestamps earlier than rows in the training set. The model is then trained on the future to predict the past. Time-series CV instead uses walk-forward (expanding-window or sliding-window) splits that always validate on data strictly after the training window.
How to think about it
Time series data has a causal ordering: information at time t cannot be known at time t-k for k > 0. Standard k-fold ignores this and freely mixes past and future across train and validation folds, leaking future information into training.
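To make the leak concrete, here is a minimal sketch in plain Python. The helper `kfold_indices` is hypothetical (not a library function) and assumes rows are sorted by time and that the sample count divides evenly into folds. It lists the folds in which some training row is later than a validation row.

```python
def kfold_indices(n_samples, n_splits):
    """Contiguous, unshuffled k-fold splits over time-ordered row indices."""
    fold_size = n_samples // n_splits  # assumes n_samples % n_splits == 0
    for k in range(n_splits):
        val = list(range(k * fold_size, (k + 1) * fold_size))
        train = [i for i in range(n_samples) if i not in val]
        yield train, val

# Rows are assumed sorted by time, so a larger index means a later timestamp.
leaky_folds = [
    k for k, (train, val) in enumerate(kfold_indices(10, 5))
    if max(train) > min(val)  # some training row is later than a validation row
]
print(leaky_folds)  # [0, 1, 2, 3]
```

Only the last fold is clean, because it is the only one whose validation rows come after every training row. Shuffling does not fix this; it only spreads the leak across all folds.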
Walk-forward (expanding window) CV: the canonical fix. Each split trains on all data up to time T and evaluates on the next h steps, (T, T+h]. The training window grows with each split.
Sliding-window CV: the training window has a fixed width and slides forward. Useful when older data is stale (concept drift).
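The two schemes differ only in where the training window starts. The sketch below is a library-independent illustration. The function `walk_forward_splits` and its parameters are hypothetical names chosen for this example. Its `max_train_size` argument plays the same role as the parameter of that name on scikit-learn's `TimeSeriesSplit`, which caps the training window.

```python
def walk_forward_splits(n_samples, n_splits, test_size, max_train_size=None):
    """Yield (train, val) index lists; validation always follows training.

    max_train_size=None -> expanding window (train on everything so far).
    max_train_size=w    -> sliding window (train on the most recent w rows).
    """
    first_val_start = n_samples - n_splits * test_size
    if first_val_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for val_start in range(first_val_start, n_samples, test_size):
        train_start = 0
        if max_train_size is not None:
            train_start = max(0, val_start - max_train_size)
        train = list(range(train_start, val_start))
        val = list(range(val_start, val_start + test_size))
        yield train, val

expanding = list(walk_forward_splits(12, n_splits=3, test_size=3))
sliding = list(walk_forward_splits(12, n_splits=3, test_size=3, max_train_size=4))

for (exp_train, val), (sld_train, _) in zip(expanding, sliding):
    print(f"val={val} expanding_train={exp_train} sliding_train={sld_train}")
```

In the last split the expanding window trains on rows 0 to 8, while the sliding window keeps only rows 5 to 8. Both validate on rows 9 to 11.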
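from sklearn.model_selection import TimeSeriesSplit

# X and y are assumed to be NumPy arrays with rows sorted by time
# (for pandas objects, index with .iloc instead).
tscv = TimeSeriesSplit(n_splits=5, gap=0)  # gap=0 is the default
for train_idx, val_idx in tscv.split(X):
    X_tr, X_val = X[train_idx], X[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]
    # fit on (X_tr, y_tr), evaluate on (X_val, y_val) ...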
gap is a critical parameter: it is the number of samples dropped from the end of each training set before the validation fold begins. Set it to the forecast horizon so that no training label is drawn from the validation period (e.g., if you predict 7 days ahead on daily data, set gap=7).
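A small worked example shows why the gap matters. It assumes, purely for illustration, that the label for row t is the observed value at time step t + horizon, and that the training set ends at val_start - gap. The helper `last_target_time` is a hypothetical name.

```python
def last_target_time(train_rows, horizon):
    """Latest time step used as a label when row t is labelled with t + horizon."""
    return max(train_rows) + horizon

horizon = 3
val_start = 6  # validation covers time steps 6, 7, 8

# gap=0: training rows 0..5, so the label for row 5 comes from time step 8.
no_gap_train = list(range(0, val_start))
# gap=horizon: training rows 0..2, so the latest label comes from time step 5.
gapped_train = list(range(0, val_start - horizon))

print(last_target_time(no_gap_train, horizon))  # 8, inside the validation window
print(last_target_time(gapped_train, horizon))  # 5, strictly before val_start
```

Without the gap, the last few training labels are values the model is later asked to predict. With gap equal to the horizon, every training label is observed before validation starts.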
Additional concerns for time-series CV:
- Feature engineering that uses rolling statistics must be re-computed inside each fold with only training-window data visible.
- Seasonal patterns can make earlier folds systematically easier or harder; report per-fold scores, not just the mean.
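Both points can be sketched together in a toy setting. The example below swaps a real model for a naive forecast (predict the training-window mean) so that it stays self-contained. The function `evaluate_folds`, the series, and the split boundaries are all made up for illustration. The statistic is fitted inside each fold from training rows only, and scores are reported per fold as well as averaged.

```python
from statistics import mean

def evaluate_folds(series, splits):
    """Return one mean-absolute-error score per (train, val) split."""
    scores = []
    for train, val in splits:
        # The statistic is fitted on the training window only.
        train_mean = mean(series[i] for i in train)
        # Naive forecast: predict the training mean for every validation step.
        scores.append(mean(abs(series[i] - train_mean) for i in val))
    return scores

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # toy trending data
splits = [
    (range(0, 3), range(3, 5)),
    (range(0, 5), range(5, 7)),
    (range(0, 7), range(7, 9)),
]
fold_scores = evaluate_folds(series, splits)
for k, score in enumerate(fold_scores):
    print(f"fold {k}: MAE={score:.2f}")
print(f"mean MAE={mean(fold_scores):.2f}")
```

Here the per-fold errors are 2.50, 3.50, and 4.50, a spread that the mean of 3.50 hides on its own.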