Why time series is different
Why you cannot treat time-stamped data like an ordinary tabular dataset — and what that forces you to do differently.
What you'll learn
- i.i.d. assumption: why ordinary ML relies on it and why time series breaks it
- Autocorrelation and ordering: the two structural facts that change everything
- Correct train/test splitting for time series: always keep the test set strictly in the future
Ordinary ML assumes rows are i.i.d.
Every standard algorithm — logistic regression, random forests, gradient boosting, neural networks — is built on one quiet assumption: that your rows are i.i.d. (independent and identically distributed). Independent means knowing row 42 tells you nothing about row 43. Identically distributed means every row is drawn from the same underlying process.
When that assumption holds, you can shuffle your data, split it however you like, and a randomly chosen 20 % test set is a fair sample of the whole population. The ordering of rows is irrelevant noise.
Time series data typically breaks both parts of that assumption: neighbouring rows are not independent, and the process that generates them can change over time.
What makes time series structurally different
1. Observations are ordered
A row recorded on Monday is followed by Tuesday, then Wednesday. That order is not an accident — it is the data. Strip the order and you lose the thing you are trying to model.
2. Observations are correlated with their own past
Today’s sales depend on yesterday’s sales. Today’s temperature is closer to yesterday’s than to a random day six months ago. This self-correlation is called autocorrelation (the correlation of a series with a lagged copy of itself). It is not a nuisance to remove — it is the primary signal you are trying to exploit.
Because of autocorrelation, rows are not independent. Shuffle them and you destroy the very structure that makes prediction possible.
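To make the effect of shuffling concrete, here is a minimal sketch that measures lag-1 autocorrelation before and after shuffling. The series is a hypothetical random walk invented for illustration, and `autocorrelation` is a hand-written helper, not a library function.

```python
import random
from statistics import mean


def autocorrelation(values, lag=1):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mu = mean(values)
    # Denominator: total squared deviation from the mean.
    denom = sum((v - mu) ** 2 for v in values)
    # Numerator: products of deviations that are `lag` steps apart.
    num = sum((values[t] - mu) * (values[t - lag] - mu) for t in range(lag, n))
    return num / denom


# Hypothetical series: each value is the previous value plus a small random step.
rng = random.Random(0)
series = [100.0]
for _ in range(499):
    series.append(series[-1] + rng.gauss(0, 1))

# Same values, random order.
shuffled = series[:]
rng.shuffle(shuffled)

print(f"lag-1 autocorrelation, original order: {autocorrelation(series):.2f}")
print(f"lag-1 autocorrelation, shuffled order: {autocorrelation(shuffled):.2f}")
```

The ordered series should show a lag-1 autocorrelation close to 1, while the shuffled copy should sit near 0, even though both contain exactly the same numbers.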
3. The data-generating process can drift
Related to autocorrelation is stationarity — whether the statistical properties (mean, variance) of the series change over time. You will explore this deeply in a later lesson. For now, just note that sales in December look nothing like sales in July; sensor readings drift as equipment ages. A random sample drawn from across the whole timeline may not represent the distribution your model will actually face at deployment.
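A rough way to see drift is to compare summary statistics from different stretches of the timeline. The sketch below uses hypothetical sensor readings whose baseline rises by 0.05 per day; the numbers are made up for illustration only.

```python
from statistics import mean

# Hypothetical readings: a slowly rising baseline plus a small alternating wobble.
readings = [20.0 + 0.05 * day + (0.5 if day % 2 else -0.5) for day in range(200)]

first_half = readings[:100]
second_half = readings[100:]

# If the process were stable, these two means would be close to each other.
print(f"mean of first 100 days: {mean(first_half):.2f}")
print(f"mean of last 100 days:  {mean(second_half):.2f}")
```

A model fitted mostly on the early readings would face a visibly different level at deployment, which is the practical meaning of drift here.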
The forecast horizon
When you build a time series model, you define a forecast horizon — how far ahead you want to predict. One day? One week? Three months? This matters because every evaluation protocol must respect it: the gap between the last training observation and the first test observation must be at least as large as the horizon you care about.
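The gap rule can be sketched as index arithmetic. The helper name `split_with_horizon` and its parameters are hypothetical, chosen only for this illustration; it works on row positions and assumes the rows are already sorted by time.

```python
def split_with_horizon(n_obs, n_test, horizon):
    """Return (train_idx, test_idx) so that the first test row lies
    `horizon` steps after the last training row."""
    first_test = n_obs - n_test
    last_train = first_test - horizon
    if last_train < 0:
        raise ValueError("not enough observations for this horizon and test size")
    train_idx = list(range(0, last_train + 1))
    test_idx = list(range(first_test, n_obs))
    return train_idx, test_idx


# 100 daily observations, a 14-day test window, and a 7-day forecast horizon.
train_idx, test_idx = split_with_horizon(n_obs=100, n_test=14, horizon=7)
gap = test_idx[0] - train_idx[-1]
print(f"last train index: {train_idx[-1]}, first test index: {test_idx[0]}, gap: {gap}")
```

The rows that fall inside the gap are left out of both sets, so every test target is at least one full horizon beyond anything the model trained on.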
Typical domains where this matters:
- Retail demand — ordering inventory days or weeks ahead
- Energy prices — bidding in spot markets hours to days ahead
- IoT sensors — predictive maintenance before a failure occurs
- Web traffic — capacity planning for the next hour or day
In every case, at prediction time the future is genuinely unknown. Your evaluation must honour that.
The cardinal sin: shuffling time series data
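If you shuffle before splitting, the training set ends up containing rows recorded after some of the test rows, so the model is scored on a past whose future it has already seen. The correct rule is simple: the test set must come strictly after the training set in calendar time. No exceptions.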
Visualising correct vs wrong splits
The diagram below contrasts the correct forward-chaining split (top) with the wrong shuffled split (bottom).
Top: the test set is a clean future window. Bottom: shuffling scatters test rows across the full timeline, leaking future information into training.
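The same contrast can be rendered as text. This is a small sketch with a hypothetical 40-day timeline and an 8-day test set; the helper names are invented for illustration.

```python
import random

n_days = 40
test_size = 8

# Correct: the last `test_size` days form the test set.
chrono_test = set(range(n_days - test_size, n_days))

# Wrong: the same number of test days picked at random from the whole timeline.
rng = random.Random(42)
shuffled_test = set(rng.sample(range(n_days), test_size))


def timeline(test_days):
    """Render one character per day: '.' for a training day, 'X' for a test day."""
    return "".join("X" if day in test_days else "." for day in range(n_days))


def train_days_after_first_test(test_days):
    """Count training days at or after the earliest test day (a sign of leakage)."""
    first_test = min(test_days)
    return sum(1 for day in range(first_test, n_days) if day not in test_days)


print("chronological:", timeline(chrono_test))
print("shuffled:     ", timeline(shuffled_test))
print("leaky training days (chronological):", train_days_after_first_test(chrono_test))
print("leaky training days (shuffled):     ", train_days_after_first_test(shuffled_test))
```

In the chronological row every `X` sits at the right-hand end; in the shuffled row the `X` marks are scattered, and any `.` to the right of the first `X` is a training day taken from the test set's future.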
Seeing structure that shuffling destroys
The code below synthesises a realistic-looking daily time series and then plots it alongside a version where the rows have been shuffled. The structure in the original — trend, rhythm, coherence — vanishes completely in the shuffled version. Any model trained on the shuffled version and tested on a random slice of it will absorb information from the future without knowing it.
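A minimal sketch of such a script follows. The series parameters (trend slope, weekly amplitude, noise level) and the function names are assumptions made for illustration. Series generation uses only the standard library; the plotting helper assumes matplotlib is installed and is defined but not called automatically.

```python
import math
import random


def make_daily_sales(n_days=365, seed=0):
    """Hypothetical daily sales: upward trend + weekly cycle + random noise."""
    rng = random.Random(seed)
    sales = []
    for day in range(n_days):
        trend = 100 + 0.2 * day
        weekly = 10 * math.sin(2 * math.pi * day / 7)
        noise = rng.gauss(0, 3)
        sales.append(trend + weekly + noise)
    return sales


def plot_original_vs_shuffled(original, shuffled):
    """Draw the ordered series (top panel) and the shuffled one (bottom panel)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (top, bottom) = plt.subplots(2, 1, figsize=(10, 6), sharey=True)
    top.plot(original)
    top.set_title("Original order")
    bottom.plot(shuffled)
    bottom.set_title("Shuffled rows")
    bottom.set_xlabel("Row position")
    fig.tight_layout()
    plt.show()


sales = make_daily_sales()
shuffled_sales = sales[:]
random.Random(1).shuffle(shuffled_sales)

# Same values, different order: only the arrangement in time has changed.
print("same values in both versions:", sorted(sales) == sorted(shuffled_sales))
```

Calling `plot_original_vs_shuffled(sales, shuffled_sales)` draws the two panels.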
In the top panel you will see an upward drift with a repeating weekly rhythm, the kind of pattern a forecasting model should learn. In the bottom panel that structure is gone: the values are exactly the same, but with the temporal order destroyed the series looks like noise. A model trained on the bottom panel would learn nothing meaningful about how sales actually evolve.
What to do instead
For time series you have two correct evaluation strategies:
- Simple holdout: train on everything up to date T, test on everything after T. Fast and interpretable.
- Walk-forward (rolling) validation: repeatedly move the training cutoff forward, each time predicting the next unseen window (one step or one horizon ahead). More robust, especially for shorter series.
You will implement both in the dedicated lesson on time series cross-validation.
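As a preview, the index logic behind walk-forward validation can be sketched in a few lines. This is the expanding-window variant, with a hypothetical helper name and parameters; it works on row positions and assumes the rows are sorted by time.

```python
def walk_forward_splits(n_obs, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test block starts exactly where its training block ends,
    so the test rows are always strictly later than the training rows.
    """
    train_end = initial_train
    while train_end + test_size <= n_obs:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size  # move the cutoff past the block just tested


splits = list(walk_forward_splits(n_obs=20, initial_train=8, test_size=4))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

A simple holdout is the special case with a single fold: setting `initial_train = n_obs - test_size` yields exactly one split.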
Key vocabulary in one place
| Term | One-sentence definition |
|---|---|
| i.i.d. | Rows are drawn independently from the same distribution — standard ML assumption |
| Autocorrelation | The correlation of a series with a past (lagged) version of itself |
| Forecast horizon | How many steps into the future you need to predict |
| Data leakage | Information that would not be available at prediction time (here, from the future) reaching the model during training, inflating apparent performance |
| Stationarity | Whether the series’ statistical properties stay constant over time (preview for next lesson) |
Practice this in an interview
Shuffling destroys temporal order, so the model can train on observations recorded after the ones it is evaluated on, which is a direct information leak. Time series observations are serially correlated, meaning past values help predict future ones, and a random split discards that structure.
Standard k-fold, shuffled or not, lets a validation fold contain timestamps earlier than its training data, which amounts to training on the future to predict the past. Time-series CV uses walk-forward (expanding-window or sliding-window) splits that always validate on data strictly after the training window.