datarekha

Why time series is different

Why you cannot treat time-stamped data like an ordinary tabular dataset — and what that forces you to do differently.

7 min read Beginner Time Series Lesson 1 of 14

What you'll learn

  • i.i.d. assumption: why ordinary ML relies on it and why time series breaks it
  • Autocorrelation and ordering: the two structural facts that change everything
  • Correct train/test splitting for time series: always keep the test set strictly in the future

Ordinary ML assumes rows are i.i.d.

Most standard algorithms (logistic regression, random forests, gradient boosting, neural networks), and the usual ways of evaluating them, rest on one quiet assumption: that your rows are i.i.d. (independent and identically distributed). Independent means that knowing row 42 tells you nothing extra about row 43. Identically distributed means every row is drawn from the same underlying process.

When that assumption holds, you can shuffle your data, split it however you like, and a randomly chosen 20% test set is a fair sample of the whole population. The ordering of rows carries no information.

Time series data typically breaks both parts of that assumption.

What makes time series structurally different

1. Observations are ordered

A row recorded on Monday is followed by Tuesday, then Wednesday. That order is not an accident — it is the data. Strip the order and you lose the thing you are trying to model.

2. Observations are correlated with their own past

Today’s sales depend on yesterday’s sales. Today’s temperature is closer to yesterday’s than to a random day six months ago. This self-correlation is called autocorrelation (the correlation of a series with a lagged copy of itself). It is not a nuisance to remove — it is the primary signal you are trying to exploit.

Because of autocorrelation, rows are not independent. Shuffle them and you destroy the very structure that makes prediction possible.
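To make the idea concrete, here is a minimal sketch that measures lag-1 autocorrelation before and after shuffling. The series is a hypothetical AR(1) process (each value is 0.8 times the previous value plus noise), and `autocorr` is an illustrative helper written for this example, not a library function.

```python
import random

def autocorr(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: co-movement between each value and the value `lag` steps earlier.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical AR(1) series: today = 0.8 * yesterday + fresh noise.
series = [0.0]
for _ in range(999):
    series.append(0.8 * series[-1] + rng.gauss(0, 1))

# Same values, random order.
shuffled = series[:]
rng.shuffle(shuffled)

print(f"lag-1 autocorrelation, original order: {autocorr(series):.2f}")
print(f"lag-1 autocorrelation, shuffled order: {autocorr(shuffled):.2f}")
```

The original order should report a value close to the 0.8 used to build the series, while the shuffled copy should sit near zero: the values are identical, but the dependence between neighbours is gone.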

3. The data-generating process can drift

Related to autocorrelation is stationarity — whether the statistical properties (mean, variance) of the series change over time. You will explore this deeply in a later lesson. For now, just note that sales in December look nothing like sales in July; sensor readings drift as equipment ages. A random sample drawn from across the whole timeline may not represent the distribution your model will actually face at deployment.
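A small sketch of that last point, assuming a hypothetical sensor whose readings drift upward by 0.01 units per time step. It compares the mean of a random 20% sample with the mean of the most recent 20% of the timeline, which is the period closest to deployment.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical sensor reading that drifts upward as equipment ages.
readings = [20.0 + 0.01 * t + rng.gauss(0, 0.5) for t in range(1000)]

n_test = 200
random_sample = rng.sample(readings, n_test)  # rows drawn from anywhere on the timeline
most_recent = readings[-n_test:]              # the final stretch of the timeline

print(f"mean of random 20% sample: {statistics.mean(random_sample):.2f}")
print(f"mean of most recent 20%:   {statistics.mean(most_recent):.2f}")
```

Under these assumptions the random sample averages out the drift and looks like the middle of the timeline, while the most recent window sits noticeably higher. A model judged on the random sample is being judged on conditions it will not meet in production.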

The forecast horizon

When you build a time series model, you define a forecast horizon: how far ahead you want to predict. One day? One week? Three months? This matters because every evaluation protocol must respect it. A model that will be asked to predict h steps ahead should be scored on observations that lie at least h steps after the last training observation, so the gap between training and test data is at least as large as the horizon you care about.
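One way to encode that rule is a split helper that leaves a horizon-sized gap. This is a sketch on integer positions in a time-ordered series; `split_with_horizon` and its parameters are hypothetical names chosen for illustration.

```python
def split_with_horizon(n_obs, n_train, horizon):
    """Return (train_idx, test_idx) with `horizon` steps between the
    last training position and the first test position."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 step")
    train_idx = list(range(n_train))
    # Last training position is n_train - 1, so the first test position
    # is exactly `horizon` steps later.
    first_test = n_train - 1 + horizon
    test_idx = list(range(first_test, n_obs))
    return train_idx, test_idx

# Assumed setup: 100 daily observations, 80 for training, 7-day horizon.
train_idx, test_idx = split_with_horizon(n_obs=100, n_train=80, horizon=7)
gap = test_idx[0] - train_idx[-1]
print(f"last train index: {train_idx[-1]}, first test index: {test_idx[0]}, gap: {gap}")
```

With a horizon of 1 the test set starts immediately after the training set; larger horizons push the test window further out and discard the positions in between.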

Typical domains where this matters:

  • Retail demand — ordering inventory days or weeks ahead
  • Energy prices — bidding in spot markets hours to days ahead
  • IoT sensors — predictive maintenance before a failure occurs
  • Web traffic — capacity planning for the next hour or day

In every case, at prediction time the future is genuinely unknown. Your evaluation must honour that.

The cardinal sin: shuffling time series data

The correct rule is simple: the test set must come strictly after the training set in calendar time. No exceptions.

Visualising correct vs wrong splits

The diagram below contrasts the correct forward-chaining split (top) with the wrong shuffled split (bottom).

[Figure: two timelines. CORRECT (forward-chaining split): Train (Jan → Sep), then a cutoff, then Test (Oct → Dec). WRONG (shuffled / random K-fold split): train rows and test rows scattered across all time. Future rows leak into train, so scores look great and production fails.]

Top: the test set is a clean future window. Bottom: shuffling scatters test rows across the full timeline, leaking future information into training.
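The forward-chaining split in the top panel can be sketched with a date cutoff. The records and the 1 October cutoff below are illustrative assumptions that mirror the Jan → Sep / Oct → Dec layout of the diagram.

```python
from datetime import date, timedelta

# Hypothetical daily records for one year: (date, value) pairs.
start = date(2023, 1, 1)
rows = [(start + timedelta(days=i), float(i)) for i in range(365)]

cutoff = date(2023, 10, 1)  # assumed cutoff between training and test periods

# Sort by time first so the split never depends on the order rows arrived in.
rows.sort(key=lambda r: r[0])
train = [r for r in rows if r[0] < cutoff]
test = [r for r in rows if r[0] >= cutoff]

# Guard: every training date must precede every test date.
assert max(d for d, _ in train) < min(d for d, _ in test)

print(len(train), "training rows, ending", train[-1][0])
print(len(test), "test rows, starting", test[0][0])
```

The assertion is cheap insurance: if a later refactor introduces a shuffle or a join that reorders rows, the split fails loudly instead of leaking silently.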

Seeing structure that shuffling destroys

The code below synthesises a realistic-looking daily time series and then plots it alongside a version where the rows have been shuffled. The structure in the original — trend, rhythm, coherence — vanishes completely in the shuffled version. Any model trained on the shuffled version and tested on a random slice of it will absorb information from the future without knowing it.
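A minimal sketch of such an experiment follows. The ingredients (a linear upward drift, a 7-day cycle, Gaussian noise) and all constants are assumptions chosen for illustration. Plotting uses matplotlib, a third-party library, and is wrapped in a function so that the data step runs without it.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
n_days = 180

# Hypothetical daily sales: upward drift + weekly rhythm + noise.
sales = [
    100 + 0.3 * t + 10 * math.sin(2 * math.pi * t / 7) + rng.gauss(0, 3)
    for t in range(n_days)
]

# Same values in a random order.
shuffled = sales[:]
rng.shuffle(shuffled)

def plot_panels(original, shuffled_copy):
    """Draw the original series (top panel) and the shuffled copy (bottom panel)."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    fig, (top, bottom) = plt.subplots(2, 1, sharex=True, figsize=(9, 5))
    top.plot(original)
    top.set_title("Original order")
    bottom.plot(shuffled_copy)
    bottom.set_title("Shuffled order")
    bottom.set_xlabel("Row position")
    fig.tight_layout()
    plt.show()

# Shuffling changes only the order, not the set of values.
print(sorted(sales) == sorted(shuffled))
```

Call `plot_panels(sales, shuffled)` in an environment with matplotlib installed to draw the two panels.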

In the top panel you will see an upward drift with a repeating weekly rhythm, the kind of pattern a forecasting model should learn. In the bottom panel that structure is gone. The values are exactly the same, but with the temporal order destroyed the series looks like pure noise. A model trained on the bottom panel would learn nothing meaningful about how sales actually evolve.

What to do instead

For time series you have two correct evaluation strategies:

  1. Simple holdout: train on everything up to date T, test on everything after T. Fast and interpretable.
  2. Walk-forward (rolling) validation: repeatedly move the training cutoff forward, each time predicting the next unseen window (one step or one full horizon ahead). More robust because the score is averaged over several cutoffs, which helps especially for shorter series.

You will implement both in the dedicated lesson on time series cross-validation.
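As a preview, walk-forward validation can be sketched as a generator of index pairs. `walk_forward_splits` and its parameters are hypothetical names for this example. It uses an expanding window by default, and passing `max_train` turns it into a sliding window.

```python
def walk_forward_splits(n_obs, initial_train, horizon=1, step=1, max_train=None):
    """Yield (train_idx, test_idx) pairs where the test window always
    comes strictly after the training window."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        # Expanding window by default; cap the length for a sliding window.
        train_start = 0 if max_train is None else max(0, train_end - max_train)
        train_idx = list(range(train_start, train_end))
        test_idx = list(range(train_end, train_end + horizon))
        yield train_idx, test_idx
        train_end += step

# Assumed setup: 10 observations, 6 in the first training window,
# predicting 2 steps at a time.
for train_idx, test_idx in walk_forward_splits(n_obs=10, initial_train=6, horizon=2, step=2):
    print(f"train {train_idx[0]}..{train_idx[-1]} -> test {test_idx}")
```

Each fold would be used to fit a fresh model on `train_idx` and score it on `test_idx`; averaging the scores across folds gives the walk-forward estimate.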

Key vocabulary in one place

Term | One-sentence definition
i.i.d. | Rows are drawn independently from the same distribution; the standard ML assumption
Autocorrelation | The correlation of a series with a past (lagged) version of itself
Forecast horizon | How many steps into the future you need to predict
Data leakage | Training data that contains information about the future, inflating apparent performance
Stationarity | Whether the series’ statistical properties stay constant over time (preview for next lesson)

Quick check

Q1. You have 3 years of hourly electricity prices. A colleague does a stratified 5-fold cross-validation, shuffling rows before splitting. What is the most serious problem?
Q2. Which of the following is the clearest sign that a dataset has strong autocorrelation?
Q3. Your manager wants daily revenue predictions for the next 30 days. You train on 2 years of history ending on 1 June. Which test set is correct?

Practice this in an interview

Why can't you shuffle a time series before splitting into train and test sets?

Shuffling destroys temporal order, so the model trains on future data and is evaluated on the past — a direct information leak. Time series observations are serially correlated, meaning past values predict future ones, and any random split obliterates that structure entirely.

Why can't you use standard k-fold cross-validation on time-series data, and what should you use instead?

Standard k-fold, whether or not it shuffles, trains on folds that lie after the validation fold in time, so the model learns from the future to predict the past. Shuffling makes this worse by scattering future rows throughout every training set. Time-series CV uses walk-forward (expanding-window or sliding-window) splits that always validate on data strictly after the training window.
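A quick way to see the problem is to count how many folds of a plain, unshuffled k-fold train on rows that come after the start of the validation window. `contiguous_kfold` is a hand-rolled illustration written for this sketch, not a library API.

```python
def contiguous_kfold(n_obs, k):
    """Yield (train_idx, val_idx) for unshuffled k-fold over time-ordered rows."""
    fold_size = n_obs // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        stop = n_obs if fold == k - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_obs) if i < start or i >= stop]
        yield train_idx, val_idx

leaky = 0
for train_idx, val_idx in contiguous_kfold(n_obs=100, k=5):
    # A fold leaks if any training row is later in time than the
    # earliest validation row.
    if max(train_idx) > min(val_idx):
        leaky += 1

print(f"{leaky} of 5 folds train on rows from after the validation window starts")
```

Only the fold that validates on the final block of the timeline is clean; every other fold uses future rows for training, even though nothing was shuffled.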

What is the difference between wide and long (tidy) data formats, and why does it matter for analysis?

Wide format stores multiple measurements as separate columns per subject; long (tidy) format stores one measurement per row with a variable-name column and a value column. Long format is required by most statistical and visualization libraries, makes adding new variables trivial, and is the standard expected by groupby and merge operations.

What is the difference between batch and streaming data pipelines, and how do you choose between them?

Batch pipelines process data in bounded chunks on a schedule — simple to build and test, but latency is measured in hours or days. Streaming pipelines process records continuously as they arrive — latency drops to seconds or milliseconds, but correctness requires handling late arrivals, watermarks, and stateful aggregations. Choose streaming when business decisions need fresh data; choose batch when daily freshness is acceptable and operational simplicity matters.
