datarekha
Machine Learning Easy Asked at AmazonAsked at UberAsked at AirbnbAsked at Flipkart

How do you extract useful features from datetime columns for a machine learning model?

The short answer

Raw timestamps are meaningless to most models. Useful features extracted from a datetime column include calendar components (hour, day of week, month, quarter, year), cyclical encodings of periodic components (sin/cos of hour or day-of-week), lag and rolling-window aggregates, time-since-event features, and business-calendar flags like is_weekend or is_holiday.

How to think about it

A datetime column fed raw to a model carries no information; the model cannot learn that 2024-12-25 is Christmas unless you decode that structure into numeric or binary features.

Calendar decomposition

import pandas as pd

df["ts"] = pd.to_datetime(df["timestamp"])

df["hour"]        = df["ts"].dt.hour
df["day_of_week"] = df["ts"].dt.dayofweek   # 0 = Monday
df["day_of_month"]= df["ts"].dt.day
df["month"]       = df["ts"].dt.month
df["quarter"]     = df["ts"].dt.quarter
df["year"]        = df["ts"].dt.year
df["is_weekend"]  = df["day_of_week"].isin([5, 6]).astype(int)

Cyclical encoding

hour = 23 and hour = 0 are adjacent in time but 23 units apart as integers. Sine-cosine encoding wraps the feature onto a circle:

import numpy as np

df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
df["dow_sin"]  = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"]  = np.cos(2 * np.pi * df["day_of_week"] / 7)

Cyclical encoding is especially important for distance-based models and neural networks. Tree models can discover the periodicity on their own by splitting on the raw integer, but sin/cos speeds up learning.

Lag and rolling features

For time-series data, the most predictive features often come from the recent past:

df = df.sort_values("ts")
df["sales_lag_1d"]       = df["sales"].shift(1)
df["sales_rolling_7d"]   = df["sales"].rolling(7).mean()
df["sales_rolling_std7d"]= df["sales"].rolling(7).std()

Time-since-event features

reference_date = pd.Timestamp("2024-01-01")
df["days_since_signup"] = (df["ts"] - df["signup_date"]).dt.days
df["days_to_deadline"]  = (df["deadline"] - df["ts"]).dt.days

Holiday and business flags

Use a library like holidays (Python) to add country-specific holiday indicators rather than hard-coding dates:

import holidays
in_holidays = holidays.India(years=range(2020, 2026))
df["is_holiday"] = df["ts"].dt.date.apply(lambda d: int(d in in_holidays))

Keep practising

All Machine Learning questions

Explore further

Skip to content