How do you extract useful features from datetime columns for a machine learning model?
Raw timestamps are meaningless to most models. Useful features extracted from a datetime column include calendar components (hour, day of week, month, quarter, year), cyclical encodings of periodic components (sin/cos of hour or day-of-week), lag and rolling-window aggregates, time-since-event features, and business-calendar flags like is_weekend or is_holiday.
How to think about it
A datetime column fed raw to a model carries no information; the model cannot learn that 2024-12-25 is Christmas unless you decode that structure into numeric or binary features.
Calendar decomposition
import pandas as pd
df["ts"] = pd.to_datetime(df["timestamp"])
df["hour"] = df["ts"].dt.hour
df["day_of_week"] = df["ts"].dt.dayofweek # 0 = Monday
df["day_of_month"]= df["ts"].dt.day
df["month"] = df["ts"].dt.month
df["quarter"] = df["ts"].dt.quarter
df["year"] = df["ts"].dt.year
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
Cyclical encoding
hour = 23 and hour = 0 are adjacent in time but 23 units apart as integers. Sine-cosine encoding wraps the feature onto a circle:
import numpy as np
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)
Cyclical encoding is especially important for distance-based models and neural networks. Tree models can discover the periodicity on their own by splitting on the raw integer, but sin/cos speeds up learning.
Lag and rolling features
For time-series data, the most predictive features often come from the recent past:
df = df.sort_values("ts")
df["sales_lag_1d"] = df["sales"].shift(1)
df["sales_rolling_7d"] = df["sales"].rolling(7).mean()
df["sales_rolling_std7d"]= df["sales"].rolling(7).std()
Time-since-event features
reference_date = pd.Timestamp("2024-01-01")
df["days_since_signup"] = (df["ts"] - df["signup_date"]).dt.days
df["days_to_deadline"] = (df["deadline"] - df["ts"]).dt.days
Holiday and business flags
Use a library like holidays (Python) to add country-specific holiday indicators rather than hard-coding dates:
import holidays
in_holidays = holidays.India(years=range(2020, 2026))
df["is_holiday"] = df["ts"].dt.date.apply(lambda d: int(d in in_holidays))