How do you parse, manipulate, and extract features from datetime columns in pandas?
Convert string columns to datetime with pd.to_datetime(), then use the .dt accessor to extract components like year, month, day, and day of week, compute time deltas, and perform resampling. Setting a DatetimeIndex unlocks time-series-specific operations like resample, rolling, and time-aware interpolation.
How to think about it
What the interviewer wants to see
Date and time data is almost always messy — strings in inconsistent formats, mixed timezones, gaps in time series. A strong answer demonstrates that you know pd.to_datetime() for parsing, the .dt accessor for feature extraction, timedelta arithmetic for durations, and resampling for aggregation over time windows.
Step 1 — Parse strings to datetime
pandas reads everything from CSV as strings. The first step is always converting to datetime64:
df["order_ts"] = pd.to_datetime(df["order_ts"])
# Or at read time (more efficient — avoids double pass):
df = pd.read_csv("file.csv", parse_dates=["order_ts", "ship_ts"])
Always specify format= in production to avoid ambiguous parses (MM/DD vs DD/MM):
df["order_ts"] = pd.to_datetime(df["order_ts"], format="%Y-%m-%d %H:%M:%S",
errors="coerce") # bad rows become NaT
Step 2 — Extract features with the .dt accessor
Once a column is datetime64, the .dt accessor exposes all components:
df["year"] = df["order_ts"].dt.year
df["month"] = df["order_ts"].dt.month
df["day_of_week"] = df["order_ts"].dt.day_name() # "Monday", "Tuesday", …
df["hour"] = df["order_ts"].dt.hour
df["quarter"] = df["order_ts"].dt.quarter
df["is_weekend"] = df["order_ts"].dt.dayofweek >= 5
Step 3 — Time deltas and durations
Subtracting two datetime columns produces a Timedelta. Use .dt.days to get a plain integer:
df["fulfillment_days"] = (df["ship_ts"] - df["order_ts"]).dt.days
Step 4 — Resampling (requires DatetimeIndex)
Set a datetime column as the index and you unlock resample — the pandas equivalent of SQL’s DATE_TRUNC + GROUP BY:
ts = df.set_index("order_ts")
ts.resample("W").size() # weekly order count
ts.resample("ME").agg({"revenue": "sum"}) # monthly revenue
ts["fulfillment_days"].rolling("7D").mean() # 7-day rolling average
Playground — parse, extract, compute
For ML feature engineering
Standard datetime features to consider: hour, day_of_week, is_weekend, month, quarter. For periodicity-aware models, add cyclical encodings:
import numpy as np
df["hour_sin"] = np.sin(2 * np.pi * df["order_ts"].dt.hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["order_ts"].dt.hour / 24)
This preserves the circular relationship (hour 23 is close to hour 0) that raw integer encoding loses.