datarekha
Pandas & Data Wrangling Medium Asked at AmazonAsked at MicrosoftAsked at UberAsked at Airbnb

How do you parse, manipulate, and extract features from datetime columns in pandas?

The short answer

Convert string columns to datetime with pd.to_datetime(), then use the .dt accessor to extract components like year, month, day, and day of week, compute time deltas, and perform resampling. Setting a DatetimeIndex unlocks time-series-specific operations like resample, rolling, and time-aware interpolation.

How to think about it

What the interviewer wants to see

Date and time data is almost always messy — strings in inconsistent formats, mixed timezones, gaps in time series. A strong answer demonstrates that you know pd.to_datetime() for parsing, the .dt accessor for feature extraction, timedelta arithmetic for durations, and resampling for aggregation over time windows.

Step 1 — Parse strings to datetime

pandas reads everything from CSV as strings. The first step is always converting to datetime64:

df["order_ts"] = pd.to_datetime(df["order_ts"])
# Or at read time (more efficient — avoids double pass):
df = pd.read_csv("file.csv", parse_dates=["order_ts", "ship_ts"])

Always specify format= in production to avoid ambiguous parses (MM/DD vs DD/MM):

df["order_ts"] = pd.to_datetime(df["order_ts"], format="%Y-%m-%d %H:%M:%S",
                                 errors="coerce")  # bad rows become NaT

Step 2 — Extract features with the .dt accessor

Once a column is datetime64, the .dt accessor exposes all components:

df["year"]        = df["order_ts"].dt.year
df["month"]       = df["order_ts"].dt.month
df["day_of_week"] = df["order_ts"].dt.day_name()   # "Monday", "Tuesday", …
df["hour"]        = df["order_ts"].dt.hour
df["quarter"]     = df["order_ts"].dt.quarter
df["is_weekend"]  = df["order_ts"].dt.dayofweek >= 5

Step 3 — Time deltas and durations

Subtracting two datetime columns produces a Timedelta. Use .dt.days to get a plain integer:

df["fulfillment_days"] = (df["ship_ts"] - df["order_ts"]).dt.days

Step 4 — Resampling (requires DatetimeIndex)

Set a datetime column as the index and you unlock resample — the pandas equivalent of SQL’s DATE_TRUNC + GROUP BY:

ts = df.set_index("order_ts")
ts.resample("W").size()                          # weekly order count
ts.resample("ME").agg({"revenue": "sum"})        # monthly revenue
ts["fulfillment_days"].rolling("7D").mean()       # 7-day rolling average

Playground — parse, extract, compute

For ML feature engineering

Standard datetime features to consider: hour, day_of_week, is_weekend, month, quarter. For periodicity-aware models, add cyclical encodings:

import numpy as np
df["hour_sin"] = np.sin(2 * np.pi * df["order_ts"].dt.hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["order_ts"].dt.hour / 24)

This preserves the circular relationship (hour 23 is close to hour 0) that raw integer encoding loses.

Learn it properly DataFrame basics

Keep practising

All Pandas & Data Wrangling questions

Explore further

Skip to content