Pandas & Data Wrangling | Medium | Asked at Amazon, Microsoft, Uber, Airbnb

How do you parse, manipulate, and extract features from datetime columns in pandas?

For: Data Analyst, Data Scientist, ML Engineer, Data Engineer

The short answer

Convert string columns to datetime with pd.to_datetime(), then use the .dt accessor to extract components like year, month, day, and day of week. Subtract one datetime column from another to get time deltas. Setting a DatetimeIndex unlocks time-series-specific operations like resample, time-based rolling windows, and time-aware interpolation.
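As a rough illustration of what a DatetimeIndex makes possible beyond resample, here is a minimal sketch on a small made-up series (the dates and values are hypothetical, not from the worked example further down). It shows interpolate(method="time"), which weights by elapsed time rather than row position, and a rolling window given as a time offset.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular series: note the three-day gap between 03-02 and 03-05
idx = pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-05", "2024-03-06"])
s = pd.Series([10.0, np.nan, 40.0, 50.0], index=idx)

# Time-aware interpolation: 03-02 sits one quarter of the way from 03-01 to 03-05,
# so the gap is filled with 10 + 0.25 * (40 - 10) = 17.5 (positional linear would give 25)
filled = s.interpolate(method="time")

# Time-based rolling window: each row sums the observations from the trailing 2 days
rolling_2d = filled.rolling("2D").sum()
print(rolling_2d)
```

The offset-string window only works because the index is datetime-typed; with a plain integer index, rolling takes a row count instead.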

How to think about it

Date-time data is almost always messy — inconsistent string formats, mixed timezones, gaps in the series. A strong answer shows the four moves: pd.to_datetime() to parse, the .dt accessor to extract features, timedelta arithmetic for durations, and resample to aggregate over time windows.
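To make the "messy input" point concrete, the following is a minimal sketch, assuming a hypothetical column that contains one unparseable value, one missing value, and a second column whose strings carry different UTC offsets. It uses an explicit format with errors="coerce" so bad rows become NaT instead of raising, and utc=True to normalise mixed offsets.

```python
import pandas as pd

# Hypothetical messy input: one unparseable value and one missing value
raw = pd.Series(["2024-03-15 08:30:00", "not a date", "2024-03-16 14:00:00", None])

# An explicit format avoids ambiguous inference; errors="coerce" turns failures into NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")
n_unparsed = int(parsed.isna().sum())  # rows worth inspecting before moving on

# Strings with different UTC offsets: utc=True converts them all to UTC
mixed_tz = pd.Series(["2024-03-15 08:30:00+00:00", "2024-03-15 10:30:00+02:00"])
in_utc = pd.to_datetime(mixed_tz, utc=True)
same_instant = in_utc.iloc[0] == in_utc.iloc[1]  # both are 08:30 UTC

print(parsed)
print(n_unparsed, same_instant)
```

Counting the NaT values right after parsing is a cheap way to surface bad rows early rather than discovering them in a later aggregation.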

The flow: by default pandas reads date columns in a CSV as plain strings, so step one is df["order_ts"] = pd.to_datetime(df["order_ts"]) (or parse_dates=[...] at read time). Once a column is datetime64, .dt exposes every component, subtracting two datetime columns yields timedeltas, and setting a DatetimeIndex unlocks resample, the pandas answer to SQL’s DATE_TRUNC + GROUP BY.
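A small sketch of that flow, assuming a hypothetical in-memory CSV (read through io.StringIO so the snippet is self-contained): the same file is read with and without parse_dates, then revenue is aggregated by month with resample("MS"), roughly what DATE_TRUNC('month', order_ts) with GROUP BY would do in SQL.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """order_ts,revenue
2024-03-15 08:30:00,120.0
2024-03-16 14:00:00,85.5
2024-04-01 17:45:00,55.0
"""

# Without parse_dates the timestamp column arrives as strings
as_strings = pd.read_csv(io.StringIO(csv_text))

# With parse_dates it is datetime64 from the start
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_ts"])

# "MS" labels each bin by its month start, like truncating to the month
monthly = as_dates.set_index("order_ts")["revenue"].resample("MS").sum()
print(monthly)
```

Here the March bin holds 120.0 + 85.5 and the April bin holds the single 55.0 order.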

A worked example — parse, extract, resample

import pandas as pd

df = pd.DataFrame({
    "order_ts": ["2024-03-15 08:30:00", "2024-03-16 14:00:00", "2024-03-22 09:15:00",
                 "2024-04-01 17:45:00", "2024-04-05 11:00:00"],
    "ship_ts":  ["2024-03-17 10:00:00", "2024-03-19 08:30:00", "2024-03-25 12:00:00",
                 "2024-04-04 09:00:00", "2024-04-06 14:30:00"],
    "revenue":  [120.0, 85.5, 200.0, 55.0, 310.0],
})
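# Parse the string columns into datetime64 so .dt and subtraction work
df["order_ts"] = pd.to_datetime(df["order_ts"])
df["ship_ts"]  = pd.to_datetime(df["ship_ts"])

# Calendar features via the .dt accessor (dayofweek: Monday=0 ... Sunday=6)
df["month"]            = df["order_ts"].dt.month
df["day_of_week"]      = df["order_ts"].dt.day_name()
df["is_weekend"]       = df["order_ts"].dt.dayofweek >= 5
# Subtraction yields timedeltas; .dt.days keeps only the whole-day part
df["fulfillment_days"] = (df["ship_ts"] - df["order_ts"]).dt.days
print(df[["order_ts", "month", "day_of_week", "is_weekend", "fulfillment_days"]])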

             order_ts  month day_of_week  is_weekend  fulfillment_days
0 2024-03-15 08:30:00      3      Friday       False                 2
1 2024-03-16 14:00:00      3    Saturday        True                 2
2 2024-03-22 09:15:00      3      Friday       False                 3
3 2024-04-01 17:45:00      4      Monday       False                 2
4 2024-04-05 11:00:00      4      Friday       False                 1

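Every calendar feature is derived from the one parsed column: day_name() reads off “Saturday” for row 1, and dayofweek >= 5 flags it as the only weekend order. Subtracting the two datetimes and taking .dt.days gives fulfillment time in whole days, truncated rather than rounded: row 1 actually took 2 days 18.5 hours and is reported as 2. Use .dt.total_seconds() / 86400 if you need fractional days. Set the timestamp as the index and resample("W") buckets revenue into calendar weeks: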

ts = df.set_index("order_ts")
print(ts["revenue"].resample("W").sum())

order_ts
2024-03-17    205.5
2024-03-24    200.0
2024-03-31      0.0
2024-04-07    365.0
Freq: W-SUN, Name: revenue, dtype: float64

The two mid-March orders collapse into the week ending Sunday 03-17 (120 + 85.5 = 205.5), and, crucially, the empty week ending 03-31 appears as 0.0 rather than vanishing, because resample emits every bin in the date range and the sum of an empty bin is 0. That’s the behaviour that keeps time-series charts honest. (For ML, add cyclical encodings, np.sin and np.cos of 2π · dt.hour / 24, so hour 23 stays close to hour 0.)
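The cyclical encoding mentioned above can be sketched as follows. This is a minimal illustration on hypothetical timestamps; the helper circle_distance is made up for the demo and is not a pandas function. The hour is mapped to an angle on a 24-hour circle, and the sine and cosine of that angle become the two features.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps at hours 0, 12 and 23
stamps = pd.Series(pd.to_datetime([
    "2024-03-15 00:00:00", "2024-03-15 12:00:00", "2024-03-15 23:00:00",
]))

hour = stamps.dt.hour
angle = 2 * np.pi * hour / 24  # one full turn of the circle per 24 hours

features = pd.DataFrame({
    "hour": hour,
    "hour_sin": np.sin(angle),
    "hour_cos": np.cos(angle),
})

def circle_distance(a, b):
    """Euclidean distance between rows a and b in (sin, cos) space (demo helper)."""
    cols = ["hour_sin", "hour_cos"]
    pa = features.loc[a, cols].to_numpy(dtype=float)
    pb = features.loc[b, cols].to_numpy(dtype=float)
    return float(np.linalg.norm(pa - pb))

near = circle_distance(0, 2)  # hour 0 vs hour 23: neighbours on the circle
far = circle_distance(0, 1)   # hour 0 vs hour 12: opposite sides of the circle
print(features)
print(near, far)
```

With the raw integer hour, 23 and 0 sit 23 units apart; in the encoded space they end up much closer to each other than hour 0 is to hour 12, which is the property a distance-based or linear model needs.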
