datarekha
Pandas & Data Wrangling | Medium | Asked at Google, Uber, Airbnb, Netflix

When should you use apply, map, or applymap versus vectorized pandas operations, and what are the performance implications?

The short answer

Vectorized pandas and NumPy operations run over entire arrays in compiled code and should be your first choice. apply calls a Python function once per row or once per column, map transforms a single Series element by element, and applymap (renamed DataFrame.map in pandas 2.1) applies a function to every scalar in a DataFrame. Whenever one of these runs a Python function per element or per row, it is often orders of magnitude slower than a vectorized equivalent; map with a dict is the notable exception.
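To make the three APIs concrete, here is a minimal sketch on a made-up two-column frame (the column names and values are illustrative only). The element-wise call falls back to applymap on pandas versions before 2.1, where DataFrame.map does not exist.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({"price": [10.0, 25.0, 60.0], "quantity": [1, 2, 3]})

# apply: the function receives one whole column (axis=0) or one whole row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())                   # one call per column
row_total = df.apply(lambda row: row["price"] * row["quantity"], axis=1)  # one call per row

# Series.map: the function (or dict) receives one element of a single Series.
doubled = df["quantity"].map(lambda q: q * 2)

# Element-wise over every cell: DataFrame.map in pandas 2.1+, applymap before that.
elementwise = df.map if hasattr(df, "map") else df.applymap
as_text = elementwise(lambda v: f"{float(v):.1f}")

# The vectorized equivalent of row_total needs no Python function at all.
vectorized_total = df["price"] * df["quantity"]
```

Both row_total and vectorized_total hold the same numbers; only the second avoids a Python call per row.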

How to think about it

What the interviewer is really probing

This question tests whether you understand where pandas spends its time. The trap is reaching for apply because it “feels like a loop you understand.” The better answer shows you can think in whole-array operations, then fall back to apply only when you genuinely need row-level Python logic.

The speed ladder — fastest to slowest

Think of it in tiers:

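  1. Vectorized pandas / NumPy ufuncs: numeric and datetime columns are backed by contiguous typed arrays, so arithmetic, comparisons, .dt, and np.* ufuncs run in compiled loops with no Python overhead per row. The .str accessor is grouped here for readability, but on object-dtype strings it still processes elements one at a time internally, so its speed is often closer to a list comprehension than to numeric arithmetic.
  2. Series.map with a dict: a lookup per element handled inside pandas, with no per-element Python function call.
  3. Series.map / Series.apply with a Python lambda: one Python function call for every single element.
  4. apply(axis=1): one Python function call per row, executed sequentially, with each row first packed into a Series. On a million-row DataFrame this is commonly a hundred or more times slower than the vectorized equivalent; the exact factor depends on the function and the dtypes involved.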
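A rough way to check the ladder on your own machine is to time one representative operation per tier. The sketch below uses synthetic data and a hypothetical helper named time_it; the row count is kept small so it finishes quickly, and the absolute numbers will vary with hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data, for illustration only.
n = 20_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1, 100, n),
    "quantity": rng.integers(1, 10, n),
    "size": rng.choice(["S", "M", "L"], n),
})
size_map = {"S": 1, "M": 2, "L": 3}


def time_it(fn):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# One representative operation per tier of the ladder.
tiers = {
    "1. vectorized arithmetic": lambda: df["price"] * df["quantity"],
    "2. map with dict": lambda: df["size"].map(size_map),
    "3. map with lambda": lambda: df["size"].map(lambda s: size_map[s]),
    "4. apply(axis=1)": lambda: df.apply(lambda r: r["price"] * r["quantity"], axis=1),
}

results = {}
for name, fn in tiers.items():
    results[name], seconds = time_it(fn)
    print(f"{name:<26} {seconds * 1000:8.2f} ms")
```

A single timed call is noisy; for numbers worth quoting, repeat each measurement (for example with timeit) and increase n.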

Working through each tool

Vectorized operations should be your default for numeric work and string cleaning:

import numpy as np

df["revenue"]   = df["price"] * df["quantity"]   # element-wise arithmetic
df["log_price"] = np.log(df["price"])             # NumPy ufunc
df["upper_cat"] = df["category"].str.upper()      # .str accessor

Series.map shines for lookup tables and single-column element-wise transforms:

size_map = {"S": 1, "M": 2, "L": 3}
df["size_code"] = df["size"].map(size_map)        # dict lookup, fast
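One behaviour worth knowing when a dict is the lookup table: values with no matching key come back as NaN instead of raising an error. A small sketch on invented data, where "XL" is deliberately missing from the mapping and -1 is an arbitrary sentinel chosen for illustration:

```python
import pandas as pd

# Hypothetical sizes; "XL" has no entry in the lookup table.
sizes = pd.Series(["S", "M", "XL", "L"])
size_map = {"S": 1, "M": 2, "L": 3}

size_code = sizes.map(size_map)            # unmapped "XL" becomes NaN, so the dtype is float
n_unmapped = int(size_code.isna().sum())   # count the rows the table did not cover

# One possible policy: mark unknown sizes with -1 and restore integers.
size_code_filled = size_code.fillna(-1).astype(int)
```

Checking n_unmapped after a map is a cheap guard against silently dropping categories the table never anticipated.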

apply(axis=1) is the last resort — use it only when the logic genuinely needs values from multiple columns AND cannot be expressed with np.where or np.select:

# Readable, but one Python call per row; this three-way branch could also be written with np.select
df["tier"] = df.apply(
    lambda r: "premium" if r["price"] > 50 and r["vip"] else
              "standard" if r["price"] > 20 else "budget",
    axis=1,
)

# But for a simple binary case — replace with np.where:
df["tier"] = np.where(df["price"] * df["quantity"] > 30, "high", "low")
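The three-way tier logic from the apply example can also be expressed with np.select, which evaluates a list of boolean conditions in order. The sketch below uses made-up rows chosen to hit every branch and keeps the row-wise version only to confirm both produce the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to exercise every branch.
df = pd.DataFrame({
    "price": [60.0, 60.0, 30.0, 10.0],
    "vip":   [True, False, True, False],
})

# Conditions are checked in order; the first one that is True wins,
# mirroring the if / else chain in the apply version.
conditions = [
    (df["price"] > 50) & df["vip"],
    df["price"] > 20,
]
choices = ["premium", "standard"]
df["tier"] = np.select(conditions, choices, default="budget")

# Row-wise version, kept only to check that both give the same labels.
tier_apply = df.apply(
    lambda r: "premium" if r["price"] > 50 and r["vip"] else
              "standard" if r["price"] > 20 else "budget",
    axis=1,
)
```

Note the switch from Python's and to the element-wise & operator, with parentheses around each comparison.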

See it yourself — playground

The comparison to try: compute revenue once with a vectorized multiply and once with apply(axis=1). The vectorized version does the work in one compiled pass over the column; the apply version loops through every row in Python. On a tiny 6-row frame the difference is imperceptible, which is exactly why beginners over-use apply and only notice the cost at scale.
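A minimal stand-in for the playground, assuming a hypothetical 6-row frame (the values are invented). It computes revenue both ways and reports the average time per call; each call is fast enough at this size that the gap is easy to miss.

```python
import timeit

import pandas as pd

# Hypothetical 6-row frame standing in for the playground data.
df = pd.DataFrame({
    "price":    [9.99, 24.50, 55.00, 12.00, 80.00, 31.25],
    "quantity": [3, 1, 2, 10, 1, 4],
})

# Same result, two routes.
revenue_vec = df["price"] * df["quantity"]
revenue_apply = df.apply(lambda r: r["price"] * r["quantity"], axis=1)

# Average over repeated calls to smooth out timer noise.
repeats = 200
t_vec = timeit.timeit(lambda: df["price"] * df["quantity"], number=repeats)
t_apply = timeit.timeit(
    lambda: df.apply(lambda r: r["price"] * r["quantity"], axis=1), number=repeats
)
print(f"vectorized: {t_vec / repeats * 1e6:.0f} microseconds per call")
print(f"apply:      {t_apply / repeats * 1e6:.0f} microseconds per call")
```

To see the cost at scale, build a larger frame (for example by repeating these rows with pd.concat) and rerun the timings.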

The key insight

By default pandas stores each numeric column in a NumPy array in contiguous memory. Vectorized operations hand the whole array to compiled code in one call. apply(axis=1) builds a Series for each row, calls your function on it, and collects the result, once per row. That per-row Python overhead, not the arithmetic itself, is the bottleneck.
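The per-row cost is easy to observe directly. This sketch, on a hypothetical two-row frame, records what the function receives and how often it is called; the list named seen is just an illustrative counter.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({"price": [1.5, 2.5], "quantity": [2, 3]})

# What does the function actually receive on each call?
row_types = df.apply(lambda r: type(r).__name__, axis=1)

# Count the calls: the function runs for every row of the frame.
seen = []

def revenue(row):
    seen.append(row.name)          # row.name is the index label of this row
    return row["price"] * row["quantity"]

revenue_by_row = df.apply(revenue, axis=1)
```

Every call receives a full Series, so the function pays for label-based lookups such as row["price"] on top of the call itself.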

A good rule of thumb: if you can express the logic as arithmetic, a comparison, or an accessor (.str, .dt, .cat), do that. If you need a lookup, use .map(dict). Only reach for apply when you need multi-column Python logic with no clean vectorized equivalent.
