Pandas & Data Wrangling | Medium | Asked at Google, Uber, Airbnb, Netflix

When should you use apply, map, or applymap versus vectorized pandas operations, and what are the performance implications?

For: Data Scientist, ML Engineer, Data Engineer

The short answer

Vectorized pandas and NumPy operations run over entire arrays in compiled code and should be your first choice. apply calls a Python function once per row or column, Series.map transforms a single Series element by element, and applymap (renamed DataFrame.map in pandas 2.1) calls a function on every scalar in a DataFrame. Whenever one of these runs a Python function per element or per row, it is typically one to two orders of magnitude slower than the vectorized equivalent.
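
To make the three call shapes concrete, here is a minimal sketch on a toy frame. The column names and values are made up for illustration, and the hasattr check is just one way to stay compatible with pandas releases older than 2.1:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply with axis=1: the function receives each row as a Series.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Series.map: a dict (or function) is applied one element at a time.
a_label = df["a"].map({1: "one", 2: "two", 3: "three"})

# Element-wise over the whole frame: DataFrame.map in pandas 2.1+,
# DataFrame.applymap in older releases.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

# Vectorized equivalents of the first and third results.
row_sum_vec = df["a"] + df["b"]
doubled_vec = df * 2
```

row_sum and doubled hold the same values as row_sum_vec and doubled_vec; only the execution path differs.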

How to think about it

This question tests whether you understand where pandas spends its time. The trap is reaching for apply because it “feels like a loop you understand.” The stronger answer thinks in whole-array operations first, and falls back to apply only when the logic genuinely needs row-level Python.

There’s a speed ladder, fastest to slowest:

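1. Vectorized pandas / NumPy ufuncs: a numeric column is backed by a contiguous array, so arithmetic, comparisons, .dt, and np.* run in compiled code with no per-row Python. The .str methods are convenient and handle missing values, but on object-dtype columns they mostly loop in Python, so expect them to be tidier than apply rather than C-fast.
2. Series.map with a dict: a hash lookup per element, with no per-row Python function call.
3. map/apply with a Python lambda: interpreter overhead on every element.
4. apply(axis=1): one Python call per row, plus a Series built for each row. On a million rows this is commonly two orders of magnitude or more slower than the vectorized form; the exact ratio depends on the function and the dtypes.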
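
The ladder is easy to check on your own machine. The sketch below times one representative call per rung with timeit; the synthetic data, the row count, and the rates table are assumptions for illustration, and the dict lookup does different work from the other three, so read the numbers as rough orders of magnitude rather than a strict benchmark:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and column names are assumptions for illustration.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "price": rng.uniform(1, 100, n),
    "tier": rng.choice(["a", "b", "c"], n),
})
rates = {"a": 0.05, "b": 0.10, "c": 0.20}  # hypothetical lookup table

candidates = {
    "vectorized":    lambda: df["price"] * 2,
    "map(dict)":     lambda: df["tier"].map(rates),
    "map(lambda)":   lambda: df["price"].map(lambda x: x * 2),
    "apply(axis=1)": lambda: df.apply(lambda r: r["price"] * 2, axis=1),
}

# Best of three single runs each, in seconds.
timings = {name: min(timeit.repeat(fn, number=1, repeat=3))
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:<14} {seconds * 1e3:8.2f} ms")
```

Absolute timings vary with hardware and pandas version, so no expected output is shown; raising n makes the gap between the first and last rung more obvious.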

A worked example

Vectorized multiplication and apply(axis=1) produce the identical column — the difference is entirely speed, which is exactly why the cost hides on small data:

import pandas as pd
import numpy as np

df = pd.DataFrame({"price": [10.5, 22.0, 8.75, 15.0, 45.0, 3.99],
                   "quantity": [3, 1, 5, 2, 10, 8]})

df["revenue_vec"]   = df["price"] * df["quantity"]                       # C-speed, one pass
df["revenue_apply"] = df.apply(lambda r: r["price"] * r["quantity"], axis=1)  # Python loop
print(df)

   price  quantity  revenue_vec  revenue_apply
0  10.50         3        31.50          31.50
1  22.00         1        22.00          22.00
2   8.75         5        43.75          43.75
3  15.00         2        30.00          30.00
4  45.00        10       450.00         450.00
5   3.99         8        31.92          31.92

revenue_vec and revenue_apply match to the cent — on six rows the timing gap is invisible, so beginners over-use apply and only feel it at a million rows. For a binary flag, stay vectorized with np.where:

df["big_order"] = np.where(df["revenue_vec"] > 50, "yes", "no")
print(df[["price", "quantity", "revenue_vec", "big_order"]])

   price  quantity  revenue_vec big_order
0  10.50         3        31.50        no
1  22.00         1        22.00        no
2   8.75         5        43.75        no
3  15.00         2        30.00        no
4  45.00        10       450.00       yes
5   3.99         8        31.92        no

Only the 450.00 order clears the threshold — computed across the whole array in one call, no loop. The rule of thumb: arithmetic, comparisons, and .str/.dt/.cat accessors first; .map(dict) for lookups; apply only for multi-column Python logic with no clean vectorized form.
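
The other two rungs of that rule can be sketched the same way. The region column and the tax rates below are hypothetical, and np.select is offered as one vectorized option for multi-branch logic, not the only one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.5, 22.0, 45.0],
                   "quantity": [3, 1, 10],
                   "region": ["eu", "us", "apac"]})  # region is a hypothetical column

# Lookup: Series.map with a dict; keys missing from the dict become NaN.
tax_rate = {"eu": 0.20, "us": 0.07}  # made-up rates; "apac" deliberately absent
df["tax_rate"] = df["region"].map(tax_rate).fillna(0.0)

# Multi-column, multi-branch logic without apply: np.select takes the first
# matching condition per row and falls back to `default`.
revenue = df["price"] * df["quantity"]
df["order_size"] = np.select(
    [revenue > 100, revenue > 25],
    ["large", "medium"],
    default="small",
)
```

With these three rows, order_size comes out as medium, small, large, and the unmapped "apac" region gets the 0.0 fallback rate. If the branching cannot be expressed as conditions over whole columns, that is the point where apply becomes the reasonable choice.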
