Pandas & Data Wrangling. Difficulty: Hard. Asked at Meta, Uber, Databricks, and Snowflake.

Why is pandas slow, and what are the main strategies to speed it up?

For: Data Scientist, ML Engineer, Data Engineer

The short answer

pandas code is usually slow for three reasons: row-wise Python loops bypass NumPy's vectorized C kernels, object-dtype columns store pointers to Python objects and so cannot use those kernels or SIMD, and the whole dataset must fit in memory while most operations run on a single core. The fixes are vectorization, categorical encoding, eval/query for large frames, chunking for out-of-core data, and switching to Polars or DuckDB for compute-heavy pipelines.

How to think about it

When an interviewer asks why pandas is slow, they are not fishing for “use vectorization.” They want to see whether you can separate two different problems. One is correctness of style: code that looks like pandas but is still doing Python-level work row by row. The other is scalability: even flawless pandas is single-threaded and lives entirely in memory. The first you fix by rewriting; the second you fix by changing tools. Naming both tells the interviewer you understand the quick win and the architectural ceiling above it.

The root cause: Python overhead on every row

NumPy is fast because it processes a whole array in compiled C code, never returning to the Python interpreter between elements. The moment you loop over rows (a for loop, iterrows(), or apply(axis=1) with a Python lambda), you pay interpreter overhead on every row, and iterrows() and apply(axis=1) also build a new Series for each row. On a million-row frame that is a million round-trips through Python. Vectorizing collapses all of them into one C-level pass.

Here the same computation is written both ways. The slow version walks row by row; the fast version multiplies three columns as whole arrays.

import pandas as pd

df = pd.DataFrame({
    "price":    [10.5, 22.0, 5.75, 99.0, 45.5] * 2000,
    "quantity": [1, 3, 2, 1, 4] * 2000,
    "discount": [0.0, 0.1, 0.0, 0.2, 0.05] * 2000,
})

# Slow: apply row-by-row — Python overhead per row
df["total_slow"] = df.apply(
    lambda row: row["price"] * row["quantity"] * (1 - row["discount"]),
    axis=1,
)

# Fast: vectorized — one C kernel over the whole column, no Python per row
df["total_fast"] = df["price"] * df["quantity"] * (1 - df["discount"])

print("Identical results:", df["total_slow"].equals(df["total_fast"]))
print()
print(df[["price", "quantity", "discount", "total_fast"]].head(4))

Identical results: True

   price  quantity  discount  total_fast
0  10.50         1       0.0        10.5
1  22.00         3       0.1        59.4
2   5.75         2       0.0        11.5
3  99.00         1       0.2        79.2

equals confirms that both columns hold the same values, so the only thing apply bought you was a far slower walk through the interpreter. On a frame this size (10,000 rows) the vectorized line is typically orders of magnitude faster, and the gap widens as rows grow: same answer, a fraction of the time.
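To put a number on that gap for your own machine, a rough timing harness like the one below can help. It is only a sketch: the helper name best_of and the choice of three repeats are illustrative assumptions, and the measured speedup will vary with hardware and pandas version, so no expected output is shown.

```python
import time

import pandas as pd

df = pd.DataFrame({
    "price":    [10.5, 22.0, 5.75, 99.0, 45.5] * 2000,
    "quantity": [1, 3, 2, 1, 4] * 2000,
    "discount": [0.0, 0.1, 0.0, 0.2, 0.05] * 2000,
})


def best_of(func, repeats=3):
    """Hypothetical helper: fastest wall-clock time of several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def row_wise():
    # One Python-level lambda call (and one Series) per row.
    return df.apply(
        lambda row: row["price"] * row["quantity"] * (1 - row["discount"]),
        axis=1,
    )


def vectorized():
    # Whole-column arithmetic executed in compiled code.
    return df["price"] * df["quantity"] * (1 - df["discount"])


slow = best_of(row_wise)
fast = best_of(vectorized)

print(f"apply:      {slow * 1e3:8.2f} ms")
print(f"vectorized: {fast * 1e3:8.2f} ms")
print(f"speedup:    {slow / fast:.0f}x")
```

Taking the minimum of several runs reduces noise from other processes; for more careful measurements the standard-library timeit module is the usual choice.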

Categorical dtype for low-cardinality strings

In pandas versions before 3.0, string columns default to object dtype, where every value is an independent Python string object scattered across the heap. Comparisons, groupby, and boolean masks all have to compare or hash full strings. Converting a low-cardinality column to category replaces those strings with small integer codes plus a tiny lookup table, so comparisons become integer comparisons and the memory footprint collapses.

import pandas as pd

df = pd.DataFrame({
    "city":  ["Mumbai", "Delhi", "Mumbai", "Chennai", "Delhi"] * 100_000,
    "sales": [100, 200, 150, 80, 300] * 100_000,
})

before = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")
after = df["city"].memory_usage(deep=True)

print(f"object dtype memory:   {before:>10,} bytes")
print(f"category dtype memory: {after:>10,} bytes")
print(f"reduction:             {before / after:.0f}x smaller")

object dtype memory:   31,400,128 bytes
category dtype memory:    500,425 bytes
reduction:             63x smaller

Three distinct cities across half a million rows shrink from 31 MB to half a megabyte — a 63x cut — and every groupby or mask over that column now runs on integer codes. The rule of thumb: when a string column has few distinct values relative to its length, category pays off on both memory and speed.
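To see what the conversion actually stores, the sketch below inspects the lookup table and the integer codes on a five-row version of the same frame. The small frame is an assumption made for readability; the accessor calls (.cat.categories, .cat.codes) are standard pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["Mumbai", "Delhi", "Mumbai", "Chennai", "Delhi"],
    "sales": [100, 200, 150, 80, 300],
})
df["city"] = df["city"].astype("category")

# The lookup table: one entry per distinct value, sorted.
categories = df["city"].cat.categories.tolist()
# The per-row payload: a small integer pointing into the lookup table.
codes = df["city"].cat.codes.tolist()

print(categories)                 # ['Chennai', 'Delhi', 'Mumbai']
print(codes)                      # [2, 1, 2, 0, 1]
print(df["city"].cat.codes.dtype) # int8

# groupby now works on the integer codes; observed=True keeps only
# categories that actually appear in the data.
totals = df.groupby("city", observed=True)["sales"].sum()
print(totals.to_dict())           # {'Chennai': 80, 'Delhi': 500, 'Mumbai': 250}
```

With only three categories the codes fit in int8, which is why the large column above costs roughly one byte per row.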

eval and query for large expressions

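For big arithmetic spanning several columns, an expression such as a * b + c materializes a full-size temporary array for each intermediate step before producing the answer. When the optional numexpr library is installed, pd.eval hands the expression to it, and numexpr evaluates the whole expression in cache-sized blocks without those intermediates; without numexpr, pd.eval falls back to ordinary Python evaluation and gains nothing. df.query uses the same engine to evaluate boolean filter expressions.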

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [10, 20, 30, 40],
                   "c": [100, 200, 300, 400]})

direct   = df["a"] * df["b"] + df["c"]      # builds a temporary for a * b
via_eval = pd.eval("df.a * df.b + df.c")    # one expression; fused if numexpr is installed

print("Same result:", direct.equals(via_eval))
print(via_eval.tolist())

Same result: True
[110, 240, 390, 560]

The result is identical. On four rows there is no speed benefit, and parsing the expression string actually makes eval slower than the direct form; the win appears once the frame has millions of rows and each skipped temporary would be the size of the data. The same idea powers df.query("price > 100 and region == 'East'") for filtering.
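As a minimal sketch of the filtering case, the snippet below runs that query string against a small made-up frame (the price and region values are illustrative assumptions) and checks it against the equivalent boolean mask. It also shows the @ prefix, which lets a query reference a local Python variable.

```python
import pandas as pd

# Illustrative data, not taken from a real dataset.
df = pd.DataFrame({
    "price":  [50, 120, 300, 99, 150],
    "region": ["East", "East", "West", "East", "East"],
})

# Classic boolean mask: each comparison builds its own boolean array.
mask_result = df[(df["price"] > 100) & (df["region"] == "East")]

# Same filter expressed as a query string.
query_result = df.query("price > 100 and region == 'East'")

# Local variables are referenced with an @ prefix.
min_price = 100
param_result = df.query("price > @min_price and region == 'East'")

print(query_result)
print("Same rows:", mask_result.equals(query_result))
```

Only rows 1 and 4 satisfy both conditions. On a frame this small the query form is about readability rather than speed.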

Right dtypes, chunking, and changing tools

Three more levers round out the answer. First, correct dtypes from the start: reading with dtype={"region": "category", "id": "int32", "score": "float32"} avoids ever allocating the wide default representation and the later conversion. Second, chunking: when a file does not fit in RAM, stream it with pd.read_csv(..., chunksize=100_000), aggregate each piece, and combine. Third, when the bottleneck is the pandas engine itself rather than your code, change tools: Polars parallelizes across CPU cores by default, and DuckDB runs a vectorized, out-of-core query engine. Reaching for those is not an admission of failure; it is recognizing that a single-threaded, in-memory engine has a ceiling.
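The first two levers combine naturally. The sketch below streams a CSV in chunks with explicit dtypes and keeps only a running aggregate in memory. The in-memory CSV text, the column values, and the chunk size of 4 are stand-in assumptions so the example runs anywhere; a real pipeline would pass a file path and a much larger chunksize.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file too large to load at once.
csv_text = "region,id,score\n" + "\n".join(
    f"{'East' if i % 2 == 0 else 'West'},{i},{i * 0.5}" for i in range(10)
)

totals = {}  # running sum of score per region

reader = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"region": "category", "id": "int32", "score": "float32"},
    chunksize=4,  # tiny on purpose; use something like 100_000 on real data
)

for chunk in reader:
    # Aggregate this piece only; the chunk is discarded afterwards.
    partial = chunk.groupby("region", observed=True)["score"].sum()
    for region, value in partial.items():
        totals[region] = totals.get(region, 0.0) + float(value)

print(totals)  # {'East': 10.0, 'West': 12.5}
```

This pattern works for aggregates that can be combined from partial results, such as sums and counts. A mean needs the sum and the count tracked separately, and a median cannot be assembled this way at all, which is one of the points where an out-of-core engine becomes the simpler choice.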
