Why is pandas slow, and what are the main strategies to speed it up?
pandas is slow primarily because Python loops bypass NumPy's vectorized C kernels, object-dtype columns prevent SIMD optimizations, and keeping entire datasets in memory limits scalability. The fixes are vectorization, categorical encoding, eval/query for large frames, chunking for out-of-core data, and switching to Polars or DuckDB for compute-heavy pipelines.
How to think about it
How to frame this answer
The honest answer has two layers. First, there’s a correctness layer: most slowness comes from code that’s pandas-shaped but still doing Python-level work (loops, iterrows, apply with a lambda that calls Python functions). Second, there’s a scalability layer: even well-written pandas is single-threaded and fully in-memory. Knowing both tells the interviewer you understand not just the quick fix but also the architecture.
The root cause: Python overhead on every row
NumPy’s speed comes from processing entire arrays in C without touching Python between elements. When you loop over rows — whether with a for loop, iterrows(), or a slow apply — you pay Python’s interpreter overhead for every single cell. On a million-row DataFrame, that’s a million Python interpreter round-trips.
Strategy 1: Vectorize — eliminate all Python loops
Strategy 2: categorical dtype for low-cardinality strings
String comparisons on object dtype columns are slow because each element is an independent Python string object. Converting to category dtype replaces the strings with small integer codes — comparisons become integer comparisons, groupby builds its hash table over codes instead of strings.
# object dtype: comparisons and groupby hash every Python string
df["city"] = df["city"].astype("category")
# Now groupby, value_counts, and boolean masks are all faster
# Memory drops from ~50 bytes/value to ~1-2 bytes/value for low cardinality
Strategy 3: eval and query for large arithmetic and filter expressions
pd.eval uses numexpr under the hood to evaluate arithmetic across columns without creating intermediate arrays. For a DataFrame with millions of rows, a + b + c creates two temporary full-size arrays; pd.eval("a + b + c") avoids both.
# Avoids intermediate arrays for large frames
result = pd.eval("df['a'] * df['b'] + df['c']")
# query avoids Python overhead in boolean indexing
df.query("price > 100 and region == 'East'")
Strategy 4: Correct dtypes from the start
Reading with explicit dtypes avoids the object→category conversion later and means pandas never allocates the large initial representation:
df = pd.read_csv(
"data.csv",
dtype={"region": "category", "id": "int32", "score": "float32"},
)
Strategy 5: Chunking for out-of-core data
When the file is too large to fit in RAM, process it in pieces:
totals = []
for chunk in pd.read_csv("big.csv", chunksize=100_000):
totals.append(chunk.groupby("region")["sales"].sum())
pd.concat(totals).groupby(level=0).sum()
Strategy 6: Escape to Polars or DuckDB for compute-heavy pipelines
When the bottleneck is the pandas engine itself (single-threaded, fully in-memory), the right move is to swap tools. Polars uses all CPU cores automatically; DuckDB has a vectorized query engine and can operate out-of-core.