datarekha
Pandas & Data Wrangling Medium Asked at PalantirAsked at Two SigmaAsked at Citadel

When should you use pandas eval() and query(), and what are their limitations?

The short answer

eval() and query() parse expression strings and delegate evaluation to numexpr, which uses multi-threaded SIMD operations and avoids allocating intermediate arrays — giving 2-10x speedups on large DataFrames. They are most beneficial on DataFrames larger than a few hundred thousand rows where intermediate array allocation dominates; for small frames, the expression parsing overhead makes them slower than standard indexing.

How to think about it

What is actually being asked

This question tests whether you understand why eval/query exist — which is memory, not just readability. When you write df[(df["a"] > 0) & (df["b"] < 10)], pandas allocates a boolean array for each condition separately before combining them. For a 10M-row DataFrame, those intermediate arrays are expensive. eval/query compile the whole expression and evaluate it in a single pass, often via numexpr which also uses multi-threaded SIMD instructions.

query() — cleaner row filtering

Standard boolean indexing creates one intermediate boolean array per condition:

# Standard: three intermediate boolean arrays allocated
df[(df["price"] > 100) & (df["region"] == "East") & (df["qty"] < 50)]

# query: one expression, no intermediate allocations
df.query("price > 100 and region == 'East' and qty < 50")

Reference a Python variable with @:

threshold = 100
df.query("price > @threshold")

eval() — computed columns without intermediate arrays

Standard column arithmetic creates a temporary array for every operation:

# Standard: intermediate arrays for price*qty, then subtract discount
df["revenue"] = df["price"] * df["qty"] - df["discount"]

# eval: computes in one pass
df.eval("revenue = price * qty - discount", inplace=True)

# Multi-expression block
df.eval("""
    revenue    = price * qty
    margin     = revenue - cost
    margin_pct = margin / revenue
""", inplace=True)

When the speedup actually matters

The parsing overhead of eval/query is fixed — it costs the same whether the DataFrame has 100 rows or 10 million. The savings from avoiding intermediate allocations grow linearly with the number of rows. The crossover point is roughly 100k–500k rows depending on the expression.

Below that threshold, standard indexing is faster. Above it, eval/query can be 2–10x faster.

Playground — query and eval on a small frame

Limitations you should name in an interview

  • Requires numexpr installed for the actual speedup (pip install numexpr). Without it, eval falls back to Python — still works, no speedup.
  • Supports only a subset of Python: no list comprehensions, custom function calls, or complex attribute access.
  • String quoting inside the expression string can get awkward for column names with special characters — use backticks (pandas 0.25+): df.query("`column name` > 0").
  • No benefit for DataFrames under ~100k rows. The parsing overhead will actually make it slower for small frames.
Learn it properly Memory optimization

Keep practising

All Pandas & Data Wrangling questions

Explore further

Skip to content