Pandas & Data Wrangling | Medium | Asked at Palantir, Two Sigma, Citadel

When should you use pandas eval() and query(), and what are their limitations?

For: Data Scientist, Data Engineer, ML Engineer

The short answer

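eval() and query() parse an expression string and, when the optional numexpr package is installed, hand the evaluation to it. numexpr works through the columns in cache-sized chunks across multiple threads, so it avoids allocating a full-length intermediate array for every sub-expression. On large DataFrames this can give speedups in the range of 2-10x, depending on the expression, the dtypes, and the machine. They are most beneficial on DataFrames larger than a few hundred thousand rows, where intermediate array allocation dominates; for small frames, the fixed cost of parsing the expression makes them slower than standard indexing. The main limitations: without numexpr, pandas falls back to its Python engine and most of the speedup disappears, and the expression string supports only a restricted subset of Python syntax.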
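A minimal sketch of the engine choice described above. The tiny frame and the variable names are illustrative only; the engine keyword is forwarded by query() to pandas.eval(), and the check for numexpr simply asks whether the package can be found in the current environment.

```python
import importlib.util

import pandas as pd

# Is the optional numexpr package present in this environment?
has_numexpr = importlib.util.find_spec("numexpr") is not None
print("numexpr found:", has_numexpr)

# Illustrative data, not taken from the article.
df = pd.DataFrame({"a": [1, -2, 3], "b": [4, 5, 60]})
expr = "a > 0 and b < 10"

# Default engine: numexpr when pandas can use it, otherwise the Python engine.
by_default = df.query(expr)

# Forcing the Python engine gives the same rows, just without numexpr's help.
by_python = df.query(expr, engine="python")

print(by_default)
```

Both calls return the same rows; only the evaluation path differs, which is why the speedup depends on what is installed.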

How to think about it

This question tests whether you know why eval/query exist, and the answer is memory, not just readability. When you write df[(df["a"] > 0) & (df["b"] < 10)], pandas allocates a full-length boolean array for each condition, plus another for the combined mask, before it indexes. On a 10M-row frame each boolean mask is about 10 MB, and each float64 intermediate in an arithmetic expression is about 80 MB, so those intermediates are expensive. eval/query parse the whole expression up front and, when numexpr is available, evaluate it chunk by chunk across multiple threads instead of materializing every step.

So query filters rows from a single expression without a full-length intermediate per condition (when numexpr is doing the work), and eval computes derived columns the same way. Both also read more cleanly, and you can reference a local Python variable by prefixing it with @.
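To make the comparison concrete, here is a small sketch that builds the intermediates by hand and checks that query and eval give the same answer. The frame size, column names, and the upper variable are assumptions chosen for illustration, not values from the article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000  # illustrative size
df = pd.DataFrame({"a": rng.standard_normal(n),
                   "b": rng.uniform(0, 20, n)})

# Standard indexing: each condition becomes its own full-length boolean Series,
# and combining them with & allocates one more.
mask_a = df["a"] > 0
mask_b = df["b"] < 10
by_mask = df[mask_a & mask_b]
print("bytes per boolean mask:", mask_a.nbytes)

# query: one expression string, with a local variable pulled in through @.
upper = 10
by_query = df.query("a > 0 and b < @upper")

# eval: a derived column from one expression, compared with plain arithmetic.
with_eval = df.eval("c = a * b + a")
with_assign = df.assign(c=df["a"] * df["b"] + df["a"])

print(by_mask.equals(by_query))
print(np.allclose(with_eval["c"], with_assign["c"]))
```

The results match either way, so the choice between the two styles comes down to memory, speed, and readability rather than correctness.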

A worked example — query and eval

import pandas as pd

df = pd.DataFrame({"product": ["A", "B", "C", "D", "E", "F"],
                   "region": ["East", "West", "East", "East", "West", "West"],
                   "price": [120.0, 85.0, 200.0, 55.0, 310.0, 95.0],
                   "qty": [3, 7, 1, 12, 2, 5],
                   "cost": [80.0, 60.0, 140.0, 30.0, 220.0, 65.0]})

print(df.query("price > 90 and region == 'East'"))

  product region  price  qty   cost
0       A   East  120.0    3   80.0
2       C   East  200.0    1  140.0

The compound filter reads like English and returns the two East products over 90 (D, at 55, is filtered out). External variables come in through @:

min_qty = 3
print(df.query("qty >= @min_qty"))

  product region  price  qty  cost
0       A   East  120.0    3  80.0
1       B   West   85.0    7  60.0
3       D   East   55.0   12  30.0
5       F   West   95.0    5  65.0

And eval computes multiple columns in one block — here revenue and margin together:

df2 = df.eval("""
    revenue = price * qty
    margin  = revenue - cost * qty
""")
print(df2[["product", "revenue", "margin"]])

  product  revenue  margin
0       A    360.0   120.0
1       B    595.0   175.0
2       C    200.0    60.0
3       D    660.0   300.0
4       E    620.0   180.0
5       F    475.0   150.0

margin references revenue from the line above within the same block. Note that these results are identical to what plain indexing and column arithmetic produce. On this six-row frame eval/query are actually slower, because the parsing overhead is fixed while the allocation savings scale with row count. The crossover is roughly 100k to 500k rows, though it shifts with the expression, the dtypes, and the hardware, so benchmark on your own data; below it, stick with standard indexing.
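One way to find the crossover on a given machine is to time both styles at a few sizes. The sketch below is a rough harness under assumed sizes and column names; time_filter is a hypothetical helper, and the numbers it prints will vary with hardware and with whether numexpr is installed.

```python
import timeit

import numpy as np
import pandas as pd


def time_filter(n_rows, repeats=5):
    """Return (mask_seconds, query_seconds), best of `repeats` runs each."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"a": rng.standard_normal(n_rows),
                       "b": rng.uniform(0, 20, n_rows)})
    t_mask = min(timeit.repeat(
        lambda: df[(df["a"] > 0) & (df["b"] < 10)],
        number=1, repeat=repeats))
    t_query = min(timeit.repeat(
        lambda: df.query("a > 0 and b < 10"),
        number=1, repeat=repeats))
    return t_mask, t_query


# Illustrative sizes spanning small to large frames.
timings = {n: time_filter(n) for n in (1_000, 100_000, 1_000_000)}

for n, (t_mask, t_query) in timings.items():
    print(f"{n:>9,} rows  mask: {t_mask * 1e3:8.3f} ms  query: {t_query * 1e3:8.3f} ms")
```

Taking the minimum of several runs reduces noise from other processes; for a real decision, repeat the measurement with the actual expression and dtypes in question.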
