datarekha
Pandas & Data Wrangling | Easy | Asked at Google, Netflix, Uber

How does boolean indexing work in pandas, and what are the common pitfalls?

The short answer

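Boolean indexing filters a DataFrame by passing a boolean mask to df[...] or df.loc[...]. The mask is either a boolean Series that shares the DataFrame's index or a boolean array with one value per row, and only the rows where it is True are kept. The common pitfalls are using Python's and/or instead of &/| and forgetting to wrap each condition of a compound filter in parentheses. Both mistakes usually raise an error and can occasionally produce wrong results.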

How to think about it

What is really being tested

The interviewer wants to know whether you understand that pandas filtering works on whole-array boolean masks, not on row-by-row Python logic. Get this right and you unlock fast, readable filters. Get it wrong and you hit cryptic ValueErrors or, worse, silently wrong results.

How it works, step by step

A boolean mask is just a Series of True/False values aligned to the DataFrame’s index. When you write df["age"] > 30, pandas evaluates that comparison across the entire column at once (in C), producing a boolean Series. Passing that Series back into df[...] keeps only the rows where the mask is True.

df["age"] > 30
# row 0: False
# row 1: True
# row 2: False
# ...
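
To make the mask concrete, here is a minimal sketch on a made-up three-row DataFrame. The column names and values are assumptions chosen for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "age": [25, 41, 30]})

# The comparison runs over the whole column and returns a boolean Series
# that shares the DataFrame's index.
mask = df["age"] > 30
print(mask.tolist())             # [False, True, False]

# Indexing with the mask keeps only the rows where it is True.
seniors = df[mask]
print(seniors["name"].tolist())  # ['Ben']
```

Naming the mask before using it is optional, but it makes longer filters easier to read and to debug.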

Single vs. compound conditions

For a single condition, the syntax is straightforward:

seniors = df[df["age"] > 30]

For compound conditions you must use & (AND) and | (OR), not Python’s and/or, and you must wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as == or >:

# Correct
eng_senior = df[(df["dept"] == "eng") & (df["age"] > 30)]

# Wrong: `and` needs a single truth value for each Series, so this raises ValueError
eng_senior = df[df["dept"] == "eng" and df["age"] > 30]
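
Both failure modes can be reproduced on a small frame. The sketch below uses made-up data and an extra salary column that is an assumption added for the precedence example; it catches the exceptions so the script runs to the end.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "dept": ["eng", "eng", "mkt"],
        "age": [28, 45, 39],
        "salary": [90000, 120000, 70000],
    }
)

# Correct: element-wise AND, each comparison in its own parentheses.
eng_senior = df[(df["dept"] == "eng") & (df["age"] > 30)]

# Pitfall 1: `and` asks Python for one truth value for the whole Series.
try:
    df[df["dept"] == "eng" and df["age"] > 30]
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# Pitfall 2: without parentheses, & binds first, so this parses as
# df["age"] > (30 & df["salary"]) > 50000, a chained comparison that
# again needs a single truth value.
try:
    df[df["age"] > 30 & df["salary"] > 50000]
    paren_error = None
except ValueError as exc:
    paren_error = type(exc).__name__

print(len(eng_senior), and_error, paren_error)  # 1 ValueError ValueError
```

The missing-parentheses case does not always fail this loudly. Depending on the column types it can raise a different error or evaluate to an unintended mask, which is why parenthesising every condition is the safer habit.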

Other useful patterns

isin — set membership check, cleaner than chaining == with |:

target = df[df["dept"].isin(["eng", "mkt"])]

~ — negation of a mask:

not_eng = df[~(df["dept"] == "eng")]
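
The two patterns above combine naturally into a "not in this set" filter. A minimal sketch, again on made-up data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"dept": ["eng", "mkt", "ops", "eng"]})

# isin replaces (dept == "eng") | (dept == "mkt").
in_target = df["dept"].isin(["eng", "mkt"])
chained = (df["dept"] == "eng") | (df["dept"] == "mkt")
print(in_target.equals(chained))  # True

# ~ flips every element of the mask, giving "not in the set".
others = df[~in_target]
print(others["dept"].tolist())    # ['ops']
```

Note that negation needs ~ for the same reason AND needs &: Python's not, like and/or, asks for a single truth value and raises ValueError on a Series.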

query() — readable string syntax for ad-hoc exploration:

high_pay = df.query("salary > 80000 and dept == 'eng'")
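
Inside a query string, and/or are valid, and local Python variables can be referenced with an @ prefix. The sketch below, on made-up data with a hypothetical min_salary threshold, checks that the query form and the mask form select the same rows.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"dept": ["eng", "eng", "mkt"], "salary": [95000, 60000, 99000]}
)

min_salary = 80000  # hypothetical threshold, referenced as @min_salary

# String form: `and` is allowed here because pandas parses the expression.
via_query = df.query("salary > @min_salary and dept == 'eng'")

# Equivalent mask form.
via_mask = df[(df["salary"] > min_salary) & (df["dept"] == "eng")]

print(via_query.equals(via_mask))  # True
```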

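Writing back with loc — assign through a single loc call rather than chained indexing such as df[mask]["salary"] = ..., which can write to a temporary copy and trigger the SettingWithCopyWarning: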

df.loc[df["salary"] < 50000, "salary"] = 50000
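
A minimal sketch of the write-back on made-up data. The values are assumptions for illustration; the point is that the row mask and the column label go into one loc call.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "salary": [42000, 88000, 49000]}
)

low_paid = df["salary"] < 50000

# One indexing call: row mask and column label together, so the write
# targets the original DataFrame rather than an intermediate object.
df.loc[low_paid, "salary"] = 50000

print(df["salary"].tolist())  # [50000, 88000, 50000]
```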

Why query() is for exploration, not production

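query() parses and evaluates a string at runtime. It is convenient for interactive work, but it makes code harder to compose programmatically (building filter strings dynamically is fragile), and the parsing step adds overhead that makes it slower than a plain mask on small DataFrames. Any speed-up from the optional numexpr engine tends to appear only on large ones. Prefer mask syntax in production pipelines.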
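
One way to keep dynamic filters composable without building strings is to collect masks in a list and combine them with &. This is a sketch under assumptions: the data and the optional min_salary and dept parameters are hypothetical, and functools.reduce is just one of several ways to fold the list.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"dept": ["eng", "eng", "mkt"], "salary": [95000, 60000, 99000]}
)

# Hypothetical optional filters; None means "not requested".
min_salary = 80000
dept = None

masks = []
if min_salary is not None:
    masks.append(df["salary"] > min_salary)
if dept is not None:
    masks.append(df["dept"] == dept)

# Start from an all-True mask so an empty list keeps every row.
all_true = pd.Series(True, index=df.index)
combined = reduce(operator.and_, masks, all_true)

result = df[combined]
print(result["salary"].tolist())  # [95000, 99000]
```

Each mask is an ordinary Series, so individual conditions can be tested or logged on their own before they are combined.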
