Pandas & Data Wrangling | Easy | Asked at Google, Netflix, Uber

How does boolean indexing work in pandas, and what are the common pitfalls?

For: Data Analyst, Data Scientist, ML Engineer

The short answer

Boolean indexing filters a DataFrame by passing a boolean Series or array of the same length as the index, with one True/False value per row. The common pitfalls are using Python's and/or instead of &/|, which raises a ValueError, and forgetting to wrap each condition of a compound filter in parentheses, which raises an error or can produce wrong results.
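To make the "same length as the index" requirement concrete, here is a minimal sketch using a hand-built NumPy mask. The three-row frame and the names keep and length_error are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 34, 29]})

# One True/False flag per row: rows 0 and 2 are kept.
keep = np.array([True, False, True])
print(df[keep].index.tolist())

# A mask with the wrong number of flags is rejected rather than guessed at.
try:
    df[np.array([True, False])]   # two flags for three rows
except ValueError as exc:
    length_error = exc
    print("rejected:", exc)
```

In practice you rarely build the mask by hand; a comparison such as df["age"] > 30 produces one of the right length for you.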

How to think about it

The interviewer wants to know whether you grasp that pandas filtering works on whole-array boolean masks, not row-by-row Python logic. Get it right and you write fast, readable filters; get it wrong and you hit cryptic ValueErrors or — worse — silently wrong results.

A boolean mask is a Series of True/False aligned to the DataFrame’s index. When you write df["age"] > 30, pandas evaluates that comparison across the whole column at once (in C), producing the mask; passing it back into df[...] keeps only the True rows. For compound conditions you must use &/| (not Python’s and/or) and parenthesise each condition, because &/| bind tighter than ==/>.
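The two pitfalls are easiest to remember once you have seen them fail. The sketch below uses a small made-up frame (the thresholds 28 and 60000 are arbitrary) to inspect a mask on its own and then trigger both errors deliberately.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 34, 29], "salary": [55000, 90000, 72000]})

# The comparison alone builds the mask; nothing is filtered yet.
mask = df["age"] > 28
print(mask.tolist())   # one boolean per row
print(mask.dtype)

# Pitfall 1: `and` asks for a single truth value for the whole Series,
# which pandas refuses to give.
try:
    df[(df["age"] > 28) and (df["salary"] > 60000)]
except ValueError as exc:
    and_error = exc
    print("and:", exc)

# Pitfall 2: without parentheses, & binds first, so this is parsed as
# df["age"] > (28 & df["salary"]) > 60000, a chained comparison that
# again needs a single truth value.
try:
    df[df["age"] > 28 & df["salary"] > 60000]
except ValueError as exc:
    precedence_error = exc
    print("precedence:", exc)

# The working form: & with each condition parenthesised.
both = df[(df["age"] > 28) & (df["salary"] > 60000)]
print(both)
```

Here the missing parentheses happen to raise an error. With other operand types the unparenthesised expression can evaluate without complaint and return the wrong rows, which is why parenthesising every condition is the safer habit.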

A worked example

import pandas as pd

df = pd.DataFrame({"age": [25, 34, 29, 41, 22, 37],
                   "salary": [55000, 90000, 72000, 120000, 48000, 85000],
                   "dept": ["eng", "eng", "mkt", "eng", "mkt", "sales"]})

print(df[df["age"] > 30])                              # single condition

   age  salary   dept
1   34   90000    eng
3   41  120000    eng
5   37   85000  sales

The mask kept rows 1, 3, 5 — the three over-30s — preserving their original index labels. A compound filter needs & and parentheses:

print(df[(df["dept"] == "eng") & (df["age"] > 30)])    # AND, each side parenthesised

   age  salary dept
1   34   90000  eng
3   41  120000  eng

Now only the over-30 engineers survive. Two more everyday patterns — isin for set membership and ~ for negation:

print(df[df["dept"].isin(["eng", "mkt"])])             # membership
print(df[~(df["dept"] == "eng")])                      # NOT

   age  salary dept
0   25   55000  eng
1   34   90000  eng
2   29   72000  mkt
3   41  120000  eng
4   22   48000  mkt

   age  salary   dept
2   29   72000    mkt
4   22   48000    mkt
5   37   85000  sales

isin(["eng", "mkt"]) keeps everyone in either department (the single sales row drops); ~ flips the mask to give the non-eng rows. For assignments, always go through .loc, as in df.loc[df["salary"] < 50000, "salary"] = 50000, to avoid the SettingWithCopyWarning that chained indexing can trigger. (query("salary > 70000 and dept == 'eng'") reads nicely for interactive work, but build production filters from masks, since dynamic query strings are fragile.)
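As a quick check of the last two points, the sketch below rebuilds the example frame, applies the .loc assignment, and compares a mask-based filter with the equivalent query string. The names by_mask and by_query are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 34, 29, 41, 22, 37],
                   "salary": [55000, 90000, 72000, 120000, 48000, 85000],
                   "dept": ["eng", "eng", "mkt", "eng", "mkt", "sales"]})

# Row mask and column label in one .loc call, so the write lands on df itself.
df.loc[df["salary"] < 50000, "salary"] = 50000
print(df["salary"].tolist())   # row 4 is raised from 48000 to 50000

# The same filter written as a mask and as a query string.
by_mask = df[(df["salary"] > 70000) & (df["dept"] == "eng")]
by_query = df.query("salary > 70000 and dept == 'eng'")
print(by_mask.equals(by_query))
```

Inside a query string, and/or are valid because pandas parses the expression itself; the &/| rule applies only to ordinary Python expressions on Series.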

