DataFrame basics
The single most important class in the Python data stack. Create one, inspect it, and select from it without surprises.
What you'll learn
- ✓ Three reliable ways to construct a DataFrame
- ✓ How to inspect a DataFrame in a few seconds
- ✓ Why `[]` vs `.loc` vs `.iloc` matters
A DataFrame is a 2D labeled table — like a spreadsheet, like a SQL result set. It has row labels (the index) and column labels (the columns), and each column has a single dtype.
Three ways to construct one
In a real job you’ll almost always use option 3 (read from a file). But options 1 and 2 are perfect for tests and small examples.
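The runnable example for this section did not survive on this page, so here is a minimal sketch of what the three options might look like. Only option 3 (reading from a file) is named in the text; treating option 1 as a dict of columns and option 2 as a list of row dicts is an assumption, and the sample names, ages, and cities are made up for illustration.

```python
import io

import pandas as pd

# Option 1 (assumed): a dict mapping each column name to a list of values.
df_from_dict = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [31, 45, 27],
    "city": ["Lima", "Oslo", "Lima"],
})

# Option 2 (assumed): a list of row dicts, one dict per record.
df_from_records = pd.DataFrame([
    {"name": "Ana", "age": 31, "city": "Lima"},
    {"name": "Ben", "age": 45, "city": "Oslo"},
    {"name": "Cy", "age": 27, "city": "Lima"},
])

# Option 3: read from a file. An in-memory buffer stands in for a real
# path here so the sketch runs anywhere; pd.read_csv also accepts a path.
csv_text = "name,age,city\nAna,31,Lima\nBen,45,Oslo\nCy,27,Lima\n"
df_from_file = pd.read_csv(io.StringIO(csv_text))

# All three routes should describe the same table.
print(df_from_dict.equals(df_from_records))
print(df_from_file)
```

Swapping `io.StringIO(csv_text)` for a path such as a local CSV file is the usual real-world form of option 3.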
Inspect a DataFrame in 10 seconds
`.head()`, `.shape`, `.dtypes`, `.describe()` and `.value_counts()` are the five things you’ll run within seconds of loading any new dataset. Burn them into muscle memory.
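As a rough illustration of those five calls, the sketch below runs each one on a small made-up table. The column names and values are assumptions chosen for the example, not data from the lesson.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "age": [31, 45, 27, 38],
    "city": ["Lima", "Oslo", "Lima", "Kyiv"],
})

print(df.head(2))                 # first 2 rows (the default is 5)
print(df.shape)                   # (rows, columns); an attribute, so no parentheses
print(df.dtypes)                  # one dtype per column
print(df.describe())              # summary statistics for the numeric columns
print(df["city"].value_counts())  # how often each distinct value appears
```

Note that `.value_counts()` is most often called on a single column, as here, while the other four apply to the whole DataFrame.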
Selecting columns
df["age"] # one column → returns a Series
df[["age", "city"]] # multiple columns → returns a DataFrame
A Series is a 1D labeled array — basically a single column. A DataFrame is a collection of Series sharing an index.
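To make the Series/DataFrame distinction concrete, here is a small sketch that checks the type returned by each bracket form. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"age": [31, 45, 27], "city": ["Lima", "Oslo", "Lima"]})

ages = df["age"]               # a single name gives a Series
subset = df[["age", "city"]]   # a list of names gives a DataFrame
one_col = df[["age"]]          # a one-item list still gives a DataFrame

print(type(ages).__name__)
print(type(one_col).__name__)

# The column shares the DataFrame's index.
print(ages.index.equals(df.index))
```

The `df[["age"]]` form is handy when later code expects a 2D table even though only one column is needed.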
Selecting rows — .loc vs .iloc
This is the most common source of confusion for newcomers, and the rule is actually simple:
- `.loc` uses labels (the index values, column names).
- `.iloc` uses integer positions (0-based).
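The rule can be sketched on a DataFrame with a string index, where labels and positions are easy to tell apart. The index labels and data below are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data with a string index so labels differ from positions.
df = pd.DataFrame(
    {"age": [31, 45, 27], "city": ["Lima", "Oslo", "Lima"]},
    index=["ana", "ben", "cy"],
)

by_label = df.loc["ben", "age"]   # row label "ben", column label "age" -> 45
by_position = df.iloc[1, 0]       # second row, first column -> the same cell, 45

# Slices behave differently: label slices include both endpoints,
# position slices exclude the stop, like ordinary Python slicing.
label_slice = df.loc["ana":"ben"]  # 2 rows
position_slice = df.iloc[0:1]      # 1 row

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The slicing difference is worth remembering, because it is easy to miss when the index happens to be the default 0, 1, 2, ... integers.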
Boolean filtering — the workhorse
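The parentheses around each comparison are required because `&` and `|` bind more tightly than comparison operators such as `>` and `==` in Python’s operator precedence. Forget them and you’ll get a confusing error, typically a TypeError or a ValueError about the ambiguous truth value of a Series.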
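A minimal sketch of boolean filtering with the parentheses in place follows. The data is hypothetical, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "age": [31, 45, 27, 38],
    "city": ["Lima", "Oslo", "Lima", "Kyiv"],
})

# A comparison yields a boolean Series; indexing with it keeps the True rows.
over_30 = df[df["age"] > 30]

# Combine conditions with & (and), | (or), ~ (not), each comparison in parentheses.
lima_over_30 = df[(df["age"] > 30) & (df["city"] == "Lima")]
oslo_or_young = df[(df["city"] == "Oslo") | (df["age"] < 30)]
not_lima = df[~(df["city"] == "Lima")]

print(lima_over_30)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the symbol operators are used here.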
Creating, modifying, dropping columns
df["bonus"] = df["salary"] * 0.10 # new column
df["age"] = df["age"] + 1 # modify in place
df = df.drop(columns=["bonus"]) # drop returns a new DataFrame
df = df.rename(columns={"city": "loc"}) # rename
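Those four lines assume a `df` with `salary`, `age`, and `city` columns already exists. The sketch below supplies a hypothetical one so the steps run end to end, and it also shows what happens when the result of `drop` is not assigned back.

```python
import pandas as pd

# Hypothetical starting table with the columns the snippet above relies on.
df = pd.DataFrame({
    "age": [31, 45],
    "city": ["Lima", "Oslo"],
    "salary": [50000.0, 64000.0],
})

df["bonus"] = df["salary"] * 0.10   # new column
df["age"] = df["age"] + 1           # replace an existing column

df.drop(columns=["bonus"])                  # result discarded, df is unchanged
bonus_still_present = "bonus" in df.columns

df = df.drop(columns=["bonus"])             # assign the result to keep the change
df = df.rename(columns={"city": "loc"})     # rename also returns a new DataFrame

print(bonus_still_present)
print(list(df.columns))
```

One side effect of the rename: a column called `loc` has to be read as `df["loc"]`, since `df.loc` is already the label-based indexer.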