SQL | Easy | Asked at Amazon, Microsoft, Uber

What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

For: Data Analyst, Data Scientist, Data Engineer

The short answer

COUNT(*) counts every row including those with NULLs. COUNT(column) counts only rows where that column is non-NULL. COUNT(DISTINCT column) counts unique non-NULL values in the column.

How to think about it

It looks basic, but it’s a reliable filter for whether a candidate actually understands NULLs. The interviewer wants to watch you reason through a concrete case with NULLs and duplicates, not recite three definitions.

Picture a users table where some people never gave an email, and some share one (family accounts, test data):

COUNT(*) counts every row, NULLs and all — it never looks at column values.
COUNT(email) counts only rows where email is non-NULL — a data-completeness measure.
COUNT(DISTINCT email) counts unique non-NULL emails — true cardinality.

A worked example

Six users; two have no email; two share aarav@example.com:

SELECT COUNT(*)              AS total_rows,
       COUNT(email)          AS rows_with_email,
       COUNT(DISTINCT email) AS unique_emails
FROM users;

total_rows	rows_with_email	unique_emails
6	4	3

The three numbers tell three different stories from one column. COUNT(*) is 6 — every row. COUNT(email) drops to 4, skipping the two NULL emails. And COUNT(DISTINCT email) is 3, because the shared aarav@example.com collapses two rows into a single distinct value. Confuse these and your “how many users?” report is quietly wrong.
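To reproduce the result end to end, here is a minimal sketch that builds the six-row table in an in-memory SQLite database and runs the same query. The table layout, the id column, and every address other than aarav@example.com are assumptions made up for illustration; only the 6 / 4 / 3 outcome comes from the example above.

```python
import sqlite3

# Hypothetical six-row users table matching the description above:
# two rows have no email, two rows share aarav@example.com.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "aarav@example.com"),
        (2, "aarav@example.com"),  # duplicate of row 1
        (3, "bea@example.com"),    # made-up address
        (4, "chen@example.com"),   # made-up address
        (5, None),                 # Python None is stored as SQL NULL
        (6, None),
    ],
)

counts = conn.execute(
    """
    SELECT COUNT(*)              AS total_rows,
           COUNT(email)          AS rows_with_email,
           COUNT(DISTINCT email) AS unique_emails
    FROM users
    """
).fetchone()

print(counts)  # (6, 4, 3)
```

The NULL rows are counted by COUNT(*) but skipped by both column-based forms, and the duplicate only disappears once DISTINCT is applied.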

When to reach for each

COUNT(*) — total rows: pagination totals, table size, cardinality checks.
COUNT(col) — completeness: “how many rows actually have this field filled in?”
COUNT(DISTINCT col) — dimension cardinality: “how many unique users clicked?” — the standard funnel form.
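The three forms are often combined in a single data-quality query, since the gaps between them are meaningful on their own. The sketch below assumes the same hypothetical six-row users table as the worked example (the column aliases and the extra addresses are illustrative, not from the source) and derives the missing-value count, the completeness percentage, and the number of duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "aarav@example.com"),
        (2, "aarav@example.com"),
        (3, "bea@example.com"),   # made-up address
        (4, "chen@example.com"),  # made-up address
        (5, None),
        (6, None),
    ],
)

missing, pct_filled, duplicates = conn.execute(
    """
    SELECT COUNT(*) - COUNT(email)              AS rows_missing_email,
           100.0 * COUNT(email) / COUNT(*)      AS pct_with_email,
           COUNT(email) - COUNT(DISTINCT email) AS duplicate_email_rows
    FROM users
    """
).fetchone()

print(missing, round(pct_filled, 1), duplicates)  # 2 66.7 1
```

Multiplying by 100.0 rather than 100 keeps the division in floating point; with integer operands many engines, SQLite included, would truncate the percentage.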

At billions of rows

COUNT(DISTINCT col) has to track every unique value it sees, which gets expensive at scale. BigQuery and Redshift offer approximate variants that trade a small error (typically under 2%) for a big speedup:

APPROX_COUNT_DISTINCT(user_id)         -- BigQuery
APPROXIMATE COUNT(DISTINCT user_id)    -- Redshift
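To get a feel for why an approximation can be so much cheaper, here is a rough sketch of a bounded-memory distinct counter. It is not how BigQuery or Redshift work internally (they use HyperLogLog-style sketches); this swaps in the simpler K-minimum-values technique, and the function name, the parameter k, and the sample data are all hypothetical. The idea: hash every value to a uniform 64-bit number, keep only the k smallest hashes, and infer the total from how tightly those k hashes are packed near zero.

```python
import hashlib
import heapq


def approx_count_distinct(values, k=1024):
    """Estimate the number of distinct values, retaining at most k hashes."""
    kept = set()  # hashes currently retained (never more than k)
    heap = []     # max-heap of the retained hashes, stored negated
    for value in values:
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")  # uniform in [0, 2**64)
        if h in kept:
            continue  # duplicate of a value we already track
        if len(heap) < k:
            kept.add(h)
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            largest = -heapq.heappushpop(heap, -h)
            kept.discard(largest)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    kth_smallest = -heap[0]
    # k hashes fall below kth_smallest, so density ~ (k - 1) / (kth_smallest / 2**64).
    return (k - 1) * 2**64 // kth_smallest


# Illustrative data: 200,000 events from 50,000 distinct user ids.
user_ids = [i % 50_000 for i in range(200_000)]
exact = len(set(user_ids))
estimate = approx_count_distinct(user_ids)
print(exact, estimate)
```

Memory stays fixed at k hashes no matter how many unique values flow through, whereas the exact count has to remember all of them. With k = 1024 the expected relative error is roughly 3%; a larger k tightens the estimate at the cost of more memory.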

