datarekha
SQL Easy Asked at AmazonAsked at MicrosoftAsked at Uber

What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

The short answer

COUNT(*) counts every row including those with NULLs. COUNT(column) counts only rows where that column is non-NULL. COUNT(DISTINCT column) counts unique non-NULL values in the column.

How to think about it

This looks like a basic question but it’s a reliable filter for candidates who understand NULLs. The interviewer wants to see whether you can reason through a concrete example with NULLs and duplicates — not just recite three definitions.

The setup: what each form actually counts

Think of a users table where some users haven’t provided an email, and some share an email (family accounts, test data, etc.).

  • COUNT(*) — counts every row in the table, period. Doesn’t care about NULLs in any column.
  • COUNT(email) — counts only rows where email is not NULL. Good for data quality checks.
  • COUNT(DISTINCT email) — counts unique, non-NULL email values. Good for cardinality: “how many distinct users?”

Try it with real data

When to reach for each

  • COUNT(*) — total row count; pagination totals; cardinality checks.
  • COUNT(col) — data completeness: “how many rows actually have this field filled in?”
  • COUNT(DISTINCT col) — dimension cardinality: “how many unique users clicked?” This is the standard funnel analysis form.

The approximate version for large tables

COUNT(DISTINCT col) requires the engine to track every unique value it sees — expensive at scale. BigQuery and Redshift offer approximate variants:

-- BigQuery
APPROX_COUNT_DISTINCT(user_id)

-- Redshift
HLL_COUNT.MERGE(HLL_COUNT.INIT(user_id))

These trade a small error margin (typically under 2%) for dramatically better performance on billions of rows.

Learn it properly NULLs done right

Keep practising

All SQL questions

Explore further

Skip to content