datarekha
Statistics & Probability | Easy | Asked at Google, Amazon, Meta, Microsoft

What is the chi-square test, and when do you use it?

The short answer

The chi-square test assesses whether observed categorical frequencies differ from expected frequencies (goodness-of-fit) or whether two categorical variables are independent of each other (test of independence). It requires count data, a sufficiently large sample, and expected cell counts of at least 5.

How to think about it

The chi-square family covers the most common tests on categorical data. Two variants dominate in practice; mixing them up or misapplying the test to non-count data are the most frequent errors.

Test statistic

chi^2 = sum over cells of (O - E)^2 / E

where O is the observed count and E is the expected count under H0. Large chi^2 values indicate that the observed data deviate substantially from what H0 predicts. Under H0, the statistic approximately follows a chi-square distribution, with degrees of freedom that depend on the variant; the approximation improves as expected counts grow.
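As a minimal sketch of the formula above, the statistic can be computed directly from two lists of counts. The function name `chi_square_statistic` and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 100 observations over 4 categories,
# with H0 stating that all categories are equally likely.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

stat = chi_square_statistic(observed, expected)
print(f"chi^2 = {stat:.3f}")
```

Only the two cells that differ from their expected count contribute to the sum; cells where O equals E add zero.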

Variant 1: Goodness-of-fit

Tests whether a single categorical variable follows a specified distribution.

  • H0: the population proportions equal the specified values.
  • H1: at least one proportion differs.
  • df = k - 1, where k is the number of categories (subtract one more for each distribution parameter estimated from the data).

Example: does the observed distribution of user device types (mobile/tablet/desktop) match last year’s proportions?
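The device-type example could be sketched as follows. The counts and last year's proportions are made up for illustration, and `goodness_of_fit` is a hypothetical helper, not a library function. The p-value uses the closed form exp(-x/2), which holds only for a chi-square distribution with exactly 2 degrees of freedom; in practice a statistics library such as SciPy (`scipy.stats.chisquare`) would usually handle the general case.

```python
import math


def goodness_of_fit(observed, proportions):
    """Return (chi2, df, expected) for a goodness-of-fit test."""
    n = sum(observed)
    # Expected count per category under H0: specified proportion * sample size.
    expected = [p * n for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, expected


# Hypothetical data: mobile, tablet, desktop.
observed = [270, 90, 140]    # this year's counts (n = 500)
last_year = [0.5, 0.2, 0.3]  # proportions specified by H0

chi2, df, expected = goodness_of_fit(observed, last_year)

# Closed-form survival function, valid ONLY when df == 2.
p_value = math.exp(-chi2 / 2) if df == 2 else None

print(f"chi^2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

With these made-up numbers the p-value is well above 0.05, so the sketch would not reject H0 at that level.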

Variant 2: Test of independence (contingency table)

Tests whether two categorical variables are associated.

  • H0: the two variables are independent (joint probability = product of marginals).
  • H1: they are not independent.
  • df = (rows - 1) * (columns - 1).
  • Expected count for each cell: E_ij = (row total_i * col total_j) / grand total.

Example: is purchase completion independent of browser type?
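A rough sketch of the browser example, using an invented 3 x 2 table (browser by completed/abandoned). `independence_test` is a hypothetical helper written for this illustration. As in any (3 - 1) * (2 - 1) table, df is 2, so the closed form exp(-x/2) gives the p-value; that shortcut does not apply to other table shapes, where something like SciPy's `scipy.stats.chi2_contingency` is the usual choice.

```python
import math


def independence_test(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row total_i * col total_j) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts: rows are browsers, columns are (completed, abandoned).
table = [
    [120, 80],  # Chrome
    [60, 40],   # Safari
    [45, 55],   # Firefox
]

chi2, df, expected = independence_test(table)

# Closed-form survival function, valid ONLY when df == 2.
p_value = math.exp(-chi2 / 2) if df == 2 else None

print(f"chi^2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

In this invented table the p-value falls below 0.05, which would suggest that completion rate is associated with browser type.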

Key assumptions

  1. Observations are independent — one observation per subject.
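  2. Expected cell counts are at least 5 in each cell. This is a rule of thumb for the chi-square approximation, and it applies to expected counts, not observed ones. For smaller samples, use Fisher’s exact test.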
  3. Data are raw counts, not proportions, percentages, or continuous values binned post-hoc.
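The expected-count assumption is easy to check before running the test. The sketch below uses an invented small-sample table and two hypothetical helpers, `expected_counts` and `cells_below`; the threshold of 5 is the rule of thumb from the list above.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(expected, threshold=5.0):
    """Return (row, column) indices of cells whose expected count is below threshold."""
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


# Hypothetical 2 x 2 table with only 30 observations.
small_table = [
    [3, 7],
    [2, 18],
]

flagged = cells_below(expected_counts(small_table))
if flagged:
    print(f"Expected count below 5 in cells {flagged}; consider Fisher's exact test.")
else:
    print("All expected counts are at least 5.")
```

Note that the observed count of 7 is not flagged and the check ignores observed values entirely; only the expected counts matter for the approximation.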

Effect size

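Chi-square significance says nothing about effect magnitude: with a large enough sample, even a trivial association becomes significant. Report Cramér’s V for the test of independence: V = sqrt(chi^2 / (n * min(rows-1, cols-1))), where n is the total number of observations. V ranges from 0 (no association) to 1 (perfect association).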
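A minimal sketch of the Cramér’s V formula, assuming the chi-square statistic has already been computed. The function name `cramers_v` and both input sets are hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical 3 x 2 table with 400 observations and chi^2 = 48/7 (about 6.86).
v_weak = cramers_v(48 / 7, n=400, n_rows=3, n_cols=2)

# Hypothetical 2 x 2 table [[50, 0], [0, 50]]: every expected count is 25,
# so chi^2 = 4 * (25^2 / 25) = 100 with n = 100.
v_perfect = cramers_v(100.0, n=100, n_rows=2, n_cols=2)

print(f"weak association:    V = {v_weak:.3f}")
print(f"perfect association: V = {v_perfect:.3f}")
```

The first case shows how a result can be statistically significant at df = 2 while the association itself stays weak.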
