What is the chi-square test, and when do you use it?
The chi-square test assesses whether observed categorical frequencies differ from expected frequencies (goodness-of-fit) or whether two categorical variables are independent of each other (test of independence). It requires independent observations recorded as counts and a sample large enough that every expected cell count is at least 5.
How to think about it
The chi-square family covers the most common tests on categorical data. Two variants dominate in practice. The most frequent errors are confusing the two and applying the test to data that are not counts.
Test statistic
chi^2 = sum over cells of (O - E)^2 / E
where O is the observed count and E is the expected count under H0. Large chi^2 values indicate that the observed data deviate substantially from what H0 predicts. Under H0, the statistic approximately follows a chi-square distribution whose degrees of freedom depend on the variant.
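As a minimal sketch of the formula above, the statistic can be computed directly from paired observed and expected counts. The function name and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for four categories; under H0 each expects 25 of 100.
observed = [20, 30, 22, 28]
expected = [25, 25, 25, 25]

statistic = chi_square_statistic(observed, expected)
print(f"chi^2 = {statistic:.2f}")  # chi^2 = 2.72
```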
Variant 1: Goodness-of-fit
Tests whether a single categorical variable follows a specified distribution.
- H0: the population proportions equal the specified values.
- H1: at least one proportion differs.
- df = k - 1, where k is the number of categories (subtract one more for each parameter estimated from the data).
Example: does the observed distribution of user device types (mobile/tablet/desktop) match last year’s proportions?
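The device-type example might be sketched as follows, using only the standard library. The proportions and counts are invented for illustration, and `chi2_sf` and `goodness_of_fit` are hypothetical helper names. The helper builds the upper-tail probability from the closed forms for df = 1 and df = 2. In practice a library routine such as `scipy.stats.chisquare` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exp) and
    steps up two degrees of freedom at a time.
    """
    half_x = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half_x)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half_x))
    while a < df / 2.0:
        tail += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def goodness_of_fit(observed, proportions):
    """Return (statistic, df, p_value) for observed counts against H0 proportions."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical data: last year's shares and this year's counts (n = 200).
last_year = {"mobile": 0.5, "tablet": 0.2, "desktop": 0.3}
this_year = {"mobile": 110, "tablet": 30, "desktop": 60}

devices = list(last_year)
statistic, df, p_value = goodness_of_fit(
    [this_year[d] for d in devices],
    [last_year[d] for d in devices],
)
print(f"chi^2 = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
# chi^2 = 3.50, df = 2, p = 0.174
```

With these invented counts the p-value is about 0.17, so H0 would not be rejected at the 5% level.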
Variant 2: Test of independence (contingency table)
Tests whether two categorical variables are associated.
- H0: the two variables are independent (joint probability = product of marginals).
- H1: they are not independent.
- df = (rows - 1) * (columns - 1).
- Expected count for each cell:
E_ij = (row total_i * col total_j) / grand total.
Example: is purchase completion independent of browser type?
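A sketch of the independence test for the purchase-by-browser example is given here. The table values are invented and `independence_test` is a hypothetical name. The function computes the expected counts from the margins, then the statistic and the degrees of freedom.

```python
def independence_test(table):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # E_ij = (row total_i * col total_j) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts. Rows: completed, abandoned. Columns: three browsers.
table = [
    [115, 50, 35],
    [85, 50, 65],
]
statistic, df, expected = independence_test(table)

# 5.991 is the 5% critical value of the chi-square distribution with df = 2.
CRITICAL_VALUE_DF2 = 5.991
reject_h0 = statistic > CRITICAL_VALUE_DF2
print(f"chi^2 = {statistic:.2f}, df = {df}, reject H0 at 5%: {reject_h0}")
# chi^2 = 13.50, df = 2, reject H0 at 5%: True
```

For real analyses, `scipy.stats.chi2_contingency` performs the same computation and also returns a p-value. By default it applies Yates' continuity correction when df = 1 (a 2x2 table).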
Key assumptions
- Observations are independent: each subject contributes exactly one observation and is counted in exactly one cell.
- Expected cell counts are at least 5 in each cell. When any expected count falls below 5, use Fisher’s exact test instead.
- Data are raw counts, not proportions, percentages, or continuous values binned post-hoc.
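The expected-count assumption can be checked before running the test. The sketch below flags every cell of a contingency table whose expected count falls under the threshold. The function name and the small table are hypothetical.

```python
def low_expected_cells(table, minimum=5.0):
    """Return (row, col, expected) for every cell whose expected count is below minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        (i, j, r * c / grand_total)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
        if r * c / grand_total < minimum
    ]


# Hypothetical small 2x2 table of raw counts.
small_table = [
    [3, 7],
    [2, 8],
]
flagged = low_expected_cells(small_table)
if flagged:
    print("Expected counts below 5; consider Fisher's exact test:", flagged)
```

Here both cells in the first column have an expected count of 2.5, so the chi-square approximation would not be trusted for this table.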
Effect size
A significant chi-square result says nothing about effect magnitude, because the statistic grows with sample size. Report Cramér’s V for the test of independence: V = sqrt(chi^2 / (n * min(rows-1, cols-1))), where n is the total number of observations. V ranges from 0 (no association) to 1 (perfect association).
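Cramér's V follows directly from the statistic, the sample size, and the table dimensions. The sketch below uses a hypothetical function name and invented inputs.

```python
import math


def cramers_v(chi2, n, rows, cols):
    """Cramér's V = sqrt(chi^2 / (n * min(rows - 1, cols - 1)))."""
    if min(rows, cols) < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


# Hypothetical result: chi^2 = 13.5 from a 2x3 table of n = 400 observations.
v = cramers_v(13.5, n=400, rows=2, cols=3)
print(f"Cramér's V = {v:.3f}")  # Cramér's V = 0.184
```

In this invented case the statistic exceeds the 5% critical value for df = 2 (5.991), yet V is only about 0.18, which shows why significance and effect size should be reported separately.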