What is the significance level (alpha) in hypothesis testing, and how do you choose it?
The significance level alpha is the maximum tolerable probability of a Type I error — rejecting a true null hypothesis. It must be chosen before data collection based on the relative costs of false positives versus false negatives, not defaulted to 0.05 out of convention.
How to think about it
Alpha is the dial that controls your false-positive rate. Setting it correctly requires thinking about consequences, not copying convention.
Precise definition
Alpha is the pre-specified probability threshold at which you declare a result statistically significant:
alpha = P(reject H0 | H0 is true)
If you run a test at alpha = 0.05 and H0 is in fact true, there is a 5% chance you will incorrectly reject it due to sampling variation alone. This is not a defect — it is a known, acceptable error rate that you set in advance.
The decision rule
Compute the p-value from your data. Then:
p < alpha→ reject H0 (result is “statistically significant”)p >= alpha→ fail to reject H0
Alpha must be fixed before looking at the data. Adjusting alpha after seeing results — even slightly — is a form of p-hacking.
How to choose alpha
The conventional 0.05 came from R.A. Fisher’s informal writings in the 1920s. It is not universally correct. Consider:
| Context | Typical alpha | Reasoning |
|---|---|---|
| Exploratory A/B test | 0.05 – 0.10 | Cost of a false positive is low; getting a signal matters |
| Medical clinical trial | 0.01 – 0.001 | False positives can harm patients |
| Particle physics | 5-sigma (~0.0000003) | Extraordinary claims require extraordinary evidence |
| Genome-wide association | 5e-8 | Millions of simultaneous SNP tests |
| Early product experiment | 0.10 | Speed matters; decisions are reversible |
The alpha-power link
Alpha and power (1 - beta) move in opposite directions for a fixed sample size. Lowering alpha to reduce false positives simultaneously lowers power — increasing the chance of missing a real effect. The right alpha is the one that reflects the true cost ratio of the two error types in your domain.