What is a p-value, and what does it actually tell you?
A p-value is the probability of observing data at least as extreme as the collected data, assuming the null hypothesis is true. It measures surprise under H0 — not the probability that H0 is true or false.
How to think about it
The p-value is one of the most misquoted numbers in data science. The precise definition matters enormously, and interviewers test whether you can tell it apart from the common misreadings.
Precise definition
p-value = P(test statistic as extreme or more extreme than observed | H0 is true)
That vertical bar is everything. The null hypothesis is assumed true; you then ask how often chance alone would produce data this extreme. A small p-value (say, p < 0.05) means your data would be rare if H0 were true — not that H0 is rare in reality.
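To make the conditional concrete, here is a minimal sketch that computes an exact two-sided p-value for a coin-flip experiment. The scenario (60 heads in 100 flips, H0: the coin is fair) and the function name `binomial_two_sided_p` are illustrative assumptions, not part of the definition above. "As extreme or more extreme" is taken here to mean "at least as far from the expected count, in either direction".

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (P(heads) = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Under H0 every sequence of flips is equally likely, so the probability
    # of k heads is comb(flips, k) / 2**flips. Count every outcome at least
    # as far from the expected count as the one observed.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips


# Hypothetical experiment: 60 heads in 100 flips.
p = binomial_two_sided_p(heads=60, flips=100)
print(f"p-value = {p:.4f}")
```

The result is about 0.057: if the coin really is fair, a split at least this lopsided turns up in roughly 1 in 18 experiments. Nothing in the calculation says how likely it is that the coin is fair.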
What the p-value is NOT
- It is not P(H0 is true | data). That is the posterior probability and requires Bayes’ theorem with a prior.
- It is not the probability that your result was due to chance.
- It is not a measure of effect size or practical importance.
- A p = 0.001 does not mean the effect is large or meaningful: with millions of observations, tiny, irrelevant differences become highly significant.
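The sample-size point can be checked with a short sketch. It assumes a one-sample z-test with a known standard deviation of 1 and a hypothetical observed mean difference of 0.01 (1% of a standard deviation); the helper name `two_sided_z_p` is made up for illustration. The effect is held fixed and only n changes.

```python
from math import erfc, sqrt


def two_sided_z_p(mean_diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test with known sigma."""
    z = mean_diff / (sigma / sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


TINY_EFFECT = 0.01  # hypothetical: 1% of a standard deviation

p_values = {
    n: two_sided_z_p(TINY_EFFECT, sigma=1.0, n=n)
    for n in (100, 10_000, 1_000_000)
}
for n, p in p_values.items():
    print(f"n = {n:>9,}  p = {p:.3g}")
```

The same negligible difference is nowhere near significant at n = 100 and overwhelmingly significant at n = 1,000,000. The p-value tracks how much evidence there is against H0, not how big the effect is.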
Decision rule
Compare the p-value to a pre-chosen significance level alpha (often 0.05):
- p < alpha → reject H0
- p >= alpha → fail to reject H0
That threshold is arbitrary and should be chosen before data collection, not after.
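One way to see what choosing alpha commits you to: alpha is the rate at which this rule rejects H0 when H0 is actually true. The simulation below is a rough sketch under assumed conditions (normally distributed data with known sigma = 1, a two-sided z-test, 2,000 repeated experiments of 30 observations each); all constants and the helper `z_test_p` are illustrative.

```python
import random
from math import erfc, sqrt

ALPHA = 0.05           # fixed before any data is generated
N_EXPERIMENTS = 2000   # hypothetical number of repeated experiments
SAMPLE_SIZE = 30       # hypothetical observations per experiment

rng = random.Random(0)  # seeded so the run is reproducible


def z_test_p(sample: list[float]) -> float:
    """Two-sided p-value for H0: mean = 0, assuming known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n) * sqrt(n)
    return erfc(abs(z) / sqrt(2))


rejections = 0
for _ in range(N_EXPERIMENTS):
    # Data are drawn with mean 0, so H0 is true in every experiment.
    sample = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    if z_test_p(sample) < ALPHA:
        rejections += 1

rejection_rate = rejections / N_EXPERIMENTS
print(f"rejected a true H0 in {rejection_rate:.1%} of experiments")
```

The rejection rate should land close to 5%. Moving alpha after seeing the data changes this false-positive rate silently, which is why the threshold has to be fixed up front.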
Why the direction matters
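A p-value of 0.03 on a one-tailed test is very different from 0.03 on a two-tailed test; the same data can yield a significant result under one framing and not the other. For a symmetric test statistic with the effect in the hypothesized direction, the two-tailed p-value is twice the one-tailed one, so a one-tailed p = 0.03 becomes 0.06 two-tailed and no longer clears alpha = 0.05. Always specify which tail before looking at the data.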
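A small sketch of the two framings applied to one test statistic, assuming a z-test and a hypothetical observed z = 1.88 (chosen because its upper-tail probability is close to 0.03). The helper names are illustrative.

```python
from math import erfc, sqrt


def upper_tail_p(z: float) -> float:
    """One-tailed p-value for the alternative 'the effect is positive'."""
    return 0.5 * erfc(z / sqrt(2))


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for the alternative 'the effect is nonzero'."""
    return erfc(abs(z) / sqrt(2))


ALPHA = 0.05
z = 1.88  # hypothetical observed test statistic

one_tailed = upper_tail_p(z)
two_tailed = two_tailed_p(z)

print(f"one-tailed p = {one_tailed:.4f} -> reject H0: {one_tailed < ALPHA}")
print(f"two-tailed p = {two_tailed:.4f} -> reject H0: {two_tailed < ALPHA}")
```

The statistic is identical in both calls; only the alternative hypothesis differs, and that alone flips the decision at alpha = 0.05.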