Walk me through Bayes' theorem with a disease-screening base-rate example.
The short answer
Bayes' theorem updates a prior probability with new evidence: P(H|E) = P(E|H) P(H) / P(E). In disease testing, ignoring the low base rate (prior) makes a positive test look far more alarming than it really is — most positives are false positives when the disease is rare.
How to think about it
Bayes’ theorem is non-negotiable for data roles. Demonstrate the formula, then plug in real numbers — interviewers want to see you handle a low base rate without flinching.
The formula
P(Disease | Positive) = P(Positive | Disease) × P(Disease) / P(Positive)
Expand the denominator using the law of total probability:
P(Positive) = P(Pos | Disease)×P(Disease) + P(Pos | No Disease)×P(No Disease)
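The two formulas above can be combined into one small function. This is a minimal sketch, not a reference implementation: the function name `posterior_given_positive` and its parameter names are illustrative choices, not part of any library.

```python
def posterior_given_positive(prior: float,
                             sensitivity: float,
                             false_positive_rate: float) -> float:
    """Return P(Disease | Positive) using Bayes' theorem.

    prior               -- P(Disease), the base rate
    sensitivity         -- P(Positive | Disease)
    false_positive_rate -- P(Positive | No Disease)
    """
    for name, value in (("prior", prior),
                        ("sensitivity", sensitivity),
                        ("false_positive_rate", false_positive_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")

    # Denominator: law of total probability over Disease / No Disease.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    if p_positive == 0.0:
        raise ValueError("P(Positive) is zero, so the posterior is undefined")

    # Numerator: P(Positive | Disease) * P(Disease).
    return sensitivity * prior / p_positive


if __name__ == "__main__":
    # 1% prevalence, 95% sensitivity, 5% false-positive rate.
    print(f"{posterior_given_positive(0.01, 0.95, 0.05):.3f}")  # prints 0.161
```

Keeping the denominator as a named intermediate (`p_positive`) mirrors the two-step derivation and makes it easy to check each term by hand.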
Worked numeric example
Suppose:
- Prevalence (base rate): P(D) = 0.01 (1% of the population has the disease)
- Sensitivity: P(Pos | D) = 0.95
- False-positive rate: P(Pos | No D) = 0.05
Step 1 — denominator:
P(Pos) = 0.95×0.01 + 0.05×0.99 = 0.0095 + 0.0495 = 0.059
Step 2 — posterior:
P(D | Pos) = (0.95 × 0.01) / 0.059 ≈ 0.161
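A positive test implies only about a 16% chance of actually having the disease. The remaining ~84% of positives are false alarms: the healthy group is so much larger (99% of the population) that its 5% false-positive rate contributes more positives (0.0495) than the true positives from the diseased group (0.0095).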
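A common way to sanity-check this result is to restate it as head counts. The sketch below applies the prevalence, sensitivity, and false-positive rate from the worked example to a hypothetical cohort of 10,000 people. The cohort size is an assumption chosen only so that every count comes out as a whole number.

```python
# Hypothetical cohort; the size is arbitrary and chosen for round numbers.
cohort = 10_000

# Rates from the worked example, applied with integer arithmetic.
diseased = cohort * 1 // 100              # 1% prevalence        -> 100 people
healthy = cohort - diseased               # everyone else        -> 9,900 people
true_positives = diseased * 95 // 100     # 95% sensitivity      -> 95 people
false_positives = healthy * 5 // 100      # 5% false-positive    -> 495 people

all_positives = true_positives + false_positives   # 590 people test positive
share_truly_sick = true_positives / all_positives  # 95 / 590

print(f"Positives: {all_positives} "
      f"({true_positives} true, {false_positives} false)")
print(f"P(D | Pos) = {share_truly_sick:.3f}")  # prints P(D | Pos) = 0.161
```

Of the 590 people who test positive, only 95 are actually sick, which is the same 0.161 obtained from the formula.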