Statistics & Probability · Medium · Asked at Google, Meta, Amazon, Microsoft

Walk me through Bayes' theorem with a disease-screening base-rate example.

For: Data Scientist, Data Analyst, ML Engineer, AI / LLM Engineer

The short answer

Bayes' theorem updates a prior probability with new evidence: P(H|E) = P(E|H) P(H) / P(E). In disease testing, ignoring the low base rate (the prior) makes a positive test look far more alarming than it really is: when the disease is rare, false positives from the large healthy population can outnumber true positives.

How to think about it

Bayes’ theorem is non-negotiable for data roles. Demonstrate the formula, then plug in real numbers — interviewers want to see you handle a low base rate without flinching.

The formula

P(Disease | Positive) = P(Positive | Disease) × P(Disease) / P(Positive)

Expand the denominator using the law of total probability:

P(Positive) = P(Pos | Disease)×P(Disease) + P(Pos | No Disease)×P(No Disease)
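The two formulas above can be combined into one small function. This is a minimal sketch rather than a reference implementation; the function name `posterior_given_positive` and its parameter names are assumptions chosen for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(Disease | Positive) using Bayes' theorem.

    All three arguments are probabilities in [0, 1]. The names are
    illustrative and not taken from any particular library.
    """
    for name, value in (
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("false_positive_rate", false_positive_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Numerator: P(Pos | Disease) x P(Disease)
    true_positive_mass = sensitivity * prevalence
    # Law of total probability: add the positives that come from healthy people
    p_positive = true_positive_mass + false_positive_rate * (1.0 - prevalence)

    if p_positive == 0.0:
        raise ValueError("P(Positive) is zero, so the posterior is undefined")
    return true_positive_mass / p_positive
```

Keeping the denominator as an explicit sum of the two sources of positives makes it easy to explain, term by term, where the false alarms come from.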

Worked numeric example

Suppose:

Prevalence (base rate): P(D) = 0.01 (1 % of the population has the disease)
Sensitivity: P(Pos | D) = 0.95
False-positive rate: P(Pos | No D) = 0.05

Step 1 — denominator:

P(Pos) = 0.95×0.01 + 0.05×0.99 = 0.0095 + 0.0495 = 0.059

Step 2 — posterior:

P(D | Pos) = (0.95 × 0.01) / 0.059 ≈ 0.161

A positive test means only ~16 % chance of actually having the disease. Most positives are false alarms because the disease is rare.
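To see how strongly the base rate drives this result, the same calculation can be repeated for several prevalences while the test characteristics stay fixed. The sketch below is illustrative only: 0.01 is the prevalence from the example, while the other prevalence values and the helper name `posterior` are assumptions.

```python
SENSITIVITY = 0.95          # P(Pos | D), from the worked example
FALSE_POSITIVE_RATE = 0.05  # P(Pos | No D), from the worked example


def posterior(prevalence):
    """P(D | Pos) for a given prevalence, with the test held fixed."""
    p_pos = SENSITIVITY * prevalence + FALSE_POSITIVE_RATE * (1 - prevalence)
    return SENSITIVITY * prevalence / p_pos


# 0.01 matches the example; the other values are arbitrary illustrations.
prevalences = [0.001, 0.01, 0.1, 0.5]
posteriors = {p: posterior(p) for p in prevalences}

for p, post in posteriors.items():
    print(f"prevalence {p:>6.1%} -> P(D | Pos) = {post:.1%}")
```

Under these assumptions the posterior climbs from roughly 2% at a prevalence of 0.1% to 95% at a prevalence of 50%, even though the test itself never changes.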

Natural-frequency tree

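Picture 1,000 people screened with the numbers above. About 10 have the disease, and 9.5 of them (on average) test positive. The other 990 are healthy, and 49.5 of them test positive anyway. Of the ~59 total positives, only ~10 are true positives; ~50 are false alarms from the large healthy pool.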
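The same counts can be tallied directly. This sketch assumes a cohort of 1,000 people, a size chosen for illustration because it is consistent with the rounded counts quoted above; the function name `natural_frequencies` and the dictionary keys are likewise hypothetical.

```python
def natural_frequencies(cohort_size, prevalence, sensitivity, false_positive_rate):
    """Return expected head counts for each branch of the frequency tree."""
    diseased = cohort_size * prevalence
    healthy = cohort_size - diseased
    true_positives = diseased * sensitivity            # sick and flagged
    false_positives = healthy * false_positive_rate    # healthy but flagged
    return {
        "diseased": diseased,
        "healthy": healthy,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "total_positives": true_positives + false_positives,
    }


# Cohort size of 1,000 is an assumption; the rates come from the example.
counts = natural_frequencies(1000, 0.01, 0.95, 0.05)
share_true = counts["true_positives"] / counts["total_positives"]

for label, value in counts.items():
    print(f"{label:>16}: {value:.1f}")
print(f"share of positives that are real: {share_true:.1%}")
```

The counts are expected values, so fractional people such as 9.5 are normal here; the ratio of true positives to all positives reproduces the posterior from the formula.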
