datarekha

Conditional & Total Probability

P(A given B) rescales probability to the world where B happened. The law of total probability stitches those pieces back together — and sets up Bayes.

8 min read Intermediate GATE DA Lesson 6 of 122

What you'll learn

  • Conditional probability P(A|B) = P(A and B) / P(B), and what 'given' really does
  • The multiplication rule P(A and B) = P(A|B)·P(B)
  • The law of total probability: splitting an event across exhaustive cases
  • Reading a probability tree to compute an overall probability

Before you start

It’s raining outside. You already know that, so the chance you also need an umbrella isn’t 30% any more; it’s close to 100%. Conditioning is what your brain just did: once you learn B happened, you zoom into the slice of the world where B is true and re-measure A inside that slice. The notation P(A | B) (“probability of A given B”) is just a name for that move. It’s also the quantity most supervised classifiers are built to estimate: a spam filter learns P(spam | the words in this email), a classifier learns P(label | features). We’ll write down the formula, then use it to stitch overall probabilities together from cases. That second move (the law of total probability) is the workhorse of the lesson.

The definition

P(A | B) = P(A ∩ B) / P(B),     for P(B) > 0

You restrict attention to outcomes where B holds (that’s the new denominator P(B)), then ask what fraction of those also have A. Rearranging gives the multiplication rule, which is often the more useful form:

P(A ∩ B) = P(A | B) · P(B)

“The chance both happen = chance B happens, times the chance A happens given B.”
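As a rough illustration of the multiplication rule, the sketch below draws two cards without replacement and asks for the chance that both are aces. The card deck is an assumed example, not part of the lesson; here B is "first card is an ace" and A is "second card is an ace". The rule's answer is cross-checked by brute-force enumeration of every ordered pair of cards.

```python
from fractions import Fraction
from itertools import permutations

# Assumed example: a 52-card deck reduced to what matters, 4 aces and 48 other cards.
deck = ["ace"] * 4 + ["other"] * 48

# Multiplication rule: P(A and B) = P(A | B) * P(B)
#   B = first card is an ace, A = second card is an ace.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace is already gone
p_both_rule = p_second_ace_given_first * p_first_ace

# Cross-check: enumerate all ordered draws of two distinct cards.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
p_both_enum = Fraction(both_aces, len(draws))

print(p_both_rule, p_both_enum)  # 1/221 1/221
```

Exact fractions are used so the two routes can be compared with `==` rather than a floating-point tolerance.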

To see this geometrically, picture A and B as two overlapping circles. Conditioning on B makes everything outside B fade away: that’s the “zoom into the world where B is true” move. P(A | B) is then just the area of A∩B as a fraction of B’s area, exactly what the formula says.
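With equally likely outcomes, "area" becomes a count, so the same fraction can be computed by listing outcomes. The following is a minimal sketch under assumed events (two fair dice, A = "sum is 8", B = "first die is even"); the helper names `prob` and `cond_prob` are illustrative, not from the lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) when outcomes are equally likely."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive to condition on B")
    return prob(lambda o: a(o) and b(o)) / p_b

# Assumed events for illustration.
def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

p_a = prob(sum_is_8)
p_a_given_b = cond_prob(sum_is_8, first_is_even)
print(p_a, p_a_given_b)  # 5/36 1/6
```

Learning that the first die is even moves the probability from 5/36 to 1/6: of the 18 outcomes inside B, exactly 3 also lie in A.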

The law of total probability

Often you can’t get P(A) directly, but you can split the world into exhaustive, mutually exclusive cases and find A’s probability within each. Then you recombine, weighting by how likely each case is.

Probability tree:
  Machine 1 (0.6) → defective (0.02): 0.6 × 0.02 = 0.012
  Machine 2 (0.4) → defective (0.05): 0.4 × 0.05 = 0.020
  P(defective) = 0.012 + 0.020 = 0.032
Walk each path, multiply along it, then add the paths that reach the event.
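The "multiply along a path, add across paths" recipe can be sketched as a small recursive walk. The nested-dictionary layout and the `paths` helper are assumptions made for illustration; the "ok" branches (0.98 and 0.95) are simply the complements of the stated defect rates.

```python
# Each node maps a branch label to (branch probability, subtree).
# A subtree of None marks a leaf. This layout is an illustrative choice.
tree = {
    "Machine 1": (0.6, {"defective": (0.02, None), "ok": (0.98, None)}),
    "Machine 2": (0.4, {"defective": (0.05, None), "ok": (0.95, None)}),
}

def paths(node, prefix=(), weight=1.0):
    """Yield (path, probability) for each root-to-leaf path, multiplying along it."""
    for label, (p, subtree) in node.items():
        path = prefix + (label,)
        if subtree is None:
            yield path, weight * p
        else:
            yield from paths(subtree, path, weight * p)

all_paths = list(paths(tree))

# Add up the paths that end in the event of interest.
p_defective = sum(p for path, p in all_paths if path[-1] == "defective")
print(round(p_defective, 4))  # 0.032
```

The same walk works for deeper trees, since each level just contributes one more factor to the running product.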

If cases B₁, B₂, … are mutually exclusive and cover everything:

P(A) = Σ P(A | Bᵢ) · P(Bᵢ)

Worked example. A factory: Machine 1 makes 60% of items with a 2% defect rate; Machine 2 makes 40% with a 5% defect rate. The overall defect probability is 0.6·0.02 + 0.4·0.05 = 0.012 + 0.020 = 0.032 — i.e. 3.2%.
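The worked example translates directly into a few lines of code. This is a minimal sketch; the function name `total_probability` and its argument layout are hypothetical, and the check that the case probabilities sum to 1 stands in for the "exhaustive and mutually exclusive" requirement.

```python
import math

def total_probability(case_probs, cond_probs):
    """Return P(A) = sum over i of P(A | B_i) * P(B_i).

    case_probs[i] is P(B_i); cond_probs[i] is P(A | B_i).
    The cases are assumed mutually exclusive and exhaustive.
    """
    if len(case_probs) != len(cond_probs):
        raise ValueError("need one conditional probability per case")
    if not math.isclose(sum(case_probs), 1.0):
        raise ValueError("case probabilities must sum to 1")
    return sum(c * p for p, c in zip(case_probs, cond_probs))

# Factory numbers from the worked example.
p_machine = [0.6, 0.4]                 # P(Machine 1), P(Machine 2)
p_defect_given_machine = [0.02, 0.05]  # defect rate of each machine

p_defective = total_probability(p_machine, p_defect_given_machine)
print(round(p_defective, 4))  # 0.032
```

Rounding is applied only for display, because floating-point products such as 0.4 × 0.05 are not stored exactly.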

Flipping the question (given a defect, which machine did it come from?) is Bayes’ theorem, the next lesson. For a preview: dividing each path’s contribution by the total gives P(M2 | defective) = 0.020 / 0.032 = 0.625.
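The preview calculation can be sketched by normalising each path's contribution by the total. The variable names below are illustrative; exact fractions are used so the result comes out without rounding.

```python
from fractions import Fraction

# Path contributions P(defective and machine) = P(machine) * P(defective | machine).
contributions = {
    "Machine 1": Fraction(6, 10) * Fraction(2, 100),  # 0.6 * 0.02
    "Machine 2": Fraction(4, 10) * Fraction(5, 100),  # 0.4 * 0.05
}

# Law of total probability: P(defective) is the sum of the contributions.
p_defective = sum(contributions.values())

# Divide each contribution by the total to get P(machine | defective).
posterior = {m: c / p_defective for m, c in contributions.items()}

for machine, p in posterior.items():
    print(machine, float(p))
# Machine 1 0.375
# Machine 2 0.625
```

Machine 2 makes fewer items but accounts for most of the defects, and the two posterior values sum to 1 because the defect must have come from one of the two machines.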

Quick check

Q1. A box has 4 red and 6 blue balls. You draw two without replacement. What is P(both red)? (numerical answer)
Q2. Using the factory above, what is the overall probability an item is defective? (numerical answer)
Q3. Which statements about conditional probability are ALWAYS true (for P(B) > 0)? (select all that apply)
Q4. Which equation is the multiplication rule?
