Explain joint, marginal, and conditional distributions and how to move between them.

The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).

What is conditional probability, and how does it differ from joint probability?

Conditional probability P(A|B) is the probability of A given that B has already occurred, computed as P(A and B) / P(B). It narrows the sample space to B, whereas joint probability P(A and B) lives in the full, unrestricted space.
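To make the contrast concrete, here is a minimal sketch that enumerates two fair dice. The events A (the sum is 8) and B (the first die is even) are illustrative choices for this example only; the point is that the joint probability is measured against all 36 outcomes, while the conditional is measured inside B alone.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform measure."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Illustrative events (assumptions for this example only).
def A(o):
    return o[0] + o[1] == 8      # the sum is 8

def B(o):
    return o[0] % 2 == 0         # the first die is even

p_joint = prob(lambda o: A(o) and B(o))   # measured in the full space of 36 outcomes
p_B = prob(B)
p_A_given_B = p_joint / p_B               # same event, measured inside B only

print(p_joint, p_B, p_A_given_B)          # 1/12 1/2 1/6
```

The conditional (1/6) is larger than the joint (1/12) because dividing by P(B) rescales to the 18 outcomes where B holds.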

When does each common distribution arise — Bernoulli, Binomial, Poisson, Normal, Exponential, Uniform?

Each distribution has a natural generative story: Bernoulli is a single coin flip; Binomial sums Bernoullis; Poisson counts rare arrivals; Normal emerges from sums of many small effects; Exponential models waiting times between Poisson events; Uniform assigns equal probability across a range. Choosing correctly comes from matching that story to the data-generating process.

State the law of total probability and give a concrete example of when you'd apply it.

The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.
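As one possible concrete example, consider three machines that together produce all of a factory's output. The machine names, output shares, and defect rates below are hypothetical figures chosen for illustration; the sketch builds the overall defect rate from the per-machine rates and then reuses it as the Bayes denominator.

```python
from fractions import Fraction

# Hypothetical figures: share of output per machine, P(B_i). Must sum to 1.
share = {"M1": Fraction(50, 100), "M2": Fraction(30, 100), "M3": Fraction(20, 100)}
# Hypothetical defect rate per machine, P(A | B_i).
defect_rate = {"M1": Fraction(1, 100), "M2": Fraction(2, 100), "M3": Fraction(5, 100)}

assert sum(share.values()) == 1  # the machines partition all output

# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i).
p_defect = sum(defect_rate[m] * share[m] for m in share)

# The same sum is the Bayes denominator: P(B_i | A) = P(A | B_i) * P(B_i) / P(A).
posterior = {m: defect_rate[m] * share[m] / p_defect for m in share}

print(p_defect)          # 21/1000
print(posterior["M3"])   # 10/21
```

Exact fractions are used here only to keep the arithmetic free of floating-point noise; ordinary floats would work equally well.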

Joint, Marginal & Conditional Distributions — GATE DA

Real questions rarely involve one variable alone — a height and a weight, a machine and a defect. One little table holds the chance of every pairing, and once you can read it two ways (sum it for a marginal, slice it for a conditional), the whole topic is bookkeeping.

9 min read · Advanced · GATE DA · Lesson 14 of 122

What you'll learn

Joint PMF p(x,y): the chance of two values together

Marginal — sum out the OTHER variable; Conditional — re-scale one slice to total 1

Independence means joint = product of marginals in EVERY cell

Conditional expectation E[Y|X] and the law of total expectation E[E[Y|X]] = E[Y]

Every variable so far has lived alone — one die, one coin, one waiting time. Real questions are rarely so tidy. You usually have two things happening at once: a student’s height and their weight, a part’s machine and whether it is defective, an X and a Y. The one little table that holds the chance of every pairing is where we begin. And once you can read that table two ways — by summing it, and by slicing it — the whole topic turns into bookkeeping.

One table, two readings

The joint PMF p(x, y) = P(X = x, Y = y) gives the chance that both happen together. Across the whole table the entries are non-negative and add to one. From it fall the two simpler views.

Sum a row or column for a marginal; divide a row by its own total for a conditional.

The marginal is one variable on its own, found by summing out the other — collapse the columns and you have p(x) = Σ_y p(x,y); collapse the rows and you have p(y). The name comes from writing these totals in the margins of the table.

The conditional is Y once X is fixed, found by taking that one row and re-scaling it to total 1: p(y | x) = p(x,y) / p(x). Dividing by the marginal p(x) is exactly the normaliser that turns a single row back into a proper distribution — the same “shrink to the world where the condition holds” move from the conditional lesson. Drag the circles and toggle “Given B” to feel that shrink once more:

(Interactive demo: conditional probability. Drag two events A and B; toggling “Condition on B” dims everything outside B, because conditioning throws away the rest of the universe.)
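The two readings of the table described in this section, summing for a marginal and re-scaling a row for a conditional, can be sketched in a few lines. The 2×3 table and the helper names (`marginal_x`, `marginal_y`, `conditional_y_given_x`) are assumptions made up for this illustration, with rows indexed by x and columns by y.

```python
from fractions import Fraction as F

# Illustrative joint PMF (made-up numbers): rows are x = 0, 1; columns are y = 0, 1, 2.
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(3, 10), F(1, 10), F(2, 10)],
]
assert sum(sum(row) for row in joint) == 1  # a valid joint PMF sums to 1

def marginal_x(joint):
    """p(x): sum each row over y."""
    return [sum(row) for row in joint]

def marginal_y(joint):
    """p(y): sum each column over x."""
    return [sum(col) for col in zip(*joint)]

def conditional_y_given_x(joint, x):
    """p(y | x): take row x and divide it by its own total p(x)."""
    row_total = sum(joint[x])
    if row_total == 0:
        raise ValueError("p(x) = 0, so p(y | x) is undefined")
    return [cell / row_total for cell in joint[x]]

print(marginal_x(joint))
print(marginal_y(joint))
print(conditional_y_given_x(joint, 1))
```

Whatever row is chosen, the conditional sums to 1, which is exactly what dividing by the marginal guarantees.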

Independence — every cell, not just one

X and Y are independent exactly when the joint splits into the product of the two marginals in every cell:

X ⊥ Y   ⇔   p(x, y) = p(x) · p(y)   for ALL (x, y)

This is a strong demand. A single cell where p(x,y) ≠ p(x)·p(y) breaks independence for the whole pair, while a single matching cell does not, in general, establish it: to confirm independence you check every cell, and to rule it out you need only one mismatch.
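A minimal sketch of the every-cell check follows. The function name `is_independent` and both tables are made up for illustration; the first table is built so that its (0, 0) cell matches the product of marginals while other cells do not, which shows why one matching cell is not enough.

```python
import math

def is_independent(joint, tol=1e-9):
    """Return (True, None) if every cell equals p(x) * p(y),
    otherwise (False, (i, j)) for the first mismatching cell."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    for i, row in enumerate(joint):
        for j, cell in enumerate(row):
            if not math.isclose(cell, p_x[i] * p_y[j], abs_tol=tol):
                return False, (i, j)
    return True, None

# Illustrative table (made-up numbers): marginals are (0.5, 0.5) and (0.4, 0.3, 0.3).
# Cell (0, 0) equals 0.5 * 0.4 = 0.20, yet cell (0, 1) is 0.10 rather than 0.15.
dependent = [
    [0.20, 0.10, 0.20],
    [0.20, 0.20, 0.10],
]
# Same marginals, but every cell is the product of its row and column totals.
independent = [
    [0.20, 0.15, 0.15],
    [0.20, 0.15, 0.15],
]

print(is_independent(dependent))
print(is_independent(independent))
```

The comparison uses a small tolerance because the table is stored as floats; with exact fractions a plain equality test would do.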

Conditional expectation and total expectation

Once you have the conditional p(y|x), its mean is the conditional expectation E[Y | X = x] = Σ_y y · p(y | x). This is a function of x; evaluated at the random value of X, the quantity E[Y | X] is itself a random variable, and averaging it over the distribution of X brings back the plain mean. This is the law of total expectation:

E[ E[Y | X] ] = E[Y]

It is the “average of the group averages” rule: split the population by X, average Y inside each group, then average those group-means weighted by group size, and you recover E[Y]. A 2025 question handed over a joint setup and asked for E[E[X|Y]], where the entire trick is spotting that it collapses to E[X] with no computation.
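For the discrete case, the collapse follows directly from the two definitions already in play, p(y | x) = p(x, y) / p(x) and p(y) = Σ_x p(x, y). A short derivation, assuming all sums are finite and skipping any x with p(x) = 0:

```latex
\begin{aligned}
E\big[\,E[Y \mid X]\,\big]
  &= \sum_x p(x)\, E[Y \mid X = x] \\
  &= \sum_x p(x) \sum_y y\, p(y \mid x) \\
  &= \sum_x \sum_y y\, p(x, y) \\
  &= \sum_y y \sum_x p(x, y) \\
  &= \sum_y y\, p(y) \;=\; E[Y].
\end{aligned}
```

The third line is where the weights p(x) cancel the denominators of the conditionals, which is the algebraic form of "weighting the group means by group size".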

Reading a 2×2 table

A joint PMF of (X, Y), each in {0, 1}:

        Y = 0   Y = 1
X = 0   0.10    0.20
X = 1   0.30    0.40

The four cells are non-negative and sum to 1, so it is a valid joint PMF.

Sum out a variable for each marginal — add across the rows for X, down the columns for Y:

p(X=0) = 0.10 + 0.20 = 0.30        p(Y=0) = 0.10 + 0.30 = 0.40
p(X=1) = 0.30 + 0.40 = 0.70        p(Y=1) = 0.20 + 0.40 = 0.60

Now slice. The conditional of Y given X = 1 is that row divided by p(X=1) = 0.7:

p(Y=0 | X=1) = 0.30 / 0.70 = 3/7 ≈ 0.4286
p(Y=1 | X=1) = 0.40 / 0.70 = 4/7 ≈ 0.5714      (the two sum to 1 ✓)

Are X and Y independent? Test the top-left cell: p(X=0, Y=0) = 0.10, but p(X=0)·p(Y=0) = 0.30·0.40 = 0.12. Since 0.10 ≠ 0.12, one mismatch is enough: they are dependent.

Finally, the conditional expectation. Because Y is a 0/1 variable, only its Y=1 term survives, so E[Y | X = 1] = 0·(3/7) + 1·(4/7) = 4/7 ≈ 0.5714. For a 0/1 variable the conditional expectation is just the conditional chance that Y = 1.
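The whole worked example can be reproduced in a short script. This is a sketch using exact fractions, with variable names chosen for illustration; it also checks the law of total expectation on this table by computing E[Y] two ways.

```python
from fractions import Fraction as F

# The 2x2 joint PMF from the worked example: joint[x][y] = P(X = x, Y = y).
joint = [
    [F(10, 100), F(20, 100)],
    [F(30, 100), F(40, 100)],
]

p_x = [sum(row) for row in joint]            # marginals of X: 3/10, 7/10
p_y = [sum(col) for col in zip(*joint)]      # marginals of Y: 2/5, 3/5

# Conditional of Y given X = 1: the X = 1 row divided by its total.
p_y_given_x1 = [cell / p_x[1] for cell in joint[1]]     # 3/7, 4/7

# Independence test on the top-left cell: 1/10 versus 3/10 * 2/5 = 3/25.
top_left_matches = joint[0][0] == p_x[0] * p_y[0]       # False, so dependent

# Conditional expectation E[Y | X = x] for each x, then the law of total expectation.
e_y_given_x = [
    sum(y * cell / p_x[x] for y, cell in enumerate(joint[x]))
    for x in range(2)
]
e_y_nested = sum(p_x[x] * e_y_given_x[x] for x in range(2))   # E[E[Y | X]]
e_y_direct = sum(y * p for y, p in enumerate(p_y))            # E[Y]

print(p_y_given_x1[1], e_y_given_x[1], top_left_matches)   # 4/7 4/7 False
print(e_y_nested == e_y_direct)                            # True
```

Both routes to E[Y] agree, as the law of total expectation says they must.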

A question to carry forward

We found X and Y here are dependent — but “dependent” is only a yes-or-no. Here is the thread onward: when two variables do move together, can we put a single number on how much, and on whether they rise together or pull in opposite directions?

In one breath

A joint PMF p(x,y) tabulates P(X=x, Y=y); entries ≥ 0, summing to 1.
Marginal = sum out the other variable: p(x) = Σ_y p(x,y) (the totals in the margins).
Conditional = take one slice and re-normalise: p(y|x) = p(x,y)/p(x).
Independence is strong: p(x,y) = p(x)·p(y) in every cell — one mismatch (0.10 vs 0.12) breaks it.
Conditional expectation E[Y|X=x] = Σ_y y·p(y|x); the law of total expectation E[E[Y|X]] = E[Y] (average of group averages) collapses a nested expectation with no algebra.

Practice

Quick check

Q1. Recall: to get the marginal p(X=x) from a joint table, you…

Q2. Trace: joint PMF p(0,0)=0.10, p(0,1)=0.20, p(1,0)=0.30, p(1,1)=0.40. Find the marginal P(X = 1). (Numerical answer.)

Q3. Trace: same table. Find the conditional P(Y = 1 | X = 1). (Numerical answer.)

Q4. Apply: same table. Compute E[Y | X = 0]. (Numerical answer.)

Q5. Apply: from the same table, are X and Y independent?

Q6. Create: a joint setup gives E[X|Y]. Without any further numbers, what is E[E[X|Y]], and why?
