datarekha

Joint, Marginal & Conditional Distributions

How two random variables live in one table: read off a marginal by summing out the other, a conditional by re-normalising a slice, and chain them with the law of total expectation.

9 min read · Advanced · GATE DA · Lesson 14 of 122

What you'll learn

  • Joint PMF p(x,y) is the probability of two values happening together
  • Marginal: sum out the OTHER variable — p(x) = Σ_y p(x,y)
  • Conditional p(y|x) = p(x,y)/p(x); independence means joint = product of marginals in EVERY cell
  • Conditional expectation E[Y|X=x] and the law of total expectation E[E[Y|X]] = E[Y]

Before you start

Every variable so far has lived alone — one die, one coin, one waiting time. Real questions hardly ever stay that tidy. You usually have two things going on at once: a height and a weight, a machine and a defect, an X and a Y. The joint distribution is the one little table holding the probability of every pairing. Everything else — the marginals, the conditionals, the test for independence — you read straight off that table by either summing it or slicing it. Once you can do those two moves, this whole topic is bookkeeping. It is also the exact bookkeeping behind a naive-Bayes classifier and any feature-vs-label cross-tab you will build from real data.

One table, three readings

The joint PMF p(x, y) = P(X = x, Y = y) gives the probability that both happen together. Over the whole table the entries must be non-negative and sum to 1. From it you extract two simpler views:

Figure: joint PMF p(x, y), with rows indexed by X and columns by Y.

         Y = 0   Y = 1
X = 0     0.1     0.2
X = 1     0.3     0.4

Row sum: p(X=1) = 0.3 + 0.4 = 0.7. Column sum: p(Y=1) = 0.2 + 0.4 = 0.6.
Sum a row or column to get a marginal; divide a row by its own total to get a conditional.

  • Marginal — the distribution of one variable on its own, found by summing out the other: p(x) = Σ_y p(x,y) (collapse the columns), p(y) = Σ_x p(x,y) (collapse the rows). The name comes from writing these row/column totals in the margins of the table.
  • Conditional — the distribution of Y given a fixed X = x, found by taking that one row and re-scaling it to sum to 1: p(y|x) = p(x,y) / p(x). Dividing by the marginal p(x) is exactly the normaliser that makes the slice a valid distribution.
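The two moves described above, summing out and re-scaling a slice, can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the dictionary layout and the function names `marginal_x`, `marginal_y` and `conditional_y_given_x` are assumptions made for this example, and the numbers are the 2×2 table used throughout this lesson. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction as F

# Joint PMF stored as {(x, y): probability}; values are the lesson's 2x2 table.
joint = {
    (0, 0): F(1, 10), (0, 1): F(2, 10),
    (1, 0): F(3, 10), (1, 1): F(4, 10),
}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y): add across each row."""
    p = {}
    for (x, _y), prob in joint.items():
        p[x] = p.get(x, 0) + prob
    return p

def marginal_y(joint):
    """p(y) = sum over x of p(x, y): add down each column."""
    p = {}
    for (_x, y), prob in joint.items():
        p[y] = p.get(y, 0) + prob
    return p

def conditional_y_given_x(joint, x):
    """p(y | x) = p(x, y) / p(x): take the X = x row and re-scale it to sum to 1."""
    px = marginal_x(joint)[x]
    if px == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return {y: prob / px for (xi, y), prob in joint.items() if xi == x}

# A valid joint PMF has non-negative entries that sum to 1.
assert all(p >= 0 for p in joint.values()) and sum(joint.values()) == 1

print(marginal_x(joint))                # X = 0 -> 3/10, X = 1 -> 7/10
print(marginal_y(joint))                # Y = 0 -> 2/5,  Y = 1 -> 3/5
print(conditional_y_given_x(joint, 1))  # Y = 0 -> 3/7,  Y = 1 -> 4/7
```

The guard against p(x) = 0 reflects the fact that a conditional distribution is undefined when the conditioning event has zero probability.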

Picture two events A and B as overlapping circles inside a sample space. Conditioning on B shrinks the universe to just B: you restrict the world to where the condition holds, then re-read the probability of A inside that smaller world, P(A|B) = P(A∩B) / P(B). A and B are independent only when P(A∩B) equals P(A)·P(B), the same rule the joint-table version below demands.

Independence — every cell, not just one

X and Y are independent precisely when the joint factors into the product of marginals in every cell:

X ⊥ Y   ⇔   p(x, y) = p(x) · p(y)   for ALL (x, y)

This is a strong condition. A single cell where p(x,y) ≠ p(x)·p(y) breaks independence for the whole pair: to confirm independence you must check every cell, whereas one counterexample is enough to rule it out.
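As a sketch of the every-cell test, the snippet below compares each joint entry with the product of its marginals. The helper name `is_independent` and the second table are hypothetical: that table is built here as the product of the marginals (0.3, 0.7) and (0.4, 0.6), so it factors by construction. Fractions are used because comparing floating-point products with `==` can fail for reasons that have nothing to do with dependence.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint):
    """True only if p(x, y) == p(x) * p(y) in every cell of the table."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == px[x] * py[y]
               for x, y in product(xs, ys))

# The lesson's 2x2 table: 0.10 in the top-left cell versus 0.30 * 0.40 = 0.12.
dependent = {(0, 0): F(1, 10), (0, 1): F(2, 10),
             (1, 0): F(3, 10), (1, 1): F(4, 10)}

# Hypothetical table: each cell is p(x) * p(y) for marginals (0.3, 0.7) and (0.4, 0.6).
independent = {(0, 0): F(12, 100), (0, 1): F(18, 100),
               (1, 0): F(28, 100), (1, 1): F(42, 100)}

print(is_independent(dependent))    # False
print(is_independent(independent))  # True
```

Because `all` stops at the first failing cell, the check mirrors the hand method: one mismatch settles the question, while a positive answer requires visiting every cell.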

Conditional expectation and the law of total expectation

Once you have the conditional p(y|x), its mean is the conditional expectation

E[Y | X = x] = Σ_y  y · p(y | x).

For each fixed x this is a number, so E[Y | X = x] is a function of x. Evaluating that function at the random X gives E[Y | X], which is itself a random variable. Averaging it over the distribution of X recovers the plain mean, which is the law of total expectation:

E[ E[Y | X] ]  =  E[Y].

It is the “average of the averages” rule: split the population into groups by X, average Y inside each group, then average those group-means weighted by how big each group is — you get the overall mean of Y.
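For the discrete case, the rule follows in a few lines from the definitions already given. The derivation below is a sketch that assumes the sums run over values x with p(x) > 0 and that E[Y] exists; it uses only p(y | x) = p(x, y) / p(x) and the marginal p(y) = Σ_x p(x, y).

```latex
\begin{aligned}
E\big[\,E[Y \mid X]\,\big]
  &= \sum_x p(x)\, E[Y \mid X = x] \\
  &= \sum_x p(x) \sum_y y \, p(y \mid x) \\
  &= \sum_x \sum_y y \, p(x, y)
     && \text{since } p(x)\, p(y \mid x) = p(x, y) \\
  &= \sum_y y \sum_x p(x, y) \\
  &= \sum_y y \, p(y) \;=\; E[Y].
\end{aligned}
```

The first line is the "weighted by how big each group is" step: each group mean E[Y | X = x] is weighted by p(x).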

How GATE asks this

Almost always a NAT built on a small joint table: “find the marginal p(X=1)”, “find P(Y=1 | X=0)”, or “find E[Y | X = 1]”. A 2025 question went one level up and tested the law of total expectation directly — it gave a joint setup and asked for E[ E[X | Y] ], where the entire trick is recognising that this collapses to E[X] with no further computation. Spot the nested expectation and you save all the algebra.

Worked example — read a 2×2 table

A joint PMF of (X, Y), each taking values in {0, 1}:

         Y = 0   Y = 1
X = 0    0.10    0.20
X = 1    0.30    0.40

The four cells sum to 0.1 + 0.2 + 0.3 + 0.4 = 1. Good — it’s a valid joint PMF.

Marginal of X (sum out Y, i.e. add across each row):

p(X = 0) = 0.10 + 0.20 = 0.30
p(X = 1) = 0.30 + 0.40 = 0.70

Marginal of Y (sum out X, i.e. add down each column):

p(Y = 0) = 0.10 + 0.30 = 0.40
p(Y = 1) = 0.20 + 0.40 = 0.60

Conditional of Y given X = 1 (take the X = 1 row, divide by p(X=1) = 0.7):

p(Y = 0 | X = 1) = 0.30 / 0.70 = 3/7 ≈ 0.4286
p(Y = 1 | X = 1) = 0.40 / 0.70 = 4/7 ≈ 0.5714      (these two sum to 1 ✓)

Independence check (does the joint equal the product of marginals?). Test the top-left cell:

p(X=0, Y=0) = 0.10      but      p(X=0) · p(Y=0) = 0.30 · 0.40 = 0.12.
0.10 ≠ 0.12  →  X and Y are NOT independent.

One failing cell is enough — they are dependent.

Conditional expectation E[Y | X = 1] (since Y is 0 or 1, only the Y=1 term survives):

E[Y | X = 1] = 0 · (3/7) + 1 · (4/7) = 4/7 ≈ 0.5714.

(For a 0/1 variable the conditional expectation is just the conditional probability that Y = 1 — a handy shortcut.)
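The same table can also be used to check the law of total expectation numerically. The snippet below is an illustrative sketch; the helper name `cond_exp_y` and the variable `tower` are assumptions made for this example. It computes E[Y | X = x] for both rows, weights each by p(x), and compares the result with the plain mean of Y.

```python
from fractions import Fraction as F

# The worked example's joint PMF, stored as {(x, y): probability}.
joint = {(0, 0): F(1, 10), (0, 1): F(2, 10),
         (1, 0): F(3, 10), (1, 1): F(4, 10)}

def cond_exp_y(joint, x):
    """E[Y | X = x] = sum over y of y * p(x, y) / p(x)."""
    px = sum(p for (xi, _), p in joint.items() if xi == x)
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / px

# Marginal of X and the plain mean of Y, both read off the joint table.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
e_y = sum(y * p for (_, y), p in joint.items())

# Average the group means, weighting each by the size of its group.
tower = sum(p_x[x] * cond_exp_y(joint, x) for x in p_x)

print(cond_exp_y(joint, 0), cond_exp_y(joint, 1))  # 2/3 4/7
print(tower, e_y)                                  # 3/5 3/5
```

Here 0.3 · (2/3) + 0.7 · (4/7) = 0.2 + 0.4 = 0.6, which matches p(Y = 1) = 0.60 from the marginal of Y, as expected for a 0/1 variable.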

Quick check

Q1. Joint PMF of (X, Y) with X, Y in {0, 1}: p(0,0) = 0.10, p(0,1) = 0.20, p(1,0) = 0.30, p(1,1) = 0.40. Find the marginal P(X = 1). (Numerical answer, 2 decimals)
Q2. Same table. Find the conditional probability P(Y = 1 | X = 1). (Numerical answer, 2 decimals)
Q3. Same table. Compute E[Y | X = 0]. (Numerical answer, 2 decimals)
Q4. From the same table, are X and Y independent?
Q5. Which statements are always TRUE for discrete random variables X and Y? (Select all that apply)

Practice this in an interview

Explain joint, marginal, and conditional distributions and how to move between them.

The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).

What is conditional probability, and how does it differ from joint probability?

Conditional probability P(A|B) is the probability of A given that B has already occurred, computed as P(A and B) / P(B). It narrows the sample space to B, whereas joint probability P(A and B) lives in the full, unrestricted space.

When does each common distribution arise — Bernoulli, Binomial, Poisson, Normal, Exponential, Uniform?

Each distribution has a natural generative story: Bernoulli is a single coin flip; Binomial sums Bernoullis; Poisson counts rare arrivals; Normal emerges from sums of many small effects; Exponential models waiting times between Poisson events; Uniform assigns equal probability across a range. Choosing correctly comes from matching that story to the data-generating process.

State the law of total probability and give a concrete example of when you'd apply it.

The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.
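A small numerical sketch of that decomposition, using the machine-and-defect setting: the three machines, their output shares and their defect rates are hypothetical numbers chosen only for illustration. The overall defect rate is built from the per-machine rates, and the same total then serves as the Bayes denominator.

```python
from fractions import Fraction as F

# Hypothetical partition: share of total output produced by each machine.
p_machine = {"A": F(1, 2), "B": F(3, 10), "C": F(1, 5)}

# Hypothetical segment-level rates: P(defect | machine).
p_defect_given = {"A": F(1, 100), "B": F(2, 100), "C": F(5, 100)}

# The partition must be exhaustive and mutually exclusive: shares sum to 1.
assert sum(p_machine.values()) == 1

# Law of total probability: P(defect) = sum_i P(defect | B_i) * P(B_i).
p_defect = sum(p_defect_given[m] * p_machine[m] for m in p_machine)

# Bayes' rule reuses that total as its denominator: P(machine | defect).
posterior = {m: p_defect_given[m] * p_machine[m] / p_defect for m in p_machine}

print(p_defect)        # 21/1000
print(posterior["C"])  # 10/21
```

With these numbers machine C makes only a fifth of the output yet accounts for nearly half of the defects, which is the kind of conclusion the segment-level view makes visible.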
