datarekha

Bayesian Networks & Joint Factorization

Draw who causes what as a DAG, attach a small probability table to each node, and the whole joint distribution falls out as a product. The compact picture GATE keeps asking about.

9 min read Advanced GATE DA Lesson 105 of 122

What you'll learn

  • A Bayesian network is a DAG where each node carries a conditional probability table (CPT) given its parents
  • The joint factorises as a product of CPTs — one factor per node, conditioned only on its parents
  • Conditional independence: every node is independent of its non-descendants given its parents
  • Posterior inference on a tiny net is just Bayes' theorem with the joint expanded by the factorisation

Before you start

Diseases cause symptoms. Symptoms cause test results. A Bayesian network is just a clean way to draw who causes what — and then ask probabilistic questions on top.

Each node in the picture is a random variable. Each arrow says “this directly influences that”. Attach one small probability table per node — its conditional probability table (CPT) — and you’ve quietly specified the entire joint distribution, in a fraction of the numbers it would take to list the joint outright. This is the model behind real diagnostic systems (medical risk, equipment fault-finding) and the probabilistic graphical models used across ML to reason under uncertainty — the compactness is what makes them tractable.

The DAG and its CPTs

A Bayesian network is a directed acyclic graph (DAG) in which every node X carries a CPT P(X | Parents(X)). No cycles, no self-influence — just parents pointing to children.

[Figure: Burglary (P(B) = 0.001) → Alarm ← Earthquake (P(E) = 0.002), with joint P(B, E, A) = P(B) · P(E) · P(A | B, E)]
Three nodes, three CPTs. The joint is the product — that’s the factorisation.

The Alarm node needs a CPT with one row per parent combination — four rows here, one for each setting of (B, E):

B   E   P(A=1 | B, E)
1   1   0.95
1   0   0.94
0   1   0.29
0   0   0.001
Alarm’s CPT: one row per parent assignment. The P(A=1) entries need not sum to 1 down the column; each row is its own distribution over A, with P(A=0 | B, E) = 1 − P(A=1 | B, E).
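One way to hold this CPT in code, as a minimal sketch: a Python dict keyed by the parent assignment (b, e) that stores only P(A=1 | b, e). The names `alarm_cpt` and `alarm_row` are illustrative choices, not part of any library.

```python
# Alarm CPT from the table above: key = (b, e), value = P(A=1 | B=b, E=e).
# The dict layout and names are assumptions made for this sketch.
alarm_cpt = {
    (1, 1): 0.95,
    (1, 0): 0.94,
    (0, 1): 0.29,
    (0, 0): 0.001,
}

def alarm_row(b, e):
    """Return the full distribution over A for one parent assignment."""
    p_on = alarm_cpt[(b, e)]
    return {1: p_on, 0: 1.0 - p_on}

# Each row is its own distribution over A, so each row sums to 1.
for parents in alarm_cpt:
    row = alarm_row(*parents)
    print(parents, row, sum(row.values()))

# The P(A=1) column taken across rows is not a distribution.
column_total = sum(alarm_cpt.values())
print(column_total)
```

Storing only the P(A=1) entry per row is enough for a binary node, because the other entry is its complement.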

The joint factorisation

For any Bayes net on n variables:

P(X₁, X₂, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))

That’s the headline. Instead of one giant table over all 2ⁿ joint outcomes (for n binary variables), you store one small table per node, conditioned only on its parents rather than on every earlier variable. That’s where the savings come from.
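As a rough illustration of the product formula on the alarm net, the sketch below multiplies one CPT entry per node. The function name `joint` and the dict layout are assumptions made for this example.

```python
# Priors and the Alarm CPT from the burglary example in this lesson.
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) * P(e) * P(a | b, e)."""
    p_a1 = P_A1[(b, e)]
    p_a = p_a1 if a == 1 else 1.0 - p_a1
    return P_B[b] * P_E[e] * p_a

# One entry of the joint: burglary, no earthquake, alarm rings.
p = joint(1, 0, 1)  # 0.001 * 0.998 * 0.94

# Sanity check: the eight joint entries sum to 1.
total = sum(joint(b, e, a) for b in (0, 1) for e in (0, 1) for a in (0, 1))

# Parameter count: 1 (B) + 1 (E) + 4 (Alarm's rows) vs 2**3 - 1 for a full table.
bn_params = 1 + 1 + len(P_A1)
full_params = 2 ** 3 - 1
print(p, total, bn_params, full_params)
```

On three binary variables the saving is small (6 numbers instead of 7); it grows quickly as nodes are added while each node keeps only a few parents.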

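The corresponding independence statement: each node is independent of its non-descendants given its parents. In the alarm net, once you know B and E, the alarm is independent of any other non-descendant you might add to the graph; its parents fully explain it. The same rule applied to B, which has no parents, says B is independent of its non-descendant E, which is why the joint uses P(E) rather than P(E | B).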
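The independence rule can be checked numerically on the alarm net. Burglary has no parents and Earthquake is one of its non-descendants, so the rule implies P(B, E) = P(B) · P(E). The sketch below rebuilds the joint from the CPTs, sums out A, and compares; all names are illustrative.

```python
from itertools import product

P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) from the factorisation."""
    p_a1 = P_A1[(b, e)]
    return P_B[b] * P_E[e] * (p_a1 if a == 1 else 1.0 - p_a1)

def marginal_be(b, e):
    """P(B=b, E=e), obtained by summing the joint over A."""
    return sum(joint(b, e, a) for a in (0, 1))

# Largest gap between P(B, E) and P(B) * P(E) over the four assignments.
max_gap = max(
    abs(marginal_be(b, e) - P_B[b] * P_E[e])
    for b, e in product((0, 1), repeat=2)
)
print(max_gap)  # zero up to floating-point rounding
```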

Worked example — a 2-node net

A 2-node net Disease → Test with P(D) = 0.3, P(+ | D) = 0.8, P(+ | ¬D) = 0.1. The test reads positive. Compute P(D | +).

The joint factorises as P(D, T) = P(D) · P(T | D). So the joint for the observed outcome D = 1, T = + is:

P(D=1, +) = P(D) · P(+ | D) = 0.30 · 0.80 = 0.24

Posterior by Bayes — the denominator is the total probability of a positive test, which is itself just summing the joint over D:

P(D | +) =        P(+ | D) · P(D)
            ──────────────────────────────────
            P(+ | D)·P(D) + P(+ | ¬D)·P(¬D)

         =       0.80 · 0.30                =  0.24 / 0.31  ≈  0.77
            ──────────────────────────
            0.80·0.30 + 0.10·0.70

So ≈ 0.77 — the same shape as the disease/test problem from the Bayes’ theorem lesson (and GATE DA 2026 Q57). The factorisation just made the joint mechanical to write down before plugging into Bayes.
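The same arithmetic as a short Python sketch, using only the three numbers given in the example (the variable names are illustrative):

```python
# Numbers from the Disease -> Test example above.
p_d = 0.3                 # P(D)
p_pos_given_d = 0.8       # P(+ | D)
p_pos_given_not_d = 0.1   # P(+ | not D)

# Joint entries for a positive test, from P(D, T) = P(D) * P(T | D).
joint_d_pos = p_d * p_pos_given_d
joint_not_d_pos = (1 - p_d) * p_pos_given_not_d

# Denominator: sum the joint over D to get P(+).
p_pos = joint_d_pos + joint_not_d_pos

# Posterior by Bayes' theorem.
posterior = joint_d_pos / p_pos
print(round(joint_d_pos, 2), round(p_pos, 2), round(posterior, 2))  # 0.24 0.31 0.77
```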

Lower the prior and the same posterior collapses when the disease is rare: with P(D) = 0.01 and the same test, P(D | +) = 0.008 / 0.107 ≈ 0.07. Base rates dominate Bayes-net inference too.
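To see the base-rate effect without a calculator, the sketch below recomputes the posterior for a few priors while holding the test's two conditional probabilities fixed. The function name and the list of priors are assumptions chosen for illustration.

```python
def posterior_given_positive(prior, p_pos_given_d=0.8, p_pos_given_not_d=0.1):
    """P(D | +) for the Disease -> Test net, by Bayes' theorem."""
    numerator = p_pos_given_d * prior
    evidence = numerator + p_pos_given_not_d * (1 - prior)
    return numerator / evidence

# Illustrative priors, from the lesson's 0.3 down to a rare disease.
for prior in (0.3, 0.1, 0.01, 0.001):
    print(prior, round(posterior_given_positive(prior), 3))
```

With the test unchanged, the posterior falls from about 0.77 at a prior of 0.3 to under 0.01 at a prior of 0.001.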

How GATE asks this

Usually a NAT (numerical answer type): a small 3-or-4-node net with CPTs given, asking for a joint probability of a specific assignment (multiply down the factorisation), or a posterior of one variable given evidence on another (factorise the joint, then Bayes). MSQs (multiple-select questions) ask which independence statements the DAG implies. GATE DA 2026 Q57 was the disease/positive-test posterior above (answer 0.77). GATE DA 2024 ran the same machinery on a slightly larger net.

The recipe never changes: write the joint as a product of CPTs, plug in the observed values, normalise.
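The recipe can be sketched as inference by enumeration on the alarm net: build the joint from the CPTs, fix the evidence, sum out the variable that is neither queried nor observed, and normalise. The query P(B | A=1) and the function names are choices made for this example, not taken from a GATE paper.

```python
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def joint(b, e, a):
    """Step 1: the joint as a product of CPT entries."""
    p_a1 = P_A1[(b, e)]
    return P_B[b] * P_E[e] * (p_a1 if a == 1 else 1.0 - p_a1)

def posterior_b_given_a(a_obs):
    """Steps 2 and 3: plug in A = a_obs, sum out E, then normalise over B."""
    unnormalised = {
        b: sum(joint(b, e, a_obs) for e in (0, 1))
        for b in (0, 1)
    }
    evidence = sum(unnormalised.values())  # this is P(A = a_obs)
    return {b: v / evidence for b, v in unnormalised.items()}

post = posterior_b_given_a(1)
print(post)
```

With these CPTs, P(B=1 | A=1) comes out at roughly 0.37, far above the prior of 0.001 but still well short of certainty.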

Quick check

Q1. In the Burglary → Alarm ← Earthquake net with P(B)=0.001, P(E)=0.002, P(A=1 | B=1, E=1)=0.95, compute the joint P(B=1, E=1, A=1). Give a value × 10⁻⁶ (i.e., enter the value in millionths, so 1.9 means 1.9 × 10⁻⁶). (numerical answer)
Q2. A 2-node net Cloudy → Rain has P(C=1) = 0.5, P(R=1 | C=1) = 0.8, P(R=1 | C=0) = 0.2. It rained. Compute P(C=1 | R=1) to 2 decimals. (numerical answer)
Q3. Which statements about a Bayesian network on variables X₁, …, Xₙ are TRUE? (select all that apply)
Q4. In the Disease → Test net (P(D)=0.3, P(+|D)=0.8, P(+|¬D)=0.1), what is the joint P(D=1, T=+)? (3 decimals; numerical answer)
Q5. For a Bayes net on 4 binary variables with no edges (all independent), how many independent parameters specify the joint distribution? (Compare to the 2⁴−1 = 15 needed to specify a full joint with no structure.) (numerical answer)
Q6. Which are valid reasons to use a Bayesian network instead of storing the full joint distribution? (select all that apply)

Practice this in an interview

Explain joint, marginal, and conditional distributions and how to move between them.

The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).
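A small sketch of those moves on a 2×2 joint table. The four entries are the Disease/Test joint implied by the lesson's numbers (P(D)=0.3, P(+|D)=0.8, P(+|¬D)=0.1); the helper names are illustrative.

```python
# Joint P(D, T) for the Disease -> Test net: key = (d, t).
joint_dt = {
    (1, "+"): 0.24,  # 0.3 * 0.8
    (1, "-"): 0.06,  # 0.3 * 0.2
    (0, "+"): 0.07,  # 0.7 * 0.1
    (0, "-"): 0.63,  # 0.7 * 0.9
}

def marginal_t(t):
    """P(T=t): sum the joint over D."""
    return sum(p for (d, tt), p in joint_dt.items() if tt == t)

def marginal_d(d):
    """P(D=d): sum the joint over T."""
    return sum(p for (dd, t), p in joint_dt.items() if dd == d)

def conditional_d_given_t(t):
    """P(D | T=t): slice the joint at T=t, renormalise by P(T=t)."""
    z = marginal_t(t)
    return {d: joint_dt[(d, t)] / z for d in (0, 1)}

print(marginal_d(1), marginal_t("+"), conditional_d_given_t("+"))
```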

What is conditional probability, and how does it differ from joint probability?

Conditional probability P(A|B) is the probability of A given that B has already occurred, computed as P(A and B) / P(B). It narrows the sample space to B, whereas joint probability P(A and B) lives in the full, unrestricted space.

When does each common distribution arise — Bernoulli, Binomial, Poisson, Normal, Exponential, Uniform?

Each distribution has a natural generative story: Bernoulli is a single coin flip; Binomial sums Bernoullis; Poisson counts rare arrivals; Normal emerges from sums of many small effects; Exponential models waiting times between Poisson events; Uniform assigns equal probability across a range. Choosing correctly comes from matching that story to the data-generating process.

State the law of total probability and give a concrete example of when you'd apply it.

The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.
