Bayesian Networks & Joint Factorization
Draw who causes what as a DAG, attach a small probability table to each node, and the whole joint distribution falls out as a product. The compact picture GATE keeps asking about.
What you'll learn
- A Bayesian network is a DAG where each node carries a conditional probability table (CPT) given its parents
- The joint factorises as a product of CPTs — one factor per node, conditioned only on its parents
- Conditional independence: every node is independent of its non-descendants given its parents
- Posterior inference on a tiny net is just Bayes' theorem with the joint expanded by the factorisation
Before you start
Diseases cause symptoms. Symptoms cause test results. A Bayesian network is just a clean way to draw who causes what — and then ask probabilistic questions on top.
Each node in the picture is a random variable. Each arrow says “this directly influences that”. Attach one small probability table per node — its conditional probability table (CPT) — and you’ve quietly specified the entire joint distribution, in a fraction of the numbers it would take to list the joint outright. This is the model behind real diagnostic systems (medical risk, equipment fault-finding) and the probabilistic graphical models used across ML to reason under uncertainty — the compactness is what makes them tractable.
The DAG and its CPTs
A Bayesian network is a directed acyclic graph (DAG) in which every node
X carries a CPT P(X | Parents(X)). No cycles, no self-influence — just
parents pointing to children.
In the classic alarm network, Burglary (B) and Earthquake (E) are both parents of Alarm (A). The Alarm node needs a CPT with one row per parent combination: four rows here,
one for each setting of (B, E):
| B | E | P(A=1 \| B, E) |
|---|---|---|
| 1 | 1 | 0.95 |
| 1 | 0 | 0.94 |
| 0 | 1 | 0.29 |
| 0 | 0 | 0.001 |
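One way to hold this table in code is a dictionary keyed by the parent setting. The sketch below is only an illustration: the names `ALARM_CPT` and `p_alarm` are made up here, and the only numbers used are the four rows from the table. The probability of A=0 is not stored, since it is one minus the A=1 entry in the same row.

```python
# CPT for the Alarm node, keyed by the parent setting (b, e).
# Each value is P(A=1 | B=b, E=e), copied from the table above.
ALARM_CPT = {
    (1, 1): 0.95,
    (1, 0): 0.94,
    (0, 1): 0.29,
    (0, 0): 0.001,
}


def p_alarm(a, b, e):
    """Return P(A=a | B=b, E=e) for binary a, b, e."""
    p_on = ALARM_CPT[(b, e)]
    return p_on if a == 1 else 1.0 - p_on


# Look up one row in both directions.
print(p_alarm(1, 0, 1))
print(p_alarm(0, 0, 1))
```

Each row of a CPT is its own little distribution over the child, so `p_alarm(1, b, e) + p_alarm(0, b, e)` comes to 1 for every parent setting.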
The joint factorisation
For any Bayes net on n variables:
P(X₁, X₂, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
That’s the headline. Instead of one giant table over all 2ⁿ joint outcomes
(for n binary variables), you store one small table per node, conditioned only
on its parents rather than on every earlier variable. That’s where the savings come from.
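To put rough numbers on the savings, the sketch below counts free parameters for binary variables. The function names and the 10-node chain are illustrative assumptions, not part of the lesson. An explicit joint needs 2ⁿ − 1 numbers, while a Bayes net needs one number per CPT row, and a node with k parents has 2ᵏ rows.

```python
def full_joint_params(n):
    """Free parameters in an explicit joint table over n binary variables."""
    return 2 ** n - 1


def bayes_net_params(parents):
    """Free parameters in a Bayes net over binary variables.

    `parents` maps each node to the list of its parents. A node with k
    parents needs 2**k CPT rows, with one free number per row.
    """
    return sum(2 ** len(ps) for ps in parents.values())


# The alarm net: B and E have no parents, A has both as parents.
alarm_net = {"B": [], "E": [], "A": ["B", "E"]}
print(full_joint_params(3), bayes_net_params(alarm_net))  # 7 6

# A hypothetical 10-node chain X0 -> X1 -> ... -> X9.
chain = {f"X{i}": ([f"X{i - 1}"] if i > 0 else []) for i in range(10)}
print(full_joint_params(10), bayes_net_params(chain))  # 1023 19
```

On three nodes the gap is small (7 versus 6), but it widens quickly as nodes are added while the number of parents per node stays small.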
The corresponding independence statement: each node is independent of its non-descendants given its parents. In the alarm net, once you know B and E, the alarm is independent of any other non-descendant — its parents fully explain it.
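The independence statement can be checked numerically on the alarm net. The sketch below builds the joint from the factorisation and confirms that B and E come out independent: B has no parents and E is a non-descendant of B, so the rule applies directly. The priors `P_B` and `P_E` are assumed values chosen for illustration; the lesson gives only the Alarm CPT.

```python
from itertools import product

P_B = 0.001  # assumed prior P(B=1); not given in the text
P_E = 0.002  # assumed prior P(E=1); not given in the text
ALARM_CPT = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}


def bern(p, x):
    """Probability of binary outcome x when P(x=1) = p."""
    return p if x == 1 else 1.0 - p


def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) * P(e) * P(a | b, e)."""
    return bern(P_B, b) * bern(P_E, e) * bern(ALARM_CPT[(b, e)], a)


# The eight joint entries should sum to 1.
total = sum(joint(b, e, a) for b, e, a in product((0, 1), repeat=3))

# Marginalise out A, then compare P(B=1, E=1) with P(B=1) * P(E=1).
p_b1_e1 = sum(joint(1, 1, a) for a in (0, 1))
p_b1 = sum(joint(1, e, a) for e in (0, 1) for a in (0, 1))
p_e1 = sum(joint(b, 1, a) for b in (0, 1) for a in (0, 1))

print(abs(total - 1.0) < 1e-12)
print(abs(p_b1_e1 - p_b1 * p_e1) < 1e-12)
```

Both checks hold for any choice of priors, because the independence comes from the shape of the graph rather than from the particular numbers.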
Worked example — a 2-node net
Disease → Test with P(D) = 0.3, P(+ | D) = 0.8, P(+ | ¬D) = 0.1. The test reads positive. Compute P(D | +).
The joint factorises as P(D, T) = P(D) · P(T | D). So the joint for the
observed outcome D = 1, T = + is:
P(D=1, +) = P(D) · P(+ | D) = 0.30 · 0.80 = 0.24
Posterior by Bayes — the denominator is the total probability of a positive
test, which is itself just summing the joint over D:
P(D | +) = P(+ | D) · P(D) / [P(+ | D) · P(D) + P(+ | ¬D) · P(¬D)]
         = (0.80 · 0.30) / (0.80 · 0.30 + 0.10 · 0.70)
         = 0.24 / 0.31 ≈ 0.77
So ≈ 0.77 — the same shape as the disease/test problem from the Bayes’ theorem lesson (and GATE DA 2026 Q57). The factorisation just made the joint mechanical to write down before plugging into Bayes.
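The same arithmetic can be written out as a short script. This is a minimal sketch using only the three numbers from the worked example; the variable names are illustrative.

```python
P_D = 0.30                # prior P(D)
P_POS_GIVEN_D = 0.80      # P(+ | D)
P_POS_GIVEN_NOT_D = 0.10  # P(+ | not D)

# Joint entries for the observed outcome T = +, from P(D, T) = P(D) * P(T | D).
joint_d_pos = P_D * P_POS_GIVEN_D
joint_notd_pos = (1 - P_D) * P_POS_GIVEN_NOT_D

# Total probability of a positive test: sum the joint over D.
p_pos = joint_d_pos + joint_notd_pos

# Posterior by Bayes: normalise the joint entry by the evidence.
posterior = joint_d_pos / p_pos

print(round(joint_d_pos, 2), round(p_pos, 2), round(posterior, 2))  # 0.24 0.31 0.77
```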
Lower the prior and the same posterior collapses when the disease is rare: with P(D) = 0.01 and the same test, P(D | +) drops to about 0.07. Base rates dominate Bayes-net inference too.
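The effect of the base rate is easy to see by sweeping the prior while keeping the test fixed. In the sketch below, `posterior_given_positive` is a hypothetical helper name and the list of priors is an arbitrary choice for illustration; the test characteristics are the 0.80 and 0.10 from the worked example.

```python
def posterior_given_positive(prior, sens=0.80, false_pos=0.10):
    """P(D | +) for a given prior P(D), with P(+ | D)=sens and P(+ | not D)=false_pos."""
    num = sens * prior
    return num / (num + false_pos * (1.0 - prior))


for prior in (0.30, 0.10, 0.01, 0.001):
    print(f"P(D)={prior:<6} -> P(D|+)={posterior_given_positive(prior):.3f}")
```

The posterior goes from roughly 0.774 at a prior of 0.30 to about 0.471, 0.075 and 0.008 as the prior shrinks to 0.10, 0.01 and 0.001, even though the test itself never changes.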
How GATE asks this
Usually a NAT: a small 3-or-4-node net with CPTs given, asking for a joint probability of a specific assignment (multiply down the factorisation), or a posterior of one variable given evidence on another (factorise the joint, then Bayes). MSQs ask which independence statements the DAG implies. GATE DA 2026 Q57 was the disease/positive-test posterior above (answer 0.77). GATE DA 2024 ran the same machinery on a slightly larger net.
The recipe never changes: write the joint as a product of CPTs, plug in the observed values, normalise.
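The three steps of the recipe can be sketched as a small enumeration routine. Everything in this example is an assumption made for illustration: the 3-node chain Disease → Symptom → Test, its CPT numbers, and the names `PARENTS`, `CPT`, `joint` and `posterior`. It is a brute-force sketch suited to exam-sized nets, not an efficient inference algorithm.

```python
from itertools import product

# Hypothetical 3-node chain Disease -> Symptom -> Test. All numbers are
# made up for illustration and do not come from the lesson.
PARENTS = {"D": (), "S": ("D",), "T": ("S",)}
CPT = {  # P(node=1 | parent values)
    "D": {(): 0.30},
    "S": {(1,): 0.90, (0,): 0.20},
    "T": {(1,): 0.80, (0,): 0.10},
}


def joint(assign):
    """Step 1: the joint as a product of one CPT entry per node."""
    p = 1.0
    for node, pars in PARENTS.items():
        p_on = CPT[node][tuple(assign[q] for q in pars)]
        p *= p_on if assign[node] == 1 else 1.0 - p_on
    return p


def posterior(query, evidence):
    """P(query=1 | evidence) by enumerating every full assignment."""
    nodes = list(PARENTS)
    weight = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        # Step 2: keep only assignments that match the observed values.
        if all(assign[k] == v for k, v in evidence.items()):
            weight[assign[query]] += joint(assign)
    # Step 3: normalise.
    return weight[1] / (weight[0] + weight[1])


print(round(posterior("D", {"T": 1}), 3))  # 0.566
```

By hand, P(T=1 | D=1) = 0.9 · 0.8 + 0.1 · 0.1 = 0.73 and P(T=1 | D=0) = 0.2 · 0.8 + 0.8 · 0.1 = 0.24, so the posterior is 0.219 / 0.387 ≈ 0.566, matching the enumeration.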
Practice this in an interview
The joint distribution P(X, Y) fully specifies two random variables together. Marginals P(X) and P(Y) are obtained by summing (or integrating) the joint over the other variable. Conditionals P(X|Y=y) are the joint sliced at a fixed y value, renormalized by the marginal P(Y=y).
Conditional probability P(A|B) is the probability of A given that B has already occurred, computed as P(A and B) / P(B). It narrows the sample space to B, whereas joint probability P(A and B) lives in the full, unrestricted space.
Each distribution has a natural generative story: Bernoulli is a single coin flip; Binomial sums Bernoullis; Poisson counts rare arrivals; Normal emerges from sums of many small effects; Exponential models waiting times between Poisson events; Uniform assigns equal probability across a range. Choosing correctly comes from matching that story to the data-generating process.
The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.