State the law of total probability and give a concrete example of when you'd apply it.

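The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates. For example, a site's overall conversion rate is the sum of each traffic segment's conversion rate weighted by that segment's share of visitors.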
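As a minimal sketch of that "overall rate from segment rates" calculation, the snippet below applies the formula to three traffic segments. The segment names, shares and conversion rates are hypothetical numbers chosen only for illustration.

```python
# Hypothetical segment-level numbers, chosen only for illustration.
segments = {
    "mobile":  {"share": 0.6, "conversion": 0.02},
    "desktop": {"share": 0.3, "conversion": 0.05},
    "tablet":  {"share": 0.1, "conversion": 0.03},
}


def total_probability(segments):
    """P(A) = sum_i P(A | B_i) * P(B_i) over a partition {B_i}."""
    total_share = sum(s["share"] for s in segments.values())
    # The B_i must be exhaustive, so their probabilities have to sum to 1.
    assert abs(total_share - 1.0) < 1e-9, "partition must sum to 1"
    return sum(s["conversion"] * s["share"] for s in segments.values())


overall = total_probability(segments)
print(round(overall, 4))
```

The check on the shares is the part that is easy to forget by hand: if the segments overlap or leave visitors out, the weighted sum is no longer P(A).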

What is the zero-probability problem in Naive Bayes and how do you fix it?

If a feature value never appears with a given class in training, its conditional probability is zero, and since Naive Bayes multiplies probabilities, the whole posterior for that class becomes zero regardless of other evidence. The fix is Laplace (additive) smoothing, which adds a small count to every feature-class combination so no probability is ever exactly zero. This is essential for text where many words are unseen per class.
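A small sketch of additive smoothing for one class, assuming a bag-of-words model. The function name `smoothed_likelihoods`, the vocabulary and the counts are hypothetical, made up for this illustration; `alpha=1.0` corresponds to classic Laplace smoothing.

```python
from collections import Counter


def smoothed_likelihoods(word_counts, vocabulary, alpha=1.0):
    """P(word | class) = (count + alpha) / (total + alpha * |V|)."""
    total = sum(word_counts.get(w, 0) for w in vocabulary)
    denom = total + alpha * len(vocabulary)
    return {w: (word_counts.get(w, 0) + alpha) / denom for w in vocabulary}


# Hypothetical toy counts for one class; "meeting" never occurs with it.
vocabulary = ["free", "offer", "meeting"]
spam_counts = Counter({"free": 3, "offer": 1})

unsmoothed = smoothed_likelihoods(spam_counts, vocabulary, alpha=0.0)
smoothed = smoothed_likelihoods(spam_counts, vocabulary, alpha=1.0)

print(unsmoothed["meeting"])  # zero: would wipe out the whole product
print(smoothed["meeting"])    # small but non-zero
```

With `alpha=0.0` the unseen word gets probability zero and any document containing it can never be assigned to this class; with `alpha=1.0` every word keeps a small positive probability and the distribution still sums to 1.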

How does MLE differ from MAP estimation, and what is the frequentist vs Bayesian divide?

MLE maximises the likelihood of the data alone; MAP (Maximum A Posteriori) adds a prior over parameters and maximises the posterior, making it equivalent to regularised MLE. Frequentists treat parameters as fixed unknowns; Bayesians treat them as random variables with a prior distribution.
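To make the difference concrete, here is a sketch for the simplest case: estimating a coin's heads probability. It assumes a Beta(a, b) prior, for which the MAP estimate has a closed form; the data (3 heads in 3 flips) and the prior Beta(2, 2) are hypothetical choices for illustration.

```python
def mle_bernoulli(heads, flips):
    """Maximum-likelihood estimate: the observed frequency."""
    return heads / flips


def map_bernoulli(heads, flips, a=2.0, b=2.0):
    """Mode of the Beta(a + heads, b + flips - heads) posterior.

    Valid when both posterior parameters exceed 1. The prior behaves like
    (a - 1) extra heads and (b - 1) extra tails added to the data.
    """
    return (heads + a - 1.0) / (flips + a + b - 2.0)


heads, flips = 3, 3  # hypothetical data: three flips, all heads
theta_mle = mle_bernoulli(heads, flips)
theta_map = map_bernoulli(heads, flips)
print(theta_mle, theta_map)
```

On this tiny sample the MLE commits to "always heads", while the MAP estimate is pulled back toward one half by the prior, which is the regularising effect described above.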

Walk me through Bayes' theorem with a disease-screening base-rate example.

Bayes' theorem updates a prior probability with new evidence: P(H|E) = P(E|H) P(H) / P(E). In disease testing, ignoring the low base rate (prior) makes a positive test look far more alarming than it really is — most positives are false positives when the disease is rare.
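A worked sketch of the base-rate effect. The prevalence, sensitivity and false-positive rate below are hypothetical round numbers, not figures for any real test; P(E) is expanded with the law of total probability over "diseased" and "healthy".

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive) = P(+ | D) P(D) / P(+)."""
    true_pos = sensitivity * prevalence                   # P(+ | D) P(D)
    false_pos = false_positive_rate * (1.0 - prevalence)  # P(+ | not D) P(not D)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false positives.
p_disease = posterior_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(round(p_disease, 3))
```

Under these assumed numbers the posterior is roughly 0.16: even with a fairly accurate test, about five in six positives come from the much larger healthy group.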

Exact Inference: Variable Elimination — GATE DA

What you'll learn

Variable elimination computes EXACT conditional probabilities on a Bayes net

The procedure: factor the joint into CPTs, sum out hidden variables one by one, normalise

How to do one elimination step on a small chain net

Classify inference methods as exact (VE, enumeration) vs approximate (rejection / likelihood-weighting / Gibbs)

Last lesson left the Bayes net storing the joint compactly, yet warned that querying it — “given the alarm, what is the chance of a burglary?” — means summing that joint over every variable you did not observe, a sum with exponentially many terms on a big net. Variable elimination is the routine that performs that sum exactly — no sampling, no approximation — by sweeping the unwanted variables out one at a time and, crucially, never expanding the whole joint in the first place.

It is slower than sampling on giant networks, but precise. For exam-sized nets of two to four nodes it is the right tool, and a single elimination step is usually all GATE asks you to perform. The same sum-product engine runs inside probabilistic-programming libraries and diagnosis systems whenever an exact answer is affordable — so the hand-trace you practise here is a scaled-down version of what those tools do.

The procedure

To compute P(Query | Evidence):

Write the joint as a product of CPT factors — one factor per node, from the Bayes-net factorisation.
Fix the evidence variables to their observed values.
Pick a hidden (non-query, non-evidence) variable and sum it out: multiply together every factor that contains it, sum that product over the variable’s values, and replace those factors with the single new factor that results.
Repeat until only the query variable remains.
Normalise so the surviving distribution sums to 1.

The answer is exact (up to floating-point error). That is the headline property — VE is not a sampling method.
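The steps above can be sketched in a few dozen lines of Python. This is an illustrative toy, not a library implementation: it assumes binary variables, represents a factor as a `(variables, table)` pair, and all function names are hypothetical. The CPT numbers are the ones from the chain A → B → C used in this lesson's worked example.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors over binary variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(dict.fromkeys(f_vars + g_vars))  # ordered union
    out_tab = {}
    for values in product((0, 1), repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        f_key = tuple(row[v] for v in f_vars)
        g_key = tuple(row[v] for v in g_vars)
        out_tab[values] = f_tab[f_key] * g_tab[g_key]
    return out_vars, out_tab


def drop(values, idx):
    """Remove position idx from a tuple of values."""
    return values[:idx] + values[idx + 1:]


def restrict(f, var, value):
    """Fix an evidence variable to its observed value."""
    f_vars, f_tab = f
    if var not in f_vars:
        return f
    idx = f_vars.index(var)
    tab = {drop(k, idx): p for k, p in f_tab.items() if k[idx] == value}
    return drop(f_vars, idx), tab


def sum_out(f, var):
    """Sum a factor over every value of var."""
    f_vars, f_tab = f
    idx = f_vars.index(var)
    tab = {}
    for k, p in f_tab.items():
        tab[drop(k, idx)] = tab.get(drop(k, idx), 0.0) + p
    return drop(f_vars, idx), tab


def eliminate(factors, var):
    """One elimination step: multiply the factors mentioning var, then sum it out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    if not touching:
        return rest
    prod = touching[0]
    for f in touching[1:]:
        prod = multiply(prod, f)
    return rest + [sum_out(prod, var)]


def query(factors, query_var, evidence, hidden_order):
    """P(query_var | evidence) by variable elimination."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in hidden_order:
        factors = eliminate(factors, var)
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    assert result[0] == (query_var,), "only the query variable should remain"
    total = sum(result[1].values())  # normalise
    return {k[0]: p / total for k, p in result[1].items()}


# CPT factors for the chain A -> B -> C, keyed by (parent, child) values.
f_a = (("A",), {(0,): 0.5, (1,): 0.5})
f_b = (("A", "B"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7})
f_c = (("B", "C"), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.9})

posterior_a = query([f_a, f_b, f_c], "A", {"C": 1}, ["B"])
marginal_c = query([f_a, f_b, f_c], "C", {}, ["A", "B"])
print(posterior_a, marginal_c)
```

The first call exercises every step (evidence C=1 is fixed, B is summed out, the result is normalised); the second has no evidence and simply sums out A and then B. Neither call ever builds a table over all three variables at once.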

Figure: the chain A → B → C → D. To find P(D), sum out A, B, and C one at a time; the step shown eliminates B.

Worked example — eliminate one variable

A chain A → B → C. Given P(A=1) = 0.5, P(B=1 | A=1) = 0.7, P(B=1 | A=0) = 0.2, P(C=1 | B=1) = 0.9, P(C=1 | B=0) = 0.4. Find P(C=1) by summing out A and B.

The factorisation is P(A, B, C) = P(A) · P(B | A) · P(C | B), and we want P(C=1) = Σ_A Σ_B P(A) · P(B | A) · P(C=1 | B). Sum the variables out one at a time.
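Written out, the rearrangement that lets the sums be taken one at a time is the following. It is only a restatement of the double sum above, using the fact that P(C=1 | B) does not depend on A and can therefore be pulled outside the sum over A:

```latex
\begin{aligned}
P(C{=}1) &= \sum_{A}\sum_{B} P(A)\,P(B \mid A)\,P(C{=}1 \mid B) \\
         &= \sum_{B} P(C{=}1 \mid B)\,\Big[\sum_{A} P(A)\,P(B \mid A)\Big] \\
         &= \sum_{B} P(C{=}1 \mid B)\,P(B)
\end{aligned}
```

The bracketed inner sum is the new factor over B, and the last line is the final sum over B.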

Step 1 — eliminate A. Collapse P(A) · P(B | A) into a single factor over B:

P(B=1) = P(B=1 | A=1)·P(A=1) + P(B=1 | A=0)·P(A=0)
       = 0.7 · 0.5            + 0.2 · 0.5
       = 0.35                 + 0.10              =  0.45

P(B=0) = 1 − 0.45  =  0.55

Step 2 — eliminate B. Combine that new P(B) with P(C=1 | B):

P(C=1) = P(C=1 | B=1)·P(B=1) + P(C=1 | B=0)·P(B=0)
       = 0.9 · 0.45           + 0.4 · 0.55
       = 0.405                + 0.220             =  0.625

So P(C=1) = 0.625, exactly, with no sampling involved. It lands above 0.5 because P(C=1 | B=1) = 0.9 sits much further above one half than P(C=1 | B=0) = 0.4 sits below it, while B is close to evenly split. Each elimination step is one weighted sum over the values of the variable being removed, and you never built the full three-variable joint.
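As a sanity check, the sketch below computes P(C=1) two ways with the same CPT numbers: by brute-force enumeration of the joint, and by the two elimination steps traced above. The helper names (`bern`, `joint`) are hypothetical, introduced only for this illustration.

```python
from itertools import product

# CPTs from the worked example; each value is the probability of "= 1".
p_a1 = 0.5
p_b1_given_a = {1: 0.7, 0: 0.2}
p_c1_given_b = {1: 0.9, 0: 0.4}


def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | b)."""
    return bern(p_a1, a) * bern(p_b1_given_a[a], b) * bern(p_c1_given_b[b], c)


# Enumeration: add up every joint entry that has C = 1.
p_c1_enum = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))

# Variable elimination: sum out A to get P(B=1), then sum out B.
p_b1 = sum(bern(p_a1, a) * p_b1_given_a[a] for a in (0, 1))
p_c1_ve = p_c1_given_b[1] * p_b1 + p_c1_given_b[0] * (1.0 - p_b1)

print(p_c1_enum, p_c1_ve)
```

Both routes agree up to floating-point error. The difference is in the work done: enumeration touches every joint entry, while elimination only ever handles one-variable tables on this chain.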

How GATE asks this

Two patterns. MSQ: which of the listed methods compute exact posteriors on a Bayes net? Variable elimination yes; enumeration of the joint yes; rejection / likelihood-weighting / Gibbs no — they are sampling. NAT: perform one elimination step on a 3-node chain or v-structure and report the marginal. GATE DA 2025 ran an MSQ asking exactly this classification.

Method	Type
Variable elimination	Exact
Enumeration / brute-force joint	Exact
Rejection sampling	Approximate
Likelihood weighting	Approximate
Gibbs sampling (MCMC)	Approximate

In one breath

Variable elimination computes an exact posterior on a Bayes net by writing the joint as a product of CPT factors, fixing the evidence, then summing out each hidden variable one at a time (replacing every factor that mentions it with the sum-over-its-values), and finally normalising — so it never expands the full 2ⁿ joint, the answer is exact up to floating-point, and it sits firmly on the exact side of the ledger (with full enumeration) opposite the approximate sampling methods.

Practice

Quick check

Q1 (Recall). Which statements about variable elimination are TRUE? (select all that apply)

Q2 (Recall). Which Bayes-net inference methods produce EXACT posterior probabilities (up to floating-point error)? (select all that apply)

Q3 (Recall). In a 4-node Bayes net with query Q and evidence E, what does ONE elimination step on hidden variable H accomplish?

Q4 (Trace). Chain A → B with P(A=1)=0.4, P(B=1 | A=1)=0.8, P(B=1 | A=0)=0.3. Compute P(B=1) by eliminating A. (numerical answer, 3 decimals)

Q5 (Trace). Continuing the worked chain A → B → C (P(A=1)=0.5, P(B=1 | A=1)=0.7, P(B=1 | A=0)=0.2, P(C=1 | B=1)=0.9, P(C=1 | B=0)=0.4), what is P(C=0)? (numerical answer, 3 decimals)

Q6 (Apply). Net A → C ← B with P(A=1)=0.5, P(B=1)=0.5, P(C=1 | A=1, B=1)=0.9, P(C=1 | A=1, B=0)=0.6, P(C=1 | A=0, B=1)=0.6, P(C=1 | A=0, B=0)=0.1. Compute P(C=1) by summing out A and B. (numerical answer, 3 decimals)

A question to carry forward

Variable elimination is exact, and on a small net it is fast. But its cost has hidden teeth. As you sum out variables on a large, tangled network, the intermediate factors can swell, combining a variable’s many neighbours into ever-wider tables, until the careful summing-out is no cheaper than the full joint it set out to avoid. On a dense net of fifty variables, “exact” can mean “will not finish this century.”
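A rough sense of how fast those tables widen, as a back-of-the-envelope sketch. It assumes every variable is binary and uses a hypothetical helper, `factor_entries`: when the factors touching an eliminated variable mention k other variables between them, the new factor is a table over those k variables.

```python
def factor_entries(num_vars, arity=2):
    """Number of rows in a factor over num_vars variables with `arity` values each."""
    return arity ** num_vars


# Size of the intermediate factor left behind after eliminating a variable
# whose factors mention k other binary variables.
for k in (5, 10, 20, 30):
    print(k, factor_entries(k))
```

Ten neighbours already mean about a thousand entries and thirty mean about a billion, which is why exactness stops being affordable on densely connected nets.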

So when exactness becomes unaffordable, you strike a different bargain: give up the guarantee of the true answer in exchange for a good enough one, fast. Instead of computing the probability, you estimate it — by conjuring up thousands of random scenarios consistent with the net and simply counting how often the thing you care about happens. Here is the thread onward, and the chapter’s last step: how do you draw such samples from a Bayes net, what three classic recipes turn that counting into a posterior estimate — and what is the price you always pay for trading exact arithmetic for random draws?
