datarekha

Exact Inference: Variable Elimination

Want the exact posterior on a Bayes net? Multiply the CPTs together and sum out the variables you don't care about, one at a time. That's variable elimination.

8 min read Advanced GATE DA Lesson 106 of 122

What you'll learn

  • Variable elimination computes EXACT conditional probabilities on a Bayes net
  • The procedure: factor the joint into CPTs, sum out hidden variables one by one, normalise
  • How to do one elimination step on a small chain net
  • Classify inference methods as exact (VE, enumeration) vs approximate (rejection / likelihood-weighting / Gibbs)

Before you start

You’ve drawn the Bayes net and written out the CPTs. Now somebody asks the practical question: given a few observed values, what’s the probability of another one? Variable elimination is the routine that gives you the exact answer — no sampling, no approximation — by sweeping the unwanted variables out of the joint, one at a time.

It’s slower than sampling on giant networks, but precise. For exam-sized nets (2-4 nodes), it’s the right tool and one elimination step is usually all GATE asks you to perform. The same sum-product engine runs inside probabilistic programming libraries and diagnosis systems whenever an exact answer is affordable — so the hand-trace you practise here is a scaled-down version of what those tools do.

The procedure

To compute P(Query | Evidence):

  1. Write the joint as a product of CPT factors — one factor per node, from the Bayes-net factorisation.
  2. Fix the evidence variables to their observed values.
  3. Pick a hidden (non-query, non-evidence) variable and sum it out: replace every factor that contains it with one factor that’s the sum over that variable’s values.
  4. Repeat until only the query variable remains.
  5. Normalise so the surviving distribution sums to 1.

The answer is exact (up to floating-point error). That’s the headline property — VE is not a sampling method.

ABeliminateCDquery∑_B P(A) · P(B|A) · P(C|B) · P(D|C)
A→B→C→D. To find P(D), sum out A, B, and C — one at a time. B is being eliminated.

Worked example — eliminate one variable

A chain A → B → C. Given P(A=1) = 0.5, P(B=1 | A=1) = 0.7, P(B=1 | A=0) = 0.2, P(C=1 | B=1) = 0.9, P(C=1 | B=0) = 0.4. Find P(C=1) by summing out A and B.

Factorisation: P(A, B, C) = P(A) · P(B | A) · P(C | B). We need the marginal P(C=1) = Σ_A Σ_B P(A) · P(B | A) · P(C=1 | B).

Step 1 — eliminate A. Combine P(A) · P(B | A) into a factor over B:

P(B=1) = P(B=1 | A=1)·P(A=1) + P(B=1 | A=0)·P(A=0)
       = 0.7 · 0.5            + 0.2 · 0.5
       = 0.35                 + 0.10              =  0.45

P(B=0) = 1 − 0.45  =  0.55

Step 2 — eliminate B. Combine the new P(B) with P(C=1 | B):

P(C=1) = P(C=1 | B=1)·P(B=1) + P(C=1 | B=0)·P(B=0)
       = 0.9 · 0.45           + 0.4 · 0.55
       = 0.405                + 0.220             =  0.625

So P(C=1) = 0.625exactly, no sampling involved. Each elimination step is one weighted sum over the values of the variable being removed.

How GATE asks this

Two patterns. MSQ: which of the listed methods compute exact posteriors on a Bayes net? Variable elimination yes; enumeration of the joint yes; rejection / likelihood-weighting / Gibbs no — they’re sampling. NAT: perform one elimination step on a 3-node chain or v-structure and report the marginal. GATE DA 2025 ran an MSQ asking exactly this classification.

Method-classification cheat sheet:

MethodType
Variable eliminationExact
Enumeration / brute-force jointExact
Rejection samplingApproximate
Likelihood weightingApproximate
Gibbs sampling (MCMC)Approximate

Quick check

Quick check

0/6
Q1Chain A → B with P(A=1)=0.4, P(B=1 | A=1)=0.8, P(B=1 | A=0)=0.3. Compute P(B=1) by eliminating A. (3 decimals)numerical answer — type a number
Q2Continuing the worked chain A → B → C (P(A=1)=0.5, P(B=1|A=1)=0.7, P(B=1|A=0)=0.2, P(C=1|B=1)=0.9, P(C=1|B=0)=0.4), what is P(C=0)? (3 decimals)numerical answer — type a number
Q3Which Bayes-net inference methods produce EXACT posterior probabilities (up to floating-point error)? (select all that apply)select all that apply
Q4In a 4-node Bayes net with query Q and evidence E, what does ONE elimination step on hidden variable H accomplish?
Q5Which statements about variable elimination are TRUE? (select all that apply)select all that apply
Q6Net A → C ← B with P(A=1)=0.5, P(B=1)=0.5, P(C=1 | A=1, B=1)=0.9, P(C=1 | A=1, B=0)=0.6, P(C=1 | A=0, B=1)=0.6, P(C=1 | A=0, B=0)=0.1. Compute P(C=1) by summing out A and B. (3 decimals)numerical answer — type a number

Practice this in an interview

All questions
State the law of total probability and give a concrete example of when you'd apply it.

The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.

How does MLE differ from MAP estimation, and what is the frequentist vs Bayesian divide?

MLE maximises the likelihood of the data alone; MAP (Maximum A Posteriori) adds a prior over parameters and maximises the posterior, making it equivalent to regularised MLE. Frequentists treat parameters as fixed unknowns; Bayesians treat them as random variables with a prior distribution.

Walk me through Bayes' theorem with a disease-screening base-rate example.

Bayes' theorem updates a prior probability with new evidence: P(H|E) = P(E|H) P(H) / P(E). In disease testing, ignoring the low base rate (prior) makes a positive test look far more alarming than it really is — most positives are false positives when the disease is rare.

What is the Bayesian interpretation of Ridge regression, and what prior does it correspond to?

Ridge regression is equivalent to maximum a posteriori (MAP) estimation with a zero-mean Gaussian prior on the coefficients. The regularization strength λ corresponds to the ratio of the noise variance to the prior variance — stronger regularization means you believe coefficients are drawn from a tighter distribution around zero.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content