What are the main sampling methods and how can sampling introduce bias?

The main probability sampling methods are simple random sampling, stratified sampling, cluster sampling, and systematic sampling. Bias enters when some units have a zero or systematically different probability of selection — as in convenience sampling, survivorship bias, or non-response bias — making the sample unrepresentative of the target population regardless of size.
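As a rough illustration of three of the probability designs named above, the sketch below draws a simple random, a systematic, and a proportionally stratified sample from a toy population. The population, the `region` field, and all function names are assumptions made for this example.

```python
import random

def simple_random(population, n, rng):
    # Every subset of size n is equally likely.
    return rng.sample(population, n)

def systematic(population, n, rng):
    # Random start, then every `step`-th unit.
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def stratified(population, key, frac, rng):
    # Group units by stratum, then sample the same fraction inside each group.
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)
# Hypothetical population: 80 units in the north, 20 in the south.
population = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]

srs = simple_random(population, 10, rng)
sys_sample = systematic(population, 10, rng)
strat = stratified(population, lambda u: u["region"], 0.1, rng)
print(len(srs), len(sys_sample), len(strat))
```

The stratified draw always contains 8 northern and 2 southern units, whereas the simple random draw can over- or under-represent the smaller region by chance.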

What is bootstrapping, and when should you use resampling methods?

Bootstrapping estimates the sampling distribution of a statistic by repeatedly drawing samples with replacement from the observed data, each the same size as the original, and computing the statistic on each resample. It is useful when the analytic sampling distribution is unknown or intractable, or when the sample is too small for asymptotic approximations to be trusted.
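The resampling loop can be sketched in a few lines. This is a minimal percentile-bootstrap example using only the standard library; the data values, the choice of the median as the statistic, and the function name `bootstrap_ci` are assumptions for illustration.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the observed data.
        resample = rng.choices(data, k=n)
        estimates.append(stat(resample))
    estimates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical observations.
data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.0, 4.9]
low, high = bootstrap_ci(data)
print(f"95% bootstrap CI for the median: [{low:.2f}, {high:.2f}]")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the simplest to read.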

How do you choose between batch and real-time inference for a model?

Decide based on how fresh the prediction must be versus the cost and complexity of serving live. Use batch when results are needed every few hours or days, like daily churn lists, because it is cheap, simple, and can use spot or scheduled compute. Use real-time when a late or stale decision causes immediate loss, like fraud or ad auctions needing sub-100ms responses, accepting higher cost and complexity. Most production systems are hybrid: precompute heavy signals offline and do lightweight re-ranking online.

When should you use grid search vs random search vs Bayesian optimisation for hyperparameter tuning?

Grid search exhaustively tries every combination in a predefined grid, so its cost grows exponentially with the number of hyperparameters and it is practical only for one or two of them. Random search samples combinations at random from specified ranges and usually finds good values with fewer trials for the same compute budget, especially when only a few hyperparameters actually matter. Bayesian optimisation fits a surrogate model of the objective and uses it to propose the next trial, which typically gives the best sample efficiency when each evaluation is expensive.
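A small sketch of why random search tends to help when only one hyperparameter matters: with the same budget of nine evaluations, a 3×3 grid tries only three distinct values of the important parameter, while random search tries nine. The `objective` function is a made-up stand-in for a validation loss, and the names `lr` and `reg` are illustrative. Bayesian optimisation is omitted because it needs a surrogate-model library.

```python
import itertools
import random

def objective(lr, reg):
    # Hypothetical validation loss: depends strongly on lr, barely on reg.
    return (lr - 0.37) ** 2 + 0.001 * reg

def grid_search(lr_values, reg_values):
    # Evaluate every combination and keep the best one.
    candidates = itertools.product(lr_values, reg_values)
    return min(candidates, key=lambda p: objective(*p))

def random_search(n_trials, seed=0):
    # Draw each hyperparameter independently and uniformly from [0, 1].
    rng = random.Random(seed)
    trials = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(n_trials)]
    return min(trials, key=lambda p: objective(*p))

grid_best = grid_search([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])  # 9 evaluations, 3 distinct lr values
rand_best = random_search(9)                                # 9 evaluations, 9 distinct lr values
print("grid best:", grid_best, "loss:", objective(*grid_best))
print("random best:", rand_best, "loss:", objective(*rand_best))
```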

Approximate Inference: Sampling — GATE DA

What you'll learn

Why sampling: exact inference (variable elimination, VE, or enumeration) gets too expensive on large networks

Rejection sampling: draw full samples, discard those that don't match the evidence

Likelihood weighting: fix the evidence, weight each sample by the product of evidence CPT entries

Gibbs sampling: an MCMC method that resamples one variable at a time given the others

All three sampling methods are APPROXIMATE; only VE / enumeration give exact answers

Last lesson left variable elimination exact but fragile — on a big tangled network its intermediate factors swell until “exact” means “never finishes.” When that happens you strike a different bargain: give up the guarantee of the true answer for a good enough one, fast. Stop computing and start sampling — conjure up a pile of full assignments drawn from the net’s own distribution, count what fraction match your query, and report that fraction as the answer.

The three classic recipes — rejection sampling, likelihood weighting, and Gibbs sampling — all do this, with different trade-offs. The headline you must carry into GATE: all three are approximate. They edge closer to the true posterior as you draw more samples, but for any finite sample size they are estimates with wobble. That very trade — accept approximation to stay tractable — is why modern Bayesian ML leans on sampling: MCMC and its relatives are what tools like Stan and PyMC run to fit models far too large for exact inference.
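To have a reference value that sampling estimates can be compared against, here is a minimal sketch of exact inference by enumeration on a tiny chain A → B → E. The CPT numbers are the ones used in this lesson's worked example; the helper names `bern` and `joint` are assumptions for illustration. On a net this small, enumeration is instant; the point of sampling is that this stops being true as the net grows.

```python
from itertools import product

# CPTs for the chain A -> B -> E (values from the lesson's worked example).
P_A = 0.5                        # P(A=T)
P_B = {True: 0.6, False: 0.2}    # P(B=T | A)
P_E = {True: 0.9, False: 0.3}    # P(E=T | B)

def bern(p_true, value):
    """Probability that a Boolean variable with P(True)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(a, b, e):
    # Chain rule for the net: P(A) * P(B | A) * P(E | B).
    return bern(P_A, a) * bern(P_B[a], b) * bern(P_E[b], e)

def exact_posterior_a_given_e_true():
    # Sum the joint over the hidden variable B, then normalise over A.
    numerator = sum(joint(True, b, True) for b in (True, False))
    denominator = sum(joint(a, b, True) for a, b in product((True, False), repeat=2))
    return numerator / denominator

exact = exact_posterior_a_given_e_true()
print(round(exact, 4))  # 0.6111
```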

The three methods, at a glance

Rejection sampling. Draw a full sample from the prior — sample each node in topological order from its CPT. If the sample’s evidence values match the observed ones, keep it; otherwise throw it away, and estimate the posterior from the survivors. Simple, but wasteful: when the evidence is rare, almost every sample is discarded.
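A minimal sketch of rejection sampling, assuming the chain A → B → E and CPT values from this lesson's worked example, with evidence E = T and query P(A=T | E=T). Function names are illustrative. The exact answer for this net is 0.33 / 0.54 ≈ 0.611, so the estimate should land near that.

```python
import random

P_A = 0.5                        # P(A=T)
P_B = {True: 0.6, False: 0.2}    # P(B=T | A)
P_E = {True: 0.9, False: 0.3}    # P(E=T | B)

def prior_sample(rng):
    # Sample every node in topological order from its CPT.
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    e = rng.random() < P_E[b]
    return a, b, e

def rejection_sampling(n, rng):
    kept = 0
    a_true = 0
    for _ in range(n):
        a, b, e = prior_sample(rng)
        if not e:
            continue            # evidence is E=T, so discard mismatching samples
        kept += 1
        a_true += a
    estimate = a_true / kept if kept else float("nan")
    return estimate, kept

rejection_est, kept = rejection_sampling(20000, random.Random(0))
print(f"kept {kept} of 20000 samples, P(A=T | E=T) is roughly {rejection_est:.3f}")
```

Here P(E=T) is 0.54, so about half the samples survive. With rarer evidence the kept count shrinks in proportion, which is the waste the paragraph above describes.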

Likelihood weighting. Fix the evidence variables to their observed values (do not sample them), sample the rest from their CPTs, and weight each sample by the product of the evidence-node CPT entries: w = ∏ᵢ P(Eᵢ = eᵢ | Parents(Eᵢ)). No sample is wasted, but the variance grows when you condition on many evidence nodes.
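The same query under likelihood weighting, again as a sketch on the chain A → B → E with the worked example's CPT values and evidence E = T. Because E is the only evidence node, the weight reduces to the single factor P(E=T | B=b). Function names are illustrative.

```python
import random

P_A = 0.5                        # P(A=T)
P_B = {True: 0.6, False: 0.2}    # P(B=T | A)
P_E = {True: 0.9, False: 0.3}    # P(E=T | B)

def weighted_sample(rng):
    # Sample the non-evidence nodes in topological order.
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    # E is fixed to T and never sampled; its CPT entry becomes the weight.
    w = P_E[b]
    return a, b, w

def likelihood_weighting(n, rng):
    total_weight = 0.0
    a_true_weight = 0.0
    for _ in range(n):
        a, b, w = weighted_sample(rng)
        total_weight += w
        if a:
            a_true_weight += w
    return a_true_weight / total_weight

lw_est = likelihood_weighting(20000, random.Random(0))
print(f"P(A=T | E=T) is roughly {lw_est:.3f}")
```

Every sample contributes, but samples with B = F count only a third as much as samples with B = T, which is how the evidence reshapes the estimate.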

Gibbs sampling (MCMC). Start from any assignment consistent with the evidence, then repeatedly pick a non-evidence variable and resample it given the current values of all the others; in a Bayes net only its Markov blanket (parents, children, and the children's other parents) affects that conditional. Under mild conditions the chain converges to the true posterior in the limit. It handles rare evidence well, but needs a burn-in period before its samples are used.
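A minimal Gibbs sketch on the chain A → B → E with the worked example's CPT values and evidence E = T. It sweeps the two non-evidence variables in a fixed order rather than picking one at random, which is a common variant; the burn-in length and function names are assumptions for illustration.

```python
import random

P_A = 0.5                        # P(A=T)
P_B = {True: 0.6, False: 0.2}    # P(B=T | A)
P_E = {True: 0.9, False: 0.3}    # P(E=T | B)

def bern(p_true, value):
    """Probability that a Boolean variable with P(True)=p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def resample_a(b, rng):
    # P(A | B=b) is proportional to P(A) * P(B=b | A); E is outside A's Markov blanket.
    w_true = bern(P_A, True) * bern(P_B[True], b)
    w_false = bern(P_A, False) * bern(P_B[False], b)
    return rng.random() < w_true / (w_true + w_false)

def resample_b(a, rng):
    # P(B | A=a, E=T) is proportional to P(B | A=a) * P(E=T | B).
    w_true = bern(P_B[a], True) * P_E[True]
    w_false = bern(P_B[a], False) * P_E[False]
    return rng.random() < w_true / (w_true + w_false)

def gibbs_estimate(n_samples, burn_in, rng):
    a, b = True, True            # arbitrary start; E stays fixed at T throughout
    a_true = 0
    for step in range(burn_in + n_samples):
        a = resample_a(b, rng)
        b = resample_b(a, rng)
        if step >= burn_in:      # ignore the burn-in sweeps
            a_true += a
    return a_true / n_samples

gibbs_est = gibbs_estimate(20000, 1000, random.Random(0))
print(f"P(A=T | E=T) is roughly {gibbs_est:.3f}")
```

Successive Gibbs samples are correlated, so 20,000 of them carry somewhat less information than 20,000 independent draws would.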

Rejection throws most samples away when evidence is rare. Likelihood weighting and Gibbs use every sample.

Worked example — one likelihood-weighting weight

A 3-node net A → B → E. CPTs: P(A=T) = 0.5, P(B=T | A=T) = 0.6, P(B=T | A=F) = 0.2, P(E=T | B=T) = 0.9, P(E=T | B=F) = 0.3. The evidence is E = T. We sample A = T from P(A), then B = F from P(B | A=T). What is the weight of this sample?

In likelihood weighting the evidence is fixed — we do not sample it. The weight is the product of the CPT entries for each evidence node, evaluated at its observed value given the parents this sample chose. Here only E is evidence, with parent B, and the sample set B = F, so:

w = P(E = T | B = F) = 0.3

That is the whole calculation — one factor per evidence node, and the sampled value of A never enters it. The sample {A=T, B=F, E=T} is recorded with weight 0.3, contributing 0.3 to the tally for any query involving A=T, B=F. Posterior estimates then use the weighted counts:

P(Q = q | E = e) ≈ (Σ over samples with Q = q of w_sample) / (Σ over all samples of w_sample)
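The weight and the weighted tally can be checked with a short sketch. The first sample below is the one from the worked example; the other four are hypothetical samples invented only to show how the weighted ratio is formed, and the function names are illustrative.

```python
from math import prod

# P(E=T | B) from the worked example.
P_E_TRUE_GIVEN_B = {True: 0.9, False: 0.3}

def weight(sample, evidence_factors):
    """Multiply one CPT entry per evidence node, looked up at the sample's parent values."""
    return prod(factor(sample) for factor in evidence_factors)

# Only E is evidence, and its parent is B.
evidence_factors = [lambda s: P_E_TRUE_GIVEN_B[s["B"]]]

worked_sample = {"A": True, "B": False}
w = weight(worked_sample, evidence_factors)
print(w)  # 0.3

# Hypothetical batch of samples: the worked one plus four made-up ones.
samples = [
    worked_sample,
    {"A": True, "B": True},
    {"A": False, "B": False},
    {"A": False, "B": True},
    {"A": True, "B": True},
]
weights = [weight(s, evidence_factors) for s in samples]

# Weighted estimate of P(A=T | E=T): weight where the query holds over total weight.
numerator = sum(wt for s, wt in zip(samples, weights) if s["A"])
estimate = numerator / sum(weights)
print(f"{estimate:.3f}")
```

With only five samples the ratio is a crude estimate; it is shown to make the bookkeeping concrete, not to approximate the posterior well.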

This “weight = product of evidence-CPT entries” pattern is the GATE DA 2024 Q32 shape.

How GATE asks this

Two recurring patterns. MCQ: a list of methods to classify as exact or approximate — variable elimination is the only exact one; rejection, likelihood weighting, and Gibbs are approximate. MSQ: properties of the sampling methods — rejection wastes samples when evidence is rare; likelihood weighting weights by the evidence-CPT product and wastes none; Gibbs resamples one variable at a time. GATE DA 2025 and 2026 both ran this MCQ.

In one breath

When exact inference is too costly, sampling estimates a Bayes-net posterior by drawing many assignments and counting: rejection sampling samples from the prior and discards mismatches (wasteful on rare evidence), likelihood weighting fixes the evidence and weights each kept sample by ∏ P(eᵢ | Parents(Eᵢ)) (no waste, variance rises with more evidence), and Gibbs sampling is an MCMC method that resamples one non-evidence variable at a time given the rest (needs burn-in) — and all three are approximate, converging to the truth only in the limit, unlike the exact variable elimination.

Practice

Quick check

Q1. Recall: Which statements about likelihood weighting are TRUE? (select all that apply)

Q2. Recall: Which statements about rejection sampling are TRUE? (select all that apply)

Q3. Recall: Which statements about Gibbs sampling on a Bayes net are TRUE? (select all that apply)

Q4. Trace: In a net A → B → E with P(E=T | B=T) = 0.9 and P(E=T | B=F) = 0.3, the evidence is E=T. A likelihood-weighting sample produces A=F, B=T. What weight does this sample receive? (numerical answer, 2 decimals)

Q5. Apply: Pick the single inference method that gives an EXACT posterior on a Bayes net (up to floating-point error).

Q6. Apply: Which of the following Bayes-net inference methods is NOT approximate (i.e., is exact)?

A question to carry forward

That closes the Artificial Intelligence chapter — and with it the deep technical spine of the whole syllabus. Look at the ground covered: probability, linear algebra, calculus, programming and data structures, databases, the full sweep of machine learning, and now search, logic, and probabilistic reasoning. That is the 85-mark core, and you have walked every inch of it.

But the GATE paper carries 15 more marks that test none of this machinery — not one Bayes net, not a single eigenvalue. They test plain reading, sound arithmetic, and everyday reasoning: the General Aptitude section every candidate sits, regardless of stream. And here is the quiet truth about those marks — they are the cheapest on the entire paper to win, and the cheapest to throw away. A topper and a borderline candidate often differ by exactly these fifteen. Here is the thread onward into the final chapters: how is that section built, which of its pieces give the most marks per hour of preparation, and how do you stop treating it as an afterthought?
