Statistics & Probability | Hard | Asked at Google, DeepMind, Meta, Stripe

How does MLE differ from MAP estimation, and what is the frequentist vs Bayesian divide?

For: Data Scientist, ML Engineer, AI / LLM Engineer

The short answer

MLE maximises the likelihood of the data alone; MAP (Maximum A Posteriori) adds a prior over parameters and maximises the posterior, which is equivalent to MLE with a regularisation penalty equal to the negative log-prior. Frequentists treat parameters as fixed unknowns; Bayesians treat them as random variables with a prior distribution.

How to think about it

The frequentist/Bayesian divide is philosophical but has direct practical consequences for regularisation, uncertainty quantification, and what you can legitimately claim about a parameter.

MLE vs MAP side by side

	MLE	MAP
Objective	argmax_θ P(data | θ)	argmax_θ P(θ | data) = argmax_θ P(data | θ) · P(θ)
Uses prior?	No	Yes
Regularisation	None built in	Prior acts as regulariser
Uncertainty	Point estimate	Still a point estimate

In linear regression with Gaussian noise, MAP with a zero-mean Gaussian prior on the weights θ yields the same objective as L2-regularised (ridge) regression. MAP with a zero-mean Laplace prior yields the L1-regularised (lasso) objective.
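
The link between prior and penalty can be written out for linear regression. The following is a sketch under stated assumptions that are not in the original text: observations y = Xθ + ε with noise ε ~ N(0, σ²I), a Gaussian prior θ ~ N(0, τ²I), and, for the lasso case, independent Laplace priors with scale b on each θ_j. Constants that do not depend on θ are dropped.

```latex
\begin{aligned}
\hat{\theta}_{\text{MAP}}
  &= \arg\max_{\theta}\; \log p(y \mid X, \theta) + \log p(\theta) \\
  &= \arg\max_{\theta}\; -\frac{1}{2\sigma^{2}} \lVert y - X\theta \rVert_2^{2}
     - \frac{1}{2\tau^{2}} \lVert \theta \rVert_2^{2} \\
  &= \arg\min_{\theta}\; \lVert y - X\theta \rVert_2^{2}
     + \lambda \lVert \theta \rVert_2^{2},
  \qquad \lambda = \frac{\sigma^{2}}{\tau^{2}} \quad \text{(ridge)} \\[6pt]
\text{Laplace prior: }\;
\hat{\theta}_{\text{MAP}}
  &= \arg\min_{\theta}\; \lVert y - X\theta \rVert_2^{2}
     + \lambda' \lVert \theta \rVert_1,
  \qquad \lambda' = \frac{2\sigma^{2}}{b} \quad \text{(lasso)}
\end{aligned}
```

Under these assumptions a tighter prior (smaller τ or b) corresponds to a larger penalty, and a flat prior (τ → ∞) recovers plain MLE.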

Concrete example — coin with prior belief

Suppose you observe 2 heads in 3 flips and use a Beta(5, 5) prior (encoding prior belief that the coin is roughly fair).

MLE: θ̂ = 2/3 ≈ 0.667
MAP: θ̂ = (2 + 5 - 1) / (3 + 5 + 5 - 2) = 6/11 ≈ 0.545

The prior pulls the MAP estimate toward 0.5. With more data the prior’s influence shrinks and both estimates converge.
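
The arithmetic above, and the claim that the two estimates converge, can be checked with a short Python sketch. The function name `beta_bernoulli_estimates` and the choice to scale the data while holding the heads ratio at 2/3 are illustrative assumptions, not part of the original example.

```python
def beta_bernoulli_estimates(heads, flips, a=5.0, b=5.0):
    """Return (MLE, MAP) for a coin's heads probability under a Beta(a, b) prior.

    Hypothetical helper for illustration only.
    """
    mle = heads / flips
    # The posterior is Beta(a + heads, b + tails). Its mode is the MAP
    # estimate, which is valid when both posterior parameters exceed 1.
    map_estimate = (heads + a - 1) / (flips + a + b - 2)
    return mle, map_estimate


# Keep the observed ratio at 2/3 and grow the sample size.
for scale in (1, 10, 100, 1000):
    heads, flips = 2 * scale, 3 * scale
    mle, map_estimate = beta_bernoulli_estimates(heads, flips)
    print(f"flips={flips:5d}  MLE={mle:.4f}  MAP={map_estimate:.4f}")
```

With 3 flips the function reproduces 2/3 and 6/11; as the number of flips grows, the prior's pseudo-counts become negligible next to the observed counts and the MAP value approaches the MLE.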

Frequentist vs Bayesian in practice

Frequentist view:

Parameters are fixed, unknown constants.
Probability = long-run frequency of events.
A 95% confidence interval means: if we repeated the experiment many times, 95% of such intervals would contain the true parameter.
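Once a specific interval has been computed, you cannot say “there is a 95% probability the parameter is in this interval”: the parameter is fixed, so the interval either contains it or does not.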
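
The repeated-sampling reading of a confidence interval can be illustrated by simulation. This is a minimal sketch, assuming a coin with a known true heads probability and the normal-approximation (Wald) interval; the function names, the true value 0.6, the sample size and the seed are all illustrative choices rather than anything from the original.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def estimate_coverage(true_theta=0.6, n=200, repeats=5000, seed=0):
    """Fraction of repeated experiments whose interval contains true_theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # One experiment: n coin flips with a fixed, "unknown" parameter.
        successes = sum(rng.random() < true_theta for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= true_theta <= high
    return hits / repeats


print(f"Empirical coverage: {estimate_coverage():.3f}")
```

The parameter never changes across repetitions; only the interval does. The 95% figure describes how often the procedure succeeds, and the empirical coverage should land near that level (the Wald interval is only approximate, so it will not match exactly).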

Bayesian view:

Parameters are random variables; uncertainty is expressed as a probability distribution (the posterior).
A 95% credible interval means: given the data and prior, there is a 95% probability the parameter lies in this range.
Requires specifying a prior, which can be controversial but also encodes domain knowledge.
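
For the coin example used earlier (2 heads in 3 flips with a Beta(5, 5) prior), the posterior is Beta(7, 6), and a credible interval can be read directly off it. The sketch below uses only the standard library and integrates the density on a grid; in practice a statistics library's Beta quantile function would be the usual choice. The function names and the grid size are assumptions made for illustration.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def equal_tailed_interval(a, b, mass=0.95, n_grid=100_000):
    """Approximate equal-tailed credible interval via midpoint integration."""
    step = 1.0 / n_grid
    lower_p = (1 - mass) / 2
    upper_p = 1 - lower_p
    cdf = 0.0
    lower = upper = None
    for i in range(n_grid):
        theta = (i + 0.5) * step  # midpoint of the i-th grid cell
        cdf += beta_pdf(theta, a, b) * step
        if lower is None and cdf >= lower_p:
            lower = theta
        if cdf >= upper_p:
            upper = theta
            break
    return lower, upper


# Posterior after 2 heads, 1 tail with a Beta(5, 5) prior: Beta(7, 6).
low, high = equal_tailed_interval(7, 6)
print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

Unlike a confidence interval, this range supports the direct statement that, given the data and the prior, the parameter lies inside it with 95% posterior probability.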

Full Bayesian vs MAP

MAP still returns a point estimate. Full Bayesian inference retains the entire posterior distribution, which enables uncertainty-aware predictions, often better-calibrated probabilities, and principled model comparison via marginal likelihoods (evidence). The cost is computational: when the posterior has no closed form, it typically requires MCMC or variational inference.
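
The difference shows up even in the coin example. The sketch below compares two ways of predicting that the next two flips are both heads: plugging the MAP estimate into θ², versus averaging θ² over the whole Beta posterior (the posterior predictive). The function name is hypothetical, and the closed form E[θ²] = a(a+1) / ((a+b)(a+b+1)) for a Beta(a, b) distribution is a standard result, not something stated in the original.

```python
from fractions import Fraction


def two_heads_probabilities(heads=2, flips=3, a=5, b=5):
    """Return (MAP plug-in, full-Bayes) probability of two heads in a row."""
    post_a = a + heads
    post_b = b + (flips - heads)

    # Plug-in: treat the MAP point estimate as if it were the true theta.
    theta_map = Fraction(post_a - 1, post_a + post_b - 2)
    plug_in = theta_map ** 2

    # Full Bayes: average theta^2 over the Beta(post_a, post_b) posterior.
    total = post_a + post_b
    full_bayes = Fraction(post_a * (post_a + 1), total * (total + 1))
    return plug_in, full_bayes


plug_in, full_bayes = two_heads_probabilities()
print(f"MAP plug-in: {plug_in} = {float(plug_in):.4f}")
print(f"Full Bayes:  {full_bayes} = {float(full_bayes):.4f}")
```

The plug-in value is 36/121 ≈ 0.298, while the posterior predictive is 4/13 ≈ 0.308. The gap comes from the posterior's spread, which the point estimate discards. Here the posterior is conjugate, so no sampling is needed; MCMC or variational inference become necessary when no such closed form exists.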
