How does MLE differ from MAP estimation, and what is the frequentist vs Bayesian divide?
MLE maximises the likelihood of the data alone; MAP (Maximum A Posteriori) adds a prior over parameters and maximises the posterior, which makes it equivalent to MLE with a regularisation term given by the log-prior. Frequentists treat parameters as fixed unknowns; Bayesians treat them as random variables with a prior distribution.
How to think about it
The frequentist/Bayesian divide is philosophical but has direct practical consequences for regularisation, uncertainty quantification, and what you can legitimately claim about a parameter.
MLE vs MAP side by side
| | MLE | MAP |
|---|---|---|
| Objective | argmax_θ P(data \| θ) | argmax_θ P(θ \| data) = argmax_θ P(data \| θ) · P(θ) |
| Uses prior? | No | Yes |
| Regularisation | None built in | Prior acts as regulariser |
| Uncertainty | Point estimate | Still a point estimate |
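The two objectives are easiest to compare in log form. A short derivation, writing D for the observed data (a symbol introduced here for brevity):

```latex
\begin{aligned}
\hat\theta_{\text{MLE}} &= \arg\max_\theta \, \log P(D \mid \theta) \\
\hat\theta_{\text{MAP}} &= \arg\max_\theta \, \log P(\theta \mid D)
  = \arg\max_\theta \, \big[ \log P(D \mid \theta) + \log P(\theta) - \log P(D) \big] \\
 &= \arg\max_\theta \, \big[ \log P(D \mid \theta) + \log P(\theta) \big]
\end{aligned}
```

The evidence term log P(D) does not depend on θ, so it drops out of the maximisation. What remains is the MLE objective plus log P(θ), which plays the role of a regularisation term.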
MAP with a zero-mean Gaussian prior on θ adds an L2 penalty to the negative log-likelihood; for linear regression with Gaussian noise this is exactly ridge regression. A zero-mean Laplace prior adds an L1 penalty instead, which gives lasso regression.
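The Gaussian-prior case can be checked numerically. The sketch below is a minimal illustration, not a reference implementation: it assumes a one-dimensional linear model with known noise standard deviation `sigma` and prior standard deviation `tau` (both hypothetical values chosen for the demo), and verifies that the ridge solution with penalty `lam = sigma**2 / tau**2` is a stationary point of the negative log-posterior.

```python
import random

random.seed(0)

# Hypothetical 1-D linear model: y = theta * x + Gaussian noise.
# sigma is the noise std, tau is the std of the zero-mean Gaussian prior (both assumed).
true_theta, sigma, tau = 2.0, 1.0, 0.5
xs = [random.uniform(-1, 1) for _ in range(20)]
ys = [true_theta * x + random.gauss(0, sigma) for x in xs]

sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# MLE is ordinary least squares; ridge minimises sum((y - theta*x)^2) + lam * theta^2.
theta_mle = sxy / sxx
lam = sigma**2 / tau**2
theta_ridge = sxy / (sxx + lam)

def neg_log_posterior_grad(theta):
    """Gradient in theta of -log P(data | theta) - log P(theta)."""
    grad_nll = -sum(x * (y - theta * x) for x, y in zip(xs, ys)) / sigma**2
    grad_prior = theta / tau**2
    return grad_nll + grad_prior

# The gradient vanishes (up to floating-point error) at the ridge solution,
# so the ridge estimate is the MAP estimate for this model.
print(theta_mle, theta_ridge, neg_log_posterior_grad(theta_ridge))
```

A tighter prior (smaller `tau`) means a larger `lam` and stronger shrinkage toward zero.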
Concrete example — coin with prior belief
Suppose you observe 2 heads in 3 flips and use a Beta(5, 5) prior (encoding prior belief that the coin is roughly fair).
- MLE: θ̂ = 2/3 ≈ 0.667
- MAP: θ̂ = (2 + 5 - 1) / (3 + 5 + 5 - 2) = 6/11 ≈ 0.545
The prior pulls the MAP estimate toward 0.5. With more data the prior’s influence shrinks and both estimates converge.
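The arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function names `mle` and `map_beta` are illustrative, and the loop keeps the 2:3 heads ratio while growing the sample to show the prior's influence fading.

```python
def mle(heads, flips):
    """Maximum likelihood estimate of the heads probability."""
    return heads / flips

def map_beta(heads, flips, a, b):
    """Posterior mode under a Beta(a, b) prior.

    Valid when both posterior shape parameters exceed 1.
    """
    return (heads + a - 1) / (flips + a + b - 2)

# Same 2:3 ratio of heads at increasing sample sizes, with the Beta(5, 5) prior.
for scale in (1, 10, 100, 1000):
    heads, flips = 2 * scale, 3 * scale
    print(flips, round(mle(heads, flips), 4), round(map_beta(heads, flips, 5, 5), 4))
```

The MLE stays at 2/3 throughout, while the MAP estimate starts at 6/11 and moves toward 2/3 as the number of flips grows.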
Frequentist vs Bayesian in practice
Frequentist view:
- Parameters are fixed, unknown constants.
- Probability = long-run frequency of events.
- A 95% confidence interval means: if we repeated the experiment many times, 95% of such intervals would contain the true parameter.
- Cannot say “there is a 95% probability the parameter is in this interval.”
Bayesian view:
- Parameters are random variables; uncertainty is expressed as a probability distribution (the posterior).
- A 95% credible interval means: given the data and prior, there is a 95% probability the parameter lies in this range.
- Requires specifying a prior, which can be controversial but also encodes domain knowledge.
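The two interval interpretations can be illustrated by simulation. The sketch below makes several assumptions that are not in the text above: it uses the normal-approximation (Wald) interval for a proportion, a uniform Beta(1, 1) prior, and a made-up data set of 60 heads in 100 flips. The first part repeats an experiment many times and counts how often the confidence interval contains the fixed true parameter; the second part computes a credible interval from posterior samples for one data set.

```python
import math
import random

random.seed(1)

def wald_interval(heads, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = heads / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def coverage(true_p, n, trials=2000):
    """Fraction of repeated experiments whose interval contains the fixed true_p."""
    hits = 0
    for _ in range(trials):
        heads = sum(random.random() < true_p for _ in range(n))
        lo, hi = wald_interval(heads, n)
        hits += lo <= true_p <= hi
    return hits / trials

def credible_interval(heads, n, a=1.0, b=1.0, draws=20000):
    """Equal-tailed 95% credible interval from samples of the Beta posterior."""
    samples = sorted(
        random.betavariate(a + heads, b + n - heads) for _ in range(draws)
    )
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

# Frequentist statement: a property of the procedure over repeated experiments.
freq_coverage = coverage(true_p=0.5, n=100)

# Bayesian statement: a probability about the parameter given one data set.
cred_lo, cred_hi = credible_interval(heads=60, n=100)

print(freq_coverage, cred_lo, cred_hi)
```

The coverage figure describes the interval-building procedure, not any single interval, whereas the credible interval is a direct statement about where the parameter lies under the chosen prior.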
Full Bayesian vs MAP
MAP still returns a point estimate. Full Bayesian inference retains the entire posterior distribution, enabling uncertainty-aware predictions, better calibration, and principled model comparison via marginal likelihoods (evidence). The cost is computational: outside conjugate models with a closed-form posterior, it often requires MCMC or variational inference.
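The difference shows up as soon as a prediction is nonlinear in θ. The sketch below reuses the coin example (2 heads in 3 flips, Beta(5, 5) prior) and compares the probability of two heads in a row under a MAP plug-in with the full posterior predictive. It relies on the Beta-Binomial conjugacy, so no MCMC is needed here; that is a special case rather than the norm. Variable names are illustrative.

```python
def posterior_params(heads, flips, a, b):
    """Beta posterior shape parameters after observing the flips."""
    return a + heads, b + flips - heads

a_post, b_post = posterior_params(2, 3, 5, 5)  # Beta(7, 6)
total = a_post + b_post

# MAP plug-in: collapse the posterior to its mode, then predict.
theta_map = (a_post - 1) / (total - 2)
plugin_two_heads = theta_map ** 2

# Full Bayesian: average theta^2 over the posterior.
# For Beta(a, b), E[theta^2] = a (a + 1) / ((a + b) (a + b + 1)).
bayes_two_heads = a_post * (a_post + 1) / (total * (total + 1))

# The posterior also carries a spread that the point estimate discards.
post_mean = a_post / total
post_var = a_post * b_post / (total ** 2 * (total + 1))

print(plugin_two_heads, bayes_two_heads, post_mean, post_var ** 0.5)
```

The full Bayesian prediction is larger than the plug-in one because it accounts for the posterior mass on values of θ above the mode as well as below it.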