How does the Bernoulli distribution relate to the Binomial, and what are their parameters and moments?
A Bernoulli(p) trial is the atomic unit: a single experiment with success probability p. Binomial(n, p) is the sum of n independent, identically distributed Bernoulli(p) trials, counting total successes. Because Binomial is a sum of independent random variables, its mean and variance are n times those of a single Bernoulli.
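The "sum of independent Bernoullis" relationship can be checked directly. The minimal Python sketch below builds a PMF by convolving n Bernoulli(p) PMFs, adding one independent trial at a time, and compares the result with the closed-form Binomial PMF. The function names are hypothetical helpers written for illustration, not part of any library.

```python
from math import comb

def bernoulli_pmf(p):
    # Index k holds P(X = k) for k in {0, 1}
    return [1 - p, p]

def convolve(a, b):
    # PMF of the sum of two independent non-negative integer random variables
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def binomial_pmf_by_summing(n, p):
    # Start from a point mass at 0 (the sum of zero trials), then add one trial at a time
    pmf = [1.0]
    for _ in range(n):
        pmf = convolve(pmf, bernoulli_pmf(p))
    return pmf

n, p = 4, 0.3
pmf = binomial_pmf_by_summing(n, p)
closed_form = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
# The two should agree up to floating-point rounding
print(max(abs(a - b) for a, b in zip(pmf, closed_form)))
```

Convolution is used here because the PMF of a sum of independent discrete variables is the convolution of their PMFs. It makes the "sum of trials" view explicit rather than quoting the formula.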
How to think about it
Bernoulli and Binomial are the basic building blocks for modeling binary outcomes and counts of successes. Viewing a Binomial as a sum of Bernoulli trials makes it clear why its mean and variance scale linearly with n.
Bernoulli(p)
A single binary trial with outcome 1 (success) or 0 (failure):
P(X = 1) = p,   P(X = 0) = 1 - p
E[X] = p,   Var(X) = p(1-p)
The variance is maximised at p = 0.5, where it equals 0.25, and is zero when the outcome is certain (p = 0 or 1). It therefore directly governs the sampling uncertainty of A/B-test metrics built on binary events, such as click-through or conversion rates.
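As a quick check of these moments, the sketch below computes the Bernoulli mean and variance by enumerating the two outcomes rather than using the formulas, then scans a grid of p values to locate the variance peak. The helper name `bernoulli_moments` is hypothetical.

```python
def bernoulli_moments(p):
    # Enumerate the two outcomes: 0 with probability 1-p, 1 with probability p
    outcomes = [(0, 1 - p), (1, p)]
    mean = sum(x * w for x, w in outcomes)
    var = sum((x - mean) ** 2 * w for x, w in outcomes)
    return mean, var

# Scan p = 0.00, 0.01, ..., 1.00 and keep the p with the largest variance
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: bernoulli_moments(p)[1])
print(best_p)                   # 0.5
print(bernoulli_moments(0.5))   # (0.5, 0.25)
print(bernoulli_moments(1.0))   # (1.0, 0.0), a certain outcome has no variance
```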
Binomial(n, p)
If X₁, X₂, …, Xₙ are independent Bernoulli(p), then S = X₁ + ··· + Xₙ ~ Binomial(n, p):
P(S = k) = C(n,k) · p^k · (1-p)^(n-k) for k = 0, 1, …, n
E[S] = np (linearity of expectation)
Var(S) = np(1-p) (variances of independent variables add)
The C(n,k) = n! / (k!(n-k)!) term counts the ways to place k successes among n trials; each such arrangement has probability p^k · (1-p)^(n-k).
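A minimal Python sketch of the PMF above, using the standard-library `math.comb` for C(n,k). It computes the mean and variance by summing over the PMF, so they can be compared with np and np(1-p). The helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    # C(n, k) arrangements, each with probability p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_moments(n, p):
    # Total probability, mean and variance computed directly from the PMF
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(probs))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(probs))
    return sum(probs), mean, var

total, mean, var = binomial_moments(10, 0.3)
# Expect total ≈ 1, mean ≈ 10 × 0.3 = 3, var ≈ 10 × 0.3 × 0.7 = 2.1 (up to rounding)
print(total, mean, var)
```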
Worked numeric example
A model predicts click probability p = 0.08 for each ad. You serve 200 ads (n = 200) and treat clicks as independent, so the click count is S ~ Binomial(200, 0.08).
E[S] = 200 × 0.08 = 16 expected clicks
Var(S) = 200 × 0.08 × 0.92 = 14.72
SD(S) = √14.72 ≈ 3.84
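Here np = 16 and n(1-p) = 184 are both reasonably large (a common rule of thumb asks for at least about 10 each), so the normal approximation is reasonable: the click count is roughly N(16, 3.84²).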
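The arithmetic of the worked example and the quality of the normal approximation can be checked in a few lines. The sketch below uses only the standard library (`math.comb`, `math.erf`). It compares the exact P(S ≤ 20) with the normal approximation using a continuity correction. The threshold x = 20 is an arbitrary illustrative choice, not from the example above.

```python
from math import comb, erf, sqrt

n, p = 200, 0.08
mean = n * p                 # expected clicks
var = n * p * (1 - p)        # variance of the click count
sd = sqrt(var)

def binom_cdf(x, n, p):
    # Exact P(S <= x) by summing the Binomial PMF
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_cdf(z):
    # Standard normal CDF expressed through the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

x = 20  # illustrative threshold
exact = binom_cdf(x, n, p)
# Continuity correction: the integer count x covers the interval up to x + 0.5
approx = normal_cdf((x + 0.5 - mean) / sd)
print(f"mean={mean:.2f} var={var:.2f} sd={sd:.2f}")   # mean=16.00 var=14.72 sd=3.84
print(f"P(S <= {x}): exact={exact:.4f} normal approx={approx:.4f}")
```

The continuity correction is a design choice: it usually narrows the gap between a discrete count and a continuous approximation.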
Shape of the PMF
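When p = 0.5 the Binomial is symmetric. For p < 0.5 it is right-skewed and for p > 0.5 left-skewed (skewness = (1-2p) / √(np(1-p))). As np(1-p) grows large, i.e. both np and n(1-p) grow, the distribution approaches a Normal (by the CLT). The mode is floor((n+1)p); when (n+1)p is an integer, both (n+1)p and (n+1)p - 1 are modes.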
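To check the shape claims numerically, the sketch below finds all modes by scanning the PMF and evaluates the skewness formula. The function names are hypothetical. The second case (n = 9, p = 0.5) is chosen so that (n+1)p = 5 is an integer, which produces two tied modes.

```python
from math import comb, floor, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def skewness(n, p):
    # Positive (long right tail) for p < 0.5, zero at p = 0.5, negative for p > 0.5
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

def modes(n, p):
    # Every k whose PMF value equals the maximum
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    top = max(probs)
    return [k for k, w in enumerate(probs) if w == top]

print(modes(200, 0.08), floor(201 * 0.08))   # [16] 16, matching the worked example
print(modes(9, 0.5))                         # [4, 5]
print(skewness(200, 0.08) > 0)               # True: small p gives a right skew
```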
Quick parameter guide
| Distribution | Parameter(s) | Mean | Variance |
|---|---|---|---|
| Bernoulli(p) | p ∈ [0,1] | p | p(1-p) |
| Binomial(n,p) | n ∈ ℕ, p ∈ [0,1] | np | np(1-p) |