Sample Space, Events & Axioms
Probability is built on three short rules. Get the axioms and inclusion-exclusion right and every later probability question rests on solid ground.
What you'll learn
- Sample space, outcomes, and events as subsets of the sample space
- The three axioms: non-negativity, P(S) = 1, and additivity for disjoint events
- Derived rules: P(complement) = 1 - P(A), P(empty) = 0, monotonicity
- Inclusion-exclusion: P(A union B) = P(A) + P(B) - P(A intersect B)
Before you start
Roll a die. The outcome can be 1, 2, 3, 4, 5, or 6. That whole list is the
sample space S; one number is an outcome; “even” means the subset
{2,4,6} — that’s an event. Probability is just a function that puts a
number on each event.
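Here is a minimal sketch of that picture in Python. It assumes a fair die, so every outcome is equally likely. The sample space is a `frozenset`, an event is a subset of it, and the probability function counts outcomes. The names `S`, `even`, and `P` are illustrative and do not come from any library.

```python
from fractions import Fraction

# Sample space for one roll of a die: every possible outcome.
S = frozenset({1, 2, 3, 4, 5, 6})

# An event is any subset of S; "even" is the subset {2, 4, 6}.
even = frozenset({2, 4, 6})

def P(event):
    """Probability of an event, assuming a fair die (equally likely outcomes)."""
    if not event <= S:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(S))

print(P(even))             # 1/2
print(P(frozenset({3})))   # 1/6: a single outcome is also an event
print(P(S))                # 1: the whole sample space
```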
Here’s the surprising part. Three short rules — the axioms — already pin down every probability identity you’ll ever use. Get them right and the rest of the chapter is mostly bookkeeping. (They’re also the sanity checks every model’s output must pass: a classifier that reports class probabilities summing to 1.3, or a negative probability, has violated an axiom — and is simply broken.)
The three axioms
- Non-negativity: P(E) ≥ 0 for every event E.
- Normalisation: P(S) = 1; the total probability of the whole sample space is 1.
- Additivity: if A and B are disjoint (no shared outcomes, A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).
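On a finite sample space, the three axioms can be checked by brute force over every event. The sketch below assumes that probabilities are assigned outcome by outcome and that P(E) is the sum over E's outcomes. The helpers `make_P` and `all_events` are hypothetical names. The final lines show how a "classifier" output whose class probabilities sum to 1.3 gets rejected.

```python
from fractions import Fraction
from itertools import combinations

def all_events(S):
    """Every subset of the finite sample space S: 2**len(S) events in total."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_P(weights):
    """Turn outcome probabilities into P(E) = sum of the weights of E's outcomes.

    Exact Fractions are used so the sum-to-1 test needs no float tolerance.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("negative probability violates non-negativity")
    if sum(weights.values()) != 1:
        raise ValueError("outcome probabilities must sum to 1 (normalisation)")
    return lambda E: sum((weights[x] for x in E), Fraction(0))

S = frozenset(range(1, 7))
P = make_P({x: Fraction(1, 6) for x in S})   # fair die
events = all_events(S)

assert all(P(E) >= 0 for E in events)                       # Axiom 1
assert P(S) == 1                                            # Axiom 2
assert all(P(A | B) == P(A) + P(B)                          # Axiom 3, disjoint pairs
           for A in events for B in events if not A & B)
print("axioms hold on all", len(events), "events")          # 64 events

try:  # a hypothetical classifier whose class probabilities sum to 1.3
    make_P({"cat": Fraction(7, 10), "dog": Fraction(6, 10)})
except ValueError as err:
    print("rejected:", err)
```

With floats instead of `Fraction`, the sum-to-1 test would need a tolerance such as `math.isclose`.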
What the axioms force to be true
These small rules already settle the everyday facts you reach for constantly:
- Empty event: P(∅) = 0. Since S and ∅ are disjoint and S ∪ ∅ = S, Axiom 3 gives P(S) = P(S) + P(∅), so P(∅) = 0.
- Complement rule: P(Aᶜ) = 1 − P(A). An event and its complement are disjoint and together fill S, so P(A) + P(Aᶜ) = P(S) = 1. Rearrange.
- Monotonicity: if A ⊆ B then P(A) ≤ P(B); a bigger event can’t have smaller probability. Split B into A and the disjoint leftover B ∩ Aᶜ. Axiom 3 gives P(B) = P(A) + P(B ∩ Aᶜ), and the leftover term is ≥ 0 by Axiom 1.
- Bounded: 0 ≤ P(A) ≤ 1 for every event. The lower bound is Axiom 1. The upper bound is monotonicity applied to A ⊆ S, together with P(S) = 1.
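The derived rules hold for any valid assignment, not only a fair die. This short check uses a hypothetical loaded die whose weights are chosen purely for illustration. It tests each rule against every event, and it tests monotonicity against every nested pair.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))
# Hypothetical loaded die: a six has probability 1/2, faces 1-5 have 1/10 each.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

def P(E):
    # Additivity over single outcomes: P(E) is the sum of its outcomes' weights.
    return sum((weights[x] for x in E), Fraction(0))

events = [frozenset(c) for r in range(7) for c in combinations(sorted(S), r)]

assert P(frozenset()) == 0                    # empty event
for A in events:
    assert P(S - A) == 1 - P(A)               # complement rule (S - A is Aᶜ)
    assert 0 <= P(A) <= 1                     # bounded
    for B in events:
        if A <= B:                            # A ⊆ B
            assert P(A) <= P(B)               # monotonicity...
            assert P(B) == P(A) + P(B - A)    # ...since P(B) = P(A) + P(B ∩ Aᶜ)

print("derived rules hold; P(not a six) =", P(S - {6}))   # 1/2
```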
Inclusion-exclusion — when events overlap
Axiom 3 only adds probabilities for disjoint events. When A and B overlap,
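adding P(A) + P(B) counts the overlap A ∩ B twice, so subtract it once.
This is inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). It follows from the
axioms. A ∪ B is the disjoint union of A and B ∩ Aᶜ, and B is the disjoint union of
A ∩ B and B ∩ Aᶜ. Apply Axiom 3 to both, then eliminate P(B ∩ Aᶜ). When the events
are disjoint, P(A ∩ B) = 0 and the formula collapses straight back to Axiom 3, so additivity
is just the no-overlap special case.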
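Inclusion-exclusion can be checked the same way, by brute force. This sketch assumes a fair die and uses the illustrative events A = "even" and B = "at least 4". It first confirms the identity on every pair of events. It then shows how naive addition overshoots when the events overlap.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))

def P(E):
    """Fair-die probability: favourable outcomes over total outcomes."""
    return Fraction(len(E), len(S))

events = [frozenset(c) for r in range(7) for c in combinations(sorted(S), r)]

# Inclusion-exclusion holds for every pair, overlapping or not.
assert all(P(A | B) == P(A) + P(B) - P(A & B) for A in events for B in events)

A = frozenset({2, 4, 6})   # even
B = frozenset({4, 5, 6})   # at least 4
print(P(A | B))                  # 2/3, i.e. P({2, 4, 5, 6})
print(P(A) + P(B))               # 1: double-counts the overlap {4, 6}
print(P(A) + P(B) - P(A & B))    # 2/3 after subtracting P(A ∩ B) = 1/3
```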
How GATE asks this
Usually an MCQ asking which statements are valid consequences of the axioms (the
complement rule is true; “always add probabilities” is a trap), or a short NAT
that hands you P(A), P(B), and P(A ∩ B) and asks for P(A ∪ B) — a direct
inclusion-exclusion plug-in. The numbers are easy; the marks are lost by forgetting
to subtract the overlap.
Worked example
Given P(A) = 0.5, P(B) = 0.4, and P(A ∩ B) = 0.2, find P(A ∪ B).
Apply inclusion-exclusion directly:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= 0.5 + 0.4 − 0.2
= 0.7
So P(A ∪ B) = 0.7. Contrast with the disjoint case: had the events been
mutually exclusive, P(A ∩ B) would be 0 and P(A ∪ B) = 0.5 + 0.4 = 0.9. But here
they overlap by 0.2 — ignoring that gives the wrong answer 0.9 instead of 0.7.
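The same plug-in can be written as a small helper. The name `union_probability` is hypothetical. The helper uses `Fraction`, so 0.5 + 0.4 − 0.2 comes out as exactly 0.7 rather than a floating-point approximation. It also rejects inputs that no valid probability assignment could produce, using the monotonicity and boundedness rules.

```python
from fractions import Fraction

def union_probability(p_a, p_b, p_ab):
    """P(A ∪ B) by inclusion-exclusion, with sanity checks on the inputs."""
    p_a, p_b, p_ab = Fraction(p_a), Fraction(p_b), Fraction(p_ab)
    for p in (p_a, p_b, p_ab):
        if not 0 <= p <= 1:
            raise ValueError("every probability must lie in [0, 1]")
    # Monotonicity: A ∩ B ⊆ A and A ∩ B ⊆ B, so P(A ∩ B) <= min(P(A), P(B)).
    if p_ab > min(p_a, p_b):
        raise ValueError("P(A ∩ B) cannot exceed P(A) or P(B)")
    union = p_a + p_b - p_ab
    # Boundedness: A ∪ B is an event, so its probability cannot exceed 1.
    if union > 1:
        raise ValueError("inputs imply P(A ∪ B) > 1")
    return union

print(union_probability("0.5", "0.4", "0.2"))  # 7/10
print(union_probability("0.5", "0.4", "0"))    # 9/10, the disjoint case
```

Passing the decimals as strings keeps them exact. A float such as 0.1 would carry its binary rounding into the `Fraction`.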
Practice this in an interview
The law of total probability decomposes P(A) over a mutually exclusive, exhaustive partition of the sample space: P(A) = Σ P(A|Bᵢ)·P(Bᵢ). It is the engine behind the Bayes denominator and any calculation where you want an overall rate built from segment-level rates.
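Here is a minimal numeric sketch of the law of total probability. The three segments B1, B2, B3 and their rates are made up purely for illustration. The sketch checks that the segment probabilities form a partition, builds the overall rate P(A), and reuses that sum as the Bayes denominator for P(B3 | A).

```python
from fractions import Fraction

# Hypothetical partition: segment -> (P(B_i), P(A | B_i)). Numbers are illustrative.
segments = {
    "B1": (Fraction(1, 2), Fraction(1, 10)),
    "B2": (Fraction(3, 10), Fraction(1, 5)),
    "B3": (Fraction(1, 5), Fraction(1, 2)),
}

# A partition is exhaustive and mutually exclusive, so the P(B_i) sum to 1.
assert sum(p_b for p_b, _ in segments.values()) == 1

# Law of total probability: P(A) = Σ P(A | B_i) · P(B_i).
p_a = sum(p_a_given_b * p_b for p_b, p_a_given_b in segments.values())
print(p_a)                          # 21/100

# The same sum is the Bayes denominator, e.g. P(B3 | A).
p_b3, p_a_given_b3 = segments["B3"]
print(p_a_given_b3 * p_b3 / p_a)    # 10/21
```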
Each distribution has a natural generative story: Bernoulli is a single coin flip; Binomial sums Bernoullis; Poisson counts rare arrivals; Normal emerges from sums of many small effects; Exponential models waiting times between Poisson events; Uniform assigns equal probability across a range. Choosing correctly comes from matching that story to the data-generating process.
In a room of just 23 people, the probability that at least two share a birthday exceeds 50%. The counterintuitive result comes from the large number of pairs (23 people form 253 of them) rather than from comparing each person to a fixed date. It is an example of how our intuition systematically underestimates collision probabilities.
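The 23-person figure can be reproduced directly. This sketch assumes 365 equally likely birthdays and ignores leap years, which is the usual simplification. The helper name `p_shared_birthday` is hypothetical. By the complement rule, P(at least one shared birthday) = 1 − P(all birthdays distinct).

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), uniform birthdays assumed."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1 - p_distinct           # complement rule

print(round(p_shared_birthday(22), 4))  # 0.4757
print(round(p_shared_birthday(23), 4))  # 0.5073
print(23 * 22 // 2)                     # 253 pairs of people

# Smallest group size where a shared birthday is more likely than not.
n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n)                                # 23
```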
Conditional probability P(A|B) is the probability of A given that B has already occurred, computed as P(A and B) / P(B). It narrows the sample space to B, whereas joint probability P(A and B) lives in the full, unrestricted space.
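On a fair die (an illustrative choice), a conditional probability can be computed two ways. One is the formula P(A ∩ B) / P(B). The other is to shrink the sample space to B and count. Counting within B gives the right answer here only because the outcomes are equally likely. The helper `P` and the events are assumptions made for the example.

```python
from fractions import Fraction

S = frozenset(range(1, 7))

def P(E, space=S):
    """Uniform probability of event E within a given finite sample space."""
    return Fraction(len(E & space), len(space))

A = frozenset({6})          # roll a six
B = frozenset({2, 4, 6})    # roll is even

by_formula = P(A & B) / P(B)        # P(A ∩ B) / P(B)
by_restriction = P(A, space=B)      # count inside the narrowed sample space B
print(by_formula, by_restriction)   # 1/3 1/3
print(P(A & B))                     # 1/6: the joint probability, measured in all of S
```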