Covariance, Correlation & Total Expectation
Covariance measures how two variables move together; correlation rescales it into [−1, 1]. Independence forces both to zero — but zero covariance does not buy back independence.
What you'll learn
- Cov(X,Y) = E[XY] − E[X]·E[Y], and Cov(X,X) = Var(X)
- Correlation ρ = Cov(X,Y)/(σ_X·σ_Y) always lies in [−1, 1]
- Independent implies Cov = 0, but Cov = 0 does NOT imply independent
- Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y)
Before you start
Imagine plotting hours studied against exam score for a class. The dots tilt
upward — more study, higher score. Imagine plotting hours of TV against the
same score. The dots tilt down. Sometimes the dots are just a cloud with no
tilt at all. Covariance is the single number that captures that tilt;
correlation is the same number polished into a clean [−1, 1] scale so
“strong” and “weak” mean the same thing across any pair. Correlation is the number in each
cell of a correlation heatmap when you screen features before fitting a model. GATE
almost always tests these straight from the definition, often on a pair of 0/1
indicators (variables that are 1 when some event happens and 0 otherwise) you can
enumerate by hand in a minute.
Covariance — co-movement around the means
The definition, and the equivalent form you actually compute from:
Cov(X, Y) = E[(X − E[X])·(Y − E[Y])] = E[XY] − E[X]·E[Y].
- Sign tells the story. Positive covariance: above-average X tends to pair with above-average Y. Negative: they move oppositely. Zero: no linear co-movement.
- Covariance with itself is the variance: Cov(X, X) = E[X²] − (E[X])² = Var(X). So variance is just the self-covariance: the same formula with Y = X.
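A minimal Python sketch of both forms, assuming a small made-up joint pmf of two 0/1 variables stored as a dict `{(x, y): probability}`. The helper names `expect`, `cov_centered` and `cov_shortcut` are illustrative, not from any library; exact `Fraction` arithmetic lets the two forms be compared with `==`.

```python
from fractions import Fraction

# Hypothetical joint pmf of two 0/1 variables: {(x, y): P(X = x, Y = y)}.
joint = {
    (0, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def expect(pmf, f):
    """E[f(X, Y)]: sum f(x, y) * P(x, y) over the support of the pmf."""
    return sum(f(x, y) * p for (x, y), p in pmf.items())


def cov_centered(pmf):
    """Definition: Cov(X, Y) = E[(X - E[X]) * (Y - E[Y])]."""
    ex = expect(pmf, lambda x, y: x)
    ey = expect(pmf, lambda x, y: y)
    return expect(pmf, lambda x, y: (x - ex) * (y - ey))


def cov_shortcut(pmf):
    """Computing form: Cov(X, Y) = E[XY] - E[X] * E[Y]."""
    ex = expect(pmf, lambda x, y: x)
    ey = expect(pmf, lambda x, y: y)
    return expect(pmf, lambda x, y: x * y) - ex * ey


# Self-covariance: build the pmf of the pair (X, X) from X's marginal.
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p
self_pair = {(x, x): p for x, p in marginal_x.items()}
var_x = expect(self_pair, lambda x, y: x * x) - expect(self_pair, lambda x, y: x) ** 2

print(cov_centered(joint), cov_shortcut(joint))  # 1/8 1/8
print(cov_shortcut(self_pair), var_x)            # 1/4 1/4
```

Both forms give 1/8 (positive: the variables tend to agree), and pairing X with itself reproduces Var(X) = 1/4.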
Correlation — covariance, rescaled to [−1, 1]
Covariance carries the units of X times Y, so its raw size is hard to read.
Dividing by both standard deviations strips the units and bounds it:
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y), with −1 ≤ ρ ≤ 1.
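To see why the rescaling matters, here is a small Python sketch on hypothetical paired data (hours studied vs. exam score, echoing the opening example; the numbers are made up). It treats each pair as equally likely (the population, divide-by-n form) and re-expresses hours as minutes: the covariance scales by 60 while ρ does not change. The helpers `cov` and `corr` are illustrative names.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance E[XY] - E[X]E[Y], each (x, y) pair weighted 1/n."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def corr(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y), using Var(X) = Cov(X, X)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
minutes = [60 * h for h in hours]  # same study time, different unit

print(cov(hours, score), cov(minutes, score))    # covariance grows 60x (up to rounding)
print(corr(hours, score), corr(minutes, score))  # correlation unchanged (up to rounding)
```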
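ρ = +1 means the points lie on a perfect increasing line, ρ = −1 a perfect decreasing
line, and ρ = 0 no linear relationship (a nonlinear one can still exist). The bound
[−1, 1] is guaranteed by the Cauchy–Schwarz inequality, |Cov(X, Y)| ≤ σ_X·σ_Y, so a
correlation outside it is an arithmetic error. ρ is defined only when σ_X and σ_Y are both nonzero.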
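A short Python sketch of the three landmark values, assuming a hypothetical symmetric variable X ∈ {−2, −1, 0, 1, 2} with each value equally likely. An exact line gives ρ = ±1, while Y = X² gives ρ = 0 even though Y is completely determined by X: the "zero covariance does not imply independence" trap from the opening. The helper names are illustrative.

```python
import math
from fractions import Fraction


def cov(xs, ys):
    """Population covariance, each (x, y) pair equally likely: E[XY] - E[X]E[Y]."""
    n = len(xs)
    ex = Fraction(sum(xs), n)
    ey = Fraction(sum(ys), n)
    exy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)
    return exy - ex * ey


def corr(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); needs both variances > 0."""
    return float(cov(xs, ys)) / math.sqrt(float(cov(xs, xs) * cov(ys, ys)))


xs = [-2, -1, 0, 1, 2]
up = [2 * x + 1 for x in xs]    # exact increasing line
down = [5 - 3 * x for x in xs]  # exact decreasing line
sq = [x * x for x in xs]        # nonlinear, fully determined by X

print(corr(xs, up), corr(xs, down), corr(xs, sq))  # 1.0 -1.0 0.0

# Y = X^2 still depends on X: P(X = 2, Y = 0) = 0, but P(X = 2) * P(Y = 0) = 1/25.
n = len(xs)
p_joint = Fraction(sum(1 for x, y in zip(xs, sq) if x == 2 and y == 0), n)
p_x2 = Fraction(xs.count(2), n)
p_y0 = Fraction(sq.count(0), n)
print(p_joint, p_x2 * p_y0)  # 0 1/25
```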
Variance of a sum carries a covariance term
Variances do not simply add unless the cross-term vanishes:
Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y).
Only when Cov(X, Y) = 0 (in particular, when X and Y are independent) does
this collapse to the familiar Var(X) + Var(Y).
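A quick numerical check of the identity, as a Python sketch on a hypothetical joint pmf of two 0/1 variables that tend to disagree (negative covariance). It computes Var(X + Y) directly from the sum and compares it with both the naive sum of variances and the full formula.

```python
from fractions import Fraction

# Hypothetical joint pmf of two 0/1 variables that tend to disagree.
joint = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 8),
}


def expect(f):
    """E[f(X, Y)] under the joint pmf above."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


var_x = expect(lambda x, y: x * x) - expect(lambda x, y: x) ** 2
var_y = expect(lambda x, y: y * y) - expect(lambda x, y: y) ** 2
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
# Variance of the sum, computed directly from S = X + Y.
var_sum = expect(lambda x, y: (x + y) ** 2) - expect(lambda x, y: x + y) ** 2

print(var_x + var_y)               # 1/2: the naive "just add them" answer
print(var_x + var_y + 2 * cov_xy)  # 1/4: with the covariance term
print(var_sum)                     # 1/4: matches the identity
```

Here Cov(X, Y) = −1/8, so the cross-term shrinks the variance of the sum below the naive 1/2.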
How GATE asks this
A NAT (numerical answer type) question that hands you a small experiment, frequently two 0/1 indicator
variables, and asks for Cov(X, Y). The drill never changes: get E[X], E[Y],
and E[XY], then subtract. Because the variables are indicators, XY = 1 only when
both indicators are 1, so E[XY] is just the probability of that joint event —
which is what makes these enumerable in seconds.
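The indicator drill reduces to Cov(1_A, 1_B) = P(A and B) − P(A)·P(B). A minimal Python sketch, assuming a finite sample space of equally likely outcomes; `indicator_cov` and the die events below are hypothetical illustrations, not taken from a GATE paper.

```python
from fractions import Fraction


def indicator_cov(outcomes, in_a, in_b):
    """Cov(1_A, 1_B) = P(A and B) - P(A) * P(B), outcomes equally likely.

    E[1_A] = P(A), and 1_A * 1_B = 1 exactly when both events occur,
    so E[1_A * 1_B] = P(A and B).
    """
    n = len(outcomes)
    p_a = Fraction(sum(1 for w in outcomes if in_a(w)), n)
    p_b = Fraction(sum(1 for w in outcomes if in_b(w)), n)
    p_ab = Fraction(sum(1 for w in outcomes if in_a(w) and in_b(w)), n)
    return p_ab - p_a * p_b


# Hypothetical example: one roll of a fair six-sided die.
die = list(range(1, 7))


def is_even(w):
    return w % 2 == 0


print(indicator_cov(die, is_even, lambda w: w >= 4))  # 1/12
print(indicator_cov(die, is_even, lambda w: w <= 2))  # 0
```

"Even" and "at least 4" overlap more than chance would predict ({4, 6}), giving a positive covariance; "even" and "at most 2" are independent, so their covariance is 0.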
Worked example — a real GATE DA 2024 question
Toss two fair coins. Let
X = 1 if both are heads (else 0), and Y = 1 if at least one is heads (else 0). Find Cov(X, Y).
The four equally likely outcomes are HH, HT, TH, TT, each with probability 1/4.
Step 1 — E[X]. X = 1 only on HH:
E[X] = P(both heads) = 1/4.
Step 2 — E[Y]. Y = 1 on HH, HT, TH (everything except TT):
E[Y] = P(at least one head) = 3/4.
Step 3 — E[XY]. XY = 1 requires X = 1 and Y = 1. But X = 1 (both
heads) already forces Y = 1 (at least one head), so XY = 1 exactly on HH:
E[XY] = P(both heads) = 1/4.
Step 4 — apply the definition.
Cov(X, Y) = E[XY] − E[X]·E[Y]
= 1/4 − (1/4)·(3/4)
= 1/4 − 3/16
= 4/16 − 3/16
= 1/16
= 0.0625.
So Cov(X, Y) = 1/16 = 0.0625 — this is a verified GATE DA 2024 question. The
covariance is positive, which makes sense: X = 1 guarantees Y = 1, so the two
indicators move together.
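The same four steps can be checked by brute-force enumeration. A minimal Python sketch, assuming the four outcomes are equally likely as stated; the function names `x_both_heads` and `y_any_head` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two fair coins: ('H','H'), ('H','T'), ('T','H'), ('T','T').
outcomes = list(product("HT", repeat=2))
p = Fraction(1, len(outcomes))  # each outcome has probability 1/4


def x_both_heads(w):
    return 1 if w.count("H") == 2 else 0


def y_any_head(w):
    return 1 if w.count("H") >= 1 else 0


e_x = sum(x_both_heads(w) * p for w in outcomes)
e_y = sum(y_any_head(w) * p for w in outcomes)
e_xy = sum(x_both_heads(w) * y_any_head(w) * p for w in outcomes)
cov = e_xy - e_x * e_y

print(e_x, e_y, e_xy)   # 1/4 3/4 1/4
print(cov, float(cov))  # 1/16 0.0625
```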
Practice this in an interview
Covariance measures the direction of the linear relationship between two variables and is expressed in the product of their units, making it scale-dependent and hard to interpret across different variable pairs. Correlation normalises covariance by both standard deviations to produce a dimensionless measure bounded between -1 and 1, enabling comparison across pairs.
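Expected value is the probability-weighted average outcome of a random variable; variance measures the average squared deviation from that mean. Expectation is always linear, E[aX + bY] = a·E[X] + b·E[Y], while variances add, Var(X + Y) = Var(X) + Var(Y), only when Cov(X, Y) = 0 (for example, for independent variables); knowing these rules prevents algebraic mistakes under interview pressure.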
Correlation measures the strength of a linear relationship between two variables, but a shared cause, reverse causation, or coincidence can all produce correlation without any causal link. Treating correlation as causation leads to interventions that fail or cause harm.
Pearson correlation measures the strength of the linear relationship between two continuous variables and is sensitive to outliers and non-normality. Spearman correlation is Pearson applied to the ranks of the data, making it appropriate for monotonic (not necessarily linear) relationships, ordinal variables, and data with outliers or heavy-tailed distributions.
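A pure-Python sketch of "Spearman is Pearson on the ranks", assuming tied values receive the average of the ranks they span (the common convention). On a monotonic but nonlinear relationship (y = x³ on hypothetical inputs 1 to 6), Spearman comes out as 1 while Pearson falls short of 1. The helpers `pearson`, `ranks` and `spearman` are illustrative; in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide these.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: co-deviation sum over the product of deviation norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear

print(pearson(xs, ys), spearman(xs, ys))
```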