What is the difference between covariance and correlation, and when does each matter?

Covariance measures the direction of the linear relationship between two variables and is expressed in the product of their units, making it scale-dependent and hard to interpret across different variable pairs. Correlation normalises covariance by both standard deviations to produce a dimensionless measure bounded between -1 and 1, enabling comparison across pairs.

Define expected value and variance. What are their key properties?

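Expected value is the probability-weighted average outcome of a random variable; variance measures the average squared deviation from that mean. Expectation is linear, E[aX + b] = a·E[X] + b, whereas variance scales quadratically, Var(aX + b) = a²·Var(X), and adds across variables only when they are uncorrelated. Knowing these rules prevents algebraic mistakes under interview pressure.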

What is the difference between correlation and causation, and why does the distinction matter?

Correlation measures the strength of a linear relationship between two variables, but a shared cause, reverse causation, or coincidence can all produce correlation without any causal link. Treating correlation as causation leads to interventions that fail or cause harm.

When would you use Spearman correlation instead of Pearson correlation?

Pearson correlation measures the strength of the linear relationship between two continuous variables and is sensitive to outliers and non-normality. Spearman correlation is Pearson applied to the ranks of the data, making it appropriate for monotonic (not necessarily linear) relationships, ordinal variables, and data with outliers or heavy-tailed distributions.
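The "Pearson applied to the ranks" description can be sketched in a few lines. This is a minimal illustration, not a library implementation: the helper names `pearson`, `ranks`, and `spearman` are hypothetical, the data is made up, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from centred sums of products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks of the values (assumption: no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: a monotonic but clearly non-linear relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r_pearson = pearson(xs, ys)    # below 1: the points do not lie on a line
r_spearman = spearman(xs, ys)  # 1: the ordering is perfectly preserved
```

Because y = x³ preserves the ordering of x exactly, the ranks coincide and Spearman reports a perfect monotonic relationship, while Pearson is pulled below 1 by the curvature.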

Covariance, Correlation & Total Expectation — GATE DA

Plot study hours against marks and the dots tilt up; plot TV hours and they tilt down. That tilt is the single number the last lesson wanted — covariance. Correlation rescales it to [−1, 1], and the catch is that zero covariance does not buy back independence.

9 min read Advanced GATE DA Lesson 15 of 122

The last lesson left us wanting a single number for how much two variables move together. Picture it. Plot a class’s study hours against their exam marks and the dots tilt upward — more study, higher marks. Plot hours of television against the same marks and the dots tilt down. Sometimes the dots are just a shapeless cloud with no tilt at all. Covariance is the one number that captures that tilt, and correlation is the same number polished onto a clean [−1, 1] scale so that “strong” means the same thing whatever the units.

Covariance — co-movement about the means

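Covariance is defined as the average product of the two deviations from their means. The form you compute from is the mean of the product minus the product of the means:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]·E[Y]

Positive when X and Y tend to be large together; negative when one rises as the other falls.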

The sign carries the story. Positive covariance: an above-average X tends to ride with an above-average Y. Negative: they pull in opposite directions. Zero: no linear co-movement. And a tidy special case — set Y = X and the formula becomes Cov(X, X) = E[X²] − (E[X])² = Var(X). Variance is just covariance with itself.
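The "mean of the product minus the product of the means" form is easy to check numerically. The sketch below uses made-up study, TV, and marks figures for five students (all hypothetical) and treats each student as equally likely; `mean` and `cov` are illustrative helper names.

```python
from fractions import Fraction

def mean(values):
    """Average of a list, kept exact with Fraction arithmetic."""
    return sum(values, Fraction(0)) / len(values)

def cov(xs, ys):
    """Population covariance: E[XY] - E[X]*E[Y], each point equally likely."""
    products = [x * y for x, y in zip(xs, ys)]
    return mean(products) - mean(xs) * mean(ys)

# Hypothetical data for five students.
study = [Fraction(v) for v in (1, 2, 3, 4, 5)]       # study hours
tv = [Fraction(v) for v in (5, 4, 3, 2, 1)]          # TV hours
marks = [Fraction(v) for v in (50, 55, 65, 70, 85)]  # exam marks

cov_study_marks = cov(study, marks)  # positive: the dots tilt up
cov_tv_marks = cov(tv, marks)        # negative: the dots tilt down
var_study = cov(study, study)        # Cov(X, X) is Var(X)
```

On this data the study covariance comes out at 17 and the TV covariance at −17, and `cov(study, study)` reproduces the variance of the study hours, 2.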

Correlation — the same number, rescaled

Covariance carries the units of X times Y, so its raw size is hard to read. Dividing by both standard deviations strips the units away and pins it down:

ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y),       with   −1 ≤ ρ ≤ 1

ρ = +1 is a perfect rising line, ρ = −1 a perfect falling one, ρ = 0 no linear relationship. The [−1, 1] bound is guaranteed by the Cauchy–Schwarz inequality, so a correlation outside it is an arithmetic mistake. ρ is defined only when both standard deviations are nonzero.
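One way to see why the rescaling matters is to change the units of X and watch what moves. The sketch below is illustrative only: the data is made up, `mean`, `cov`, and `corr` are hypothetical helper names, and population (not sample) standard deviations are assumed.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Population covariance: E[XY] - E[X]*E[Y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def corr(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y), population standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

# Made-up data: the same study time expressed in hours and in minutes.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
minutes = [60.0 * h for h in hours]
marks = [50.0, 55.0, 65.0, 70.0, 85.0]

cov_hours = cov(hours, marks)      # carries units of hours * marks
cov_minutes = cov(minutes, marks)  # 60 times larger: units changed
rho_hours = corr(hours, marks)     # unit-free
rho_minutes = corr(minutes, marks) # same value as rho_hours

# A perfect rising line gives rho = +1.
rho_line = corr(hours, [2.0 * h + 1.0 for h in hours])
```

Switching from hours to minutes multiplies the covariance by 60 but leaves ρ unchanged, which is exactly what "strong means the same thing whatever the units" requires.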

Variance of a sum carries a covariance term

Variances do not simply add unless the cross-term vanishes:

Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)

Only when Cov(X, Y) = 0 (in particular, when X and Y are independent) does this collapse to the familiar Var(X) + Var(Y).
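The identity can be verified exactly on a small dataset. This is a sketch on made-up paired values (hypothetical), using exact fractions so both sides can be compared with `==`; `mean`, `cov`, and `var` are illustrative helper names.

```python
from fractions import Fraction

def mean(values):
    return sum(values, Fraction(0)) / len(values)

def cov(xs, ys):
    """Population covariance: E[XY] - E[X]*E[Y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def var(xs):
    """Variance as covariance of a variable with itself."""
    return cov(xs, xs)

# Made-up paired observations; each pair is treated as equally likely.
xs = [Fraction(v) for v in (1, 2, 3, 4, 5)]
ys = [Fraction(v) for v in (2, 1, 4, 3, 6)]
sums = [x + y for x, y in zip(xs, ys)]

lhs = var(sums)                             # Var(X + Y) computed directly
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)   # the formula with the cross-term
naive = var(xs) + var(ys)                   # what you get if you forget it
```

Here `lhs` and `rhs` agree exactly, while `naive` falls short by 2·Cov(X, Y) because these two variables move together.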

A worked example — a real GATE DA 2024 question

Toss two fair coins. Let X = 1 if both are heads (else 0), and Y = 1 if at least one is heads (else 0). Find Cov(X, Y).

The four equally likely outcomes are HH, HT, TH, TT. Get the three pieces, then subtract:

E[X]  = P(both heads)        = 1/4
E[Y]  = P(at least one head) = 3/4
E[XY] = P(X=1 and Y=1)       = 1/4     (X=1 already forces Y=1, so XY=1 only on HH)

Cov(X, Y) = E[XY] − E[X]·E[Y]
          = 1/4 − (1/4)·(3/4)
          = 1/4 − 3/16
          = 1/16 = 0.0625

So Cov(X, Y) = 1/16 = 0.0625, a verified GATE DA 2024 answer. It is positive, as expected: X = 1 guarantees Y = 1, so the two indicators rise together. For 0/1 indicators the shortcut always applies: XY equals 1 only when both indicators are 1, so E[XY] is just the probability of that event.
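The same answer falls out of a brute-force enumeration of the four outcomes. A minimal sketch, with `expect`, `X`, and `Y` as illustrative names for the expectation helper and the two indicators:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p = Fraction(1, len(outcomes))

def expect(f):
    """Expected value of f over the equally likely outcomes."""
    return sum(p * f(o) for o in outcomes)

def X(o):
    """Indicator: both tosses are heads."""
    return 1 if o.count("H") == 2 else 0

def Y(o):
    """Indicator: at least one toss is heads."""
    return 1 if o.count("H") >= 1 else 0

e_x = expect(X)                          # P(both heads)
e_y = expect(Y)                          # P(at least one head)
e_xy = expect(lambda o: X(o) * Y(o))     # P(X = 1 and Y = 1)
cov_xy = e_xy - e_x * e_y
```

Enumerating like this is a useful check whenever the indicators are defined on a small sample space and the hand calculation is in doubt.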

A question to carry forward

Every mean and variance so far assumed we knew the whole population. But in practice you almost never do — you have a sample of a few hundred and must reason about the millions behind it. Here is the thread onward: if you average a sample, how close is that average to the true mean, and what shape does the averaging produce?

In one breath

Covariance Cov(X,Y) = E[XY] − E[X]E[Y] = the tilt: positive (together), negative (opposite), zero (no linear co-movement); Cov(X,X) = Var(X).
Correlation ρ = Cov/(σ_X·σ_Y) rescales it to [−1, 1] (outside is an arithmetic error).
Variance of a sum: Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y) — variances add only when Cov = 0.
Independent ⇒ Cov = 0, but Cov = 0 ⇏ independent (Y = X² on symmetric X: dependent yet uncorrelated).
The reliable NAT: two 0/1 indicators, where E[XY] = P(both = 1). 2024: two coins → Cov = 1/16 = 0.0625.
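The Y = X² counterexample in the summary can be checked by direct enumeration. This sketch assumes, purely for illustration, that X is uniform on {−1, 0, 1}; any distribution symmetric about 0 would serve.

```python
from fractions import Fraction

# Assumption for illustration: X is uniform on a set symmetric about 0.
support = [-1, 0, 1]
p = Fraction(1, len(support))

e_x = sum(p * x for x in support)         # E[X] = 0 by symmetry
e_y = sum(p * x ** 2 for x in support)    # E[Y] with Y = X^2
e_xy = sum(p * x ** 3 for x in support)   # E[XY] = E[X^3] = 0 by symmetry
cov_xy = e_xy - e_x * e_y                 # zero covariance

# Independence would need P(X=1, Y=1) = P(X=1) * P(Y=1).
p_joint = sum(p for x in support if x == 1 and x ** 2 == 1)
p_x1 = sum(p for x in support if x == 1)
p_y1 = sum(p for x in support if x ** 2 == 1)
p_product = p_x1 * p_y1
```

The covariance is exactly 0, yet the joint probability 1/3 differs from the product 2/9, so X and Y are uncorrelated but not independent.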

Practice

Quick check

Q1. Recall: what does Cov(X, X) equal?

Q2. Trace: two fair coins. X = 1 if both heads, Y = 1 if at least one head. Compute Cov(X, Y). (4 decimals)

Q3. Trace: Var(X) = 4, Var(Y) = 9, Cov(X, Y) = 2. Find Var(X + Y).

Q4. Apply: X is symmetric about 0 with Y = X², so Cov(X, Y) = 0. Are X and Y independent?

Q5. Apply: which statements are TRUE? (select all that apply)

Q6. Create: using the two-coin X, Y above, both are Bernoulli, so Var(X) = Var(Y) = 3/16. With Cov(X, Y) = 1/16, find the correlation ρ(X, Y). (2 decimals)
