datarekha
Statistics & Probability | Easy | Asked at Google, Meta, Two Sigma

What is the difference between covariance and correlation, and when does each matter?

The short answer

Covariance measures the direction of the linear relationship between two variables and is expressed in the product of their units, making it scale-dependent and hard to interpret across different variable pairs. Correlation normalises covariance by both standard deviations to produce a dimensionless measure bounded between -1 and 1, enabling comparison across pairs.

How to think about it

State the formulas, explain why normalisation matters, then mention where covariance (not correlation) is the natural quantity — this surprises many interviewers and demonstrates depth.

Covariance

Cov(X, Y) = E[ (X - mu_X)(Y - mu_Y) ]

Sample estimate: Cov(X,Y) = (1/(n-1)) * sum( (x_i - x_bar)(y_i - y_bar) )

Sign indicates direction: positive means the variables tend to move together; negative means they tend to move in opposite directions. Magnitude depends on the units of X and Y: converting height from metres to centimetres multiplies its covariance with any other variable by 100 (and by 10,000 if both variables are rescaled this way), even though the underlying relationship is unchanged. This makes raw covariance unsuitable for comparing the strength of relationships across different variable pairs.
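To make the unit dependence concrete, here is a minimal sketch of the sample estimate above in plain Python. The helper name `sample_cov` and the height/weight numbers are made up for illustration; they are not taken from any real dataset.

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance, using the 1/(n-1) divisor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: heights in metres, weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 70.0, 72.0, 85.0]

# Same people, same relationship, but heights expressed in centimetres.
height_cm = [100 * h for h in height_m]

cov_m = sample_cov(height_m, weight_kg)    # units: metre * kilogram
cov_cm = sample_cov(height_cm, weight_kg)  # units: centimetre * kilogram

ratio = cov_cm / cov_m  # approximately 100: only the units changed
print(cov_m, cov_cm, ratio)
```

The sign of both covariances is the same; only the magnitude moves with the choice of units.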

Pearson Correlation

r = Cov(X, Y) / (SD(X) * SD(Y))

Correlation divides covariance by the product of the two standard deviations, yielding a dimensionless number in [-1, 1]. |r| = 1 means a perfect linear relationship; r = 0 means no linear association (not necessarily no association at all).
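The two properties above (unit invariance, and r = 0 not implying independence) can be checked with a small sketch. The function name `pearson_r` and all data are illustrative assumptions. The 1/(n-1) factors in the covariance and the standard deviations cancel, so the code works directly with sums of deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in metres, weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 70.0, 72.0, 85.0]
height_cm = [100 * h for h in height_m]

# Rescaling a variable leaves r unchanged (up to floating-point error).
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

# An exact linear relationship y = 3x + 2 gives r = 1.
xs = [1, 2, 3, 4, 5]
r_linear = pearson_r(xs, [3 * x + 2 for x in xs])

# y = x^2 on a symmetric range: y is fully determined by x, yet r = 0.
sym = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(sym, [x * x for x in sym])

print(r_m, r_cm, r_linear, r_parabola)
```

The parabola case is the standard reminder that Pearson's r only detects linear association.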

When covariance is the right quantity

  • Portfolio variance: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y). The covariance matrix, not the correlation matrix, drives portfolio optimisation (Markowitz), because portfolio risk depends on the absolute magnitude of each asset's variability and co-movement, not only on how tightly the assets are linearly related.
  • Multivariate Gaussian: The distribution is parameterised by the covariance matrix Σ, not by correlations alone.
  • PCA: Principal components are eigenvectors of the covariance matrix (or correlation matrix if variables are standardised — these give different results).
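As a quick numerical check of the portfolio-variance identity in the first bullet, the sketch below computes the variance of a weighted combination directly and via the formula. The return series, the weights `a` and `b`, and the helper `sample_cov` are all hypothetical values chosen for illustration.

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance; sample_cov(xs, xs) is the sample variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Made-up periodic returns for two assets X and Y.
returns_x = [0.02, -0.01, 0.03, 0.00, 0.01]
returns_y = [0.01, 0.02, -0.02, 0.01, 0.00]
a, b = 0.6, 0.4  # illustrative portfolio weights

# Variance of the combined series aX + bY, computed directly.
portfolio = [a * x + b * y for x, y in zip(returns_x, returns_y)]
direct = sample_cov(portfolio, portfolio)

# The same quantity from variances and the covariance.
via_formula = (
    a ** 2 * sample_cov(returns_x, returns_x)
    + b ** 2 * sample_cov(returns_y, returns_y)
    + 2 * a * b * sample_cov(returns_x, returns_y)
)

print(direct, via_formula)  # agree up to floating-point error
```

Note that the formula needs Cov(X,Y) itself: a correlation alone would have to be multiplied back by both standard deviations before it could be used here.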

Covariance matrix vs correlation matrix

The covariance matrix has variances on its diagonal. The correlation matrix has 1s on the diagonal. To convert, divide each entry (i, j) of the covariance matrix by the product of the standard deviations of variables i and j: the diagonal entries become 1 and the off-diagonal entries become correlations. Standardising your data before PCA implicitly switches you from covariance-PCA to correlation-PCA, which is appropriate when variables are on very different scales.
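The conversion from a covariance matrix to a correlation matrix can be sketched in a few lines. The function name `cov_to_corr` and the 2x2 example matrix are assumptions for illustration; the matrix is stored as a plain list of lists. Applying the division to every entry, including the diagonal, is what produces the 1s there.

```python
import math


def cov_to_corr(cov):
    """Divide entry (i, j) by SD_i * SD_j, where SD_i = sqrt(cov[i][i])."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


# Hypothetical covariance matrix: Var(X) = 4, Var(Y) = 9, Cov(X, Y) = 3.
cov = [
    [4.0, 3.0],
    [3.0, 9.0],
]

corr = cov_to_corr(cov)
print(corr)  # [[1.0, 0.5], [0.5, 1.0]]
```

Here the standard deviations are 2 and 3, so the off-diagonal correlation is 3 / (2 * 3) = 0.5.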
