datarekha

Expectation, Variance & SD

Expectation is the long-run average of a random variable; variance and SD measure how far it spreads. These are the summary numbers that every distribution and ML lesson leans on.

8 min read Intermediate GATE DA Lesson 9 of 122

What you'll learn

  • Expectation E[X] = Σ x·p(x) — the probability-weighted average
  • Linearity: E[aX + b] = aE[X] + b, always
  • Variance Var(X) = E[X²] − (E[X])², and Var(aX + b) = a²·Var(X)
  • Standard deviation SD = √Var, back in the units of X
  • Nonlinearity traps: E[g(X)] ≠ g(E[X]) in general, e.g. E[1/X] ≠ 1/E[X]

Before you start

A whole PMF table is more than anyone wants to carry around. Most of the time you only need two numbers: roughly where the variable sits, and roughly how much it swings. Those are the expectation (the average value if you repeated the experiment forever) and the variance (the average squared distance from that average). Take the square root of variance and you’re back in the original units — the standard deviation.

Pretty much every later lesson (binomial, Poisson, model error, the central limit theorem) comes down to “what’s E[X] and Var(X) for this distribution?” GATE turns that into a NAT (numerical answer type) question almost every year.

The three formulas

For a discrete random variable X with PMF p(x):

  • Expectation (mean): E[X] = ∑ x·p(x), the probability-weighted average
  • Variance (spread): Var(X) = E[X²] − (E[X])², the mean square minus the square of the mean
  • Std deviation: SD = √Var(X), the spread in the units of X
Mean, spread, and spread-in-original-units — the three numbers every distribution reduces to.

Two algebra rules turn these into quick answers without rebuilding sums:

  • Linearity of expectation: E[aX + b] = aE[X] + b. Scaling and shifting the variable scales and shifts its mean, exactly. The same holds for sums, E[X + Y] = E[X] + E[Y], even when X and Y are dependent.
  • Variance under shift and scale: Var(aX + b) = a²·Var(X). A constant shift b moves every value equally, so it changes the mean but not the spread, and b vanishes. A scale a stretches deviations by |a|, and since variance is squared distance it picks up a factor of a².
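Both rules can be checked numerically on any small PMF. The sketch below uses a made-up PMF and arbitrary constants a = −2, b = 5 (illustrative assumptions, not values from the lesson). It builds the PMF of Y = aX + b, computes its mean and variance from scratch, and compares them with the shortcut formulas.

```python
import math

def mean(pmf):
    """E[X] = sum of x * p(x) over a dict {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex2 - mean(pmf) ** 2

# Hypothetical PMF and constants, chosen only for illustration.
pmf_x = {1: 0.5, 2: 0.3, 4: 0.2}
a, b = -2, 5

# PMF of Y = aX + b: same probabilities, transformed values.
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

lhs_mean, rhs_mean = mean(pmf_y), a * mean(pmf_x) + b
lhs_var, rhs_var = variance(pmf_y), a ** 2 * variance(pmf_x)

print(math.isclose(lhs_mean, rhs_mean))  # E[aX + b] vs a*E[X] + b
print(math.isclose(lhs_var, rhs_var))    # Var(aX + b) vs a^2 * Var(X)
```

Note that a negative a still gives a positive variance on the right-hand side, because the factor is a², not a.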

The variance formula itself is worth saying in words: the mean of the squares minus the square of the mean, Var(X) = E[X²] − (E[X])². The two pieces are different computations — E[X²] weights by p(x); (E[X])² squares the single number E[X].
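The shortcut follows from the definition of variance as the average squared distance from the mean, combined with linearity of expectation. Writing μ = E[X] (a symbol introduced here only for brevity), the steps are:

```latex
\begin{aligned}
\operatorname{Var}(X) &= E\big[(X-\mu)^2\big] \\
&= E\big[X^2 - 2\mu X + \mu^2\big] \\
&= E[X^2] - 2\mu\,E[X] + \mu^2 \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - (E[X])^2 .
\end{aligned}
```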

A handy fact for the geometric setting: if each independent trial succeeds with probability p, the expected number of trials until the first success is 1/p. For a fair coin, p = 1/2, so you expect 1/(1/2) = 2 tosses to see the first head.
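One way to see where 1/p comes from is to sum the series directly: the first success lands on trial k with probability (1 − p)^(k−1)·p, so the expected count is Σ k·(1 − p)^(k−1)·p. The sketch below truncates that sum at a cutoff and compares it with 1/p. The function name and the cutoff of 10,000 terms are illustrative choices, not part of the lesson.

```python
import math

def expected_trials(p, max_k=10_000):
    """Approximate E[trials until first success] by a truncated sum.

    P(first success on trial k) = (1 - p)**(k - 1) * p for k = 1, 2, 3, ...
    """
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, max_k + 1))

fair_coin = expected_trials(0.5)
print(math.isclose(fair_coin, 2.0))  # agrees with 1/p = 2 for a fair coin
```

The truncation is harmless here because the terms shrink geometrically; for very small p a larger cutoff would be needed.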

How GATE asks this

Overwhelmingly a NAT: a small PMF (or a fair die / coin) is given and you compute E[X], Var(X), or SD to a few decimals. The reliable route is a two-row table — one row for x·p(x) summing to E[X], one for x²·p(x) summing to E[X²] — then Var = E[X²] − (E[X])². The occasional MCQ tests the identities instead: which of E[aX+b] = aE[X]+b, Var(X+c) = Var(X), E[X²] = (E[X])² are always true.
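The two-row table translates directly into code. A minimal sketch, using a made-up PMF (the values and probabilities are illustrative, not taken from a GATE paper):

```python
import math

# Hypothetical PMF: values and their probabilities (must sum to 1).
xs = [-1, 0, 2]
ps = [0.25, 0.5, 0.25]
assert math.isclose(sum(ps), 1.0)

row_xp = [x * p for x, p in zip(xs, ps)]       # row 1: x * p(x)
row_x2p = [x * x * p for x, p in zip(xs, ps)]  # row 2: x^2 * p(x)

e_x = sum(row_xp)         # E[X]
e_x2 = sum(row_x2p)       # E[X^2]
var_x = e_x2 - e_x ** 2   # Var(X) = E[X^2] - (E[X])^2
sd_x = math.sqrt(var_x)   # SD, back in the units of X

print(f"E[X]={e_x:.4f}  E[X^2]={e_x2:.4f}  Var={var_x:.4f}  SD={sd_x:.4f}")
```

Keeping the two rows as separate lists mirrors the hand calculation and makes it easy to spot which row a slip came from.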

Worked example — a fair six-sided die

A fair die shows 1–6, each with probability 1/6. Find E[X], E[X²], Var(X), and SD.

E[X]  = (1+2+3+4+5+6)/6        = 21/6  = 3.5

E[X²] = (1+4+9+16+25+36)/6     = 91/6  ≈ 15.1667

Var(X) = E[X²] − (E[X])²
       = 91/6 − 3.5²
       = 15.1667 − 12.25
       = 2.9167          (exactly 35/12)

SD = √2.9167 ≈ 1.708

Note the order: square each face and average for E[X²] = 15.1667, then subtract the square of the mean 3.5² = 12.25. Subtracting first or squaring the wrong quantity is the usual slip. As a second mini-example, the expected number of fair-coin tosses to get the first head is 1/(1/2) = 2.
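The die calculation can be reproduced exactly with rational arithmetic, which avoids the rounding in 15.1667 − 12.25. A short sketch using Python's standard `fractions` module (variable names are illustrative):

```python
from fractions import Fraction
import math

faces = range(1, 7)
p = Fraction(1, 6)                     # each face equally likely

e_x = sum(x * p for x in faces)        # 21/6 = 7/2
e_x2 = sum(x * x * p for x in faces)   # 91/6
var_x = e_x2 - e_x ** 2                # 91/6 - 49/4 = 35/12
sd_x = math.sqrt(var_x)                # converted to float for the root

print(e_x, e_x2, var_x, round(sd_x, 3))  # prints: 7/2 91/6 35/12 1.708
```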

Quick check

Q1. A discrete RV X has PMF p(0)=0.2, p(1)=0.5, p(2)=0.3. Compute Var(X). (numerical answer)
Q2. For a fair six-sided die, what is Var(X)? (numerical answer, 3 decimals)
Q3. If Var(X) = 4, what is Var(3X + 7)? (numerical answer)
Q4. Which identities are ALWAYS true for a random variable X and constants a, b? (select all that apply)
Q5. A biased coin lands heads with probability p = 0.25. What is the expected number of independent tosses until the first head? (numerical answer)
Q6. X takes values 1 and 3, each with probability 0.5. Is E[1/X] equal to 1/E[X]?

Practice this in an interview

Define expected value and variance. What are their key properties?

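Expected value is the probability-weighted average outcome of a random variable; variance measures average squared deviation from that mean. Expectation is linear always: E[aX + b] = aE[X] + b and E[X + Y] = E[X] + E[Y]. Variance ignores shifts and scales by a², Var(aX + b) = a²·Var(X), and it adds across variables only when they are independent (or at least uncorrelated). Knowing these rules prevents algebraic mistakes under interview pressure.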

What is the difference between variance and standard deviation, and why do we need both?

Variance is the average squared deviation from the mean; standard deviation is its square root and lives in the same units as the data. Variance is mathematically tractable — variances of independent variables add — while standard deviation is interpretable as a typical distance from the mean.

What is the difference between standard error and standard deviation?

Standard deviation measures the spread of individual observations around the population mean. Standard error measures the spread of sample means around the true mean — it equals the standard deviation divided by the square root of the sample size, so it shrinks as the sample grows while the standard deviation does not.
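As a quick numerical illustration, take the fair-die SD of about 1.708 from the worked example and a few hypothetical sample sizes. The standard error of the sample mean is SD/√n, so it falls as n grows while the SD of a single roll stays fixed. The helper name below is an illustrative choice.

```python
import math

sd = math.sqrt(35 / 12)   # SD of one fair-die roll, about 1.708

def standard_error(sd, n):
    """Standard error of the mean of n independent observations."""
    return sd / math.sqrt(n)

for n in (10, 100, 1000):   # illustrative sample sizes
    print(n, round(standard_error(sd, n), 4))
```

Multiplying the sample size by 100 cuts the standard error by a factor of 10, which is why extra precision gets expensive quickly.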

What makes the Normal distribution so central in statistics, and when does it fail?

The Normal distribution is justified by the Central Limit Theorem: averages of large i.i.d. samples converge to Normal regardless of the underlying distribution, provided that distribution has finite variance. It is fully characterized by mean and variance, enabling closed-form inference. It is a poor model for heavy-tailed data, skewed outcomes, bounded quantities, and rare extreme events.
