PGDDSA Study · Semester 1

Core Titles

Key headlines and terms for quick recall

Expectation $E[X] = \sum x p(x)$ or $\int x f(x) \, dx$
Linearity $E[aX + bY] = a E[X] + b E[Y]$
Variance $\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
Covariance $\text{Cov}(X, Y) = E[XY] - E[X] E[Y]$
Conditional Expectation $E[X | Y]$
Law of Total Expectation $E[X] = E[E[X | Y]]$
Law of Total Variance $\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])$

Basic Idea

What it is, why it matters, how it works

Expectation (mean)

Centre of mass of a distribution: $E[X] = \begin{cases} \sum_x x \, p(x) & \text{discrete} \\ \int x f(x) \, dx & \text{continuous} \end{cases}$

Law of the Unconscious Statistician (LOTUS). For $Y = g(X)$: $E[g(X)] = \sum_x g(x) \, p(x)$ (discrete) or $\int g(x) f(x) \, dx$ (continuous). There is no need to find the distribution of $Y$ first.
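A minimal sketch of LOTUS in the discrete case. The fair six-sided die and the choice $g(x) = x^2$ are assumptions made for illustration; the helper name `expectation` is hypothetical. The code computes $E[X^2]$ directly from the pmf of $X$, then cross-checks by building the pmf of $Y = X^2$ explicitly.

```python
from fractions import Fraction

# Assumed example distribution: fair six-sided die, p(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * p(x) (LOTUS for a discrete X)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                            # E[X]
second_moment = expectation(pmf, lambda x: x**2)   # E[X^2] via LOTUS

# Cross-check: build the pmf of Y = X^2 explicitly, then take its plain mean.
pmf_y = {}
for x, p in pmf.items():
    pmf_y[x**2] = pmf_y.get(x**2, 0) + p
second_moment_direct = expectation(pmf_y)

print(mean, second_moment, second_moment_direct)
```

Both routes give the same $E[X^2]$; LOTUS simply skips the step of constructing `pmf_y`.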

Linearity (always holds)

$E[aX + bY + c] = aE[X] + bE[Y] + c.$ No independence needed.
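To see that independence really is not needed, here is a small check with two strongly dependent variables. The setup (a fair die $X$ with $Y = X^2$) and the constants `a`, `b`, `c` are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

# X is a fair die and Y = X^2, so Y is fully determined by X (strong dependence).
outcomes = [(x, x**2) for x in range(1, 7)]
p = Fraction(1, 6)  # each (x, y) pair is equally likely

a, b, c = 2, 3, 1  # arbitrary illustrative constants

e_x = sum(p * x for x, _ in outcomes)
e_y = sum(p * y for _, y in outcomes)

lhs = sum(p * (a * x + b * y + c) for x, y in outcomes)  # E[aX + bY + c]
rhs = a * e_x + b * e_y + c                              # aE[X] + bE[Y] + c

print(lhs, rhs)
```

The two sides agree exactly, even though $X$ and $Y$ are as dependent as they can be.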

Variance

$\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2.$ Spread around the mean. Standard deviation $\sigma = \sqrt{\text{Var}(X)}$ .

Properties.

$\text{Var}(aX + b) = a^2 \text{Var}(X)$
$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2 \text{Cov}(X, Y)$
If $X \perp Y$ (independent), then $\text{Cov}(X, Y) = 0$ and $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
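The variance properties can be checked exactly on a small joint distribution. The joint pmf below is a hypothetical example of two dependent 0/1 variables, and the helpers `E` and `var` are names introduced here for the sketch.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of two dependent 0/1 variables (illustrative values).
joint = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def var(g):
    """Var(g(X, Y)) = E[g^2] - (E[g])^2."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov_xy = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

var_sum = var(lambda x, y: x + y)          # Var(X + Y), computed directly
var_affine = var(lambda x, y: 3 * x + 5)   # Var(3X + 5), computed directly

print(var_sum, var_x + var_y + 2 * cov_xy)  # the two values should match
print(var_affine, 9 * var_x)                # shift by 5 drops out, scale 3 is squared
```

Because the covariance here is positive, $\text{Var}(X + Y)$ is larger than $\text{Var}(X) + \text{Var}(Y)$; dropping the covariance term would understate the spread.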

Covariance

$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y].$ Symmetric, bilinear. Correlation $\rho = \text{Cov}(X,Y) / (\sigma_X \sigma_Y) \in [-1, 1]$ .
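A short sketch of covariance and correlation on equally likely data points. The two data sets are assumptions chosen for illustration: an exact decreasing line, where $\rho$ should sit at the lower bound, and $Y = X^2$ on a symmetric range, where the covariance is zero even though $Y$ is a function of $X$. The helper names `cov` and `corr` are hypothetical.

```python
from fractions import Fraction as F
import math

def cov(pairs):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for equally likely (x, y) pairs."""
    n = len(pairs)
    e_x = sum(F(x) for x, _ in pairs) / n
    e_y = sum(F(y) for _, y in pairs) / n
    e_xy = sum(F(x * y) for x, y in pairs) / n
    return e_xy - e_x * e_y

def corr(pairs):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); requires both variances > 0."""
    var_x = cov([(x, x) for x, _ in pairs])   # Cov(X, X) = Var(X)
    var_y = cov([(y, y) for _, y in pairs])   # Cov(Y, Y) = Var(Y)
    return float(cov(pairs)) / math.sqrt(var_x * var_y)

xs = [-1, 0, 1]
linear = [(x, -2 * x + 1) for x in xs]   # Y = -2X + 1: exact decreasing line
square = [(x, x * x) for x in xs]        # Y = X^2: depends on X, but not linearly

print(corr(linear))   # close to -1, up to floating-point rounding
print(cov(square))    # exactly zero
```

The second case is a reminder that covariance only measures linear association: zero covariance does not imply independence.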

Conditional expectation

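$E[X | Y = y] = \int x \, f_{X|Y}(x | y) \, dx$ (a sum over $x$ in the discrete case). For fixed $y$ this is a number, so it defines a function $h(y)$. Plugging in the random $Y$ gives the random variable $E[X | Y] = h(Y)$.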

Law of Total Expectation (Adam's law)

$E[X] = E\big[ E[X | Y] \big].$ Average the conditional means over $Y$ .

Law of Total Variance (Eve's law)

$\text{Var}(X) = E\big[ \text{Var}(X | Y) \big] + \text{Var}\big( E[X | Y] \big).$ Within-group variance + between-group variance.
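Adam's and Eve's laws can both be verified exactly on a small two-group example. Everything in the setup is a hypothetical illustration: the group labels `"A"` and `"B"`, their probabilities, and the within-group distributions of $X$.

```python
from fractions import Fraction as F

# Hypothetical setup: Y picks a group, then X is drawn within that group.
p_y = {"A": F(1, 4), "B": F(3, 4)}
p_x_given_y = {
    "A": {0: F(1, 2), 4: F(1, 2)},
    "B": {4: F(1, 3), 5: F(1, 3), 6: F(1, 3)},
}

def mean(pmf):
    return sum(p * x for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum(p * (x - m) ** 2 for x, p in pmf.items())

cond_mean = {y: mean(d) for y, d in p_x_given_y.items()}      # E[X | Y = y]
cond_var = {y: variance(d) for y, d in p_x_given_y.items()}   # Var(X | Y = y)

# Marginal pmf of X, used only for the direct comparison.
marginal = {}
for y, d in p_x_given_y.items():
    for x, p in d.items():
        marginal[x] = marginal.get(x, 0) + p_y[y] * p

# Adam's law: E[X] = E[E[X | Y]]
adam = sum(p_y[y] * cond_mean[y] for y in p_y)

# Eve's law: Var(X) = E[Var(X | Y)] + Var(E[X | Y])
within = sum(p_y[y] * cond_var[y] for y in p_y)
between = sum(p_y[y] * (cond_mean[y] - adam) ** 2 for y in p_y)

print(adam, mean(marginal))
print(within + between, variance(marginal))
```

Here `within` is the average spread inside each group and `between` is the spread of the group means; neither alone reproduces $\text{Var}(X)$.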

Why this matters in Data Science

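Mean and variance summarise distributions. The bias/variance trade-off is stated in exactly these terms: expected squared error splits into squared bias, variance, and irreducible noise. The conditional expectation $E[X | Y]$ is the optimal predictor of $X$ from $Y$ under squared loss, which is the foundation of regression.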
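
A short derivation of the squared-loss claim, as a sketch. The predictor $g$ and the shorthand $m(Y) = E[X | Y]$ are notation introduced here, and finite second moments are assumed. For any predictor $g(Y)$:

$E[(X - g(Y))^2] = E[(X - m(Y))^2] + 2 E[(X - m(Y))(m(Y) - g(Y))] + E[(m(Y) - g(Y))^2].$

Conditioning on $Y$ and applying the law of total expectation, the cross term vanishes because $E[X - m(Y) | Y] = 0$. Hence

$E[(X - g(Y))^2] = E[\text{Var}(X | Y)] + E[(m(Y) - g(Y))^2] \ge E[\text{Var}(X | Y)],$

with equality when $g(Y) = m(Y)$. No predictor based on $Y$ can beat the conditional mean, and the error that remains is the within-group variance.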

Mind Map

Visual structure of the concept

EXPECTATION & MOMENTS
├── E[X] centre of mass
├── LOTUS: E[g(X)] without finding dist of g(X)
├── Linearity (always): E[aX + bY] = aE[X] + bE[Y]
├── Var(X) = E[X²] − (E[X])²
│   ├── Var(aX + b) = a² Var(X)
│   └── Var(X + Y) = Var(X) + Var(Y) + 2Cov
├── Cov(X, Y) = E[XY] − E[X]E[Y]
├── Conditional Expectation E[X|Y]
├── Total Expectation: E[X] = E[E[X|Y]]
└── Total Variance: Var(X) = E[Var(X|Y)] + Var(E[X|Y])

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. State linearity of expectation. $E[aX + bY] = aE[X] + bE[Y]$ , regardless of independence.

Q2. Define variance. $\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$ .

Q3. Define covariance. $\text{Cov}(X, Y) = E[XY] - E[X] E[Y]$ .

Q4. State the law of total expectation. $E[X] = E[E[X | Y]]$ .

Part B (20 marks)

Q. Derive the formula $\text{Var}(X) = E[X^2] - (E[X])^2$ . State and prove the linearity of expectation. State the laws of total expectation and total variance. Compute mean and variance of $X \sim \text{Binomial}(n, p)$ using linearity.

Variance identity. $\text{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2.$ Since $E[X] = \mu$ : $\text{Var}(X) = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2 = E[X^2] - (E[X])^2. \quad \blacksquare$

Linearity of expectation. Theorem. For RVs $X, Y$ and scalars $a, b$ : $E[aX + bY] = aE[X] + bE[Y]$ .

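Proof (continuous). Let $f(x, y)$ be the joint density. $E[aX + bY] = \iint (ax + by) f(x, y) \, dx \, dy = a \iint x f(x, y) \, dx \, dy + b \iint y f(x, y) \, dx \, dy = a E[X] + b E[Y]$, where the last step integrates out the other variable to leave the marginal density.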

(Discrete case identical with sums.) No independence required. ∎

Total expectation. $E[X] = E[E[X | Y]]$ .

Total variance. $\text{Var}(X) = E[\text{Var}(X | Y)] + \text{Var}(E[X | Y])$ . (Within-group variance + between-group variance.)
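
If a justification is wanted, a sketch of why the total variance law holds, using only the variance identity and the law of total expectation (applied to both $X$ and $X^2$):

$\text{Var}(X) = E[X^2] - (E[X])^2 = E\big[ E[X^2 | Y] \big] - \big( E[E[X | Y]] \big)^2.$

Since $E[X^2 | Y] = \text{Var}(X | Y) + (E[X | Y])^2$:

$\text{Var}(X) = E[\text{Var}(X | Y)] + \Big( E\big[ (E[X | Y])^2 \big] - \big( E[E[X | Y]] \big)^2 \Big) = E[\text{Var}(X | Y)] + \text{Var}(E[X | Y]).$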

Binomial mean and variance via linearity.

Write $X = X_1 + X_2 + \dots + X_n$ where $X_i \sim \text{Bernoulli}(p)$ are independent.

Mean. $E[X_i] = p$ . By linearity: $E[X] = \sum E[X_i] = np$ .

Variance. $\text{Var}(X_i) = p(1-p)$ . By independence: $\text{Var}(X) = \sum \text{Var}(X_i) = np(1-p)$ .

So $\text{Binomial}(n, p)$ has mean $np$ and variance $np(1-p)$ . ✓

Sanity check. If $p = 0$ or $p = 1$ , variance is 0 (deterministic). Variance is maximised at $p = 1/2$ . Intuitively: the most "random" Bernoulli is the fair coin.
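
The closed forms can be cross-checked against the binomial pmf itself. This is a minimal sketch: the values `n = 10` and `p = 3/10`, the grid of candidate `p` values, and the helper names are all illustrative assumptions.

```python
from fractions import Fraction as F
from math import comb

def binomial_pmf(n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def mean_and_variance(pmf):
    """Mean and variance computed straight from the definitions."""
    m = sum(k * q for k, q in pmf.items())
    v = sum((k - m) ** 2 * q for k, q in pmf.items())
    return m, v

n, p = 10, F(3, 10)   # illustrative values, not from the notes
m, v = mean_and_variance(binomial_pmf(n, p))
print(m, n * p)              # mean from the pmf vs. np
print(v, n * p * (1 - p))    # variance from the pmf vs. np(1 - p)

# Evaluate np(1 - p) on a grid of p values and find where it is largest.
grid = [F(i, 20) for i in range(21)]
best_p = max(grid, key=lambda q: n * q * (1 - q))
print(best_p)
```

The pmf-based values match $np$ and $np(1-p)$ exactly, and the grid search lands on $p = 1/2$, where the variance equals $n/4$.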

Expectation: Mean, Variance, Covariance, Conditional Expectation