PGDDSA Study · Semester 1

Core Titles

Key headlines and terms for quick recall

Random variable (RV) $X : \Omega \to \mathbb{R}$
Discrete vs Continuous RV
Probability mass function (PMF) $p(x) = P(X = x)$
Probability density function (PDF) $f(x)$
Cumulative distribution function (CDF) $F(x) = P(X \le x)$
Standard distributions: Bernoulli, Binomial, Poisson, Uniform, Exponential, Normal

Basic Idea

What it is, why it matters, how it works

Random variable

An RV is a function $X: \Omega \to \mathbb{R}$ that assigns a real number to each outcome.

Discrete — takes countably many values (coin flips, counts).
Continuous — takes values in an interval (time, measurements).
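
Sketch. To make the definition concrete, the Python snippet below treats an RV literally as a function on a sample space. The experiment (two fair coin flips) and the names `omega`, `prob`, `X` and `dist` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: HH, HT, TH, TT, each with probability 1/4.
omega = list(product("HT", repeat=2))
prob = {outcome: Fraction(1, 4) for outcome in omega}

def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# Probability of each value of X: add up the outcomes that map to that value.
dist = {}
for outcome in omega:
    value = X(outcome)
    dist[value] = dist.get(value, Fraction(0)) + prob[outcome]

# P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4
for value in sorted(dist):
    print(f"P(X={value}) = {dist[value]}")
```

X takes only the values 0, 1, 2, so it is a discrete RV.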

Discrete RV — PMF

$p(x) = P(X = x), \quad \sum_x p(x) = 1.$

Examples.

Bernoulli( $p$ ): $X \in \{0, 1\}$ ; $P(X=1) = p$ .
Binomial( $n, p$ ): number of successes in $n$ Bernoulli trials. $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ .
Poisson( $\lambda$ ): $P(X=k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$ — rare event counts.
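
Sketch. The three PMFs above can be written directly with the Python standard library. The function names and the parameter values ($n = 10$, $p = 0.3$, $\lambda = 2$) are arbitrary choices for illustration. The Poisson sum is truncated at $k = 59$, which assumes the remaining tail is negligible for this $\lambda$.

```python
from math import comb, exp, factorial

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); zero outside {0, 1}."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), k = 0, ..., n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), k = 0, 1, 2, ..."""
    return exp(-lam) * lam**k / factorial(k)

n, p, lam = 10, 0.3, 2.0

# Each PMF should sum to 1 over its support.
bernoulli_total = bernoulli_pmf(0, p) + bernoulli_pmf(1, p)
binomial_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
poisson_total = sum(poisson_pmf(k, lam) for k in range(60))  # truncated infinite sum

print(bernoulli_total, binomial_total, poisson_total)
```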

Continuous RV — PDF

$f(x) \ge 0$ , $\int_{-\infty}^\infty f(x) \, dx = 1$ . Probabilities come from integrating: $P(a \le X \le b) = \int_a^b f(x) \, dx.$ Important: $P(X = c) = 0$ for any single point.

Examples.

Uniform( $a, b$ ): $f(x) = \dfrac{1}{b-a}$ on $[a, b]$ , and $0$ elsewhere.
Exponential( $\lambda$ ): $f(x) = \lambda e^{-\lambda x}, x \ge 0$ — waiting times.
Normal( $\mu, \sigma^2$ ): $f(x) = \dfrac{1}{\sqrt{2\pi}\sigma} \exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$ .
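
Sketch. Probabilities for a continuous RV are areas under the PDF, which can be approximated numerically. The helper `integrate` below is a plain midpoint rule written for this illustration (not a library function), and the parameter values are arbitrary assumptions.

```python
from math import exp, pi, sqrt

def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    return lam * exp(-lam * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def std_normal(x):
    return normal_pdf(x, 0.0, 1.0)

# Total area under the standard normal PDF (range truncated to [-10, 10]).
total_normal = integrate(std_normal, -10.0, 10.0)

# P(-1 <= X <= 1) for a standard normal, about 0.6827.
prob_within_1sd = integrate(std_normal, -1.0, 1.0)

# P(0 <= X <= 1) for Exponential(2); the exact value is 1 - e^{-2}.
prob_exp = integrate(lambda x: exponential_pdf(x, 2.0), 0.0, 1.0)

# P(1 <= X <= 2) for Uniform(0, 4); the exact value is 1/4.
prob_unif = integrate(lambda x: uniform_pdf(x, 0.0, 4.0), 1.0, 2.0)

# A single point has zero width, so P(X = 0.5) is 0.
point_prob = integrate(std_normal, 0.5, 0.5)

print(total_normal, prob_within_1sd, prob_exp, prob_unif, point_prob)
```

The last line of the computation illustrates why $P(X = c) = 0$ : the integral over an interval of zero width is zero, even though $f(c) > 0$ .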

CDF

$F(x) = P(X \le x).$ Defined for both discrete and continuous RVs. It is non-decreasing and right-continuous, with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$ .

For continuous $X$ : $F'(x) = f(x)$ at every point where $f$ is continuous.
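
Sketch. For a discrete RV the CDF is a running sum of the PMF. For a continuous RV, differentiating the CDF recovers the PDF. The snippet below checks both numerically. It assumes Binomial(4, 0.5) and Exponential(2) as examples and uses the standard closed form $F(x) = 1 - e^{-\lambda x}$ for the exponential CDF, which is not stated in the notes.

```python
from itertools import accumulate
from math import comb, exp

# Discrete case: CDF of Binomial(4, 0.5) as cumulative sums of the PMF.
n, p = 4, 0.5
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))  # cdf[k] = P(X <= k)
# cdf is [0.0625, 0.3125, 0.6875, 0.9375, 1.0]: non-decreasing and ending at 1.

# Continuous case: Exponential(lam) with F(x) = 1 - exp(-lam * x) for x >= 0.
lam = 2.0

def exp_cdf(x):
    return 1 - exp(-lam * x) if x >= 0 else 0.0

def exp_pdf(x):
    return lam * exp(-lam * x) if x >= 0 else 0.0

# Central difference approximates F'(x); it should match f(x).
x, h = 0.7, 1e-5
numeric_derivative = (exp_cdf(x + h) - exp_cdf(x - h)) / (2 * h)

print(cdf)
print(numeric_derivative, exp_pdf(x))
```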

Why this matters in Data Science

Observed data are usually modeled as realizations of RVs. The distribution assumed for them fixes the likelihood maximized in MLE, the null distribution used in hypothesis testing, and the sampling step of a generative model.

Mind Map

Visual structure of the concept

RANDOM VARIABLES
├── X : Ω → ℝ
├── DISCRETE
│   ├── PMF p(x) = P(X=x)
│   ├── Σ p(x) = 1
│   └── Bernoulli, Binomial, Poisson, Geometric
├── CONTINUOUS
│   ├── PDF f(x) ≥ 0, ∫f = 1
│   ├── P(a≤X≤b) = ∫ₐᵇ f dx
│   └── Uniform, Exponential, Normal
└── CDF F(x) = P(X ≤ x)
    ├── Non-decreasing
    ├── Limits 0 and 1
    └── Continuous: F'(x) = f(x)

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define a random variable. A measurable function $X : \Omega \to \mathbb{R}$ that assigns a real number to each outcome.

Q2. State the PMF of $\text{Binomial}(n, p)$ . $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \; k = 0, 1, \dots, n$ .

Q3. State the PDF of $\text{Exponential}(\lambda)$ . $f(x) = \lambda e^{-\lambda x}, \; x \ge 0$ .

Q4. Define CDF and state two properties. $F(x) = P(X \le x)$ . It is non-decreasing and right-continuous, with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$ .

Part B (20 marks)

Q. Differentiate between discrete and continuous random variables. Discuss the Bernoulli, Binomial, Poisson, Uniform, Exponential and Normal distributions with their PMFs/PDFs, means and variances. Verify that the binomial mean is $np$ .

Discrete vs continuous.

Discrete	Continuous
Countable values	Uncountable, interval
PMF $p(x) = P(X=x)$	PDF $f(x) \ge 0$
$P(X=c)$ can be positive	$P(X=c) = 0$
Sum: $\sum_x p(x) = 1$	Integral: $\int_{-\infty}^\infty f(x) \, dx = 1$
CDF is a step function	CDF is continuous

Standard distributions.

Name	PMF / PDF	Mean	Variance	Use
Bernoulli( $p$ )	$P(X=1)=p, \; P(X=0)=1-p$	$p$	$p(1-p)$	single trial
Binomial( $n, p$ )	$\binom{n}{k}p^k(1-p)^{n-k}, \; k = 0, \dots, n$	$np$	$np(1-p)$	# successes in $n$ trials
Poisson( $\lambda$ )	$e^{-\lambda}\lambda^k / k!, \; k = 0, 1, 2, \dots$	$\lambda$	$\lambda$	rare events
Uniform( $a, b$ )	$\dfrac{1}{b-a}$ on $[a,b]$	$\dfrac{a+b}{2}$	$\dfrac{(b-a)^2}{12}$	equal-likelihood
Exponential( $\lambda$ )	$\lambda e^{-\lambda x}, \; x \ge 0$	$1/\lambda$	$1/\lambda^2$	waiting time
Normal( $\mu, \sigma^2$ )	$\dfrac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$	$\mu$	$\sigma^2$	natural variation
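
Sketch. The mean and variance columns can be sanity-checked by simulation. The snippet below draws samples for the three continuous rows with Python's `random` module and compares sample moments with the table formulas. The seed, sample size and parameter values are arbitrary assumptions, and agreement is only approximate because the estimates carry sampling noise.

```python
import random

def sample_mean_var(samples):
    """Sample mean and (population-style) sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

rng = random.Random(0)  # fixed seed so the run is reproducible
N = 200_000

a, b = 2.0, 6.0        # Uniform(a, b)
lam = 2.0              # Exponential(lam)
mu, sigma = 1.0, 2.0   # Normal(mu, sigma^2)

# name -> (samples, mean from the table, variance from the table)
checks = {
    "Uniform": ([rng.uniform(a, b) for _ in range(N)], (a + b) / 2, (b - a) ** 2 / 12),
    "Exponential": ([rng.expovariate(lam) for _ in range(N)], 1 / lam, 1 / lam**2),
    "Normal": ([rng.gauss(mu, sigma) for _ in range(N)], mu, sigma**2),
}

results = {}
for name, (samples, true_mean, true_var) in checks.items():
    est_mean, est_var = sample_mean_var(samples)
    results[name] = (est_mean, true_mean, est_var, true_var)
    print(f"{name}: mean {est_mean:.3f} vs {true_mean:.3f}, "
          f"variance {est_var:.3f} vs {true_var:.3f}")
```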

Derivation of binomial mean.

A binomial RV can be written as $X = \sum_{i=1}^n X_i$ , where $X_1, \dots, X_n$ are independent $\text{Bernoulli}(p)$ RVs.

$E[X_i] = 0 \cdot (1-p) + 1 \cdot p = p$ .

By linearity of expectation (which does not require independence): $E[X] = E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i] = np.$

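Direct calculation also works. The $k = 0$ term vanishes, and for $k \ge 1$ we have $k \binom{n}{k} = n \binom{n-1}{k-1}$ , so $E[X] = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^n \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np \, (p + (1-p))^{n-1} = np.$ The remaining sum equals 1 by the binomial theorem, since $n - k = (n-1) - (k-1)$ .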
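
Sketch. The direct sum can be checked exactly with rational arithmetic. The snippet below evaluates $\sum_k k \, P(X = k)$ using `fractions.Fraction` and also checks the identity $k \binom{n}{k} = n \binom{n-1}{k-1}$ that the index shift relies on. The values $n = 12$ and $p = 3/10$ are arbitrary assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """E[X] computed directly as the sum of k * P(X = k) over k = 0, ..., n."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

n, p = 12, Fraction(3, 10)

mean = binomial_mean(n, p)

# Identity behind the index shift: k * C(n, k) = n * C(n-1, k-1) for k >= 1.
identity_holds = all(
    k * comb(n, k) == n * comb(n - 1, k - 1) for k in range(1, n + 1)
)

print(mean, n * p, identity_holds)
```

Because `Fraction` arithmetic is exact, `mean == n * p` holds with no rounding tolerance.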

Discrete and Continuous Random Variables