PGD01C01
Module 5 · Probability Theory

Probability Spaces, Conditional Probability and Independence

Core Titles
Key headlines and terms for quick recall
  • Sample space $\Omega$, event $A \subseteq \Omega$
  • Probability axioms (Kolmogorov): $P(A) \ge 0$, $P(\Omega) = 1$, $\sigma$-additivity
  • Conditional probability $P(A | B) = \dfrac{P(A \cap B)}{P(B)}$
  • Independence $P(A \cap B) = P(A) P(B)$
  • Law of Total Probability
  • Bayes' theorem $P(A | B) = \dfrac{P(B | A) P(A)}{P(B)}$
Basic Idea
What it is, why it matters, how it works

Probability space

A probability model has three pieces:

  • Sample space $\Omega$: all possible outcomes.
  • $\sigma$-algebra $\mathcal{F}$: the events we measure (subsets of $\Omega$).
  • Probability measure $P$: a function $\mathcal{F} \to [0, 1]$ satisfying Kolmogorov's axioms.

Axioms.

  1. $P(A) \ge 0$ for every event $A$.
  2. $P(\Omega) = 1$.
  3. $\sigma$-additivity: for disjoint events $A_1, A_2, \dots$, $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.

Useful consequences.

  • $P(A^c) = 1 - P(A)$
  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • $P(\emptyset) = 0$
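These consequences can be checked by hand on a small finite model. The sketch below assumes a hypothetical example that is not part of the notes: one roll of a fair six-sided die with the uniform measure, where `prob`, `A` and `B` are illustrative names.

```python
from fractions import Fraction

# Assumed model (illustration only): one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure on OMEGA: P(A) = |A| / |OMEGA|."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # "roll is even"
B = frozenset({4, 5, 6})   # "roll is at least 4"

# Each flag checks one of the three consequences listed above.
complement_ok = prob(OMEGA - A) == 1 - prob(A)
union_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)
empty_ok = prob(frozenset()) == 0
```

Using `Fraction` rather than floats keeps the equalities exact, so the checks are not affected by rounding.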

Conditional probability

If $B$ has occurred, what is the probability of $A$? $P(A | B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0.$
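As a minimal sketch of the definition, assume a fair six-sided die (a made-up example; `prob` and `cond_prob` are hypothetical helper names). Conditioning on $B$ amounts to restricting attention to the outcomes in $B$ and renormalising.

```python
from fractions import Fraction

# Assumed model (illustration only): one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = frozenset({2, 4, 6})   # "roll is even"
B = frozenset({4, 5, 6})   # "roll is at least 4"

# A and B share {4, 6}, so P(A | B) = (2/6) / (3/6) = 2/3.
p_a_given_b = cond_prob(A, B)
```

Here $P(A) = 1/2$ but $P(A | B) = 2/3$, so learning that the roll is at least 4 raises the probability that it is even.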

Independence

$A$ and $B$ are independent iff $P(A \cap B) = P(A) \, P(B)$. Equivalently, when $P(B) > 0$, $P(A | B) = P(A)$: knowing $B$ doesn't change the probability of $A$.
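The product rule can be tested directly on a finite model. The sketch below assumes a fair six-sided die (a hypothetical example, with illustrative event names) and shows one independent pair and one dependent pair.

```python
from fractions import Fraction

# Assumed model (illustration only): one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(a, b):
    """True iff P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

EVEN = frozenset({2, 4, 6})
AT_MOST_2 = frozenset({1, 2})
AT_LEAST_4 = frozenset({4, 5, 6})

# P(EVEN and AT_MOST_2) = 1/6 and P(EVEN) * P(AT_MOST_2) = 1/2 * 1/3 = 1/6.
even_vs_at_most_2 = independent(EVEN, AT_MOST_2)
# P(EVEN and AT_LEAST_4) = 1/3 but P(EVEN) * P(AT_LEAST_4) = 1/2 * 1/2 = 1/4.
even_vs_at_least_4 = independent(EVEN, AT_LEAST_4)
```

Independence is a property of the probabilities, not of the sets: both pairs of events overlap, yet only the first pair satisfies the product rule.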

Law of Total Probability

If $\{B_1, \dots, B_n\}$ partitions $\Omega$ with each $P(B_i) > 0$: $P(A) = \sum_{i=1}^n P(A | B_i) \, P(B_i).$
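A short numeric sketch of the formula follows. The numbers are invented for illustration: three machines $B_1, B_2, B_3$ produce 50%, 30% and 20% of all items, with defect rates of 1%, 2% and 3%, and $A$ is the event "item is defective".

```python
from fractions import Fraction

# Hypothetical data (not from the notes): production shares and defect rates.
priors = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]    # P(B_i)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]  # P(A | B_i)

def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(A | B_i) * P(B_i) for a partition {B_i}."""
    if sum(priors) != 1:
        raise ValueError("the P(B_i) must sum to 1 for a partition")
    return sum(l * p for l, p in zip(likelihoods, priors))

# 0.005 + 0.006 + 0.006 = 0.017
p_defect = total_probability(priors, likelihoods)
```

The result is a weighted average of the conditional probabilities, with the partition probabilities as weights.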

Bayes' theorem

$P(A | B) = \frac{P(B | A) P(A)}{P(B)}$. "Invert conditioning": update the prior $P(A)$ into the posterior $P(A|B)$ using the likelihood $P(B|A)$.
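The two-event form translates into a one-line function. The sketch below uses a hypothetical helper `bayes` and an assumed fair-die example ($A$ = "roll is even", $B$ = "roll is at least 4") to show the inversion numerically.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Posterior P(A | B) from likelihood P(B | A), prior P(A), evidence P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Assumed fair-die values (illustration only):
p_a = Fraction(1, 2)          # P(even)
p_b = Fraction(1, 2)          # P(at least 4)
p_b_given_a = Fraction(2, 3)  # of {2, 4, 6}, the outcomes 4 and 6 are at least 4

posterior = bayes(p_b_given_a, p_a, p_b)
```

Counting directly gives the same answer: of the outcomes {4, 5, 6}, two are even, so $P(A | B) = 2/3$.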

Why this matters in Data Science

  • The Naive Bayes classifier applies Bayes' theorem directly, assuming the features are conditionally independent given the class.
  • Bayesian inference / probabilistic ML / Markov chains.
  • A/B testing rests on probability axioms.
  • Conditional independence is the spine of Bayesian networks.
Mind Map
Visual structure of the concept
PROBABILITY SPACES
├── (Ω, ℱ, P)
├── Axioms
│   ├── P(A) ≥ 0
│   ├── P(Ω) = 1
│   └── σ-additivity
├── P(A∪B) = P(A) + P(B) − P(A∩B)
├── Conditional P(A|B) = P(A∩B)/P(B)
├── Independence P(A∩B) = P(A)P(B)
├── Total Probability  Σ P(A|Bᵢ)P(Bᵢ)
└── Bayes  P(A|B) = P(B|A)P(A)/P(B)
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. State Kolmogorov's axioms of probability. $P(A) \ge 0$; $P(\Omega) = 1$; for disjoint events, $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.

Q2. Define conditional probability. $P(A | B) = \dfrac{P(A \cap B)}{P(B)}$ for $P(B) > 0$.

Q3. Define independent events. $A$ and $B$ are independent iff $P(A \cap B) = P(A) P(B)$.

Q4. State Bayes' theorem. $P(A | B) = \dfrac{P(B | A) P(A)}{P(B)}$.


Part B (20 marks)

Q. State and prove Bayes' theorem from the definition of conditional probability and the law of total probability. A disease affects 1% of a population. A test detects it 99% of the time but has a 5% false-positive rate. If a person tests positive, what is the probability they actually have the disease?

Bayes' theorem. If $\{B_1, \dots, B_n\}$ is a partition of $\Omega$ with $P(B_i) > 0$ and $A$ is any event with $P(A) > 0$: $P(B_k | A) = \frac{P(A | B_k) P(B_k)}{\sum_{i=1}^n P(A | B_i) P(B_i)}.$

Proof. From the definition of conditional probability: $P(B_k \cap A) = P(A | B_k) P(B_k) = P(B_k | A) P(A).$

So $P(B_k | A) = \dfrac{P(A | B_k) P(B_k)}{P(A)}$.

By the law of total probability, $P(A) = \sum_i P(A | B_i) P(B_i)$.

Substituting gives Bayes' formula. ∎
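The partition form of the theorem can be sketched as a function that returns every posterior $P(B_k | A)$ at once. The data are invented for illustration: three machines with production shares 50%, 30%, 20% and defect rates 1%, 2%, 3%, with $A$ = "item is defective"; `posteriors` is a hypothetical helper name.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Return [P(B_k | A)] given [P(B_k)] and [P(A | B_k)] over a partition."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A | B_i) * P(B_i)
    evidence = sum(joint)                                 # P(A), by total probability
    if evidence == 0:
        raise ValueError("P(A) must be positive")
    return [j / evidence for j in joint]

# Hypothetical data (not from the notes).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

post = posteriors(priors, likelihoods)
```

The denominator is the same for every $k$, which is why the posteriors always sum to 1. In this example the first machine makes half of all items but accounts for only 5/17 of the defective ones.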

Disease example. Let $D$ = event "has disease", $T$ = event "tests positive".

Given.

  • $P(D) = 0.01, \; P(D^c) = 0.99$
  • Sensitivity: $P(T | D) = 0.99$
  • False positive: $P(T | D^c) = 0.05$

Total probability. $P(T) = P(T|D)P(D) + P(T|D^c)P(D^c) = 0.99 \cdot 0.01 + 0.05 \cdot 0.99 = 0.0099 + 0.0495 = 0.0594$.

Bayes. $P(D | T) = \frac{P(T | D) P(D)}{P(T)} = \frac{0.0099}{0.0594} \approx 0.1667 \;\; (\approx 16.7\%).$

Interpretation. Even though the test detects 99% of true cases, only about 1 in 6 people who test positive actually have the disease: because the disease is rare, false positives outnumber true positives by five to one. Overlooking this is the base-rate fallacy, and the example is the classic motivation for Bayesian reasoning.
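The calculation can be reproduced with exact fractions. In the sketch below, `posterior_positive` is a hypothetical helper, and the second call uses an assumed 10% prevalence (not part of the question) to show how strongly the answer depends on the base rate.

```python
from fractions import Fraction

def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """P(D | T) by Bayes' theorem, with P(T) from the law of total probability."""
    true_pos = sensitivity * prevalence                 # P(T | D) * P(D)
    false_pos = false_positive_rate * (1 - prevalence)  # P(T | D^c) * P(D^c)
    return true_pos / (true_pos + false_pos)

# Values from the question: 1% prevalence, 99% sensitivity, 5% false positives.
p_rare = posterior_positive(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))

# Assumed variation (illustration only): same test, 10% prevalence.
p_common = posterior_positive(Fraction(10, 100), Fraction(99, 100), Fraction(5, 100))
```

With the question's values the posterior is exactly 1/6, matching the 16.7% above. With the assumed 10% prevalence the same test gives 11/16, about 68.8%.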