PGD24110101C

November 2024 — Solved

Mathematical and Statistical Foundation of Data Science
PGD01C01
2 hours 30 minutes
50 marks
Solved

End-Semester Examination, First Semester PG Diploma in Data Science and Analytics (2024 Admission).

Part A — Short Essay

Answer any five questions. Each question carries 2 marks. (5 × 2 = 10 Marks)


Q1. What does the Rank and Nullity theorem define for a matrix?

For an m×nm \times n matrix AA representing a linear map A:RnRmA : \mathbb{R}^n \to \mathbb{R}^m, the rank-nullity theorem states:

rank(A)+nullity(A)=n\text{rank}(A) + \text{nullity}(A) = n

where rank(A)\text{rank}(A) is the dimension of the column space and nullity(A)=dimN(A)\text{nullity}(A) = \dim N(A) is the dimension of the null space {x:Ax=0}\{x : Ax = 0\}. It says the input dimension is partitioned into "dimensions preserved" + "dimensions collapsed to zero."


Q2. What are denumerable sets?

A set AA is denumerable (countably infinite) if there exists a bijection f:NAf : \mathbb{N} \to A, i.e., its elements can be listed as a1,a2,a3,a_1, a_2, a_3, \dots The sets N,Z\mathbb{N}, \mathbb{Z} and Q\mathbb{Q} are all denumerable, while R\mathbb{R} is uncountable (Cantor's diagonal argument).


Q3. Find the symmetric equation of the line through A(1,2,3)A(1, 2, 3) and B(4,0,1)B(4, 0, -1).

Direction vector: d=BA=(3,2,4)\vec{d} = B - A = (3, -2, -4).

Using point AA: x13=y22=z34.\boxed{\dfrac{x - 1}{3} = \dfrac{y - 2}{-2} = \dfrac{z - 3}{-4}}.


Q4. What defines an Eulerian circuit in graph theory?

An Eulerian circuit is a closed walk in a connected graph that traverses every edge exactly once and returns to the starting vertex. By Euler's theorem, a connected graph has an Eulerian circuit if and only if every vertex has even degree.


Q5. How many ways can a committee of 3 men and 2 women be selected from 8 men and 6 women?

Choose 3 of 8 men and 2 of 6 women independently:

(83)(62)=5615=840 ways.\binom{8}{3} \cdot \binom{6}{2} = 56 \cdot 15 = \boxed{840} \text{ ways.}


Q6. What is meant by Independent Events? How can you determine if two events AA and BB are independent?

Two events AA and BB are independent if the occurrence of one does not affect the probability of the other, i.e.,

P(AB)=P(A)P(B).P(A \cap B) = P(A) \cdot P(B).

To test independence: compute P(A)P(A), P(B)P(B) and P(AB)P(A \cap B) from the data and check whether the product equals the joint probability. Equivalently, check P(AB)=P(A)P(A \mid B) = P(A).


Q7. Define a Tree.

A tree is a connected, acyclic, undirected graph. Equivalent characterisations on nn vertices: it has exactly n1n - 1 edges, there is a unique path between any two vertices, removing any edge disconnects it, and adding any edge creates a cycle.


Part B — Long Essay

Answer any two questions. Each question carries 20 marks. (2 × 20 = 40 Marks)


Q8.

(a) What are the different types of random variables that can be generated? Explain with their PDF or PMF. (10 marks)

A random variable (RV) XX is a function X:ΩRX : \Omega \to \mathbb{R} that assigns a real number to each outcome of a random experiment. RVs fall into two main types — discrete and continuous — each with several standard families.

1. Discrete Random Variables. Take countably many values; described by a Probability Mass Function (PMF) p(x)=P(X=x)p(x) = P(X = x) with xp(x)=1\sum_x p(x) = 1.

DistributionPMFMeanVarianceTypical use
Bernoulli(p)(p)P(X=1)=p,  P(X=0)=1pP(X=1) = p, \; P(X=0) = 1 - pppp(1p)p(1-p)single trial / yes-no
Binomial(n,p)(n, p)(nk)pk(1p)nk\binom{n}{k} p^k (1-p)^{n-k}npnpnp(1p)np(1-p)# successes in nn trials
Poisson(λ)(\lambda)eλλk/k!e^{-\lambda}\lambda^k / k!λ\lambdaλ\lambdarare event counts
Geometric(p)(p)(1p)k1p(1-p)^{k-1} p1/p1/p(1p)/p2(1-p)/p^2trials until first success

2. Continuous Random Variables. Take values in an interval of R\mathbb{R}; described by a Probability Density Function (PDF) f(x)0f(x) \ge 0 with f(x)dx=1\int_{-\infty}^\infty f(x)\,dx = 1. Note P(X=c)=0P(X = c) = 0 for any single point; probabilities come from intervals: P(aXb)=abf(x)dxP(a \le X \le b) = \int_a^b f(x)\,dx.

DistributionPDFMeanVarianceTypical use
Uniform(a,b)(a, b)1ba\dfrac{1}{b - a} on [a,b][a, b]a+b2\dfrac{a+b}{2}(ba)212\dfrac{(b-a)^2}{12}equal-likelihood model
Exponential(λ)(\lambda)λeλx,  x0\lambda e^{-\lambda x}, \; x \ge 01/λ1/\lambda1/λ21/\lambda^2waiting times, lifetimes
Normal(μ,σ2)(\mu, \sigma^2)1σ2πe(xμ)2/(2σ2)\dfrac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}μ\muσ2\sigma^2natural variation, errors
Gamma(α,β)(\alpha, \beta)βαΓ(α)xα1eβx\dfrac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}α/β\alpha / \betaα/β2\alpha / \beta^2sum of exponentials

Generation. Discrete RVs are typically generated by sampling from a uniform UUniform(0,1)U \sim \text{Uniform}(0, 1) and bucketing UU against the cumulative PMF. Continuous RVs are generated by inverse-transform sampling X=F1(U)X = F^{-1}(U) when the CDF is invertible (e.g., Exponential(λ)(\lambda) via X=ln(1U)/λX = -\ln(1 - U)/\lambda), the Box–Muller transform for normals, or rejection sampling for complex shapes.

(b) How does rejection sampling facilitate sampling from a complex probability distribution? Explain the algorithm. (10 marks)

Motivation. When the target density f(x)f(x) is hard to invert (so inverse-transform sampling cannot be used) but can be evaluated point-wise, rejection sampling lets us sample from ff using an easier proposal distribution.

Setup.

  • Target density f(x)f(x) — what we want to sample from.
  • Proposal density g(x)g(x) — easy to sample (e.g., uniform, normal).
  • Envelope constant MM such that Mg(x)f(x)M \cdot g(x) \ge f(x) for all xx.

Algorithm.

loop forever:
    1. Draw a candidate X from g
    2. Draw U from Uniform(0, 1)
    3. If U ≤ f(X) / (M · g(X)):  return X     (accept)
       else:                       continue    (reject and try again)

Why it works.

P(accept and Xx)=xg(y)f(y)Mg(y)dy=1Mxf(y)dy=F(x)MP(\text{accept and } X \le x) = \int_{-\infty}^{x} g(y) \cdot \dfrac{f(y)}{M\, g(y)} \, dy = \dfrac{1}{M} \int_{-\infty}^{x} f(y)\,dy = \dfrac{F(x)}{M}.

Hence P(accept)=1/MP(\text{accept}) = 1/M and the conditional CDF of the accepted samples is F(x)F(x) — so accepted samples follow ff. ∎

Efficiency and trade-off.

  • Acceptance rate is 1/M1/M. A tight envelope (gg hugging ff) gives small MM and high acceptance.
  • A loose envelope wastes many candidate draws.
  • gg should be chosen in the same support as ff and ideally with a similar shape.

Example — sampling from fex2/2f \propto e^{-x^2/2} (standard normal half) on [0,)[0, \infty) using exponential proposal g(x)=exg(x) = e^{-x}: the envelope constant is M=2e/π1.32M = \sqrt{2e/\pi} \approx 1.32, giving an acceptance rate of ~76%.

Applications: Bayesian inference (sampling posteriors), Monte Carlo integration, generative simulation, and as a building block within MCMC methods like Metropolis–Hastings.


Q9.

(a) State the Principle of Inclusion-Exclusion for three sets. Class survey problem. (10 marks)

Statement. For any three finite sets A,B,CA, B, C:

ABC=A+B+CABACBC+ABC|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|

Class survey problem.

Let M,P,CM, P, C be the sets of students who like Mathematics, Physics, Chemistry respectively.

Given: M=60|M| = 60, P=45|P| = 45, C=50|C| = 50, MP=25|M \cap P| = 25, MC=20|M \cap C| = 20, PC=15|P \cap C| = 15, MPC=10|M \cap P \cap C| = 10. Total students = 120.

Students who like at least one subject: MPC=60+45+50252015+10=105.|M \cup P \cup C| = 60 + 45 + 50 - 25 - 20 - 15 + 10 = \boxed{105}.

So 105 students like at least one subject (and 120105=15120 - 105 = 15 like none of the three).

Students who like only Mathematics = MMPMC+MPC|M| - |M \cap P| - |M \cap C| + |M \cap P \cap C| =602520+10=25.= 60 - 25 - 20 + 10 = \boxed{25}.

Venn diagram (regions and counts).

RegionCount
Only Math25
Only Physics45 − 25 − 15 + 10 = 15
Only Chemistry50 − 20 − 15 + 10 = 25
Math ∩ Physics only (not Chem)25 − 10 = 15
Math ∩ Chemistry only20 − 10 = 10
Physics ∩ Chemistry only15 − 10 = 5
All three10
None15
Total120

Sketch: three overlapping circles labeled M, P, C; centre region (all three) = 10; pairwise outer rings 15, 10, 5; outer rings 25, 15, 25; outside the three circles = 15.

(b) Illustrate different graph terminologies and traversals. (10 marks)

Graph terminologies.

  • Vertex (node) and edge — fundamental units. G=(V,E)G = (V, E).
  • Degree deg(v)\deg(v) — number of edges incident to vv. Handshaking lemma: deg(v)=2E\sum \deg(v) = 2|E|.
  • Simple graph — no self-loops, no parallel edges.
  • Multigraph / Pseudograph — allows parallel edges (and loops).
  • Directed graph (digraph) — edges with direction; vertices have in-degree and out-degree.
  • Weighted graph — edges carry weights (costs, distances).
  • Complete graph KnK_n — every pair joined, (n2)\binom{n}{2} edges.
  • Bipartite graph — vertex set splits as V1V2V_1 \cup V_2 with edges only between V1V_1 and V2V_2.
  • Walk — sequence of vertices/edges (repetition allowed).
  • Trail — walk with no repeated edge.
  • Path — walk with no repeated vertex.
  • Cycle/Circuit — closed path / closed trail.
  • Connected — every pair of vertices joined by a path.
  • Tree — connected, acyclic graph with n1n - 1 edges.
  • Subgraph, Spanning subgraph, Spanning tree.
  • Planar graph — drawable without edge crossings.

Graph traversals.

1. Breadth-First Search (BFS). Explore neighbours layer by layer using a queue.

BFS(G, s):
    mark s as visited, enqueue s
    while queue not empty:
        u = dequeue
        for each neighbour v of u:
            if v not visited:
                mark v, enqueue v

Complexity O(V+E)O(V + E). Used for shortest path in unweighted graphs, level-order processing, web crawling.

2. Depth-First Search (DFS). Explore as deep as possible before backtracking, using recursion or a stack.

DFS(G, u):
    mark u as visited
    for each neighbour v of u:
        if v not visited:
            DFS(G, v)

Complexity O(V+E)O(V + E). Used for cycle detection, topological sort, connected components, articulation points.

Example on a small graph with vertices {1,2,3,4,5}\{1,2,3,4,5\} and edges {(1,2),(1,3),(2,4),(3,4),(4,5)}\{(1,2),(1,3),(2,4),(3,4),(4,5)\}:

  • BFS from 1: 123451 \to 2 \to 3 \to 4 \to 5.
  • DFS from 1: 124351 \to 2 \to 4 \to 3 \to 5 (one possible order).

Other traversals.

  • Eulerian traversal — visits every edge once (Hierholzer's algorithm).
  • Hamiltonian traversal — visits every vertex once (NP-hard in general).
  • Dijkstra's — shortest paths in weighted graphs with non-negative weights.

Q10.

(a) What do eigenvalues and eigenvectors contribute? Find eigenvalues and eigenvectors of A=(5423)A = \begin{pmatrix} 5 & 4 \\ 2 & 3 \end{pmatrix}. (10 marks)

Contribution of eigenvalues / eigenvectors.

For a square matrix AA, a scalar λ\lambda and non-zero vector vv satisfying Av=λvAv = \lambda v form an eigenpair. Eigenvectors are directions that AA only scales — direction unchanged. Their importance:

  1. Reveal intrinsic structure of a linear transformation: principal directions of stretch (largest λ\lambda) and compression (smallest).
  2. Diagonalisation A=PDP1A = P D P^{-1} — turns matrix powers and ODE solutions into simple scalar exponentials: Ak=PDkP1A^k = P D^k P^{-1}.
  3. PCA — eigenvectors of the covariance matrix are the principal components; eigenvalues are the variances along them.
  4. Spectral clustering, PageRank — leading eigenvectors of graph Laplacians or stochastic matrices reveal community structure and importance.
  5. Stability analysis of Markov chains, dynamical systems, neural-network optimisation.

Computation for A=(5423)A = \begin{pmatrix} 5 & 4 \\ 2 & 3 \end{pmatrix}.

Step 1 — Characteristic polynomial. det(AλI)=(5λ)(3λ)8=λ28λ+7\det(A - \lambda I) = (5 - \lambda)(3 - \lambda) - 8 = \lambda^2 - 8\lambda + 7.

Factor: (λ7)(λ1)=0(\lambda - 7)(\lambda - 1) = 0.

λ1=7,λ2=1.\boxed{\lambda_1 = 7, \quad \lambda_2 = 1}.

(Check: sum = 8 = trace ✓, product = 7 = det ✓.)

Step 2 — Eigenvector for λ1=7\lambda_1 = 7. Solve (A7I)v=0(A - 7I) v = 0: (2424)v=02x+4y=0x=2y\begin{pmatrix} -2 & 4 \\ 2 & -4 \end{pmatrix} v = 0 \Rightarrow -2 x + 4 y = 0 \Rightarrow x = 2 y. Take y=1y = 1: v1=(2,1)T\boxed{v_1 = (2, 1)^T}.

Step 3 — Eigenvector for λ2=1\lambda_2 = 1. Solve (AI)v=0(A - I) v = 0: (4422)v=04x+4y=0x=y\begin{pmatrix} 4 & 4 \\ 2 & 2 \end{pmatrix} v = 0 \Rightarrow 4 x + 4 y = 0 \Rightarrow x = -y. Take y=1y = 1: v2=(1,1)T\boxed{v_2 = (-1, 1)^T}.

Diagonalisation. P=(2111),D=(7001),A=PDP1.P = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 7 & 0 \\ 0 & 1 \end{pmatrix}, \quad A = P D P^{-1}.

(b) How are inner products and similarities computed between vectors? (5 marks)

Inner product (dot product). For x,yRnx, y \in \mathbb{R}^n: x,y=xTy=i=1nxiyi.\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i.

It induces the norm x=x,x\|x\| = \sqrt{\langle x, x \rangle} and measures alignment via the angle: cosθ=x,yxy.\cos \theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}.

Cosine similarity x,yxy[1,1]\dfrac{\langle x, y \rangle}{\|x\|\|y\|} \in [-1, 1] — the standard similarity measure for text vectors, embeddings, and recommender systems. Value 1 ⇒ identical direction; 0 ⇒ orthogonal; −1 ⇒ opposite.

Two vectors are orthogonal iff x,y=0\langle x, y \rangle = 0. By Cauchy–Schwarz, x,yxy|\langle x, y \rangle| \le \|x\| \|y\|.

(c) Discuss different metrics employed to measure the distance between matrices. (5 marks)

For matrices A,BRm×nA, B \in \mathbb{R}^{m \times n}, common distances are:

MetricFormulaInterpretation
FrobeniusABF=i,j(aijbij)2\|A - B\|_F = \sqrt{\sum_{i,j} (a_{ij} - b_{ij})^2}Entry-wise L2L_2; equals tr((AB)T(AB))\sqrt{\text{tr}((A-B)^T (A-B))}. Most common.
Spectral (operator L2L_2)AB2=σmax(AB)\|A - B\|_2 = \sigma_{\max}(A - B)Largest singular value of the difference. Worst-case stretching.
Nuclear / Trace normAB=iσi(AB)\|A - B\|_* = \sum_i \sigma_i(A - B)Sum of singular values. Used in low-rank recovery.
Manhattan (element L1L_1)$\sum_{i,j}a_{ij} - b_{ij}
Max norm / LL_\infty$\max_{i,j}a_{ij} - b_{ij}

When to use which. Frobenius is the default for least-squares matrix problems and PCA reconstruction error. Spectral norm appears in stability analysis. Nuclear norm is a convex surrogate for matrix rank (used in matrix completion / Netflix prize). Manhattan / max are useful for outlier-sensitive comparisons in robust ML.


Q11.

(a) Construct equations and analyse lines and plane. (10 marks)

Line A. Passes through A(1,0,2)A(1, 0, 2) with direction dA=(2,1,3)\vec{d}_A = (2, -1, 3).

Vector form: r(t)=(1,0,2)+t(2,1,3)\vec{r}(t) = (1, 0, 2) + t(2, -1, 3).

Parametric: x=1+2t,  y=t,  z=2+3tx = 1 + 2t, \; y = -t, \; z = 2 + 3t.

Symmetric: x12=y1=z23\dfrac{x - 1}{2} = \dfrac{y}{-1} = \dfrac{z - 2}{3}.

Line B. Through B(3,1,4)B(3, 1, 4) and C(5,0,7)C(5, 0, 7). Direction dB=CB=(2,1,3)\vec{d}_B = C - B = (2, -1, 3).

Symmetric: x32=y11=z43\dfrac{x - 3}{2} = \dfrac{y - 1}{-1} = \dfrac{z - 4}{3}.

Comparing Line A and Line B.

Direction vectors dA=dB=(2,1,3)\vec{d}_A = \vec{d}_B = (2, -1, 3) ⇒ the lines are parallel (or coincident).

Check whether Line A's point A(1,0,2)A(1, 0, 2) lies on Line B by plugging into Line B's symmetric form: 132=1,011=1,243=23\dfrac{1 - 3}{2} = -1, \quad \dfrac{0 - 1}{-1} = 1, \quad \dfrac{2 - 4}{3} = -\dfrac{2}{3}.

The three ratios are not equal, so AA does not lie on Line B.

Conclusion: Lines A and B are parallel but distinct (not intersecting, not the same line).

Plane through P(3,1,1),Q(4,1,3),R(2,0,2)P(3, -1, 1), Q(4, 1, 3), R(2, 0, 2).

Edge vectors: PQ=(1,2,2),  PR=(1,1,1)\vec{PQ} = (1, 2, 2), \; \vec{PR} = (-1, 1, 1).

Normal n=PQ×PR\vec{n} = \vec{PQ} \times \vec{PR}: n=i^j^k^122111=i^(22)j^(1+2)+k^(1+2)=(0,3,3)\vec{n} = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ 1 & 2 & 2 \\ -1 & 1 & 1 \end{vmatrix} = \hat{i}(2 - 2) - \hat{j}(1 + 2) + \hat{k}(1 + 2) = (0, -3, 3).

Simplify: n=(0,1,1)\vec{n} = (0, -1, 1).

Plane equation (using P(3,1,1)P(3, -1, 1)): 0(x3)1(y+1)+1(z1)=00(x - 3) - 1(y + 1) + 1(z - 1) = 0 yz+2=0or equivalentlyy+z=2.\boxed{y - z + 2 = 0 \quad \text{or equivalently} \quad -y + z = 2}.

(b) Explain Bayes theorem in detail. Solve the defect-test problem. (10 marks)

Bayes' Theorem — statement. For events AA and BB with P(B)>0P(B) > 0:

P(AB)=P(BA)P(A)P(B).P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}.

If {B1,,Bn}\{B_1, \dots, B_n\} partitions Ω\Omega: P(BkA)=P(ABk)P(Bk)i=1nP(ABi)P(Bi).P(B_k \mid A) = \frac{P(A \mid B_k) P(B_k)}{\sum_{i=1}^n P(A \mid B_i) P(B_i)}.

Why it matters. Bayes' theorem inverts conditioning — it turns a likelihood P(BA)P(B \mid A) (often easy to measure: "given the disease, the test reads positive 98% of the time") into a posterior P(AB)P(A \mid B) (often what we actually want: "given a positive test, what's the chance the patient has the disease"). The prior P(A)P(A) injects base-rate information.

Proof. From the definition of conditional probability:

  • P(AB)=P(AB)P(B)P(A \cap B) = P(A \mid B) P(B)
  • P(AB)=P(BA)P(A)P(A \cap B) = P(B \mid A) P(A)

Equating: P(AB)P(B)=P(BA)P(A)P(A \mid B) P(B) = P(B \mid A) P(A), hence P(AB)=P(BA)P(A)/P(B)P(A \mid B) = P(B \mid A) P(A) / P(B). ∎

Applications. Naive Bayes classifier (text, spam), Bayesian networks, medical diagnosis, A/B testing, Bayesian inference, Kalman filters.


Defect-test problem.

Let:

  • DD = "item is defective", DcD^c = "item is non-defective".
  • TT = "test reports defect (positive)".

Given.

  • Prior: P(D)=1/200=0.005,P(Dc)=0.995P(D) = 1/200 = 0.005, \quad P(D^c) = 0.995.
  • Sensitivity: P(TD)=0.98P(T \mid D) = 0.98.
  • False-positive rate: P(TDc)=0.02P(T \mid D^c) = 0.02.

Step 1 — Total probability of a positive test. P(T)=P(TD)P(D)+P(TDc)P(Dc)P(T) = P(T \mid D) P(D) + P(T \mid D^c) P(D^c) =0.98×0.005+0.02×0.995= 0.98 \times 0.005 + 0.02 \times 0.995 =0.0049+0.0199=0.0248= 0.0049 + 0.0199 = 0.0248.

Step 2 — Apply Bayes. P(DT)=P(TD)P(D)P(T)=0.00490.02480.1976.P(D \mid T) = \frac{P(T \mid D) P(D)}{P(T)} = \frac{0.0049}{0.0248} \approx 0.1976.

P(DT)19.76%.\boxed{P(D \mid T) \approx 19.76\%.}

Interpretation. Even though the test is 98% accurate, a positive result implies the item is actually defective only about 1 in 5 times. This is the classic base-rate fallacy: when the underlying condition is rare, false positives dominate true positives. This is also why screening tests for rare diseases are often followed by confirmatory tests.