November 2024 — Solved
Mathematical and Statistical Foundation of Data Science
End-Semester Examination, First Semester PG Diploma in Data Science and Analytics (2024 Admission).
Part A — Short Essay
Answer any five questions. Each question carries 2 marks. (5 × 2 = 10 Marks)
Q1. What does the Rank and Nullity theorem define for a matrix?
For an matrix representing a linear map , the rank-nullity theorem states:
where is the dimension of the column space and is the dimension of the null space . It says the input dimension is partitioned into "dimensions preserved" + "dimensions collapsed to zero."
Q2. What are denumerable sets?
A set is denumerable (countably infinite) if there exists a bijection , i.e., its elements can be listed as The sets and are all denumerable, while is uncountable (Cantor's diagonal argument).
Q3. Find the symmetric equation of the line through and .
Direction vector: .
Using point :
Q4. What defines an Eulerian circuit in graph theory?
An Eulerian circuit is a closed walk in a connected graph that traverses every edge exactly once and returns to the starting vertex. By Euler's theorem, a connected graph has an Eulerian circuit if and only if every vertex has even degree.
Q5. How many ways can a committee of 3 men and 2 women be selected from 8 men and 6 women?
Choose 3 of 8 men and 2 of 6 women independently:
Q6. What is meant by Independent Events? How can you determine if two events and are independent?
Two events and are independent if the occurrence of one does not affect the probability of the other, i.e.,
To test independence: compute , and from the data and check whether the product equals the joint probability. Equivalently, check .
Q7. Define a Tree.
A tree is a connected, acyclic, undirected graph. Equivalent characterisations on vertices: it has exactly edges, there is a unique path between any two vertices, removing any edge disconnects it, and adding any edge creates a cycle.
Part B — Long Essay
Answer any two questions. Each question carries 20 marks. (2 × 20 = 40 Marks)
Q8.
(a) What are the different types of random variables that can be generated? Explain with their PDF or PMF. (10 marks)
A random variable (RV) is a function that assigns a real number to each outcome of a random experiment. RVs fall into two main types — discrete and continuous — each with several standard families.
1. Discrete Random Variables. Take countably many values; described by a Probability Mass Function (PMF) with .
| Distribution | PMF | Mean | Variance | Typical use |
|---|---|---|---|---|
| Bernoulli | single trial / yes-no | |||
| Binomial | # successes in trials | |||
| Poisson | rare event counts | |||
| Geometric | trials until first success |
2. Continuous Random Variables. Take values in an interval of ; described by a Probability Density Function (PDF) with . Note for any single point; probabilities come from intervals: .
| Distribution | Mean | Variance | Typical use | |
|---|---|---|---|---|
| Uniform | on | equal-likelihood model | ||
| Exponential | waiting times, lifetimes | |||
| Normal | natural variation, errors | |||
| Gamma | sum of exponentials |
Generation. Discrete RVs are typically generated by sampling from a uniform and bucketing against the cumulative PMF. Continuous RVs are generated by inverse-transform sampling when the CDF is invertible (e.g., Exponential via ), the Box–Muller transform for normals, or rejection sampling for complex shapes.
(b) How does rejection sampling facilitate sampling from a complex probability distribution? Explain the algorithm. (10 marks)
Motivation. When the target density is hard to invert (so inverse-transform sampling cannot be used) but can be evaluated point-wise, rejection sampling lets us sample from using an easier proposal distribution.
Setup.
- Target density — what we want to sample from.
- Proposal density — easy to sample (e.g., uniform, normal).
- Envelope constant such that for all .
Algorithm.
loop forever:
1. Draw a candidate X from g
2. Draw U from Uniform(0, 1)
3. If U ≤ f(X) / (M · g(X)): return X (accept)
else: continue (reject and try again)
Why it works.
.
Hence and the conditional CDF of the accepted samples is — so accepted samples follow . ∎
Efficiency and trade-off.
- Acceptance rate is . A tight envelope ( hugging ) gives small and high acceptance.
- A loose envelope wastes many candidate draws.
- should be chosen in the same support as and ideally with a similar shape.
Example — sampling from (standard normal half) on using exponential proposal : the envelope constant is , giving an acceptance rate of ~76%.
Applications: Bayesian inference (sampling posteriors), Monte Carlo integration, generative simulation, and as a building block within MCMC methods like Metropolis–Hastings.
Q9.
(a) State the Principle of Inclusion-Exclusion for three sets. Class survey problem. (10 marks)
Statement. For any three finite sets :
Class survey problem.
Let be the sets of students who like Mathematics, Physics, Chemistry respectively.
Given: , , , , , , . Total students = 120.
Students who like at least one subject:
So 105 students like at least one subject (and like none of the three).
Students who like only Mathematics =
Venn diagram (regions and counts).
| Region | Count |
|---|---|
| Only Math | 25 |
| Only Physics | 45 − 25 − 15 + 10 = 15 |
| Only Chemistry | 50 − 20 − 15 + 10 = 25 |
| Math ∩ Physics only (not Chem) | 25 − 10 = 15 |
| Math ∩ Chemistry only | 20 − 10 = 10 |
| Physics ∩ Chemistry only | 15 − 10 = 5 |
| All three | 10 |
| None | 15 |
| Total | 120 ✓ |
Sketch: three overlapping circles labeled M, P, C; centre region (all three) = 10; pairwise outer rings 15, 10, 5; outer rings 25, 15, 25; outside the three circles = 15.
(b) Illustrate different graph terminologies and traversals. (10 marks)
Graph terminologies.
- Vertex (node) and edge — fundamental units. .
- Degree — number of edges incident to . Handshaking lemma: .
- Simple graph — no self-loops, no parallel edges.
- Multigraph / Pseudograph — allows parallel edges (and loops).
- Directed graph (digraph) — edges with direction; vertices have in-degree and out-degree.
- Weighted graph — edges carry weights (costs, distances).
- Complete graph — every pair joined, edges.
- Bipartite graph — vertex set splits as with edges only between and .
- Walk — sequence of vertices/edges (repetition allowed).
- Trail — walk with no repeated edge.
- Path — walk with no repeated vertex.
- Cycle/Circuit — closed path / closed trail.
- Connected — every pair of vertices joined by a path.
- Tree — connected, acyclic graph with edges.
- Subgraph, Spanning subgraph, Spanning tree.
- Planar graph — drawable without edge crossings.
Graph traversals.
1. Breadth-First Search (BFS). Explore neighbours layer by layer using a queue.
BFS(G, s):
mark s as visited, enqueue s
while queue not empty:
u = dequeue
for each neighbour v of u:
if v not visited:
mark v, enqueue v
Complexity . Used for shortest path in unweighted graphs, level-order processing, web crawling.
2. Depth-First Search (DFS). Explore as deep as possible before backtracking, using recursion or a stack.
DFS(G, u):
mark u as visited
for each neighbour v of u:
if v not visited:
DFS(G, v)
Complexity . Used for cycle detection, topological sort, connected components, articulation points.
Example on a small graph with vertices and edges :
- BFS from 1: .
- DFS from 1: (one possible order).
Other traversals.
- Eulerian traversal — visits every edge once (Hierholzer's algorithm).
- Hamiltonian traversal — visits every vertex once (NP-hard in general).
- Dijkstra's — shortest paths in weighted graphs with non-negative weights.
Q10.
(a) What do eigenvalues and eigenvectors contribute? Find eigenvalues and eigenvectors of . (10 marks)
Contribution of eigenvalues / eigenvectors.
For a square matrix , a scalar and non-zero vector satisfying form an eigenpair. Eigenvectors are directions that only scales — direction unchanged. Their importance:
- Reveal intrinsic structure of a linear transformation: principal directions of stretch (largest ) and compression (smallest).
- Diagonalisation — turns matrix powers and ODE solutions into simple scalar exponentials: .
- PCA — eigenvectors of the covariance matrix are the principal components; eigenvalues are the variances along them.
- Spectral clustering, PageRank — leading eigenvectors of graph Laplacians or stochastic matrices reveal community structure and importance.
- Stability analysis of Markov chains, dynamical systems, neural-network optimisation.
Computation for .
Step 1 — Characteristic polynomial. .
Factor: .
(Check: sum = 8 = trace ✓, product = 7 = det ✓.)
Step 2 — Eigenvector for . Solve : . Take : .
Step 3 — Eigenvector for . Solve : . Take : .
Diagonalisation.
(b) How are inner products and similarities computed between vectors? (5 marks)
Inner product (dot product). For :
It induces the norm and measures alignment via the angle:
Cosine similarity — the standard similarity measure for text vectors, embeddings, and recommender systems. Value 1 ⇒ identical direction; 0 ⇒ orthogonal; −1 ⇒ opposite.
Two vectors are orthogonal iff . By Cauchy–Schwarz, .
(c) Discuss different metrics employed to measure the distance between matrices. (5 marks)
For matrices , common distances are:
| Metric | Formula | Interpretation |
|---|---|---|
| Frobenius | Entry-wise ; equals . Most common. | |
| Spectral (operator ) | Largest singular value of the difference. Worst-case stretching. | |
| Nuclear / Trace norm | Sum of singular values. Used in low-rank recovery. | |
| Manhattan (element ) | $\sum_{i,j} | a_{ij} - b_{ij} |
| Max norm / | $\max_{i,j} | a_{ij} - b_{ij} |
When to use which. Frobenius is the default for least-squares matrix problems and PCA reconstruction error. Spectral norm appears in stability analysis. Nuclear norm is a convex surrogate for matrix rank (used in matrix completion / Netflix prize). Manhattan / max are useful for outlier-sensitive comparisons in robust ML.
Q11.
(a) Construct equations and analyse lines and plane. (10 marks)
Line A. Passes through with direction .
Vector form: .
Parametric: .
Symmetric: .
Line B. Through and . Direction .
Symmetric: .
Comparing Line A and Line B.
Direction vectors ⇒ the lines are parallel (or coincident).
Check whether Line A's point lies on Line B by plugging into Line B's symmetric form: .
The three ratios are not equal, so does not lie on Line B.
Conclusion: Lines A and B are parallel but distinct (not intersecting, not the same line).
Plane through .
Edge vectors: .
Normal : .
Simplify: .
Plane equation (using ):
(b) Explain Bayes theorem in detail. Solve the defect-test problem. (10 marks)
Bayes' Theorem — statement. For events and with :
If partitions :
Why it matters. Bayes' theorem inverts conditioning — it turns a likelihood (often easy to measure: "given the disease, the test reads positive 98% of the time") into a posterior (often what we actually want: "given a positive test, what's the chance the patient has the disease"). The prior injects base-rate information.
Proof. From the definition of conditional probability:
Equating: , hence . ∎
Applications. Naive Bayes classifier (text, spam), Bayesian networks, medical diagnosis, A/B testing, Bayesian inference, Kalman filters.
Defect-test problem.
Let:
- = "item is defective", = "item is non-defective".
- = "test reports defect (positive)".
Given.
- Prior: .
- Sensitivity: .
- False-positive rate: .
Step 1 — Total probability of a positive test. .
Step 2 — Apply Bayes.
Interpretation. Even though the test is 98% accurate, a positive result implies the item is actually defective only about 1 in 5 times. This is the classic base-rate fallacy: when the underlying condition is rare, false positives dominate true positives. This is also why screening tests for rare diseases are often followed by confirmatory tests.