PGDDSA Study · Semester 1

PGD24110105B

November 2024 — Solved

Computational Methods for Data Science

PGD01C05

2 hours 30 minutes

50 marks

Solved

End-Semester Examination, First Semester PG Diploma in Data Science and Analytics (2024 Admission).

Part A — Short Essay

Answer any five questions. Each question carries 2 marks. (5 × 2 = 10 Marks)

Q1. Write any two applications of Fourier transform.

Audio / signal processing — decomposing a sound signal into its frequency components for equalisation, noise filtering, and compression (MP3 uses a related transform).
Image processing — frequency-domain filtering for denoising, edge detection, and JPEG compression (which is built on the related DCT).

(Other valid uses: solving differential equations, MRI reconstruction, radar, telecommunications, spectroscopy.)

Q2. Explain the Strong Law of Large Numbers.

The Strong Law of Large Numbers (SLLN) states that if $X_1, X_2, \dots$ are i.i.d. random variables with mean $\mu$ , then the sample mean converges almost surely to $\mu$ :

$\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty.$

This is the theoretical justification for estimating means from large samples — and underpins Monte Carlo simulation. "Strong" because the convergence holds with probability 1 (almost-sure), unlike the Weak Law which is convergence in probability.

Q3. What is the role of image acquisition in the image processing pipeline?

Image acquisition is the first stage — converting a physical scene into a digital image via a sensor (camera, scanner, MRI coil, satellite imager). The sensor performs sampling (spatial discretisation, governed by Nyquist) and quantisation (intensity discretisation). The quality of acquisition (resolution, noise, exposure, lens distortion) sets a hard upper bound on every downstream step (denoising, enhancement, segmentation, recognition).

Q4. How are eigenvectors related to principal components?

In Principal Component Analysis (PCA), the principal components are the eigenvectors of the covariance matrix of the centred data. They are the orthogonal directions of maximum variance. The corresponding eigenvalues equal the variance along each direction. The first PC = top eigenvector, second PC = next eigenvector, and so on.

Q5. List the three main steps involved in image processing.

Image acquisition — capture and digitise (sensor → pixels).
Image processing / enhancement — denoising, sharpening, restoration, transformation, segmentation, feature extraction.
Image analysis / interpretation — classification, recognition, decision (e.g., is there a tumour in this MRI?).

Q6. Explain Type I and Type II error in statistical hypothesis testing.

Type I error ( $\alpha$ , false positive) — rejecting a true null hypothesis $H_0$ . Probability = significance level $\alpha$ (typically 0.05).
Type II error ( $\beta$ , false negative) — failing to reject a false null hypothesis. The power of a test is $1 - \beta$ .

There is an unavoidable trade-off: lowering $\alpha$ raises $\beta$ (and vice-versa) for a fixed sample size.

Q7. What mathematical theory forms the foundation of the Fourier Series?

The Fourier Series is grounded in the theory of orthogonal function expansions in a Hilbert space, specifically $L^2$ inner product spaces. The sines and cosines $\{1, \cos(nx), \sin(nx)\}_{n=1}^{\infty}$ form a complete orthonormal basis on $[-\pi, \pi]$ , so any (square-integrable) periodic function can be uniquely written as a linear combination — exactly as a vector decomposes onto an orthonormal basis.

Part B — Long Essay

Answer any two questions. Each question carries 20 marks. (2 × 20 = 40 Marks)

Q8.

(a) Compare the Gaussian filter and Mean filter for noise reduction. Discuss their effects on image edges and noise suppression. (10 marks)

Both are linear spatial filters that smooth images by convolving with a kernel. They are widely used as the first step in image-processing pipelines.

Mean (Box / Average) filter.

Each output pixel is the unweighted average of its $k \times k$ neighbourhood:

$I'(x, y) = \frac{1}{k^2} \sum_{i,j \in W} I(x+i, y+j).$

3×3 example kernel: $M = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$

Gaussian filter.

Weighted average using a 2-D Gaussian kernel, so pixels closer to the centre contribute more:

$G(x, y) = \frac{1}{2\pi \sigma^2} e^{-(x^2 + y^2)/(2 \sigma^2)}.$

Larger $\sigma$ → wider, smoother blur. The kernel is separable: a 2D Gaussian convolution equals two 1D convolutions — fast.

Comparison.

Aspect	Mean filter	Gaussian filter
Kernel weights	Uniform	Centre-weighted (bell-shape)
Frequency response	Sinc-shaped — has side-lobes, leaks high frequencies	Gaussian — monotonically decreasing, no ringing
Noise suppression	Good for additive Gaussian noise — but rough	Better, smoother suppression of Gaussian noise
Edge preservation	Poor — blurs edges hard; produces blocky artefacts	Smoother blur but still blurs edges
Artefacts	Ringing, halos near edges	Soft halos; minimal ringing
Smoothness vs detail	Strong smoothing, loses fine detail	Tunable via $\sigma$ — better trade-off
Computational cost	Cheapest	Slightly higher (separable kernel mitigates)
Salt-and-pepper noise	Fails — keeps spikes	Fails — use a median filter instead
Theoretical link	Special case of weighted average	Maximum-entropy smoother
Typical use	Quick smoothing, downsampling preview	Pre-processing for edge detection (Canny uses Gaussian), feature extraction

Effect on edges. Both filters are isotropic and linear — they don't know where the edges are. Both blur edges, but a Gaussian blurs more gracefully (no ringing), giving a softer result. For edge-preserving denoising one must instead use bilateral, median, non-local means, or modern BM3D / deep-learning denoisers.

Effect on noise. For additive Gaussian noise, the Gaussian filter is the optimal linear smoother (Wiener-filter limit) — its frequency response matches the noise's spectral assumptions and avoids the side-lobes of a box filter. Box filtering is acceptable for casual smoothing but generally inferior.

(b) Benefits of digital image processing for improving image quality and automating tasks. (10 marks)

Image-quality improvements.

Noise reduction — Gaussian, median, non-local means and modern deep denoisers remove sensor noise, increasing perceived sharpness.
Contrast enhancement — histogram equalisation, CLAHE, gamma correction reveal detail hidden in shadows / highlights.
Sharpening — unsharp masking, Laplacian-based enhancement strengthen edges and structure.
Restoration — deblurring (motion / focus), inpainting of missing regions, dust / scratch removal in archival images.
White-balance and colour correction — restore natural colours under arbitrary illumination.
Super-resolution — single-image and learning-based methods reconstruct higher-resolution images.
Geometric correction — perspective rectification, lens-distortion removal.
Compression — JPEG/JPEG-2000/HEIF reduce file size with minimal perceptible loss.

Task automation.

Object detection and classification — autonomous driving, surveillance, defect detection on factory lines.
Medical diagnosis — automatic detection of tumours in CT/MRI, diabetic retinopathy screening, X-ray triage.
Optical Character Recognition (OCR) — automated form processing, postal sorting, document digitisation.
Face recognition — biometric authentication, unlocking phones, airport gates.
Satellite and remote sensing — land-use classification, deforestation monitoring, crop-yield estimation.
Agriculture — disease detection on leaves, fruit-ripeness sorting, weed identification.
Industrial inspection — robotic quality control on production lines.
Sports and entertainment — Hawk-Eye in tennis/cricket, motion capture, AR filters.
Security and forensics — license-plate recognition, video analytics, fingerprint matching.
Augmented reality — real-time scene understanding for AR overlays.

Strategic benefits.

Speed and scale. A model can process millions of images / hour — impossible manually.
Consistency. No fatigue, no inter-rater variability.
Cost reduction. Automates labour-intensive tasks (medical screening, quality control).
New capabilities. Sees patterns humans miss (subtle pixel-level cancer features).
Data-driven decisions. Quantifies what was previously subjective ("looks like a defect" → 0.87 probability).
24/7 operation. Round-the-clock surveillance, monitoring, screening.

Q9.

(a) Process of computing the Singular Value Decomposition (SVD) of a matrix, including the steps to calculate eigenvalues and eigenvectors. (10 marks)

Statement. For any real matrix $A \in \mathbb{R}^{m \times n}$ there exist orthogonal $U \in \mathbb{R}^{m \times m}$ , $V \in \mathbb{R}^{n \times n}$ and a diagonal $\Sigma \in \mathbb{R}^{m \times n}$ (with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ ) such that

$\boxed{A = U \Sigma V^T}.$

The $\sigma_i$ are the singular values, columns of $U$ are left singular vectors, columns of $V$ are right singular vectors.

Computational process.

Step 1 — Form $A^T A$ . This is $n \times n$ , symmetric, positive semi-definite.

Step 2 — Eigendecompose $A^T A$ .

Solve $\det(A^T A - \lambda I) = 0$ to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ and corresponding orthonormal eigenvectors $v_1, v_2, \dots, v_n$ via $(A^T A - \lambda_i I) v_i = 0$ .

The right singular vectors are these $v_i$ ; assemble into $V = [v_1 \, v_2 \, \dots \, v_n]$ .

Step 3 — Singular values.

$\sigma_i = \sqrt{\lambda_i}, \quad i = 1, \dots, \min(m, n).$

Place them on the diagonal of $\Sigma$ , descending.

Step 4 — Left singular vectors.

For each non-zero $\sigma_i$ :

$u_i = \frac{1}{\sigma_i} A v_i.$

Verify $\|u_i\| = 1$ and they are orthogonal. Stack into $U = [u_1 \, u_2 \, \dots \, u_r]$ (where $r$ = rank of $A$ ). Extend $U$ to a full orthonormal basis if needed.

Step 5 — Verify. Reconstruct $A = U \Sigma V^T$ and check.

Worked example. $A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}$ .

Step 1. $A^T A = \begin{pmatrix} 3 & 4 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix}$ .

Step 2. Eigenvalues: $\det(A^T A - \lambda I) = (25 - \lambda)^2 - 400 = 0 \Rightarrow \lambda = 45, 5$ .

Eigenvectors: for $\lambda = 45$ : $-20 v_1 + 20 v_2 = 0 \Rightarrow v = \frac{1}{\sqrt{2}}(1, 1)^T$ .

For $\lambda = 5$ : $20 v_1 + 20 v_2 = 0 \Rightarrow v = \frac{1}{\sqrt{2}}(1, -1)^T$ .

Step 3. Singular values $\sigma_1 = \sqrt{45} \approx 6.71, \; \sigma_2 = \sqrt{5} \approx 2.24$ .

Step 4. Left singular vectors: $u_1 = \dfrac{1}{\sqrt{45}} A \cdot \dfrac{1}{\sqrt{2}}(1,1)^T = \dfrac{1}{\sqrt{90}}(3, 9)^T = \dfrac{1}{\sqrt{10}}(1, 3)^T$ .

$u_2 = \dfrac{1}{\sqrt{5}} A \cdot \dfrac{1}{\sqrt{2}}(1, -1)^T = \dfrac{1}{\sqrt{10}}(3, -1)^T$ .

Result. $A = U \Sigma V^T$ with $U = \dfrac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ 3 & -1 \end{pmatrix}, \; \Sigma = \begin{pmatrix} \sqrt{45} & 0 \\ 0 & \sqrt{5} \end{pmatrix}, \; V = \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ .

Practical algorithms. Direct eigendecomposition is unstable for large matrices. Production SVD uses Householder bidiagonalisation + QR iterations (Golub–Kahan), Jacobi rotations, or Lanczos/randomised SVD for very large sparse matrices.

(b) Step-by-step process of Principal Component Analysis (PCA), including the mathematical rationale for each step. (10 marks)

Goal. Find a lower-dimensional orthogonal basis that preserves maximum variance of the data — equivalently, that minimises reconstruction error (Eckart–Young).

Step 1 — Centre the data. $x_i' = x_i - \overline{x}.$ Rationale. PCA finds directions through the origin; centring places the data's centre of mass at the origin so the first PC truly captures variance, not the offset of the mean.

Step 2 — Standardise (optional). Divide by std of each feature. Rationale. Variance is scale-sensitive: a feature in millimetres will dwarf one in kilograms unless both are standardised. Skip only if features share units.

Step 3 — Compute covariance matrix. $\Sigma = \frac{1}{n - 1} X^T X \quad (X = \text{centred data}).$ $\Sigma$ is symmetric, positive semi-definite. $\Sigma_{ij}$ tells how features $i$ and $j$ co-vary. Rationale. PCA seeks directions of maximum variance — variance is encoded in $\Sigma$ .

Step 4 — Eigendecompose $\Sigma$ .

Solve $\Sigma v = \lambda v$ . Obtain eigenpairs $(\lambda_i, v_i)$ .

Eigenvectors $v_i$ are orthonormal directions in feature space.
Eigenvalues $\lambda_i$ are the variance along $v_i$ .

Rationale. The direction $v$ that maximises $\text{Var}(v^T X) = v^T \Sigma v$ subject to $\|v\|=1$ is, by the Rayleigh quotient, the eigenvector of $\Sigma$ with the largest eigenvalue. The next-best orthogonal direction is the second eigenvector, and so on.

Step 5 — Sort and select top- $k$ components.

Sort $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ and choose top $k$ — typically to keep, say, 95% cumulative variance: $\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \ge 0.95.$ Rationale. Most variance is concentrated in a few directions; the rest are noise.

Step 6 — Project.

Form $W = [v_1, v_2, \dots, v_k]$ and project: $Z = X W \quad (\text{shape: } n \times k).$ Rationale. $Z$ is the data in the reduced-dim PCA basis. Each row $z_i$ is the new representation of $x_i$ .

Step 7 — Reconstruct or analyse.

Reconstruct: $\hat X = Z W^T$ (back to original space). The reconstruction error is $\sum_{i > k} \lambda_i$ — by Eckart–Young this is the minimum possible for any rank- $k$ approximation.

Equivalence with SVD. PCA can be done directly via SVD of the centred data matrix $X = U \Sigma V^T$ : the principal components are columns of $V$ , and the singular values squared (over $n-1$ ) are the variances. SVD is numerically preferable to forming $X^T X$ explicitly.

Applications.

Dimensionality reduction for ML (faster training, less overfitting).
Visualisation in 2D / 3D.
Denoising by reconstructing from top components.
Compression — Eigenfaces, hyperspectral imaging.

Caveats.

PCA is linear — for non-linear manifolds use Kernel PCA, t-SNE, UMAP, autoencoders.
Principal components are not interpretable as original features.
Sensitive to scaling and outliers.

Q10.

(a) Process of image recognition and its significance in computer vision, with examples. (10 marks)

Definition. Image recognition is the task of identifying what is in an image — assigning class labels to entire images, regions, or pixels. It is a central problem in computer vision.

Typical pipeline.

1. Image acquisition. Camera, scanner, MRI — produces raw pixel array.

2. Pre-processing.

Resize, crop, normalise pixel intensities to $[0, 1]$ .
Denoise (Gaussian / median).
Colour normalisation, white-balance.
Augment (rotate, flip, jitter) during training.

3. Feature extraction. Two paradigms:

Hand-crafted features (classical): SIFT, SURF, HOG, LBP, Haar, colour histograms.
Learned features (deep learning): Convolutional Neural Networks (CNNs) learn hierarchical features automatically — early layers detect edges, mid layers detect parts, deep layers detect object concepts.

4. Classification / decision.

Classical: SVM, Random Forest, KNN on the extracted features.
Deep: softmax head on top of a CNN backbone (ResNet, EfficientNet, ViT).

5. Post-processing. Non-max suppression (object detection), morphological cleanup, threshold tuning.

6. Output. Label + confidence; or bounding boxes; or pixel-wise segmentation mask depending on task.

Categories of recognition tasks.

Task	Output
Classification	Whole-image label (e.g., "cat")
Localisation	Label + bounding box
Object detection	Multiple labels + boxes (YOLO, Faster R-CNN)
Semantic segmentation	Class label per pixel (U-Net, DeepLab)
Instance segmentation	Per-pixel labels with instance IDs (Mask R-CNN)
Face recognition	Identity verification or identification
OCR	Text strings from images

Significance in computer vision.

Foundation of modern AI applications — autonomous driving, medical imaging, robotics, AR/VR, surveillance.
Replaces or augments human vision for high-volume / dangerous / detailed tasks.
Enables new capabilities — recognises microscopic patterns, satellite features, defect types invisible to humans.
Drives ImageNet-style breakthroughs that propelled deep learning from 2012 onward (AlexNet → ResNet → ViT).

Examples.

Medical imaging. Detect diabetic retinopathy from retinal fundus images (Google's deep-learning model achieved cardiologist-level accuracy). CheXNet detects 14 chest conditions from X-rays.
Autonomous vehicles. Tesla, Waymo recognise pedestrians, traffic lights, lane markings in real time using CNNs.
Face recognition. Apple Face ID, airport e-gates, photo-tagging in Google Photos.
OCR / Document AI. Bank cheque scanning, invoice processing, Google Lens translation.
Agriculture. Detect crop disease from leaf photos; classify ripeness; estimate yield from drone imagery.
Industrial QC. Detect manufacturing defects on assembly lines (cracks, scratches, missing components).
Wildlife conservation. Camera-trap image analysis to track endangered species automatically.
Retail. Visual search ("find this product"), checkout-free stores (Amazon Go).

(b) Concept of compressed sensing and its key principles in signal reconstruction. (10 marks)

Motivation. Classical sampling theory (Nyquist–Shannon) demands sampling at at least twice the highest frequency. For modern signals (MRI, hyperspectral imaging, radar), Nyquist sampling is expensive in time and storage. Compressed Sensing (CS) — pioneered by Candès, Romberg, Tao and Donoho (2004–06) — shows we can recover a signal from far fewer measurements than Nyquist if the signal is sparse in some basis.

Setup. Let $x \in \mathbb{R}^n$ be the signal we want to recover. We take linear measurements: $y = \Phi x \in \mathbb{R}^m, \quad m \ll n.$

The matrix $\Phi$ is called the sensing / measurement matrix. Recovering $x$ from $y$ is underdetermined — infinitely many solutions. CS gives us conditions under which the correct $x$ can still be uniquely recovered.

Key principles.

1. Sparsity. The signal $x$ is $k$ -sparse in some basis $\Psi$ — i.e., $x = \Psi \alpha$ where $\alpha$ has at most $k$ non-zero entries. Natural images are sparse in wavelets; audio in DCT; communication signals in Fourier basis.

2. Incoherent measurements. The sensing matrix $\Phi$ must be incoherent with the sparsity basis $\Psi$ — meaning rows of $\Phi$ are "spread out" in the basis $\Psi$ . Random Gaussian / Bernoulli / random Fourier matrices are highly incoherent with almost any fixed basis.

3. Restricted Isometry Property (RIP). For some $\delta_k < 1$ , $(1 - \delta_k) \|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_k) \|x\|_2^2$ for all $k$ -sparse $x$ . RIP guarantees that distinct sparse signals produce distinct measurements — making them recoverable.

4. $\ell_1$ minimisation reconstruction. Given measurements $y$ , recover $x$ by solving the convex programme

$\hat x = \arg\min_x \|x\|_1 \quad \text{subject to } \Phi x = y.$

The $\ell_1$ norm $\sum |x_i|$ is a convex proxy for the (non-convex) $\ell_0$ count of non-zeros, and Candès–Tao showed that with sufficient incoherent measurements ( $m \gtrsim k \log(n/k)$ ), $\ell_1$ recovery exactly equals $\ell_0$ recovery.

5. Stability and robustness. Realistic measurements have noise $y = \Phi x + n$ . The variant $\hat x = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|\Phi x - y\|_2 \le \eta$ (or its Lagrangian Lasso form) recovers $x$ with error bounded by the noise level.

Algorithms. Basis Pursuit / Lasso (convex optimisation), Orthogonal Matching Pursuit (greedy), Iterative Hard / Soft Thresholding, approximate message passing.

Why it matters — applications.

MRI — measure far fewer Fourier samples; reconstruct via CS. Reduces scan time from 30 min to a few minutes, important for paediatric and cardiac MRI.
Single-pixel camera (Rice Univ.) — uses random masks to capture an image with one photodetector and CS reconstruction.
Compressed sensing radar — fewer pulses, sparser signal recovery.
Hyperspectral imaging, seismic data, networked sensors — wherever Nyquist sampling is prohibitive.
Machine learning — sparse recovery / Lasso / dictionary learning all rest on the same principles.

Significance. Compressed sensing rewrote our understanding of sampling: sparsity beats bandwidth. Whenever signals have few effective degrees of freedom in some basis, we can sense them with proportionally few measurements — a paradigm now embedded in MRI scanners, cameras, communication systems, and machine-learning regularisers.

Q11.

(a) Fourier Series, Fourier Transform, Fast Fourier Transform (FFT), Time-Frequency Analysis — differences and applications. (10 marks)

A family of related tools for decomposing signals into frequency components, each suited to different signal types and computational regimes.

1. Fourier Series (FS).

Applies to periodic signals. Any sufficiently nice periodic signal $f(t)$ of period $T$ can be written as

$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n t / T}, \quad c_n = \frac{1}{T} \int_0^T f(t) e^{-i 2\pi n t / T} dt.$

Outputs a discrete spectrum at integer multiples of $1/T$ .

Applications. Analysing musical notes, AC power waveforms, periodic vibrations in mechanical systems, solving heat / wave equations on bounded domains.

2. Fourier Transform (FT).

Generalises FS to non-periodic, continuous signals over $\mathbb{R}$ :

$F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i \omega t} dt, \quad f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega) e^{i\omega t} d\omega.$

Outputs a continuous spectrum $F(\omega)$ .

Applications. Signal-processing theory, telecommunications, image processing, optics, quantum mechanics, solving differential equations.

3. Discrete & Fast Fourier Transform (DFT / FFT).

Real-world signals are sampled. The DFT of $\{x_0, x_1, \dots, x_{N-1}\}$ is

$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \quad k = 0, \dots, N-1.$

Naive DFT costs $O(N^2)$ . The FFT (Cooley–Tukey 1965) computes it in $O(N \log N)$ — the algorithmic breakthrough that made digital signal processing practical.

Applications. MP3 / JPEG / video compression (related transforms), spectrum analysers, OFDM in 4G/5G, radar, MRI image reconstruction, real-time audio effects, convolutions via the frequency domain.

4. Time-Frequency Analysis.

The standard FT tells us what frequencies are present but not when they occur — it averages over all time. For non-stationary signals (speech, music, EEG), we need both. Approaches:

Short-Time Fourier Transform (STFT) — slide a short window across the signal, take FT in each window: $X(t, \omega) = \int f(\tau) w(\tau - t) e^{-i \omega \tau} d\tau.$ Produces a spectrogram (frequency vs time vs amplitude). Trade-off: a short window gives good time resolution but poor frequency resolution and vice versa (Heisenberg uncertainty principle).
Wavelet transform — uses scalable wavelets instead of a fixed window, giving good time resolution at high frequency and good frequency resolution at low frequency. Multi-resolution analysis.
Wigner–Ville distribution — joint time-frequency distribution; high resolution but cross-term artefacts.

Applications. Speech recognition, music analysis, biomedical signals (EEG, ECG), seismic data, sonar / radar, fault detection in rotating machinery.

Side-by-side comparison.

Tool	Signal type	Output	Time/freq resolution	Complexity	Best use
Fourier Series	Periodic, continuous	Discrete spectrum	No time info	–	Periodic signals
Fourier Transform	Non-periodic, continuous	Continuous spectrum	No time info	–	Theory, continuous signals
DFT / FFT	Discrete, finite	Discrete spectrum	No time info	$O(N \log N)$	Digital signals, real-time DSP
STFT / Wavelets	Discrete or continuous	Time–frequency map	Time AND freq	$O(N \log N)$ per frame	Non-stationary, transient signals

(b) What are random variables and their types? Give any four types of random variables with their probability distribution functions. (10 marks)

Random variable (RV). A random variable is a function $X : \Omega \to \mathbb{R}$ that assigns a real number to each outcome of a random experiment. It is the bridge between abstract sample spaces and numerical analysis — it lets us talk about probabilities of numbers ( $P(X \le 5)$ ) rather than probabilities of outcomes.

Distribution function. For any RV, the Cumulative Distribution Function (CDF) $F(x) = P(X \le x)$ exists; it is non-decreasing, right-continuous, with $\lim_{x \to -\infty} F = 0$ and $\lim_{x \to \infty} F = 1$ .

Types of random variables.

1. Discrete RV. Takes countably many values. Described by a Probability Mass Function (PMF) $p(x) = P(X = x)$ , with $\sum_x p(x) = 1$ .

2. Continuous RV. Takes values in an interval. Described by a Probability Density Function (PDF) $f(x) \ge 0$ , with $\int_{-\infty}^{\infty} f(x) dx = 1$ . $P(X = c) = 0$ for any single point; probabilities arise from intervals.

3. Mixed RV. Combination — e.g., insurance payouts that are 0 with non-zero probability and continuous when positive.

4. Multivariate RV (random vector). A vector $X = (X_1, \dots, X_n)$ whose components are jointly random.

Four specific distributions.

1. Bernoulli $(p)$ — Discrete.

PMF: $P(X = 1) = p, \; P(X = 0) = 1 - p$ , $0 \le p \le 1$ .

Mean $= p$ , variance $= p(1-p)$ . Models a single yes/no trial.

2. Binomial $(n, p)$ — Discrete.

PMF: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \; k = 0, 1, \dots, n$ .

Mean $= np$ , variance $= np(1 - p)$ . Models the number of successes in $n$ independent Bernoulli trials. Used in A/B testing, quality control.

3. Poisson $(\lambda)$ — Discrete.

PMF: $P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}, \; k = 0, 1, 2, \dots$ .

Mean $= \lambda$ , variance $= \lambda$ . Models the count of rare independent events in a fixed time/space window — call-centre arrivals, defects per page, photons per pixel.

4. Normal $(\mu, \sigma^2)$ — Continuous.

PDF: $f(x) = \dfrac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right), \; x \in \mathbb{R}$ .

Mean $= \mu$ , variance $= \sigma^2$ . The most important continuous distribution — by the Central Limit Theorem, sample means of any finite-variance distribution converge to normal as $n \to \infty$ . Models measurement errors, heights, IQ scores, asset returns.

(Two more common ones, for context.)

5. Exponential $(\lambda)$ — Continuous. PDF $f(x) = \lambda e^{-\lambda x}, x \ge 0$ . Mean $1/\lambda$ . Models inter-arrival times in a Poisson process — waiting times, lifetimes.

6. Uniform $(a, b)$ — Continuous. PDF $f(x) = 1/(b - a)$ on $[a, b]$ . Mean $(a+b)/2$ . Models complete equal-likelihood — basis of inverse-transform sampling.