PGD24110105B

November 2024 — Solved

Computational Methods for Data Science
PGD01C05
2 hours 30 minutes
50 marks
Solved

End-Semester Examination, First Semester PG Diploma in Data Science and Analytics (2024 Admission).

Part A — Short Essay

Answer any five questions. Each question carries 2 marks. (5 × 2 = 10 Marks)


Q1. Write any two applications of Fourier transform.

  1. Audio / signal processing — decomposing a sound signal into its frequency components for equalisation, noise filtering, and compression (MP3 uses a related transform).
  2. Image processing — frequency-domain filtering for denoising, edge detection, and JPEG compression (which is built on the related DCT).

(Other valid uses: solving differential equations, MRI reconstruction, radar, telecommunications, spectroscopy.)


Q2. Explain the Strong Law of Large Numbers.

The Strong Law of Large Numbers (SLLN) states that if X1,X2,X_1, X_2, \dots are i.i.d. random variables with mean μ\mu, then the sample mean converges almost surely to μ\mu:

Xn=1ni=1nXia.s.μas n.\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty.

This is the theoretical justification for estimating means from large samples — and underpins Monte Carlo simulation. "Strong" because the convergence holds with probability 1 (almost-sure), unlike the Weak Law which is convergence in probability.


Q3. What is the role of image acquisition in the image processing pipeline?

Image acquisition is the first stage — converting a physical scene into a digital image via a sensor (camera, scanner, MRI coil, satellite imager). The sensor performs sampling (spatial discretisation, governed by Nyquist) and quantisation (intensity discretisation). The quality of acquisition (resolution, noise, exposure, lens distortion) sets a hard upper bound on every downstream step (denoising, enhancement, segmentation, recognition).


Q4. How are eigenvectors related to principal components?

In Principal Component Analysis (PCA), the principal components are the eigenvectors of the covariance matrix of the centred data. They are the orthogonal directions of maximum variance. The corresponding eigenvalues equal the variance along each direction. The first PC = top eigenvector, second PC = next eigenvector, and so on.


Q5. List the three main steps involved in image processing.

  1. Image acquisition — capture and digitise (sensor → pixels).
  2. Image processing / enhancement — denoising, sharpening, restoration, transformation, segmentation, feature extraction.
  3. Image analysis / interpretation — classification, recognition, decision (e.g., is there a tumour in this MRI?).

Q6. Explain Type I and Type II error in statistical hypothesis testing.

  • Type I error (α\alpha, false positive) — rejecting a true null hypothesis H0H_0. Probability = significance level α\alpha (typically 0.05).
  • Type II error (β\beta, false negative) — failing to reject a false null hypothesis. The power of a test is 1β1 - \beta.

There is an unavoidable trade-off: lowering α\alpha raises β\beta (and vice-versa) for a fixed sample size.


Q7. What mathematical theory forms the foundation of the Fourier Series?

The Fourier Series is grounded in the theory of orthogonal function expansions in a Hilbert space, specifically L2L^2 inner product spaces. The sines and cosines {1,cos(nx),sin(nx)}n=1\{1, \cos(nx), \sin(nx)\}_{n=1}^{\infty} form a complete orthonormal basis on [π,π][-\pi, \pi], so any (square-integrable) periodic function can be uniquely written as a linear combination — exactly as a vector decomposes onto an orthonormal basis.


Part B — Long Essay

Answer any two questions. Each question carries 20 marks. (2 × 20 = 40 Marks)


Q8.

(a) Compare the Gaussian filter and Mean filter for noise reduction. Discuss their effects on image edges and noise suppression. (10 marks)

Both are linear spatial filters that smooth images by convolving with a kernel. They are widely used as the first step in image-processing pipelines.

Mean (Box / Average) filter.

Each output pixel is the unweighted average of its k×kk \times k neighbourhood:

I(x,y)=1k2i,jWI(x+i,y+j).I'(x, y) = \frac{1}{k^2} \sum_{i,j \in W} I(x+i, y+j).

3×3 example kernel: M=19(111111111).M = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.

Gaussian filter.

Weighted average using a 2-D Gaussian kernel, so pixels closer to the centre contribute more:

G(x,y)=12πσ2e(x2+y2)/(2σ2).G(x, y) = \frac{1}{2\pi \sigma^2} e^{-(x^2 + y^2)/(2 \sigma^2)}.

Larger σ\sigma → wider, smoother blur. The kernel is separable: a 2D Gaussian convolution equals two 1D convolutions — fast.

Comparison.

AspectMean filterGaussian filter
Kernel weightsUniformCentre-weighted (bell-shape)
Frequency responseSinc-shaped — has side-lobes, leaks high frequenciesGaussian — monotonically decreasing, no ringing
Noise suppressionGood for additive Gaussian noise — but roughBetter, smoother suppression of Gaussian noise
Edge preservationPoor — blurs edges hard; produces blocky artefactsSmoother blur but still blurs edges
ArtefactsRinging, halos near edgesSoft halos; minimal ringing
Smoothness vs detailStrong smoothing, loses fine detailTunable via σ\sigma — better trade-off
Computational costCheapestSlightly higher (separable kernel mitigates)
Salt-and-pepper noiseFails — keeps spikesFails — use a median filter instead
Theoretical linkSpecial case of weighted averageMaximum-entropy smoother
Typical useQuick smoothing, downsampling previewPre-processing for edge detection (Canny uses Gaussian), feature extraction

Effect on edges. Both filters are isotropic and linear — they don't know where the edges are. Both blur edges, but a Gaussian blurs more gracefully (no ringing), giving a softer result. For edge-preserving denoising one must instead use bilateral, median, non-local means, or modern BM3D / deep-learning denoisers.

Effect on noise. For additive Gaussian noise, the Gaussian filter is the optimal linear smoother (Wiener-filter limit) — its frequency response matches the noise's spectral assumptions and avoids the side-lobes of a box filter. Box filtering is acceptable for casual smoothing but generally inferior.

(b) Benefits of digital image processing for improving image quality and automating tasks. (10 marks)

Image-quality improvements.

  1. Noise reduction — Gaussian, median, non-local means and modern deep denoisers remove sensor noise, increasing perceived sharpness.
  2. Contrast enhancement — histogram equalisation, CLAHE, gamma correction reveal detail hidden in shadows / highlights.
  3. Sharpening — unsharp masking, Laplacian-based enhancement strengthen edges and structure.
  4. Restoration — deblurring (motion / focus), inpainting of missing regions, dust / scratch removal in archival images.
  5. White-balance and colour correction — restore natural colours under arbitrary illumination.
  6. Super-resolution — single-image and learning-based methods reconstruct higher-resolution images.
  7. Geometric correction — perspective rectification, lens-distortion removal.
  8. Compression — JPEG/JPEG-2000/HEIF reduce file size with minimal perceptible loss.

Task automation.

  1. Object detection and classification — autonomous driving, surveillance, defect detection on factory lines.
  2. Medical diagnosis — automatic detection of tumours in CT/MRI, diabetic retinopathy screening, X-ray triage.
  3. Optical Character Recognition (OCR) — automated form processing, postal sorting, document digitisation.
  4. Face recognition — biometric authentication, unlocking phones, airport gates.
  5. Satellite and remote sensing — land-use classification, deforestation monitoring, crop-yield estimation.
  6. Agriculture — disease detection on leaves, fruit-ripeness sorting, weed identification.
  7. Industrial inspection — robotic quality control on production lines.
  8. Sports and entertainment — Hawk-Eye in tennis/cricket, motion capture, AR filters.
  9. Security and forensics — license-plate recognition, video analytics, fingerprint matching.
  10. Augmented reality — real-time scene understanding for AR overlays.

Strategic benefits.

  • Speed and scale. A model can process millions of images / hour — impossible manually.
  • Consistency. No fatigue, no inter-rater variability.
  • Cost reduction. Automates labour-intensive tasks (medical screening, quality control).
  • New capabilities. Sees patterns humans miss (subtle pixel-level cancer features).
  • Data-driven decisions. Quantifies what was previously subjective ("looks like a defect" → 0.87 probability).
  • 24/7 operation. Round-the-clock surveillance, monitoring, screening.

Q9.

(a) Process of computing the Singular Value Decomposition (SVD) of a matrix, including the steps to calculate eigenvalues and eigenvectors. (10 marks)

Statement. For any real matrix ARm×nA \in \mathbb{R}^{m \times n} there exist orthogonal URm×mU \in \mathbb{R}^{m \times m}, VRn×nV \in \mathbb{R}^{n \times n} and a diagonal ΣRm×n\Sigma \in \mathbb{R}^{m \times n} (with non-negative entries σ1σ20\sigma_1 \ge \sigma_2 \ge \dots \ge 0) such that

A=UΣVT.\boxed{A = U \Sigma V^T}.

The σi\sigma_i are the singular values, columns of UU are left singular vectors, columns of VV are right singular vectors.

Computational process.

Step 1 — Form ATAA^T A. This is n×nn \times n, symmetric, positive semi-definite.

Step 2 — Eigendecompose ATAA^T A.

Solve det(ATAλI)=0\det(A^T A - \lambda I) = 0 to obtain eigenvalues λ1λ2λn0\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 and corresponding orthonormal eigenvectors v1,v2,,vnv_1, v_2, \dots, v_n via (ATAλiI)vi=0(A^T A - \lambda_i I) v_i = 0.

The right singular vectors are these viv_i; assemble into V=[v1v2vn]V = [v_1 \, v_2 \, \dots \, v_n].

Step 3 — Singular values.

σi=λi,i=1,,min(m,n).\sigma_i = \sqrt{\lambda_i}, \quad i = 1, \dots, \min(m, n).

Place them on the diagonal of Σ\Sigma, descending.

Step 4 — Left singular vectors.

For each non-zero σi\sigma_i:

ui=1σiAvi.u_i = \frac{1}{\sigma_i} A v_i.

Verify ui=1\|u_i\| = 1 and they are orthogonal. Stack into U=[u1u2ur]U = [u_1 \, u_2 \, \dots \, u_r] (where rr = rank of AA). Extend UU to a full orthonormal basis if needed.

Step 5 — Verify. Reconstruct A=UΣVTA = U \Sigma V^T and check.

Worked example. A=(3045)A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}.

Step 1. ATA=(3405)(3045)=(25202025)A^T A = \begin{pmatrix} 3 & 4 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix}.

Step 2. Eigenvalues: det(ATAλI)=(25λ)2400=0λ=45,5\det(A^T A - \lambda I) = (25 - \lambda)^2 - 400 = 0 \Rightarrow \lambda = 45, 5.

Eigenvectors: for λ=45\lambda = 45: 20v1+20v2=0v=12(1,1)T-20 v_1 + 20 v_2 = 0 \Rightarrow v = \frac{1}{\sqrt{2}}(1, 1)^T.

For λ=5\lambda = 5: 20v1+20v2=0v=12(1,1)T20 v_1 + 20 v_2 = 0 \Rightarrow v = \frac{1}{\sqrt{2}}(1, -1)^T.

Step 3. Singular values σ1=456.71,  σ2=52.24\sigma_1 = \sqrt{45} \approx 6.71, \; \sigma_2 = \sqrt{5} \approx 2.24.

Step 4. Left singular vectors: u1=145A12(1,1)T=190(3,9)T=110(1,3)Tu_1 = \dfrac{1}{\sqrt{45}} A \cdot \dfrac{1}{\sqrt{2}}(1,1)^T = \dfrac{1}{\sqrt{90}}(3, 9)^T = \dfrac{1}{\sqrt{10}}(1, 3)^T.

u2=15A12(1,1)T=110(3,1)Tu_2 = \dfrac{1}{\sqrt{5}} A \cdot \dfrac{1}{\sqrt{2}}(1, -1)^T = \dfrac{1}{\sqrt{10}}(3, -1)^T.

Result. A=UΣVTA = U \Sigma V^T with U=110(1331),  Σ=(45005),  V=12(1111)U = \dfrac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ 3 & -1 \end{pmatrix}, \; \Sigma = \begin{pmatrix} \sqrt{45} & 0 \\ 0 & \sqrt{5} \end{pmatrix}, \; V = \dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

Practical algorithms. Direct eigendecomposition is unstable for large matrices. Production SVD uses Householder bidiagonalisation + QR iterations (Golub–Kahan), Jacobi rotations, or Lanczos/randomised SVD for very large sparse matrices.

(b) Step-by-step process of Principal Component Analysis (PCA), including the mathematical rationale for each step. (10 marks)

Goal. Find a lower-dimensional orthogonal basis that preserves maximum variance of the data — equivalently, that minimises reconstruction error (Eckart–Young).

Step 1 — Centre the data. xi=xix.x_i' = x_i - \overline{x}. Rationale. PCA finds directions through the origin; centring places the data's centre of mass at the origin so the first PC truly captures variance, not the offset of the mean.

Step 2 — Standardise (optional). Divide by std of each feature. Rationale. Variance is scale-sensitive: a feature in millimetres will dwarf one in kilograms unless both are standardised. Skip only if features share units.

Step 3 — Compute covariance matrix. Σ=1n1XTX(X=centred data).\Sigma = \frac{1}{n - 1} X^T X \quad (X = \text{centred data}). Σ\Sigma is symmetric, positive semi-definite. Σij\Sigma_{ij} tells how features ii and jj co-vary. Rationale. PCA seeks directions of maximum variance — variance is encoded in Σ\Sigma.

Step 4 — Eigendecompose Σ\Sigma.

Solve Σv=λv\Sigma v = \lambda v. Obtain eigenpairs (λi,vi)(\lambda_i, v_i).

  • Eigenvectors viv_i are orthonormal directions in feature space.
  • Eigenvalues λi\lambda_i are the variance along viv_i.

Rationale. The direction vv that maximises Var(vTX)=vTΣv\text{Var}(v^T X) = v^T \Sigma v subject to v=1\|v\|=1 is, by the Rayleigh quotient, the eigenvector of Σ\Sigma with the largest eigenvalue. The next-best orthogonal direction is the second eigenvector, and so on.

Step 5 — Sort and select top-kk components.

Sort λ1λ2λp\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p and choose top kk — typically to keep, say, 95% cumulative variance: i=1kλii=1pλi0.95.\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \ge 0.95. Rationale. Most variance is concentrated in a few directions; the rest are noise.

Step 6 — Project.

Form W=[v1,v2,,vk]W = [v_1, v_2, \dots, v_k] and project: Z=XW(shape: n×k).Z = X W \quad (\text{shape: } n \times k). Rationale. ZZ is the data in the reduced-dim PCA basis. Each row ziz_i is the new representation of xix_i.

Step 7 — Reconstruct or analyse.

Reconstruct: X^=ZWT\hat X = Z W^T (back to original space). The reconstruction error is i>kλi\sum_{i > k} \lambda_i — by Eckart–Young this is the minimum possible for any rank-kk approximation.

Equivalence with SVD. PCA can be done directly via SVD of the centred data matrix X=UΣVTX = U \Sigma V^T: the principal components are columns of VV, and the singular values squared (over n1n-1) are the variances. SVD is numerically preferable to forming XTXX^T X explicitly.

Applications.

  • Dimensionality reduction for ML (faster training, less overfitting).
  • Visualisation in 2D / 3D.
  • Denoising by reconstructing from top components.
  • Compression — Eigenfaces, hyperspectral imaging.

Caveats.

  • PCA is linear — for non-linear manifolds use Kernel PCA, t-SNE, UMAP, autoencoders.
  • Principal components are not interpretable as original features.
  • Sensitive to scaling and outliers.

Q10.

(a) Process of image recognition and its significance in computer vision, with examples. (10 marks)

Definition. Image recognition is the task of identifying what is in an image — assigning class labels to entire images, regions, or pixels. It is a central problem in computer vision.

Typical pipeline.

1. Image acquisition. Camera, scanner, MRI — produces raw pixel array.

2. Pre-processing.

  • Resize, crop, normalise pixel intensities to [0,1][0, 1].
  • Denoise (Gaussian / median).
  • Colour normalisation, white-balance.
  • Augment (rotate, flip, jitter) during training.

3. Feature extraction. Two paradigms:

  • Hand-crafted features (classical): SIFT, SURF, HOG, LBP, Haar, colour histograms.
  • Learned features (deep learning): Convolutional Neural Networks (CNNs) learn hierarchical features automatically — early layers detect edges, mid layers detect parts, deep layers detect object concepts.

4. Classification / decision.

  • Classical: SVM, Random Forest, KNN on the extracted features.
  • Deep: softmax head on top of a CNN backbone (ResNet, EfficientNet, ViT).

5. Post-processing. Non-max suppression (object detection), morphological cleanup, threshold tuning.

6. Output. Label + confidence; or bounding boxes; or pixel-wise segmentation mask depending on task.

Categories of recognition tasks.

TaskOutput
ClassificationWhole-image label (e.g., "cat")
LocalisationLabel + bounding box
Object detectionMultiple labels + boxes (YOLO, Faster R-CNN)
Semantic segmentationClass label per pixel (U-Net, DeepLab)
Instance segmentationPer-pixel labels with instance IDs (Mask R-CNN)
Face recognitionIdentity verification or identification
OCRText strings from images

Significance in computer vision.

  • Foundation of modern AI applications — autonomous driving, medical imaging, robotics, AR/VR, surveillance.
  • Replaces or augments human vision for high-volume / dangerous / detailed tasks.
  • Enables new capabilities — recognises microscopic patterns, satellite features, defect types invisible to humans.
  • Drives ImageNet-style breakthroughs that propelled deep learning from 2012 onward (AlexNet → ResNet → ViT).

Examples.

  1. Medical imaging. Detect diabetic retinopathy from retinal fundus images (Google's deep-learning model achieved cardiologist-level accuracy). CheXNet detects 14 chest conditions from X-rays.

  2. Autonomous vehicles. Tesla, Waymo recognise pedestrians, traffic lights, lane markings in real time using CNNs.

  3. Face recognition. Apple Face ID, airport e-gates, photo-tagging in Google Photos.

  4. OCR / Document AI. Bank cheque scanning, invoice processing, Google Lens translation.

  5. Agriculture. Detect crop disease from leaf photos; classify ripeness; estimate yield from drone imagery.

  6. Industrial QC. Detect manufacturing defects on assembly lines (cracks, scratches, missing components).

  7. Wildlife conservation. Camera-trap image analysis to track endangered species automatically.

  8. Retail. Visual search ("find this product"), checkout-free stores (Amazon Go).

(b) Concept of compressed sensing and its key principles in signal reconstruction. (10 marks)

Motivation. Classical sampling theory (Nyquist–Shannon) demands sampling at at least twice the highest frequency. For modern signals (MRI, hyperspectral imaging, radar), Nyquist sampling is expensive in time and storage. Compressed Sensing (CS) — pioneered by Candès, Romberg, Tao and Donoho (2004–06) — shows we can recover a signal from far fewer measurements than Nyquist if the signal is sparse in some basis.

Setup. Let xRnx \in \mathbb{R}^n be the signal we want to recover. We take linear measurements: y=ΦxRm,mn.y = \Phi x \in \mathbb{R}^m, \quad m \ll n.

The matrix Φ\Phi is called the sensing / measurement matrix. Recovering xx from yy is underdetermined — infinitely many solutions. CS gives us conditions under which the correct xx can still be uniquely recovered.

Key principles.

1. Sparsity. The signal xx is kk-sparse in some basis Ψ\Psi — i.e., x=Ψαx = \Psi \alpha where α\alpha has at most kk non-zero entries. Natural images are sparse in wavelets; audio in DCT; communication signals in Fourier basis.

2. Incoherent measurements. The sensing matrix Φ\Phi must be incoherent with the sparsity basis Ψ\Psi — meaning rows of Φ\Phi are "spread out" in the basis Ψ\Psi. Random Gaussian / Bernoulli / random Fourier matrices are highly incoherent with almost any fixed basis.

3. Restricted Isometry Property (RIP). For some δk<1\delta_k < 1, (1δk)x22Φx22(1+δk)x22(1 - \delta_k) \|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_k) \|x\|_2^2 for all kk-sparse xx. RIP guarantees that distinct sparse signals produce distinct measurements — making them recoverable.

4. 1\ell_1 minimisation reconstruction. Given measurements yy, recover xx by solving the convex programme

x^=argminxx1subject to Φx=y.\hat x = \arg\min_x \|x\|_1 \quad \text{subject to } \Phi x = y.

The 1\ell_1 norm xi\sum |x_i| is a convex proxy for the (non-convex) 0\ell_0 count of non-zeros, and Candès–Tao showed that with sufficient incoherent measurements (mklog(n/k)m \gtrsim k \log(n/k)), 1\ell_1 recovery exactly equals 0\ell_0 recovery.

5. Stability and robustness. Realistic measurements have noise y=Φx+ny = \Phi x + n. The variant x^=argminxx1s.t.Φxy2η\hat x = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|\Phi x - y\|_2 \le \eta (or its Lagrangian Lasso form) recovers xx with error bounded by the noise level.

Algorithms. Basis Pursuit / Lasso (convex optimisation), Orthogonal Matching Pursuit (greedy), Iterative Hard / Soft Thresholding, approximate message passing.

Why it matters — applications.

  • MRI — measure far fewer Fourier samples; reconstruct via CS. Reduces scan time from 30 min to a few minutes, important for paediatric and cardiac MRI.
  • Single-pixel camera (Rice Univ.) — uses random masks to capture an image with one photodetector and CS reconstruction.
  • Compressed sensing radar — fewer pulses, sparser signal recovery.
  • Hyperspectral imaging, seismic data, networked sensors — wherever Nyquist sampling is prohibitive.
  • Machine learning — sparse recovery / Lasso / dictionary learning all rest on the same principles.

Significance. Compressed sensing rewrote our understanding of sampling: sparsity beats bandwidth. Whenever signals have few effective degrees of freedom in some basis, we can sense them with proportionally few measurements — a paradigm now embedded in MRI scanners, cameras, communication systems, and machine-learning regularisers.


Q11.

(a) Fourier Series, Fourier Transform, Fast Fourier Transform (FFT), Time-Frequency Analysis — differences and applications. (10 marks)

A family of related tools for decomposing signals into frequency components, each suited to different signal types and computational regimes.

1. Fourier Series (FS).

Applies to periodic signals. Any sufficiently nice periodic signal f(t)f(t) of period TT can be written as

f(t)=n=cnei2πnt/T,cn=1T0Tf(t)ei2πnt/Tdt.f(t) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n t / T}, \quad c_n = \frac{1}{T} \int_0^T f(t) e^{-i 2\pi n t / T} dt.

Outputs a discrete spectrum at integer multiples of 1/T1/T.

Applications. Analysing musical notes, AC power waveforms, periodic vibrations in mechanical systems, solving heat / wave equations on bounded domains.

2. Fourier Transform (FT).

Generalises FS to non-periodic, continuous signals over R\mathbb{R}:

F(ω)=f(t)eiωtdt,f(t)=12πF(ω)eiωtdω.F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i \omega t} dt, \quad f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega) e^{i\omega t} d\omega.

Outputs a continuous spectrum F(ω)F(\omega).

Applications. Signal-processing theory, telecommunications, image processing, optics, quantum mechanics, solving differential equations.

3. Discrete & Fast Fourier Transform (DFT / FFT).

Real-world signals are sampled. The DFT of {x0,x1,,xN1}\{x_0, x_1, \dots, x_{N-1}\} is

Xk=n=0N1xnei2πkn/N,k=0,,N1.X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \quad k = 0, \dots, N-1.

Naive DFT costs O(N2)O(N^2). The FFT (Cooley–Tukey 1965) computes it in O(NlogN)O(N \log N) — the algorithmic breakthrough that made digital signal processing practical.

Applications. MP3 / JPEG / video compression (related transforms), spectrum analysers, OFDM in 4G/5G, radar, MRI image reconstruction, real-time audio effects, convolutions via the frequency domain.

4. Time-Frequency Analysis.

The standard FT tells us what frequencies are present but not when they occur — it averages over all time. For non-stationary signals (speech, music, EEG), we need both. Approaches:

  • Short-Time Fourier Transform (STFT) — slide a short window across the signal, take FT in each window: X(t,ω)=f(τ)w(τt)eiωτdτ.X(t, \omega) = \int f(\tau) w(\tau - t) e^{-i \omega \tau} d\tau. Produces a spectrogram (frequency vs time vs amplitude). Trade-off: a short window gives good time resolution but poor frequency resolution and vice versa (Heisenberg uncertainty principle).

  • Wavelet transform — uses scalable wavelets instead of a fixed window, giving good time resolution at high frequency and good frequency resolution at low frequency. Multi-resolution analysis.

  • Wigner–Ville distribution — joint time-frequency distribution; high resolution but cross-term artefacts.

Applications. Speech recognition, music analysis, biomedical signals (EEG, ECG), seismic data, sonar / radar, fault detection in rotating machinery.

Side-by-side comparison.

ToolSignal typeOutputTime/freq resolutionComplexityBest use
Fourier SeriesPeriodic, continuousDiscrete spectrumNo time infoPeriodic signals
Fourier TransformNon-periodic, continuousContinuous spectrumNo time infoTheory, continuous signals
DFT / FFTDiscrete, finiteDiscrete spectrumNo time infoO(NlogN)O(N \log N)Digital signals, real-time DSP
STFT / WaveletsDiscrete or continuousTime–frequency mapTime AND freqO(NlogN)O(N \log N) per frameNon-stationary, transient signals

(b) What are random variables and their types? Give any four types of random variables with their probability distribution functions. (10 marks)

Random variable (RV). A random variable is a function X:ΩRX : \Omega \to \mathbb{R} that assigns a real number to each outcome of a random experiment. It is the bridge between abstract sample spaces and numerical analysis — it lets us talk about probabilities of numbers (P(X5)P(X \le 5)) rather than probabilities of outcomes.

Distribution function. For any RV, the Cumulative Distribution Function (CDF) F(x)=P(Xx)F(x) = P(X \le x) exists; it is non-decreasing, right-continuous, with limxF=0\lim_{x \to -\infty} F = 0 and limxF=1\lim_{x \to \infty} F = 1.

Types of random variables.

1. Discrete RV. Takes countably many values. Described by a Probability Mass Function (PMF) p(x)=P(X=x)p(x) = P(X = x), with xp(x)=1\sum_x p(x) = 1.

2. Continuous RV. Takes values in an interval. Described by a Probability Density Function (PDF) f(x)0f(x) \ge 0, with f(x)dx=1\int_{-\infty}^{\infty} f(x) dx = 1. P(X=c)=0P(X = c) = 0 for any single point; probabilities arise from intervals.

3. Mixed RV. Combination — e.g., insurance payouts that are 0 with non-zero probability and continuous when positive.

4. Multivariate RV (random vector). A vector X=(X1,,Xn)X = (X_1, \dots, X_n) whose components are jointly random.


Four specific distributions.

1. Bernoulli(p)(p) — Discrete.

PMF: P(X=1)=p,  P(X=0)=1pP(X = 1) = p, \; P(X = 0) = 1 - p, 0p10 \le p \le 1.

Mean =p= p, variance =p(1p)= p(1-p). Models a single yes/no trial.

2. Binomial(n,p)(n, p) — Discrete.

PMF: P(X=k)=(nk)pk(1p)nk,  k=0,1,,nP(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \; k = 0, 1, \dots, n.

Mean =np= np, variance =np(1p)= np(1 - p). Models the number of successes in nn independent Bernoulli trials. Used in A/B testing, quality control.

3. Poisson(λ)(\lambda) — Discrete.

PMF: P(X=k)=eλλkk!,  k=0,1,2,P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}, \; k = 0, 1, 2, \dots.

Mean =λ= \lambda, variance =λ= \lambda. Models the count of rare independent events in a fixed time/space window — call-centre arrivals, defects per page, photons per pixel.

4. Normal(μ,σ2)(\mu, \sigma^2) — Continuous.

PDF: f(x)=1σ2πexp ⁣((xμ)22σ2),  xRf(x) = \dfrac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right), \; x \in \mathbb{R}.

Mean =μ= \mu, variance =σ2= \sigma^2. The most important continuous distribution — by the Central Limit Theorem, sample means of any finite-variance distribution converge to normal as nn \to \infty. Models measurement errors, heights, IQ scores, asset returns.

(Two more common ones, for context.)

5. Exponential(λ)(\lambda) — Continuous. PDF f(x)=λeλx,x0f(x) = \lambda e^{-\lambda x}, x \ge 0. Mean 1/λ1/\lambda. Models inter-arrival times in a Poisson process — waiting times, lifetimes.

6. Uniform(a,b)(a, b) — Continuous. PDF f(x)=1/(ba)f(x) = 1/(b - a) on [a,b][a, b]. Mean (a+b)/2(a+b)/2. Models complete equal-likelihood — basis of inverse-transform sampling.