PGDDSA Study · Semester 1

PGD01C01

Module 3 · Linear Algebra for Data Science

Matrix Factorizations

Core Titles

Key headlines and terms for quick recall

LU decomposition — $A = LU$
QR decomposition — $A = QR$ , $Q$ orthogonal, $R$ upper triangular
Cholesky — $A = LL^T$ for symmetric positive-definite $A$
Eigendecomposition — $A = P D P^{-1}$
Singular Value Decomposition (SVD) — $A = U \Sigma V^T$

Basic Idea

What it is, why it matters, how it works

Why factorize?

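A factorization writes a matrix as a product of simpler matrices (triangular, orthogonal, or diagonal). Once the factors are known, solving systems, inverting, and computing eigenvalues reduce to cheap operations on those factors, and the factors can be reused, for example for many right-hand sides $b$. Orthogonal factorizations (QR, SVD) are also more numerically stable than forming $A^{-1}$ or $A^T A$ explicitly.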

LU decomposition

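$A = LU$, where $L$ is lower triangular with unit diagonal and $U$ is upper triangular. Used to solve $Ax = b$ in two triangular sweeps: forward substitution for $Ly = b$, then back substitution for $Ux = y$. Equivalent to Gaussian elimination. Without pivoting, the factorization breaks down if a zero pivot appears; with row pivoting, $PA = LU$ ($P$ a permutation matrix) works for any non-singular square matrix, and the first sweep becomes $Ly = Pb$.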
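
The two triangular sweeps can be sketched in plain Python. This is a minimal illustration, assuming no zero pivot occurs (so no row pivoting is done); the function names `lu_decompose` and `lu_solve` and the example matrix are made up for this note.

```python
def lu_decompose(A):
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in A]  # working copy, becomes U
    for k in range(n):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: row pivoting (PA = LU) is needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]     # row_i -= multiplier * row_k
    return L, U


def lu_solve(L, U, b):
    """Solve Ax = b given A = LU: forward sweep Ly = b, back sweep Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
L, U = lu_decompose(A)
x = lu_solve(L, U, b)
print(x)  # [1.0, 1.0, 1.0]
```

Once $L$ and $U$ are stored, each additional right-hand side costs only the two sweeps. In practice a library routine with pivoting (for example `scipy.linalg.lu_factor` with `scipy.linalg.lu_solve`) would normally be used instead of hand-written loops.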

QR decomposition

$A = QR$, where $Q$ has orthonormal columns ($Q^T Q = I$) and $R$ is upper triangular. Computed by Gram–Schmidt or, more stably, by Householder reflections. Used in:

Least squares: $\hat{x} = R^{-1} Q^T b$ when $A$ has full column rank (in practice, solve $R\hat{x} = Q^T b$ by back substitution)
QR algorithm for eigenvalues
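
The least-squares use can be sketched as follows. This is a minimal illustration using modified Gram–Schmidt (chosen here for brevity; Householder reflections are the more robust choice), assuming the columns of $A$ are linearly independent. The helper names and the three data points are made up for this note.

```python
import math


def qr_gram_schmidt(A):
    """Thin QR by modified Gram-Schmidt.

    A is m x n (list of rows) with linearly independent columns.
    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular).
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Project the *current* residual onto q_k, then remove that part.
            R[k][j] = sum(q * x for q, x in zip(q_cols[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, q_cols[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R


def least_squares(A, b):
    """Minimise ||Ax - b||_2 by solving R x = Q^T b with back substitution."""
    Q, R = qr_gram_schmidt(A)
    m, n = len(A), len(A[0])
    c = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(n)]  # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


# Fit y = c0 + c1 * t to the points (0, 1), (1, 2), (2, 4).
A = [[1, 0], [1, 1], [1, 2]]
b = [1, 2, 4]
x_hat = least_squares(A, b)
print(x_hat)  # approximately [0.8333, 1.5]
```

The result matches the textbook line fit (intercept $5/6$, slope $3/2$) without ever forming $A^T A$.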

Cholesky decomposition

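For symmetric positive-definite $A$: $A = L L^T$, where $L$ is lower triangular with positive diagonal. It needs about half the arithmetic of LU (roughly $n^3/3$ versus $2n^3/3$ floating-point operations). Used for factoring covariance matrices, in Gaussian processes, and for simulating multivariate normals.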
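
A minimal sketch of the factorization in plain Python, computing $L$ row by row. The function name `cholesky` and the example matrix are chosen for this note; the code assumes its input is symmetric and raises an error if a non-positive pivot shows the matrix is not positive-definite.

```python
import math


def cholesky(A):
    """Return lower-triangular L with positive diagonal such that A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
L = cholesky(A)
print(L)  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

For simulation, if $z$ is a vector of independent standard normals and $\Sigma = LL^T$ is a covariance matrix, then $\mu + Lz$ has mean $\mu$ and covariance $\Sigma$.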

Eigendecomposition

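$A = P D P^{-1}$, where the columns of $P$ are eigenvectors and $D$ is the diagonal matrix of the corresponding eigenvalues. Requires $A$ to be square with a full set of linearly independent eigenvectors (that is, diagonalizable). For symmetric $A$: $A = Q \Lambda Q^T$, with $Q$ orthogonal and all eigenvalues real.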
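
As a small worked illustration (the matrix is chosen here for convenience), take a symmetric $2 \times 2$ matrix and find its eigenvalues from the characteristic equation:

$$
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0 \;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1 .
$$

The unit eigenvectors are $q_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$ and $q_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$, so

$$
A = Q \Lambda Q^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} .
$$

One payoff is that powers become trivial: $A^k = Q \Lambda^k Q^T$, with $\Lambda^k = \mathrm{diag}(3^k, 1)$.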

Singular Value Decomposition (SVD)

$A = U \Sigma V^T$ where for $A \in \mathbb{R}^{m \times n}$ :

$U$ is $m \times m$ orthogonal (left singular vectors)
$\Sigma$ is $m \times n$ diagonal of non-negative singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$
$V$ is $n \times n$ orthogonal (right singular vectors)

SVD always exists. The singular values are the square roots of the eigenvalues of $A^T A$: $\sigma_i = \sqrt{\lambda_i(A^T A)}$.
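
A short worked example shows how the pieces are found by hand (the matrix is chosen here for convenience). Start from $A^T A$ and its eigenvalues:

$$
A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}, \qquad A^T A = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}, \qquad \lambda_1 = 8,\ \lambda_2 = 2 \;\Rightarrow\; \sigma_1 = 2\sqrt{2},\ \sigma_2 = \sqrt{2} .
$$

The eigenvectors of $A^T A$ give $v_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$ and $v_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$, and the left singular vectors follow from $u_i = A v_i / \sigma_i$, giving $u_1 = (1, 0)^T$ and $u_2 = (0, -1)^T$. Hence

$$
A = U \Sigma V^T = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} .
$$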

Why this matters in Data Science

PCA = eigendecomposition of covariance, or equivalently SVD of centred data
Latent Semantic Analysis uses truncated SVD of term-document matrices
Recommender systems factor the rating matrix
Image compression by keeping only the top singular values
Solving least squares via QR
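
The PCA equivalence in the first item can be checked in one line. Assume (notation introduced for this note) a centred data matrix $X \in \mathbb{R}^{N \times p}$ with one observation per row and SVD $X = U \Sigma V^T$. The sample covariance matrix is then

$$
C = \frac{1}{N-1} X^T X = \frac{1}{N-1} V \Sigma^T U^T U \Sigma V^T = V \left( \frac{\Sigma^T \Sigma}{N-1} \right) V^T ,
$$

which is an eigendecomposition of $C$: the columns of $V$ are the principal directions, and the variance along the $i$-th one is $\lambda_i = \sigma_i^2 / (N-1)$.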

Mind Map

Visual structure of the concept

MATRIX FACTORIZATIONS
├── LU:        A = LU       (Gauss elim, solve Ax = b)
├── QR:        A = QR       (least squares, eigenvalues)
├── Cholesky:  A = LLᵀ     (symmetric pos-def)
├── Eigen:     A = PDP⁻¹  (eigenvalues / vectors)
│   └── If A=Aᵀ ⇒ A = QΛQᵀ
└── SVD:       A = UΣVᵀ
    ├── Always exists
    ├── σᵢ = √(eig(AᵀA))
    └── Truncated SVD = low-rank approximation

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. State the idea of LU decomposition. $A = LU$, where $L$ is lower triangular and $U$ is upper triangular; used to solve $Ax = b$ via two triangular systems.

Q2. What is a QR decomposition? $A = QR$ where $Q$ has orthonormal columns and $R$ is upper triangular.

Q3. When is Cholesky factorization possible? When $A$ is symmetric and positive-definite. Then $A = LL^T$ .

Q4. State the SVD. Any $m \times n$ matrix $A$ can be written as $A = U \Sigma V^T$ with $U, V$ orthogonal and $\Sigma$ diagonal of non-negative singular values.

Part B (20 marks)

Q. Explain Singular Value Decomposition (SVD). State its existence theorem and discuss how it generalises eigendecomposition. Mention its applications in data science.

Definition. For any matrix $A \in \mathbb{R}^{m \times n}$ , the SVD is $A = U \Sigma V^T$ where:

$U \in \mathbb{R}^{m \times m}$ is orthogonal ( $U^T U = I$ ). Its columns are left singular vectors.
$V \in \mathbb{R}^{n \times n}$ is orthogonal. Its columns are right singular vectors.
$\Sigma$ is $m \times n$ with $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$ on the diagonal (the singular values) and zeros elsewhere. $r$ = rank of $A$ .

Existence. SVD exists for every real matrix (no assumptions of squareness or invertibility). The singular values are the square roots of the non-zero eigenvalues of $A^T A$ (equivalently, of $A A^T$, which has the same non-zero eigenvalues); the right singular vectors are eigenvectors of $A^T A$ and the left singular vectors are eigenvectors of $A A^T$.

Generalisation of eigendecomposition.

Eigendecomposition $A = P D P^{-1}$ requires $A$ to be square and have a full set of eigenvectors. Many matrices don't.
SVD applies to any matrix.
For a symmetric positive-definite $A$, SVD and eigendecomposition coincide: $U = V = Q$, $\Sigma = \Lambda$. For a general symmetric $A$, the singular values are $|\lambda_i|$ instead.

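Low-rank approximation (Eckart–Young). Keep the top $k$ singular triplets: $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. Then $A_k$ is the best rank-$k$ approximation to $A$ in both the Frobenius and spectral norms, with errors $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \dots + \sigma_r^2}$.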
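
The rank-1 case can be checked numerically with a small sketch. Power iteration on $A^T A$ is swapped in here only to keep the example dependency-free; it is not how libraries compute a full SVD (a routine such as `numpy.linalg.svd` would normally be used). The sketch assumes $\sigma_1 > \sigma_2$ and a starting vector that is not orthogonal to $v_1$; the helper names and the matrix are made up for this note.

```python
import math


def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A: returns (sigma_1, u_1, v_1)."""
    At = transpose(A)
    v = [1.0] * len(A[0])               # assumed not orthogonal to v_1
    for _ in range(iters):
        w = matvec(At, matvec(A, v))    # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]         # u_1 = A v_1 / sigma_1
    return sigma, u, v


# Singular values of this matrix are 5 and 3.
A = [[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]]
sigma1, u1, v1 = top_singular_triplet(A)

# Rank-1 approximation A_1 = sigma_1 * u_1 * v_1^T and its Frobenius error.
A1 = [[sigma1 * ui * vj for vj in v1] for ui in u1]
err = math.sqrt(sum((a - a1) ** 2
                    for row_a, row_a1 in zip(A, A1)
                    for a, a1 in zip(row_a, row_a1)))
print(round(sigma1, 6), round(err, 6))  # 5.0 3.0
```

The Frobenius error of the rank-1 approximation equals the discarded singular value $\sigma_2 = 3$, as the Eckart–Young result predicts.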

Applications in Data Science.

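Principal Component Analysis (PCA) — SVD of the centred data matrix (rows = observations); the right singular vectors are the principal directions, and for $N$ observations the variance along the $i$-th direction is $\sigma_i^2/(N-1)$.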
Latent Semantic Analysis (LSA) — truncated SVD of term-document matrices to find latent topics.
Recommender systems — matrix-factorise the user–item rating matrix.
Image compression — keep top $k$ singular triplets; reconstruct an approximate image with far fewer numbers.
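Pseudo-inverse — $A^+ = V \Sigma^+ U^T$, where $\Sigma^+$ is $\Sigma^T$ with each non-zero singular value replaced by its reciprocal; $\hat{x} = A^+ b$ is the minimum-norm least-squares solution, used for overdetermined or rank-deficient problems.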
Noise reduction — small singular values often correspond to noise; dropping them denoises.

SVD is sometimes called the most important factorization in numerical linear algebra.
