PGD01C01
Module 3 · Linear Algebra for Data Science

Matrix Factorizations

Core Titles
Key headlines and terms for quick recall
  • LU decompositionA=LUA = LU
  • QR decompositionA=QRA = QR, QQ orthogonal, RR upper triangular
  • CholeskyA=LLTA = LL^T for symmetric positive-definite AA
  • EigendecompositionA=PDP1A = P D P^{-1}
  • Singular Value Decomposition (SVD)A=UΣVTA = U \Sigma V^T
Basic Idea
What it is, why it matters, how it works

Why factorize?

A factorization writes a matrix as a product of simpler matrices. This makes solving systems, inverting, and computing eigenvalues much faster and more numerically stable.

LU decomposition

A=LUA = LU LL lower triangular with unit diagonal, UU upper triangular. Used to solve Ax=bAx = b in two triangular sweeps: Ly=bLy = b, then Ux=yUx = y. Equivalent to Gaussian elimination. Works for any non-singular square matrix (with row pivoting if needed: PA=LUPA = LU).

QR decomposition

A=QRA = QR QQ has orthonormal columns (QTQ=IQ^T Q = I), RR is upper triangular. Computed by Gram–Schmidt or Householder reflections. Used in:

  • Least squares: x^=R1QTb\hat{x} = R^{-1} Q^T b
  • QR algorithm for eigenvalues

Cholesky decomposition

For symmetric positive-definite AA: A=LLTA = L L^T LL lower triangular with positive diagonal. Twice as fast as LU. Used in covariance matrices, Gaussian processes, simulating multivariate normals.

Eigendecomposition

A=PDP1A = P D P^{-1} PP has eigenvectors, DD diagonal of eigenvalues. Requires AA to have a full set of linearly independent eigenvectors. For symmetric AA: A=QΛQTA = Q \Lambda Q^T, QQ orthogonal.

Singular Value Decomposition (SVD)

A=UΣVTA = U \Sigma V^T where for ARm×nA \in \mathbb{R}^{m \times n}:

  • UU is m×mm \times m orthogonal (left singular vectors)
  • Σ\Sigma is m×nm \times n diagonal of non-negative singular values σ1σ20\sigma_1 \ge \sigma_2 \ge \dots \ge 0
  • VV is n×nn \times n orthogonal (right singular vectors)

SVD always exists. Singular values are eigenvalues of ATA\sqrt{\text{eigenvalues of } A^T A}.

Why this matters in Data Science

  • PCA = eigendecomposition of covariance, or equivalently SVD of centred data
  • Latent Semantic Analysis uses truncated SVD of term-document matrices
  • Recommender systems factor the rating matrix
  • Image compression by keeping only the top singular values
  • Solving least squares via QR
Mind Map
Visual structure of the concept
MATRIX FACTORIZATIONS
├── LU:        A = LU       (Gauss elim, solve Ax = b)
├── QR:        A = QR       (least squares, eigenvalues)
├── Cholesky:  A = LLᵀ     (symmetric pos-def)
├── Eigen:     A = PDP⁻¹  (eigenvalues / vectors)
│   └── If A=Aᵀ ⇒ A = QΛQᵀ
└── SVD:       A = UΣVᵀ
    ├── Always exists
    ├── σᵢ = √(eig(AᵀA))
    └── Truncated SVD = low-rank approximation
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Write the LU decomposition idea. A=LUA = LU where LL is lower triangular and UU upper triangular; used to solve Ax=bAx = b via two triangular systems.

Q2. What is a QR decomposition? A=QRA = QR where QQ has orthonormal columns and RR is upper triangular.

Q3. When is Cholesky factorization possible? When AA is symmetric and positive-definite. Then A=LLTA = LL^T.

Q4. State the SVD. Any m×nm \times n matrix AA can be written as A=UΣVTA = U \Sigma V^T with U,VU, V orthogonal and Σ\Sigma diagonal of non-negative singular values.


Part B (20 marks)

Q. Explain Singular Value Decomposition (SVD). State its existence theorem and discuss how it generalises eigendecomposition. Mention its applications in data science.

Definition. For any matrix ARm×nA \in \mathbb{R}^{m \times n}, the SVD is A=UΣVTA = U \Sigma V^T where:

  • URm×mU \in \mathbb{R}^{m \times m} is orthogonal (UTU=IU^T U = I). Its columns are left singular vectors.
  • VRn×nV \in \mathbb{R}^{n \times n} is orthogonal. Its columns are right singular vectors.
  • Σ\Sigma is m×nm \times n with σ1σ2σr>0\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 on the diagonal (the singular values) and zeros elsewhere. rr = rank of AA.

Existence. SVD exists for every real matrix (no assumptions of squareness or invertibility). Singular values are square roots of eigenvalues of ATAA^T A (or AATA A^T); the right singular vectors are the eigenvectors of ATAA^T A and left singular vectors are eigenvectors of AATA A^T.

Generalisation of eigendecomposition.

  • Eigendecomposition A=PDP1A = P D P^{-1} requires AA to be square and have a full set of eigenvectors. Many matrices don't.
  • SVD applies to any matrix.
  • For a symmetric positive-definite AA, SVD and eigendecomposition coincide: U=V=QU = V = Q, Σ=Λ\Sigma = \Lambda.

Low-rank approximation (Eckart–Young). Keep the top kk singular triplets: Ak=i=1kσiuiviTA_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T AkA_k is the best rank-kk approximation to AA in Frobenius and spectral norms.

Applications in Data Science.

  1. Principal Component Analysis (PCA) — SVD of centred data; right singular vectors are PCs, singular values give variances.
  2. Latent Semantic Analysis (LSA) — truncated SVD of term-document matrices to find latent topics.
  3. Recommender systems — matrix-factorise the user–item rating matrix.
  4. Image compression — keep top kk singular triplets; reconstruct an approximate image with far fewer numbers.
  5. Pseudo-inverseA+=VΣ+UTA^+ = V \Sigma^+ U^T, used to solve overdetermined least-squares problems.
  6. Noise reduction — small singular values often correspond to noise; dropping them denoises.

SVD is sometimes called the most important factorization in numerical linear algebra.