PGD01C01
Module 3 · Linear Algebra for Data Science

Projections, Hyperplanes and Half-Planes

Core Titles
Key headlines and terms for quick recall
  • Projection of $u$ onto $v$: $\text{proj}_v u = \dfrac{\langle u, v \rangle}{\langle v, v \rangle} v$
  • Orthogonal projection onto a subspace
  • Projection matrix $P = A(A^T A)^{-1} A^T$
  • Hyperplane $w^T x + b = 0$: a codimension-1 affine subspace
  • Half-plane / Half-space
  • Distance from point to hyperplane: $\dfrac{|w^T x_0 + b|}{\|w\|}$
Basic Idea
What it is, why it matters, how it works

Projection of one vector onto another

The component of $u$ in the direction of a nonzero vector $v$ is the scalar projection $\dfrac{\langle u, v \rangle}{\|v\|}$. The vector projection is: $\text{proj}_v(u) = \frac{\langle u, v \rangle}{\langle v, v \rangle} \, v.$

Property. $u - \text{proj}_v u$ is orthogonal to $v$ (this is the "drop the perpendicular" interpretation).
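
The projection formula and the orthogonality property can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `project` and the sample vectors are illustrative assumptions, not part of the course material.

```python
def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def project(u, v):
    """Vector projection of u onto a nonzero vector v: (<u,v>/<v,v>) v."""
    vv = dot(v, v)
    if vv == 0:
        raise ValueError("cannot project onto the zero vector")
    alpha = dot(u, v) / vv
    return [alpha * x for x in v]

# Illustrative vectors (assumed for this example).
u = [3.0, 4.0]
v = [2.0, 0.0]

p = project(u, v)                        # component of u along v
r = [ui - pi for ui, pi in zip(u, p)]    # residual u - proj_v(u)

print(p, r, dot(r, v))
```

The residual `r` has zero inner product with `v`, which is the "drop the perpendicular" property stated above.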

Orthogonal projection onto a subspace

Given a subspace $S = \text{col}(A)$, where $A$ has linearly independent columns so that $A^T A$ is invertible, the projection of $b$ onto $S$ is $\hat{b} = A(A^T A)^{-1} A^T b = Pb$, where $P = A(A^T A)^{-1} A^T$ is the projection matrix. Properties: $P^2 = P$ (idempotent), $P^T = P$ (symmetric).

The residual $b - \hat{b}$ is orthogonal to $S$. This is exactly what least squares computes: $\hat{b} = A\hat{x}$, where $\hat{x}$ minimises $\|Ax - b\|$.
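
As a small illustration of the projection matrix, the sketch below builds $P$ for a 3-by-2 matrix using exact fractions, so the idempotent and symmetric properties can be checked without rounding error. The data (fitting a line through three points) and the helper names `transpose`, `matmul` and `inverse_2x2` are assumptions made for this example; in practice a numerical library would normally be used instead.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed data: columns of A are [1, 1, 1] and [0, 1, 2]; b holds the targets.
A = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]
b = [[Fraction(6)], [Fraction(0)], [Fraction(0)]]

At = transpose(A)
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)   # P = A (A^T A)^{-1} A^T
b_hat = matmul(P, b)                                     # projection of b onto col(A)
residual = [[bi[0] - hi[0]] for bi, hi in zip(b, b_hat)]

print(b_hat, residual)
```

Multiplying `At` by `residual` gives the zero vector, confirming that the residual is orthogonal to every column of `A`.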

Hyperplane

In $\mathbb{R}^n$, a hyperplane is the set $H = \{x \in \mathbb{R}^n : w^T x + b = 0\}$ with $w \ne 0$. It has dimension $n - 1$.

  • In $\mathbb{R}^2$: hyperplane = line.
  • In $\mathbb{R}^3$: hyperplane = plane.
  • In $\mathbb{R}^n$: $(n-1)$-dimensional flat.

The vector $w$ is normal (perpendicular) to the hyperplane.
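
One way to see why the coefficient vector is normal to the hyperplane is the short derivation below, where $x_1$ and $x_2$ are any two points of $H$ (symbols introduced here for illustration):

$$w^T (x_1 - x_2) = (w^T x_1 + b) - (w^T x_2 + b) = 0 - 0 = 0.$$

Every direction lying within $H$ has the form $x_1 - x_2$, so $w$ is orthogonal to all of them.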

Half-space

$H^+ = \{x : w^T x + b \ge 0\}, \quad H^- = \{x : w^T x + b \le 0\}.$ These two closed half-spaces meet along the hyperplane, which is their common boundary.

Distance from point to hyperplane

For $x_0 \in \mathbb{R}^n$ and the hyperplane $w^T x + b = 0$: $d(x_0, H) = \frac{|w^T x_0 + b|}{\|w\|}.$
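
The half-space test and the distance formula both depend on the single quantity $w^T x_0 + b$. The sketch below, in plain Python, shows this; the function names `distance_to_hyperplane` and `side` and the example hyperplane are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def distance_to_hyperplane(x0, w, b):
    """Distance |w.x0 + b| / ||w|| from x0 to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(dot(w, w))
    if norm_w == 0:
        raise ValueError("w must be nonzero")
    return abs(dot(w, x0) + b) / norm_w

def side(x0, w, b):
    """+1 if w.x0 + b > 0, -1 if w.x0 + b < 0, and 0 if x0 lies on the hyperplane."""
    s = dot(w, x0) + b
    return (s > 0) - (s < 0)

# Assumed example: the line 3x + 4y - 5 = 0 in the plane.
w, b = [3.0, 4.0], -5.0

for point in ([3.0, 4.0], [0.0, 0.0], [3.0, -1.0]):
    print(point, side(point, w, b), distance_to_hyperplane(point, w, b))
```

A point with `side` equal to 0 lies on the hyperplane itself and therefore belongs to both closed half-spaces.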

Why this matters in Data Science

  • Linear regression = orthogonal projection onto the column space of the design matrix.
  • SVM finds the maximum-margin hyperplane $w^T x + b = 0$.
  • Logistic regression separates classes via a hyperplane decision boundary.
  • PCA projects high-dim data onto low-dim subspaces.
Mind Map
Visual structure of the concept
PROJECTIONS & HYPERPLANES
├── projᵥ u = ⟨u,v⟩/⟨v,v⟩ · v
├── u − projᵥ u  ⊥  v
├── Orthogonal projection onto col(A)
│   ├── P = A(AᵀA)⁻¹Aᵀ
│   ├── P² = P,  Pᵀ = P
│   └── Least squares solution
└── Hyperplanes
    ├── wᵀx + b = 0
    ├── Normal vector w
    ├── Half-space wᵀx + b ≥ 0
    └── Distance |wᵀx₀ + b| / ‖w‖
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Give the formula for projection of $u$ onto $v$. $\text{proj}_v u = \dfrac{\langle u, v \rangle}{\langle v, v \rangle} v$.

Q2. Define a hyperplane in $\mathbb{R}^n$. The set $\{x \in \mathbb{R}^n : w^T x + b = 0\}$ for some $w \ne 0$ and scalar $b$.

Q3. Give the distance formula from a point to a hyperplane. $d = \dfrac{|w^T x_0 + b|}{\|w\|}$.

Q4. What is the projection matrix onto the column space of $A$? $P = A(A^T A)^{-1} A^T$. It is symmetric and idempotent.


Part B (20 marks)

Q. Derive the projection of one vector onto another. Define hyperplane and half-plane. Derive the distance from a point to a hyperplane and discuss its use in Support Vector Machines.

Projection of $u$ onto $v$. Write $u = \alpha v + w$ where $w \perp v$ (here $w$ denotes the component of $u$ orthogonal to $v$, not the hyperplane normal used later). Take the inner product with $v$: $\langle u, v \rangle = \alpha \langle v, v \rangle + \langle w, v \rangle = \alpha \langle v, v \rangle$ (since $\langle w, v \rangle = 0$). Solving: $\alpha = \frac{\langle u, v \rangle}{\langle v, v \rangle}, \quad \text{proj}_v u = \frac{\langle u, v \rangle}{\langle v, v \rangle} v.$
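
A short worked instance may help fix the derivation; the vectors are chosen here purely for illustration. With $u = (2, 3)$ and $v = (1, 1)$:

$$\langle u, v \rangle = 5, \quad \langle v, v \rangle = 2, \quad \alpha = \tfrac{5}{2}, \quad \text{proj}_v u = \left(\tfrac{5}{2}, \tfrac{5}{2}\right), \quad u - \text{proj}_v u = \left(-\tfrac{1}{2}, \tfrac{1}{2}\right).$$

The remainder satisfies $\left\langle \left(-\tfrac{1}{2}, \tfrac{1}{2}\right), (1, 1) \right\rangle = 0$, as the decomposition requires.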

Hyperplane. In $\mathbb{R}^n$, a hyperplane is the set $H = \{x : w^T x + b = 0\}$ where $w \in \mathbb{R}^n \setminus \{0\}$. It is $(n-1)$-dimensional, with $w$ as a normal vector.

Half-spaces. $H^+ = \{x : w^T x + b \ge 0\}$ and $H^- = \{x : w^T x + b \le 0\}$.

Distance from $x_0$ to $H$. Take any point $x^* \in H$ (so $w^T x^* + b = 0$). The shortest distance from $x_0$ to $H$ is the length of the projection of $(x_0 - x^*)$ onto $w$ (since $w \perp H$): $d(x_0, H) = \frac{|w^T (x_0 - x^*)|}{\|w\|} = \frac{|w^T x_0 - w^T x^*|}{\|w\|} = \frac{|w^T x_0 + b|}{\|w\|},$ where the last step uses $w^T x^* = -b$.
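
The derivation can be cross-checked by constructing the foot of the perpendicular explicitly, $x_0 - \frac{w^T x_0 + b}{\|w\|^2} w$, and measuring the Euclidean distance to it. The sketch below does this in plain Python; the function name `closest_point_on_hyperplane` and the numbers are assumptions made for this example.

```python
import math

def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def closest_point_on_hyperplane(x0, w, b):
    """Foot of the perpendicular from x0: x0 - ((w.x0 + b) / ||w||^2) w."""
    t = (dot(w, x0) + b) / dot(w, w)
    return [xi - t * wi for xi, wi in zip(x0, w)]

# Assumed example: the plane x + 2y + 2z - 3 = 0 in three dimensions.
w, b = [1.0, 2.0, 2.0], -3.0
x0 = [4.0, 5.0, 3.5]

foot = closest_point_on_hyperplane(x0, w, b)
gap = math.sqrt(sum((xi - fi) ** 2 for xi, fi in zip(x0, foot)))   # ||x0 - foot||
formula = abs(dot(w, x0) + b) / math.sqrt(dot(w, w))                # |w.x0 + b| / ||w||

print(foot, gap, formula)
```

The point `foot` satisfies the hyperplane equation, and `gap` agrees with the closed-form distance.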

Application: Support Vector Machines (SVM). Given labelled data $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, an SVM seeks a hyperplane $w^T x + b = 0$ that separates the classes with maximum margin.

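Correct classification of every point requires $y_i (w^T x_i + b) > 0$. For linearly separable data, rescaling $(w, b)$ so that the closest points satisfy $|w^T x_i + b| = 1$ gives the canonical constraint $y_i (w^T x_i + b) \ge 1$ for every $i$.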

By the distance formula, each of the two parallel hyperplanes $w^T x + b = \pm 1$ lies at distance $\dfrac{1}{\|w\|}$ from $w^T x + b = 0$, so the geometric margin between them is $\dfrac{2}{\|w\|}$. Maximising the margin is equivalent to: $\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^T x_i + b) \ge 1.$

The optimisation finds the maximum-margin hyperplane. The points satisfying equality $y_i(w^T x_i + b) = 1$ are the support vectors: they sit exactly on the margin boundary and entirely determine the classifier.

Once trained, a new point $x$ is classified by the sign of $w^T x + b$; its distance $\dfrac{|w^T x + b|}{\|w\|}$ from the hyperplane is often read as a measure of confidence in that prediction.
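
The constraints, the margin width and the decision rule can be checked on a toy example. The sketch below does not train an SVM: the data set and the pair $(w, b)$ are hand-picked assumptions, and the code only verifies the quantities discussed above for that choice. The names `decision_value` and `predict` are likewise illustrative.

```python
import math

def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def decision_value(x, w, b):
    """Signed quantity w.x + b whose sign gives the predicted class."""
    return dot(w, x) + b

def predict(x, w, b):
    """Class label from the sign of w.x + b (boundary points map to +1 here)."""
    return 1 if decision_value(x, w, b) >= 0 else -1

# Assumed toy data and a hand-picked hyperplane; no optimisation is performed.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [+1, +1, -1, -1]
w, b = [0.5, 0.5], -1.0

# Constraint values y_i (w.x_i + b); all must be at least 1.
functional_margins = [yi * decision_value(xi, w, b) for xi, yi in zip(X, y)]
feasible = all(m >= 1 for m in functional_margins)

# Points meeting the constraint with equality lie on the margin boundary.
support_vectors = [xi for xi, m in zip(X, functional_margins) if math.isclose(m, 1.0)]

margin_width = 2 / math.sqrt(dot(w, w))   # 2 / ||w||

print(feasible, support_vectors, margin_width, predict([4.0, 1.0], w, b))
```

For this choice of $(w, b)$ the constraints hold, two points sit on the margin boundary, and the margin width equals $2\sqrt{2}$.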