PGDDSA Study · Semester 1

Core Titles

Key headlines and terms for quick recall

Projection of $u$ onto $v$: $\text{proj}_v u = \dfrac{\langle u, v \rangle}{\langle v, v \rangle} v$
Orthogonal projection onto a subspace
Projection matrix $P = A(A^T A)^{-1} A^T$
Hyperplane $w^T x + b = 0$: codimension-1 affine subspace
Half-plane / Half-space
Distance from point to hyperplane $\dfrac{|w^T x_0 + b|}{\|w\|}$

Basic Idea

What it is, why it matters, how it works

Projection of one vector onto another

For a nonzero vector $v$, the component of $u$ in the direction of $v$ is the scalar projection $\dfrac{\langle u, v \rangle}{\|v\|}$. The vector projection is that scalar times the unit vector $v / \|v\|$: $\text{proj}_v u = \frac{\langle u, v \rangle}{\langle v, v \rangle} \, v.$

Property. $u - \text{proj}_v u$ is orthogonal to $v$ (this is the "drop the perpendicular" interpretation).
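A minimal pure-Python sketch of the formula and the orthogonality property above, assuming vectors are plain lists of floats and the standard dot product is the inner product. The helper names `dot` and `proj` are illustrative only.

```python
def dot(u, v):
    """Standard inner product <u, v> = sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))

def proj(u, v):
    """Vector projection of u onto a nonzero v: (<u, v> / <v, v>) * v."""
    vv = dot(v, v)
    if vv == 0:
        raise ValueError("v must be nonzero")
    alpha = dot(u, v) / vv
    return [alpha * x for x in v]

u, v = [2.0, 3.0], [1.0, 1.0]
p = proj(u, v)                            # alpha = 5 / 2, so p = [2.5, 2.5]
residual = [a - b for a, b in zip(u, p)]  # u - proj_v u = [-0.5, 0.5]
print(p, dot(residual, v))                # [2.5, 2.5] 0.0
```

The printed `0.0` is the "drop the perpendicular" property: the leftover part of $u$ has zero inner product with $v$.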

Orthogonal projection onto a subspace

Given a subspace $S = \text{col}(A)$, where the columns of $A$ are linearly independent (so $A^T A$ is invertible), the projection of $b$ onto $S$ is $\hat{b} = A(A^T A)^{-1} A^T b = Pb$, where $P = A(A^T A)^{-1} A^T$ is the projection matrix. Properties: $P^2 = P$ (idempotent) and $P^T = P$ (symmetric).

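The residual $b - \hat{b}$ is orthogonal to $S$, i.e. $A^T(b - \hat{b}) = 0$. These are exactly the least-squares normal equations: $\hat{x} = (A^T A)^{-1} A^T b$ minimises $\|Ax - b\|$, and $\hat{b} = A\hat{x}$.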
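A small pure-Python sketch of $P = A(A^T A)^{-1} A^T$ applied to a least-squares line fit, assuming $A$ has linearly independent columns. It uses exact `Fraction` arithmetic so that $P^2 = P$ and $P^T = P$ hold exactly rather than up to rounding. The helpers `transpose`, `matmul`, `inverse` and `projection_matrix` are hypothetical names written for this sketch; in practice a library least-squares solver (for example NumPy's `numpy.linalg.lstsq`) is preferred over forming the inverse explicitly.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def inverse(M):
    """Gauss-Jordan inverse of a square matrix, using exact Fraction arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]          # scale pivot row to 1
        for r in range(n):
            if r != c and aug[r][c] != 0:          # eliminate column c elsewhere
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def projection_matrix(A):
    """P = A (A^T A)^{-1} A^T; assumes A has linearly independent columns."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

A = [[1, 0], [1, 1], [1, 2]]   # design matrix for fitting y = c + m*t at t = 0, 1, 2
b = [[6], [0], [0]]            # observations as a column vector
P = projection_matrix(A)
b_hat = matmul(P, b)           # projection of b onto col(A) = fitted values
residual = [[bi[0] - bh[0]] for bi, bh in zip(b, b_hat)]
print(*(row[0] for row in b_hat))        # 5 2 -1
print(matmul(transpose(A), residual))    # A^T (b - b_hat) is the zero vector
```

The fitted values $5, 2, -1$ lie on the line $y = 5 - 3t$, and the residual $(1, -2, 1)$ is orthogonal to both columns of $A$.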

Hyperplane

In $\mathbb{R}^n$, a hyperplane is the set $H = \{x \in \mathbb{R}^n : w^T x + b = 0\}$ with $w \ne 0$. It has dimension $n - 1$.

In $\mathbb{R}^2$: hyperplane = line.
In $\mathbb{R}^3$: hyperplane = plane.
In $\mathbb{R}^n$: $(n-1)$-dimensional flat.

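The vector $w$ is normal (perpendicular) to the hyperplane: for any $x_1, x_2 \in H$, $w^T(x_1 - x_2) = (-b) - (-b) = 0$, so $w$ is orthogonal to every direction lying in $H$.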

Half-space

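$H^+ = \{x : w^T x + b \ge 0\}, \quad H^- = \{x : w^T x + b \le 0\}.$ These are the two closed half-spaces on either side of $H$: their union is $\mathbb{R}^n$, their intersection is exactly $H$, and the normal $w$ points into $H^+$.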
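A short sketch of how the sign of $w^T x + b$ tells which half-space a point lies in, and a numerical check that $w$ is orthogonal to a direction inside $H$. The line $x_1 + x_2 = 1$ and the helper `side` are hypothetical examples chosen for illustration.

```python
def side(w, b, x):
    """+1 if w^T x + b > 0, -1 if < 0, 0 if x lies on the hyperplane H."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (s > 0) - (s < 0)

w, b = [1.0, 1.0], -1.0          # the line x1 + x2 = 1 in R^2
print(side(w, b, [2.0, 2.0]))    # 1: interior of H+
print(side(w, b, [0.0, 0.0]))    # -1: interior of H-
print(side(w, b, [0.5, 0.5]))    # 0: on H, so in both closed half-spaces

# w is normal to H: the difference of two points on H is orthogonal to w.
p, q = [1.0, 0.0], [0.0, 1.0]    # both satisfy x1 + x2 = 1
print(sum(wi * (pi - qi) for wi, pi, qi in zip(w, p, q)))   # 0.0
```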

Distance from point to hyperplane

For $x_0 \in \mathbb{R}^n$ and the hyperplane $w^T x + b = 0$: $d(x_0, H) = \frac{|w^T x_0 + b|}{\|w\|}.$
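A minimal sketch of the distance formula, cross-checked against the nearest point of $H$. The foot-of-perpendicular formula $x^* = x_0 - \frac{w^T x_0 + b}{\|w\|^2} w$ is not stated in these notes; it is the standard result of stepping from $x_0$ along the normal $w$ until $w^T x + b = 0$. The plane and point below are hypothetical examples.

```python
import math

def distance_to_hyperplane(w, b, x0):
    """|w^T x0 + b| / ||w|| for the hyperplane w^T x + b = 0 (w != 0)."""
    s = sum(wi * xi for wi, xi in zip(w, x0)) + b
    return abs(s) / math.sqrt(sum(wi * wi for wi in w))

def foot_of_perpendicular(w, b, x0):
    """Closest point of H to x0: x0 - ((w^T x0 + b) / ||w||^2) * w."""
    s = sum(wi * xi for wi, xi in zip(w, x0)) + b
    ww = sum(wi * wi for wi in w)
    return [xi - (s / ww) * wi for xi, wi in zip(x0, w)]

w, b = [1.0, 2.0, 2.0], -3.0     # plane x + 2y + 2z = 3 in R^3, ||w|| = 3
x0 = [4.0, 2.0, 2.0]
d = distance_to_hyperplane(w, b, x0)      # |4 + 4 + 4 - 3| / 3 = 3.0
x_star = foot_of_perpendicular(w, b, x0)  # [3.0, 0.0, 0.0], which lies on H
print(d, math.dist(x0, x_star))           # 3.0 3.0
```

Both numbers agree: the formula gives the length of the perpendicular segment from $x_0$ to $H$.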

Why this matters in Data Science

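Linear regression: the fitted values are the orthogonal projection of the response vector onto the column space of the design matrix.
SVM finds the maximum-margin hyperplane $w^T x + b = 0$.
Logistic regression separates classes via a hyperplane decision boundary $w^T x + b = 0$ (where the predicted probability is 0.5).
PCA projects high-dimensional data onto a low-dimensional subspace spanned by the leading principal directions.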

Mind Map

Visual structure of the concept

PROJECTIONS & HYPERPLANES
├── projᵥ u = ⟨u,v⟩/⟨v,v⟩ · v
├── u − projᵥ u  ⊥  v
├── Orthogonal projection onto col(A)
│   ├── P = A(AᵀA)⁻¹Aᵀ
│   ├── P² = P,  Pᵀ = P
│   └── Least squares solution
└── Hyperplanes
    ├── wᵀx + b = 0
    ├── Normal vector w
    ├── Half-space wᵀx + b ≥ 0
    └── Distance |wᵀx₀ + b| / ‖w‖

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Give the formula for the projection of $u$ onto $v$. $\text{proj}_v u = \dfrac{\langle u, v \rangle}{\langle v, v \rangle} v$ (for $v \ne 0$).

Q2. Define a hyperplane in $\mathbb{R}^n$. The set $\{x \in \mathbb{R}^n : w^T x + b = 0\}$ for some $w \ne 0$ and scalar $b$.

Q3. Give the distance formula from a point to a hyperplane. $d = \dfrac{|w^T x_0 + b|}{\|w\|}$.

Q4. What is the projection matrix onto the column space of $A$? $P = A(A^T A)^{-1} A^T$, assuming $A$ has linearly independent columns. It is symmetric and idempotent.

Part B (20 marks)

Q. Derive the projection of one vector onto another. Define hyperplane and half-plane. Derive the distance from a point to a hyperplane and discuss its use in Support Vector Machines.

Projection of $u$ onto $v$. Assume $v \ne 0$ and write $u = \alpha v + r$ where $r \perp v$. Taking the inner product with $v$: $\langle u, v \rangle = \alpha \langle v, v \rangle + \langle r, v \rangle = \alpha \langle v, v \rangle$ (since $\langle r, v \rangle = 0$). Because $\langle v, v \rangle > 0$, we can solve: $\alpha = \frac{\langle u, v \rangle}{\langle v, v \rangle}, \quad \text{proj}_v u = \frac{\langle u, v \rangle}{\langle v, v \rangle} v.$

Hyperplane. In $\mathbb{R}^n$, a hyperplane is the set $H = \{x : w^T x + b = 0\}$ where $w \in \mathbb{R}^n \setminus \{0\}$. It is $(n-1)$-dimensional, with $w$ as a normal vector.

Half-spaces. $H^+ = \{x : w^T x + b \ge 0\}$ and $H^- = \{x : w^T x + b \le 0\}$. In $\mathbb{R}^2$, where $H$ is a line, these are the two half-planes on either side of it.

Distance from $x_0$ to $H$. Take any point $x^* \in H$ (so $w^T x^* + b = 0$, i.e. $w^T x^* = -b$). Because $w$ is normal to $H$, the shortest segment from $x_0$ to $H$ runs parallel to $w$, so the distance is the absolute value of the scalar projection of $(x_0 - x^*)$ onto $w$: $d(x_0, H) = \frac{|w^T (x_0 - x^*)|}{\|w\|} = \frac{|w^T x_0 - w^T x^*|}{\|w\|} = \frac{|w^T x_0 + b|}{\|w\|}.$
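A worked numerical check of the derivation, using a hypothetical line $3x_1 + 4x_2 - 5 = 0$ in $\mathbb{R}^2$ and the point $x_0 = (2, 1)$. It shows that the projection form (via an arbitrary $x^* \in H$) and the final formula give the same answer.

```latex
w = (3, 4)^T,\quad b = -5,\quad \|w\| = 5,\quad x_0 = (2, 1)^T,\quad
x^* = (3, -1)^T \in H \;\; (9 - 4 - 5 = 0)

\frac{|w^T (x_0 - x^*)|}{\|w\|} = \frac{|3(-1) + 4(2)|}{5} = \frac{5}{5} = 1,
\qquad
\frac{|w^T x_0 + b|}{\|w\|} = \frac{|6 + 4 - 5|}{5} = 1.
```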

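Application: Support Vector Machines (SVM). Given labelled data $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, an SVM seeks a hyperplane $w^T x + b = 0$ that separates the classes with maximum margin.

For linearly separable data (the hard-margin case), correct classification only requires $y_i (w^T x_i + b) > 0$. Since $(w, b)$ and $(cw, cb)$ with $c > 0$ describe the same hyperplane, we fix the scale so that the closest points satisfy $|w^T x_i + b| = 1$; the constraints then read $y_i (w^T x_i + b) \ge 1$ for every $i$.

By the distance formula, each of the parallel hyperplanes $w^T x + b = \pm 1$ lies at distance $\dfrac{1}{\|w\|}$ from $H$, so the geometric margin between them is $\dfrac{2}{\|w\|}$. Maximising the margin is therefore equivalent to minimising $\|w\|$, usually written in the smooth form: $\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w^T x_i + b) \ge 1.$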

The optimisation finds the maximum-margin hyperplane. The points satisfying equality $y_i(w^T x_i + b) = 1$ are the support vectors — they sit exactly on the margin boundary and entirely determine the classifier.

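Once trained, a new point $x$ is classified by the sign of $w^T x + b$. Its distance $\dfrac{|w^T x + b|}{\|w\|}$ from the hyperplane measures how far it lies from the decision boundary, so a larger distance indicates a more confident prediction (a score, not a calibrated probability).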
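The SVM quantities above can be checked on a toy example. This sketch does not train an SVM; it assumes a hypothetical, linearly separable four-point dataset for which $w = (1, 0)$, $b = 0$ is the hard-margin optimum (the constraints at $(1, 0)$ and $(-1, 0)$ force $w_1 + b \ge 1$ and $w_1 - b \ge 1$, hence $w_1 \ge 1$ and $\|w\| \ge 1$). It then checks feasibility, computes the margin, lists the support vectors, and classifies a new point.

```python
import math

# Hypothetical linearly separable data: (x_i, y_i) with y_i in {-1, +1}.
data = [([1.0, 0.0], +1), ([2.0, 1.0], +1),
        ([-1.0, 0.0], -1), ([-2.0, -1.0], -1)]

# Hard-margin optimum for this particular data (derived by hand, not learned).
w, b = [1.0, 0.0], 0.0
TOL = 1e-9   # tolerance for floating-point comparisons

def score(x):
    """w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

norm_w = math.sqrt(sum(wi * wi for wi in w))
feasible = all(y * score(x) >= 1 - TOL for x, y in data)   # y_i (w^T x_i + b) >= 1
margin = 2 / norm_w                                        # gap between w^T x + b = +1 and -1
support_vectors = [x for x, y in data                      # active constraints
                   if math.isclose(y * score(x), 1.0)]

x_new = [0.5, 3.0]
label = 1 if score(x_new) >= 0 else -1                     # ties assigned to +1 here
confidence = abs(score(x_new)) / norm_w                    # distance to the hyperplane

print(feasible, margin, support_vectors)   # True 2.0 [[1.0, 0.0], [-1.0, 0.0]]
print(label, confidence)                   # 1 0.5
```

Only $(1, 0)$ and $(-1, 0)$ meet their constraints with equality; moving the other two points further away would leave $(w, b)$ unchanged, which is what "support vectors determine the classifier" means here.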

Projections, Hyperplanes and Half-Planes