PGDDSA Study · Semester 1

Core Titles

Key headlines and terms for quick recall

Inner product $\langle x, y \rangle = \sum x_i y_i$
Norm $\|x\| = \sqrt{\langle x, x \rangle}$
Cosine similarity $\cos\theta = \dfrac{\langle x,y \rangle}{\|x\|\|y\|}$
Orthogonality $\langle x, y \rangle = 0$
Cauchy–Schwarz $|\langle x, y \rangle| \le \|x\| \|y\|$
Distance measures: Euclidean, Manhattan, Chebyshev, Minkowski, Mahalanobis

Basic Idea

What it is, why it matters, how it works

Inner product

An inner product $\langle \cdot, \cdot \rangle$ on a real vector space satisfies symmetry, bilinearity, and positive-definiteness. In $\mathbb{R}^n$ the standard one is the dot product: $\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i$.

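It measures how well two vectors are aligned, scaled by their lengths. For non-zero vectors it is positive when the angle between them is less than 90°, zero at exactly 90°, and negative when the angle exceeds 90°.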
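Here is a minimal pure-Python sketch of the standard dot product. It assumes vectors are plain lists of floats, and the helper name `inner` is hypothetical. It also spot-checks symmetry, linearity and positivity on one example. A check on one example is an illustration, not a proof.

```python
import math


def inner(x, y):
    """Standard inner (dot) product on R^n: the sum of x_i * y_i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # fsum reduces rounding error when adding many floating-point terms
    return math.fsum(xi * yi for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]

print(inner(x, y))                                        # 4 - 10 + 18 = 12.0
print(inner(x, y) == inner(y, x))                         # symmetry: True
print(inner([2 * xi for xi in x], y) == 2 * inner(x, y))  # linearity in x: True
print(inner(x, x) > 0)                                    # positive for x != 0: True
```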

Norm (length)

The induced norm: $\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum x_i^2}.$
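The induced norm can be computed directly from the inner product. The sketch below assumes plain Python lists and uses hypothetical helper names. It also shows that dividing by the norm gives a unit vector.

```python
import math


def inner(x, y):
    return math.fsum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))


v = [3.0, 4.0]
print(norm(v))  # sqrt(9 + 16) = 5.0

# Dividing by the norm keeps the direction and sets the length to 1.
unit = [vi / norm(v) for vi in v]
print(unit, norm(unit))  # [0.6, 0.8] and a norm of 1 (up to rounding)
```

The standard library's `math.hypot(*v)` computes the same Euclidean length.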

Angle and cosine similarity

$\cos \theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$ for non-zero $x, y$, with $\theta \in [0, \pi]$.

Two vectors are orthogonal when $\langle x, y \rangle = 0$.

Cauchy–Schwarz inequality

$|\langle x, y \rangle| \le \|x\| \, \|y\|$ with equality iff $x, y$ are linearly dependent.
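The following spot-check tests the inequality on random vectors and then on a linearly dependent pair, where equality holds. It uses random sampling with an arbitrary seed and small tolerance. This is a numerical illustration under those assumptions, not a proof.

```python
import math
import random


def inner(x, y):
    return math.fsum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x))


random.seed(0)
# |<x, y>| <= ||x|| ||y|| should hold for every pair (small slack for rounding).
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-9

# Equality case: y is a scalar multiple of x (linearly dependent).
x = [1.0, 2.0, 2.0]      # ||x|| = 3
y = [-2.0, -4.0, -4.0]   # y = -2x, ||y|| = 6
print(abs(inner(x, y)), norm(x) * norm(y))  # 18.0 18.0
```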

Common distance measures

For $x, y \in \mathbb{R}^n$:

Euclidean ($L_2$): $d(x,y) = \sqrt{\sum (x_i - y_i)^2}$
Manhattan ($L_1$): $d(x,y) = \sum |x_i - y_i|$
Chebyshev ($L_\infty$): $d(x,y) = \max_i |x_i - y_i|$
Minkowski ($L_p$): $d(x,y) = \left( \sum |x_i - y_i|^p \right)^{1/p}$ for $p \ge 1$ (general form: $p = 1$ gives Manhattan, $p = 2$ Euclidean, $p \to \infty$ Chebyshev)
Cosine distance: $1 - \cos \theta$
Mahalanobis: $d_M(x, y) = \sqrt{(x-y)^T S^{-1} (x-y)}$, where $S$ is the covariance matrix; accounts for feature correlation and scale.
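The Minkowski family and cosine distance can be written as a few small functions. This is a minimal standard-library sketch with hypothetical function names. Passing `p = math.inf` is an assumed convention for selecting Chebyshev.

```python
import math


def minkowski(x, y, p):
    """L_p distance (p >= 1): p=1 Manhattan, p=2 Euclidean, p=inf Chebyshev."""
    if p < 1:
        raise ValueError("p must be >= 1; for p < 1 the triangle inequality fails")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)                       # Chebyshev: largest single gap
    return math.fsum(d ** p for d in diffs) ** (1 / p)


def cosine_distance(x, y):
    """1 - cos(theta); assumes neither vector is zero."""
    dot = math.fsum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))


x, y = [1, 2, 3], [4, 0, 3]        # coordinate gaps: 3, 2, 0
print(minkowski(x, y, 1))          # Manhattan: 5.0
print(minkowski(x, y, 2))          # Euclidean: sqrt(13), about 3.606
print(minkowski(x, y, math.inf))   # Chebyshev: 3
print(minkowski(x, y, 50))         # large p is already very close to 3
print(cosine_distance(x, [2 * a for a in x]))  # about 0: magnitude is ignored
```

For real work, SciPy's `scipy.spatial.distance` module already provides `euclidean`, `cityblock`, `chebyshev`, `minkowski`, `cosine` and `mahalanobis`.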

Why this matters in Data Science

k-Nearest Neighbors needs a distance metric to decide which training points are "nearest".
k-Means assigns each point to the nearest centroid using Euclidean distance.
Cosine similarity is the standard for text vectors and embeddings.
Mahalanobis distance flags outliers in correlated data.
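The next sketch shows where the metric enters k-NN and the k-Means assignment step. It is a toy under stated assumptions: the data, labels, and function names are hypothetical, and the metric is a pluggable parameter. k-Means is usually written with squared Euclidean distance, but the nearest centroid is the same either way.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.dist(x, y)


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def knn_predict(train, query, k=3, metric=euclidean):
    """Majority label among the k training points nearest to `query`.

    `train` is a list of (vector, label) pairs; `metric` is any distance function.
    """
    neighbours = sorted(train, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def nearest_centroid(point, centroids, metric=euclidean):
    """k-Means assignment step: index of the closest centroid."""
    return min(range(len(centroids)), key=lambda j: metric(point, centroids[j]))


# Hypothetical toy data: two loose clusters in 2-D.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_predict(train, (2, 2)))                    # A
print(knn_predict(train, (5, 6), metric=manhattan))  # B
print(nearest_centroid((6, 5), [(1, 1), (6, 6)]))    # 1
```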

Mind Map

Visual structure of the concept

INNER PRODUCTS & DISTANCE
├── Inner product
│   ├── ⟨x,y⟩ = Σ xᵢyᵢ
│   ├── ‖x‖ = √⟨x,x⟩
│   ├── cos θ = ⟨x,y⟩ / (‖x‖‖y‖)
│   └── Cauchy-Schwarz: |⟨x,y⟩| ≤ ‖x‖‖y‖
└── Distances
    ├── Euclidean (L₂)
    ├── Manhattan (L₁)
    ├── Chebyshev (L∞)
    ├── Minkowski (Lₚ)
    ├── Cosine (1 − cos θ)
    └── Mahalanobis √((x−y)ᵀS⁻¹(x−y))

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define the dot product of two vectors. $\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$.

Q2. State the Cauchy–Schwarz inequality. $|\langle x, y \rangle| \le \|x\| \, \|y\|$, with equality iff $x$ and $y$ are linearly dependent.

Q3. Define cosine similarity. $\cos\theta = \dfrac{\langle x,y \rangle}{\|x\| \, \|y\|}$, the cosine of the angle between two non-zero vectors; it ranges from $-1$ to $1$.

Q4. Why is Mahalanobis distance preferred over Euclidean for correlated features? Because it uses the inverse covariance matrix, which rescales each feature and removes the effect of correlation among features, so correlated features are not double-counted.

Part B (20 marks)

Q. Discuss inner products and various distance measures used in data science. Compare Euclidean, Manhattan, Cosine and Mahalanobis distances with examples.

Inner product. A bilinear, symmetric, positive-definite operation $\langle \cdot, \cdot \rangle$ producing a scalar. In $\mathbb{R}^n$, the dot product $\langle x, y \rangle = \sum x_i y_i$ measures alignment and induces:

Norm $\|x\| = \sqrt{\langle x, x \rangle}$
Angle via $\cos\theta = \langle x, y \rangle / (\|x\| \|y\|)$
Orthogonality: $\langle x, y \rangle = 0$

Distance measures. Take $x = (1, 2)$ and $y = (4, 6)$.

Distance	Formula	Value (example)
Euclidean	$\sqrt{\sum (x_i - y_i)^2}$	$\sqrt{9 + 16} = 5$
Manhattan	$\sum |x_i - y_i|$	$3 + 4 = 7$
Chebyshev	$\max_i |x_i - y_i|$	$\max(3, 4) = 4$
Minkowski-$p$	$\left( \sum |x_i - y_i|^p \right)^{1/p}$	depends on $p$ ($p = 1$: 7, $p = 2$: 5)
Cosine	$1 - \dfrac{\langle x, y \rangle}{\|x\| \|y\|}$	$1 - \dfrac{16}{\sqrt{5}\sqrt{52}} \approx 0.008$
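The table values can be reproduced in a few lines of standard-library Python. The choice $p = 3$ for the Minkowski row is an arbitrary illustrative value that does not appear in the table.

```python
import math

x, y = (1, 2), (4, 6)
diffs = [abs(a - b) for a, b in zip(x, y)]           # [3, 4]

euclidean = math.dist(x, y)                           # sqrt(9 + 16)
manhattan = sum(diffs)                                # 3 + 4
chebyshev = max(diffs)                                # max(3, 4)
minkowski3 = sum(d ** 3 for d in diffs) ** (1 / 3)   # p = 3: (27 + 64)^(1/3)
dot = sum(a * b for a, b in zip(x, y))                # 4 + 12 = 16
cosine_dist = 1 - dot / (math.hypot(*x) * math.hypot(*y))

print(euclidean, manhattan, chebyshev)               # 5.0 7 4
print(round(minkowski3, 3), round(cosine_dist, 3))   # 4.498 0.008
```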

Comparison.

Euclidean ($L_2$): straight-line distance. Standard for k-Means and k-NN with continuous features. Sensitive to scale, because a feature measured in large units dominates the sum.
Manhattan ($L_1$): sum of absolute coordinate differences. Less sensitive to outliers than Euclidean because differences are not squared; used in robust regression and grid-like spaces.
Chebyshev ($L_\infty$): the largest single-coordinate gap. It equals the number of moves a chess king needs between two squares; also used in warehouse logistics.
Cosine: angle-based, ignores magnitude. Standard for text vectors, embeddings and recommender systems, where $\|x\|$ depends on document length but the direction (topic) is what matters.
Mahalanobis: $d_M(x,y) = \sqrt{(x-y)^T S^{-1}(x-y)}$, where $S$ is the covariance matrix. It de-correlates and rescales the features, so correlated coordinates are not double-counted; with $S = I$ it reduces to Euclidean distance. Used for outlier detection and discriminant analysis.
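The following example illustrates why Mahalanobis distance helps with correlated features. It uses a hypothetical covariance matrix for two unit-variance features with correlation 0.8, and a hypothetical mean at the origin. Two points that are equally far from the mean in Euclidean terms get very different Mahalanobis distances. The 2×2 inverse is written out by hand so that the sketch needs only the standard library.

```python
import math


def mahalanobis_2d(x, mu, S):
    """sqrt((x - mu)^T S^{-1} (x - mu)) for 2-D points and a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("S must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))   # closed-form 2x2 inverse
    u = (x[0] - mu[0], x[1] - mu[1])
    q = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# Hypothetical covariance: two unit-variance features with correlation 0.8.
S = ((1.0, 0.8), (0.8, 1.0))
mu = (0.0, 0.0)

along = (1.0, 1.0)     # follows the correlation
against = (1.0, -1.0)  # breaks the correlation

for p in (along, against):
    print(p, round(math.dist(p, mu), 3), round(mahalanobis_2d(p, mu, S), 3))
# (1.0, 1.0) 1.414 1.054
# (1.0, -1.0) 1.414 3.162
```

Both points are $\sqrt{2}$ from the mean in Euclidean distance. The Mahalanobis distance marks `against` as the unusual one, because it breaks the pattern the covariance describes.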

Choice depends on the data. Numeric continuous → Euclidean. Sparse high-dimensional text → Cosine. Correlated features with different units → Mahalanobis. Robust to outliers → Manhattan.

Inner Products and Distance Measures