PGD01C01
Module 3 · Linear Algebra for Data Science

Inner Products and Distance Measures

Core Titles
Key headlines and terms for quick recall
  • Inner product: $\langle x, y \rangle = \sum x_i y_i$
  • Norm: $\|x\| = \sqrt{\langle x, x \rangle}$
  • Cosine similarity: $\cos\theta = \dfrac{\langle x,y \rangle}{\|x\|\|y\|}$
  • Orthogonality: $\langle x, y \rangle = 0$
  • Cauchy–Schwarz: $|\langle x, y \rangle| \le \|x\| \|y\|$
  • Distance measures: Euclidean, Manhattan, Chebyshev, Minkowski, Mahalanobis
Basic Idea
What it is, why it matters, how it works

Inner product

An inner product $\langle \cdot, \cdot \rangle$ on a vector space satisfies symmetry, bilinearity, and positive-definiteness. In $\mathbb{R}^n$ the standard one is the dot product: $\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i$.

It measures the alignment of two vectors.
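
As a small illustration of "alignment", the sketch below computes the standard dot product in plain Python. The function name `dot` and the sample vectors are assumptions chosen for this example; they do not come from the notes.

```python
def dot(x, y):
    """Standard inner product on R^n: sum of coordinate-wise products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

# Sample vectors (illustrative only)
same_direction = dot([1, 2], [2, 4])    # positive: the vectors point the same way
perpendicular = dot([1, 2], [-2, 1])    # zero: the vectors are orthogonal
opposite = dot([1, 2], [-1, -2])        # negative: the vectors point opposite ways
```

The sign of the result is what carries the alignment information: positive, zero, and negative correspond to acute, right, and obtuse angles between the vectors.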

Norm (length)

The induced norm: $\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum x_i^2}$.

Angle and cosine similarity

$\cos \theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$.

Two vectors are orthogonal when $\langle x, y \rangle = 0$.
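
The norm and cosine formulas translate almost line for line into code. The following is a minimal sketch using only the standard library; the helper names (`norm`, `cosine_similarity`) and the test vectors are assumptions made for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Induced Euclidean norm: sqrt(<x, x>)."""
    return math.sqrt(dot(x, x))

def cosine_similarity(x, y):
    """cos(theta) = <x, y> / (||x|| ||y||); undefined for a zero vector."""
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot(x, y) / (nx * ny)

length = norm([3, 4])                               # sqrt(9 + 16) = 5.0
cos_parallel = cosine_similarity([1, 2], [3, 6])    # 1 up to floating-point rounding
cos_orthogonal = cosine_similarity([1, 0], [0, 5])  # 0.0, the vectors are orthogonal
```

Note the explicit guard for the zero vector: the formula divides by both norms, so the angle is only defined for nonzero vectors.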

Cauchy–Schwarz inequality

$|\langle x, y \rangle| \le \|x\| \, \|y\|$, with equality iff $x$ and $y$ are linearly dependent.
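
A numerical check is not a proof, but it can make the inequality and its equality case concrete. The sketch below, under the assumption of randomly drawn 5-dimensional vectors and a hypothetical helper `cauchy_schwarz_gap`, measures how far the two sides are apart.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def cauchy_schwarz_gap(x, y):
    """Return ||x|| ||y|| - |<x, y>|, which the inequality says is >= 0."""
    return norm(x) * norm(y) - abs(dot(x, y))

random.seed(0)  # fixed seed so the check is repeatable
gaps = []
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    gaps.append(cauchy_schwarz_gap(x, y))

# Equality case: y is a scalar multiple of x (linearly dependent)
x = [1.0, -2.0, 3.0]
dependent_gap = cauchy_schwarz_gap(x, [2.5 * a for a in x])
```

Every entry of `gaps` should be non-negative up to rounding, and `dependent_gap` should be zero up to rounding. The inequality is also what guarantees that the cosine formula always yields a value in $[-1, 1]$.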

Common distance measures

For $x, y \in \mathbb{R}^n$:

  • Euclidean ($L_2$): $d(x,y) = \sqrt{\sum (x_i - y_i)^2}$
  • Manhattan ($L_1$): $d(x,y) = \sum |x_i - y_i|$
  • Chebyshev ($L_\infty$): $d(x,y) = \max_i |x_i - y_i|$
  • Minkowski ($L_p$): $d(x,y) = \left( \sum |x_i - y_i|^p \right)^{1/p}$ (general form)
  • Cosine distance: $1 - \cos \theta$
  • Mahalanobis: $d_M(x, y) = \sqrt{(x-y)^T S^{-1} (x-y)}$, where $S$ is the covariance matrix; it accounts for feature correlation and scale.
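
Since Minkowski is the general form, one function can cover the Manhattan, Euclidean, and Chebyshev cases. The sketch below is one possible way to write it; the function name `minkowski`, the use of `math.inf` to select the Chebyshev case, and the sample vectors are all assumptions for illustration.

```python
import math

def minkowski(x, y, p):
    """Minkowski (L_p) distance for p >= 1; p = math.inf gives Chebyshev."""
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = [0, 0, 0], [1, 2, 2]   # illustrative vectors

manhattan = minkowski(x, y, 1)          # 1 + 2 + 2 = 5
euclidean = minkowski(x, y, 2)          # sqrt(1 + 4 + 4) = 3
chebyshev = minkowski(x, y, math.inf)   # max(1, 2, 2) = 2
large_p = minkowski(x, y, 50)           # already close to the Chebyshev value
```

As $p$ grows, the largest coordinate difference dominates the sum, which is why the Chebyshev distance is written $L_\infty$.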

Why this matters in Data Science

  • k-Nearest Neighbors needs a distance metric.
  • k-Means uses Euclidean distance to assign clusters.
  • Cosine similarity is the standard for text vectors and embeddings.
  • Mahalanobis catches outliers in correlated data.
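
To see why the choice of metric matters for a method such as k-NN, the toy sketch below runs a 1-nearest-neighbour lookup with two different metrics. The data points, labels, and the helper `nearest_label` are hypothetical and exist only for this illustration.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def nearest_label(query, labelled_points, metric):
    """1-nearest-neighbour: label of the point minimising metric(query, point)."""
    point, label = min(labelled_points, key=lambda pl: metric(query, pl[0]))
    return label

# Hypothetical toy data: (vector, label)
data = [((10.0, 10.0), "A"), ((1.0, 0.0), "B")]
query = (1.0, 1.0)

by_euclidean = nearest_label(query, data, euclidean)      # "B": closest in space
by_cosine = nearest_label(query, data, cosine_distance)   # "A": same direction
```

The same query gets a different neighbour depending on the metric: Euclidean distance favours the point that is close in space, while cosine distance favours the point that lies in the same direction regardless of length.
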
Mind Map
Visual structure of the concept
INNER PRODUCTS & DISTANCE
├── Inner product
│   ├── ⟨x,y⟩ = Σ xᵢyᵢ
│   ├── ‖x‖ = √⟨x,x⟩
│   ├── cos θ = ⟨x,y⟩ / (‖x‖‖y‖)
│   └── Cauchy-Schwarz: |⟨x,y⟩| ≤ ‖x‖‖y‖
└── Distances
    ├── Euclidean (L₂)
    ├── Manhattan (L₁)
    ├── Chebyshev (L∞)
    ├── Minkowski (Lₚ)
    ├── Cosine (1 − cos θ)
    └── Mahalanobis  √((x−y)ᵀS⁻¹(x−y))
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define the dot product of two vectors. $\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$.

Q2. State the Cauchy–Schwarz inequality. $|\langle x, y \rangle| \le \|x\| \, \|y\|$.

Q3. Define cosine similarity. $\cos\theta = \dfrac{\langle x,y \rangle}{\|x\| \, \|y\|}$, the cosine of the angle between two vectors.

Q4. Why is Mahalanobis distance preferred over Euclidean for correlated features? Because it normalises by the covariance matrix, removing the influence of scale and correlation among features.


Part B (20 marks)

Q. Discuss inner products and various distance measures used in data science. Compare Euclidean, Manhattan, Cosine and Mahalanobis distances with examples.

Inner product. A bilinear, symmetric, positive-definite operation $\langle \cdot, \cdot \rangle$ producing a scalar. In $\mathbb{R}^n$, the dot product $\langle x, y \rangle = \sum x_i y_i$ measures alignment and induces:

  • Norm: $\|x\| = \sqrt{\langle x, x \rangle}$
  • Angle via $\cos\theta = \langle x, y \rangle / (\|x\| \|y\|)$
  • Orthogonality: $\langle x, y \rangle = 0$

Distance measures. Take $x = (1, 2)$ and $y = (4, 6)$.

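| Distance | Formula | Value (example) |
|---|---|---|
| Euclidean | $\sqrt{\sum (x_i - y_i)^2}$ | $\sqrt{9 + 16} = 5$ |
| Manhattan | $\sum \lvert x_i - y_i \rvert$ | $3 + 4 = 7$ |
| Chebyshev | $\max_i \lvert x_i - y_i \rvert$ | $\max(3, 4) = 4$ |
| Minkowski-$p$ | $\left(\sum \lvert x_i - y_i \rvert^p\right)^{1/p}$ | $(3^p + 4^p)^{1/p}$ |
| Cosine | $1 - \dfrac{\langle x, y \rangle}{\lVert x \rVert \lVert y \rVert}$ | $1 - \dfrac{16}{\sqrt{5}\sqrt{52}} \approx 0.008$ |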
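
The example values for $x = (1, 2)$ and $y = (4, 6)$ can be reproduced with a few lines of standard-library Python. This is only a checking sketch; the variable names are assumptions, and $p = 3$ is picked arbitrarily to give the Minkowski row a concrete number.

```python
import math

x, y = (1, 2), (4, 6)
diffs = [abs(a - b) for a, b in zip(x, y)]            # [3, 4]

euclidean = math.sqrt(sum(d ** 2 for d in diffs))     # sqrt(9 + 16) = 5.0
manhattan = sum(diffs)                                # 3 + 4 = 7
chebyshev = max(diffs)                                # max(3, 4) = 4
minkowski_3 = sum(d ** 3 for d in diffs) ** (1 / 3)   # (27 + 64)^(1/3), about 4.498

dot = sum(a * b for a, b in zip(x, y))                # 4 + 12 = 16
norm_x = math.sqrt(sum(a * a for a in x))             # sqrt(5)
norm_y = math.sqrt(sum(b * b for b in y))             # sqrt(52)
cosine_distance = 1 - dot / (norm_x * norm_y)         # about 0.008
```

The cosine distance is tiny even though the Euclidean distance is 5, because $y$ points in almost the same direction as $x$ and differs mainly in length.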

Comparison.

  • Euclidean ($L_2$): straight-line distance. Standard for k-Means and for k-NN with continuous features. Sensitive to feature scale.
  • Manhattan ($L_1$): sum of absolute coordinate differences. Less sensitive to outliers than Euclidean; used in robust regression and grid-like spaces.
  • Chebyshev ($L_\infty$): largest single-coordinate gap. Used for chess king moves and warehouse logistics.
  • Cosine: angle-based, ignores magnitude. Standard for text vectors, embeddings, and recommender systems, where $\|x\|$ depends on document length but the direction (topic) is what matters.
  • Mahalanobis: $d_M(x,y) = \sqrt{(x-y)^T S^{-1}(x-y)}$, where $S$ is the covariance matrix. It de-correlates and rescales features, so strongly correlated coordinates are not double-counted. Used for outlier detection and discriminant analysis.
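
A two-dimensional sketch can show what "de-correlates and rescales" means in practice. The covariance matrix below (unit variances, correlation 0.9) and the helper names `inverse_2x2` and `mahalanobis_2d` are assumptions chosen for illustration; a real application would estimate $S$ from data.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def mahalanobis_2d(x, y, cov):
    """sqrt((x - y)^T S^{-1} (x - y)) for 2-dimensional points."""
    inv = inverse_2x2(cov)
    dx, dy = x[0] - y[0], x[1] - y[1]
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(quad)

# Assumed covariance: unit variances, correlation 0.9 (illustrative values)
S = ((1.0, 0.9), (0.9, 1.0))
mean = (0.0, 0.0)

along = mahalanobis_2d((1.0, 1.0), mean, S)      # follows the correlation: about 1.03
against = mahalanobis_2d((1.0, -1.0), mean, S)   # violates it: about 4.47
```

Both points are at Euclidean distance $\sqrt{2}$ from the mean, yet the point that contradicts the correlation is more than four times farther in Mahalanobis terms. That asymmetry is what makes the measure useful for outlier detection.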

The choice depends on the data. Continuous numeric features → Euclidean. Sparse high-dimensional text → Cosine. Correlated features with different units → Mahalanobis. Outlier-prone data → Manhattan.