Inner Products and Distance Measures
Core Titles
Key headlines and terms for quick recall- Inner product
- Norm
- Cosine similarity
- Orthogonality
- Cauchy–Schwarz
- Distance measures: Euclidean, Manhattan, Chebyshev, Minkowski, Mahalanobis
Basic Idea
What it is, why it matters, how it worksInner product
An inner product on a vector space satisfies symmetry, bilinearity, and positive-definiteness. In the standard one is the dot product:
It measures the alignment of two vectors.
Norm (length)
The induced norm:
Angle and cosine similarity
Two vectors are orthogonal when .
Cauchy–Schwarz inequality
with equality iff are linearly dependent.
Common distance measures
For :
- Euclidean ():
- Manhattan ():
- Chebyshev ():
- Minkowski (): (general form)
- Cosine distance:
- Mahalanobis: , scaled by covariance — accounts for feature correlation and scale.
Why this matters in Data Science
- k-Nearest Neighbors needs a distance metric.
- k-Means uses Euclidean distance to assign clusters.
- Cosine similarity is the standard for text vectors and embeddings.
- Mahalanobis catches outliers in correlated data.
Mind Map
Visual structure of the conceptINNER PRODUCTS & DISTANCE
├── Inner product
│ ├── ⟨x,y⟩ = Σ xᵢyᵢ
│ ├── ‖x‖ = √⟨x,x⟩
│ ├── cos θ = ⟨x,y⟩ / (‖x‖‖y‖)
│ └── Cauchy-Schwarz: |⟨x,y⟩| ≤ ‖x‖‖y‖
└── Distances
├── Euclidean (L₂)
├── Manhattan (L₁)
├── Chebyshev (L∞)
├── Minkowski (Lₚ)
├── Cosine (1 − cos θ)
└── Mahalanobis √(x−y)ᵀS⁻¹(x−y)
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questionsPart A (2 marks each)
Q1. Define dot product of two vectors. .
Q2. State the Cauchy–Schwarz inequality. .
Q3. Define cosine similarity. , measuring the angle between two vectors.
Q4. Why is Mahalanobis distance preferred over Euclidean for correlated features? Because it normalises by the covariance matrix, removing the influence of scale and correlation among features.
Part B (20 marks)
Q. Discuss inner products and various distance measures used in data science. Compare Euclidean, Manhattan, Cosine and Mahalanobis distances with examples.
Inner product. A bilinear, symmetric, positive-definite operation producing a scalar. In , the dot product measures alignment and induces:
- Norm
- Angle via
- Orthogonality:
Distance measures. Take and .
| Distance | Formula | Value (example) |
|---|---|---|
| Euclidean | ||
| Manhattan | $\sum | x_i - y_i |
| Chebyshev | $\max | x_i - y_i |
| Minkowski- | $(\sum | x_i - y_i |
| Cosine |
Comparison.
- Euclidean () — straight-line distance. Standard for k-Means, k-NN with continuous features. Sensitive to scale.
- Manhattan () — sum of axis differences. Robust to outliers, used in robust regression and grid-like spaces.
- Chebyshev () — worst single-coordinate gap. Used in chess king moves and warehouse logistics.
- Cosine — angle-based, ignores magnitude. Standard for text vectors, embeddings, recommender systems where depends on document length but the direction (topic) matters.
- Mahalanobis — where is the covariance matrix. It de-correlates and rescales features, treating correlated coordinates as one. Used for outlier detection and discriminant analysis.
Choice depends on the data. Numeric continuous → Euclidean. Sparse high-dimensional text → Cosine. Correlated features with different units → Mahalanobis. Robust to outliers → Manhattan.