PGDDSA Study · Semester 1

Core Titles

Key headlines and terms for quick recall

Algorithm analysis — predict resource use without running
Time complexity — # of steps as function of input $n$
Space complexity — memory used
Asymptotic notation: $O$ (upper), $\Omega$ (lower), $\Theta$ (tight)
Common classes: $O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(2^n) < O(n!)$
Best, Average, Worst cases
Amortised analysis
Master theorem for divide-and-conquer recurrences

Basic Idea

What it is, why it matters, how it works

Why analyse?

Two algorithms solving the same problem can differ by a factor of millions on real data. Algorithm analysis predicts resource use mathematically, without needing to run on every input.

We care about:

Time — how many basic operations.
Space — how much memory.

Asymptotic notation

We describe growth as $n \to \infty$, ignoring constants and lower-order terms.

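Big-O $O(f(n))$: upper bound. $T(n) \le c \cdot f(n)$ for some constant $c > 0$ and all sufficiently large $n$.
Big-Omega $\Omega(f(n))$: lower bound. $T(n) \ge c \cdot f(n)$ for some constant $c > 0$ and all sufficiently large $n$.
Big-Theta $\Theta(f(n))$: tight bound (both the upper and the lower bound hold).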

We usually quote worst-case $O$ in interviews and texts.

Common growth classes

Notation	Name	Example
$O(1)$	Constant	array index
$O(\log n)$	Logarithmic	binary search
$O(n)$	Linear	one pass
$O(n \log n)$	Linearithmic	merge sort, FFT
$O(n^2)$	Quadratic	bubble sort
$O(n^3)$	Cubic	naive matrix multiplication
$O(2^n)$	Exponential	brute-force subset enumeration
$O(n!)$	Factorial	brute-force permutations / TSP

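For $n = 10^6$: an $O(n^2)$ algorithm needs about $10^{12}$ operations, far too many for one second at roughly $10^9$ operations per second, while an $O(n \log n)$ algorithm needs about $2 \times 10^7$, which is fine.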
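One way to get a feel for the classes is to watch how the step count changes when $n$ doubles. The sketch below is illustrative only: `GROWTH` and `doubling_ratio` are hypothetical names, and each class is treated as an exact step count with constants ignored.

```python
import math

# Idealised step counts for a few growth classes (constants ignored).
GROWTH = {
    "log n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
}


def doubling_ratio(name, n):
    """Return f(2n) / f(n) for the named growth class."""
    f = GROWTH[name]
    return f(2 * n) / f(n)


for name in GROWTH:
    print(f"{name:8s} f(2n)/f(n) at n=1024: {doubling_ratio(name, 1024):.2f}")
```

Doubling the input doubles a linear cost but quadruples a quadratic one, which is why the gap between classes widens so quickly as $n$ grows.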

Three case analyses

Best case — luckiest input (rarely useful).
Average case — expected behaviour on random input.
Worst case — guaranteed upper bound (most useful for safety).

For quicksort: best $O(n \log n)$, average $O(n \log n)$, worst $O(n^2)$ (for example, already-sorted input when the pivot is always the first or last element).
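The gap between the average and worst case can be observed by counting comparisons. This is a minimal sketch, assuming a first-element pivot and a Lomuto-style partition; `quicksort_comparisons` is a hypothetical helper, and other pivot rules have different worst-case inputs.

```python
import random


def quicksort_comparisons(values):
    """Sort a copy with first-element-pivot quicksort.

    Returns (sorted_list, number_of_element_comparisons).
    """
    data = list(values)
    comparisons = 0
    # An explicit stack of (lo, hi) ranges avoids Python's recursion limit.
    stack = [(0, len(data) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = data[lo]
        i = lo  # data[lo+1..i] holds the elements smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if data[j] < pivot:
                i += 1
                data[i], data[j] = data[j], data[i]
        data[lo], data[i] = data[i], data[lo]  # pivot reaches its final slot
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return data, comparisons


n = 200
_, worst = quicksort_comparisons(range(n))       # already sorted input
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)               # fixed seed for repeatability
result, typical = quicksort_comparisons(shuffled)
print(worst, typical)
```

On sorted input every partition is maximally unbalanced, so the count is exactly $n(n-1)/2$; on shuffled input it is far smaller.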

Amortised analysis

Some operations are occasionally expensive but cheap on average. Amortised cost averages the total cost over a whole sequence of operations:

Dynamic array (Python list): append is $O(1)$ amortised because the expensive resize-and-copy step grows the capacity geometrically, so it happens rarely.
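A toy dynamic array makes the rare-resize argument visible by counting element copies. This sketch assumes a capacity-doubling policy starting from 1; the class `DynamicArray` and its `copies` counter are illustrative and are not how CPython's list is implemented (CPython over-allocates by a smaller factor, but the same reasoning applies to any geometric growth).

```python
class DynamicArray:
    """Toy growable array that doubles its capacity when full (assumed policy)."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # total element copies caused by resizing

    def append(self, item):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._slots[self._size] = item
        self._size += 1

    def _resize(self, new_capacity):
        new_slots = [None] * new_capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
            self.copies += 1
        self._slots = new_slots
        self._capacity = new_capacity

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[index]


arr = DynamicArray()
n = 1000
for value in range(n):
    arr.append(value)
print(len(arr), arr.copies)
```

The total number of copies stays below $2n$, so the cost per append averaged over the sequence is bounded by a constant.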

How to analyse

Identify the "basic operation" (comparison, arithmetic).
Count how many times it runs as a function of input size.
Drop constants and lower-order terms.

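def linear_search(arr, x):
    """Return the index of the first occurrence of x in arr, or -1 if absent."""
    for i, value in enumerate(arr):  # at most n iterations
        if value == x:               # basic operation: one comparison
            return i
    return -1
# Worst case (x absent or last): T(n) = n comparisons → O(n)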
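Counting the basic operation directly shows how the best and worst cases of linear search differ. The variant below is a sketch; `linear_search_count` is a hypothetical name that returns the comparison count alongside the index.

```python
def linear_search_count(arr, x):
    """Return (index, comparisons); index is -1 when x is absent."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == x:
            return i, comparisons
    return -1, comparisons


data = [7, 3, 9, 4, 1]
best = linear_search_count(data, 7)     # target is the first element
last = linear_search_count(data, 1)     # target is the last element
absent = linear_search_count(data, 8)   # target is not present
print(best, last, absent)
```

The best case costs one comparison regardless of $n$, while a missing or last-position target costs $n$ comparisons.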

Recurrences and the Master Theorem

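For divide-and-conquer recurrences $T(n) = a T(n/b) + f(n)$ with $a \ge 1$, $b > 1$ and some constant $\varepsilon > 0$:

If $f(n) = O(n^{\log_b a - \varepsilon})$ → $T(n) = \Theta(n^{\log_b a})$.
If $f(n) = \Theta(n^{\log_b a})$ → $T(n) = \Theta(n^{\log_b a} \log n)$.
If $f(n) = \Omega(n^{\log_b a + \varepsilon})$ (and the regularity condition $a f(n/b) \le k f(n)$ holds for some constant $k < 1$) → $T(n) = \Theta(f(n))$.

Merge sort: $T(n) = 2 T(n/2) + \Theta(n) \Rightarrow T(n) = \Theta(n \log n)$ (second case, since $n^{\log_2 2} = n$).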
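The three cases can be turned into a small classifier. This is a sketch under a simplifying assumption: the driving term is a plain polynomial, $f(n) = \Theta(n^d)$, for which the regularity condition holds automatically. The function name `master_theorem` and its text output format are illustrative.

```python
import math


def master_theorem(a, b, d):
    """Classify T(n) = a*T(n/b) + Theta(n^d) and return the Theta bound as text.

    Only polynomial driving terms n^d are handled (an assumption of this
    sketch), so the regularity condition of the third case always holds.
    """
    if a < 1 or b <= 1 or d < 0:
        raise ValueError("need a >= 1, b > 1, d >= 0")
    critical = math.log(a, b)  # the critical exponent log_b(a)
    if math.isclose(d, critical):
        return f"Theta(n^{d:g} log n)"      # f(n) matches n^(log_b a)
    if d < critical:
        return f"Theta(n^{critical:.3g})"   # recursion dominates
    return f"Theta(n^{d:g})"                # f(n) dominates


print(master_theorem(2, 2, 1))  # merge sort: a=2, b=2, f(n)=n
print(master_theorem(3, 2, 1))  # three half-size subproblems, linear combine
print(master_theorem(2, 2, 2))  # quadratic combine step dominates
```

Recurrences whose $f(n)$ is not a simple polynomial (for example $n \log n$) fall outside this simplified form and need the full theorem or another method.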

Space complexity

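Memory is sometimes the bottleneck rather than time. Recursive algorithms use stack space proportional to their recursion depth, and sorting in place (e.g., heap sort, $O(1)$ auxiliary space) differs from sorting out of place (e.g., merge sort, $O(n)$ auxiliary space).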
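The stack cost of recursion can be made explicit by returning the depth reached. The two functions below are a sketch with hypothetical names; both compute the same sum, but the recursive one holds one stack frame per element.

```python
def sum_recursive(values, i=0):
    """Sum values[i:] recursively; returns (total, max_stack_depth)."""
    if i == len(values):
        return 0, 1                      # the base case still occupies a frame
    rest, depth = sum_recursive(values, i + 1)
    return values[i] + rest, depth + 1   # O(n) frames alive at the deepest point


def sum_iterative(values):
    """Sum with a single accumulator: O(1) auxiliary space."""
    total = 0
    for v in values:
        total += v
    return total


numbers = list(range(100))
recursive_total, depth = sum_recursive(numbers)
iterative_total = sum_iterative(numbers)
print(recursive_total, depth, iterative_total)
```

Both versions take $O(n)$ time, yet the recursive one needs $O(n)$ extra space and would exceed Python's default recursion limit on long inputs.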

Why it matters in Data Science

Knowing that $O(n^2)$ matrix operations scale poorly tells you when to switch to mini-batches or specialised libraries.
ML training and inference latency depends on this analysis.
Choosing the right data structure (hash table vs list) turns an $O(n)$ membership lookup into an $O(1)$ lookup on average.
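The list-versus-hash-table difference can be observed by counting equality checks during a membership test. This is a sketch: `CountingKey` and `count_eq_calls` are hypothetical helpers, and the exact counts reflect how CPython performs lookups, so other implementations may differ slightly.

```python
class CountingKey:
    """Wrapper that counts equality comparisons (illustrative helper)."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_eq_calls(container, target):
    """Return (found, number of __eq__ calls made by `target in container`)."""
    CountingKey.eq_calls = 0
    found = target in container
    return found, CountingKey.eq_calls


n = 1000
keys = [CountingKey(i) for i in range(n)]
target = CountingKey(n - 1)  # equal to the last key, but a distinct object

list_result = count_eq_calls(keys, target)       # scans element by element
set_result = count_eq_calls(set(keys), target)   # jumps to the hash bucket
print(list_result, set_result)
```

The list needs one comparison per element up to the match, while the set needs only a handful regardless of $n$.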

Mind Map

Visual structure of the concept

ALGORITHM ANALYSIS
├── Time vs Space complexity
├── Asymptotic notation
│   ├── O — upper bound
│   ├── Ω — lower bound
│   └── Θ — tight
├── Growth classes
│   └── O(1) < log n < n < n log n < n² < 2ⁿ < n!
├── Three cases
│   ├── Best
│   ├── Average
│   └── Worst
├── Amortised analysis (e.g., dynamic array append)
├── Master theorem for T(n) = aT(n/b) + f(n)
└── Practical impact (n = 10⁶, op limits)

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define time and space complexity.

Time complexity — number of basic operations an algorithm performs as a function of input size $n$ .
Space complexity — amount of memory required as a function of $n$ , including input, auxiliary and output.

Q2. What is Big-O notation?

A mathematical notation that describes an asymptotic upper bound on an algorithm's growth: $T(n) = O(f(n))$ iff $T(n) \le c \cdot f(n)$ for some constant $c > 0$ and all $n \ge n_0$.

Q3. Order the following from slowest-growing to fastest-growing: $O(n)$, $O(\log n)$, $O(n^2)$, $O(n \log n)$, $O(1)$.

$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2)$.

Part B (20 marks)

Q. Explain algorithm analysis. Discuss time and space complexity, asymptotic notations and common complexity classes. Why is the analysis of algorithms important?

Why analyse?

Two algorithms solving the same problem can differ by a factor of millions on real input. Empirical timing is brittle — performance changes with machine, compiler, data. Asymptotic analysis predicts growth mathematically and lets us reason about scalability.

Time complexity — number of basic operations as a function of input size $n$ . Space complexity — memory required as a function of $n$ .

Asymptotic notation.

Notation	Meaning	Use
$O(f)$	$T(n) \le c f(n)$ (upper bound)	Worst-case guarantees
$\Omega(f)$	$T(n) \ge c f(n)$ (lower bound)	Lower-bound proofs, e.g. $\Omega(n \log n)$ for comparison sorting
$\Theta(f)$	Both (tight bound)	Exact growth class

We drop constants and lower-order terms: $5 n^2 + 3 n + 7$ is $O(n^2)$.
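To see why, one can exhibit explicit constants for the Big-O definition. The choice $c = 15$, $n_0 = 1$ below is one valid witness among many:

$$5n^2 + 3n + 7 \;\le\; 5n^2 + 3n^2 + 7n^2 \;=\; 15n^2 \quad \text{for all } n \ge 1$$

Since also $5n^2 + 3n + 7 \ge 5n^2$, the function is $\Theta(n^2)$, not merely $O(n^2)$.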

Common complexity classes.

Class	Name	Example
$O(1)$	Constant	Array index, hash lookup
$O(\log n)$	Logarithmic	Binary search
$O(n)$	Linear	Single pass through array
$O(n \log n)$	Linearithmic	Merge sort, heap sort, FFT
$O(n^2)$	Quadratic	Bubble / selection sort
$O(n^3)$	Cubic	Naive matrix multiply
$O(2^n)$	Exponential	Brute-force subset enumeration
$O(n!)$	Factorial	Brute-force TSP

Concrete impact at $n = 10^6$ (assume $10^9$ ops/sec):

Class	Operations	Time
$O(n)$	$10^6$	1 ms
$O(n \log n)$	$2 \times 10^7$	20 ms
$O(n^2)$	$10^{12}$	~17 min
$O(2^n)$	$2^{10^6}$	far longer than the age of the universe
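The polynomial rows of the table can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: `OPS_PER_SECOND` and `estimated_seconds` are illustrative names, and the $10^9$ ops/sec rate is the same assumption as in the table, not a measured figure.

```python
import math

OPS_PER_SECOND = 1e9  # assumed machine speed, as in the table


def estimated_seconds(step_count, ops_per_second=OPS_PER_SECOND):
    """Convert an idealised step count into seconds at the assumed rate."""
    return step_count / ops_per_second


n = 10**6
estimates = {
    "O(n)": estimated_seconds(n),
    "O(n log n)": estimated_seconds(n * math.log2(n)),
    "O(n^2)": estimated_seconds(n ** 2),
}
for name, seconds in estimates.items():
    print(f"{name:10s} {seconds:.3g} s")
```

The quadratic estimate of 1000 s is the "~17 min" entry; real running times also depend on constant factors and memory behaviour that this estimate ignores.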

Three cases.

Best case — luckiest input.
Average case — expected over random input.
Worst case — guaranteed upper bound (used in safety-critical systems).

Quicksort: best $O(n \log n)$, average $O(n \log n)$, worst $O(n^2)$ (for example, already-sorted input combined with a naive first- or last-element pivot).

Amortised analysis.

Individual operations are sometimes expensive but cheap on average. Dynamic array append (Python list, C++ vector) is $O(1)$ amortised: a resize copies the whole array, but because the capacity grows geometrically it happens rarely.
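A short aggregate argument makes the $O(1)$ amortised claim concrete. It assumes (this is an illustrative policy, not a statement about any particular library) that capacity starts at 1 and doubles whenever the array is full. Over $n$ appends, the resizes copy $1, 2, 4, \dots, 2^m$ elements, where $2^m < n$:

$$\text{copies} = \sum_{k=0}^{m} 2^k = 2^{m+1} - 1 < 2n$$

$$\text{total cost} \le \underbrace{n}_{\text{writes}} + \underbrace{2n}_{\text{copies}} = 3n \quad\Rightarrow\quad \text{amortised cost per append} \le 3 = O(1)$$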

Recurrences (Master Theorem). For divide-and-conquer $T(n) = a T(n/b) + f(n)$ :

Case	Condition	Solution
1	$f(n) = O(n^{\log_b a - \varepsilon})$	$T(n) = \Theta(n^{\log_b a})$
2	$f(n) = \Theta(n^{\log_b a})$	$T(n) = \Theta(n^{\log_b a} \log n)$
3	$f(n) = \Omega(n^{\log_b a + \varepsilon})$ + regularity	$T(n) = \Theta(f(n))$

Example (merge sort): $T(n) = 2 T(n/2) + \Theta(n) \Rightarrow T(n) = \Theta(n \log n)$ (Case 2, since $n^{\log_2 2} = n$ matches $f(n)$).
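The bound can be checked empirically by counting comparisons in a straightforward top-down merge sort. This is a sketch; `merge_sort_count` is a hypothetical helper that returns the comparison count along with the sorted list.

```python
import math
import random


def merge_sort_count(values):
    """Return (sorted_list, comparisons) using top-down merge sort."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_cmp = merge_sort_count(values[:mid])     # T(n/2)
    right, right_cmp = merge_sort_count(values[mid:])   # T(n/2)
    merged = []
    comparisons = left_cmp + right_cmp
    i = j = 0
    # Merge step: at most len(values) - 1 comparisons, i.e. the f(n) term.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


n = 1024
data = list(range(n))
random.Random(1).shuffle(data)  # fixed seed for repeatability
sorted_data, comparisons = merge_sort_count(data)
print(comparisons, n * math.log2(n))
```

For any input of size $n = 1024$ the comparison count stays below $n \log_2 n = 10240$, in line with the Case 2 result.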

Why this matters.

Predict whether an algorithm will scale to production data.
Compare candidate algorithms without coding both.
Justify infrastructure decisions: "your $O(n^2)$ algorithm will take about 17 minutes on 10⁶ rows at $10^9$ ops/sec; an $O(n \log n)$ one takes about 20 ms."
In ML, the same idea drives matrix-multiplication cost ($O(n^3)$ for the naive algorithm on $n \times n$ matrices) and explains why batching, GPUs and approximate algorithms (e.g., random projections) are crucial.

Worked example.

Sorting 10 million numbers:

Bubble sort: $O(n^2)$, about $10^{14}$ ops, roughly 28 hours at $10^9$ ops/sec, which is infeasible.
Merge sort: $O(n \log n)$, about $2.3 \times 10^8$ ops, roughly 0.2 s at $10^9$ ops/sec.

The choice of algorithm is the difference between completing and timing out.

Algorithm Analysis: Time and Space Complexity