PGD01C02
Module 5 · Data Structures and Algorithm Design

Divide and Conquer

Core Titles
Key headlines and terms for quick recall
  • Divide and Conquer (D&C) — break problem into sub-problems
  • Three steps: Divide, Conquer, Combine
  • Recurrence: T(n)=aT(n/b)+f(n)T(n) = a T(n/b) + f(n) — solved by Master Theorem
  • Classics: Merge Sort, Quicksort, Binary Search, FFT, Strassen's matrix mult
  • Pros: often O(nlogn)O(n \log n) from O(n2)O(n^2)
  • Cons: recursion overhead, extra memory
Basic Idea
What it is, why it matters, how it works

Idea

Divide and Conquer solves a problem by:

  1. Divide — split into smaller independent sub-problems.
  2. Conquer — solve sub-problems recursively (base case: solve directly when small).
  3. Combine — merge sub-problem solutions into the full solution.

Recurrence

Most D&C algorithms have time complexity given by a recurrence T(n)=aT(n/b)+f(n)T(n) = a \, T(n/b) + f(n) where aa = number of sub-problems, bb = factor by which size shrinks, f(n)f(n) = work outside recursion.

Master Theorem solves this in three cases (covered in algorithm analysis).

Classic examples

1. Binary Search (O(logn)O(\log n)).

def binary_search(arr, x, lo, hi):
    if lo > hi: return -1
    mid = (lo + hi) // 2
    if arr[mid] == x: return mid
    if arr[mid] < x: return binary_search(arr, x, mid+1, hi)
    return binary_search(arr, x, lo, mid-1)

Recurrence T(n)=T(n/2)+O(1)T(n)=O(logn)T(n) = T(n/2) + O(1) \Rightarrow T(n) = O(\log n).

2. Merge Sort (O(nlogn)O(n \log n)).

def merge_sort(arr):
    if len(arr) <= 1: return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

T(n)=2T(n/2)+O(n)T(n)=O(nlogn)T(n) = 2 T(n/2) + O(n) \Rightarrow T(n) = O(n \log n). Stable, optimal comparison-based sort.

3. Quicksort (O(nlogn)O(n \log n) average, O(n2)O(n^2) worst).

Choose a pivot; partition; recurse on each side. In-place, very fast in practice.

4. Strassen's matrix multiplication. Naive: O(n3)O(n^3). Strassen: O(n2.807)O(n^{2.807}) via 7 multiplications of n/2×n/2n/2 \times n/2 blocks.

5. Fast Fourier Transform (FFT). O(nlogn)O(n \log n) vs naive DFT O(n2)O(n^2). Foundation of modern signal processing.

6. Closest pair of points. O(nlogn)O(n \log n) for points in 2D — recursive split + boundary merge.

7. Karatsuba multiplication. Multiply two nn-digit numbers in O(n1.585)O(n^{1.585}).

Pros and cons

Pros.

  • Often turns O(n2)O(n^2) into O(nlogn)O(n \log n).
  • Naturally parallelisable — sub-problems are independent.
  • Clean recursive code.

Cons.

  • Recursion overhead — function calls, stack space.
  • Memory — merge sort needs extra space.
  • Doesn't always help if sub-problems overlap (use DP instead).

Parallelisation

Independent sub-problems → run them on separate cores / GPUs. MapReduce is a distributed D&C pattern.

When D&C does not apply

If sub-problems overlap (same sub-problem solved many times), use Dynamic Programming to memoise. Example: naive recursive Fibonacci is O(2n)O(2^n) — DP cuts it to O(n)O(n).

Why it matters in Data Science

  • Sorting / searching are foundational.
  • FFT powers signal analysis, JPEG, MP3.
  • MapReduce parallelises analytics across clusters.
  • Strassen's principle underlies efficient deep-learning kernel libraries (cuBLAS).
Mind Map
Visual structure of the concept
DIVIDE AND CONQUER
├── Steps
│   ├── Divide — split into sub-problems
│   ├── Conquer — solve recursively
│   └── Combine — merge solutions
├── Recurrence T(n) = aT(n/b) + f(n)
│   └── Master theorem solves
├── Classic algorithms
│   ├── Binary search  O(log n)
│   ├── Merge sort     O(n log n)
│   ├── Quicksort      O(n log n) avg
│   ├── FFT            O(n log n)
│   ├── Strassen mult  O(n^2.807)
│   └── Closest-pair   O(n log n)
├── Parallelisable
└── Use DP if sub-problems overlap
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. What are the three steps of Divide and Conquer?

  1. Divide the problem into smaller sub-problems.
  2. Conquer each sub-problem recursively (base case: solve directly).
  3. Combine the sub-solutions into the full answer.

Q2. Give the time complexity of merge sort and explain. O(nlogn)O(n \log n). Recurrence T(n)=2T(n/2)+O(n)T(n) = 2 T(n/2) + O(n): two halves recursed, plus O(n)O(n) to merge.

Q3. What is the limitation of D&C compared to Dynamic Programming? D&C re-solves sub-problems independently. When sub-problems overlap, it recomputes the same answers — DP memoises sub-solutions to avoid that work.


Part B (20 marks)

Q. Explain the Divide and Conquer algorithm-design technique. Discuss its three steps, common examples, and time-complexity analysis using the Master Theorem.

Idea. D&C reduces a problem to smaller sub-problems of the same kind, solves them recursively, and combines their answers.

Three steps.

  1. Divide. Partition the input into smaller (often equal) sub-problems.
  2. Conquer. Recursively solve each sub-problem. Base case: when the size is small, solve directly.
  3. Combine. Merge the recursive answers into the overall answer.

General recurrence.

T(n)=aT(n/b)+f(n)T(n) = a \, T(n/b) + f(n)

where aa = # of sub-problems, bb = factor reducing size, f(n)f(n) = work outside recursion.

Master Theorem. Let c=logbac = \log_b a.

CaseCondition on f(n)f(n)Solution
1f(n)=O(ncε)f(n) = O(n^{c - \varepsilon})T(n)=Θ(nc)T(n) = \Theta(n^c)
2f(n)=Θ(nc)f(n) = \Theta(n^c)T(n)=Θ(nclogn)T(n) = \Theta(n^c \log n)
3f(n)=Ω(nc+ε)f(n) = \Omega(n^{c + \varepsilon}) + regularityT(n)=Θ(f(n))T(n) = \Theta(f(n))

Classic examples.

1. Binary Search. T(n)=T(n/2)+O(1)T(n)=O(logn)T(n) = T(n/2) + O(1) \Rightarrow T(n) = O(\log n).

2. Merge Sort.

MergeSort(A):
    if len(A) ≤ 1: return A
    mid = len(A) // 2
    L = MergeSort(A[:mid])
    R = MergeSort(A[mid:])
    return Merge(L, R)

Recurrence T(n)=2T(n/2)+O(n)T(n) = 2 T(n/2) + O(n). Case 2 ⇒ T(n)=O(nlogn)T(n) = O(n \log n). Stable, optimal among comparison sorts.

3. Quicksort. Average T(n)=2T(n/2)+O(n)=O(nlogn)T(n) = 2 T(n/2) + O(n) = O(n \log n). Worst-case (bad pivots) O(n2)O(n^2).

4. Strassen's Matrix Multiplication. Naive O(n3)O(n^3). Strassen recurses T(n)=7T(n/2)+O(n2)T(n) = 7 T(n/2) + O(n^2)T(n)=O(nlog27)=O(n2.807)T(n) = O(n^{\log_2 7}) = O(n^{2.807}).

5. Fast Fourier Transform (FFT). T(n)=2T(n/2)+O(n)=O(nlogn)T(n) = 2 T(n/2) + O(n) = O(n \log n) — backbone of signal processing, image compression, polynomial multiplication.

6. Closest pair of points in 2D. O(nlogn)O(n \log n) via recursive splitting on xx-coordinate plus boundary merge.

Pros and cons.

Pros.

  • Often improves O(n2)O(n^2) to O(nlogn)O(n \log n) — huge speed-up.
  • Independent sub-problems are naturally parallelisable — drives MapReduce, GPU kernels.
  • Clean, recursive expression of the algorithm.

Cons.

  • Recursion overhead (function calls, stack space).
  • Some D&C algorithms (merge sort) use extra memory.
  • If sub-problems overlap, D&C wastefully recomputes — use Dynamic Programming instead.

When D&C fails — overlap.

Naive Fibonacci:

fib(n) = fib(n-1) + fib(n-2)

T(n)=O(2n)T(n) = O(2^n) because the same sub-problems recur. DP memoises them → O(n)O(n).

Take-away. D&C is a powerful paradigm — used to derive the famous O(nlogn)O(n \log n) algorithms — and underlies many modern parallel and distributed computations.