PGDDSA Study · Semester 1

PGD01C02

Module 5 · Data Structures and Algorithm Design

Greedy Method

Core Titles

Key headlines and terms for quick recall

Greedy algorithm — make the locally best choice at each step
Greedy-choice property — local optimal leads to global optimal
Optimal substructure — global solution contains optimal sub-solutions
Classics: Kruskal's MST, Prim's MST, Dijkstra's shortest path, Huffman coding, Fractional Knapsack, Activity selection
Cons: doesn't always give optimal (e.g., 0/1 knapsack)
Proof techniques: exchange argument

Basic Idea

What it is, why it matters, how it works

Idea

A greedy algorithm builds a solution step by step, making the locally best choice (the "greedy" choice) at each step — never revisiting earlier decisions. It produces an optimal solution iff the problem has certain structure.

When greedy works — two conditions

1. Greedy-choice property. A globally optimal solution can be reached by making locally optimal choices.

2. Optimal substructure. An optimal solution to the problem contains optimal solutions to its sub-problems.

If both hold, greedy is correct and fast. If not, greedy may produce a sub-optimal answer.

Classic greedy algorithms

1. Activity selection. Given activities with start/end times, schedule the most that don't overlap.

Strategy: sort by end time; pick each activity whose start is ≥ last selected end.
$O(n \log n)$ . Provably optimal.

2. Fractional Knapsack. Items have weight $w_i$ and value $v_i$ ; pick fractions to maximise value within capacity $W$ .

Strategy: sort by value/weight ratio descending; take as much of each as fits.
$O(n \log n)$ . Optimal because fractions are allowed.

3. 0/1 Knapsack — greedy FAILS. Cannot take fractions; greedy by ratio is sub-optimal. Use DP.

4. Huffman coding. Build optimal prefix code from character frequencies.

Strategy: repeatedly combine the two least-frequent nodes into a new node with summed frequency, using a min-heap.
$O(n \log n)$ . Provably optimal.

5. Kruskal's MST.

Sort edges by weight; add if no cycle (Union-Find).
$O(E \log E)$ . Provably optimal.

6. Prim's MST.

Grow tree from any vertex; always add the cheapest crossing edge.
$O((V + E) \log V)$ with a heap.

7. Dijkstra's shortest path.

Greedy choice: extract minimum-distance unvisited vertex; relax neighbours.
$O((V + E) \log V)$ with a heap.
Works only with non-negative weights.

8. Coin change (canonical denominations).

Take largest coin ≤ remaining amount.
Optimal for {1, 5, 10, 25} (US coins); not for arbitrary denominations like {1, 3, 4}.

Proof technique — Exchange argument

To prove a greedy algorithm is optimal: assume any optimal solution $O$ ; show that exchanging $O$ 's first differing choice with the greedy's choice does not worsen $O$ . Hence the greedy's choice is also part of some optimal solution.

When greedy fails

0/1 Knapsack — fractions disallowed.
TSP — greedy nearest-neighbour produces tours up to $\log n \cdot \text{OPT}$ .
Longest path — greedy can pick a dead end early.

Greedy vs DP

Aspect	Greedy	DP
Decisions	Irrevocable	Considers all alternatives
Speed	Faster (often $O(n \log n)$ )	Slower (often $O(n^2)$ or higher)
Memory	Lower	Higher (table)
Correctness	Only if structure holds	Always if applicable
Classic ex	MST, Huffman	LCS, 0/1 Knapsack

Why greedy in Data Science

MST — graph clustering.
Huffman — data compression.
Dijkstra — pathfinding in graph ML.
Greedy feature selection — RFE, forward stepwise.
Greedy decoding — language-model output (greedy / beam search).

Mind Map

Visual structure of the concept

GREEDY METHOD
├── Make locally best choice at each step
├── Required structure
│   ├── Greedy-choice property
│   └── Optimal substructure
├── Classics
│   ├── Activity selection
│   ├── Fractional Knapsack
│   ├── Huffman coding
│   ├── Kruskal MST
│   ├── Prim MST
│   ├── Dijkstra shortest path
│   └── Coin change (canonical)
├── Fails on
│   ├── 0/1 Knapsack
│   └── TSP (only approx)
└── Proof — exchange argument

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define the Greedy method. An algorithmic paradigm that builds a solution incrementally by making the locally optimal choice at each step — never revising past decisions.

Q2. What two properties must a problem satisfy for greedy to be correct?

Greedy-choice property — local optimal choice leads to global optimum.
Optimal substructure — an optimal solution contains optimal solutions to its sub-problems.

Q3. Give an example where greedy fails. 0/1 Knapsack. Greedy by value/weight ratio is sub-optimal — sometimes a less efficient combination of items fills capacity better. Dynamic Programming is needed for the optimal solution.

Part B (20 marks)

Q. Explain the Greedy algorithm-design technique. State its required properties, illustrate with examples (Kruskal's MST, Huffman coding, Activity selection) and discuss its limitations vs Dynamic Programming.

Greedy paradigm.

Build a solution one piece at a time, always picking the locally best choice, without backtracking. Fast and intuitive — but correct only when the problem has special structure.

Required structure.

Greedy-choice property — a global optimum can be constructed by making locally optimal choices.
Optimal substructure — an optimal solution to the problem contains optimal solutions to its sub-problems.

If either fails, greedy may produce sub-optimal answers.

Example 1 — Activity selection.

Given $n$ activities with start $s_i$ and finish $f_i$ , schedule the maximum number that don't overlap.

Strategy: sort activities by finish time. Iterate; pick each whose start ≥ last selected finish.

Complexity. $O(n \log n)$ .

Why optimal. Suppose some optimum picks activity $j$ instead of greedy's choice $g$ . Since $f_g \le f_j$ , swap $j \to g$ — still a valid schedule with the same count. So some optimum begins with $g$ . Apply induction.

Example 2 — Kruskal's MST.

Build a minimum spanning tree of a weighted connected graph.

Strategy.

sort edges by weight
for each edge (u, v) in ascending order:
    if u and v are in different components (Union-Find):
        add (u, v) to MST

Complexity. $O(E \log E)$ .

Why optimal. Cut property — the lightest edge crossing any cut is part of some MST. Each greedy step adds such a lightest edge across the cut separating its components.

Worked. Graph with edges $(A,B,4), (A,C,1), (B,C,2), (B,D,5), (C,D,8), (D,E,2), (C,E,10)$ : Sorted: $(A,C,1), (B,C,2), (D,E,2), (A,B,4), (B,D,5), (C,D,8), (C,E,10)$ .

Add (A,C) 1; (B,C) 2; (D,E) 2; skip (A,B) [cycle]; add (B,D) 5. Done — 4 edges.
MST weight = 10.

Example 3 — Huffman coding.

Optimal prefix code for $n$ symbols with frequencies.

Strategy. Put each symbol in a min-heap by frequency. Repeatedly extract the two smallest, combine into a new node whose frequency is the sum, insert back. Continue until one node remains — the Huffman tree.

Complexity. $O(n \log n)$ .

Why optimal. Provable by exchange argument: in any optimal prefix tree, the two least-frequent symbols are sibling leaves at the deepest level. The greedy choice respects this.

Application. JPEG, MP3, ZIP all use Huffman (or its variant arithmetic coding).

When greedy fails — 0/1 Knapsack.

Take or leave each item; maximise value within capacity.

Greedy by ratio fails: items $\{(w_1=10, v_1=60), (w_2=20, v_2=100), (w_3=30, v_3=120)\}$ , capacity $W = 50$ . Ratios 6, 5, 4. Greedy picks 1, 2 → value 160 with weight 30. Optimum picks 2, 3 → value 220 with weight 50.

Reason. No greedy-choice property here; need to consider all combinations. DP solves in $O(nW)$ .

Greedy vs DP comparison.

Aspect	Greedy	DP
Builds solution	Incrementally, no revisits	Solves all sub-problems
Memory	Low	High (table)
Speed	Often $O(n \log n)$	Often $O(n^2)$ or $O(nW)$
Correctness	Only if greedy property holds	Always (if applicable)
Examples	MST, Huffman, Activity, Dijkstra	LCS, 0/1 Knapsack, edit distance

Take-away. Greedy is elegant and fast — but correctness must be proven. When in doubt, DP is the safer fallback.

Divide and Conquer Dynamic Programming