PGD01C02
Module 5 · Data Structures and Algorithm Design

Greedy Method

Core Titles
Key headlines and terms for quick recall
  • Greedy algorithm — make the locally best choice at each step
  • Greedy-choice property — local optimal leads to global optimal
  • Optimal substructure — global solution contains optimal sub-solutions
  • Classics: Kruskal's MST, Prim's MST, Dijkstra's shortest path, Huffman coding, Fractional Knapsack, Activity selection
  • Cons: doesn't always give optimal (e.g., 0/1 knapsack)
  • Proof techniques: exchange argument
Basic Idea
What it is, why it matters, how it works

Idea

A greedy algorithm builds a solution step by step, making the locally best choice (the "greedy" choice) at each step — never revisiting earlier decisions. It produces an optimal solution iff the problem has certain structure.

When greedy works — two conditions

1. Greedy-choice property. A globally optimal solution can be reached by making locally optimal choices.

2. Optimal substructure. An optimal solution to the problem contains optimal solutions to its sub-problems.

If both hold, greedy is correct and fast. If not, greedy may produce a sub-optimal answer.

Classic greedy algorithms

1. Activity selection. Given activities with start/end times, schedule the most that don't overlap.

  • Strategy: sort by end time; pick each activity whose start is ≥ last selected end.
  • O(nlogn)O(n \log n). Provably optimal.

2. Fractional Knapsack. Items have weight wiw_i and value viv_i; pick fractions to maximise value within capacity WW.

  • Strategy: sort by value/weight ratio descending; take as much of each as fits.
  • O(nlogn)O(n \log n). Optimal because fractions are allowed.

3. 0/1 Knapsack — greedy FAILS. Cannot take fractions; greedy by ratio is sub-optimal. Use DP.

4. Huffman coding. Build optimal prefix code from character frequencies.

  • Strategy: repeatedly combine the two least-frequent nodes into a new node with summed frequency, using a min-heap.
  • O(nlogn)O(n \log n). Provably optimal.

5. Kruskal's MST.

  • Sort edges by weight; add if no cycle (Union-Find).
  • O(ElogE)O(E \log E). Provably optimal.

6. Prim's MST.

  • Grow tree from any vertex; always add the cheapest crossing edge.
  • O((V+E)logV)O((V + E) \log V) with a heap.

7. Dijkstra's shortest path.

  • Greedy choice: extract minimum-distance unvisited vertex; relax neighbours.
  • O((V+E)logV)O((V + E) \log V) with a heap.
  • Works only with non-negative weights.

8. Coin change (canonical denominations).

  • Take largest coin ≤ remaining amount.
  • Optimal for {1, 5, 10, 25} (US coins); not for arbitrary denominations like {1, 3, 4}.

Proof technique — Exchange argument

To prove a greedy algorithm is optimal: assume any optimal solution OO; show that exchanging OO's first differing choice with the greedy's choice does not worsen OO. Hence the greedy's choice is also part of some optimal solution.

When greedy fails

  • 0/1 Knapsack — fractions disallowed.
  • TSP — greedy nearest-neighbour produces tours up to lognOPT\log n \cdot \text{OPT}.
  • Longest path — greedy can pick a dead end early.

Greedy vs DP

AspectGreedyDP
DecisionsIrrevocableConsiders all alternatives
SpeedFaster (often O(nlogn)O(n \log n))Slower (often O(n2)O(n^2) or higher)
MemoryLowerHigher (table)
CorrectnessOnly if structure holdsAlways if applicable
Classic exMST, HuffmanLCS, 0/1 Knapsack

Why greedy in Data Science

  • MST — graph clustering.
  • Huffman — data compression.
  • Dijkstra — pathfinding in graph ML.
  • Greedy feature selection — RFE, forward stepwise.
  • Greedy decoding — language-model output (greedy / beam search).
Mind Map
Visual structure of the concept
GREEDY METHOD
├── Make locally best choice at each step
├── Required structure
│   ├── Greedy-choice property
│   └── Optimal substructure
├── Classics
│   ├── Activity selection
│   ├── Fractional Knapsack
│   ├── Huffman coding
│   ├── Kruskal MST
│   ├── Prim MST
│   ├── Dijkstra shortest path
│   └── Coin change (canonical)
├── Fails on
│   ├── 0/1 Knapsack
│   └── TSP (only approx)
└── Proof — exchange argument
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define the Greedy method. An algorithmic paradigm that builds a solution incrementally by making the locally optimal choice at each step — never revising past decisions.

Q2. What two properties must a problem satisfy for greedy to be correct?

  1. Greedy-choice property — local optimal choice leads to global optimum.
  2. Optimal substructure — an optimal solution contains optimal solutions to its sub-problems.

Q3. Give an example where greedy fails. 0/1 Knapsack. Greedy by value/weight ratio is sub-optimal — sometimes a less efficient combination of items fills capacity better. Dynamic Programming is needed for the optimal solution.


Part B (20 marks)

Q. Explain the Greedy algorithm-design technique. State its required properties, illustrate with examples (Kruskal's MST, Huffman coding, Activity selection) and discuss its limitations vs Dynamic Programming.

Greedy paradigm.

Build a solution one piece at a time, always picking the locally best choice, without backtracking. Fast and intuitive — but correct only when the problem has special structure.

Required structure.

  1. Greedy-choice property — a global optimum can be constructed by making locally optimal choices.
  2. Optimal substructure — an optimal solution to the problem contains optimal solutions to its sub-problems.

If either fails, greedy may produce sub-optimal answers.

Example 1 — Activity selection.

Given nn activities with start sis_i and finish fif_i, schedule the maximum number that don't overlap.

Strategy: sort activities by finish time. Iterate; pick each whose start ≥ last selected finish.

Complexity. O(nlogn)O(n \log n).

Why optimal. Suppose some optimum picks activity jj instead of greedy's choice gg. Since fgfjf_g \le f_j, swap jgj \to g — still a valid schedule with the same count. So some optimum begins with gg. Apply induction.

Example 2 — Kruskal's MST.

Build a minimum spanning tree of a weighted connected graph.

Strategy.

sort edges by weight
for each edge (u, v) in ascending order:
    if u and v are in different components (Union-Find):
        add (u, v) to MST

Complexity. O(ElogE)O(E \log E).

Why optimal. Cut property — the lightest edge crossing any cut is part of some MST. Each greedy step adds such a lightest edge across the cut separating its components.

Worked. Graph with edges (A,B,4),(A,C,1),(B,C,2),(B,D,5),(C,D,8),(D,E,2),(C,E,10)(A,B,4), (A,C,1), (B,C,2), (B,D,5), (C,D,8), (D,E,2), (C,E,10): Sorted: (A,C,1),(B,C,2),(D,E,2),(A,B,4),(B,D,5),(C,D,8),(C,E,10)(A,C,1), (B,C,2), (D,E,2), (A,B,4), (B,D,5), (C,D,8), (C,E,10).

  • Add (A,C) 1; (B,C) 2; (D,E) 2; skip (A,B) [cycle]; add (B,D) 5. Done — 4 edges.
  • MST weight = 10.

Example 3 — Huffman coding.

Optimal prefix code for nn symbols with frequencies.

Strategy. Put each symbol in a min-heap by frequency. Repeatedly extract the two smallest, combine into a new node whose frequency is the sum, insert back. Continue until one node remains — the Huffman tree.

Complexity. O(nlogn)O(n \log n).

Why optimal. Provable by exchange argument: in any optimal prefix tree, the two least-frequent symbols are sibling leaves at the deepest level. The greedy choice respects this.

Application. JPEG, MP3, ZIP all use Huffman (or its variant arithmetic coding).

When greedy fails — 0/1 Knapsack.

Take or leave each item; maximise value within capacity.

Greedy by ratio fails: items {(w1=10,v1=60),(w2=20,v2=100),(w3=30,v3=120)}\{(w_1=10, v_1=60), (w_2=20, v_2=100), (w_3=30, v_3=120)\}, capacity W=50W = 50. Ratios 6, 5, 4. Greedy picks 1, 2 → value 160 with weight 30. Optimum picks 2, 3 → value 220 with weight 50.

Reason. No greedy-choice property here; need to consider all combinations. DP solves in O(nW)O(nW).

Greedy vs DP comparison.

AspectGreedyDP
Builds solutionIncrementally, no revisitsSolves all sub-problems
MemoryLowHigh (table)
SpeedOften O(nlogn)O(n \log n)Often O(n2)O(n^2) or O(nW)O(nW)
CorrectnessOnly if greedy property holdsAlways (if applicable)
ExamplesMST, Huffman, Activity, DijkstraLCS, 0/1 Knapsack, edit distance

Take-away. Greedy is elegant and fast — but correctness must be proven. When in doubt, DP is the safer fallback.