PGD01C01
Module 2 · Graph Theory

Directed Graphs, Trees and Minimum Spanning Tree

Core Titles
Key headlines and terms for quick recall
  • Directed graph (digraph), in-degree, out-degree
  • Strongly / Weakly connected digraph
  • Tree — connected, acyclic graph; nn vertices, n1n - 1 edges
  • Rooted tree, leaf, internal node, depth
  • Spanning tree of a connected graph
  • Minimum Spanning Tree (MST) — minimum-weight spanning tree
  • Kruskal's algorithm — sort edges, add if no cycle
  • Prim's algorithm — grow from a vertex
Basic Idea
What it is, why it matters, how it works

Directed graphs

A digraph has ordered edges (u,v)(u, v). Each vertex has an in-degree (incoming) and out-degree (outgoing). Handshaking: in-deg=out-deg=E\sum \text{in-deg} = \sum \text{out-deg} = |E|.

A digraph is strongly connected if every vertex can reach every other vertex; weakly connected if its underlying undirected graph is connected.

Trees

A tree is a connected acyclic undirected graph. Equivalent characterisations on nn vertices:

  • connected and has exactly n1n - 1 edges
  • acyclic and has n1n - 1 edges
  • unique path between any two vertices
  • adding any edge creates a cycle; removing any edge disconnects it

Rooted tree — designate one vertex as the root; every other vertex has a unique parent. Standard data structure for hierarchies, expression trees, decision trees.

Spanning tree

A subgraph of a connected graph GG that is a tree and includes every vertex.

Minimum Spanning Tree (MST)

For a weighted connected graph, the spanning tree with minimum total edge weight.

Kruskal's algorithm (O(ElogE)O(E \log E)):

  1. Sort all edges by weight.
  2. Add the next-lightest edge if it doesn't create a cycle (use Union–Find).
  3. Stop when n1n - 1 edges added.

Prim's algorithm (O((V+E)logV)O((V+E)\log V) with heap):

  1. Start with any vertex.
  2. Repeatedly add the cheapest edge connecting the tree to a vertex outside it.
  3. Continue until all vertices included.

Both are greedy and produce the same total weight (MST is unique if edge weights are distinct).

Why this matters in Data Science

Hierarchical clustering builds a tree. Decision trees, random forests, parse trees, network design, cluster analysis — all are tree/MST applications.

Mind Map
Visual structure of the concept
DIGRAPHS, TREES, MST
├── Digraph
│   ├── In/Out-degree
│   ├── Strongly connected
│   └── Weakly connected
├── Tree (n vertices ⇒ n−1 edges)
│   ├── Connected + acyclic
│   ├── Unique path between any two
│   └── Rooted: parent / leaf / depth
├── Spanning Tree
│   └── Tree covering all vertices
└── MST  (min total edge weight)
    ├── Kruskal  — sort + Union-Find  O(E log E)
    └── Prim     — grow with min-heap O((V+E)log V)
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define a tree. A connected, acyclic, undirected graph. Equivalently, on nn vertices it has exactly n1n-1 edges and connects all of them.

Q2. How many edges does a tree on nn vertices have? n1n - 1.

Q3. What is an MST? A spanning tree of a weighted connected graph with the smallest possible total edge weight.

Q4. What is the difference between Prim's and Kruskal's algorithm? Prim grows a tree from a starting vertex, always adding the cheapest crossing edge. Kruskal sorts all edges and adds them in order, skipping those that would create a cycle.


Part B (20 marks)

Q. Define a spanning tree and MST. Describe Kruskal's algorithm and apply it to find the MST of the graph below.

Edges: (A,B,4),(A,C,1),(B,C,2),(B,D,5),(C,D,8),(C,E,10),(D,E,2)(A,B,4), (A,C,1), (B,C,2), (B,D,5), (C,D,8), (C,E,10), (D,E,2).

Spanning tree. A subgraph of a connected graph GG that is a tree and includes every vertex of GG.

MST. A spanning tree with the minimum total edge weight.

Kruskal's algorithm.

sort edges by weight ascending
initialise disjoint-set with each vertex in its own component
for each edge (u, v) in sorted order:
    if find(u) ≠ find(v):
        add (u, v) to MST
        union(u, v)
    if MST has |V|−1 edges: stop

Apply to the example. Sorted edges:

#EdgeWeightAction
1(A, C)1add — connects {A,C}\{A,C\}
2(B, C)2add — connects {A,B,C}\{A,B,C\}
3(D, E)2add — separate component {D,E}\{D,E\}
4(A, B)4skip — A and B already in same component (would form cycle)
5(B, D)5add — merges everything into one tree

Now 4 edges chosen = V1|V| - 1. Stop.

MST edges: (A,C),(B,C),(D,E),(B,D)(A,C), (B,C), (D,E), (B,D). Total weight: 1+2+2+5=101 + 2 + 2 + 5 = 10.

Diagram of MST.

A —1— C —2— B —5— D —2— E

Vertex E hangs off D; both connected back to the main path through C.