Directed Graphs, Trees and Minimum Spanning Tree
Core Titles
Key headlines and terms for quick recall- Directed graph (digraph), in-degree, out-degree
- Strongly / Weakly connected digraph
- Tree — connected, acyclic graph; vertices, edges
- Rooted tree, leaf, internal node, depth
- Spanning tree of a connected graph
- Minimum Spanning Tree (MST) — minimum-weight spanning tree
- Kruskal's algorithm — sort edges, add if no cycle
- Prim's algorithm — grow from a vertex
Basic Idea
What it is, why it matters, how it worksDirected graphs
A digraph has ordered edges . Each vertex has an in-degree (incoming) and out-degree (outgoing). Handshaking: .
A digraph is strongly connected if every vertex can reach every other vertex; weakly connected if its underlying undirected graph is connected.
Trees
A tree is a connected acyclic undirected graph. Equivalent characterisations on vertices:
- connected and has exactly edges
- acyclic and has edges
- unique path between any two vertices
- adding any edge creates a cycle; removing any edge disconnects it
Rooted tree — designate one vertex as the root; every other vertex has a unique parent. Standard data structure for hierarchies, expression trees, decision trees.
Spanning tree
A subgraph of a connected graph that is a tree and includes every vertex.
Minimum Spanning Tree (MST)
For a weighted connected graph, the spanning tree with minimum total edge weight.
Kruskal's algorithm ():
- Sort all edges by weight.
- Add the next-lightest edge if it doesn't create a cycle (use Union–Find).
- Stop when edges added.
Prim's algorithm ( with heap):
- Start with any vertex.
- Repeatedly add the cheapest edge connecting the tree to a vertex outside it.
- Continue until all vertices included.
Both are greedy and produce the same total weight (MST is unique if edge weights are distinct).
Why this matters in Data Science
Hierarchical clustering builds a tree. Decision trees, random forests, parse trees, network design, cluster analysis — all are tree/MST applications.
Mind Map
Visual structure of the conceptDIGRAPHS, TREES, MST
├── Digraph
│ ├── In/Out-degree
│ ├── Strongly connected
│ └── Weakly connected
├── Tree (n vertices ⇒ n−1 edges)
│ ├── Connected + acyclic
│ ├── Unique path between any two
│ └── Rooted: parent / leaf / depth
├── Spanning Tree
│ └── Tree covering all vertices
└── MST (min total edge weight)
├── Kruskal — sort + Union-Find O(E log E)
└── Prim — grow with min-heap O((V+E)log V)
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questionsPart A (2 marks each)
Q1. Define a tree. A connected, acyclic, undirected graph. Equivalently, on vertices it has exactly edges and connects all of them.
Q2. How many edges does a tree on vertices have? .
Q3. What is an MST? A spanning tree of a weighted connected graph with the smallest possible total edge weight.
Q4. What is the difference between Prim's and Kruskal's algorithm? Prim grows a tree from a starting vertex, always adding the cheapest crossing edge. Kruskal sorts all edges and adds them in order, skipping those that would create a cycle.
Part B (20 marks)
Q. Define a spanning tree and MST. Describe Kruskal's algorithm and apply it to find the MST of the graph below.
Edges: .
Spanning tree. A subgraph of a connected graph that is a tree and includes every vertex of .
MST. A spanning tree with the minimum total edge weight.
Kruskal's algorithm.
sort edges by weight ascending
initialise disjoint-set with each vertex in its own component
for each edge (u, v) in sorted order:
if find(u) ≠ find(v):
add (u, v) to MST
union(u, v)
if MST has |V|−1 edges: stop
Apply to the example. Sorted edges:
| # | Edge | Weight | Action |
|---|---|---|---|
| 1 | (A, C) | 1 | add — connects |
| 2 | (B, C) | 2 | add — connects |
| 3 | (D, E) | 2 | add — separate component |
| 4 | (A, B) | 4 | skip — A and B already in same component (would form cycle) |
| 5 | (B, D) | 5 | add — merges everything into one tree |
Now 4 edges chosen = . Stop.
MST edges: . Total weight: .
Diagram of MST.
A —1— C —2— B —5— D —2— E
Vertex E hangs off D; both connected back to the main path through C.