Disjoint Sets / Union-Find
A disjoint-set ADT is most commonly used to implement the Kruskal’s algorithm for finding the minimum spanning tree of a graph.
Disjoint sets are useful to detect cycles in a graph.
It is commonly defined as a union-find algorithm.
Table of contents
Union-Find
Suppose we have a collection of disjoint sets. Each element belongs to exactly one set.
Given some elements, we want to be able to merge (union) the sets that they belong to, or query (find) which disjoint set an element belongs to.
Union-Find supports the following operations:
make_set(x)
: Creates a singleton set containing elementx
- The representative element of the set is
x
.
- The representative element of the set is
union(x, y)
: Merges the sets containing elementsx
andy
.find(x)
: Returns the representative element of the set containingx
.
If find(x) == find(y)
, then x
and y
are in the same set.
Simple Implementation of Union-Find
A simple array-based implementation idea of union-find:
class UnionFind:
def __init__(self, n):
self.parent = [i for i in range(n)]
def find(self, x):
# Recursively follow the parent pointers
if self.parent[x] == x:
return x
return self.find(self.parent[x])
def union(self, x, y):
x_root = self.find(x)
y_root = self.find(y)
self.parent[x_root] = y_root # This is a choice
Find operation is $O(n)$ in the worst case.
Union-Find with Path Compression
To be added
From the previous simple implementation, we can see that each disjoint set is like a tree with the representative element as the root.
The simple implementation is just that except that the path to the root can be long.
Union by Rank
One improvement that could be made is to have each node keep track of its height (rank, actually), and when merging two trees, always attach the shorter tree to the taller tree to avoid elongating the path to the root.
If the two trees have the same height, then the height of the resulting tree will have to increase by one.
Rank
But instead of height, we use the term rank.
A rank is basically the (uncompressed) maximum height of its subtree.
It is important to note that the rank of a node is not always the height of the node. The rank is not updated on path compression, which we talk about later. Even though they do not reflect the actual height of the node, it is good enough for amortized analysis.
Upon initialization, the rank of each node is 0.
The rank is increased by one only when two trees of the same rank are merged.
For other merges, the rank for neither tree is changed.
This concept becomes more relevant when proving the amortized complexity of the union-find operations.
Path Compression in Find
Find operation is a recursive operation that follows the parent pointers.
While we recurse up the tree to find the root, we can actually make each node on the way point directly to the root.
We call this path compression or flattening.
It can be expensive but amortized analysis shows that it is worth it for future find operations.
Combining Rank and Path Compression
class UnionFind:
def __init__(self, n):
self.parent = [i for i in range(n)]
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
# Path compression (remeber rank is not updated)
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
if self.rank[x] < self.rank[y]:
self.parent[x] = y
elif self.rank[x] > self.rank[y]:
self.parent[y] = x
else:
self.parent[y_root] = x_root
self.rank[x_root] += 1