Disjoint Sets / Union-Find

A disjoint-set ADT is most commonly used to implement the Kruskal’s algorithm for finding the minimum spanning tree of a graph.

Disjoint sets are useful to detect cycles in a graph.

It is commonly defined as a union-find algorithm.

Table of contents

Union-Find
Simple Implementation of Union-Find
Union-Find with Path Compression

Union-Find

Suppose we have a collection of disjoint sets. Each element belongs to exactly one set.

Given some elements, we want to be able to merge (union) the sets that they belong to, or query (find) which disjoint set an element belongs to.

Union-Find supports the following operations:

make_set(x): Creates a singleton set containing element x
- The representative element of the set is x.
union(x, y): Merges the sets containing elements x and y.
find(x): Returns the representative element of the set containing x.

If find(x) == find(y), then x and y are in the same set.

Simple Implementation of Union-Find

A simple array-based implementation idea of union-find:

class UnionFind:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
    
    def find(self, x):
        # Recursively follow the parent pointers
        if self.parent[x] == x:
            return x
        return self.find(self.parent[x])
    
    def union(self, x, y):
        x_root = self.find(x)
        y_root = self.find(y)
        self.parent[x_root] = y_root  # This is a choice

Find operation is $O(n)$ in the worst case.

Union-Find with Path Compression

To be added

From the previous simple implementation, we can see that each disjoint set is like a tree with the representative element as the root.

The simple implementation is just that except that the path to the root can be long.

Union by Rank

One improvement that could be made is to have each node keep track of its height (rank, actually), and when merging two trees, always attach the shorter tree to the taller tree to avoid elongating the path to the root.

If the two trees have the same height, then the height of the resulting tree will have to increase by one.

Rank

But instead of height, we use the term rank.

A rank is basically the (uncompressed) maximum height of its subtree.

It is important to note that the rank of a node is not always the height of the node. The rank is not updated on path compression, which we talk about later. Even though they do not reflect the actual height of the node, it is good enough for amortized analysis.

Upon initialization, the rank of each node is 0.

The rank is increased by one only when two trees of the same rank are merged.

For other merges, the rank for neither tree is changed.

This concept becomes more relevant when proving the amortized complexity of the union-find operations.

Path Compression in Find

Find operation is a recursive operation that follows the parent pointers.

While we recurse up the tree to find the root, we can actually make each node on the way point directly to the root.

We call this path compression or flattening.

It can be expensive but amortized analysis shows that it is worth it for future find operations.

Combining Rank and Path Compression

class UnionFind:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
        self.rank = [0] * n
    
    def find(self, x):
        if self.parent[x] != x:
            # Path compression (remeber rank is not updated)
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        if self.rank[x] < self.rank[y]:
            self.parent[x] = y
        elif self.rank[x] > self.rank[y]:
            self.parent[y] = x
        else:
            self.parent[y_root] = x_root
            self.rank[x_root] += 1