*Connected Sets Labeling* or *Connected Components Labeling* is the process of assigning unique labels to elements in a matrix or image, in such a way that adjacent elements are assigned the same label. The elements within a connected set will be similar to each other in terms of a given *criteria*. Alternate terms for connected-sets-labeling include connected-component-analysis, blob-extraction, blob-discovery, region-finding, region-labeling and region-extraction. Connected-sets-labeling is an important problem that has many applications in *graph theory* and *computer vision*. In *computer vision*, connected-sets-labeling is used in image analysis to find groups of similar pixels. Identifying and labeling of various disjoint or connected regions in an image is useful in many automated image analysis applications (example: image segmentation).

This article explains a sequential algorithm (with C# source-code) for finding disjoint connected sets in a given matrix. Using the Silverlight demo provided, you can dynamically create an input matrix and mark/unmark the cells (by clicking) to see the generated connected-sets (or regions) visually. Full source-code of the demo is available for download.

Connected Sets Demo in Silverlight

Click on the matrix to mark/unmark the cells.

Connected Sets Labeling Process

Finding the connected sets using a sequential procedure (without recursion) consists of two phases.

- Phase1: Find & mark regions with temporary labels and record the equivalences.
- Phase2: Merge the regions (i.e. assign all equivalent regions the same region value).

Data Structures for this Article

The following data-structure classes are created. The *Matrix* class represents a 2-dimensional matrix or an image. The *Cell* class represents a cell or element in a matrix or a pixel in an image. The *Set* class represents a list of items (in our case numbers). The *SetList* class represents a list of sets.

public class Matrix { public int Width, Height; public int[,] Values; } public class Cell { public int X, Y, Value; } public class Set : List<int> { } public class SetList : Dictionary<int, Set> { }

Phase1: Finding and Marking the Regions

The sequential algorithm for marking the regions, is as follows:

- Create a region-counter and initialize it to 1.
- For every cell in the matrix, get the neighbor cells that match a criteria. For binary image/matrix, cell-value of 1 is the criteria. Note that there will be 8 neighbors for a given cell of a matrix, namely
*north*,*south*,*east*,*west*,*north-east*,*north-west*,*south-east*and*south-west*. For*4-connectivity*, consider*north*and*west*cells, and for*8-connectivity*, consider*north*,*west*,*north-east*and*north-west*cells. For more information on connectivity, see http://en.wikipedia.org/wiki/Connectedness. - If zero neighbors match the criteria, then assign the cell to current region-counter value, and increment the region-counter.
- If only one neighbor match the criteria, then assign the cell to region of that neighboring cell.
- If multiple neighbors match the criteria and all of them are of same region, then assign the cell to that region.
- If multiple neighbors match the criteria and they belong to different regions, then assign the cell to any one of those regions, and record that these regions are equivalent.

The following is the complete source-code for this algorithm. The comments inside the source-code identify the step-number in the above algorithm.

public static Matrix MarkRegions(Matrix matrix, out SetList equivalentRegions) { Matrix regionMatrix = new Matrix(matrix.Width, matrix.Height); equivalentRegions = new SetList(); int currentRegion = 1; //step1 for (int y = 0; y < matrix.Height; y++) { for (int x = 0; x < matrix.Width; x++) { if (matrix.Values[x, y] == 1) { List<Cell> neighbors = matrix.GetNeighbors(x, y, 8); //step2 int matchCount = neighbors.Count(cell => cell.Value > 0); if (matchCount == 0) //step3 { regionMatrix.Values[x, y] = currentRegion; equivalentRegions.Add(currentRegion, new Set() { currentRegion }); currentRegion += 1; } else if (matchCount == 1) //step4 { Cell oneCell = neighbors.First(cell => cell.Value == 1); regionMatrix.Values[x, y] = regionMatrix.Values[oneCell.X, oneCell.Y]; } else if (matchCount > 1) { List<int> distinctRegions = neighbors.Select(cell => regionMatrix.Values[cell.X, cell.Y]).Distinct().ToList(); while (distinctRegions.Remove(0)) ; if (distinctRegions.Count == 1) //step5 { regionMatrix.Values[x, y] = distinctRegions[0]; } else if (distinctRegions.Count > 1) //step6 { int firstRegion = distinctRegions[0]; regionMatrix.Values[x, y] = firstRegion; //save the relations of equivalent regions foreach (int region in distinctRegions) { if (!equivalentRegions[firstRegion].Contains(region)) { equivalentRegions[firstRegion].Add(region); } } } } } } } return regionMatrix; }

For illustration, consider the following input matrix:

0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1

By applying the phase1 algorithm, the following region-matrix will be created.

0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 2 0 0 3 3 3 1 0 2 0 0 0 3 0 1 2 2 0 0 0 0 3 0 0 2 0 0 0 0 3 0 0 0 0 0 4 4 0 0 0 0 0 0 4 0 0 5

From the region-matrix, we understand that 5 regions are marked. For each region, a equivalence-set will be created. Thus, the equivalence-sets created for 5 regions in this region-matrix will be: {1, 3}, {2}, {3, 1}, {4, 3} and {5}, respectively.

Phase2: Merging the Regions (Union-Find Algorithm)

Merging the regions is the process of assigning all equivalent regions, the same region value. This can be achieved using *Union-Find* algorithm.

The *Union-Find* algorithm, in general, is the process of finding 'disjoint' sets (with 'connected' elements) from a list of given sets. The following are the steps in *Union-Find* algorithm:

- Create a dictionary for storing
*root*of all elements. Initially set the*root*of each element to itself. - If two elements are 'connected', then unite the sets of those two elements. To unite sets containining the elements
*p*and*q*, change the root of all elements (in the dictionary) with*root(p)*to*root(q)*.

According to *Union-Find* algorithm, elements *p* and *q* are connected, if they have same *root*.

Please note that *Union-Find* algorithm needs the information of how the elements are 'connected' to each other. For the problem of region-merging, the equivalence-sets found in phase1 can be used in Step2 of the *Union-Find* algorithm.

The following is the source-code of *Union-Find* algorithm:

public class Vector : Dictionary<int, int> { } public class UnionFind { public SetList Sets { get; set; } //equivalence-sets public Vector Roots { get; set; } public UnionFind(SetList sets) { this.Sets = sets; this.Roots = new Vector(); this.Initialize(); this.Start(); } public void Initialize() { //initialize Roots (add all items to Roots) Roots.Clear(); foreach (int index in Sets.Keys) foreach (int item in Sets[index]) if (!Roots.ContainsKey(item)) Roots.Add(item, -1); //assign root of each item foreach (int index in Sets.Keys) foreach (int item in Sets[index]) Roots[item] = Sets[index][0]; } public void Unite(int item1, int item2) { int item1Root = Roots[item1]; for (int index = 0; index < Roots.Count; index++) { int item = Roots.Keys.ElementAt(index); if (Roots[item] == item1Root) Roots[item] = Roots[item2]; } } public void Start() { foreach (int index in Sets.Keys) { var set = Sets[index]; for (int i = 0; i < set.Count; i++) for (int j = i + 1; j < set.Count; j++) Unite(set[i], set[j]); } } }

After applying the *Union-Find* algorithm to the sample region-matrix illustrated above, the *Roots* dictionary will be: [1:4, 2:2, 3:4, 4:4, 5:5]. From this we understand that only 3 disjoint sets (namely 2, 4 and 5) are available in the region-matrix.

0 0 0 0 4 4 4 0 0 0 0 0 0 0 0 4 0 2 0 0 4 4 4 4 0 2 0 0 0 4 0 4 2 2 0 0 0 0 4 0 0 2 0 0 0 0 4 0 0 0 0 0 4 4 0 0 0 0 0 0 4 0 0 5

We have to then normalize the region-numbers i.e. assign sequential region-numbers starting from 1. Thus, the region numbers after normalizing will be 1, 2 and 3. The final region-matrix will look like this:

0 0 0 0 2 2 2 0 0 0 0 0 0 0 0 2 0 1 0 0 2 2 2 2 0 1 0 0 0 2 0 2 1 1 0 0 0 0 2 0 0 1 0 0 0 0 2 0 0 0 0 0 2 2 0 0 0 0 0 0 2 0 0 3

The techniques explained in this article can be extended to any dimensions, however with increased computational complexities. For more information on connected component analysis, see http://en.wikipedia.org/wiki/Connected-component_labeling.