12,501,833 members (50,641 online)
Article
alternative version

25.6K views
26 bookmarked
Posted

# Cluster Algorithm Using .NET Collections

, 21 Jul 2009 CPOL
 Rate this:
Implementation of a clusterization algorithm by means of .NET's LINQ and Generics.

## Introduction

This article shows an algorithm for the classification of a set of points into groups (clusters) according to their relative positions by putting into work some of the .NET features like Generic Collections, LINQ, Partial Classes, and Lambda Expressions.

## Background

Clusterization algorithms are a well studied subject, and articles about it are easily found on the Internet.

At the core of each algorithm, there is always a loop, or series of loops, to traverse the set of points and classify them into groups (clusters) according to a certain criteria. In this article, I show how, by taking advantage of .NET collection classes, LINQ, and Lambda Expressions, an algorithm can be simplified.

First, some definitions to establish a common parlance:

• The algorithm's entry data will be a set of points {P}, let N be the total number of points in {P}; and a number (distance) D.
• Two points of {P} are said to be directly connected if their relative (Euclidian) distance is less or equal than D.
• Two points of {P} are said to be indirectly connected if a path of directly connected points can be established between them.
• A cluster is a subset of points of {P} that are (directly or indirectly) connected among them.
• A point can only belong to one cluster.
• If a point is not directly connected to any other point, then it forms all by itself one cluster. That cluster contains only one point.
• The result of the algorithm will be a set clusters {C}. This set describes how the points in {P} are grouped when following the above rules.
• Each individual member of {C} lists a subset of {P}.

Sorry, the last two sentences sound more complicated than what in reality the problem is. OOP clarifies these matters in the code.

## Using the code

The class `Point` describes a point of the above mentioned set {P}. {P} itself is represented in code by a .NET generic collection `l_ListPoints`. The class `Cluster` has a property describing the points inside the cluster of type `List<Point>`. {C} is represented in code by `l_ListClusters`. The actual data for the points was generated using C#'s not-so-random generator and set into a .NET partial class `BasicLibrary.AuxListPoint` (spread in files AuxListPoint.cs and points.cs). The bottom line is the positions of the points are hardcoded for the sake of the example.

Figure 1 displays the core of the algorithm:

The simplification provided by using the .NET features can be seen in the coding of steps 6, 7, and 8 (see Fig 1). If we weren't using .NET, then we would have to code steps 6, 7, and 8 into elaborated sub-algorithms to handle collections.

In code, the implementation of those steps looks like:

```#region Main Loop
// the algorithm is inside this loop
List<Cluster> l_ListAttainableClusters;
Cluster l_c;
foreach (BasicLibrary.Point p in l_ListPoints)
{
l_ListAttainableClusters = new List<Cluster>();
l_ListAttainableClusters = l_ListClusters.FindAll(x => x.IsPointReachable(p));
l_ListClusters.RemoveAll(x => x.IsPointReachable(p));
l_c = new Cluster(l_Dist, p);
// merge point's "reachable" clusters
if (l_ListAttainableClusters.Count > 0)
l_c.AnnexCluster(l_ListAttainableClusters.Aggregate((c, x) =>
c = Cluster.MergeClusters(x, c)));
l_ListAttainableClusters = null;
l_c = null;
}  // of loop over candidate points
#endregion```

## Points of interest

It is worth mentioning that real life clusterization applications often deal with a huge number of points, and the performance of the algorithm is of the essence in those cases. The algorithm presented here hasn't been tested for a number of points greater than fifty thousands points; its performance for greater sets could be an issue.

There is a WPF app and a console app in the solution (see source code). Both use the same algorithm. The WPF project comes with a graphic display of the clusterization that is colorful but has yet to prove to be meaningful (Fig. 2 and Fig. 3 are examples of running the WPF app with two different D values). In the graphics, I used a helper class described in http://www.codeproject.com/KB/graphics/WpfCoordExample.aspx.

## Share

 Web Developer United States
Emilio
New York

## You may also be interested in...

 Pro Pro