## Introduction

In this article I'm going to talk about set collections and an implementation of them I have written in C#.

## Background

I've been programming for over 30 years, and for the last 20 I've used either C, C++, JavaScript or Delphi. So when C# was first proposed my initial reaction was “Oh no! Not another C-based language”. However, as I studied the implementation of C# I was impressed by the work Microsoft had put into making an elegant and unified programming system.

That was until I saw this:

[Flags]
public enum FontStyle
{
Regular = 0,
Bold = 1,
Italic = 2,
Underline = 4,
Strikeout = 8,
}

Now to me this looks like a hangover from the days when C was used as a kind of ‘super assembler’. To fully make sense of it you have got to understand binary notation, i.e. that 1 is in the lowest bit position, two in the next and so on. And you have to realise that the values must all be powers of two for the flags to work correctly.

Now here’s the equivalent definition from Delphi:

TFontStyle = (fsBold, fsItalic, fsUnderline, fsStrikeOut);
TFontStyles = **set of** TFontStyle;

I think this is much more descriptive, and does not force the compiler into a particular mapping from the concept to the hardware. You will see ‘set of’ used extensively in a Delphi program.

Using sets in Delphi is also much easier. Suppose you wanted to test that a particular font style included **bold**. Here is how you would do it in C#:

if ((fontStyles & FontStyle.Bold) == FontStyle.Bold)

Whereas the corresponding construct in Delphi is much clearer:

**if** fsBold **in** fontStyles **then**

And here is how you would test that a particular character is alpha-numeric in Delphi:

**if** char **in** ['A'..'Z','a'..'z','0'..'9'] **then**

Now I want to do all my future developments in C# as it seems to have a lot of momentum behind it, but I missed those Delphi sets, so I decided to implement my own set classes for C#.

## Existing Implementations of Set Classes

But before writing a line of code, it’s always a good idea to see if someone has already done the job for you, so I did some Googling for possible existing implementations. I found several.

- Yet Another C# Set Class by Theo Bebekis
- A Set Class by PIEBALDconsult
- A C# Set Class based on enums by ‘RalfW’
- Pascal Set by Scott Mitchell
- Add Support for ‘Set’ Collections to .NET by Jason Smith

Each of these has its merits, but none of them seemed to support all the features I wanted. Briefly these were as follows:

- Use templates to make declarations as simple as possible.
- Have a number of classes supporting the same interface, tuned for various scenarios.
- Include an empty set, sets for values in a range and sets for all values of a given type.
- Implement a
`Contains`

method to make testing for membership as simple as possible - Support all the Boolean operators.
- Allow straightforward iteration through the members.
- Have an efficient implementation for the common cases of sets of integers or enumerations.
- Enumerations should be handled correctly, and not just treated as integers.

The implementation by Jason Smith came closest to what I wanted, so I decided to use that as my starting point. In particular he defines a generic interface – `ISet `

– then implements various classes which conform to that interface. My set library uses the same interface, so classes in it can be freely mixed with any other `ISet `

based class.

### An Apology

This is some of the first code I have written in C#, so there may well be aspects that I could have done better. In particular I was not very concerned with efficiency of the code, just with getting something that worked. If you have any suggestions for improvements please let me know. Also, it has taken me two years to get this far – unfortunately I have a lot of other things on my plate at the moment. So for instance this was written using the 2005 version of C# Express, though I have tested it using the 2008 version.

## Using the Code

The download contains the source for a single solution – `KMBLibrary`

, which contains four projects:

`Iesi.Collections ` | Class library for Jason Smith’s set collections. |

`KMB.Library.Collections ` | Class library of my own set collections written to the same `ISet` interface. |

`KMB.Library.Tests ` | Class library containing NUnit tests for the set collections in `KMB.Libary` . |

`KMBLibraryTest` | Application which combines the above libraries with the NUnit GUI runner to make a Windows test program. |

In order to build and run this solution you will also need the latest NUnit. It can be downloaded from here.

### Iesi.Collections

The primary concept I have borrowed from the `Iesi`

library is the `ISet`

interface. This interface looks like this:

Member | Description |

`Is Empty` | Returns `true` if this set contains no elements. |

`Add` | Adds the specified element to this set if it is not already present. |

`Add All` | Adds all the elements in the specified collection to the set if they are not already present. |

`Clear` | Removes all objects from the set. |

`Contains` | Returns `true` if this set contains the specified element. |

`ContainsAll` | Returns `true` if the set contains all the elements in the specified collection. |

`ExclusiveOr` | Performs an "exclusive-or" of the two sets, keeping only the elements that are in one of the sets, but not in both. The original sets are not modified during this operation. |

`Intersect` | Performs an "intersection" of the two sets, where only the elements that are present in both sets remain. That is, the element is included if it exists in both sets. The `Intersect()` operation does not modify the input sets. |

`Minus` | Performs a "minus" of set `b` from set `a` . This returns a set of all the elements in set `a` , removing the elements that are also in set `b` . The original sets are not modified during this operation. |

`Remove` | Removes the specified element from the set. |

`RemoveAll` | Removes all the specified elements from this set, if they exist in this set. |

`RetainAll` | Retains only the elements in this set that are contained in the specified collection. |

`Union` | Performs a "union" of the two sets, where all the elements in both sets are present. That is, the element is included if it is in either `a` or `b` . Neither this set nor the input set are modified during the operation. |

The `Iesi`

collection library contains various sets written to implement this interface. See Jason’s original article for more details.

### KMB.Library.Collections

The download contains my library `KMB.Library.Collections`

, which implements the following classes:

Class | Description |

`BinarySetOperatorSet ` | Defines a base set for classes which combine sets using boolean operators, such as `IntersectSet` . |

`EmptySet ` | Defines a set with no members. It can be used as a return value from functions that combine sets. |

`IntersectSet ` | Defines a set which is the intersection of two other sets. It can be used as a return value from functions that combine sets. |

`ExclusiveOrSet ` | Defines a set which is the ‘exclusive or’ of two other sets. It can be used as a return value from functions that combine sets. |

`MinusSet ` | Defines a set which is one set 'minus' another. It can be used as a return value from functions that combine sets. |

`SetBase ` | A base class for most classes within the library. It implements useful default behaviour. |

`SetBetween<T>` | Defines the set of all elements of type `T` between two bounds. |

`SetOf<T>` | Defines a set of elements of type `T` . |

`SetOfAll<T>` | Defines a set of **all** objects of type `T` . |

`SetOfInts ` | An efficient implementation of a set for integers that uses `BitArray` . |

`SetOfInts16 ` | A value type for very small sets of integers. It uses the bits of a 16 bit integer to record membership. |

`Sets ` | Defines various static utility functions. |

`UnionSet ` | Defines a set which is the union of two other sets. It can be used as a return value from functions that combine sets. |

`UniversalSet ` | The universal set represents the set of everything. The `Contains ` method returns `true ` for any object. This means it has an infinite number of elements, so it cannot really be used as a collection. |

For more detailed information on each of these classes please see the source. I will just give an overview here.

### Meeting the Requirements

#### Use Templates to Make Declaration as Simple as Possible

The primary class – `SetOf`

– uses templates so that you can easily define sets of different types.

SetOf<int> x = new SetOf<int> ( new int[] {2,5,7,32,66} );
SetOf<string> y = new SetOf<string> ( new string[] {"apples","pears","bananas"} );
SetOf<FontStyle> fontStyle =
new SetOf<FontStyle>( new FontStyle[] {FontStyle.Bold, FontStyle.Italic} );

#### Have a Number of Classes Supporting the Same Interface, Tuned for Various Scenarios

All the classes implement the `ISet`

interface, so they can be freely mixed and substituted.

SetOfInts setOfInts = new SetOfInts();
setOfInts.SetRange(10, 20, true);
SetOf<string> setOfStrings =
new SetOf<string>(new string[] { "apples", "pears", "bananas" });
SetOfAll<FontStyle> allFontStyles = new SetOfAll<FontStyle>();
ISet unionSet = setOfInts | setOfStrings | allFontStyles;

#### Include an Empty Set, Sets for Values in a Range and Sets for all Values of a Given Type

There is an empty set, which is useful in cases were you are combining sets and need to return ‘nothing’:

public static ISet Intersect(ISet a, ISet b)
{
if (a.IsEmpty || b.IsEmpty)
return EmptySet.e;
else
return new IntersectSet(a, b);
}

There is a range set, `SetBetween`

, which allows you to easily define the set of all objects in a given range:

SetBetween<string> set1 = new SetBetween<string>("excretion", "geometry");
Assert.IsFalse(set1.Contains("examination"), "set1 should not contain 'examination'");
Assert.IsTrue(set1.Contains("excretion"), "set1 should contain 'excretion'");

For integers, this set can return an enumerator for use in `foreach`

loops:

SetBetween<int> setBetween = new SetBetween<int>(10, 20);
int total = 0;
foreach(int i in setBetween)
{
total += i;
}
Assert.AreEqual(165, total);

There is also a class which includes all objects of a given type. This is particularly useful for enumerations:

enum Colour { Red, Orange, Yellow, Green, Blue, Indigo, Violet };
SetOfAll<Colour> setOfAllColours = new SetOfAll<Colour>();
foreach(Colour clr in setOfAllColours )
{
:

#### Implement a Contains Method to Make Testing for Membership as Simple as Possible

SetOf<int> x;
x.Add(33);
bool b = false;
if( x.Contains(33) )
b = true;

#### Support all the Boolean Operators

Operators are defined to perform the standard Boolean operations of union, intersection, minus and exclusive-or:

SetOf<int> a = new SetOf<int>(new int[]{ 2, 5, 7, 32, 66 });
SetOf<int> b = new SetOf<int>(new int[]{ 3, 5, 8, 11 });
SetOf<int> c = a | b;
SetOf<int> d = a & b;
SetOf<int> e = a - b;
SetOf<int> f = a ^ b;

#### Allow Straightforward Iteration Through the Members

All the sets support iteration through the members, via the `IEnumerable`

interface:

SetOfInts set1 = new SetOfInts();
set1.SetRange(7, 11, true);
long n = 0;
foreach (int i in set1)
n += i;

#### Have an Efficient Implementation for the Common Cases of Sets of Integers or Enumerations

The set of integers is defined as a wrapper around `BitArray`

. This means large sets of integers can be defined without using excessive memory:

SetOfInts set1 = new SetOfInts();
set1.SetRange(-100000, +100000, true);

For small sets of integers you can use `SetOfInt16`

, which stores the bits in a 16 bit word:

SetOfInts16 set1 = new SetOfInts16();
set1.SetRange(10, 13, true);
string s = set1.ToString();
Assert.AreEqual("{10,11,12,13}", s);

#### Enumerations Should be Handled Correctly, and Not just Treated as Integers

The `SetOf `

class treats enumerations as distinct, regardless of their numeric value:

SetOf<Colour> setOfColours = new SetOf<Colour>();
setOfColours.Add(Colour.Red);
Assert.IsFalse(setOfColours.Contains(FontStyle.Regular));

### KMB.Library.Tests

A set of unit tests have been provided for the library. Use these to get more information on using the classes.

## Points of Interest

Having implemented these set classes, I then speculated on how they could be more tightly integrated into the C# language. I came up with the following proposals (some of which I have passed on to the C# implementation team).

### Provide an "in" Operator

You can currently use the `in`

keyword in a `foreach`

statement, but it would also be useful to support a Delphi-like syntax for testing membership:

if( 32 in x )…

Ideally this would work with any collection class, as the compiler could simply translate the above into:

if ( x.Contains(32) )

### Define a Double Dot Operator

It is often necessary to work with ranges of integers, so it would be useful if C# also supported a double dot operator like Delphi. This can be used as a short hand way of defining a set of integers in a particular range. So for example {1..100} would be handled by the compiler as meaning `SetBetween<int>(1, 100)`

.

SetOf<int> x = {1 .. 100};

### Use Set Notation to Put Constraints on Variables

Again this would overload the use of `in`

to allow constraints on the range of a variable. For example, to limit an integer to be in the range 1 to 120 we could write something like this:

int age in {1..120};

Attempting to assign an invalid value to this variable would cause an exception to be thrown.

age = 42; // OK
age = -5; // throws exception

### Function to Obtain the Domain of a Variable

A `domainof`

function could be defined which returns the set of all possible values of a variable:

ISet s = domainof(age);
foreach (int i in s )
{
:
}

## History

- 2 February 2008 - Initial version
- 3 February 2008 - Fixed the code formatting