Click here to Skip to main content
15,884,298 members
Articles / Programming Languages / C#
Article

Set Collections for C#

Rate me:
Please Sign up or sign in to vote.
4.85/5 (18 votes)
3 Feb 2008CPOL10 min read 115.1K   642   54   7
Describes a library of set collections I have written

Introduction

In this article I'm going to talk about set collections and an implementation of them I have written in C#.

Background

I've been programming for over 30 years, and for the last 20 I've used either C, C++, JavaScript or Delphi. So when C# was first proposed my initial reaction was “Oh no! Not another C-based language”. However, as I studied the implementation of C# I was impressed by the work Microsoft had put into making an elegant and unified programming system.

That was until I saw this:

C#
[Flags]
public enum FontStyle
{
    Regular = 0,
    Bold = 1,
    Italic = 2,
    Underline = 4,
    Strikeout = 8,
}

Now to me this looks like a hangover from the days when C was used as a kind of ‘super assembler’. To fully make sense of it you have got to understand binary notation, i.e. that 1 is in the lowest bit position, two in the next and so on. And you have to realise that the values must all be powers of two for the flags to work correctly.

Now here’s the equivalent definition from Delphi:

TFontStyle = (fsBold, fsItalic, fsUnderline, fsStrikeOut);
TFontStyles = set of TFontStyle;

I think this is much more descriptive, and does not force the compiler into a particular mapping from the concept to the hardware. You will see ‘set of’ used extensively in a Delphi program.

Using sets in Delphi is also much easier. Suppose you wanted to test that a particular font style included bold. Here is how you would do it in C#:

C#
if ((fontStyles & FontStyle.Bold) == FontStyle.Bold)

Whereas the corresponding construct in Delphi is much clearer:

if fsBold in fontStyles then

And here is how you would test that a particular character is alpha-numeric in Delphi:

if char in ['A'..'Z','a'..'z','0'..'9'] then

Now I want to do all my future developments in C# as it seems to have a lot of momentum behind it, but I missed those Delphi sets, so I decided to implement my own set classes for C#.

Existing Implementations of Set Classes

But before writing a line of code, it’s always a good idea to see if someone has already done the job for you, so I did some Googling for possible existing implementations. I found several.

  1. Yet Another C# Set Class by Theo Bebekis
  2. A Set Class by PIEBALDconsult
  3. A C# Set Class based on enums by ‘RalfW’
  4. Pascal Set by Scott Mitchell
  5. Add Support for ‘Set’ Collections to .NET by Jason Smith

Each of these has its merits, but none of them seemed to support all the features I wanted. Briefly these were as follows:

  1. Use templates to make declarations as simple as possible.
  2. Have a number of classes supporting the same interface, tuned for various scenarios.
  3. Include an empty set, sets for values in a range and sets for all values of a given type.
  4. Implement a Contains method to make testing for membership as simple as possible
  5. Support all the Boolean operators.
  6. Allow straightforward iteration through the members.
  7. Have an efficient implementation for the common cases of sets of integers or enumerations.
  8. Enumerations should be handled correctly, and not just treated as integers.

The implementation by Jason Smith came closest to what I wanted, so I decided to use that as my starting point. In particular he defines a generic interface – ISet – then implements various classes which conform to that interface. My set library uses the same interface, so classes in it can be freely mixed with any other ISet based class.

An Apology

This is some of the first code I have written in C#, so there may well be aspects that I could have done better. In particular I was not very concerned with efficiency of the code, just with getting something that worked. If you have any suggestions for improvements please let me know. Also, it has taken me two years to get this far – unfortunately I have a lot of other things on my plate at the moment. So for instance this was written using the 2005 version of C# Express, though I have tested it using the 2008 version.

Using the Code

The download contains the source for a single solution – KMBLibrary, which contains four projects:

Iesi.Collections Class library for Jason Smith’s set collections.
KMB.Library.Collections Class library of my own set collections written to the same ISet interface.
KMB.Library.Tests Class library containing NUnit tests for the set collections in KMB.Libary.
KMBLibraryTest Application which combines the above libraries with the NUnit GUI runner to make a Windows test program.

In order to build and run this solution you will also need the latest NUnit. It can be downloaded from here.

Iesi.Collections

The primary concept I have borrowed from the Iesi library is the ISet interface. This interface looks like this:

Member Description
Is Empty Returns true if this set contains no elements.
Add Adds the specified element to this set if it is not already present.
Add All Adds all the elements in the specified collection to the set if they are not already present.
Clear Removes all objects from the set.
Contains Returns true if this set contains the specified element.
ContainsAll Returns true if the set contains all the elements in the specified collection.
ExclusiveOr Performs an "exclusive-or" of the two sets, keeping only the elements that are in one of the sets, but not in both. The original sets are not modified during this operation.
Intersect Performs an "intersection" of the two sets, where only the elements that are present in both sets remain. That is, the element is included if it exists in both sets. The Intersect() operation does not modify the input sets.
Minus Performs a "minus" of set b from set a. This returns a set of all the elements in set a, removing the elements that are also in set b. The original sets are not modified during this operation.
Remove Removes the specified element from the set.
RemoveAll Removes all the specified elements from this set, if they exist in this set.
RetainAll Retains only the elements in this set that are contained in the specified collection.
Union Performs a "union" of the two sets, where all the elements in both sets are present. That is, the element is included if it is in either a or b. Neither this set nor the input set are modified during the operation.

The Iesi collection library contains various sets written to implement this interface. See Jason’s original article for more details.

KMB.Library.Collections

The download contains my library KMB.Library.Collections, which implements the following classes:

Class Description
BinarySetOperatorSet Defines a base set for classes which combine sets using boolean operators, such as IntersectSet.
EmptySet Defines a set with no members. It can be used as a return value from functions that combine sets.
IntersectSet Defines a set which is the intersection of two other sets. It can be used as a return value from functions that combine sets.
ExclusiveOrSet Defines a set which is the ‘exclusive or’ of two other sets. It can be used as a return value from functions that combine sets.
MinusSet Defines a set which is one set 'minus' another. It can be used as a return value from functions that combine sets.
SetBase A base class for most classes within the library. It implements useful default behaviour.
SetBetween<T> Defines the set of all elements of type T between two bounds.
SetOf<T> Defines a set of elements of type T.
SetOfAll<T> Defines a set of all objects of type T.
SetOfInts An efficient implementation of a set for integers that uses BitArray.
SetOfInts16 A value type for very small sets of integers. It uses the bits of a 16 bit integer to record membership.
Sets Defines various static utility functions.
UnionSet Defines a set which is the union of two other sets. It can be used as a return value from functions that combine sets.
UniversalSet The universal set represents the set of everything. The Contains method returns true for any object. This means it has an infinite number of elements, so it cannot really be used as a collection.

For more detailed information on each of these classes please see the source. I will just give an overview here.

Meeting the Requirements

Use Templates to Make Declaration as Simple as Possible

The primary class – SetOf – uses templates so that you can easily define sets of different types.

C#
SetOf<int> x = new SetOf<int> ( new int[] {2,5,7,32,66} );
SetOf<string> y = new SetOf<string> ( new string[] {"apples","pears","bananas"} );
SetOf<FontStyle> fontStyle = 
    new SetOf<FontStyle>( new FontStyle[] {FontStyle.Bold, FontStyle.Italic} );

Have a Number of Classes Supporting the Same Interface, Tuned for Various Scenarios

All the classes implement the ISet interface, so they can be freely mixed and substituted.

C#
SetOfInts setOfInts = new SetOfInts();
setOfInts.SetRange(10, 20, true);
SetOf<string> setOfStrings = 
    new SetOf<string>(new string[] { "apples", "pears", "bananas" });
SetOfAll<FontStyle> allFontStyles = new SetOfAll<FontStyle>();
ISet unionSet = setOfInts | setOfStrings | allFontStyles;

Include an Empty Set, Sets for Values in a Range and Sets for all Values of a Given Type

There is an empty set, which is useful in cases were you are combining sets and need to return ‘nothing’:

C#
public static ISet Intersect(ISet a, ISet b)
{
    if (a.IsEmpty || b.IsEmpty)
        return EmptySet.e;
    else
        return new IntersectSet(a, b);
}

There is a range set, SetBetween, which allows you to easily define the set of all objects in a given range:

C#
SetBetween<string> set1 = new SetBetween<string>("excretion", "geometry");
Assert.IsFalse(set1.Contains("examination"), "set1 should not contain 'examination'");
Assert.IsTrue(set1.Contains("excretion"), "set1 should contain 'excretion'");

For integers, this set can return an enumerator for use in foreach loops:

C#
SetBetween<int> setBetween = new SetBetween<int>(10, 20);
int total = 0;
foreach(int i in setBetween)
{
    total += i;
}
Assert.AreEqual(165, total);

There is also a class which includes all objects of a given type. This is particularly useful for enumerations:

C#
enum Colour { Red, Orange, Yellow, Green, Blue, Indigo, Violet };

SetOfAll<Colour> setOfAllColours = new SetOfAll<Colour>();

foreach(Colour clr in setOfAllColours )
{
    :

Implement a Contains Method to Make Testing for Membership as Simple as Possible

C#
SetOf<int> x;
x.Add(33);
bool b = false;
if( x.Contains(33) )
    b = true;

Support all the Boolean Operators

Operators are defined to perform the standard Boolean operations of union, intersection, minus and exclusive-or:

C#
SetOf<int> a = new SetOf<int>(new int[]{ 2, 5, 7, 32, 66 });
SetOf<int> b = new SetOf<int>(new int[]{ 3, 5, 8, 11 });
SetOf<int> c = a | b;
SetOf<int> d = a & b;
SetOf<int> e = a - b;
SetOf<int> f = a ^ b;

Allow Straightforward Iteration Through the Members

All the sets support iteration through the members, via the IEnumerable interface:

C#
SetOfInts set1 = new SetOfInts();
set1.SetRange(7, 11, true);
long n = 0;
foreach (int i in set1)
    n += i;

Have an Efficient Implementation for the Common Cases of Sets of Integers or Enumerations

The set of integers is defined as a wrapper around BitArray. This means large sets of integers can be defined without using excessive memory:

C#
SetOfInts set1 = new SetOfInts();
set1.SetRange(-100000, +100000, true);    // will take up about 50K

For small sets of integers you can use SetOfInt16, which stores the bits in a 16 bit word:

C#
SetOfInts16 set1 = new SetOfInts16();
set1.SetRange(10, 13, true);
string s = set1.ToString();
Assert.AreEqual("{10,11,12,13}", s);

Enumerations Should be Handled Correctly, and Not just Treated as Integers

The SetOf class treats enumerations as distinct, regardless of their numeric value:

C#
SetOf<Colour> setOfColours = new SetOf<Colour>();
setOfColours.Add(Colour.Red); // Red has the numeric value zero
Assert.IsFalse(setOfColours.Contains(FontStyle.Regular));  
// Regular also has the value zero
// but is not in the set of colours

KMB.Library.Tests

A set of unit tests have been provided for the library. Use these to get more information on using the classes.

Points of Interest

Having implemented these set classes, I then speculated on how they could be more tightly integrated into the C# language. I came up with the following proposals (some of which I have passed on to the C# implementation team).

Provide an "in" Operator

You can currently use the in keyword in a foreach statement, but it would also be useful to support a Delphi-like syntax for testing membership:

C#
if( 32 in x )… 

Ideally this would work with any collection class, as the compiler could simply translate the above into:

C#
if ( x.Contains(32) ) 

Define a Double Dot Operator

It is often necessary to work with ranges of integers, so it would be useful if C# also supported a double dot operator like Delphi. This can be used as a short hand way of defining a set of integers in a particular range. So for example {1..100} would be handled by the compiler as meaning SetBetween<int>(1, 100).

C#
SetOf<int> x = {1 .. 100}; 

Use Set Notation to Put Constraints on Variables

Again this would overload the use of in to allow constraints on the range of a variable. For example, to limit an integer to be in the range 1 to 120 we could write something like this:

C#
int age in {1..120}; 

Attempting to assign an invalid value to this variable would cause an exception to be thrown.

age = 42; // OK 
age = -5; // throws exception 

Function to Obtain the Domain of a Variable

A domainof function could be defined which returns the set of all possible values of a variable:

C#
ISet s = domainof(age);    // would return the set of
                           // ints between 1 and 120

foreach (int i in s )
{
    :
}

History

  • 2 February 2008 - Initial version
  • 3 February 2008 - Fixed the code formatting

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Imagine Communications
United Kingdom United Kingdom
I have been working in IT since 1975, in various roles from junior programmer to system architect, and with many different languages and platforms. I have written shedloads of code.

I now live in Bedfordshire, England. As well as working full time I am the primary carer for my wife who has MS. I am learning to play the piano. I have three grown up children and a cat.

Comments and Discussions

 
GeneralDepreciated by .NET framework 3.5's HashSet Generic Class Pin
James Kolpack3-Feb-08 14:55
James Kolpack3-Feb-08 14:55 
GeneralRe: Depreciated by .NET framework 3.5's HashSet Generic Class Pin
PIEBALDconsult3-Feb-08 16:41
mvePIEBALDconsult3-Feb-08 16:41 
GeneralRe: Depreciated by .NET framework 3.5's HashSet Generic Class Pin
Keith Barrett4-Feb-08 9:39
Keith Barrett4-Feb-08 9:39 
GeneralRe: Depreciated by .NET framework 3.5's HashSet Generic Class Pin
Jaime Olivares11-Jul-08 17:59
Jaime Olivares11-Jul-08 17:59 
GeneralRe: Depreciated by .NET framework 3.5's HashSet Generic Class Pin
Jaime Olivares11-Jul-08 18:15
Jaime Olivares11-Jul-08 18:15 
GeneralRe: Depreciated by .NET framework 3.5's HashSet Generic Class Pin
ds9c1lt19523-Aug-08 19:29
ds9c1lt19523-Aug-08 19:29 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.