Set Collections for C#

Keith Barrett
Rate me:
4.85/5 (18 votes)
3 Feb 2008CPOL10 min read
115.1K
642
Describes a library of set collections I have written
kmblibrary.zip
- Iesi.Collections
- KMBLibrary
  - KMB.Library.Tests
    - bin
      - Release
    - Customer.cs
    - EmptySetTest.cs
    - ExamplesTest.cs
    - KMB.Library.Tests.csproj
    - Properties
      - AssemblyInfo.cs
    - SetBetweenTests.cs
    - SetOfAllTests.cs
    - SetOfEnumsTests.cs
    - SetOfEnumTests.cs
    - SetOfInt16Tests.cs
    - SetOfIntsTests.cs
    - SetOfIntTests.cs
    - SetOfObjectsTest.cs
    - SetOfStringsTests.cs
    - SetOfStructsTest.cs
    - UniversalSetTest.cs
  - KMB.Library
    - Collections
      - bin
        
        Release
      - BinarySetOperatorSet.cs
      - EmptySet.cs
      - ExclusiveOrSet.cs
      - IntersectSet.cs
      - KMB.Library.Collections.csproj
      - MinusSet.cs
      - Properties
        
        AssemblyInfo.cs
      - SetBase.cs
      - SetBetween.cs
      - SetOf.cs
      - SetOfAll.cs
      - SetOfInts.cs
      - SetOfInts16.cs
      - Sets.cs
      - UnionSet.cs
      - UniversalSet.cs
  - KMBLibrary.sln
  - KMBLibrary.suo
  - KMBLibraryTest
    - bin
      - Release
    - Form1.cs
    - Form1.Designer.cs
    - KMBLibraryTest.csproj
    - Program.cs
    - Properties
<html dir="LTR"><head><META http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"><title>Iesi.Collections</title><xml></xml><link rel="stylesheet" type="text/css" href="MSDN.css"></head><body id="bodyID" class="dtBODY"><div id="nsbanner"><div id="bannerrow1"><table class="bannerparthead" cellspacing="0"><tr id="hdr"><td class="runninghead">"Set" Collections Library for .NET</td><td class="product"></td></tr></table></div><div id="TitleRow"><h1 class="dtH1">Iesi.Collections Namespace</h1></div></div><div id="nstext"><p>
            			The System.Collections namespace in the .NET Framework provides a number of collection 
            			types that are extremely useful for manipulating data in memory. However, there is one 
            			type of collection that is conspicuously missing from <code>System.Collections</code>: the 
            			<code>Set</code>.
            		</p><p>
            			A <code>Set</code> is a collection that contains no duplicate elements, and where the order of
            			the elements may be arbitrary. It is loosely modelled after 
            			the mathematical concept of a "set." This implementation is based on the Java <code>Set</code> 
            			interface definition, so if you are also a Java programmer, this may seem familiar. 
            			This library provides a number of "standard" <code>Set</code> operators that the Java library 
            			neglected to include.
            		</p><p><code>Sets</code> come in handy when an <code>Array</code> or a <code>List</code> won't quite fit the bill. 
            			Arrays in .NET have a fixed length, making it tedious to add and remove elements. 
            			Lists allow you add new objects easily, but you can have numerous duplicate 
            			elements, which is undesirable for some types of problems. 
            			Searching Arrays or Lists for elements is just plain slow for large data sets, 
            			requiring a linear search. You could keep the array sorted and use a binary search, 
            			but that is often more trouble than it is worth (especially since this library, 
            			and the .NET Framework, provide better ways that are already written for you).
            		</p><p>
            			With sets, adding elements, removing elements, and checking for the existence 
            			of an element is fast and simple. You can mix and match the elements in different 
            			sets using the supported mathematical set operators: union, intersection, 
            			exclusive-or, and minus. 
            		</p><p>
            			You will see some interesting side effects with different <code>Set</code> 
            			implementations in this library, depending on the underlying search algorithm. 
            			For example, if you choose a sort-based <code>Set</code>, the elements will come out 
            			in sort order when you iterate using <code>foreach</code>. If you use a hash-based 
            			<code>Set</code>, the elements will come out in no particular order, but checking 
            			for inclusion will fastest when dealing with large data sets. If you use a 
            			list-based <code>Set</code>, elements will come out in the order you put them in 
            			when you iterate (although the effect of operators on element order in 
            			<code>Set</code> instances is not well defined by design). Additionally, list-based 
            			sets are fastest for very small data sets (up to about 10 elements), 
            			but get slower very quickly as the number of contained elements increases. 
            			To get the best of both worlds, the library provides a <code>Set</code> type that 
            			uses lists for small data sets and switches to a hash-based algorithm when 
            			the data set gets large enough to warrant it.
            		</p><p>
            			The following sample program demonstrates some of the features of sets:
            			<pre class="code">
            using System;
            using Iesi.Collections;
            namespace RiverDemo
            {    
            	class Rivers
            	{
            		[STAThread]
            		static void Main(string[] args)
            		{
            			//Use Arrays (which are ICollection objects) to quickly initialize.
            			Set arizona   
            				= new SortedSet(new string[] {"Colorado River"});
            			Set california
            				= new SortedSet(new string[] {"Colorado River", "Sacramento River"});
            			Set colorado
            				= new SortedSet(new string[] {"Arkansas River", "Colorado River", "Green River", "Rio Grande"});
            			Set kansas
            				= new SortedSet(new string[] {"Arkansas River", "Missouri River"});
            			Set nevada
            				= new SortedSet(new string[] {"Colorado River"});
            			Set newMexico
            				= new SortedSet(new string[] {"Rio Grande"});
            			Set utah
            				= new SortedSet(new string[] {"Colorado River", "Green River", "San Juan River"});
            			//Rivers by region.
            			Set southWest = colorado | newMexico | arizona | utah;
            			Set midWest = kansas;
            			Set west = california | nevada;
            			//All rivers (at least for the demo).
            			Set all = southWest | midWest | west;
            			Print("All rivers:", all);
            			Print("Rivers in the southwest:", southWest);
            			Print("Rivers in the west:", west);
            			Print("Rivers in the midwest:", midWest);
            			Console.WriteLine();
            
            			//Use the '-' operator to subtract the rivers in Colorado from 
            			//the set of all rivers.
            			Print("Of all rivers, these don't pass through Colorado:", all - colorado);
            
            			//Use the '&amp;' operator to find rivers that are in Colorado AND in Utah.
            			//A river must be present in both states, not just one.
            			Print("Rivers common to both Colorado and Utah:", colorado &amp; utah);
            
            			//use the '^' operator to find rivers that are in Colorado OR Utah,
            			//but not in both.
            			Print("Rivers in Colorado and Utah that are not shared by both states:",
            				colorado ^ utah);
            
            			//Use the '&amp;' operator to discover which rivers are present in Arizona, 
            			// California,Colorado, Nevada, and Utah.  The river must be present in 
            			// all states to be counted.
            			Print("Rivers common to Arizona, California, Colorado, Nevada, and Utah:", 
            				arizona &amp; california &amp; colorado &amp; nevada &amp; utah);
            			//Just to prove to you that operators always return like types, let's do a
            			//complex Set operation and look at the type of the result:
            			Console.WriteLine("The type of this complex operation is: " + 
            				((southWest ^ colorado &amp; california) | kansas).GetType().FullName);
            		}
            		private static void Print(string title, Set elements)
            		{
            			Console.WriteLine(title);
            			foreach(object o in elements)
            			{
            				Console.WriteLine("\t" + o);
            				Console.WriteLine();
            			}    
            		}
            	}
            			</pre></p><p>
            			Although there are other kinds of sets available in the library, the example uses 
            			<code>SortedSet</code> throughout. This is nice for the example, since everything will 
            			print neatly in alphabetical order. But you may be wondering what kind of <code>Set</code>
            			is returned when you "union," "intersect," "exclusive-or," or "minus" two <code>Set</code>
            			instances. The library always returns a <code>Set</code> that is the same type as 
            			the <code>Set</code> on the left, unless the left operand is null, in which case it 
            			returns the type of the <code>Set</code> on the right.
            		</p><p>
            			Here is the output from running the example:
            			<pre class="code">
            All rivers:
            Arkansas River
            Colorado River
            Green River
            Missouri River
            Rio Grande
            Sacramento River
            San Juan River
            
            Rivers in the southwest:
            Arkansas River
            Colorado River
            Green River
            Rio Grande
            San Juan River
            
            Rivers in the west:
            Colorado River
            Sacramento River
            
            Rivers in the midwest:
            Arkansas River
            Missouri River
            
            Of all rivers, these don't pass through Colorado:
            Missouri River
            Sacramento River
            San Juan River
            
            Rivers common to both Colorado and Utah:
            Colorado River
            Green River
            
            Rivers in Colorado and Utah that are not shared by both states:
            Arkansas River
            Rio Grande
            San Juan River
            
            Rivers common to Arizona, California, Colorado, Nevada, and Utah:
            Colorado River
            
            The type of this complex operation is: 
            Iesi.Collections.SortedSet
            Press any key to continue
            			</pre></p><p><a href="Iesi.CollectionsHierarchy.html">Namespace hierarchy</a></p><h3 class="dtH3">Classes</h3><div class="tablediv"><table class="dtTABLE" cellspacing="0"><tr valign="top"><th width="50%">Class</th><th width="50%">Description</th></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.DictionarySet.html">DictionarySet</a></td><td width="50%"><p><code>DictionarySet</code> is an abstract class that supports the creation of new <code>Set</code>
            types where the underlying data store is an <code>IDictionary</code> instance.</p><p>You can use any object that implements the <code>IDictionary</code> interface to hold set data.
            You can define your own, or you can use one of the objects provided in the Framework.   
            The type of <code>IDictionary</code> you choose will affect both the performance and the behavior
            of the <code>Set</code> using it. </p><p>This object overrides the <code>Equals()</code> object method, but not the <code>GetHashCode()</code>, because 
            the <code>DictionarySet</code> class is mutable.  Therefore, it is not safe to use as a key value in a hash table.</p><p>To make a <code>Set</code> typed based on your own <code>IDictionary</code>, simply derive a
            new class with a constructor that takes no parameters.  Some <code>Set</code> implmentations
            cannot be defined with a default constructor.  If this is the case for your class, 
            you will need to override <code>Clone()</code> as well.</p><p>It is also standard practice that at least one of your constructors takes an <code>ICollection</code> or 
            an <code>ISet</code> as an argument.</p></td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.HashedSet.html">HashedSet</a></td><td width="50%">
            Implements a <code>Set</code> based on a hash table.  This will give the best lookup, add, and remove
            performance for very large data-sets, but iteration will occur in no particular order.
            </td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.HybridSet.html">HybridSet</a></td><td width="50%">
            Implements a <code>Set</code> that automatically changes from a list to a hash table
            when the size reaches a certain threshold.  This is good if you are unsure about
            whether you data-set will be tiny or huge.  Because this uses a dual implementation,
            iteration order is not guaranteed!
            </td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.ImmutableSet.html">ImmutableSet</a></td><td width="50%"><p>Implements an immutable (read-only) <code>Set</code> wrapper.</p><p>Although this is advertised as immutable, it really isn't.  Anyone with access to the
            <code>basisSet</code> can still change the data-set.  So <code>GetHashCode()</code> is not implemented
            for this <code>Set</code>, as is the case for all <code>Set</code> implementations in this library.
            This design decision was based on the efficiency of not having to <code>Clone()</code> the 
            <code>basisSet</code> every time you wrap a mutable <code>Set</code>.</p></td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.ListSet.html">ListSet</a></td><td width="50%">
            Implements a <code>Set</code> based on a list.  Performance is much better for very small lists 
            than either <code>HashedSet</code> or <code>SortedSet</code>.  However, performance degrades rapidly as 
            the data-set gets bigger.  Use a <code>HybridSet</code> instead if you are not sure your data-set
            will always remain very small.  Iteration produces elements in the order they were added.
            However, element order is not guaranteed to be maintained by the various <code>Set</code>
            mathematical operators.  
            </td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.Set.html">Set</a></td><td width="50%"><p>A collection that contains no duplicate elements.  This class models the mathematical
            <code>Set</code> abstraction, and is the base class for all other <code>Set</code> implementations.  
            The order of elements in a set is dependant on (a)the data-structure implementation, and 
            (b)the implementation of the various <code>Set</code> methods, and thus is not guaranteed.</p><p><code>Set</code> overrides the <code>Equals()</code> method to test for "equivalency": whether the 
            two sets contain the same elements.  The "==" and "!=" operators are not overridden by 
            design, since it is often desirable to compare object references for equality.</p><p>Also, the <code>GetHashCode()</code> method is not implemented on any of the set implementations, since none
            of them are truely immutable.  This is by design, and it is the way almost all collections in 
            the .NET framework function.  So as a general rule, don't store collection objects inside <code>Set</code>
            instances.  You would typically want to use a keyed <code>IDictionary</code> instead.</p><p>None of the <code>Set</code> implementations in this library are guranteed to be thread-safe
            in any way unless wrapped in a <code>SynchronizedSet</code>.</p><p>The following table summarizes the binary operators that are supported by the <code>Set</code> class.</p><div class="tablediv"><table class="dtTABLE" cellspacing="0"><tr valign="top"><th width="50%">Operation</th><th width="50%">Description</th><th width="50%">Method</th><th width="50%">Operator</th></tr><tr valign="top"><td>Union (OR)</td><td>Element included in result if it exists in either <code>A</code> OR <code>B</code>.</td><td><code>Union()</code></td><td><code>|</code></td></tr><tr valign="top"><td>Intersection (AND)</td><td>Element included in result if it exists in both <code>A</code> AND <code>B</code>.</td><td><code>InterSect()</code></td><td><code>&amp;</code></td></tr><tr valign="top"><td>Exclusive Or (XOR)</td><td>Element included in result if it exists in one, but not both, of <code>A</code> and <code>B</code>.</td><td><code>ExclusiveOr()</code></td><td><code>^</code></td></tr><tr valign="top"><td>Minus (n/a)</td><td>Take all the elements in <code>A</code>.  Now, if any of them exist in <code>B</code>, remove
            		them.  Note that unlike the other operators, <code>A - B</code> is not the same as <code>B - A</code>.</td><td><code>Minus()</code></td><td><code>-</code></td></tr></table></div></td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.SortedSet.html">SortedSet</a></td><td width="50%">
            Implements a <code>Set</code> based on a sorted tree.  This gives good performance for operations on very
            large data-sets, though not as good - asymptotically - as a <code>HashedSet</code>.  However, iteration
            occurs in order.  Elements that you put into this type of collection must implement <code>IComparable</code>,
            and they must actually be comparable.  You can't mix <code>string</code> and <code>int</code> values, for example.
            </td></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.SynchronizedSet.html">SynchronizedSet</a></td><td width="50%"><p>Implements a thread-safe <code>Set</code> wrapper.  The implementation is extremely conservative, 
            serializing critical sections to prevent possible deadlocks, and locking on everything.
            The one exception is for enumeration, which is inherently not thread-safe.  For this, you
            have to <code>lock</code> the <code>SyncRoot</code> object for the duration of the enumeration.</p></td></tr></table></div><h3 class="dtH3">Interfaces</h3><div class="tablediv"><table class="dtTABLE" cellspacing="0"><tr valign="top"><th width="50%">Interface</th><th width="50%">Description</th></tr><tr valign="top"><td width="50%"><a href="Iesi.Collections.ISet.html">ISet</a></td><td width="50%"><p>A collection that contains no duplicate elements.  This interface models the mathematical
            <code>Set</code> abstraction.    
            The order of elements in a set is dependant on (a)the data-structure implementation, and 
            (b)the implementation of the various <code>Set</code> methods, and thus is not guaranteed.</p><p><code>Set</code> overrides the <code>Equals()</code> method to test for "equivalency": whether the 
            two sets contain the same elements.  The "==" and "!=" operators are not overridden by 
            design, since it is often desirable to compare object references for equality.</p><p>Also, the <code>GetHashCode()</code> method is not implemented on any of the set implementations, since none
            of them are truely immutable.  This is by design, and it is the way almost all collections in 
            the .NET framework function.  So as a general rule, don't store collection objects inside <code>Set</code>
            instances.  You would typically want to use a keyed <code>IDictionary</code> instead.</p><p>None of the <code>Set</code> implementations in this library are guranteed to be thread-safe
            in any way unless wrapped in a <code>SynchronizedSet</code>.</p><p>The following table summarizes the binary operators that are supported by the <code>Set</code> class.</p><div class="tablediv"><table class="dtTABLE" cellspacing="0"><tr valign="top"><th width="50%">Operation</th><th width="50%">Description</th><th width="50%">Method</th></tr><tr valign="top"><td>Union (OR)</td><td>Element included in result if it exists in either <code>A</code> OR <code>B</code>.</td><td><code>Union()</code></td></tr><tr valign="top"><td>Intersection (AND)</td><td>Element included in result if it exists in both <code>A</code> AND <code>B</code>.</td><td><code>InterSect()</code></td></tr><tr valign="top"><td>Exclusive Or (XOR)</td><td>Element included in result if it exists in one, but not both, of <code>A</code> and <code>B</code>.</td><td><code>ExclusiveOr()</code></td></tr><tr valign="top"><td>Minus (n/a)</td><td>Take all the elements in <code>A</code>.  Now, if any of them exist in <code>B</code>, remove
            		them.  Note that unlike the other operators, <code>A - B</code> is not the same as <code>B - A</code>.</td><td><code>Minus()</code></td></tr></table></div></td></tr></table></div></div></body></html>
By viewing downloads associated with this article you agree to the Terms of Service and the article's licence.
If a file you wish to view isn't highlighted, and is a text file (not binary), please let us know and we'll add colourisation support for it.
License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
Written By
Keith Barrett
Software Developer (Senior) Imagine Communications
United Kingdom
I have been working in IT since 1975, in various roles from junior programmer to system architect, and with many different languages and platforms. I have written shedloads of code.

I now live in Bedfordshire, England. As well as working full time I am the primary carer for my wife who has MS. I am learning to play the piano. I have three grown up children and a cat.
Set Collections for C#

License

Comments and Discussions