Introduction
The System.Collections namespace in the .NET Framework provides a number of collection types that are extremely useful for manipulating data in memory. However, there is one type of collection that is conspicuously missing from System.Collections: the Set.
A Set is a collection that contains no duplicate elements. It is loosely modelled after the mathematical concept of a "set." This implementation is based on the Java Set interface definition, so if you are also a Java programmer, this may seem familiar. The major differences are that this library does not use interfaces, and it provides a number of "standard" Set operators that the Java library neglected to include.
Sets come in handy when an Array or a List won't quite fit the bill. Arrays in .NET have a fixed length, making it tedious to add and remove elements. Lists allow you add new objects easily, but you can have numerous duplicate elements, which is undesirable for some types of problems. Searching Arrays or Lists for elements is just plain slow for large data sets, requiring a linear search. You could keep the array sorted and use a binary search, but that is often more trouble than it is worth (especially since this library, and the .NET Framework, provide better ways that are already written for you).
With sets, adding elements, removing elements, and checking for the existence of an element is fast and simple. You can mix and match the elements in different sets using the supported mathematical set operators: union, intersection, exclusive-or, and minus. See the example below for more information.
You will see some interesting side effects with different Set implementations in this library, depending on the underlying search algorithm. For example, if you choose a sort-based Set, the elements will come out in sort order when you iterate using foreach. If you use a hash-based Set, the elements will come out in no particular order, but checking for inclusion will fastest when dealing with large data sets. If you use a list-based Set, elements will come out in the order you put them in when you iterate. Additionally, list-based sets are fastest for very small data sets (up to about 10 elements), but get slower very quickly as the number of contained elements increases. To get the best of both worlds, the library provides a Set type that uses lists for small data sets and switches to a hash-based algorithm when the data set gets large enough to warrant it.
Using the code
The Iesi.Collections library has the following object hierarchy:
Set - The abstract class from which all other sets inherit.
DictionarySet - An abstract class that lets you create new types of Set classes based on an arbitrary implementation of IDictionary.
HashedSet - Based on a HashTable.
SortedSet - Based on a SortedList.
ListSet - Based on a ListDictionary.
HybridSet - Based on a HybridDictionary.
ImmutableSet - A wrapper for other Sets that makes them read-only.
SynchronizedSet - A wrapper for other Sets that makes them (mostly) thread-safe.
You will probably find the HashedSet and HybridSet to be the most useful implementations. They can contain any object that is immutable, can be compared using Equals(), and has a valid implementation of GetHashCode(). All of the normal value types and many objects already meet these requirements, so for most data types, HashedSet and HybridSet just work. The only downside to using them is that you can't predict the order of iteration.
SortedSet is useful if you are interested in the iteration order of your Set, but it imposes some different requirements, and they are usually more difficult to meet. In addition to being immutable, elements in a SortedSet must also implement IComparable. Further, they must actually be comparable without throwing an exception. So you would not be able to put string values and int values into the same Set instance.
ListSet is useful for very small data sets. When a ListSet contains less than 10 elements, it is actually going to be faster than any of the other implementations. However, once you get above 10 elements, the run time for many of the set operations increases as the square of the data size. So an operation on a ListSet containing 1,000 elements would be roughly 10,000 times slower than an operation on a ListSet containing 10 items.
ImmutableSet and SynchronizedSet are specialized wrappers. They contain other sets, which then do all the real work. ImmutableSet wraps an internal Set to make it read-only. SynchronizedSet wraps all the functions of an internal Set to synchronize them. This allows the Set to be used by more than one thread. See the documentation for more information on this, since there are special considerations for enumerating collections that are in use by multiple threads.
If you are interested in creating your own Set types, this library supports that. If you want to do it from scratch, you can extend Set and implement all the abstract functions. If you want to create a new Set type based on an existing IDictionary implementation, extend DictionarySet. If you just want to add some new functionality to an existing, working Set implementation, choose one of HashedSet, HybridSet, ListSet, or SortedSet to extend.
The example below demonstrates creating new sets and manipulating them using set operators. The currently supported set operators are described briefly in the following table:
Set Operators
| Name |
Symbol |
Description |
Union() |
| |
A | B returns a Set containing all the elements from both input sets. |
Intersect() |
& |
A & B returns a Set containing all the elements that are in both A and B. |
ExclusiveOr() |
^ |
A ^ B returns a Set containing the elements that are in A or in B, but are not in both A and B. |
Minus() |
- |
A - B returns a Set containing all the elements of A, with the elements from B removed. |
Equals() |
|
You can use Equals() to compare two Set instances for equivalence. That is, A.Equals(B) is true if A and B contain exactly the same elements. The '==' and '!=' operators are not overridden by design. |
The example uses Set instances to represent states in the southwestern United States. Each Set holds the names of the major rivers in the state. It then uses the basic state-river information to derive all sorts of fun facts about the rivers.
using System;
using Iesi.Collections;
namespace RiverDemo
{
class Rivers
{
[STAThread]
static void Main(string[] args)
{
Set arizona
= new SortedSet(new string[] {"Colorado River"});
Set california
= new SortedSet(new string[] {"Colorado River", "Sacramento River"});
Set colorado
= new SortedSet(new string[] {"Arkansas River",
"Colorado River", "Green River", "Rio Grande"});
Set kansas
= new SortedSet(new string[] {"Arkansas River", "Missouri River"});
Set nevada
= new SortedSet(new string[] {"Colorado River"});
Set newMexico
= new SortedSet(new string[] {"Rio Grande"});
Set utah
= new SortedSet(new string[] {"Colorado River", "Green River",
"San Juan River"});
Set southWest = colorado | newMexico | arizona | utah;
Set midWest = kansas;
Set west = california | nevada;
Set all = southWest | midWest | west;
Print("All rivers:", all);
Print("Rivers in the southwest:", southWest);
Print("Rivers in the west:", west);
Print("Rivers in the midwest:", midWest);
Console.WriteLine();
Print("Of all rivers, these don't pass through Colorado:",all - colorado);
Print("Rivers common to both Colorado and Utah:", colorado & utah);
Print("Rivers in Colorado and Utah that are not shared by both states:",
colorado ^ utah);
Print("Rivers common to Arizona, California, Colorado, Nevada, and Utah:",
arizona & california & colorado & nevada & utah);
Console.WriteLine("The type of this complex operation is: " +
((southWest ^ colorado & california) | kansas).GetType().FullName);
}
private static void Print(string title, Set elements)
{
Console.WriteLine(title);
foreach(object o in elements)
{
Console.WriteLine("\t" + o);
Console.WriteLine();
}
}
}
Although there are other kinds of sets available in the library, the example uses SortedSet throughout. This is nice for the example, since everything will print neatly in alphabetical order. But you may be wondering what kind of Set is returned when you "union," "intersect," "exclusive-or," or "minus" two Set instances. The library always returns a Set that is the same type as the Set on the left, unless the left operand is null, in which case it returns the type of the Set on the right.
What this means is that since we are using SortedSet instances, we will always get SortedSet instances when we combine sets using the binary operators. So the output in our example will always be in alphabetical order, just as you would expect.
Here is the output from running the example:
All rivers:
Arkansas River
Colorado River
Green River
Missouri River
Rio Grande
Sacramento River
San Juan River
Rivers in the southwest:
Arkansas River
Colorado River
Green River
Rio Grande
San Juan River
Rivers in the west:
Colorado River
Sacramento River
Rivers in the midwest:
Arkansas River
Missouri River
Of all rivers, these don't pass through Colorado:
Missouri River
Sacramento River
San Juan River
Rivers common to both Colorado and Utah:
Colorado River
Green River
Rivers in Colorado and Utah that are not shared by both states:
Arkansas River
Rio Grande
San Juan River
Rivers common to Arizona, California, Colorado, Nevada, and Utah:
Colorado River
The type of this complex operation is:
Iesi.Collections.SortedSet
Press any key to continue
There is an additional example and a lot more technical information included in the documentation. The source code is pretty easy to follow as well. All the hard work of searching and sorting is performed by classes that are already present in the .NET Framework, so none of the code is particularly difficult or tricky. If you have a question that is not covered by the documentation, it should not take more than 5 or 10 minutes to discover the answer by reading the code.
Enjoy!
|
|
 |
 | Any stats on perf for us? Member 4202435 | 10:06 22 Jun '08 |
|
|
 |
 | Minus is Complement jmelgaard | 3:12 27 Mar '08 |
|
 |
A Minor detail, but (as far as i know and are told).
When operating with sets, the operation A - B (Minus) is gennerally refered to as Complement
Complement (set theory)
for further detail on set Relative complement and Absolute complement, its the relative Complement your operations refers to as far as i can tell
Think it could be usefull for your documentation
|
|
|
|
 |
 | Depreciated by .NET framework 3.5's HashSet Generic Class James Kolpack | 15:25 3 Feb '08 |
|
 |
If you're now using .NET 3.5, the HashSet class will give you high performance set operations.
|
|
|
|
 |
 | Using type arguments ezln23 | 14:23 7 Jan '08 |
|
 |
I have converted the entire library to used type arguments (i.e. Set). If anyone wants to use that code, email me (christopher@ezln23.com)
|
|
|
|
 |
 | Debugger Support alon.albert | 16:11 10 Sep '06 |
|
 |
I've been using your Set classes for while now. It's very well done has been very useful, Thanks for your work.
I've recently upgraded to VS2005 and added a nice feature to the Set source code to assist in debugging. The addition is quite simple:
In DictionarySet, add the following lines right before the class definition:
[DebuggerTypeProxy(typeof(DictionarySet.DictionarySetDebugView))] [DebuggerDisplay("Count = {Count}")]
Then add the following inner class to DictionarySet:
private class DictionarySetDebugView { private IDictionary dictonary;
public DictionarySetDebugView(DictionarySet dictionarySet) { if (dictionarySet == null) { throw new ArgumentNullException("dictionarySet"); }
dictonary = dictionarySet.InternalDictionary; }
[DebuggerBrowsable(DebuggerBrowsableState.RootHidden)] public object[] Items { get { object[] items = new object[dictonary.Count]; dictonary.Keys.CopyTo(items, 0); return items; } } }
This will result in the follwoing view in a debugger:
- s Count = 2 Iesi.Collections.ISet {Iesi.Collections.SortedSet} [0] 1 int [1] 2 int + Raw View
Where s in a SortedSet that I added the integer values 1 & 2.
You might need to add a few #ifdef's to backwards support VS2003, I didn't bother testing on that system.
|
|
|
|
 |
 | Licensing terms Matthias83 | 4:54 12 Aug '06 |
|
 |
Hi Jason,
I was wondering if I could use this code in a project of my own. I'll leave all copyright messages in place. I was wondering under which licensing terms I can do this? BSD? GPL? anything else? I need to know because I have indicate this to my superiors.
-Matthias
|
|
|
|
 |
|
 |
You may use this code in your own project. It wouldn't be very useful if you couldn't, now would it?
Honestly though, this is just unmaintained sample code as far as I am concerned. Sorry I put those copyright notices in there. You have my permission to remove them.
The only legal thing is that if you use this code, it is yours. Add it to your project and whittle away. You can change it in any way you like, you can maintain it, and if something is wrong with it, you can fix it. 
In other words, there are no licensing terms. Give it away, sell it, repackage it, whatever. Even take credit for writing it if you need to. And that goes for anyone.
And if you really need licensing terms, suggest one here and I'll make that one the official one, or something like that. I know how "superiors" can be.
|
|
|
|
 |
 | Can't use XmlSerializer on ISet huzzaboo | 12:45 10 Mar '06 |
|
 |
I'm using ISet in the 1.1 .NET Framework and with NHibernate.
I tried to use XmlSerializer to serialize an NHibernate object that had a collection of type ISet and received the following exception:
[InvalidOperationException: You must implement a default accessor on Iesi.Collections.ISet because it inherits from ICollection.] System.Xml.Serialization.TypeScope.GetCollectionElementType(Type type) +580 System.Xml.Serialization.TypeScope.ImportTypeDesc(Type type, Boolean canBePrimitive, MemberInfo memberInfo) +549 System.Xml.Serialization.TypeScope.GetTypeDesc(Type type, MemberInfo source, Boolean directReference) +50 System.Xml.Serialization.StructModel.GetPropertyModel(PropertyInfo propertyInfo) +57 System.Xml.Serialization.StructModel.GetFieldModel(MemberInfo memberInfo) +88 System.Xml.Serialization.XmlReflectionImporter.ImportStructLikeMapping(StructModel model, String ns) +1534 System.Xml.Serialization.XmlReflectionImporter.ImportStructLikeMapping(StructModel model, String ns) +654 System.Xml.Serialization.XmlReflectionImporter.ImportTypeMapping(TypeModel model, String ns, ImportContext context, String dataType, Boolean repeats) +442
I'm not sure quite how to work around this. I did find this article which seems to describe how to properly build a collection class such that it can be serialized with XmlSerializer: http://www.topxml.com/xmlserializer/serializing_collection_classes.asp
Any hints on how to approach this?
Thanks
|
|
|
|
 |
|
 |
i had a similar requirement, seems like the ISet classes in Iesi namespace does not have indexer and the Add methods (strongly-typed). My teammate currently did a manual serialization per field just to make this work but i'm open for any suggestions on how to properly make this work.
(x-a)(x-b)(x-c)...(x-z)
|
|
|
|
 |
|
 |
Hi,
I'm currently facing the same probleme. Have you found a fix for this?
Thanks. Christophe.
|
|
|
|
 |
 | Maybe try the C5 collection library for .Net 2.0 Peter Sestoft | 0:59 12 Feb '06 |
|
 |
C5 is a library of generic collection classes for C# and other CLI languages.
C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as sets and bags, persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes. Also, it is more comprehensive than collection class libraries on other similar platforms, such as Java. Unlike many other collection class libraries, C5 is designed with a strict policy of supporting "code to interface not implementation".
C5 is available in binary and source form under a liberal BSD-style license and is thoroughly documented in an accompanying free book.
See http://www.itu.dk/research/c5/
|
|
|
|
 |
|
 |
This one is the best as far as I know. I think NHibernate 2.0 should use this.
|
|
|
|
 |
 | bug found KailuoWang | 5:18 6 Feb '06 |
|
 |
In the set class, the method ExclusiveOr was writen as follow
public static ISet ExclusiveOr(ISet a, ISet b) { if(a == null && b == null) return null; else if(a == null) return (Set)b.Clone(); else if(b == null) return (Set)a.Clone(); else return a.ExclusiveOr(b); }
Note that the clones of a and b were uneccessarily down cast to Set, while in the Union method of this class, these two clones are down cast to ISet. Although this won't affect any current usage, but for people who wants to have their own implementations of ISet ( an example[^]), this could be a problem. I did try modify the code to
return (ISet)b.Clone(); ... return (ISet)a.Clone();
And all the Unit Tests from Nhibernate were passed
|
|
|
|
 |
 | Generic ISet<T> posted KailuoWang | 16:17 4 Feb '06 |
|
 |
Based on Jason and other's work, I just posted a geneic ISet and several implementations: http://www.codeproject.com/useritems/GenericISet.asp[^] It passed all the test. I will keep working on the implementation since I will rely on this collection very much.
-- modified at 21:23 Saturday 4th February, 2006
|
|
|
|
 |
 | Plans for an ISet? Nathan Baulch | 15:56 1 Feb '06 |
|
 |
Does anybody know of (or are there any plans to build) a generic extension to this cool collection library?
Just thought I'd ask before giving it a try myself (probably a good way to learn a few things about generics I guess).
Nathan
|
|
|
|
 |
|
 |
It looks like there are several good ideas in recent posts. I'll be starting up some .NET stuff again soon, so maybe this would be a good project to familiarize myself with the new tools. Feel free to do it yourself though - the library is really pretty simple, and you'll probably be able to get to it before I will.
|
|
|
|
 |
 | Immutable != ReadOnly Rüdiger Klaehn | 7:44 11 Jan '06 |
|
 |
Just a little remark: immutable is not the same as readonly.
An immutable object can not be modified after it has been created. A typical example would be the System.String type.
A readonly view of a mutable object just means that the owner of the view can not modify the underlying object. But it does not guarantee that the underlying object never changes. An example would be what you get out of ArrayList.ReadOnly(). The resulting object can not be used to modify the underyling ArrayList, but if the ArrayList itself changes, then so does the view/wrapper object.
So probably you should rename the class for clarity.
|
|
|
|
 |
|
|
 |
 | Possible bug: SortedSet uses IComparable for its Add/Remove kenrod | 12:34 4 Dec '05 |
|
 |
First, congratulations on providing such an excellent and much needed set implementation! Now that this code is also shipped with NHibernate, I am sure many people will be finding it extremely useful.
I have a query about your SortedSet implementation. At present, it uses a SortedList as its backing store? However, I do not believe that, semantically, a SortedList is equivalent to a SortedSet?
Specially, as can be seen at http://www.dotnet247.com/247reference/System/Collections/SortedList/__rotor[^], SortedList uses IComparable and BinarySearch, not GetHashCode() and Equals(), to implement its Add, Contains, etc.
I have a class whose GetHashCode() and Equals() methods key off one property of the object, and whose CompareTo() keys off another. This is because my GetHashCode() and Equals() method use internal, unique identifiers useful to the code whereas my CompareTo() method uses human-readable, non-unique identifiers useful to people.
I believe this is a valid use case: I do not believe the contract of CompareTo() requires it use the same identifier as GetHashCode() and Equals()?
However, SortedSet breaks in this scenario because (I believe) it goes looking for an existing object using CompareTo() rather than by GetHashCode() and Equals(). This means it is possible to put multiple items into a SortedSet whose Equals() methods return true, which breaks the ISet contract (though of course is perfectly valid for a SortedList).
Have I gotten this completely wrong? Is there a better approach?
Regards,
Richard.
-- modified at 17:48 Sunday 4th December, 2005
|
|
|
|
 |
|
 |
CompareTo(), Equals(), and GetHashCode() need to all key off the same value, or at least they need to provide consistent results. If you look at the object one way that is always unique, and another way that can return the same value, you have broken the contract. All three methods must provide consistent behavior - otherwise you can't expect any code that uses your object to behave correctly.
You could get around this by getting rid of CompareTo(). Put the objects into a HashSet. When you need to iterate in order, put the objects into a Dictionary based on the descriptive string. Does that solve your immediate needs?
|
|
|
|
 |
|
 |
A nice solution for this case is to support a comparator in a separate class, implementing the IComparator interface.
System.Collections.SortedList have a constructor that takes an IComparator. It should be easy to add to Iesi.Collections.
Fábio David Batista blog: http://nerd-o-matic.blogspot.com[^] company: http://suprifattus.com.br/[^]
|
|
|
|
 |
 | My Five for you. Gaurang Desai | 19:12 27 Jun '05 |
|
 |
. You made my job simple.
|
|
|
|
 |
 | Dynamically creating a set object plunkman | 10:33 19 Apr '05 |
|
 |
First off - great code! It opens many new coding approaches.
On to my question: I am putting together a security component that integrates your code into an XML based "expressionizor" that will pass in a session/context item reference. This session/context item is processed out a string. From this string I need the ability to create a new Set using this string variable.
Example: ***************************
string test ="counties";
Set test.String() = new SortedSet; //need the code to interpret as this //Set counties = new SortedSet; Needless to say it throws a casting object error. This is the last issue - all other code works perfectly.
Any ideas????
By the way I add a Values property to return a comma delimited list to easily get a string back for usage. You might think about officially putting that type of property in the class.
Ken
|
|
|
|
 |
|
 |
A string isn't a Set, though an array of Strings is a collection. You can initialize a Set using a collection (or array) of string values.
Since Sets can work with any type of comparable object, it would be rather strange to put methods on Sets for dealing specifically with strings. Also, there are numerous ways to deal with serialization of data: XML, plain text, comma separated, etc.
You may actually want to try some of the .NET serialization stuff as an alternative to rolling your own code. It isn't hard to parse comma separated values, but be sure to deal with the case where your data has a comma in it! 
Good luck!
|
|
|
|
 |
|
 |
Thanks for the feedback. I am already applying some of your suggestion deeper in the code.
The issue is the "reference name" of Set. I need to able to create the reference name dynamically.
I am parsing out security xml that has the necessary expression to be applied for a role. A subset example of such is: exp="Set([accesscounties])^Set([datacounty])".
I pull out the necessary equation, parse it, fill the set with the necessary county data (using a collection0, evaluate, and apply necessary security rendering based upon the expression result.
Everything works perfect if I hard code the Set creation (e.g., Set accesscounties = new SortedSet), but all the security rules are created via an interface and the there is no way of knowing the necessary rule(s) at design time.
Hence the issue is being able to create/substitute the "Set" based upon the value in the [] (e.g., Set [name] = new SortedSet).
Any additional help would be great.
Ken
|
|
|
|
 |
|
|