Click here to Skip to main content
Click here to Skip to main content

Add Support for "Set" Collections to .NET

By , 28 Mar 2004
 

Introduction

The System.Collections namespace in the .NET Framework provides a number of collection types that are extremely useful for manipulating data in memory. However, there is one type of collection that is conspicuously missing from System.Collections: the Set.

A Set is a collection that contains no duplicate elements. It is loosely modelled after the mathematical concept of a "set." This implementation is based on the Java Set interface definition, so if you are also a Java programmer, this may seem familiar. The major differences are that this library does not use interfaces, and it provides a number of "standard" Set operators that the Java library neglected to include.

Sets come in handy when an Array or a List won't quite fit the bill. Arrays in .NET have a fixed length, making it tedious to add and remove elements. Lists allow you add new objects easily, but you can have numerous duplicate elements, which is undesirable for some types of problems. Searching Arrays or Lists for elements is just plain slow for large data sets, requiring a linear search. You could keep the array sorted and use a binary search, but that is often more trouble than it is worth (especially since this library, and the .NET Framework, provide better ways that are already written for you).

With sets, adding elements, removing elements, and checking for the existence of an element is fast and simple. You can mix and match the elements in different sets using the supported mathematical set operators: union, intersection, exclusive-or, and minus. See the example below for more information.

You will see some interesting side effects with different Set implementations in this library, depending on the underlying search algorithm. For example, if you choose a sort-based Set, the elements will come out in sort order when you iterate using foreach. If you use a hash-based Set, the elements will come out in no particular order, but checking for inclusion will fastest when dealing with large data sets. If you use a list-based Set, elements will come out in the order you put them in when you iterate. Additionally, list-based sets are fastest for very small data sets (up to about 10 elements), but get slower very quickly as the number of contained elements increases. To get the best of both worlds, the library provides a Set type that uses lists for small data sets and switches to a hash-based algorithm when the data set gets large enough to warrant it.

Using the code

The Iesi.Collections library has the following object hierarchy:

  • Set - The abstract class from which all other sets inherit.
    • DictionarySet - An abstract class that lets you create new types of Set classes based on an arbitrary implementation of IDictionary.
      • HashedSet - Based on a HashTable.
      • SortedSet - Based on a SortedList.
      • ListSet - Based on a ListDictionary.
      • HybridSet - Based on a HybridDictionary.
    • ImmutableSet - A wrapper for other Sets that makes them read-only.
    • SynchronizedSet - A wrapper for other Sets that makes them (mostly) thread-safe.

You will probably find the HashedSet and HybridSet to be the most useful implementations. They can contain any object that is immutable, can be compared using Equals(), and has a valid implementation of GetHashCode(). All of the normal value types and many objects already meet these requirements, so for most data types, HashedSet and HybridSet just work. The only downside to using them is that you can't predict the order of iteration.

SortedSet is useful if you are interested in the iteration order of your Set, but it imposes some different requirements, and they are usually more difficult to meet. In addition to being immutable, elements in a SortedSet must also implement IComparable. Further, they must actually be comparable without throwing an exception. So you would not be able to put string values and int values into the same Set instance.

ListSet is useful for very small data sets. When a ListSet contains less than 10 elements, it is actually going to be faster than any of the other implementations. However, once you get above 10 elements, the run time for many of the set operations increases as the square of the data size. So an operation on a ListSet containing 1,000 elements would be roughly 10,000 times slower than an operation on a ListSet containing 10 items.

ImmutableSet and SynchronizedSet are specialized wrappers. They contain other sets, which then do all the real work. ImmutableSet wraps an internal Set to make it read-only. SynchronizedSet wraps all the functions of an internal Set to synchronize them. This allows the Set to be used by more than one thread. See the documentation for more information on this, since there are special considerations for enumerating collections that are in use by multiple threads.

If you are interested in creating your own Set types, this library supports that. If you want to do it from scratch, you can extend Set and implement all the abstract functions. If you want to create a new Set type based on an existing IDictionary implementation, extend DictionarySet. If you just want to add some new functionality to an existing, working Set implementation, choose one of HashedSet, HybridSet, ListSet, or SortedSet to extend.

The example below demonstrates creating new sets and manipulating them using set operators. The currently supported set operators are described briefly in the following table:

Set Operators

Name Symbol Description
Union() | A | B returns a Set containing all the elements from both input sets.
Intersect() & A & B returns a Set containing all the elements that are in both A and B.
ExclusiveOr() ^ A ^ B returns a Set containing the elements that are in A or in B, but are not in both A and B.
Minus() - A - B returns a Set containing all the elements of A, with the elements from B removed.
Equals()   You can use Equals() to compare two Set instances for equivalence. That is, A.Equals(B) is true if A and B contain exactly the same elements. The '==' and '!=' operators are not overridden by design.

The example uses Set instances to represent states in the southwestern United States. Each Set holds the names of the major rivers in the state. It then uses the basic state-river information to derive all sorts of fun facts about the rivers.

using System;
using Iesi.Collections;
namespace RiverDemo
{    
class Rivers
{
   [STAThread]
   static void Main(string[] args)
   {
     //Use Arrays (which are ICollection objects) to quickly initialize.
     Set arizona   
        = new SortedSet(new string[] {"Colorado River"});
     Set california
        = new SortedSet(new string[] {"Colorado River", "Sacramento River"});
     Set colorado
          = new SortedSet(new string[] {"Arkansas River", 
                      "Colorado River", "Green River", "Rio Grande"});
     Set kansas
            = new SortedSet(new string[] {"Arkansas River", "Missouri River"});
     Set nevada
            = new SortedSet(new string[] {"Colorado River"});
     Set newMexico
         = new SortedSet(new string[] {"Rio Grande"});
     Set utah
              = new SortedSet(new string[] {"Colorado River", "Green River", 
                              "San Juan River"});
     //Rivers by region.
     Set southWest = colorado | newMexico | arizona | utah;
     Set midWest = kansas;
     Set west = california | nevada;
     //All rivers (at least for the demo).
    Set all = southWest | midWest | west;
    Print("All rivers:", all);
    Print("Rivers in the southwest:", southWest);
    Print("Rivers in the west:", west);
    Print("Rivers in the midwest:", midWest);
    Console.WriteLine();

    //Use the '-' operator to subtract the rivers in Colorado from 
    //the set of all rivers.
    Print("Of all rivers, these don't pass through Colorado:",all - colorado);

   //Use the '&' operator to find rivers that are in Colorado AND in Utah.
   //A river must be present in both states, not just one.
   Print("Rivers common to both Colorado and Utah:", colorado & utah);

   //use the '^' operator to find rivers that are in Colorado OR Utah,
   //but not in both.
   Print("Rivers in Colorado and Utah that are not shared by both states:",
          colorado ^ utah);

   //Use the '&' operator to discover which rivers are present in Arizona, 
   // California,Colorado, Nevada, and Utah.  The river must be present in 
   // all states to be counted.
   Print("Rivers common to Arizona, California, Colorado, Nevada, and Utah:", 
         arizona & california & colorado & nevada & utah);
   //Just to prove to you that operators always return like types, let's do a
   //complex Set operation and look at the type of the result:
   Console.WriteLine("The type of this complex operation is: " + 
   ((southWest ^ colorado & california) | kansas).GetType().FullName);
}
    private static void Print(string title, Set elements)
    {
       Console.WriteLine(title);
       foreach(object o in elements)
       {
           Console.WriteLine("\t" + o);
           Console.WriteLine();
       }    
    }
}

Although there are other kinds of sets available in the library, the example uses SortedSet throughout. This is nice for the example, since everything will print neatly in alphabetical order. But you may be wondering what kind of Set is returned when you "union," "intersect," "exclusive-or," or "minus" two Set instances. The library always returns a Set that is the same type as the Set on the left, unless the left operand is null, in which case it returns the type of the Set on the right.

What this means is that since we are using SortedSet instances, we will always get SortedSet instances when we combine sets using the binary operators. So the output in our example will always be in alphabetical order, just as you would expect.

Here is the output from running the example:

All rivers:
	Arkansas River
	Colorado River
	Green River
	Missouri River
	Rio Grande
	Sacramento River
	San Juan River

Rivers in the southwest:
	Arkansas River
	Colorado River
	Green River
	Rio Grande
	San Juan River

Rivers in the west:
	Colorado River
	Sacramento River

Rivers in the midwest:
	Arkansas River
	Missouri River

Of all rivers, these don't pass through Colorado:
	Missouri River
	Sacramento River
	San Juan River

Rivers common to both Colorado and Utah:
	Colorado River
	Green River

Rivers in Colorado and Utah that are not shared by both states:
	Arkansas River
	Rio Grande
	San Juan River

Rivers common to Arizona, California, Colorado, Nevada, and Utah:
	Colorado River

The type of this complex operation is: 
	Iesi.Collections.SortedSet
Press any key to continue

There is an additional example and a lot more technical information included in the documentation. The source code is pretty easy to follow as well. All the hard work of searching and sorting is performed by classes that are already present in the .NET Framework, so none of the code is particularly difficult or tricky. If you have a question that is not covered by the documentation, it should not take more than 5 or 10 minutes to discover the answer by reading the code.

Enjoy!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

JasonSmith
Web Developer
United States United States
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberfredatcodeproject6 Oct '12 - 11:11 
excellent
GeneralLicensing questionmemberthorsten.kraus31 Mar '11 - 22:04 
Hello,
 

I have a question regarding to the license of the Iesi.Collections. We need to know if we have to mention the usage of your component in our own product description or if we have to fulfill any other requirements regarding your licensing model. It would be very nice if you can give me information to this topic. Under which license does the project run?

 
Thanks in advance!
 
Best regards,
Thorsten
QuestionAny stats on perf for us?memberMember 420243522 Jun '08 - 9:06 
Good article, if you could more stats on perf it might help.
 
srhari
http://sriharik.blogspot.com
[^]
GeneralMinus is Complementmemberjmelgaard27 Mar '08 - 2:12 
A Minor detail, but (as far as i know and are told).
 
When operating with sets, the operation A - B (Minus) is gennerally refered to as Complement
 
Complement (set theory)
 
for further detail on set Relative complement and Absolute complement, its the relative Complement your operations refers to as far as i can tell Wink | ;)
 
Think it could be usefull for your documentation Smile | :)
NewsDepreciated by .NET framework 3.5's HashSet<t> Generic Class</t>memberJames Kolpack3 Feb '08 - 14:25 
If you're now using .NET 3.5, the HashSet class will give you high performance set operations.
GeneralUsing type argumentsmemberezln237 Jan '08 - 13:23 
I have converted the entire library to used type arguments (i.e. Set). If anyone wants to use that code, email me (christopher@ezln23.com)
GeneralDebugger Supportmemberalon.albert10 Sep '06 - 15:11 
I've been using your Set classes for while now. It's very well done has been very useful, Thanks for your work.
 
I've recently upgraded to VS2005 and added a nice feature to the Set source code to assist in debugging. The addition is quite simple:
 
In DictionarySet, add the following lines right before the class definition:
 
[DebuggerTypeProxy(typeof(DictionarySet.DictionarySetDebugView))]
[DebuggerDisplay("Count = {Count}")]
 
Then add the following inner class to DictionarySet:
 
private class DictionarySetDebugView {
private IDictionary dictonary;
 
public DictionarySetDebugView(DictionarySet dictionarySet) {
if (dictionarySet == null) {
throw new ArgumentNullException("dictionarySet");
}
 
dictonary = dictionarySet.InternalDictionary;
}
 
[DebuggerBrowsable(DebuggerBrowsableState.RootHidden)]
public object[] Items {
get {
object[] items = new object[dictonary.Count];
dictonary.Keys.CopyTo(items, 0);
return items;
}
}
}
 

This will result in the follwoing view in a debugger:
 
- s Count = 2 Iesi.Collections.ISet {Iesi.Collections.SortedSet}
[0] 1 int
[1] 2 int
+ Raw View
 
Where s in a SortedSet that I added the integer values 1 & 2.
 
You might need to add a few #ifdef's to backwards support VS2003, I didn't bother testing on that system.
 


QuestionLicensing termsmemberMatthias8312 Aug '06 - 3:54 
Hi Jason,
 
I was wondering if I could use this code in a project of my own. I'll leave all copyright messages in place.
I was wondering under which licensing terms I can do this? BSD? GPL? anything else?
I need to know because I have indicate this to my superiors.

 
-Matthias
AnswerRe: Licensing termsmemberJasonSmith12 Aug '06 - 6:34 
You may use this code in your own project. It wouldn't be very useful if you couldn't, now would it?
 
Honestly though, this is just unmaintained sample code as far as I am concerned. Sorry I put those copyright notices in there. You have my permission to remove them.
 
The only legal thing is that if you use this code, it is yours. Add it to your project and whittle away. You can change it in any way you like, you can maintain it, and if something is wrong with it, you can fix it. Smile | :)
 
In other words, there are no licensing terms. Give it away, sell it, repackage it, whatever. Even take credit for writing it if you need to. And that goes for anyone.
 
And if you really need licensing terms, suggest one here and I'll make that one the official one, or something like that. I know how "superiors" can be.
GeneralCan't use XmlSerializer on ISetmemberhuzzaboo10 Mar '06 - 11:45 
I'm using ISet in the 1.1 .NET Framework and with NHibernate.
 
I tried to use XmlSerializer to serialize an NHibernate object that had a collection of type ISet and received the following exception:
 
[InvalidOperationException: You must implement a default accessor on Iesi.Collections.ISet because it inherits from ICollection.]
System.Xml.Serialization.TypeScope.GetCollectionElementType(Type type) +580
System.Xml.Serialization.TypeScope.ImportTypeDesc(Type type, Boolean canBePrimitive, MemberInfo memberInfo) +549
System.Xml.Serialization.TypeScope.GetTypeDesc(Type type, MemberInfo source, Boolean directReference) +50
System.Xml.Serialization.StructModel.GetPropertyModel(PropertyInfo propertyInfo) +57
System.Xml.Serialization.StructModel.GetFieldModel(MemberInfo memberInfo) +88
System.Xml.Serialization.XmlReflectionImporter.ImportStructLikeMapping(StructModel model, String ns) +1534
System.Xml.Serialization.XmlReflectionImporter.ImportStructLikeMapping(StructModel model, String ns) +654
System.Xml.Serialization.XmlReflectionImporter.ImportTypeMapping(TypeModel model, String ns, ImportContext context, String dataType, Boolean repeats) +442
 
I'm not sure quite how to work around this. I did find this article which seems to describe how to properly build a collection class such that it can be serialized with XmlSerializer: http://www.topxml.com/xmlserializer/serializing_collection_classes.asp
 
Any hints on how to approach this?
 
Thanks
 

GeneralRe: Can't use XmlSerializer on ISetmemberjokiz9 Oct '06 - 15:58 
i had a similar requirement, seems like the ISet classes in Iesi namespace does not have indexer and the Add methods (strongly-typed). My teammate currently did a manual serialization per field just to make this work but i'm open for any suggestions on how to properly make this work.
 
(x-a)(x-b)(x-c)...(x-z)

GeneralRe: Can't use XmlSerializer on ISetmemberMember 78195128 Dec '08 - 23:17 
Hi,
 
I'm currently facing the same probleme.
Have you found a fix for this?
 

Thanks.
Christophe.
GeneralMaybe try the C5 collection library for .Net 2.0memberPeter Sestoft11 Feb '06 - 23:59 
C5 is a library of generic collection classes for C# and other CLI languages.
 
C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as sets and bags, persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes. Also, it is more comprehensive than collection class libraries on other similar platforms, such as Java. Unlike many other collection class libraries, C5 is designed with a strict policy of supporting "code to interface not implementation".
 
C5 is available in binary and source form under a liberal BSD-style license
and is thoroughly documented in an accompanying free book.
 
See http://www.itu.dk/research/c5/
GeneralRe: Maybe try the C5 collection library for .Net 2.0memberKailuoWang14 Feb '06 - 15:00 
This one is the best as far as I know. I think NHibernate 2.0 should use this.
Generalbug foundmemberKailuoWang6 Feb '06 - 4:18 
In the set class, the method ExclusiveOr was writen as follow
 

public static ISet ExclusiveOr(ISet a, ISet b)
{
if(a == null && b == null)
return null;
else if(a == null)
return (Set)b.Clone();
else if(b == null)
return (Set)a.Clone();
else
return a.ExclusiveOr(b);
}
 
Note that the clones of a and b were uneccessarily down cast to Set, while in the Union method of this class, these two clones are down cast to ISet. Although this won't affect any current usage, but for people who wants to have their own implementations of ISet ( an example[^]), this could be a problem.
I did try modify the code to
 
return (ISet)b.Clone();
...
return (ISet)a.Clone();
 
And all the Unit Tests from Nhibernate were passed
 

GeneralGeneric ISet&lt;T&gt; postedmemberKailuoWang4 Feb '06 - 15:17 
Based on Jason and other's work, I just posted a geneic ISet and several implementations:
http://www.codeproject.com/useritems/GenericISet.asp[^]
It passed all the test. I will keep working on the implementation since I will rely on this collection very much.
 

-- modified at 21:23 Saturday 4th February, 2006
QuestionPlans for an ISet?memberNathan Baulch1 Feb '06 - 14:56 
Does anybody know of (or are there any plans to build) a generic extension to this cool collection library?
 
Just thought I'd ask before giving it a try myself (probably a good way to learn a few things about generics I guess).
 
Nathan
AnswerRe: Plans for an ISet?memberJasonSmith3 Feb '06 - 13:42 
It looks like there are several good ideas in recent posts. I'll be starting up some .NET stuff again soon, so maybe this would be a good project to familiarize myself with the new tools. Feel free to do it yourself though - the library is really pretty simple, and you'll probably be able to get to it before I will.
GeneralImmutable != ReadOnlymemberRüdiger Klaehn11 Jan '06 - 6:44 
Just a little remark: immutable is not the same as readonly.
 
An immutable object can not be modified after it has been created. A typical example would be the System.String type.
 
A readonly view of a mutable object just means that the owner of the view can not modify the underlying object. But it does not guarantee that the underlying object never changes. An example would be what you get out of ArrayList.ReadOnly(). The resulting object can not be used to modify the underyling ArrayList, but if the ArrayList itself changes, then so does the view/wrapper object.
 
So probably you should rename the class for clarity.
GeneralRe: Immutable != ReadOnlymemberJasonSmith3 Feb '06 - 13:44 
I see your point.
QuestionPossible bug: SortedSet uses IComparable for its Add/Removememberkenrod4 Dec '05 - 11:34 
First, congratulations on providing such an excellent and much needed set implementation! Now that this code is also shipped with NHibernate, I am sure many people will be finding it extremely useful.
 
I have a query about your SortedSet implementation. At present, it uses a SortedList as its backing store? However, I do not believe that, semantically, a SortedList is equivalent to a SortedSet?
 
Specially, as can be seen at http://www.dotnet247.com/247reference/System/Collections/SortedList/__rotor[^], SortedList uses IComparable and BinarySearch, not GetHashCode() and Equals(), to implement its Add, Contains, etc.
 
I have a class whose GetHashCode() and Equals() methods key off one property of the object, and whose CompareTo() keys off another. This is because my GetHashCode() and Equals() method use internal, unique identifiers useful to the code whereas my CompareTo() method uses human-readable, non-unique identifiers useful to people.
 
I believe this is a valid use case: I do not believe the contract of CompareTo() requires it use the same identifier as GetHashCode() and Equals()?
 
However, SortedSet breaks in this scenario because (I believe) it goes looking for an existing object using CompareTo() rather than by GetHashCode() and Equals(). This means it is possible to put multiple items into a SortedSet whose Equals() methods return true, which breaks the ISet contract (though of course is perfectly valid for a SortedList).
 
Have I gotten this completely wrong? Is there a better approach?
 
Regards,
 
Richard.
 

 
-- modified at 17:48 Sunday 4th December, 2005
AnswerRe: Possible bug: SortedSet uses IComparable for its Add/RemovememberJasonSmith5 Dec '05 - 15:09 
CompareTo(), Equals(), and GetHashCode() need to all key off the same value, or at least they need to provide consistent results. If you look at the object one way that is always unique, and another way that can return the same value, you have broken the contract. All three methods must provide consistent behavior - otherwise you can't expect any code that uses your object to behave correctly.
 
You could get around this by getting rid of CompareTo(). Put the objects into a HashSet. When you need to iterate in order, put the objects into a Dictionary based on the descriptive string. Does that solve your immediate needs?
 

 

AnswerRe: Possible bug: SortedSet uses IComparable for its Add/RemovememberFábio Batista7 Jan '06 - 10:28 
A nice solution for this case is to support a comparator in a separate class, implementing the IComparator interface.
 
System.Collections.SortedList have a constructor that takes an IComparator. It should be easy to add to Iesi.Collections.
 
Fábio David Batista
blog: http://nerd-o-matic.blogspot.com[^]
company: http://suprifattus.com.br/[^]
GeneralMy Five for you.memberGaurang Desai27 Jun '05 - 18:12 
Smile | :) . You made my job simple.
GeneralDynamically creating a set objectmemberplunkman19 Apr '05 - 9:33 
First off - great code! It opens many new coding approaches.
 
On to my question: I am putting together a security component that integrates your code into an XML based "expressionizor" that will pass in a session/context item reference. This session/context item is processed out a string. From this string I need the ability to create a new Set using this string variable.
 
Example:
***************************
 
string test ="counties";
 
Set test.String() = new SortedSet;
//need the code to interpret as this
//Set counties = new SortedSet;

Needless to say it throws a casting object error. This is the last issue - all other code works perfectly.
 
Any ideas????
 
By the way I add a Values property to return a comma delimited list to easily get a string back for usage. You might think about officially putting that type of property in the class.

 
Ken

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130516.1 | Last Updated 29 Mar 2004
Article Copyright 2002 by JasonSmith
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid