Click here to Skip to main content
Click here to Skip to main content

What's in Your Collection? Part 3 of 3: Custom Collections

, 22 Sep 2009 CPOL
Rate this:
Please Sign up or sign in to vote.
Understand collections, iterators, and the use of the yield statement to create powerful, custom collections in C#.

Introduction

This is the last installment in a three part series about using collections in C#.

The entire series can be accessed here:

We've covered the interfaces and some concrete instances of collections provided by the .NET Framework. Now you are interested in moving things to the next level. What if the provided collections simply don't meet your business requirements? What are some ways you can use the collections concept to build your own classes to solve business problems?

Yield to Iterators

The first important thing to understand when you begin building your custom collection is the concept of iterators in .NET and the yield statement. I'm surprised that many people use the language without truly understanding this statement, why it exists, and how it can be used.

You might have encountered yield in your journeys. If you've built custom AJAX client controls, you have probably implemented IScriptControl. One method asks for IEnumerable<ScriptReference>. The implementation is usually presented as:

...
ScriptReference sr = new ScriptReference("~/MyUserControl.js");
yield return sr;
...

You could alternatively have created a List or any other collection of ScriptReference and returned that. What does yield really do for us?

To better understand, I've created a short little console application. You can create a new console project and simply paste this code to build and run:

using System;
using System.Collections;
using System.Collections.Generic;

namespace Yield
{
    internal class Program
    {
        private delegate bool DoSomething();

        private sealed class Doer
        {
            private readonly DoSomething _doSomething;
            private readonly string _msg;

            public Doer(DoSomething doSomething, string message)
            {
                _doSomething = doSomething;
                _msg = message;
                Console.WriteLine(string.Format("{0}: Ctor()", _msg));
            }

            public bool Do()
            {
                Console.WriteLine(string.Format("{0}: Do()", _msg));
                return _doSomething();
            }
        }

        private sealed class DoerCollection : IEnumerable<Doer>
        {
            public IEnumerator<Doer> GetEnumerator()
            {
                yield return new Doer(() => true, "1");
                yield return new Doer(() => false, "2");
                yield return new Doer(() => true, "3");
                yield break;
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }

        private static void _DoIt(IEnumerable<Doer> doerCollection)
        {
            foreach (Doer doer in doerCollection)
            {
                if (!doer.Do())
                {
                    break;
                }
                Console.WriteLine(".");                
            }
            Console.WriteLine("..");                
        }

        private static void Main(string[] args)
        {
            _DoIt(new DoerCollection());
            _DoIt(new List<Doer>
                      {
                          new Doer(() => true, "4"),
                          new Doer(() => false, "5"),
                          new Doer(() => true, "6")
                      });

            Console.ReadLine();
        }
    }
}

So, let's walk through the code.

First, I define a delegate called DoSomething that simply states "I want a method that takes no parameters and returns a boolean". This is a contrived example, of course, but in the "real world", you may have a pipeline or chain of responsibility that performs actions and then returns a status indicating that the process should continue or there is another node to consider, etc. I encapsulated the delegate in the class Doer. The constructor takes a "message" and an implementation of the delegate. The only reason I pass in the message is to track which object is doing what. What's important here is to see when the classes are created compared to when the main method is called, which simply invokes the delegate.

Next, I created my custom collection, DoerCollection. This is a collection of "activities" to perform. Obviously, I am simply returning true or false in the example, but again, in a real-world scenario, this could be a file system processor that iterates through a directory and returns files until no more can be found, or calls a Web Service and returns the status ... you get the idea. Notice that I simply yield return different instances of Doer that I pass the delegate implementation and a unique message identifier. If you recall from the first article in this series, this class is a collection because it implements IEnumerable.

The DoIt method takes any collection typed to the Doer class, and loops through the classes calling their "Do" method until false is returned. It also emits some output just to demonstrate how it is looping, etc.

Finally, we get to the implementation. The whole point of this example is to demonstrate how the yield command operates. We perform the exact same function on two very similar collections. The first pass uses an instance of my custom collection. The second pass creates a list and passes that into the method. What do you expect the output to look like? Compile the program and run it, and if you guessed correctly, you have a strong grasp of IEnumerator and yield.

Both collections were wired to contain three instances. Both had an instance return true, then false, then true, so the expected result would be to make it through two items and then break out of our loop. This is exactly what happens, but it's the output that is interesting. It turns out that using the List forced me to create everything up front, even if I weren't going to use them (and who knows when that garbage collector will come by). The custom class using yield however only created two instances. The third class was never created!

yield is nothing more than syntactic sugar for a state engine.

But wait, we are simply spinning through a collection. What do I mean by state engine??

If you recall in part 1, the key to collections is the Enumerator. An Enumerator is a state engine. The "current state" represents either nothing (empty collection or already iterated through the entire collection) or an instance within the collection. The only transition this state engine can make is to move to the next item or end up in an uninitialized state.

This is what the program outputs to the console:

yieldil.png

Now, we'll pull out ildasm to peek beneath the hood. I've highlighted the DoerCollection class.

yieldoutput.png

You'll notice that the GetEnumerator implementation actually creates a nested class behind the scenes. That class is our state engine. In red, you can see the key pieces of that engine: a state, a current Doer instance, and the reference to the parent class. Highlighted is the key method called to transition state, MoveNext.

What is really interesting is pulling open the MoveNext method. I've used RedGate's free Reflector tool to reverse engineer the code. This will take the generated IL and provide a C# representation, so we can see what the actual underlying algorithm for the enumerator is.

private bool MoveNext()
{
    switch (this.<>1__state)
    {
        case 0:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea = 
                   new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__7);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea, "1");
            this.<>1__state = 1;
            return true;

        case 1:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb = 
                  new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__8);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb, "2");
            this.<>1__state = 2;
            return true;

        case 2:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec = 
                  new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__9);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec, "3");
            this.<>1__state = 3;
            return true;

        case 3:
            this.<>1__state = -1;
            break;
    }
    return false;
}

You can quickly see that what is generated is really a massive switch statement. Based on the current state, it updates the current reference and changes the state. Most important, however, is the fact that the results of the yield are executed "on demand". In other words, it is not creating a large list, filling it with instances, and then iterating. Instead, the classes are instantiated "on demand" and then referenced for re-use later in case the collection is iterated again.

The whole key to this process is that the enumerator hides the underlying implementation. The consuming code simply knows there is a collection to iterate through. How that collection is built is up to the enumerator, which leads to very interesting possibilities. In the case of the ASP.NET page, this means that controls can be called iteratively and yield their script references and descriptors. The "master" code is simply iterating through the collection and wiring up the script references.

Thinking of collections as different ways of grouping objects is certainly valuable and can pertain to many different business situations. Understanding that Enumerator is really a state machine, however, allows you to start thinking of collections as processes. They aren't necessarily pools of instances, but can be algorithms or other processes as well. The key is that the use of the Enumerator hides the implementation so that the consuming code simply iterates through something without having to understand the underlying implementation of how something is provided.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Jeremy Likness
Architect Wintellect
United States United States
Jeremy Likness is a principal consultant at Wintellect. Jeremy, an experienced entrepreneur and technology executive, has successfully helped ship commercial enterprise software for 20 years. He specializes in catalyzing growth, developing ideas and creating value through delivering software in technical enterprises. His roles as business owner, technology executive and hands-on developer provided unique opportunities to directly impact the bottom line of multiple businesses by helping them grow and increase their organizational capacity while improving operational efficiency. He has worked with several initially small companies like Manhattan Associates and AirWatch before they grew large and experienced their transition from good to great while helping direct vision and strategy to embrace changing technology and markets. Jeremy is capable of quickly adapting to new paradigms and helps technology teams endure change by providing strong leadership, working with team members “in the trenches” and mentoring them in the soft skills that are key for engineers to bridge the gap between business and technology.
Follow on   Twitter   Google+   LinkedIn

Comments and Discussions

 
GeneralYied and IEnumerable PinmemberJim Savarino29-Sep-09 6:25 
GeneralYield and Marshalling Pinmemberbaruchl28-Sep-09 23:39 
GeneralRe: Yield and Marshalling PinmemberJeremy Likness29-Sep-09 2:58 
Busi,
 
Yield is a way to understand the mechanism of enumerator but is not required. You have to look at the situation. When it makes sense to populate values on the fly or to merge values into a stream of results, yield is one way to do it. In your example, if we were truly encapsulating the logic, there would be a few approaches possible. You could do something similar to the IDataReader and keep the connection open and feed rows, but make the assumption the consumer is going to spin through quickly and then call Dispose (anything with resources managed as such would probably implement IDisposable). If you assume disconnected reads then you might open and close the connection and if you had a limited record size you might load everything into an internal list and expose the enumerator. There is hardly ever one single solution so its a question of what the purpose of your collection or set is, why you are encapsulating that way, how it is expected to be consumed and then coding it the right way following sound principles like SOLID and DRY.
 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.141223.1 | Last Updated 22 Sep 2009
Article Copyright 2009 by Jeremy Likness
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid