What's in Your Collection? Part 3 of 3: Custom Collections

22 Sep 2009
Understand collections, iterators, and the use of the yield statement to create powerful, custom collections in C#.

Introduction

This is the last installment in a three-part series about using collections in C#.

The entire series can be accessed here:

We've covered the interfaces and some concrete instances of collections provided by the .NET Framework. Now you are interested in moving things to the next level. What if the provided collections simply don't meet your business requirements? What are some ways you can use the collections concept to build your own classes to solve business problems?

Yield to Iterators

The first important thing to understand when you begin building your custom collection is the concept of iterators in .NET and the yield statement. I'm surprised that many people use the language without truly understanding this statement, why it exists, and how it can be used.

You might have encountered yield in your journeys. If you've built custom AJAX client controls, you have probably implemented IScriptControl. Its GetScriptReferences method returns IEnumerable<ScriptReference>. The implementation is usually presented as:

C#
...
ScriptReference sr = new ScriptReference("~/MyUserControl.js");
yield return sr;
...

You could alternatively have created a List or any other collection of ScriptReference and returned that. What does yield really do for us?
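
To make the comparison concrete, here is a minimal sketch of both approaches side by side. It uses plain strings in place of ScriptReference so that it compiles without the ASP.NET AJAX assemblies; the class name ScriptPaths and both method names are made up for illustration.

```csharp
using System.Collections.Generic;

public static class ScriptPaths
{
    // Eager: the whole list is built before the method returns.
    public static IEnumerable<string> AsList()
    {
        List<string> paths = new List<string>();
        paths.Add("~/MyUserControl.js");
        return paths;
    }

    // Lazy: the body does not run until the caller starts iterating.
    public static IEnumerable<string> AsIterator()
    {
        yield return "~/MyUserControl.js";
    }
}
```

To the caller, both methods look identical: each hands back something that can be passed to foreach. The difference is in when the work happens, which is what the console application below explores.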

To better understand, I've created a short little console application. You can create a new console project and simply paste this code to build and run:

C#
using System;
using System.Collections;
using System.Collections.Generic;

namespace Yield
{
    internal class Program
    {
        private delegate bool DoSomething();

        private sealed class Doer
        {
            private readonly DoSomething _doSomething;
            private readonly string _msg;

            public Doer(DoSomething doSomething, string message)
            {
                _doSomething = doSomething;
                _msg = message;
                Console.WriteLine(string.Format("{0}: Ctor()", _msg));
            }

            public bool Do()
            {
                Console.WriteLine(string.Format("{0}: Do()", _msg));
                return _doSomething();
            }
        }

        private sealed class DoerCollection : IEnumerable<Doer>
        {
            public IEnumerator<Doer> GetEnumerator()
            {
                yield return new Doer(() => true, "1");
                yield return new Doer(() => false, "2");
                yield return new Doer(() => true, "3");
                yield break;
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }

        private static void _DoIt(IEnumerable<Doer> doerCollection)
        {
            foreach (Doer doer in doerCollection)
            {
                if (!doer.Do())
                {
                    break;
                }
                Console.WriteLine(".");                
            }
            Console.WriteLine("..");                
        }

        private static void Main(string[] args)
        {
            _DoIt(new DoerCollection());
            _DoIt(new List<Doer>
                      {
                          new Doer(() => true, "4"),
                          new Doer(() => false, "5"),
                          new Doer(() => true, "6")
                      });

            Console.ReadLine();
        }
    }
}

So, let's walk through the code.

First, I define a delegate called DoSomething that simply states "I want a method that takes no parameters and returns a boolean". This is a contrived example, of course, but in the "real world", you may have a pipeline or chain of responsibility that performs actions and then returns a status indicating whether the process should continue, whether there is another node to consider, etc. I encapsulated the delegate in the class Doer. The constructor takes an implementation of the delegate and a "message". The only reason I pass in the message is to track which object is doing what. What's important here is to see when each instance is constructed compared to when its Do method, which simply invokes the delegate, is called.

Next, I created my custom collection, DoerCollection. This is a collection of "activities" to perform. Obviously, I am simply returning true or false in the example, but again, in a real-world scenario, this could be a file system processor that iterates through a directory and returns files until no more can be found, or calls a Web Service and returns the status ... you get the idea. Notice that I simply yield return different instances of Doer that I pass the delegate implementation and a unique message identifier. If you recall from the first article in this series, this class is a collection because it implements IEnumerable.

The _DoIt method takes any collection typed to the Doer class, and loops through the instances calling their Do method until false is returned. It also emits some output just to demonstrate how it is looping.

Finally, we get to the implementation. The whole point of this example is to demonstrate how the yield statement operates. We perform the exact same function on two very similar collections. The first pass uses an instance of my custom collection. The second pass creates a list and passes that into the method. What do you expect the output to look like? Compile the program and run it, and if you guessed correctly, you have a strong grasp of IEnumerator and yield.

Both collections were wired to contain three instances. Both had an instance return true, then false, then true, so the expected result would be to make it through two items and then break out of our loop. This is exactly what happens, but it's the output that is interesting. It turns out that using the List forced me to create all three instances up front, even though I wasn't going to use the third one (and who knows when the garbage collector will come by). The custom class using yield, however, only created two instances. The third instance was never created!
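
The same effect can be measured without reading console output. The following is a small sketch, not part of the original program: the LazyDemo class, its Created counter, and the Make helper are all hypothetical names introduced here to count how many items get constructed.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many items have been constructed so far.
    public static int Created;

    private static string Make(string name)
    {
        Created++;
        return name;
    }

    // Iterator: each Make call runs only when the consumer asks for the next item.
    public static IEnumerable<string> Lazy()
    {
        yield return Make("1");
        yield return Make("2");
        yield return Make("3");
    }

    // List: all three Make calls run before the method returns.
    public static IEnumerable<string> Eager()
    {
        return new List<string> { Make("1"), Make("2"), Make("3") };
    }

    // Stops at item "2", mirroring the break in _DoIt,
    // and reports how many items were created along the way.
    public static int CountCreated(Func<IEnumerable<string>> source)
    {
        Created = 0;
        foreach (string item in source())
        {
            if (item == "2") break;
        }
        return Created;
    }
}
```

Calling LazyDemo.CountCreated(LazyDemo.Lazy) returns 2, while LazyDemo.CountCreated(LazyDemo.Eager) returns 3, which matches what the Ctor() messages show in the console application.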

yield is nothing more than syntactic sugar for a state engine.

But wait, we are simply spinning through a collection. What do I mean by state engine?

If you recall from part 1, the key to collections is the Enumerator. An Enumerator is a state engine (more commonly called a state machine). The "current state" represents either nothing (empty collection or already iterated through the entire collection) or an instance within the collection. The only transition this state engine can make is to move to the next item or end up in an uninitialized state.
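
One way to see those transitions is to drive an enumerator by hand instead of using foreach. This is a rough sketch of what a foreach loop does on your behalf; the ManualLoop class and its Drain method are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;

public static class ManualLoop
{
    // Roughly what foreach does: ask for an enumerator,
    // then alternate between MoveNext (transition) and Current (read state).
    public static List<int> Drain(IEnumerable<int> source)
    {
        List<int> seen = new List<int>();
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                seen.Add(e.Current);
            }
        }
        return seen;
    }
}
```

Each call to MoveNext is the single transition described above: it either advances to the next item and returns true, or reports that there is nothing left by returning false.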

This is what the program outputs to the console:

yieldil.png

Now, we'll pull out ildasm to peek beneath the hood. I've highlighted the DoerCollection class.

yieldoutput.png

You'll notice that the GetEnumerator implementation actually creates a nested class behind the scenes. That class is our state engine. In red, you can see the key pieces of that engine: a state, a current Doer instance, and the reference to the parent DoerCollection instance. Highlighted is the key method called to transition state, MoveNext.

What is really interesting is pulling open the MoveNext method. I've used RedGate's free Reflector tool to reverse engineer the code. This will take the generated IL and provide a C# representation, so we can see what the actual underlying algorithm for the enumerator is.

C#
private bool MoveNext()
{
    switch (this.<>1__state)
    {
        case 0:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea = 
                   new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__7);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatea, "1");
            this.<>1__state = 1;
            return true;

        case 1:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb = 
                  new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__8);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegateb, "2");
            this.<>1__state = 2;
            return true;

        case 2:
            this.<>1__state = -1;
            if (Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec == null)
            {
                Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec = 
                  new Program.DoSomething(Program.DoerCollection.<GetEnumerator>b__9);
            }
            this.<>2__current = new Program.Doer(
              Program.DoerCollection.CS$<>9__CachedAnonymousMethodDelegatec, "3");
            this.<>1__state = 3;
            return true;

        case 3:
            this.<>1__state = -1;
            break;
    }
    return false;
}

You can quickly see that what is generated is really a large switch statement. Based on the current state, it updates the current reference and changes the state. Most important, however, is the fact that the code between the yield statements is executed "on demand". In other words, it is not creating a large list, filling it with instances, and then iterating. Instead, each Doer is instantiated only when MoveNext reaches its case. Note that only the delegates are cached (in static fields) for re-use; the Doer instances themselves are created again each time the collection is iterated.
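
The decompiled names are not legal C# identifiers, so it can help to see the same shape written by hand. The sketch below is a simplified, hand-rolled enumerator that yields three strings; ThreeStrings and Machine are hypothetical names, and the real compiler-generated class carries extra bookkeeping that is left out here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written equivalent of an iterator that yields "1", "2", "3".
public sealed class ThreeStrings : IEnumerable<string>
{
    // Every enumeration gets its own fresh state machine.
    public IEnumerator<string> GetEnumerator() { return new Machine(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class Machine : IEnumerator<string>
    {
        private int _state;      // which step runs on the next MoveNext call
        private string _current; // the value exposed through Current

        public string Current { get { return _current; } }
        object IEnumerator.Current { get { return _current; } }

        public bool MoveNext()
        {
            switch (_state)
            {
                case 0: _current = "1"; _state = 1; return true;
                case 1: _current = "2"; _state = 2; return true;
                case 2: _current = "3"; _state = 3; return true;
                default: _state = -1; return false; // finished
            }
        }

        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }
}
```

Writing this by hand for every collection would be tedious and error-prone, which is the work the yield statement saves you.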

The whole key to this process is that the enumerator hides the underlying implementation. The consuming code simply knows there is a collection to iterate through. How that collection is built is up to the enumerator, which leads to very interesting possibilities. In the case of the ASP.NET page, this means that controls can be called iteratively and yield their script references and descriptors. The "master" code is simply iterating through the collection and wiring up the script references.

Thinking of collections as different ways of grouping objects is certainly valuable and can pertain to many different business situations. Understanding that Enumerator is really a state machine, however, allows you to start thinking of collections as processes. They aren't necessarily pools of instances, but can be algorithms or other processes as well. The key is that the use of the Enumerator hides the implementation so that the consuming code simply iterates through something without having to understand the underlying implementation of how something is provided.
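
As a small illustration of a collection that is really a process, here is a sketch of an endless Fibonacci sequence. Nothing is stored; each value is computed only when the consumer asks for it. The Sequences class and its First helper are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;

public static class Sequences
{
    // An endless sequence: each value is computed on request.
    // Values wrap around once they exceed the range of long.
    public static IEnumerable<long> Fibonacci()
    {
        long a = 0, b = 1;
        while (true)
        {
            yield return a;
            long next = a + b;
            a = b;
            b = next;
        }
    }

    // Pulls the first 'count' values and then stops asking for more.
    public static List<long> First(IEnumerable<long> source, int count)
    {
        List<long> result = new List<long>();
        if (count <= 0) return result;
        foreach (long value in source)
        {
            result.Add(value);
            if (result.Count == count) break;
        }
        return result;
    }
}
```

Sequences.First(Sequences.Fibonacci(), 6) produces 0, 1, 1, 2, 3, 5. The consuming code has no idea the source is infinite; it simply stops iterating when it has what it needs.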

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Program Manager Microsoft
United States
Note: articles posted here are independently written and do not represent endorsements nor reflect the views of my employer.

I am a Program Manager for .NET Data at Microsoft. I have been building enterprise software with a focus on line of business web applications for more than two decades. I'm the author of several (now historical) technical books including Designing Silverlight Business Applications and Programming the Windows Runtime by Example. I use the Silverlight book everyday! It props up my monitor to the correct ergonomic height. I have delivered hundreds of technical presentations in dozens of countries around the world and love mentoring other developers. I am co-host of the Microsoft Channel 9 "On .NET" show. In my free time, I maintain a 95% plant-based diet, exercise regularly, hike in the Cascades and thrash Beat Saber levels.

I was diagnosed with young onset Parkinson's Disease in February of 2020. I maintain a blog about my personal journey with the disease at https://strengthwithparkinsons.com/.


Comments and Discussions

Yield and IEnumerable
Jim Savarino, 29-Sep-09 5:25

Yield and Marshalling
baruchl, 28-Sep-09 22:39
Dear Jeremy,
Maybe I'm missing something, but it seems like yield is a good approach when the consumer of the list resides in the same app domain as the producer (e.g., two business logic classes on the server). However, if you expose an IEnumerable method whose implementation uses yield, specifically with an external data source (say, a DB) as the source of data, what will happen in the client calling it? Will it access the DB for each iteration?

Thanks,
Busi

Re: Yield and Marshalling
Jeremy Likness, 29-Sep-09 1:58