Yield or Not To Yield

Pavel Rytikov

Sep 24, 2016

CPOL

A brief explanation of what yield is and when to use it.

The .NET community is waiting in anticipation of C# 7.0 and the new features it brings. Each version of the language, which turns 15 next year, has brought something new and useful for developers. Although each feature deserves a separate mention, today I want to talk about the yield keyword. I have noticed that beginners (and not only beginners) avoid using it. In this article, I'll try to show its pros and cons and describe the cases where using yield makes sense.

yield creates an iterator and saves us from writing a separate enumerator class when we implement IEnumerable. C# has two statements that contain yield: yield return <expression> and yield break. yield can be used in methods, operators, or get accessors, but I'll mostly talk about methods, since yield works the same way everywhere.
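To make the "no separate enumerator class" point concrete, here is a minimal sketch. The Countdown class and its EvenValues property are hypothetical names invented for this illustration; they show yield in a method and in a get accessor.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection, used only for illustration.
public sealed class Countdown : IEnumerable<int>
{
    private readonly int _from;

    public Countdown(int from) { _from = from; }

    // yield in a method: the compiler generates the IEnumerator<int> for us.
    public IEnumerator<int> GetEnumerator()
    {
        for (var i = _from; i >= 0; i--)
            yield return i;
    }

    // The non-generic interface simply forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // yield in a get accessor works the same way.
    public IEnumerable<int> EvenValues
    {
        get
        {
            foreach (var value in this)
                if (value % 2 == 0)
                    yield return value;
        }
    }
}
```

Enumerating new Countdown(3) gives 3, 2, 1, 0, and its EvenValues property gives 2, 0. Without yield, GetEnumerator would need a hand-written class that tracks the current position.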

By writing yield return, we indicate that the current method returns an IEnumerable whose elements are the results of the yield return expressions. After each yield return, the method suspends its execution and returns control to the caller. Execution resumes from that point when the next element of the sequence is requested. Local variables retain their values between yield return statements. yield break plays a role similar to the familiar break in loops: it ends the sequence. The example below returns the sequence of numbers from 0 to 10:

private static IEnumerable<int> GetNumbers() {
    var number = 0;
    while (true) {
        if (number > 10)
            yield break;

        yield return number++;
    }
}
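The hand-over of control between the iterator and its caller is easier to see with a log. The sketch below is only an illustration: LazyDemo, Numbers, and Run are hypothetical names, and the log list exists purely to record the order of events.

```csharp
using System.Collections.Generic;

public static class LazyDemo
{
    // Hypothetical iterator that records when each part of its body runs.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("iterator: started");
        for (var i = 0; i < 2; i++)
        {
            log.Add("iterator: yielding " + i);
            yield return i;
        }
        log.Add("iterator: finished");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var numbers = Numbers(log);   // nothing in the iterator body has run yet
        log.Add("caller: sequence created");
        foreach (var n in numbers)
            log.Add("caller: got " + n);
        return log;
    }
}
```

Run() returns the entries in this order: "caller: sequence created", "iterator: started", "iterator: yielding 0", "caller: got 0", "iterator: yielding 1", "caller: got 1", "iterator: finished". Calling Numbers executes none of its body; the body runs step by step as foreach requests elements.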

It's important to note that yield has several constraints. The iterator's Reset method throws NotSupportedException. We can't use yield in anonymous methods or in methods that contain unsafe code. Also, yield return can't be placed inside a try-catch, but it can be used in the try block of a try-finally. yield break can be placed in the try block of both try-catch and try-finally. I recommend reading about the reasons for this behavior here and here.
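Two of these constraints can be checked with a short sketch. FinallyDemo and its members are hypothetical names chosen for this illustration. It shows yield return inside the try block of a try-finally, the finally block running when the caller stops early, and Reset throwing.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    public static IEnumerable<int> Guarded(List<string> log)
    {
        try
        {
            yield return 1;   // allowed: this try has only a finally clause
            yield return 2;
        }
        finally
        {
            log.Add("finally");
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        foreach (var n in Guarded(log))
        {
            log.Add("got " + n);
            break;   // foreach disposes the enumerator, which runs the finally block
        }
        return log;
    }

    public static bool ResetIsUnsupported()
    {
        using (var enumerator = Guarded(new List<string>()).GetEnumerator())
        {
            try { enumerator.Reset(); return false; }
            catch (NotSupportedException) { return true; }
        }
    }
}
```

Run() returns "got 1" followed by "finally", even though the loop never reaches the second element, and ResetIsUnsupported() returns true.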

Let's see what yield compiles into. Each method with yield return is represented by a state machine that moves from one state to another as the iterator executes. Listed below is a simple application that prints an infinite sequence of odd numbers:

internal class Program
{
    private static void Main() {
        foreach (var number in GetOddNumbers(10))
            Console.WriteLine(number);
    }

    // Yields every odd number greater than startingWith, without end.
    private static IEnumerable<int> GetOddNumbers(int startingWith) {
        var previous = startingWith;
        while (true)
            if (++previous%2 != 0)
                yield return previous;
    }
}

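The compiler generates roughly the following code (names are changed for readability; in this listing GetOddNumbers takes no parameter and starts counting from 0):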

internal class Program
{
    private static void Main() {
        IEnumerator<int> enumerator = null;
        try {
            enumerator = GetOddNumbers().GetEnumerator();
            while (enumerator.MoveNext())
                Console.WriteLine(enumerator.Current);
        } finally {
            if (enumerator != null)
                enumerator.Dispose();
        }
    }

    [IteratorStateMachine(typeof(CompilerGeneratedYield))]
    private static IEnumerable<int> GetOddNumbers() {
        return new CompilerGeneratedYield(-2);
    }

    [CompilerGenerated]
    private sealed class CompilerGeneratedYield : IEnumerable<int>, 
        IEnumerable, IEnumerator<int>, IDisposable, IEnumerator
    {
        private readonly int _initialThreadId;
        private int _current;
        private int _previous;
        private int _state;

        [DebuggerHidden]
        public CompilerGeneratedYield(int state) {
            _state = state;
            _initialThreadId = Environment.CurrentManagedThreadId;
        }

        [DebuggerHidden]
        IEnumerator<int> IEnumerable<int>.GetEnumerator() {
            CompilerGeneratedYield getOddNumbers;
            if ((_state == -2) && (_initialThreadId == Environment.CurrentManagedThreadId)) {
                _state = 0;
                getOddNumbers = this;
            } else {
                getOddNumbers = new CompilerGeneratedYield(0);
            }

            return getOddNumbers;
        }

        [DebuggerHidden]
        IEnumerator IEnumerable.GetEnumerator() {
            return ((IEnumerable<int>)this).GetEnumerator();
        }

        int IEnumerator<int>.Current {
            [DebuggerHidden] get { return _current; }
        }

        object IEnumerator.Current {
            [DebuggerHidden] get { return _current; }
        }

        [DebuggerHidden]
        void IDisposable.Dispose() { }

        bool IEnumerator.MoveNext() {
            switch (_state) {
                case 0:
                    _state = -1;
                    _previous = 0;
                    break;
                case 1:
                    _state = -1;
                    break;
                default:
                    return false;
            }

            int num;
            do {
                num = _previous + 1;
                _previous = num;
            } while (num%2 == 0);

            _current = _previous;
            _state = 1;

            return true;
        }

        [DebuggerHidden]
        void IEnumerator.Reset() {
            throw new NotSupportedException();
        }
    }
}

From the example, you can see that the body of the yield method was replaced with a class that implements both IEnumerable and IEnumerator. The method's local variables became fields of that class. The method's logic was transformed into a state machine and moved to MoveNext. Depending on the original logic, the class may also get a non-empty Dispose implementation, for example when yield return sits inside a try-finally.
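One consequence of the generated GetEnumerator logic is that every enumerator keeps its own copy of the state. The sketch below illustrates this; EnumeratorStateDemo, Count, and Run are hypothetical names.

```csharp
using System.Collections.Generic;

public static class EnumeratorStateDemo
{
    // Infinite counter: 0, 1, 2, ...
    public static IEnumerable<int> Count()
    {
        var i = 0;
        while (true)
            yield return i++;
    }

    public static int[] Run()
    {
        var sequence = Count();
        using (var first = sequence.GetEnumerator())
        using (var second = sequence.GetEnumerator())
        {
            first.MoveNext();    // first.Current is 0
            first.MoveNext();    // first.Current is 1
            second.MoveNext();   // second.Current is 0: it has its own state
            return new[] { first.Current, second.Current };
        }
    }
}
```

Run() returns { 1, 0 }. The first GetEnumerator call reuses the object created by Count(), and the second call gets a fresh instance, so advancing one enumerator does not affect the other.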

Let's go further and run two tests to measure yield performance and memory consumption. Note that these tests are synthetic and are listed here only to compare yield with a straightforward implementation. I'm using BenchmarkDotNet with the BenchmarkDotNet.Diagnostics.Windows diagnostic module. The first test compares two implementations of a method that returns a sequence of integers (like Enumerable.Range(start, count)). The first implementation doesn't use an iterator, and the second does:

public int[] Array(int start, int count) {
    // Preallocate the result and fill it directly.
    var numbers = new int[count];
    for (var i = 0; i < count; i++)
        numbers[i] = start + i;

    return numbers;
}

public int[] Iterator(int start, int count) {
    // Materialize the lazy sequence so that both methods return an array.
    return IteratorInternal(start, count).ToArray();
}

private IEnumerable<int> IteratorInternal(int start, int count) {
    for (var i = 0; i < count; ++i)
        yield return start + i;
}

Method	Count	Start	Median	StdDev	Gen 0	Gen 1	Gen 2	Bytes Allocated/Op
Array	100	10	91.19 ns	1.25 ns	385.01	-	-	169.18
Iterator	100	10	1,173.26 ns	10.94 ns	1,593.00	-	-	700.37

As you can see from the results, the Array implementation is roughly 13 times faster and allocates about 4 times less memory. The iterator class and the separate ToArray call take their toll.

The second test is more complex. It emulates data stream processing: it first selects records with an odd key and then, from those, records whose key is divisible by 3. As in the previous test, the first implementation doesn't use an iterator, and the second does:

public List<Tuple<int, string>> List(int start, int count) {
    var odds = new List<Tuple<int, string>>();
    foreach (var record in OddsArray(ReadFromDb(start, count)))
        if (record.Item1%3 == 0)
            odds.Add(record);

    return odds;
}

public List<Tuple<int, string>> Iterator(int start, int count) {
    return IteratorInternal(start, count).ToList();
}

private IEnumerable<Tuple<int, string>> IteratorInternal(int start, int count) {
    foreach (var record in OddsIterator(ReadFromDb(start, count)))
        if (record.Item1%3 == 0)
            yield return record;
}

private IEnumerable<Tuple<int, string>> OddsIterator(IEnumerable<Tuple<int, string>> records) {
    foreach (var record in records)
        if (record.Item1%2 != 0)
            yield return record;
}

private List<Tuple<int, string>> OddsArray(IEnumerable<Tuple<int, string>> records) {
    var odds = new List<Tuple<int, string>>();
    foreach (var record in records)
        if (record.Item1%2 != 0)
            odds.Add(record);

    return odds;
}

private IEnumerable<Tuple<int, string>> ReadFromDb(int start, int count) {
    // Simulates reading records from a database one at a time.
    for (var i = start; i < count; ++i)
        yield return Tuple.Create(start + i, RandomString());
}

private static string RandomString() {
    return Guid.NewGuid().ToString("n");
}

Method	Count	Start	Median	StdDev	Gen 0	Gen 1	Gen 2	Bytes Allocated/Op
List	100	10	43.14 us	0.14 us	279.04	-	-	4,444.14
Iterator	100	10	43.22 us	0.76 us	231.00	-	-	3,760.96

In this test, the two implementations perform the same, but the yield version allocates about 15% less memory. This is because the records pass through both filters in a single pass, so only one List<Tuple<int, string>> is allocated instead of two.

Considering all of the above, I can draw a short conclusion. The main disadvantage of yield is the creation of an additional iterator class. When the sequence is finite and the caller doesn't have complex logic, yield is slower and allocates more memory. Using yield makes sense in data processing scenarios where computing each intermediate collection would allocate a large block of memory. The lazy nature of yield can also help to avoid computing elements that the caller never requests. In such cases, yield can significantly reduce memory consumption and CPU load.
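The saving from laziness can be observed by counting how many elements an iterator actually produces. This is a minimal sketch: ShortCircuitDemo, Squares, and the Produced counter are hypothetical names, and the LINQ operators Where, Take, and ToArray stand in for whatever filtering the caller does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Counts how many elements the iterator actually produced.
    public static int Produced;

    // Infinite sequence of squares: 1, 4, 9, 16, ...
    public static IEnumerable<int> Squares()
    {
        for (var i = 1; ; i++)
        {
            Produced++;
            yield return i * i;
        }
    }

    public static int[] FirstThreeEvenSquares()
    {
        Produced = 0;
        // Take(3) stops asking for elements once three matches are found.
        return Squares().Where(x => x % 2 == 0).Take(3).ToArray();
    }
}
```

FirstThreeEvenSquares() returns { 4, 16, 36 }, and Produced is 6 afterwards: the infinite source computed only the six squares needed to find three even ones. A version that first built a full list would have to pick an upper bound in advance and compute every element up to it.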