IEnumerator and foreach

Vlad Neculai Vizitiu

Nov 14, 2018

CPOL

Introduction

In my latest post, I used the DirectoryInfo method called “EnumerateFiles”. If we look at the methods available on DirectoryInfo, we will notice that there is also a “GetFiles” method with the same overloads and the same parameters. So what is the difference between the two methods, and why did I choose to use “EnumerateFiles”?

Well, to answer that, we will have to look at what happens in the inner workings of .NET, and for that, we will also have to see what Enumerators are and how they work.

Let’s start simple with the foreach construct.

Because foreach sounds so much like the for construct, some might think that foreach is just a fancier way of writing a for loop, one that spares us the extra variable that keeps count of where we are.

As we know, the for loop is defined something like this:

for (int index = 0; index < Number; index++) { ... code to execute ... }

And the foreach loop is defined something like this:

foreach(var item in COLLECTION) { ... code to execute ... }

But how does foreach know when to stop without a condition present? Well, the easy answer is that it doesn’t: the loop itself has no idea how many items there are. Instead, it relies on the concept of Iterators, more commonly known in .NET as Enumerators, and it is the Enumerator that reports when there is nothing left.

Basically, all the collections found inside .NET implement the IEnumerable interface, whose GetEnumerator method hands out an Enumerator, and that is what the foreach loop needs in order to work.

So let’s go deeper. An Enumerator is an object that has a Current property and two methods called Reset and MoveNext. If we have a collection, let’s say a list of items, and we use it in a foreach loop, then foreach asks the collection for its Enumerator and calls MoveNext before each iteration. MoveNext advances to the next item in the list, makes it available through the Current property, and returns true. When there are no more items, MoveNext returns false and the loop ends; at that point, the value of Current is undefined and should not be read.

Because of this workflow, we could also use one of the other loops found in .NET, such as while, and call the MoveNext method and the Current property manually. In fact, there are some algorithms out there that still do this by hand.
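To make the manual approach concrete, here is a minimal sketch of roughly what a foreach loop does for us. The class and method names (ManualEnumeration, Sum) are made up for this illustration; only GetEnumerator, MoveNext, Current and Dispose come from .NET itself.

```csharp
using System;
using System.Collections.Generic;

public static class ManualEnumeration
{
    // Sums a sequence by driving the enumerator by hand,
    // roughly what a foreach loop does behind the scenes.
    public static int Sum(IEnumerable<int> numbers)
    {
        int total = 0;

        // "using" disposes the enumerator when we are done, as foreach would.
        using (IEnumerator<int> enumerator = numbers.GetEnumerator())
        {
            // MoveNext returns false once we have moved past the last item.
            while (enumerator.MoveNext())
            {
                total += enumerator.Current;
            }
        }

        return total;
    }
}
```

Calling ManualEnumeration.Sum(new List<int> { 1, 2, 3 }) returns 6, the same result a foreach loop over that list would give.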

So if we were to implement an Enumerator that would just give us even numbers, we would implement the MoveNext method like this:

public bool MoveNext() { Current = Current + 2; return true; }

Now, if you notice, the method above only ever returns true. That means that if this Enumerator were used with a foreach loop, the loop would run forever unless we stop it ourselves. With overflow checking turned on, it would instead run until Current exceeds the maximum value of an int and then throw an OverflowException; with checking off, the value silently wraps around to a negative number.
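For those who want to try it, here is one possible way to flesh out that Enumerator so it can be used with foreach. This is only a sketch: the names EvenNumberEnumerator, EvenNumbers and FirstEven are invented for this example, I assume counting starts from 0 (so the first value produced is 2), and the checked keyword is my own addition to force the overflow exception regardless of compiler settings.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator that produces 2, 4, 6, ... and never reports the end.
public sealed class EvenNumberEnumerator : IEnumerator<int>
{
    public int Current { get; private set; }

    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        // "checked" throws OverflowException instead of wrapping around
        // once the result no longer fits in an int.
        Current = checked(Current + 2);
        return true;
    }

    public void Reset() => Current = 0;

    public void Dispose() { }
}

// foreach needs something with a GetEnumerator method, so we wrap the enumerator.
public sealed class EvenNumbers : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => new EvenNumberEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class EvenNumbersDemo
{
    // Collects the first "count" even numbers, leaving the endless loop with break.
    public static List<int> FirstEven(int count)
    {
        var result = new List<int>();
        if (count <= 0) return result;

        foreach (int number in new EvenNumbers())
        {
            result.Add(number);
            if (result.Count == count) break;
        }

        return result;
    }
}
```

EvenNumbersDemo.FirstEven(3) gives back 2, 4 and 6. Without the break, the loop would keep going until the checked addition throws.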

Another consequence of this workflow (and some of you might have run into it) is that if we use the foreach loop with a collection, we cannot add or remove items in that collection inside the loop. If we try, most built-in collections, such as List<T>, throw an InvalidOperationException on the next call to MoveNext.

But there is another benefit to Enumerators besides allowing us to use the foreach construct: we work with one object at a time, and that object is only fetched at the moment we call MoveNext.

Back to our example of “GetFiles” vs “EnumerateFiles”, let’s take as an example a folder with 1000 files in it.
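Here is a small sketch that shows this behaviour with a List<int>, together with one common workaround (looping over a copy). The class and method names are made up for the example; other collection types may behave differently.

```csharp
using System;
using System.Collections.Generic;

public static class ModifyWhileIterating
{
    // Returns true if changing the list inside foreach caused an InvalidOperationException.
    public static bool RemovingInsideForeachThrows()
    {
        var numbers = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int number in numbers)
            {
                // The removal itself succeeds; the enumerator notices
                // the change on its next MoveNext call and throws.
                numbers.Remove(number);
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }

        return false;
    }

    // One workaround: loop over a copy and modify the original list.
    public static List<int> RemoveEvenNumbers(List<int> numbers)
    {
        foreach (int number in numbers.ToArray())
        {
            if (number % 2 == 0) numbers.Remove(number);
        }

        return numbers;
    }
}
```

The first method returns true for a List<int>, while the second one leaves only the odd numbers in the list without any exception.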

When we call the “GetFiles” method, we will receive an array of all 1000 files inside that directory, but we will also have to wait until the method goes through each file, turns it into a FileInfo and adds it to its inner array before getting back to us. Only afterwards can we cycle through the files and do our work with them.

On the other hand, when we call the “EnumerateFiles” method, nothing is collected up front. The method looks for files, hands us the first one it encounters, we do our work on that file, and only then does it move on to the next one.
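The difference can be sketched like this. The helper names (FileListing, TotalSizeWithGetFiles, FirstFileLargerThan) are hypothetical and only there to illustrate the two calls; GetFiles and EnumerateFiles are the real DirectoryInfo methods.

```csharp
using System;
using System.IO;

public static class FileListing
{
    // Eager: GetFiles builds the whole FileInfo[] before the loop can start.
    public static long TotalSizeWithGetFiles(string path)
    {
        long total = 0;
        FileInfo[] files = new DirectoryInfo(path).GetFiles();

        foreach (FileInfo file in files)
        {
            total += file.Length;
        }

        return total;
    }

    // Lazy: EnumerateFiles hands files over as the loop asks for them,
    // so we can stop as soon as we have found what we need.
    public static string FirstFileLargerThan(string path, long minimumBytes)
    {
        foreach (FileInfo file in new DirectoryInfo(path).EnumerateFiles())
        {
            if (file.Length > minimumBytes)
            {
                return file.FullName; // the remaining files are never requested
            }
        }

        return null; // no file was large enough
    }
}
```

In the second method, the early return is where the Enumerator pays off: with an array from “GetFiles”, every file would already have been collected before we looked at the first one.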

Now imagine that the directory has tens of thousands of files in it, some of them nested even deeper inside other folders. Using the “GetFiles” approach then becomes very inefficient, both in waiting time and in memory. Even worse, each time we call the method we get a whole new collection. That means that if we ever want to process multiple files in parallel (more on parallels in a future post), we would have to manage which loop works with which file so that they don’t overlap. With the foreach loop leveraging the Enumerator, our lives are easier: we only work with one item at a time, and each time we request a new one, we just move ahead.

I will let this sink in, and in a future post we will see how Enumerators and Enumerables work, we will delve a bit into the “state machine” you didn’t (or maybe you did) know you have, and we will look at some techniques for further leveraging Enumerators.

Thank you and I hope to see you next time.
