IEnumerator and foreach






Introduction
In my latest post, I used the DirectoryInfo method called “EnumerateFiles”. But if we look at the other methods available on DirectoryInfo, we will notice that there is also a “GetFiles” method with the same overloads and parameters. So what is the difference between the two methods, and why did I choose to use the “EnumerateFiles” method?
Well, to answer that, we will have to look at what happens in the inner workings of .NET, and for that, we will also have to see what Enumerators are and how they work.
Let’s start simple with the foreach construct.
Because foreach sounds so much like the for construct, some might think that foreach is just a fancier way of writing a for loop without a separate variable to keep track of where we are.
As we know, the for loop is defined something like this:
for (int index = 0; index < Number; index++) { ...code to execute... }
And the foreach loop is defined something like this:
foreach (var item in COLLECTION) { ...code to execute... }
But how does the loop know when to stop without a condition? The short answer is that the foreach loop itself doesn’t know when to stop. Instead, it keeps asking another object whether there is a next item. For this, the .NET Framework uses the concept of Iterators, which are more commonly known in .NET as Enumerators.
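Basically, all the collections in .NET are built to work with Enumerators. They implement IEnumerable (or its generic version, IEnumerable<T>), whose GetEnumerator method hands out an Enumerator, and that is what the foreach loop needs in order to work.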
So let’s go deeper. An Enumerator is an object that has a Current property and two methods called Reset and MoveNext. Suppose we have a collection, say a list of items, and we use it in a foreach loop. On every pass, the foreach calls MoveNext, which advances to the next item in the list, makes it available through the Current property, and returns true. When there are no more items in the list, MoveNext returns false and the loop ends. At that point Current no longer refers to a valid item, so it should not be read. Depending on the collection, it may hold a default value or throw an exception.
Using this workflow, we could also use one of the other loops found in .NET, such as while, to call the MoveNext method and read the Current property manually. In fact, some algorithms out there still use the manual calls.
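As a sketch of that manual approach, here is roughly what a foreach over a List<T> boils down to, written as a while loop. The class and method names are hypothetical. The using block stands in for the Dispose call that the compiler adds around a foreach when the enumerator is disposable.

```csharp
using System.Collections.Generic;

public static class ManualEnumeration
{
    // Walks a list the way foreach does: get an enumerator, call MoveNext
    // until it returns false, and read Current after each successful call.
    public static List<string> CopyWithWhile(List<string> source)
    {
        var result = new List<string>();
        using (IEnumerator<string> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                // Current is only meaningful after MoveNext has returned true.
                result.Add(enumerator.Current);
            }
        }
        return result;
    }
}
```

Calling CopyWithWhile on a list of "a", "b", "c" gives back the same three items in the same order, exactly as a foreach would visit them.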
So if we were to implement an Enumerator that just gives us even numbers, we could implement the MoveNext method like this (assuming Current has a private setter):
public bool MoveNext() { Current = Current + 2; return true; }
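Fleshing that out into a complete IEnumerator<int> might look like the sketch below. The class name EvenNumberEnumerator and the _started flag are my own illustration, not a .NET type. Because Current starts at 0, the flag makes the first value 0 instead of 2. The non-generic IEnumerator interface also requires an object-typed Current, which is implemented explicitly.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator that produces 0, 2, 4, ... and never ends.
public sealed class EvenNumberEnumerator : IEnumerator<int>
{
    private bool _started;

    public int Current { get; private set; }

    // Required by the non-generic IEnumerator interface.
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        // The first call stays on 0; every later call adds 2.
        if (_started)
        {
            Current += 2;
        }
        else
        {
            _started = true;
        }
        return true; // never signals the end of the sequence
    }

    public void Reset()
    {
        _started = false;
        Current = 0;
    }

    public void Dispose() { }
}
```

Calling MoveNext five times and reading Current after each call gives 0, 2, 4, 6 and 8. Nothing in the class ever makes MoveNext return false.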
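Notice that MoveNext always returns true. That means that if this Enumerator were used in a foreach loop, the loop would run forever unless we break out of it. With overflow checking turned on (a checked context or the CheckForOverflowUnderflow compiler option), it would instead run until Current passes the maximum value of an int and then throw an OverflowException. Without overflow checking, the value silently wraps around to negative numbers and the loop keeps going.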
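The usual fix is to let MoveNext return false once the sequence is done. Below is a minimal sketch of a bounded variant, assuming a hypothetical EvenNumbersUpTo collection with a limit parameter. Wrapping the enumerator in an IEnumerable<int> is what lets foreach use it directly. The internal counter is a long so that a limit near int.MaxValue cannot wrap around.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection of the even numbers from 0 up to a limit.
public sealed class EvenNumbersUpTo : IEnumerable<int>
{
    private readonly int _limit;

    public EvenNumbersUpTo(int limit) => _limit = limit;

    // foreach calls this to get a fresh enumerator for each loop.
    public IEnumerator<int> GetEnumerator() => new Enumerator(_limit);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _limit;
        private long _next; // value the next MoveNext call will try to produce

        public Enumerator(int limit) => _limit = limit;

        public int Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_next > _limit)
            {
                return false; // end of the sequence: foreach exits here
            }
            Current = (int)_next; // safe: _next <= _limit <= int.MaxValue
            _next += 2;
            return true;
        }

        public void Reset()
        {
            _next = 0;
            Current = 0;
        }

        public void Dispose() { }
    }
}
```

For example, foreach (int n in new EvenNumbersUpTo(10)) visits 0, 2, 4, 6, 8 and 10 and then stops on its own.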
Another consequence of this workflow (and some of you might have run into it) is that we cannot modify a collection while a foreach loop is going through it. For collections such as List<T>, adding or removing items inside the loop makes the next MoveNext call throw an InvalidOperationException.
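Here is a small demonstration of that exception, plus one common workaround: looping over a snapshot made with LINQ's ToList. The class and method names are hypothetical. The behavior shown is that of List<T>, whose enumerator checks on each MoveNext whether the list has changed since the enumeration started.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModifyDuringForeach
{
    // Adding to a List<T> inside a foreach over that same list makes the
    // enumerator's next MoveNext call throw InvalidOperationException.
    public static bool ThrowsWhenModified()
    {
        var numbers = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int n in numbers)
            {
                if (n == 2)
                {
                    numbers.Add(4);
                }
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    // Workaround: enumerate a copy (ToList creates a new list), so the
    // original can be modified freely inside the loop.
    public static List<int> AddDoublesSafely(List<int> numbers)
    {
        foreach (int n in numbers.ToList())
        {
            numbers.Add(n * 2);
        }
        return numbers;
    }
}
```

AddDoublesSafely on a list of 1, 2, 3 leaves it as 1, 2, 3, 2, 4, 6. The loop only sees the three original items because it is enumerating the copy.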
But there is another benefit to using an Enumerator besides allowing us to use the foreach construct: we work with one object at a time, and each object is only produced at the moment MoveNext is called.
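The sketch below makes the "one object at a time" point visible. It uses a hypothetical LoggingCounter enumerator that writes to a log whenever MoveNext produces a value, while the consuming loop logs each value it receives. The two kinds of entries alternate, which shows that nothing is produced ahead of time.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator over 1..count that records each value it produces.
public sealed class LoggingCounter : IEnumerator<int>
{
    private readonly int _count;
    private readonly List<string> _log;

    public LoggingCounter(int count, List<string> log)
    {
        _count = count;
        _log = log;
    }

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (Current >= _count) return false;
        Current++;
        _log.Add("produce " + Current); // the work happens only when asked
        return true;
    }

    public void Reset() => Current = 0;
    public void Dispose() { }
}

public static class LazinessDemo
{
    public static List<string> Run(int count)
    {
        var log = new List<string>();
        using (var counter = new LoggingCounter(count, log))
        {
            while (counter.MoveNext())
            {
                log.Add("consume " + counter.Current);
            }
        }
        return log;
    }
}
```

LazinessDemo.Run(2) returns the entries "produce 1", "consume 1", "produce 2", "consume 2", in that order.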
Back to our example of “GetFiles” vs “EnumerateFiles”: let’s take a folder with 1000 files in it as an example.
When we call the “GetFiles” method, we receive an array of all 1000 files inside that directory. However, we also have to wait until the method goes through each file, turns it into a FileInfo, and adds it to its inner array before it gets back to us. Only afterwards can we cycle through the files and do our work with them.
On the other hand, when we call the “EnumerateFiles” method, it looks for the files, gets the first one it encounters, and returns it to us. We do the work we wanted on that file, and then we move on to the next one.
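Both methods are real members of DirectoryInfo: GetFiles() returns a FileInfo[] and EnumerateFiles() returns an IEnumerable<FileInfo>. The sketch below lists the same directory both ways; the FileListing class and its method names are hypothetical. The results are the same, and the only difference is when the work happens. Neither method guarantees any particular order, so sort the results before comparing them if you try this.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileListing
{
    // GetFiles builds the complete FileInfo[] before it returns.
    public static List<string> NamesEagerly(string path)
    {
        FileInfo[] files = new DirectoryInfo(path).GetFiles();
        return files.Select(f => f.Name).ToList();
    }

    // EnumerateFiles returns an IEnumerable<FileInfo> right away; each
    // FileInfo is produced when the foreach asks for the next item.
    public static List<string> NamesLazily(string path)
    {
        var names = new List<string>();
        foreach (FileInfo file in new DirectoryInfo(path).EnumerateFiles())
        {
            names.Add(file.Name);
        }
        return names;
    }
}
```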
Now imagine that the directory has tens of thousands of files in it, some of them nested even deeper inside other folders. In that case, the “GetFiles” approach is very inefficient, because we cannot start on the first file until every file has been found and the whole array has been built in memory. Even worse, each call gives us a whole collection. If we ever want to process multiple files in parallel (more on parallelism in a future post), we would have to manage which loop works with which file so that they don’t overlap. Here the foreach loop, leveraging the Enumerator, makes our lives easier: we only work with one item at a time, and each time we request a new one, we simply move ahead.
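The lazy approach pays off most when we can stop early. This sketch uses the real EnumerateFiles(string, SearchOption) overload with SearchOption.AllDirectories to look through nested folders, and it returns as soon as it finds a file with a given extension. The FileSearch class and FindFirst method are hypothetical. Which matching file comes back first depends on the enumeration order, which is not guaranteed.

```csharp
using System;
using System.IO;

public static class FileSearch
{
    // Returns the first file (in the root or any subfolder) with the given
    // extension, or null if there is none. Returning from inside the foreach
    // ends the enumeration, so subfolders not reached yet are never listed.
    public static FileInfo? FindFirst(string root, string extension)
    {
        var dir = new DirectoryInfo(root);
        foreach (FileInfo file in dir.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            if (string.Equals(file.Extension, extension, StringComparison.OrdinalIgnoreCase))
            {
                return file; // leaving the loop disposes the enumerator
            }
        }
        return null;
    }
}
```

With GetFiles and the same arguments, the whole tree would be listed before the loop could look at the first file.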
I will let this sink in. In a future post, we will see how Enumerators and Enumerables work and delve a bit into the “state machine” you didn’t (or maybe you did) know you had. We will also look at some techniques for further leveraging Enumerators.
Thank you and I hope to see you next time.