Click here to Skip to main content
Click here to Skip to main content

Iteration Performance in .NET

By , 20 May 2003
 

Introduction

I’ve been implementing numerical libraries in .NET and have come to some conclusions about iteration performance. My classes have to hold a large amount of data and be able to iterate through that data as quickly as possible. In order to compare various methods, I created a simple class called Data that encapsulates an array of doubles.

Method #1: Enumeration

Data implements IEnumerable. It contains GetEnumerator which returns its own DataEnumerator, an inner class.
public DataEnumerator GetEnumerator() 
    {
      return new DataEnumerator( this );
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
      return GetEnumerator();
    }
  }
	
…
  public struct DataEnumerator : IEnumerator
  {
    private Data internal_;
    private int index_;

    public DataEnumerator( Data data ) 
    {
      internal_ = data;
      index_ = 1;
    }

    public double Current
    {
      get
      {
        return internal_.Array[index_];
      }
    }

    object IEnumerator.Current 
    {
      get
      {
        return Current;
      }
    }
  
    public bool MoveNext()
    {
      index_++;
      if ( index_ >= internal_.Array.Length ) 
      {
        return false;
      }
      return true;
    }

    public void Reset()
    {
      index_ = -1;
    }
  }
}
…

Method #2: Indexing

I implemented an index operator on the class which simply calls the index operator on the array.

    public double this[int position]
    {
      get
      {
        return array_[position];
      }
    }

Method #3: Indirect Array

I created a property to access the array.

    public double[] Array
    {
      get 
      {
        return array_;
      }
    }

When iterating, I called the Array property and then its index operator.

          d = data.Array[j];

Method #4: Direct Array

I created a reference to the array.

        double[] array = data.Array;

Then, I iterate through that reference.

          d = array[j];

Method #5: Pointer Math

Finally, I tried improving performance by iterating through the array in Managed C++ using pointer manipulation.

        static void iterate( Data& data )
        {
          double d;
          double __pin* ptr = &( data.Array[0] );
          for ( int i = 0; i < data.Array.Length; i++ ) 
          {
            d = *ptr;
            ++ptr;
          }
        }

I called it this way:

        Pointer.iterate( data );

Conclusions

To test the different methods, I allocated 1,000,000 doubles into an array and indexed over all of them. I repeated this 1,000 times to minimize randomness. Here are the results...

Results

Enumeration is always slow. That’s not surprising as I’m using a general data structure to hold the doubles. The three operator/property methods differed very slightly. These are probably all optimized similarly. Using pointer math to traverse over the raw data was significantly faster. This is probably due to the fact that there’s no bounds checking. In summary, if you have large amounts of data and performance is critical, consider using managed C++.

Acknowledgements

Thanks to Mark Vulfson of ProWorks for tips on using Flipper Graph Control. Also, to my colleagues Ken Baldwin and Steve Sneller at CenterSpace Software.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Trevor Misfeldt
CEO CenterSpace Software
United States United States
Member
Trevor has held demanding development positions for a variety of firms using C++, Java, .NET, and other technologies, including Rogue Wave Software, CleverSet, and ProWorks. He is coauthor of The Elements of Java Style , The Elements of C++ Style, and The Elements of C# Style, published by Cambridge University Press. He has also served on a course advisory board of the University of Washington. His teams have won the JavaWorld "GUI Product of the Year" and XML Magazine "Product of the Year" awards. Trevor holds a BSc in Computer Science from the University of British Columbia and a BA in Economics from the University of Western Ontario.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralNicememberAnthony_Yio13 Oct '03 - 15:51 
Good tips, man.
More tips to enhance the speed of .NET?
Any URLs?
 
thanks
 

 
Sonork 100.41263:Anthony_Yio
GeneralEnumerationsmemberPaul Evans26 May '03 - 23:19 
Hi there,
You seem to be very up on the subject, and the enumeration thing frankly scares me because I've inherited a project that uses ALOT of enumerations. They are the most OO friendly way of doing things IMHO, but it seems what you've descovered is that it's over 3 times more expensive time wise to use them!
 
Do you know of any standard structures (Queues, Stacks, etc) that have been implemented to support the IEnumerable interface in a lighter weight way that means enumeration is not quite so expensive in C#?
 
Thanks,
 
Paul
 
/**********************************
Paul Evans, Dorset, UK.
Personal Homepage "EnjoySoftware" @
http://www.enjoysoftware.co.uk/
**********************************/

GeneralRe: Enumerationssussasm4328 May '03 - 11:21 
I haven't tried it, but looking through the code in this article, I'm 99% sure that the slowness of enumerations is because of the boxing/unboxing of doubles to objects. Try replacing the doubles with an object reference type and you should see very different results.
 
Anyway, point is, I'm guessing "a project that uses ALOT of enumerations" probably isn't using them with value types like double, but with reference types (classes). Queues and Stacks in the System.Collections namespace should be very fast with reference types. Finally, even if you are using value types (ints/doubles/... anything derived from System.ValueType), I'd be very careful before you go out and start ripping code out because "it's using enumerations and they're slow." If you don't absolutely need the performance it's always better to use more easily understandable, maintainable OO designs than a less-elegant method just because it's a little faster. Remember, templates will likely be in the next major version of C#, and then all your value types can use fast enumerators too.
 
Btw, if you don't have a solid understanding of it, I'd go on to MSDN (probably some good articles here at CP too, I haven't looked) and read about boxing/unboxing in .NET. It happens very transparently, but it's a good thing to understand, especially if you're performance-conscious.
 
....
 
Just read some of the other comments on the article, they more or less cover what I was saying. See the "Enumeration performance" thread.
GeneralUpdatememberTrevor Misfeldt7 May '03 - 5:55 

I sent in an update of this article to Code Project. It now has the DataEnumerator as a struct. The casting from objects to doubles has been eliminated. Performance of the Enumeration Method has improved, therefore, but it doesn't change the overall results.
 

 
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
GeneralRe: Updatememberdog_spawn21 May '03 - 16:22 
Cheers for the article - it's useful information Smile | :)
GeneralRe: UpdatesussMorten Sparding11 Mar '05 - 22:28 
One simple explanation for the relative poor performance for the Enumeration is the number of statements that is executed on each iteration. This can be reduced by changing the MoveNext method to read:
 
return (internal_.Array.Length > ++index_);
 
In all other iteration test you assume to know the size of the Data array by writing:
 
for ( int j = 0; j < size; j++ )
 
which is unfair, so change it to:
 
for (int j = 0; j < data.Array.Length; j++)
 
All in all this gives me:
 
Enumeration: 5,125 seconds
Indexing: 3,531 seconds
Indirect Arrays: 3,89 seconds
Direct Arrays: 3,75 seconds
Poiner: N/A
 
Better but still a difference !
Generalunsafe C#memberJohn Rayner7 May '03 - 5:04 
Good article. Benchmarks should always be very important in the life of a programmer and there haven't been enough done in .NET yet. But is there any reason that you didn't try unsafe C# instead of managed C++?
GeneralRe: unsafe C#memberTrevor Misfeldt7 May '03 - 5:53 

I agree with you on the benchmarks. I started my company, CenterSpace Software, when I realized how good the performance is on the .NET platform. We have an excellent white paper on performance of numerical methods in .NET at NMath Core Performance Benchmarks
 
We'll get those Fortran programmers to C# someday. Smile | :)
 
About the C++... I picked managed C++ because our particular application used managed C++. No other reason. Anyone want to try this with unsafe C++?
 
Cheers,
Trevor Misfeldt
 

 


 
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
GeneralEnumeration performancememberJames T. Johnson2 May '03 - 1:59 
The casting isn't the only thing that is making your Enumerator so slow, since double is a value type you are also getting a boxing and unboxing operation for each iteration. I suspect that is the true cause of the Enumerator being so slow.
 
Chris Sells and contributers faced this problem as well for their CollectionGen utility (which creates strongly-typed collection classes). Their solution was to get rid of the object generic-ness* and go with writing the ArrayList type behavior from scratch.
 
The end result is that you eliminate any boxing operations when you are creating a value-type based collection.
 
I made similar changes to your code, but it looks like my laptop isn't the best for running benchmarks. I ran it 3-4 times and each time the order was far different from yours and from the time I ran it before. [one time the pointer version was second slowest OMG | :OMG: ]
 
To make similar changes, take advantage of explicit interface implementation. For example:
 
object IEnumerator.Current
{
  get { return (object) this.Current }
}
 
public double Current
{ 
  get { return _array.Array[index]; } // can't remember the var name
}
Now you still don't have access to the new, type-safe property because GetEnumerator returns an IEnumerator, which is still type-unsafe. So you need to make similar changes throughout until you have IEnumerator's type-unsafe stuff hidden from view and use the type-safe stuff by default.
 
It is a lot of extra work, for now; but if you use CollectionGen or some other template based generator you only need to do the work once. Of course this will hopefully be taken care of for us in .NET 2.0 when we get generics Smile | :)
 
James
 
"It is self repeating, of unknown pattern"
Data - Star Trek: The Next Generation

GeneralRe: Enumeration performancesussTrevor Misfeldt2 May '03 - 4:29 

I made the explicit interface change. Also, changed the class to a struct as suggested by Don Box. New numbers were:
 
Enumeration: 13.265 seconds
Indexing: 4.015 seconds
Indirect Arrays: 4.046 seconds
Direct Arrays: 4.218 seconds
Pointer Math: 3.015 seconds
 

 
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
GeneralRe: Enumeration performancememberAndy Gray6 May '03 - 6:23 
Great analysis. Thanks for the followup!
 
-- Andy

GeneralRe: Enumeration performancememberJedrekM20 Jun '03 - 8:35 
My times are quite different (.NET 2003, Framework 1.1, Athlon 1900+, 512MB RAM):
 
repetitions: 1000
iterations: 1,000000e+006
 
Enumeration: 17,103 seconds
Indexing: 11,886 seconds
Indirect Arrays: 11,876 seconds
Direct Arrays: 11,275 seconds
Pointer Math: 3,394 seconds
 
And my little trick:
- add member in DataEnumerator:
private int len_; (I would use m_nLen, but it's a detail Wink | ;-)
- add initialization of above member in DataEnumerator constructor:
len_ = internal_.Array.Length
- change some line in DataEnumerator::MoveNext to:
if (index_ >= len_)
 
Times after changes:
 
repetitions: 1000
iterations: 1,000000e+006
 
Enumeration: 15,781 seconds
Indexing: 11,636 seconds
Indirect Arrays: 11,626 seconds
Direct Arrays: 11,696 seconds
Pointer Math: 2,373 seconds
 
Enumeration test is little slower than others.
IMHO it is not worth to write ugly code to be little faster.
So: use enumerations to have pretty code and use PointerMath method if you use some advanced math algorithms.
 

GeneralRe: Enumeration performancememberTrevor Misfeldt20 Jun '03 - 9:46 

Good idea with the "len_". I'll update mine and post an update.
 
I agree with your conclusions. I would never advocate the pointer math except in special circumstances.
 

 
--
Trevor Misfeldt
CEO, CenterSpace Software
misfeldt@centerspace.net
GeneralRe: Enumeration performancememberOscar Bowyer6 May '03 - 3:44 
James,
 
Any info on when and how generics will be implemented?
 
I've read that they will be "similar" to C++ templates, and not available in VB.NET. Addition of generics should make C# work much easier and faster without the need for code generator type templates.
 
Oscar Bowyer
GeneralRe: Enumeration performancememberJames T. Johnson6 May '03 - 4:36 
Oscar Bowyer wrote:
Any info on when and how generics will be implemented?
 
There was a powerpoint presentation from some conference a while back, but I can't remember where I saw it. But this might be made obsolete by some info below.
 
IIRC the syntax was similar to C++, but it added more to it. Such as letting the template writer specify that the template parameter implement a certain interface (such as IComparable for a template which does sorting).
 
Oscar Bowyer wrote:
I've read that they will be "similar" to C++ templates, and not available in VB.NET.
 
I've seen the same.
 
I guess you could call the research group at MS[^] the authority, and they have a new (May 1, 2003) release for the SSCLI[^] which adds generics to it. Also note the PDF file they released back in May of 2001 (while the framework was still in beta 2, IIRC).
 
James
 
"It is self repeating, of unknown pattern"
Data - Star Trek: The Next Generation

GeneralRe: Enumeration performancemembermikasa22 May '03 - 3:11 
Thanks for the Info James. I had used the CollectionGen utility in the past for one of my VB.NET Collection Classes. It gave me a good understanding of how things are done. I also implemented "Keyed" lookups with the Strongly-Typed Collections. I since no longer use the CollectionGen because I have a very good Base class that I copy / paste for each Collection Class.
 
What I am wondering though, is there any way to speed up Object Lookups by using Keys? Currently, the only way I can determine if an Object exists with that Key is to enumerate every Object, look at the Key, and exit the Loop if it's found. Is there some type of Object Pointer or something I could be using?
 
'Function to Get the Index of a Key
Private Function GetIndex(ByVal Key As String) As Integer
     Dim i As Integer
     
     'Exit if there are no Items
     If (_Count = 0) Then Return -1
     
     'Determine if the Key Exists
     For i = 0 To _Items.Length - 1
          If (Not IsNothing(_Items(i))) Then
	       If (_Items(i).Key = Key) Then Exit For
	  End If
     Next i
     If (i > _Files.Length - 1) Then Return -1 Else Return i
End Function
 
I also Implement a pretty advanced "Add" function which Determines if the Key exists (if adding with a Key) and Adds the Item if it doesn't, or Modifies the Existing Item if it does exist. This ensures that I only have the Key in the Collection one time.
 
I had thought to make a separate ArrayList or an Array of Strings to store the Keys, but it didn't seem very efficient to be ensuring that two Arrays stay In-Sync with each other.
 
This current Method of looking up the Key is fast enough, however, if it could be improved, I'd be all ears!
GeneralRe: Enumeration performancememberJames T. Johnson22 May '03 - 9:11 
What you're describing is a dictionary-style collection. The framework has a generic one: System.Collections.Hashtable, and CollectionGen has one called HashTable.
 
HashTables work on the concept of buckets and hashes. The table maintains an array of 'buckets', each bucket is then also has an array or linked list of the key-value pairs you want to store.
 
The hash is a number generated from the key and is used as in index into the array of buckets, more on this later; for now just know that it is an Integer
 
To retreive a value out of the dictionary: Once it finds the correct bucket it then goes through all of the items in that bucket to find the key that is equal to the one used. If it finds the key then it returns that value, otherwise it returns Nothing
 
To add a value to the dictionary: Once it finds the correct bucket it goes through each of the items to see if the key already exists, if it doesn't then it adds your key-value pair to that bucket. If it does find your key it either replaces the value stored there or it raises an exception (if you are using the Item property or Add method, respectively)
 
While HashTables have a lot going on behind the scenes, there is one major downfall to them. The performance of the table relies on a good hashing algorithm. Unlike Java, .NET lets any object serve as the key to a HashTable because System.Object has a GetHashCode method which is supposed to return the hash for that object.
 
From the documentation, GetHashCode's default behavior is suitable unless you override the Equals method and change it from using reference equality to something else.
 
This adds a frusturating aspect to using your own classes in a HashTable. If two objects are equal (the Equals method returns true) then the hash returned by GetHashCode must be the same. So for the duration that your object is serving as a key it must be immutable. Most objects that people write aren't this way so they make poor candidates for use as a key.
 
HTH,
 
James
 
"I despise the city and much prefer being where a traffic jam means a line-up at McDonald's"
Me when telling a friend why I wouldn't want to live with him

GeneralRe: Enumeration performancemembermikasa22 May '03 - 9:18 
Hmm, Ok, I believe I understand. Although, I do not believe I am using a HashTable. The Implementation I am using is the exact same from the CollectionGen for a Strongly-Typed Collection (which uses an Array of the Type of Object / Class for that Collection). So, if what you're saying above is correct...then I guess there is no other faster method of finding a key than to just look one up by Iterating over the Array of Objects I am storing?
GeneralRe: Enumeration performancememberJames T. Johnson22 May '03 - 9:29 
mikasa wrote:
then I guess there is no other faster method of finding a key than to just look one up by Iterating over the Array of Objects I am storing
 
Not unless you change some of the structure you are using now, you are closely emulating a HashTable so you may want to take the plunge and fully implement one.
 
James
 
"I despise the city and much prefer being where a traffic jam means a line-up at McDonald's"
Me when telling a friend why I wouldn't want to live with him

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130523.1 | Last Updated 21 May 2003
Article Copyright 2003 by Trevor Misfeldt
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid