|
|
Comments and Discussions
|
|
 |

|
Good tips, man.
More tips to enhance the speed of .NET?
Any URLs?
thanks
Sonork 100.41263:Anthony_Yio
|
|
|
|

|
Hi there,
You seem to be very up on the subject, and the enumeration thing frankly scares me because I've inherited a project that uses ALOT of enumerations. They are the most OO friendly way of doing things IMHO, but it seems what you've descovered is that it's over 3 times more expensive time wise to use them!
Do you know of any standard structures (Queues, Stacks, etc) that have been implemented to support the IEnumerable interface in a lighter weight way that means enumeration is not quite so expensive in C#?
Thanks,
Paul
/**********************************
Paul Evans, Dorset, UK.
Personal Homepage "EnjoySoftware" @
http://www.enjoysoftware.co.uk/
**********************************/
|
|
|
|

|
I haven't tried it, but looking through the code in this article, I'm 99% sure that the slowness of enumerations is because of the boxing/unboxing of doubles to objects. Try replacing the doubles with an object reference type and you should see very different results.
Anyway, point is, I'm guessing "a project that uses ALOT of enumerations" probably isn't using them with value types like double, but with reference types (classes). Queues and Stacks in the System.Collections namespace should be very fast with reference types. Finally, even if you are using value types (ints/doubles/... anything derived from System.ValueType), I'd be very careful before you go out and start ripping code out because "it's using enumerations and they're slow." If you don't absolutely need the performance it's always better to use more easily understandable, maintainable OO designs than a less-elegant method just because it's a little faster. Remember, templates will likely be in the next major version of C#, and then all your value types can use fast enumerators too.
Btw, if you don't have a solid understanding of it, I'd go on to MSDN (probably some good articles here at CP too, I haven't looked) and read about boxing/unboxing in .NET. It happens very transparently, but it's a good thing to understand, especially if you're performance-conscious.
....
Just read some of the other comments on the article, they more or less cover what I was saying. See the "Enumeration performance" thread.
|
|
|
|

|
I sent in an update of this article to Code Project. It now has the DataEnumerator as a struct. The casting from objects to doubles has been eliminated. Performance of the Enumeration Method has improved, therefore, but it doesn't change the overall results.
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
|
|
|
|

|
Cheers for the article - it's useful information
|
|
|
|

|
One simple explanation for the relative poor performance for the Enumeration is the number of statements that is executed on each iteration. This can be reduced by changing the MoveNext method to read:
return (internal_.Array.Length > ++index_);
In all other iteration test you assume to know the size of the Data array by writing:
for ( int j = 0; j < size; j++ )
which is unfair, so change it to:
for (int j = 0; j < data.Array.Length; j++)
All in all this gives me:
Enumeration: 5,125 seconds
Indexing: 3,531 seconds
Indirect Arrays: 3,89 seconds
Direct Arrays: 3,75 seconds
Poiner: N/A
Better but still a difference !
|
|
|
|

|
Good article. Benchmarks should always be very important in the life of a programmer and there haven't been enough done in .NET yet. But is there any reason that you didn't try unsafe C# instead of managed C++?
|
|
|
|

|
I agree with you on the benchmarks. I started my company, CenterSpace Software, when I realized how good the performance is on the .NET platform. We have an excellent white paper on performance of numerical methods in .NET at NMath Core Performance Benchmarks
We'll get those Fortran programmers to C# someday.
About the C++... I picked managed C++ because our particular application used managed C++. No other reason. Anyone want to try this with unsafe C++?
Cheers,
Trevor Misfeldt
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
|
|
|
|
|

|
I made the explicit interface change. Also, changed the class to a struct as suggested by Don Box. New numbers were:
Enumeration: 13.265 seconds
Indexing: 4.015 seconds
Indirect Arrays: 4.046 seconds
Direct Arrays: 4.218 seconds
Pointer Math: 3.015 seconds
--
Trevor Misfeldt
CEO, CenterSpace Software
http://www.centerspace.net
misfeldt@centerspace.net
|
|
|
|

|
Great analysis. Thanks for the followup!
-- Andy
|
|
|
|

|
My times are quite different (.NET 2003, Framework 1.1, Athlon 1900+, 512MB RAM):
repetitions: 1000
iterations: 1,000000e+006
Enumeration: 17,103 seconds
Indexing: 11,886 seconds
Indirect Arrays: 11,876 seconds
Direct Arrays: 11,275 seconds
Pointer Math: 3,394 seconds
And my little trick:
- add member in DataEnumerator:
private int len_; (I would use m_nLen, but it's a detail
- add initialization of above member in DataEnumerator constructor:
len_ = internal_.Array.Length
- change some line in DataEnumerator::MoveNext to:
if (index_ >= len_)
Times after changes:
repetitions: 1000
iterations: 1,000000e+006
Enumeration: 15,781 seconds
Indexing: 11,636 seconds
Indirect Arrays: 11,626 seconds
Direct Arrays: 11,696 seconds
Pointer Math: 2,373 seconds
Enumeration test is little slower than others.
IMHO it is not worth to write ugly code to be little faster.
So: use enumerations to have pretty code and use PointerMath method if you use some advanced math algorithms.
|
|
|
|

|
Good idea with the "len_". I'll update mine and post an update.
I agree with your conclusions. I would never advocate the pointer math except in special circumstances.
--
Trevor Misfeldt
CEO, CenterSpace Software
misfeldt@centerspace.net
|
|
|
|

|
James,
Any info on when and how generics will be implemented?
I've read that they will be "similar" to C++ templates, and not available in VB.NET. Addition of generics should make C# work much easier and faster without the need for code generator type templates.
Oscar Bowyer
|
|
|
|

|
Oscar Bowyer wrote:
Any info on when and how generics will be implemented?
There was a powerpoint presentation from some conference a while back, but I can't remember where I saw it. But this might be made obsolete by some info below.
IIRC the syntax was similar to C++, but it added more to it. Such as letting the template writer specify that the template parameter implement a certain interface (such as IComparable for a template which does sorting).
Oscar Bowyer wrote:
I've read that they will be "similar" to C++ templates, and not available in VB.NET.
I've seen the same.
I guess you could call the research group at MS[^] the authority, and they have a new (May 1, 2003) release for the SSCLI[^] which adds generics to it. Also note the PDF file they released back in May of 2001 (while the framework was still in beta 2, IIRC).
James
"It is self repeating, of unknown pattern"
Data - Star Trek: The Next Generation
|
|
|
|

|
Thanks for the Info James. I had used the CollectionGen utility in the past for one of my VB.NET Collection Classes. It gave me a good understanding of how things are done. I also implemented "Keyed" lookups with the Strongly-Typed Collections. I since no longer use the CollectionGen because I have a very good Base class that I copy / paste for each Collection Class.
What I am wondering though, is there any way to speed up Object Lookups by using Keys? Currently, the only way I can determine if an Object exists with that Key is to enumerate every Object, look at the Key, and exit the Loop if it's found. Is there some type of Object Pointer or something I could be using?
Private Function GetIndex(ByVal Key As String) As Integer
Dim i As Integer
If (_Count = 0) Then Return -1
For i = 0 To _Items.Length - 1
If (Not IsNothing(_Items(i))) Then
If (_Items(i).Key = Key) Then Exit For
End If
Next i
If (i > _Files.Length - 1) Then Return -1 Else Return i
End Function
I also Implement a pretty advanced "Add" function which Determines if the Key exists (if adding with a Key) and Adds the Item if it doesn't, or Modifies the Existing Item if it does exist. This ensures that I only have the Key in the Collection one time.
I had thought to make a separate ArrayList or an Array of Strings to store the Keys, but it didn't seem very efficient to be ensuring that two Arrays stay In-Sync with each other.
This current Method of looking up the Key is fast enough, however, if it could be improved, I'd be all ears!
|
|
|
|

|
What you're describing is a dictionary-style collection. The framework has a generic one: System.Collections.Hashtable, and CollectionGen has one called HashTable.
HashTables work on the concept of buckets and hashes. The table maintains an array of 'buckets', each bucket is then also has an array or linked list of the key-value pairs you want to store.
The hash is a number generated from the key and is used as in index into the array of buckets, more on this later; for now just know that it is an Integer
To retreive a value out of the dictionary: Once it finds the correct bucket it then goes through all of the items in that bucket to find the key that is equal to the one used. If it finds the key then it returns that value, otherwise it returns Nothing
To add a value to the dictionary: Once it finds the correct bucket it goes through each of the items to see if the key already exists, if it doesn't then it adds your key-value pair to that bucket. If it does find your key it either replaces the value stored there or it raises an exception (if you are using the Item property or Add method, respectively)
While HashTables have a lot going on behind the scenes, there is one major downfall to them. The performance of the table relies on a good hashing algorithm. Unlike Java, .NET lets any object serve as the key to a HashTable because System.Object has a GetHashCode method which is supposed to return the hash for that object.
From the documentation, GetHashCode's default behavior is suitable unless you override the Equals method and change it from using reference equality to something else.
This adds a frusturating aspect to using your own classes in a HashTable. If two objects are equal (the Equals method returns true) then the hash returned by GetHashCode must be the same. So for the duration that your object is serving as a key it must be immutable. Most objects that people write aren't this way so they make poor candidates for use as a key.
HTH,
James
"I despise the city and much prefer being where a traffic jam means a line-up at McDonald's"
Me when telling a friend why I wouldn't want to live with him
|
|
|
|

|
Hmm, Ok, I believe I understand. Although, I do not believe I am using a HashTable. The Implementation I am using is the exact same from the CollectionGen for a Strongly-Typed Collection (which uses an Array of the Type of Object / Class for that Collection). So, if what you're saying above is correct...then I guess there is no other faster method of finding a key than to just look one up by Iterating over the Array of Objects I am storing?
|
|
|
|

|
mikasa wrote:
then I guess there is no other faster method of finding a key than to just look one up by Iterating over the Array of Objects I am storing
Not unless you change some of the structure you are using now, you are closely emulating a HashTable so you may want to take the plunge and fully implement one.
James
"I despise the city and much prefer being where a traffic jam means a line-up at McDonald's"
Me when telling a friend why I wouldn't want to live with him
|
|
|
|
 |
|
|
General News Suggestion Question Bug Answer Joke Rant Admin
|
Article on the relative performance of various methods of iterating through large amounts of data.
| Type | Article |
| Licence | |
| First Posted | 30 Apr 2003 |
| Views | 86,906 |
| Bookmarked | 25 times |
|
|