|
|||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
Note: This is an unedited contribution. If this article is inappropriate,
needs attention or copies someone else's work without reference then please
Report This Article
Fast(er) String BuildingIntroductionSo many articles have been written about the string versus StringBuilder topic that there really isn’t anything left for anyone to write about. Or is there? Is there something we all overlooked? Let’s recap: don’t use string for a lot of string operations on a string, do use StringBuilder when speed and efficiency is needed while doing a multitude of string operations. All is good and well; expect that StringBuilder has a major drawback. One that we never seem to talk about, memory usage. BackgroundI was in the situation recently where I had to write an application that would perform a huge number of operations on strings in the least amount of time. As per normal, I had to design, develop and implement the code as soon as it is humanly possible to do, so needless to say not a lot of time was invested into the design aspect of the code. Long story short, I wrote the application and to my amazement (in retrospect) I had used the string class for all string operations. Big mistake.
At first the problems weren’t noticeable, the program ran as it should (bar a few normal haste-bugs) but I noticed that the longer it ran the slower my strings were being built. On top of this it seemed like I had a memory leak or something was using an obscene amount of memory.
In comes the StringBuilder class. At first it seemed to be the answer to all my problems and would solve all the obvious design flaws I had ignorantly implemented.
Using the StringBuilder class immediately gave me a huge performance increase (see the statistical graphs below for some benchmarking) with strings being built in a fraction of time it used to take. The memory problem however was not solved. In fact I started getting Strings are immutable. There I had to say it at least once in this article to give it that certain bit of credibility. What this means is that when a value is assigned to a string, that value can’t ever be changed. Sure the string can change, but only because the initial value it was given is disregarded and substituted with the new one. Look at the following example: string myString = "This is my string"; myString += ", that is now longer."; In the first statement myString gets set to a value. In the second statement the string is extended with another string which should give us a string
Fair enough, I could have set the initial size of my StringBuilder’s internal string but the problem is that I don’t know beforehand what the size of the string would end up being and I couldn’t allocate that amount of memory for every string I was going to build (on average there are 400 strings being built at the same time). The solution should be obvious at this point: instead of increasing the internal string size by double, why not increase it with a smaller amount? Well, easier said than done. The StringBuilder class is a sealed class it would turn out, which means that it can’t be inherited from. The reason for this is unknown to me but might be due to the fact the StringBuilder class is a special class that gets handled by the runtime and JIT compiler and thus has optimization etc. in place that might be lost during inheritance. Which in itself is probably not too bad, because that is the reason why the StringBuilder is such an amazingly fast class, but that doesn’t help me with my problem. It was then suggested to me that I use some obscure Using the CodeI should add at this point that I haven’t written anything else into the code that I did not need and that the component was tailored for my needs, but being a normal .Net class could be extended upon very easily. As it stands I have two supported methods: String String myString; myString = "This is the first part"; myString += ", this is the second part" myString += "and the last part"; Console.WriteLine(myString);StringBuilder StringBuilder myString = new StringBuilder(); myString.Append("This is the first part"); myString.Append(",this is the second part"); myString.Append("and the last part."); Console.WriteLine(myString.ToString());FastStringBuilder FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder(); myString.Append("This is the first part"); myString.Append(",this is the second part"); myString.Append("and the last part"); Console.WriteLine(myString.ToString());You can see that, except for the FastStringBuilder part, there is virtually no difference between the way you would use the StringBuilder class to the way you would use mine. Where the difference comes in is the fact that you can specify the increase size during the initialization phase as such:
myString = new FastStringBuilder.StringBuilder(100); This tells the FastStringBuilder class to increase the internal string size (when an overflow will occur) with 100 characters. Obviously you would want to find the right balance between time spent increasing the buffer versus memory used by the internal string, but with my component you at least have the ability to find a balance. The InternalsWhat happens internally in my component is that an array of characters will be kept in memory and every time you append a string to it, it will add the characters in the string into the internal character array. When a would-be overflow is detected, the internal buffer is increased by the set number of characters and the string is added as per normal. I have also written in the ability for you to specify the initial size of the internal character array as well as the initial value of the string. Just to provide you with a bit more options.
When the Points of InterestTest1: Append a string containing 100 characters to an initially empty string, 10000 times. String Using the following code string hundredAstring = new string('a', 100); string myString = ""; Console.WriteLine(" Started with 'string' test ..."); DateTime startTime = DateTime.Now; for (int i = 0; i < 10000; i++) myString += hundredAstring; TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks); Console.WriteLine(" Finished"); Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds);
StringBuilder Using the following code string hundredAstring = new string('a', 100); StringBuilder myString = new StringBuilder(); Console.WriteLine(" Started with 'StringBuilder' test ..."); DateTime startTime = DateTime.Now; for (int i = 0; i < 10000; i++) myString.Append(hundredAstring); TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks); Console.WriteLine(" Finished"); Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds);
FastStringBuilder Using the following code string hundredAstring = new string('a', 100); FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder(); Console.WriteLine(" Started with 'FastStringBuilder' test ..."); DateTime startTime = DateTime.Now; for (int i = 0; i < 10000; i++) myString.Append(hundredAstring); TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks); Console.WriteLine(" Finished"); Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds);
Not too bad, but not good enough. Remember the flexibility aspect I was talking about and the tweaking that you are possible to do? Well, I ran the same test but this time I set the increase size to 5000 because the default increase size is a 1000 and this was the result: Using the following code: string hundredAstring = new string('a', 100); FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder(5000); Console.WriteLine(" Started with 'FastStringBuilder' test ..."); DateTime startTime = DateTime.Now; for (int i = 0; i < 10000; i++) myString.Append(hundredAstring); TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks); Console.WriteLine(" Finished"); Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds + " (milliseconds : " + timeTaken.Milliseconds + ")");
Findings: StringBuilder is still the fastest of the lot, but as it will be shown in the next sample loses the plot somewhat. FastStringBuilder is second by being ~20 times faster than normal string operations and ~20 times slower than StringBuilder. Normal string operations are left at the back of the pack with a bad 30 seconds start to finish time. Test2: Append a string containing 100 characters to an initially empty string an infinite amount of times for a maximum of 30 seconds (In all the results below the code was kept the same as above except the iteration condition was changed so it would never leave the for loop) String
StringBuilder
FastStringBuilder
Findings: This is where FastStringBuilder showcases the fact that it has the best features from both the string and StringBuilder worlds. Both FastStringBuilder and string operations used its fair share in CPU cycles as well as its allocated memory. They both ran for 30 seconds at nearly the same stats, but from the previous tests we know that FastStringBuilder would have processed many more appends than the string method could. StringBuilder failed to run for 30 seconds, and after just (lets round it off) 5 seconds it threw an “Out of memory” exception. Here you can see that StringBuilder merely thunders along its processing path, not caring about the fact the with every increase it doubles the memory footprint ConclusionStringBuilder is still hands down the fastest way of appending to strings, although it is the least memory efficient when dealing with a large number of big sizes strings. String operators such as the + and += should only be used if your operations won’t occur too frequently and the strings you are operating on aren’t bigger than the most basics of strings. My custom created component appends to strings the same way in which StringBuilder does but allows for more control over the increase size so subsequently memory usage is much better than the traditional StringBuilder. I am sure once you start playing around with the component you will want to make some tweaks or identify possible feature additions which would make the component even more usable or faster, but if you want the speed minus the memory usage there is simply no other choice out there at the time this document was written. | ||||||||||||||||||||||||||||