StringBuilder vs. String / Fast String Operations with .NET 2.0

30 Mar 2007, CPOL
Comparison of String and StringBuilder functions. Efficient string handling.

Introduction

Strings are so heavily used in all programming languages that we do not think about them very much. We simply use them and hope we are doing the right thing. Normally all goes well, but sometimes we need more performance, so we switch to StringBuilder, which is more efficient because it contains a mutable string buffer. .NET strings are immutable, which is why a new string object is created every time we alter one (insert, append, remove, etc.).
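To make the difference concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of the test code used for the charts. It shows that String.Insert leaves the original string untouched, while StringBuilder edits a single buffer in place.

```csharp
using System;
using System.Text;

static class ImmutabilityDemo
{
    // Returns true when String.Insert left the original string untouched.
    public static bool StringIsUnchangedAfterInsert()
    {
        string original = "fox";
        string altered = original.Insert(0, "quick "); // allocates a new string
        return original == "fox" && altered == "quick fox";
    }

    // Returns the content of one StringBuilder after two in-place edits.
    public static string BuilderEditedInPlace()
    {
        StringBuilder sb = new StringBuilder("fox");
        sb.Insert(0, "quick ");  // modifies the internal buffer
        sb.Append(" jumps");     // same instance, no new string yet
        return sb.ToString();    // the string object is created here
    }
}
```

StringIsUnchangedAfterInsert() returns true, and BuilderEditedInPlace() returns "quick fox jumps".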

That sounds reasonable, so why do we still use the .NET String class functions and not the faster StringBuilder? Because optimal performance is a tricky thing, and the first rule of the performance club is to measure it for yourself. Do not believe anybody (including me!) who tells you that this or that is faster in every case. It is very difficult to predict the performance of code in advance because so many variables influence the outcome. Looking at the generated MSIL code still does NOT tell you how fast the code will run. If you want to see why your function is so slow or fast, you have to look at the JIT-compiled x86 assembler code to get the full picture.

Greg Young wrote some very nice posts about what the JIT compiler makes of your MSIL code on your CPU. In this article I will show you the numbers for StringBuilder vs. String, which I measured with .NET 2.0 on a P4 3.0 GHz with 1 GB RAM. Every test was performed 5 million times to get a stable value.
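If you want to repeat such measurements, a timing loop along the following lines is one way to do it. This is only a sketch under my own assumptions: the Bench class, the TestAction delegate and the warm-up call are illustrative and are not the harness that produced the charts in this article. Stopwatch lives in System.Diagnostics and is available since .NET 2.0.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Hypothetical delegate type for the code under test.
    public delegate void TestAction();

    // Runs the action 'iterations' times and returns the elapsed milliseconds.
    public static long Measure(int iterations, TestAction action)
    {
        action(); // warm-up call so JIT compilation is not part of the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }
}
```

A call such as Bench.Measure(5000000, delegate { "fox".Insert(0, "quick "); }) times five million String.Insert calls. Run it in a release build without a debugger attached, and repeat the run a few times before trusting a number.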

Insert a String / Remove a character from one

I inserted the missing words at the beginning of the sentence "The quick brown fox jumps over the lazy dog" to find the break-even point between String.Insert and StringBuilder.Insert. To see how the removal of characters performs, I removed one character at a time from the beginning of the test sentence in a for loop. The results are shown in the diagram below.

Screenshot - StringInsert.JPG
C#
// Test functions used for this chart
string StringRemove(string str, int Count)  
{
    for(int i=0;i<Count;i++)
        str = str.Remove(0, 1);

    return str;
}

string StringBuilderRemove(string str, int Count)
{
    StringBuilder sb = new StringBuilder(str);
    for(int i=0;i<Count;i++)    
        sb.Remove(0, 1);

    return sb.ToString();        
} 

string StringInsert(string str, string [] inserts)
{
    foreach (string insert in inserts)
        str = str.Insert(0, insert);


    return str;        
}

string StringBuilderInsert(string str, string [] inserts)
{          
    StringBuilder sb = new StringBuilder(str);
    foreach (string insert in inserts)
        sb.Insert(0, insert);

    return sb.ToString();      
}

We see here that StringBuilder is clearly the better choice if we have to alter the string. Insert and Remove operations are nearly always faster with StringBuilder. The removal of characters is especially fast with StringBuilder, where we gain nearly a factor of two.

Replace one String with another String

Things become more interesting when we replace anywhere from one to five words of our fox test sentence.

Screenshot - StringReplace.JPG
C#
// Test functions used for this chart
string StringReplace(string str, 
                     List<KeyValuePair<string,string>> searchReplace)
{
    foreach (KeyValuePair<string, string> sreplace in searchReplace)
        str = str.Replace(sreplace.Key, sreplace.Value);

    return str;
}

string StringBuilderReplace(string str, 
                            List<KeyValuePair<string,string>> searchReplace)
{
    StringBuilder sb = new StringBuilder(str);
    foreach (KeyValuePair<string, string> sreplace in searchReplace)
        sb.Replace(sreplace.Key, sreplace.Value);

    return sb.ToString();
}

This is somewhat surprising. StringBuilder does not beat String.Replace even if we do many replaces. The data show a roughly constant overhead of about 1s that we pay if we use StringBuilder. The overhead is quite significant (30%) when we have only a few replacements to do.

String.Format

I checked at which point StringBuilder.AppendFormat becomes better than String.Format when the formatted results are appended with the "+" operator.

Screenshot - StringFormat.JPG
C#
// Functions used for this chart
string StringFormat(string format, int Count, params object[] para)
{
    string str = String.Empty;
    for(int i=0;i<Count;i++)
        str += String.Format(format, para);

    return str;
}

string StringBuilderFormat(string format, int Count, params object[] para )
{ 
    StringBuilder sb = new StringBuilder(); 
    for(int i=0;i<Count;i++) 
        sb.AppendFormat(format, para); 

    return sb.ToString(); 
}

StringBuilder is better when you have to format and concatenate a string more than five times. You can shift the break-even point even further if you recycle the StringBuilder instance.
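Recycling could look like the following sketch. The RecyclingFormatter class and its initial capacity of 256 are assumptions made for illustration and were not part of the measured test code. Setting Length to 0 discards the old content, and the builder normally keeps its already allocated buffer.

```csharp
using System.Text;

class RecyclingFormatter
{
    // One builder is kept alive and reused for every call.
    private readonly StringBuilder sb = new StringBuilder(256);

    public string Format(string format, int count, params object[] para)
    {
        sb.Length = 0; // drop the previous content instead of allocating a new builder
        for (int i = 0; i < count; i++)
            sb.AppendFormat(format, para);

        return sb.ToString();
    }
}
```

Because the instance holds state between calls, it must not be shared between threads without synchronization.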

String Concatenation

This is the most interesting test because we have several options here. We can concatenate strings with +, String.Concat, String.Join and StringBuilder.Append.

Screenshot - StringConcat.JPG
C#
// Test functions used for this chart
string Add(params string[] strings)
{
    string ret = String.Empty;    
    foreach (string str in strings)
        ret += str;

    return ret;
}

string Concat(params string[] strings)
{
    return String.Concat(strings);
}

string StringBuilderAppend(params string[] strings)
{
    StringBuilder sb = new StringBuilder();
    foreach (string str in strings)
        sb.Append(str);

    return sb.ToString();
}

string Join(params string[] strings)
{
    return String.Join(String.Empty, strings);
}

And the winner for string concatenation is ... not StringBuilder but String.Join! After taking a deep look with Reflector, I found that String.Join has the most efficient algorithm: in a first pass it determines the final size and allocates the buffer, and then it copies each string into that buffer with a memory copy. This is simply unbeatable. StringBuilder does become better than the + operator above 7 strings, but this is not code one would see very often.
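The two-pass idea can be sketched in plain managed code. This is not the framework's implementation, and the TwoPassConcat class is a made-up name. It only illustrates the technique of sizing the buffer once and then copying every string to its final position. Unlike the framework code, the sketch pays for one extra copy when the char array is turned into a string.

```csharp
using System;

static class TwoPassConcat
{
    public static string Concat(params string[] strings)
    {
        // Pass 1: compute the final length so the buffer is allocated only once.
        int total = 0;
        foreach (string s in strings)
            if (s != null)
                total += s.Length;

        // Pass 2: copy each string to its final position in the buffer.
        char[] buffer = new char[total];
        int pos = 0;
        foreach (string s in strings)
        {
            if (s == null)
                continue; // null entries are treated like empty strings
            s.CopyTo(0, buffer, pos, s.Length);
            pos += s.Length;
        }

        return new string(buffer);
    }
}
```

TwoPassConcat.Concat("The ", "quick ", "fox") returns "The quick fox". No intermediate strings are created, which is what makes this approach cheaper than repeated use of the + operator.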

Comparing Strings

String comparison is an often underestimated topic. To compare Unicode strings, your current locale settings have to be taken into account. Unicode characters with values greater than 65535 do not fit into the .NET Char type, which is 16 bits wide. These characters are quite common, especially in Asian countries, which complicates the matter even more (for example for case-insensitive comparisons). The culture-aware comparison function of .NET 2.0 (I guess this is true for .NET 1.x also) is implemented in native code, which costs you a managed-to-unmanaged transition and back.
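A short sketch shows what "does not fit into a Char" means in practice. The character U+1D11E (musical symbol G clef) is chosen here only as an example of a code point above 65535, and the SurrogateDemo class is illustrative.

```csharp
using System;
using System.Globalization;

static class SurrogateDemo
{
    // One Unicode character above 65535, stored as two 16-bit Char values.
    public const string GClef = "\U0001D11E";

    // Number of 16-bit Char values in the string: 2
    public static int CharCount() { return GClef.Length; }

    // True: the two Char values form a surrogate pair.
    public static bool IsPair() { return Char.IsSurrogatePair(GClef, 0); }

    // The real code point, 0x1D11E.
    public static int CodePoint() { return Char.ConvertToUtf32(GClef, 0); }

    // Number of text elements as the user sees them: 1
    public static int TextElements() { return new StringInfo(GClef).LengthInTextElements; }
}
```

Any code that walks a string Char by Char therefore sees two values for this single character.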

Screenshot - StringCompare.JPG
C#
// Test functions used for this chart
int StringCompare(string str1, string str2) 
{
    return String.Compare(str1, str2, StringComparison.InvariantCulture);
}


int StringCompareOrdinal(string str1, string str2)
{
    return String.CompareOrdinal(str1, str2);
}

It is good that we compared the string comparison functions. A factor of 3 is really impressive and shows that localization comes with a cost that is not always negligible. Even the innocent-looking mode StringComparison.InvariantCulture goes into the same slow native function, which explains this big difference. When strings are interned, the comparison is much faster (by more than a factor of 30) because the CLR makes a check for reference equality.
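The effect of interning on reference equality can be seen in a small sketch. The InternDemo class is illustrative only. The two strings are built at run time so that they start out as two distinct objects with the same content.

```csharp
using System;

static class InternDemo
{
    // Returns { same object before interning, same object after interning, equal content }.
    public static bool[] Run()
    {
        string a = new string('x', 3);
        string b = new string('x', 3);

        bool sameBefore = Object.ReferenceEquals(a, b); // false: two objects
        bool sameAfter = Object.ReferenceEquals(String.Intern(a), String.Intern(b)); // true: one pooled object

        return new bool[] { sameBefore, sameAfter, a == b };
    }
}
```

Run() returns { false, true, true }. Once both references point to the same pooled object, a comparison can stop after the reference check and never has to look at the characters.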

To tell the truth, the factor of 3 surprised me as well, and for a long time I did not know the use of this strange CompareOrdinal function. String.CompareOrdinal does nothing more than compare the strings char by char (16 bits each, remember), which is done 100% in managed code. That allows the JIT compiler to flex its optimizing muscles, as you can see. If somebody asks you what CompareOrdinal is good for, you now know the answer. You can (and should) use this function on strings that are not visible to the outside world (users) and are therefore never localized. Only then is it safe to use. Remember: making a program fast but incorrect is easy; making it both correct and fast is hard. If you mainly deal with UI code, it is a good bet that you should forget this function very quickly.
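Why the ordinal comparison is only safe for non-localized strings becomes visible as soon as the two modes disagree about the order. The following sketch uses a made-up CompareDemo class and reduces each result to its sign with Math.Sign.

```csharp
using System;

static class CompareDemo
{
    // Sign of the ordinal comparison: compares the raw 16-bit char values.
    public static int Ordinal(string x, string y)
    {
        return Math.Sign(String.CompareOrdinal(x, y));
    }

    // Sign of the culture-aware comparison with the invariant culture.
    public static int Invariant(string x, string y)
    {
        return Math.Sign(String.Compare(x, y, StringComparison.InvariantCulture));
    }
}
```

CompareDemo.Ordinal("a", "B") returns 1 because the char value of 'a' (97) is greater than that of 'B' (66). With the framework's regular culture data, CompareDemo.Invariant("a", "B") returns -1 because linguistic sorting puts "a" before "B". A list sorted with the ordinal function would therefore look wrong to a user.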

Conclusions

The following recommendations are valid for our small test strings (~30 chars) but should be applicable to bigger strings (100-500 chars) as well (measure for yourself!). I have seen many synthetic performance measurements that demonstrate the power of StringBuilder with strings that are 10 KB and bigger. This is the 1% case in real-world programs. Most strings will be significantly shorter. When you optimize a function and you can "feel" the construction costs of an additional object, you have to look very carefully at whether you can afford the additional initialization costs of StringBuilder.

String Operation | Most Efficient
Insert           | StringBuilder.Insert for > 2 insertion strings; String.Insert otherwise
Remove           | StringBuilder is faster for > 2 characters to remove
Replace          | String.Replace always
Format           | String.Format for < 5 append + format operations; StringBuilder.AppendFormat for > 5 calls
Concatenation    | + for 2 strings; String.Join for > 2 strings to concatenate

The shiny, performance-saving StringBuilder does not help in all cases and is in some cases slower than other functions. If you want good string concatenation performance, I strongly recommend String.Join, which does an incredible job.

Points of Interest

  • I did not tell you more about the String.Intern function. You only need to know more about string interning if you want to save memory at the cost of processing power.
  • If you want to see a good example of how you can speed up string formatting by a factor of 14 for fixed-length strings, have a look at my blog.
  • Did you notice that there is no String.Reverse in .NET? You would rarely need that function anyway, but Greg put up a little contest to find the fastest String.Reverse function. The functions presented there are fast but do not work correctly with surrogate Unicode characters (characters with a value > 65535). Making it fast and correct is not easy.
  • The test results obtained here are .NET Framework, machine and string length specific. Please do not simply look at the numbers and use this or that function without being certain that the results obtained here are applicable to your concrete problem.
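As an illustration of the String.Reverse remark above, here is a sketch of a reverse function that keeps surrogate pairs intact. It is not one of the contest entries, the SafeReverse class is a made-up name, and it handles surrogate pairs only. Combining character sequences would need further work.

```csharp
using System;

static class SafeReverse
{
    public static string Reverse(string s)
    {
        char[] result = new char[s.Length];
        int write = 0;

        for (int read = s.Length - 1; read >= 0; read--)
        {
            // A low surrogate preceded by a high surrogate is one character:
            // copy both chars in their original order.
            if (read > 0 && Char.IsLowSurrogate(s[read]) && Char.IsHighSurrogate(s[read - 1]))
            {
                result[write++] = s[read - 1];
                result[write++] = s[read];
                read--; // skip the high surrogate that was just copied
            }
            else
            {
                result[write++] = s[read];
            }
        }

        return new string(result);
    }
}
```

SafeReverse.Reverse("abc") returns "cba", and a string such as "a\U0001D11Eb" comes back as "b\U0001D11Ea" with the pair still in the right order. A naive char-by-char reverse would swap the two halves of the pair and produce an invalid string.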

History

  • 28.7.2006 Fixed download. Fine-tuned the coloring of the charts to make them more readable.
  • 27.7.2006 Updated String Comparison graph. Interned string comparison is much faster.
  • 27.7.2006 Fixed bug in String.Concat diagram. The numbers below 3 string concats were wrong. Thanks Greg for pointing this out.
  • 27.7.2006 Changed String.Format diagram to show the full picture of when StringBuilder outperforms String.Format and Concat.
  • 26.7.2006 Released v1.0 on CodeProject

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Systems Engineer Siemens
Germany
He works for a multinational company that is a hardware and software vendor of medical equipment. Currently he is located in Germany and enjoys living in general. Although he finds pretty much everything interesting, he pays special attention to .NET software development, software architecture and nuclear physics. To complete the picture, he likes hiking in the mountains and collecting crystals.
