First Posted 4 Jul 2005

Fastest C# Case Insensitive String Replace

By Huisheng Chen | 4 Jul 2005 | Article
A fast way to perform case-insensitive string replacement.

Introduction

When dealing with data, we often need to manipulate strings (a string can be the final form of any data, even binary), and replacement is one of the most frequent operations. String replacement in .NET looks easy because there are several ways to do it: String.Replace, System.Text.RegularExpressions.Regex, iterating with String.Substring, or even the Strings.Replace function from the Microsoft Visual Basic runtime (Microsoft.VisualBasic.dll). However, each of them has drawbacks, so this article presents a faster method for case-insensitive string replacement.

Background

I am developing a web resource mining system that has to perform a large volume of string replacements on site pages. Because the same text may appear in different upper/lower-case forms, case insensitivity is a core requirement, and speed matters just as much. However, I could not find a satisfactory standalone (pure C#), fast, case-insensitive string replacement function.

Using the code

In .NET, without using C++/CLI, there are a few ways to perform string replacement:

  1. The most commonly used method is String.Replace, but it does not support case insensitivity.
  2. System.Text.RegularExpressions.Regex can be made case insensitive with RegexOptions.IgnoreCase, but it is not very efficient.
  3. Iterating with String.Substring and normal string concatenation (+) seems to be faster than Regex when there are only a few replacements.
  4. The same as method 3, but using StringBuilder instead of "+". It is faster because every + allocates a new string and copies both operands into it, whereas StringBuilder appends into a growable buffer. There are a few articles about this on CodeProject.
  5. Reference the Microsoft Visual Basic runtime (Microsoft.VisualBasic.dll) and use its Strings.Replace function, which is fast! But some C# users don't like depending on it.
  6. Hey, why not port method 5? Using Reflector with Denis Bauer's Reflector.FileDisassembler add-in, I disassembled the source code of Strings.Replace (along with related functions such as Strings.Split and Strings.Join). The port is just as fast as method 5, but does not need the Microsoft Visual Basic runtime.
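As a point of reference for method 2, here is a minimal sketch of a Regex-based case-insensitive replace. The class name RegexReplaceSketch is hypothetical. Because Regex.Replace treats its pattern as a regular expression and its replacement as a substitution string, a literal search needs Regex.Escape on the pattern and every "$" doubled in the replacement; adding RegexOptions.CultureInvariant is an assumption made here so the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceSketch
{
    // Case-insensitive literal replacement via Regex (method 2).
    public static string ReplaceIgnoreCase(string original, string pattern, string replacement)
    {
        // Escape metacharacters such as '.', '*' or '(' so the pattern is matched literally.
        string escapedPattern = Regex.Escape(pattern);

        // In a .NET substitution string "$$" stands for a literal '$',
        // so "$1" in the input is not treated as a group reference.
        string escapedReplacement = replacement.Replace("$", "$$");

        return Regex.Replace(original, escapedPattern, escapedReplacement,
                             RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, ReplaceIgnoreCase("A.B axb a.b", "a.b", "$1") returns "$1 axb $1": the dot only matches a literal dot, and "$1" is inserted as plain text.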

Well, is there a method that performs better, or even much better? Here is the super fast string replacement method:

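// Requires: using System; (for Math and StringComparison)
private static string ReplaceEx(string original, 
                    string pattern, string replacement)
{
    // Guard: an empty pattern would cause a division by zero below.
    if (string.IsNullOrEmpty(pattern)) return original;
    int count, position0, position1;
    count = position0 = position1 = 0;
    // Search in upper-case copies so that matching ignores case.
    // ToUpper() maps char by char, so indexes in upperString
    // line up with indexes in original.
    string upperString = original.ToUpper();
    string upperPattern = pattern.ToUpper();
    // Worst case: the whole input consists of back-to-back matches.
    int inc = (original.Length/pattern.Length) * 
              (replacement.Length-pattern.Length);
    char [] chars = new char[original.Length + Math.Max(0, inc)];
    // Ordinal search: the default culture-sensitive IndexOf may match
    // a span whose length differs from pattern.Length.
    while( (position1 = upperString.IndexOf(upperPattern, 
                          position0, StringComparison.Ordinal)) != -1 )
    {
        // Copy the text before the match, then the replacement.
        for ( int i=position0 ; i < position1 ; ++i )
            chars[count++] = original[i];
        for ( int i=0 ; i < replacement.Length ; ++i )
            chars[count++] = replacement[i];
        position0 = position1+pattern.Length;
    }
    // No match found: return the input unchanged.
    if ( position0 == 0 ) return original;
    // Copy the tail after the last match.
    for ( int i=position0 ; i < original.Length ; ++i )
        chars[count++] = original[i];
    return new string(chars, 0, count);
}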
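The size of the char buffer in ReplaceEx comes from a worst-case bound. Writing n = original.Length, m = pattern.Length, r = replacement.Length, and k for the number of non-overlapping matches (notation introduced here only for illustration), the result length can be bounded as follows; for non-negative operands, C# integer division is the floor shown here.

```latex
k \le \left\lfloor \frac{n}{m} \right\rfloor, \qquad
|\text{result}| = n + k\,(r - m) \le n + \max\!\left(0,\ \left\lfloor \frac{n}{m} \right\rfloor (r - m)\right)
```

When r >= m the bound is reached with the largest possible k; when r < m every match shortens the text, so n alone suffices. For example, replacing "abc" in "abcabc" with "Some" gives n = 6, m = 3, r = 4, so the bound is 6 + 2(4 - 3) = 8, exactly the length of "SomeSome". For m = 0 (an empty pattern) the bound is undefined and C# integer division throws DivideByZeroException, so an empty pattern has to be rejected before this computation.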

Now let's do some comparisons: the test first builds a long string by repeating a segment 1,000 times, then runs each replacement method 1,000 times on it. String.Replace is included only to show the cost of case insensitivity. Please remember: String.Replace is case sensitive, so here it replaces only the lower-case "abc"!

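// Requires: using System; using System.Text;
// using System.Text.RegularExpressions; using Microsoft.VisualBasic;
// plus a reference to Microsoft.VisualBasic.dll. HiPerfTimer (a timer
// reporting Duration in seconds), VBString (the port from method 6)
// and StringHelper (methods 3 and 4) are helper classes that are
// not shown in this article.
static void Main(string[] args)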
{
    string segment = "AaBbCc";
    string source;
    string pattern = "AbC";
    string destination = "Some";
    string result = "";
    
    const long count = 1000;
    StringBuilder pressure = new StringBuilder();
    HiPerfTimer time;

    for (int i = 0; i < count; i++)
    {
        pressure.Append(segment);
    }
    source = pressure.ToString();
    GC.Collect();

    //regexp
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Regex.Replace(source, pattern, 
                  destination, RegexOptions.IgnoreCase);
    }
    time.Stop();

    Console.WriteLine("regexp    = " + time.Duration + "s");
    GC.Collect();

    //vb
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Strings.Replace(source, pattern, 
                   destination, 1, -1, CompareMethod.Text);
    }
    time.Stop();

    Console.WriteLine("vb        = " + time.Duration + "s");
    GC.Collect();


    //vbReplace
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = VBString.Replace(source, pattern, 
                   destination, 1, -1, StringCompareMethod.Text);
    }
    time.Stop();

    Console.WriteLine("vbReplace = " + time.Duration + "s");// + result);
    GC.Collect();


    // ReplaceEx
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Test.ReplaceEx(source, pattern, destination);
    }
    time.Stop();

    Console.WriteLine("ReplaceEx = " + time.Duration + "s");
    GC.Collect();


    // Replace
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = source.Replace(pattern.ToLower(), destination);
    }
    time.Stop();

    Console.WriteLine("Replace   = " + time.Duration + "s");
    GC.Collect();


    //sorry, too slow :(
    /*//substring
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = StringHelper.ReplaceText(source, pattern, 
                   destination, StringHelper.CompareMethods.Text);
    }
    time.Stop();

    Console.WriteLine("substring =" + time.Duration + ":");
    GC.Collect();


    //substring with stringbuilder
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = StringHelper.ReplaceTextB(source, pattern, 
                    destination, StringHelper.CompareMethods.Text);
    }
    time.Stop();

    Console.WriteLine("substringB=" + time.Duration + ":");
    GC.Collect();
    */

    Console.ReadLine();
}
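HiPerfTimer is not part of the .NET class library, and its source is not shown in the article. As a hedged alternative, the same kind of measurement can be sketched with System.Diagnostics.Stopwatch. The helper name BenchmarkSketch.TimeSeconds is hypothetical, and the warm-up call is an added assumption so that JIT compilation of the measured code is not included in the timing.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Runs 'action' the given number of times and returns the elapsed
    // wall-clock time in seconds, measured with Stopwatch.
    public static double TimeSeconds(Func<string> action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not measured

        // Start each measurement from a freshly collected heap,
        // as the original benchmark does with GC.Collect().
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        return watch.Elapsed.TotalSeconds;
    }
}
```

Usage, with the variables from Main and using System.Text.RegularExpressions: double seconds = BenchmarkSketch.TimeSeconds(() => Regex.Replace(source, pattern, destination, RegexOptions.IgnoreCase), 1000);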

The results (each ranking line lists the case-insensitive methods from fastest to slowest; Replace is the case-sensitive String.Replace, shown only for reference):

1. string segment = "abcaBc";
regexp    = 3.75481827997692s
vb        = 1.52745502570857s
vbReplace = 1.46234256029747s
ReplaceEx = 0.797071415501132s !!!
Replace   = 0.178327413120941s (reference)
// ReplaceEx > vbReplace > vb > regexp

2. string segment = "abcaBcabC";
regexp    = 5.30117431126023s
vb        = 2.46258449048692s
vbReplace = 2.5018721653171s
ReplaceEx = 1.00662179131705s !!!
Replace   = 0.233760994763301s (reference)
// ReplaceEx > vb > vbReplace > regexp

3. string segment = "abcaBcabCAbc";
regexp    = 7.00987862982586s
vb        = 3.61050301085753s
vbReplace = 3.42324876485699s
ReplaceEx = 1.14969947297771s !!!
Replace   = 0.277254511397398s (reference)
// ReplaceEx > vbReplace > vb > regexp

4. string segment = "ABCabcAbCaBcAbcabCABCAbcaBC";
regexp    = 13.5940090151123s
vb        = 11.6806222578568s
vbReplace = 11.1757614445411s
ReplaceEx = 1.70264153684337s !!! (my god!)
Replace   = 0.42236820601501s (reference)
// ReplaceEx > vbReplace > vb > regexp

OK, is the ReplaceEx function really the fastest in all conditions?

5. string segment = "AaBbCc";
regexp    = 0.671307945562914s
vb        = 0.32356849823092s
vbReplace = 0.316965703741677s !!!
ReplaceEx = 0.418256510254795s
Replace   = 0.0453026851178013s (reference)
// vbReplace > vb > ReplaceEx > regexp

Why? The bottleneck is:

string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();

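The segment "AaBbCc" contains no occurrence of "AbC", so nothing is replaced. ReplaceEx nevertheless creates upper-case copies of both strings with ToUpper() (and allocates the output buffer) before it discovers this, and then simply returns the original string, so that time is wasted.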
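One way to avoid that cost, sketched below and not benchmarked here, is to search the original string directly with StringComparison.OrdinalIgnoreCase: no upper-case copies are needed, and the input is returned as soon as the first search finds nothing. The class name ReplaceSketch is hypothetical, and OrdinalIgnoreCase is a swapped-in technique: it folds case with invariant rules, so it can disagree with the culture-sensitive ToUpper() used by ReplaceEx for some characters (for example the Turkish dotted and dotless i). Because ordinal matching compares UTF-16 code units one by one, a match always spans exactly pattern.Length characters.

```csharp
using System;
using System.Text;

public static class ReplaceSketch
{
    // Case-insensitive replace without ToUpper() copies (a variant of ReplaceEx).
    public static string ReplaceIgnoreCase(string original, string pattern, string replacement)
    {
        if (original == null) throw new ArgumentNullException(nameof(original));
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        replacement = replacement ?? string.Empty;

        // Early exit: when there is no match, nothing is allocated.
        int position1 = original.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
        if (position1 < 0) return original;

        // Start with the input length; StringBuilder grows if needed.
        var result = new StringBuilder(original.Length);
        int position0 = 0;
        while (position1 >= 0)
        {
            // Append the text before the match without creating a substring.
            result.Append(original, position0, position1 - position0);
            result.Append(replacement);
            position0 = position1 + pattern.Length;
            position1 = original.IndexOf(pattern, position0, StringComparison.OrdinalIgnoreCase);
        }

        // Append the tail after the last match.
        result.Append(original, position0, original.Length - position0);
        return result.ToString();
    }
}
```

For example, ReplaceSketch.ReplaceIgnoreCase("AaBbCc abcaBc", "AbC", "Some") returns "AaBbCc SomeSome".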

Points of Interest

I love resource mining and clustering. I have also developed a GUI system with support for windows, mouse, multimedia and a true-color screen using Quick BASIC 7.1!

History

  • 2005.7.2 - First release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Huisheng Chen

Architect
www.xnlab.com
Australia

I was born in the south of China and started writing GW-BASIC code in 1993, when I was 13 years old. I work professionally with .NET (C#) and VB, and I am the founder of www.xnlab.com.
 
I now live in Sydney, Australia.


Comments and Discussions

 
Michael Epner, 9 Jan '07: this one is even faster and more flexible [modified]
I wrote this a while ago, and with such large inputs it seems to be even faster than yours, more flexible and easier to understand ;-) I win with 2.56s versus 2.99s on your first example and 0.35s versus 0.47s on your last example.
 
You can also use it with culture-specific comparisons. Here is the usage for simple case insensitivity:
 
 
MyToolsClass.Replace("MyOriginalString", "Original", "Replacement", StringComparison.OrdinalIgnoreCase);
 
 
Here goes the method:
 
 
        static public string Replace(string original, string pattern, string replacement, StringComparison comparisonType)
        {
            return Replace(original, pattern, replacement, comparisonType, -1);
        }
        
        static public string Replace(string original, string pattern, string replacement, StringComparison comparisonType, int stringBuilderInitialSize)
        {
            if (original == null)
            {
                return null;
            }
 
            if (String.IsNullOrEmpty(pattern))
            {
                return original;
            }
 

            int posCurrent = 0;
            int lenPattern = pattern.Length;
            int idxNext = original.IndexOf(pattern, comparisonType);
            StringBuilder result = new StringBuilder(stringBuilderInitialSize < 0 ? Math.Min(4096, original.Length) : stringBuilderInitialSize);
 
            while (idxNext >= 0)
            {
                result.Append(original, posCurrent, idxNext - posCurrent);
                result.Append(replacement);
 
                posCurrent = idxNext + lenPattern;
 
                idxNext = original.IndexOf(pattern, posCurrent, comparisonType);
            }
 
            result.Append(original, posCurrent, original.Length - posCurrent);
 
            return result.ToString();
        }
 
 

The secret might be the overload of StringBuilder.Append used here, which appends part of a string without first creating a substring from it. That might be a feature many have overlooked.
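To illustrate the overload in question, StringBuilder.Append(string value, int startIndex, int count), here is a tiny sketch; the helper name AppendRange is hypothetical.

```csharp
using System.Text;

public static class AppendSketch
{
    // Appends the characters of 'text' from 'start' (inclusive) to 'end'
    // (exclusive) using StringBuilder.Append(string, int, int), which copies
    // them directly instead of first building text.Substring(start, end - start).
    public static StringBuilder AppendRange(StringBuilder sb, string text, int start, int end)
    {
        return sb.Append(text, start, end - start);
    }
}
```

For example, AppendSketch.AppendRange(new StringBuilder("["), "Hello, world", 7, 12).Append("]").ToString() returns "[world]".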
 

EDIT: Fixed a bug thanks to "Member 551508". Added an overload where you can specify the initial StringBuilder size, as inspired by user tmbrye. The default is the length of the original string, capped at 4096. A large initial size increases performance slightly but also allocates more memory. When specifying a large size, also remember that replacing in a large string could theoretically produce a very small or even empty result.
 
modified on Thursday, April 9, 2009 5:17 AM


Last Updated 4 Jul 2005
Article Copyright 2005 by Huisheng Chen
Everything else Copyright © CodeProject, 1999-2012