Fastest C# Case Insensitive String Replace
By Unruled Boy
Find a fast way to do case-insensitive string replacement.
C#, .NET 1.1, Win2K, Win2003, Vista, VS.NET2003, Dev
When dealing with data, we often need to manipulate strings (strings can be the final form of any data, even binary), and replacement is among the most frequently used operations. String replacement in .NET looks easy, since there are several ways to do it: String.Replace, System.Text.RegularExpressions.Regex, iterating with String.Substring, or even the Strings.Replace function from the Microsoft Visual Basic runtime (Microsoft.VisualBasic.dll). However, each of them has drawbacks. This article presents a new method for fast case-insensitive string replacement.
I am developing a web resource mining system, and I need to perform a large number of string replacements on site pages. Since characters may differ in case, case insensitivity is the core requirement, and speed is also very important. But I could not find a satisfactory standalone (pure C#), fast, case-insensitive string replacement function.
In .NET, without using C++/CLI, there are a few ways to perform string replacement:
1. String.Replace, but case insensitivity is not supported.
2. System.Text.RegularExpressions.Regex (regular expressions) can be case insensitive by passing RegexOptions.IgnoreCase, but it is not very efficient.
3. String.Substring with normal string concatenation (+) seems to be faster than Regex for a small number of replacements.
4. String.Substring with StringBuilder instead of "+". Why is it faster? There are a few articles about it on CodeProject.
5. The Strings.Replace function of the Microsoft Visual Basic runtime is fast! But some C# users don't like depending on it.
6. A version of Strings.Replace that does not use the Microsoft Visual Basic runtime (it has some related functions such as Strings.Split and Strings.Join). It is just as fast as method 5.
Well, is there any method that could achieve better, or even much better, performance? Here is the super fast string replacement method:
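private static string ReplaceEx(string original,
    string pattern, string replacement)
{
    // Nothing to search in or for; this also avoids dividing by a
    // zero pattern length below.
    if (original == null || pattern == null || pattern.Length == 0)
        return original;
    if (replacement == null) replacement = "";

    int count, position0, position1;
    count = position0 = position1 = 0;
    // Search in upper-cased copies; copy output characters from the original.
    string upperString = original.ToUpper();
    string upperPattern = pattern.ToUpper();
    // Worst-case growth: every possible match replaced by a longer string.
    int inc = (original.Length / pattern.Length) *
        (replacement.Length - pattern.Length);
    char[] chars = new char[original.Length + Math.Max(0, inc)];
    while ((position1 = upperString.IndexOf(upperPattern,
        position0)) != -1)
    {
        // Copy the text before the match, then the replacement.
        for (int i = position0; i < position1; ++i)
            chars[count++] = original[i];
        for (int i = 0; i < replacement.Length; ++i)
            chars[count++] = replacement[i];
        position0 = position1 + pattern.Length;
    }
    // No match at all: return the input unchanged.
    if (position0 == 0) return original;
    // Copy the tail after the last match.
    for (int i = position0; i < original.Length; ++i)
        chars[count++] = original[i];
    return new string(chars, 0, count);
}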
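Before timing anything, it may be worth checking that ReplaceEx returns the same text as a known-good case-insensitive replacement. The sketch below is only an illustration: the class name ReplaceExCheck and the helper RegexReplace are hypothetical names, not part of the article's download, and the copy of ReplaceEx includes a guard against a null or empty pattern so the example can stand alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceExCheck
{
    // Copy of ReplaceEx, with a guard for a null or empty pattern.
    public static string ReplaceEx(string original, string pattern, string replacement)
    {
        if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(pattern)) return original;
        if (replacement == null) replacement = "";
        int count = 0, position0 = 0, position1;
        string upperString = original.ToUpper();
        string upperPattern = pattern.ToUpper();
        int inc = (original.Length / pattern.Length) * (replacement.Length - pattern.Length);
        char[] chars = new char[original.Length + Math.Max(0, inc)];
        while ((position1 = upperString.IndexOf(upperPattern, position0)) != -1)
        {
            for (int i = position0; i < position1; ++i) chars[count++] = original[i];
            for (int i = 0; i < replacement.Length; ++i) chars[count++] = replacement[i];
            position0 = position1 + pattern.Length;
        }
        if (position0 == 0) return original;
        for (int i = position0; i < original.Length; ++i) chars[count++] = original[i];
        return new string(chars, 0, count);
    }

    // Hypothetical reference implementation: Regex.Escape makes the pattern
    // literal, and "$" is doubled so the replacement is taken literally too.
    public static string RegexReplace(string original, string pattern, string replacement)
    {
        return Regex.Replace(original, Regex.Escape(pattern),
            replacement.Replace("$", "$$"), RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] inputs = { "Hello ABC abc AbC", "no match here", "abcabc", "" };
        foreach (string input in inputs)
        {
            string fast = ReplaceEx(input, "aBc", "Some");
            string reference = RegexReplace(input, "aBc", "Some");
            Console.WriteLine("\"" + input + "\" -> \"" + fast + "\" same as Regex: "
                + (fast == reference));
        }
    }
}
```

The sample inputs are plain ASCII. For non-ASCII text the two approaches may disagree, because ToUpper and RegexOptions.IgnoreCase do not necessarily apply the same casing rules.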
Now let's do some comparisons. The test first generates a long string, then repeats the replacement 1000 times. String.Replace is included to show the difference between the case-sensitive and case-insensitive options. Please remember: String.Replace does not support case insensitivity!
// Requires: using System; using System.Text;
// using System.Text.RegularExpressions; using Microsoft.VisualBasic;
// HiPerfTimer, VBString and StringHelper are helper classes not listed here.
static void Main(string[] args)
{
    string segment = "AaBbCc";
    string source;
    string pattern = "AbC";
    string destination = "Some";
    string result = "";
    const long count = 1000;
    StringBuilder pressure = new StringBuilder();
    HiPerfTimer time;

    // Build the long source string from repeated segments.
    for (int i = 0; i < count; i++)
    {
        pressure.Append(segment);
    }
    source = pressure.ToString();
    GC.Collect();

    //regexp
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Regex.Replace(source, pattern,
            destination, RegexOptions.IgnoreCase);
    }
    time.Stop();
    Console.WriteLine("regexp = " + time.Duration + "s");
    GC.Collect();

    //vb
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Strings.Replace(source, pattern,
            destination, 1, -1, CompareMethod.Text);
    }
    time.Stop();
    Console.WriteLine("vb = " + time.Duration + "s");
    GC.Collect();

    //vbReplace
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = VBString.Replace(source, pattern,
            destination, 1, -1, StringCompareMethod.Text);
    }
    time.Stop();
    Console.WriteLine("vbReplace = " + time.Duration + "s");// + result);
    GC.Collect();

    // ReplaceEx
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = Test.ReplaceEx(source, pattern, destination);
    }
    time.Stop();
    Console.WriteLine("ReplaceEx = " + time.Duration + "s");
    GC.Collect();

    // Replace (case sensitive, for comparison only)
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = source.Replace(pattern.ToLower(), destination);
    }
    time.Stop();
    Console.WriteLine("Replace = " + time.Duration + "s");
    GC.Collect();

    //sorry, too slow :(
    /*//substring
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = StringHelper.ReplaceText(source, pattern,
            destination, StringHelper.CompareMethods.Text);
    }
    time.Stop();
    Console.WriteLine("substring =" + time.Duration + ":");
    GC.Collect();

    //substring with stringbuilder
    time = new HiPerfTimer();
    time.Start();
    for (int i = 0; i < count; i++)
    {
        result = StringHelper.ReplaceTextB(source, pattern,
            destination, StringHelper.CompareMethods.Text);
    }
    time.Stop();
    Console.WriteLine("substringB=" + time.Duration + ":");
    GC.Collect();
    */
    Console.ReadLine();
}
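The test program repeats the same start/loop/stop/print pattern for every method, and it relies on a HiPerfTimer class that is not listed in this article. As a rough sketch of the same measurement, the code below folds the pattern into one helper and uses System.Diagnostics.Stopwatch, which is available from .NET 2.0 onward (an assumption; it does not exist in .NET 1.1). The names TimingSketch, ReplaceFunc, Measure, BuildSource and RegexIgnoreCase are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class TimingSketch
{
    // Any replacement method with the (source, pattern, replacement) shape.
    public delegate string ReplaceFunc(string source, string pattern, string replacement);

    // Repeats 'segment' 'repeats' times, like the StringBuilder loop in Main.
    public static string BuildSource(string segment, int repeats)
    {
        StringBuilder pressure = new StringBuilder(segment.Length * repeats);
        for (int i = 0; i < repeats; i++) pressure.Append(segment);
        return pressure.ToString();
    }

    // Runs 'replace' 'iterations' times and returns the elapsed seconds.
    public static double Measure(ReplaceFunc replace, string source,
        string pattern, string replacement, int iterations)
    {
        GC.Collect(); // same pre-run collection as the article's test
        string result = null;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            result = replace(source, pattern, replacement);
        watch.Stop();
        GC.KeepAlive(result); // keep the last result reachable until timing ends
        return watch.Elapsed.TotalSeconds;
    }

    // One candidate to measure: Regex with IgnoreCase, pattern taken literally.
    public static string RegexIgnoreCase(string source, string pattern, string replacement)
    {
        return Regex.Replace(source, Regex.Escape(pattern), replacement,
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string source = BuildSource("AaBbCc", 1000);
        double seconds = Measure(RegexIgnoreCase, source, "AbC", "Some", 1000);
        Console.WriteLine("regexp = " + seconds + "s");
    }
}
```

Other candidates, such as ReplaceEx, can be passed to Measure the same way. Timings obtained like this depend on the machine and runtime and should not be expected to match the figures in this article.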
The results:
1. string segment = "abcaBc";
regexp = 3.75481827997692s
vb = 1.52745502570857s
vbReplace = 1.46234256029747s
ReplaceEx = 0.797071415501132s !!!
Replace = 0.178327413120941s
// ReplaceEx > vbReplace > vb > regexp
2. string segment = "abcaBcabC";
regexp = 5.30117431126023s
vb = 2.46258449048692s
vbReplace = 2.5018721653171s
ReplaceEx = 1.00662179131705s !!!
Replace = 0.233760994763301s
// ReplaceEx > vb > vbReplace > regexp
3. string segment = "abcaBcabCAbc";
regexp = 7.00987862982586s
vb = 3.61050301085753s
vbReplace = 3.42324876485699s
ReplaceEx = 1.14969947297771s !!!
Replace = 0.277254511397398s
// ReplaceEx > vbReplace > vb > regexp
4. string segment = "ABCabcAbCaBcAbcabCABCAbcaBC";
regexp = 13.5940090151123s
vb = 11.6806222578568s
vbReplace = 11.1757614445411s
ReplaceEx = 1.70264153684337s !!!(my god!)
Replace = 0.42236820601501s
// ReplaceEx > vbReplace > vb > regexp
OK, is the ReplaceEx function really the fastest in all conditions?
5. string segment = "AaBbCc";
regexp = 0.671307945562914s
vb = 0.32356849823092s
vbReplace = 0.316965703741677s !!!
ReplaceEx = 0.418256510254795s
Replace = 0.0453026851178013s
// vbReplace > vb > ReplaceEx > regexp
Why? The bottleneck is:
string upperString = original.ToUpper();
string upperPattern = pattern.ToUpper();
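In test 5 the source never contains the pattern in any casing, so there is nothing to replace, and the time spent in String.ToUpper() building an upper-cased copy of the whole source is wasted.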
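One way to avoid the upper-cased copies is to let the search itself ignore case. The sketch below is not the method measured in this article and has not been benchmarked here. It assumes .NET 2.0 or later, where String.IndexOf accepts a StringComparison argument, and the class name OrdinalIgnoreCaseReplace is hypothetical. Note that StringComparison.OrdinalIgnoreCase does not follow the current culture's casing rules, so results may differ from the ToUpper-based version for some non-ASCII text.

```csharp
using System;
using System.Text;

public static class OrdinalIgnoreCaseReplace
{
    public static string Replace(string original, string pattern, string replacement)
    {
        if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(pattern))
            return original;

        // Search the original directly; no upper-cased copy is created.
        int position1 = original.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
        if (position1 < 0)
            return original; // no match: return the same instance, nothing built

        StringBuilder builder = new StringBuilder(original.Length);
        int position0 = 0;
        while (position1 >= 0)
        {
            // Copy the text before the match, then the replacement.
            builder.Append(original, position0, position1 - position0);
            builder.Append(replacement);
            position0 = position1 + pattern.Length;
            position1 = original.IndexOf(pattern, position0,
                StringComparison.OrdinalIgnoreCase);
        }
        // Copy the tail after the last match.
        builder.Append(original, position0, original.Length - position0);
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Replace("xABCyabc", "AbC", "Some")); // prints "xSomeySome"
        Console.WriteLine(Replace("AaBbCc", "AbC", "Some"));   // prints "AaBbCc"
    }
}
```

Whether this is actually faster than ReplaceEx when matches are frequent would need to be measured; a case-insensitive search can cost more per character than a search over strings that were upper-cased once.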
I love resources mining and clustering. I have developed a GUI system with support for windows, mouse, multimedia, true color screen using Quick BASIC 7.1!
Last Updated: 4 Jul 2005. Editor: Smitha Vijayan
Copyright 2005 by Unruled Boy. Everything else Copyright © CodeProject, 1999-2009.