I have a string of length 98975333 and I need to remove the first 5 characters from it. Can anyone suggest the best way to do this, keeping performance in mind?

I tried
str.Substring(5,str.Length);
str.Substring(0,5);

which gives me the result in 0.29 sec,

but I want something even faster than the above.



Thanks in advance
1 solution

You are wrong in so many aspects… First of all, there is absolutely no way to remove anything from a string: strings are immutable.

Surprised? Don't rush to argue based on your code samples. Look carefully:
C#
string str = //...
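str.Substring(5, str.Length); // as posted; 5 + str.Length runs past the end, so this throws ArgumentOutOfRangeException
// with a valid length (str.Length - 5), check the value of str at this point

Compared? It may surprise you: even with a valid length argument, the string after "removing" is exactly the same as before. Why? Because this method does not remove anything; it returns a new string, and you ignore the returned value.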

Let's improve it:
C#
string str = //...
str = str.Substring(5, str.Length - 5); // sic!
// now, str value is different

But did you remove anything? No! You still did not (it is absolutely impossible, by the design of this class), even though the value of str really changed. Instead, you created a brand new string. You can check it: the string lost its referential identity. Why? Because a brand new string is created first, and its reference is different, since the original string is still in use at that moment. Then you lose the reference to the original string by overwriting str. If nothing else references it, its memory will eventually be reclaimed by the GC. (Do you understand that every reference-type object actually involves two things, often put in different kinds of memory: the reference to the object, and the referenced object itself?)
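The point about referential identity can be checked with a few lines. This is only a minimal sketch: the class name ImmutabilityDemo and the helper DropPrefix are made up for the illustration, and a short 20-character string stands in for the real one.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: returns a NEW string without the first n characters.
    // The string passed in is never modified.
    public static string DropPrefix(string s, int n)
    {
        return s.Substring(n); // same result as s.Substring(n, s.Length - n)
    }

    static void Main()
    {
        string original = new string('x', 20);
        string str = original;       // two references, one string object
        str = DropPrefix(str, 5);    // str now refers to a different object

        Console.WriteLine(original.Length);                        // 20
        Console.WriteLine(str.Length);                             // 15
        Console.WriteLine(object.ReferenceEquals(original, str));  // False
    }
}
```

The original string is untouched and still reachable through the variable original; only the variable str was redirected to the new object.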

Now you should understand why the code is pretty slow: you create a brand new string and initialize it, copying all 98975333 - 5 characters into it from the original string. And then you forget the original string.

But now, I cannot trust your timing unless you show exactly how you measure it. Why? Because on the CLR, a big part of the time of a first call can be spent in the JIT compiler: http://en.wikipedia.org/wiki/Just-in-time_compilation.

Please read the above article. The compilation is generally done on a per-method basis. When you call some method for the very first time, a big part of the measured time is JIT compilation. Only from the second call on does the timing reflect just the execution. But you should also collect statistics: the standard deviation of the execution time of application code is amazingly high!
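One way to take such a measurement is sketched below. It is an assumption about how the timing could be done, not the way the original 0.29 sec figure was obtained; the names SubstringTiming, MeasureMilliseconds, Mean and StandardDeviation are invented for this example. The operation is called once before timing, so the first-call JIT cost is excluded, and then timed several times with System.Diagnostics.Stopwatch.

```csharp
using System;
using System.Diagnostics;

static class SubstringTiming
{
    // Calls action once as a warm-up (JIT compilation happens here),
    // then returns the duration of each of the next `runs` calls, in milliseconds.
    public static double[] MeasureMilliseconds(Action action, int runs)
    {
        action(); // warm-up call, not measured
        var samples = new double[runs];
        var watch = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            watch.Restart();
            action();
            watch.Stop();
            samples[i] = watch.Elapsed.TotalMilliseconds;
        }
        return samples;
    }

    public static double Mean(double[] xs)
    {
        double sum = 0;
        foreach (double x in xs) sum += x;
        return sum / xs.Length;
    }

    // Population standard deviation of the samples.
    public static double StandardDeviation(double[] xs)
    {
        double mean = Mean(xs);
        double sumOfSquares = 0;
        foreach (double x in xs) sumOfSquares += (x - mean) * (x - mean);
        return Math.Sqrt(sumOfSquares / xs.Length);
    }

    static void Main()
    {
        string str = new string('x', 98975333); // stand-in for the real data
        string result = null;
        double[] samples = MeasureMilliseconds(() => result = str.Substring(5), 10);
        Console.WriteLine("mean {0:F2} ms, std dev {1:F2} ms",
            Mean(samples), StandardDeviation(samples));
        GC.KeepAlive(result);
    }
}
```

Each run allocates a new string of almost 99 million characters, so garbage collection will show up in some of the samples; that is part of why the spread deserves as much attention as the mean.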

See also: http://en.wikipedia.org/wiki/String_interning.

And also my past answers:
Confusion about strings in c#,
Value Type and reference type,
how to compare two strings in c#?.

Finally, what to do?

First, consider a mutable class: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder%28v=vs.110%29.aspx.

See also my past answer: Difference Between Mutable, Immutable string in C#.
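For illustration, this is roughly what the mutable approach looks like with System.Text.StringBuilder. It is a sketch under assumptions: the class BuilderDemo, the helper DropPrefix and the sample text "HEAD:payload" are made up, and it presumes the data is held in a StringBuilder from the start rather than converted from an existing string.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Hypothetical helper: removes the first n characters in place.
    // The same StringBuilder object is modified; no new builder is created.
    public static void DropPrefix(StringBuilder sb, int n)
    {
        sb.Remove(0, n);
    }

    static void Main()
    {
        var sb = new StringBuilder("HEAD:payload");
        DropPrefix(sb, 5);
        Console.WriteLine(sb.Length);     // 7
        Console.WriteLine(sb.ToString()); // payload
    }
}
```

Keep in mind that building a StringBuilder from an existing string copies the characters, and ToString() creates a new string again, so this only pays off if the data stays in the builder while it is being modified.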

However, this is not the main thing.

The answer is simple: in most cases, the operation you demonstrated is the result of bad design of your code. You simply should not remove a few characters from a big string. Instead, you should think about how you ended up with that long string. You should not have it in the first place. Try to analyze the problem at the point where you get the original string and, instead, form a data structure where the parts of the data are already isolated. Yes, I am serious. Such mistakes often stem from the very bad present-day trend (only in beginners though) to use strings representing data instead of the data itself.
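What "already isolated" could look like depends entirely on where the data comes from, which the question does not say. Purely as a hypothetical sketch, suppose the text arrives through a TextReader and the first 5 characters are some kind of header: the two parts can be read separately, so the combined string never exists. The type Message and its members are invented for this example.

```csharp
using System;
using System.IO;

// Hypothetical data structure: the header and the body are stored separately.
sealed class Message
{
    public string Header { get; private set; }
    public string Body { get; private set; }

    // Reads the first headerLength characters as the header, the rest as the body.
    public static Message ReadFrom(TextReader reader, int headerLength)
    {
        var buffer = new char[headerLength];
        int read = reader.ReadBlock(buffer, 0, headerLength);
        var message = new Message();
        message.Header = new string(buffer, 0, read);
        message.Body = reader.ReadToEnd();
        return message;
    }

    static void Main()
    {
        // StringReader stands in for a file or network stream here.
        using (var reader = new StringReader("HEAD:payload"))
        {
            Message message = ReadFrom(reader, 5);
            Console.WriteLine(message.Header); // HEAD:
            Console.WriteLine(message.Body);   // payload
        }
    }
}
```

With a real source you would pass a StreamReader instead; the body is then created once, and nothing has to be cut off afterwards.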

Reduce parsing/splitting, develop synthesis/formatting.

—SA
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)