Counting lines in a string

Andreas Gieriet, 28 Feb 2012, CPOL
Great analysis!

I found out that Regex can be accelerated by a factor of about two.

Instead of
new Regex(@"\n", RegexOptions.Compiled|RegexOptions.Multiline);

you can speed it up by using:
new Regex(@"^.*?$", RegexOptions.Compiled|RegexOptions.Multiline);

But admittedly, nothing beats the native methods (IndexOf).
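For readers who want to reproduce the comparison, here is a minimal sketch of the two counting approaches mentioned above. The class and method names (LineCountSketch, CountWithIndexOf, CountWithRegex) are hypothetical and not taken from the original article; the sketch assumes that "counting lines" means counting '\n' characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper names; an illustration, not the article's original code.
static class LineCountSketch
{
    // Counts '\n' characters by repeatedly calling the native IndexOf.
    public static int CountWithIndexOf(string text)
    {
        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf('\n', pos)) >= 0)
        {
            count++;
            pos++; // continue searching after the newline just found
        }
        return count;
    }

    // Same options as in the tip above; compiled once and reused.
    static readonly Regex Newline =
        new Regex(@"\n", RegexOptions.Compiled | RegexOptions.Multiline);

    // Counts '\n' characters by counting Regex matches.
    public static int CountWithRegex(string text)
    {
        return Newline.Matches(text).Count;
    }

    public static void Main()
    {
        string text = "a\nb\nc\n";
        Console.WriteLine(CountWithIndexOf(text)); // prints 3
        Console.WriteLine(CountWithRegex(text));   // prints 3
    }
}
```

Both methods return the same count; only the running time differs, which is what the measurements in this tip are about.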

Correction: my statement above is wrong. I compared "$" (and not "\n") against "^.*?".
The measurements show that "\n" is the fastest of all the Regex patterns I tried, while "$" is the slowest (about 5 times slower than "\n").
That's a real surprise to me.

The comparison:

Regex | Match [ms] for 2.500.000 lines | RegexOptions

As a comparison: IndexOf('\n') only takes 237 [ms].



This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Andreas Gieriet
Founder eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity to computer language analysis, testing, as well as quality management.

More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).

Comments and Discussions

Re: So Regex("^.*?$") is not faster than Regex("\n"), as you ori...
Ronald M. Martin, 29-Feb-12 8:06

Re: Ah, I see your initial question. "*" is greedy match (match ...
Andreas Gieriet, 28-Feb-12 17:48
Ah, I see your initial question. "*" is a greedy match (match as much as you can), while "*?" is a non-greedy match (match as little as you can).

See Quantifiers.
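A small sketch of the greedy versus non-greedy difference. The class name and the sample inputs are made up for illustration. Note that for line matching with RegexOptions.Multiline, both "^.*$" and "^.*?$" end up matching the same lines, since "." does not match '\n'; the two patterns differ in how the engine searches, not in what they find.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the inputs are invented for illustration.
static class QuantifierSketch
{
    // Greedy: ".*" grabs as much as possible, then backs off until ">" fits.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "<.*>").Value;
    }

    // Non-greedy: ".*?" grabs as little as possible, growing until ">" fits.
    public static string Lazy(string input)
    {
        return Regex.Match(input, "<.*?>").Value;
    }

    // Counts matches of a pattern with the Multiline option set.
    public static int CountLines(string text, string pattern)
    {
        return Regex.Matches(text, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Greedy("<a><b>")); // prints <a><b>
        Console.WriteLine(Lazy("<a><b>"));   // prints <a>

        // For whole lines, both variants find the same two matches.
        Console.WriteLine(CountLines("one\ntwo", "^.*$"));  // prints 2
        Console.WriteLine(CountLines("one\ntwo", "^.*?$")); // prints 2
    }
}
```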

Finally, the following results on my machine (count 2.500.000 lines):
1. IndexOf = 237 [ms]
2. Regex("\n") = 1851 [ms]
3. Regex("^.*$") = 2282 [ms]
4. Regex("^.*?$") = 5327 [ms]
5. Regex("$") = 10100 [ms]

The crazy thing is that Regex("\n") is 5 times faster than Regex("$")!

And I see that I compared the wrong items in my initial posting: I assumed that "$" and "\n" are equivalent... :-(
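To illustrate why "$" and "\n" are not equivalent, here is a minimal sketch (class and method names are hypothetical). With RegexOptions.Multiline, "$" matches before every '\n' and also at the very end of the string, so the two patterns can return different counts for the same input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original measurements.
static class DollarVersusNewline
{
    // Counts literal '\n' characters.
    public static int CountNewlines(string text)
    {
        return Regex.Matches(text, @"\n", RegexOptions.Multiline).Count;
    }

    // Counts end-of-line positions: before each '\n' plus the end of the string.
    public static int CountLineEnds(string text)
    {
        return Regex.Matches(text, "$", RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountNewlines("a\nb"));   // prints 1
        Console.WriteLine(CountLineEnds("a\nb"));   // prints 2

        Console.WriteLine(CountNewlines("a\nb\n")); // prints 2
        Console.WriteLine(CountLineEnds("a\nb\n")); // prints 3
    }
}
```

So beyond the speed difference, the "$" pattern counts one more than the "\n" pattern for the same text, which matters when the result is interpreted as a line count.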

Re: Let me rephrase my question. Assuming that your syntax (@"^....
Ronald M. Martin, 28-Feb-12 4:50

I don't understand the use of the question mark (?) in this ...
Ronald M. Martin, 27-Feb-12 18:27

Re: I simple measured a difference of a factor of about two. No ...
Andreas Gieriet, 27-Feb-12 22:21

Can anyone please explain why LinesCount2 is so slow? I thou...
Miller4, 26-Feb-12 7:54

Last Updated 28 Feb 2012
Article Copyright 2012 by Andreas Gieriet