
Counting lines in a string

By Andreas Gieriet, 28 Feb 2012
 
Great analysis!
 
I found out that Regex can be accelerated by a factor of about two.
 
Instead of
new Regex(@"\n", RegexOptions.Compiled|RegexOptions.Multiline);
 
you can speed it up by using:
new Regex(@"^.*?$", RegexOptions.Compiled|RegexOptions.Multiline);
 
But admittedly, nothing beats the native methods (IndexOf).
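For reference, an IndexOf-based counter might look roughly like the sketch below. This is an assumption about the general shape of such a method, not necessarily the code that was measured in the original article; the names LineCounter and CountNewlines are made up for illustration.

```csharp
using System;

static class LineCounter
{
    // Counts '\n' characters by jumping from one hit to the next with IndexOf.
    // Hypothetical helper, not taken from the original article.
    public static int CountNewlines(string text)
    {
        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf('\n', pos)) >= 0)
        {
            count++;
            pos++; // resume the search just past the newline that was found
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountNewlines("first\nsecond\nthird")); // prints 2
    }
}
```

Note that this counts line terminators, so a string whose last line has no trailing '\n' has one more line than the returned value.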
 
[EDIT]
My statement above is wrong: I compared "$" (and not "\n") against "^.*?$".
The measurements show that "\n" is the fastest of all the Regex matches, while "$" is the slowest (about 5 times slower than "\n"...!).
That's a real surprise to me.
 
The comparison:

Regex    Match [ms] for 2,500,000 lines    RegexOptions
\n       1847                              Compiled|Singleline
\n       1851                              Compiled|Multiline
^.*$     2282                              Compiled|Multiline
^.*?$    5327                              Compiled|Multiline
$        10100                             Compiled|Multiline
 
For comparison: IndexOf('\n') takes only 237 ms.
 
[/EDIT]
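A measurement of this kind could be set up along the following lines. This is only a sketch under assumptions: the test input (short identical lines), the helper names BuildInput and TimeRegex, and the use of Stopwatch are illustrative choices and not the harness that produced the numbers above, so the timings it prints will differ.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class RegexLineBenchmark
{
    // Builds a test string of `lines` short lines, each terminated by '\n'.
    // The line content is an arbitrary assumption.
    public static string BuildInput(int lines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lines; i++) sb.Append("some text\n");
        return sb.ToString();
    }

    // Times how long a compiled regex needs to enumerate all matches.
    // Returns the number of matches so the result can be sanity-checked.
    public static int TimeRegex(string pattern, RegexOptions options, string input)
    {
        var regex = new Regex(pattern, RegexOptions.Compiled | options);
        var watch = Stopwatch.StartNew();
        int matches = regex.Matches(input).Count; // Count forces full enumeration
        watch.Stop();
        Console.WriteLine("{0,-6} {1,9} matches {2,7} ms",
                          pattern, matches, watch.ElapsedMilliseconds);
        return matches;
    }

    static void Main()
    {
        string input = BuildInput(2500000); // same line count as in the table
        TimeRegex(@"\n", RegexOptions.Singleline, input);
        TimeRegex(@"\n", RegexOptions.Multiline, input);
        TimeRegex(@"^.*$", RegexOptions.Multiline, input);
        TimeRegex(@"^.*?$", RegexOptions.Multiline, input);
        TimeRegex(@"$", RegexOptions.Multiline, input);
    }
}
```

The match counts are worth checking as well as the times: a pattern ending in "$" also matches at the very end of the string, so it can report one more match than "\n" when the input ends with a newline.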

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Andreas Gieriet
eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity to computer language analysis, testing, as well as quality management.
 
More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).


Comments and Discussions

Re: So Regex("^.*?$") is not faster than Regex("\n"), as you ori... | Ronald M. Martin | 29 Feb '12 - 7:06
So Regex("^.*?$") is not faster than Regex("\n"), as you originally stated, and Regex("^.*?$") is also slower than Regex("^.*$").
 
The thing that originally confused me was the use of "?" as a trailing qualifier applied to quantifiers. It's syntactically odd in that it requires lookahead, much like the two-character operators in C# or C++. This type of syntax is generally handled lexically, as if the character pair were a single token in the grammar. Reading the documentation from top to bottom, I stopped when I had matched "*" and "?" separately.
 
Thanks for clearing this up for me.
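The difference between "*" and "*?" discussed in this thread can be seen in a small experiment. The sketch below uses an arbitrary sample string and a made-up helper name (FirstMatch); it only illustrates greedy versus lazy matching and says nothing about speed.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Returns the text of the first match of `pattern` in `input`.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    static void Main()
    {
        const string input = "<a><b>";
        // Greedy: ".*" takes as much as possible, then backs off to the last '>'.
        Console.WriteLine(FirstMatch("<.*>", input));  // prints <a><b>
        // Lazy: ".*?" takes as little as possible and stops at the first '>'.
        Console.WriteLine(FirstMatch("<.*?>", input)); // prints <a>
    }
}
```

With "^.*?$" in Multiline mode, both forms end up matching whole lines because "$" forces the match to reach the line end; only the way the engine gets there differs.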
Re: Ah, I see your initial question. "*" is greedy match (match ... | Andreas Gieriet | 28 Feb '12 - 16:48
Re: Let me rephrase my question. Assuming that your syntax (@"^.... | Ronald M. Martin | 28 Feb '12 - 3:50
I don't understand the use of the question mark (?) in this ... | Ronald M. Martin | 27 Feb '12 - 17:27
Re: I simple measured a difference of a factor of about two. No ... | Andreas Gieriet | 27 Feb '12 - 21:21
Can anyone please explain why LinesCount2 is so slow? I thou... | Miller4 | 26 Feb '12 - 6:54


Last Updated 28 Feb 2012
Article Copyright 2012 by Andreas Gieriet
Everything else Copyright © CodeProject, 1999-2013