Click here to Skip to main content
Click here to Skip to main content

Tagged as

Counting Lines in a String

, 18 Jun 2012 CPOL
Rate this:
Please Sign up or sign in to vote.
It seems like an obvious requirement, but the .NET framework will not count occurrences of a character in a string. It's easy to do, but which way is the quickest?

Introduction

I needed to know, so I wrote a quick, brute-force-and-ignorance (henceforth known as "BF+I") method to do it:

static long LinesCount(string s)
    {
    long count = 0;
    int position = 0;
    while ((position = s.IndexOf('\n', position)) != -1)
        {
        count++;
        position++;         // Skip this occurrence!
        }
    return count;
    }

A quick test, and it works fine.

But then, I thought - isn't that a bit...well...rubbish? Isn't there another way to do it?

And sure, there is: I can think of two off the top of my head: Regex and Linq. So, I knocked up more methods.
And then I got a lot of feedback suggesting other ways, so I have re-worked this tip to reflect those.
A quick-n-dirty test rig to check them:

            string s = File.ReadAllText(@"D:\Temp\MyLargeTextFile.txt");
            long index = LinesCountIndexOf(s);
            long regex = LinesCountRegex(s);
            long linq= LinesCountLinq(s);
            Console.WriteLine("{0}:{1}:{2}", index, regex, linq);
            Stopwatch s0 = new Stopwatch();
... repeated, one per test
            Stopwatch sn = new Stopwatch();
            s0.Start();
            for (int i = 0; i < 100; i++)
                {
                index = LinesCountIndexOf(s);
                }
            s0.Stop();
            sb.AppendFormat("IndexOf     : {0,06}\n", s0.ElapsedMilliseconds);
            tbResults.Lines = sb.ToString().Split('\n');
            sl.Start();
... Identical blocks, different method names.

The input file is 1.64MB, and contains around 23K lines. Please note that this file is read once for the entire test run, and is just used to give a significant amount of data for the tests to run against. A small amount of data would take us below the timer resolution of 1 millisecond and distort the result. This is not a test of how fast I can count lines in a file!

The methods tried so far:

        static long LinesCountIndexOf(string s)
            {
            long count = 0;
            int position = 0;
            while ((position = s.IndexOf('\n', position)) != -1)
                {
                count++;
                position++;         // Skip this occurrence!
                }
            return count;
            }
        static Regex r = new Regex("\n", RegexOptions.Multiline);
        static long LinesCountRegex(string s)
            {
            MatchCollection mc = r.Matches(s);
            return mc.Count;
            }
        static long LinesCountLinq(string s)
            {
            return (from ch in s
                    where ch == '\n'
                    select ch).Count();
            }
        static long LinesCountSplit(string s)
            {
            return (s.Split(new char[] { '\n' })).Length;
            }
 
        static long LinesCount2(string s)
            {
            long count = 0;
            int posMax = s.Length;
 
            for (int position = 0; position < posMax; )
                if (s[position++] == '\n')
                    count++;
 
            return count;
            }
 
        static long LinesCount3(string s)
            {
            long count = 0;
            int posMax = s.Length;
            char[] a = s.ToCharArray();
 
            for (int position = 0; position < posMax; )
                if (a[position++] == '\n')
                    count++;
 
            return count;
            }
 
        static long LinesCount4(string s)
            {
            long count = 0;
            int position = -1;
            while ((position = s.IndexOf('\n', position + 1)) != -1)
                {
                count++;
                }
            return count;
            }
 
        static long LinesCount5(string s)
            {
            long count = 0;
            int posMax = s.Length;
            char[] a = s.ToCharArray();
 
            foreach (char c in a)
                if (c == '\n')
                    count++;
 
            return count;
            }
 
        static long LinesCount6(string s)
            {
            long count = 0;
 
            foreach (char c in s)
                if (c == '\n')
                    count++;
 
            return count;
            }
 
        static long LinesCount7(string s)
            {
            return s.Length - s.Replace("\n", "").Length;
            }
        static long LinesCountAhoCorasick(string s)
            {
            IStringSearchAlgorithm searchAlg = new StringSearch();
            searchAlg.Keywords = new string[] { "\n" };
            StringSearchResult[] results = searchAlg.FindAll(s);
            return results.Length;
            }
        static long LinesCountLinqCount(string s)
            {
            return s.Count(c => (c == '\n'));
            }
        unsafe static long LinesCountUnsafePtr(string s)
            {
            long lineCount = 1;
            fixed (char* pchar = s)
                {
                char* p = pchar;
                for (; *p != '\0'; p++)
                    {
                    if (*p == '\n') lineCount++;
                    }
                }
            return lineCount;
            }

The addition of a StreamReader version, which doesn't really fit, but people seem to be suggesting file based solutions:

            ssr.Start();
            for (int i = 0; i < 100; i++)
                {
                StreamReader sr = new StreamReader(@"D:\Temp\MyLargeTextFile.txt");
                index = LinesCountStream(sr);
                sr.Dispose();
                }
            ssr.Stop();

...

        static long LinesCountStream(StreamReader sr)
            {
            long count = 0;
            while (sr.ReadLine() != null)
                {
                count++;
                }
            return count;
            }

They all return the same number of lines, which is fine. But the timing tests were interesting:

Test    ElapsedMilliseconds
IndexOf     :    167
Index (2)   :    953
Index (3)   :   1026
Index (4)   :    167
Index (5)   :   1211
Index (6)   :   1544
Index (7)   :   1088
Split       :   1103
Regex       :   2517
Linq (Expl) :   3576
Linq Count  :   3531
Stream Read :    976
Unsafe Ptr  :    633

I had expected Regex to be much, much faster! The serious difference between Linq and Regex was a surprise, but I was really surprised when BF+I beat them both hands down! The Split method didn't surprise me by being slower than Index, as it means .NET has to allocate 23K strings each time. To finish it all off, I rewrote the BF+I version as an Extension Method, and made it assume that a string with no newlines was a single line. As an extension method, it was within a millisecond of the original times, as you would expect.

static class ExtensionMethods
    {
    /// <summary>
    /// Returns the number of lines in a string
    /// </summary>
    /// <param name="s"></param>
    /// <returns></returns>
    public static long Lines(this string s)
        {
        long count = 1;
        int position = 0;
        while ((position = s.IndexOf('\n', position)) != -1)
            {
            count++;
            position++;         // Skip this occurrence!
            }
        return count;
        }
    }

BTW: Adding RegexOptions.Compiled to the regular expression brought it down to:

Test    ElapsedMilliseconds
BF+I     170
Split   1089
Regex   2063
Linq    3583

So it is WELL worth adding it to all your Regexes if you can.

Revision History

  • [Edit]Split method suggested by Grasshopper.iics[^] added -OriginalGriff[/edit]
  • [Edit2]I forgot to add the credit for the split method D'Oh! | <img src= " /> - OriginalGriff[/edit2]
  • [Edit3]Deeksha Shenoy[/Edit3]
  • [Edit4]Inclusion of more variants as suggested by: unruled boy[^] (Aho-Corasick, awaiting method) Grasshopper.iics[^] (Split) akemper[^] (Variations on BF+I, Replace, Split) George Swan[^] (LinqCount) Thanks to all these people (and others if I haven't updated this yet)! -OriginalGriff [/Edit4]
  • [Edit5]I forgot to introduce the three letter acronym "BF+I" when I first defined it as "Brute-Force-and-Ignorance". My thanks to fuzz_ball for pointing this out. Sorry for any confusion caused... - OriginalGriff[/Edit5]
  • [Edit6]Jon Bellamy[^] suggested a StreamReader, which I have included despite it being file related, rather than string. Please note that the test is different, as StreamReader does not support Seek or Rewind so has to be instantiated new each time. (Removing the Dispose only took 10 milliseconds off the time) Roman Shero[^] suggested the unsafe version Extra text added to the file read to make it clear why I load a file at all - OriginalGriff[/Edit6]
  • [Edit7]Added download for test rig (it's not pretty!), fixed formatting of code section (don't know what happened to it there), minor spelling mistake fixed, and Revision history reformatted - OriginalGriff[/Edit7]

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

OriginalGriff
CEO
Wales Wales
Born at an early age, he grew older. At the same time, his hair grew longer, and was tied up behind his head.
Has problems spelling the word "the".
Invented the portable cat-flap.
Currently, has not died yet. Or has he?
Follow on   Google+

Comments and Discussions

 
GeneralMy vote of 5 PinmemberS. M. Ahasan Habib18-Feb-13 18:08 
GeneralMy vote of 5 PinmemberVadimAlk9-Jul-12 19:07 
GeneralCombination of IndexOf an Parallel.ForEach [modified] Pinmemberbunnyboy0158-Jul-12 11:55 
That one gives even a faster result.
 
private static long LinesCountIndexOfThreaded(string s)
        {
            long total = 0;
            const char newline = '\n';
 
            Parallel.ForEach(Partitioner.Create(0, s.Length, s.Length /(Environment.ProcessorCount * 2)),
            () => 0L, (range, state, count) =>
            {
                int position = range.Item1;
                while ((position = s.IndexOf(newline, position)) != -1 && (position < range.Item2))
                {
                    count++;
                    position++;
                }
                return count;
            }, localTotal => Interlocked.Add(ref total, localTotal));
 
            return total;
        }


modified 8-Jul-12 17:09pm.

AnswerLast minute PinmemberYvesDaoust25-Jun-12 1:34 
QuestionVery sad ! PinmemberYvesDaoust25-Jun-12 0:53 
AnswerRe: Very sad ! PinmvpOriginalGriff25-Jun-12 1:20 
QuestionRegex comparison to precompiled would be nice PinmemberSteav18-Jun-12 2:43 
AnswerRe: Regex comparison to precompiled would be nice PinmvpOriginalGriff18-Jun-12 3:25 
GeneralRe: Thanks. PinmemberMiller427-Feb-12 2:25 
GeneralRe: If you're going to save off the Regex and use it more than o... PinmemberAndrew Rissing9-Jan-12 5:56 
GeneralRe: In my case, I was writing a massively parallel find routine,... PinmemberJohn Brett9-Jan-12 5:26 
GeneralReason for my vote of 5 Nice one. Pinmembernikhi _singh26-Feb-12 18:43 
GeneralCan anyone please explain why LinesCount2 is so slow? I thou... PinmemberMiller426-Feb-12 7:57 
GeneralRe: If you check the implementaiton of the IndexOf() method in R... PinmemberAndreas Gieriet26-Feb-12 12:41 
GeneralReason for my vote of 5 Great analysis! PinmemberAndreas Gieriet26-Feb-12 4:53 
GeneralWhat does BF+I stand for? It doesn't appear related to the m... PinmemberQwertie23-Jan-12 21:58 
GeneralRe: "brute-force-and-ignorance" - it's in the first line of the ... PinmvpOriginalGriff23-Jan-12 22:35 
GeneralReason for my vote of 5 nicely illustrated PinmemberRavi Gupta from India17-Jan-12 9:22 
GeneralLine terminations are found closer to the end of a string th... PinmemberBob Van Wagner17-Jan-12 3:54 
GeneralReason for my vote of 5 Really excellent thorough research a... PinmemberBillWoodruff16-Jan-12 23:42 
GeneralReason for my vote of 5 Wonderful! PinmemberAlex Essilfie16-Jan-12 23:15 
GeneralJust a general writing comment, when you toss in a TLA it's ... Pinmemberfuzz_ball16-Jan-12 10:46 
GeneralRe: :O Sorry - I forgot...fixed PinmvpOriginalGriff16-Jan-12 22:19 
GeneralIt looks like Linq is built for comfort, not speed. PinmemberGeorge Swan11-Jan-12 7:00 
GeneralOK, so how about a less ignorant version that ignores newlin... PinmemberPIEBALDconsult9-Jan-12 5:29 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.1411028.1 | Last Updated 18 Jun 2012
Article Copyright 2012 by OriginalGriff
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid