Don't count spaces when counting words.

By Pete O'Hanlon, 17 Oct 2011
Over the last couple of days I've seen numerous examples of people posting about how to count words in a sentence. Disturbingly, these postings suggest counting the number of spaces in the sentence and using that as the basis of a word count.

You may be asking why this is a problem. Well, consider the following sentence:

The total number of words    \t     in this sentence,is 10.

As you can see, simply counting spaces isn't going to work. There are special characters (the \t) to take care of, multiple consecutive spaces, and words separated by a comma without a space.

So, if counting spaces doesn't work, what does? The answer is to use a regular expression, and you are going to love how simple it is. The pattern \w+ matches each run of word characters (letters, digits, and the underscore), which takes care of all the guff demonstrated above. Here's a quick sample:
// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex(@"\w+");  // each run of word characters counts as one word
string input = "The total number of words    \t     in this sentence,is 10.";
MatchCollection matches = regex.Matches(input);
int wordCount = matches.Count;  // 10 for the sentence above
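
To make the difference concrete, here is a minimal sketch that runs both approaches over the example sentence. The class and method names (WordCountDemo, CountBySpaces, CountByRegex) are hypothetical, chosen only for this illustration, and "spaces plus one" is assumed to be a fair stand-in for the space-counting approach criticised above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountDemo
{
    // Naive approach (assumed form): count the space characters and add one.
    public static int CountBySpaces(string text)
    {
        int spaces = 0;
        foreach (char c in text)
        {
            if (c == ' ')
            {
                spaces++;
            }
        }
        return spaces + 1;
    }

    // Approach from the tip: every run of word characters is one word.
    public static int CountByRegex(string text)
    {
        return Regex.Matches(text, @"\w+").Count;
    }

    public static void Main()
    {
        // Same sentence as in the tip: repeated spaces, a tab, and a comma with no space.
        string input = "The total number of words    \t     in this sentence,is 10.";

        Console.WriteLine("Spaces + 1:  " + CountBySpaces(input));
        Console.WriteLine("\\w+ matches: " + CountByRegex(input));
    }
}
```

The space-based count is inflated by every extra space and misses the "sentence,is" boundary entirely, while the regex count depends only on where runs of word characters start and stop.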


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Pete O'Hanlon
United Kingdom
A developer for over 30 years, I've been lucky enough to write articles and applications for Code Project as well as the Intel Ultimate Coder - Going Perceptual challenge. I live in the North East of England with 2 wonderful daughters and a wonderful wife.
I am not the Stig, but I do wish I had Lotus Tuned Suspension.

Comments and Discussions

Question: Definition of word? | Andreas Gieriet (member) | 12-Mar-12 22:42
Answer: Re: Definition of word? | pwasser (member) | 19-Mar-12 15:23
General: Re: Definition of word? | Andreas Gieriet (member) | 19-Mar-12 20:38
General: Re: Definition of word? | pwasser (member) | 19-Mar-12 20:50
General: Re: Definition of word? | Andreas Gieriet (member) | 19-Mar-12 21:31
General: Re: Definition of word? | pwasser (member) | 19-Mar-12 22:50
General: Re: Definition of word? | Andreas Gieriet (member) | 20-Mar-12 0:35
General: Re: Definition of word? | Andreas Gieriet (member) | 20-Mar-12 4:32
The following is the best I could find so far (to my intuitive understanding of "word"):
t=> Regex.Matches(t, @"(?:\d[\.,:]\d|\w[-\.']\w|\w)+").Count,  // M#2
The resulting table is:
tip A#1 A#2 A#3 \S+ M#1 M#2 <-- Text
 10   9   9   9   9   9  10  <-- The total number of words           in this sentence,is 10.
 13   8   9   8   8   8   8  <-- Mr O'Brien-Smith arrived at 8.30 and spent      $1,000.99
  3   1   1   1   1   1   1  <-- $123,000.00
  7  10  10  10  10   7   7  <-- incidentally , and might I say ( without prejudice )
 15   7   8   7   7   7   7  <--  (e.g., i.e., ad-hoc, 12,345.00, 10:45:00, didn't, etc.).
M#2 is the only one that matches my expectations for all sample strings. But as I said, first, it is not an absolute measure, and second, it is purely heuristic: it may be sufficient in many cases, but not in all.
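
For anyone who wants to try the M#2 pattern, the following is a minimal sketch that applies it to the five sample strings from the table. The pattern is copied from the comment; the class and method names (HeuristicWordCount, Count) are hypothetical, and the whitespace in the first sample is assumed to be plain spaces, as it appears in the table.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeuristicWordCount
{
    // M#2 from the comment: a word is a run of word characters that may
    // contain '.', ',' or ':' between two digits, or '-', '.' or '\''
    // between two word characters.
    private static readonly Regex WordPattern =
        new Regex(@"(?:\d[\.,:]\d|\w[-\.']\w|\w)+");

    public static int Count(string text)
    {
        return WordPattern.Matches(text).Count;
    }

    public static void Main()
    {
        string[] samples =
        {
            "The total number of words           in this sentence,is 10.",
            "Mr O'Brien-Smith arrived at 8.30 and spent      $1,000.99",
            "$123,000.00",
            "incidentally , and might I say ( without prejudice )",
            " (e.g., i.e., ad-hoc, 12,345.00, 10:45:00, didn't, etc.)."
        };

        // Prints one row per sample: the count, then the text it was computed from.
        foreach (string sample in samples)
        {
            Console.WriteLine("{0,3}  <-- {1}", Count(sample), sample);
        }
    }
}
```

The printed counts should line up with the M#2 column of the table (10, 8, 1, 7, 7). As the comment says, this is a heuristic: it encodes one intuitive idea of what a "word" is, not a definitive rule.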
General: Reason for my vote of 5 Very nice. | USABebopKid (member) | 18-Oct-11 5:20
General: @Pete, Have updated the Regex code part, please take a look ... | zenwalker1985 (member) | 17-Oct-11 22:59
General: Reason for my vote of 5 I did consider your example in the p... | Eddy Vluggen (member) | 15-Oct-11 10:41
General: Excellent, Pete. I've done something similar in the past, bu... | Walt Fair, Jr. (subeditor) | 22-Aug-11 17:13
General: Reason for my vote of 5 nice tip and my learn something new ... | Simon_Whale (member) | 11-Oct-10 5:50
Question: Chance to count the characters in same way?? | DanielLey (member) | 28-Sep-11 11:12
General: Nice point | r verma (member) | 19-Mar-10 7:34

Last Updated 18 Oct 2011
Article Copyright 2010 by Pete O'Hanlon
Everything else Copyright © CodeProject, 1999-2014