
Don't count spaces when counting words.


Oct 25, 2011

CPOL


I also use a regular expression to count words, which returns the same number of words as MS Word. I wrap the regular expression in a String extension method to make it easy to use.

using System.Text.RegularExpressions;

public static class StringExtensions
{
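  /// <summary>
  /// Word-count pattern: matches each run of characters that is not
  /// whitespace, !, ?, ¡, ¿, a hyphen, or an en dash
  /// </summary>
  private const string WordCountRegex = @"[^\s!?¡¿\-\–]+";

  /// <summary>
  /// Shared, precompiled Regex instance (RegexOptions.Multiline only affects
  /// ^ and $, which this pattern does not use, so it is harmless here)
  /// </summary>
  private static readonly Regex regexWordCounts = new Regex(WordCountRegex, 
             RegexOptions.Compiled | RegexOptions.Multiline);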
  
  /// <summary>
  /// Returns the number of words in a given <paramref name="sentence" />
  /// </summary>
  /// <param name="sentence">Text in which to count words</param>
  /// <returns>Number of words, or zero if matching throws (for example, when the sentence is null)</returns>
  public static int WordCounts(this string sentence)
  {
    try
    {
      MatchCollection matchCollection = regexWordCounts.Matches(sentence);
      return matchCollection.Count;
    }
    catch
    {
      return 0;
    }
  }
}

Taking the samples above, this would give the following:

string input = 
  "The total number of words       \t        this sentence is 10.";
int wordCounts = input.WordCounts(); //Returns 9

input = "Mr O'Brien-Smith arrived at 8.30 and spent \t $1,000.99";
wordCounts = input.WordCounts(); //Returns 9 (the variable is already declared, so it is reassigned here)
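
To see where the two 9s come from, it can help to list the tokens the pattern actually matches. The following is only a minimal sketch under that assumption: `WordTokenDemo` and `WordTokens` are hypothetical names that are not part of the tip itself, and the pattern is copied from `WordCountRegex` so the example is self-contained.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordTokenDemo
{
    // Same pattern as WordCountRegex in the tip, repeated so this sketch stands alone.
    private static readonly Regex Words =
        new Regex(@"[^\s!?¡¿\-\–]+", RegexOptions.Compiled);

    // Hypothetical helper (not in the original tip): returns the matched tokens
    // instead of just their count.
    public static string[] WordTokens(string sentence)
    {
        return Words.Matches(sentence)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Mr O'Brien-Smith arrived at 8.30 and spent \t $1,000.99";

        // Prints: Mr|O'Brien|Smith|arrived|at|8.30|and|spent|$1,000.99
        Console.WriteLine(string.Join("|", WordTokens(input)));
    }
}
```

Because the hyphen is in the excluded character set, "O'Brien-Smith" is split into two tokens, while the apostrophe, the periods, the comma, and the dollar sign stay inside their tokens. If a hyphenated name should count as one word, removing `\-` from the pattern would be the place to start.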

Hope this helps.