String Tokenizer class

Apr 9, 2000

A customizable string tokenizer.

Introduction

Here is a customizable string tokenizer class. You attach it to a CString object and then tokenize that string. The tokenizing can be customized in detail: you can set which characters are used for words, whitespace, numbers and so on. If you are familiar with the StreamTokenizer class from Java, you already know how to use this class.

The tokenizer class is called CStringTokenizer. By default, it is initialized with standard parsing parameters, but you can reset them and customize the class to work the way you want. To use the tokenizer, the string you pass must be terminated with the FILE_EOF character, because the algorithm is written so that it can also be used for streams. (If you modify the class, you can adapt it to read from a stream instead of a string.) For example, a simple parse looks like this:

// the tokenizer expects the string to end with FILE_EOF
m_sSampleString += FILE_EOF;
CStringTokenizer tokenizer(m_sSampleString);

// collect every token on its own line until the end marker is reached
while (TT_EOF != tokenizer.NextToken())
{
    m_sResultString += tokenizer.GetStrValue() + "\r\n";
}

// remove the EOF that was added to the end of the string
m_sSampleString = m_sSampleString.Left(m_sSampleString.GetLength() - 1);
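
Appending the terminator and stripping it by hand is easy to get wrong if the loop is left early. One way to make it automatic is a small scope guard. The following is only a sketch: it uses std::string instead of CString so that it compiles on its own, and SentinelGuard, LengthUpToSentinel and kSentinel are hypothetical names that are not part of the class (the real terminator is FILE_EOF, whose value is not shown in this article).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for FILE_EOF; the real value is defined by the tokenizer's header.
const char kSentinel = '\x1A';

// Appends the sentinel on construction and removes it again on destruction,
// so the caller's string is restored even if tokenizing stops early.
class SentinelGuard
{
public:
    explicit SentinelGuard(std::string& s) : m_s(s) { m_s += kSentinel; }
    ~SentinelGuard()
    {
        if (!m_s.empty() && m_s.back() == kSentinel)
            m_s.pop_back();
    }
    SentinelGuard(const SentinelGuard&) = delete;
    SentinelGuard& operator=(const SentinelGuard&) = delete;

private:
    std::string& m_s; // the guarded string itself, not a copy
};

// Counts the characters before the sentinel, the way a stream reader stops at EOF.
std::size_t LengthUpToSentinel(const std::string& s)
{
    std::size_t n = 0;
    while (n < s.size() && s[n] != kSentinel)
        ++n;
    return n;
}
```

With such a guard, the tokenizer would be created and used inside the guard's scope, and the string would be back to its original content as soon as the scope ends.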

You need to include StringTokenizer.h.

The class's public interface is:

public:
  // Public functions
  
  // Constructor, specify the associated string
  CStringTokenizer(CString& string);
  virtual ~CStringTokenizer();    // Destructor
  double GetNumValue(); // Get the numeric value of the token
  void PascalComments(BOOL bFlag); // Enable/disable Pascal comments
  virtual CString GetStrValue(); // Get the string value of the token
  void QuoteChar(int ch); // Specify the quote char
  int LineNo(); // Get the current line number
  virtual void PushBack(); // Go back one token (can go back only once)
  virtual int NextToken(); // Get the next token
  void LowerCaseMode(BOOL bFlag); // Enable/disable case sensitivity
  void SlSlComments(BOOL bFlag); // Enable/disable // comments
  void SlStComments(BOOL bFlag); // Enable/disable /* */ comments
  void EolIsSignificant(BOOL bFlag); // Consider EOL as a token or not
  void ParseNumbers(); // Set up the parsing so that numbers are parsed
  
  // Character Type Setting Functions
  
  // Reset the syntax so that characters are assigned no special meaning
  void ResetSyntax(); 
  // Set the chars in the range as chars that can be used for words
  void WordChars(int cLow, int cHi); 
  // Set the chars in the range as whitespace chars
  void WhiteSpaceChars(int cLow, int cHi);
  // Set the chars in the range as ordinary chars  
  void OrdinaryChars(int cLow, int cHi); 
  // Set the char as ordinary
  void OrdinaryChar(int ch);
  // Set the char as a char used for commenting 
  void CommentChar(int ch); 
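
To show what the character type functions mean in practice, here is a small table-driven sketch in the style of Java's StreamTokenizer. It is not the CStringTokenizer source: MiniTokenizer and TokenizeAll are hypothetical names, the sketch works on a copy of a std::string, it needs no terminator, and it leaves out numbers, quotes and comments. It only shows how a per-character type table decides where a token starts and ends.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of a table-driven tokenizer (illustration only).
class MiniTokenizer
{
public:
    enum CharType { CT_ORDINARY, CT_WORD, CT_WHITESPACE };

    // Every character starts out as ordinary, similar to a reset syntax.
    explicit MiniTokenizer(const std::string& text)
        : m_text(text), m_pos(0), m_types(256, CT_ORDINARY) {}

    void WordChars(int cLow, int cHi)       { SetRange(cLow, cHi, CT_WORD); }
    void WhiteSpaceChars(int cLow, int cHi) { SetRange(cLow, cHi, CT_WHITESPACE); }
    void OrdinaryChar(int ch)               { SetRange(ch, ch, CT_ORDINARY); }

    // Stores the next token in 'token'; returns false at the end of the input.
    bool NextToken(std::string& token)
    {
        // Whitespace only separates tokens, so skip it.
        while (m_pos < m_text.size() && TypeAt(m_pos) == CT_WHITESPACE)
            ++m_pos;
        if (m_pos >= m_text.size())
            return false;

        const std::size_t start = m_pos;
        if (TypeAt(m_pos) == CT_WORD)
        {
            // A word is the longest run of word characters.
            while (m_pos < m_text.size() && TypeAt(m_pos) == CT_WORD)
                ++m_pos;
        }
        else
        {
            ++m_pos; // an ordinary character is a token by itself
        }
        token = m_text.substr(start, m_pos - start);
        return true;
    }

private:
    // Marks every character in [cLow, cHi] with the given type.
    void SetRange(int cLow, int cHi, CharType type)
    {
        if (cLow < 0) cLow = 0;
        if (cHi > 255) cHi = 255;
        for (int c = cLow; c <= cHi; ++c)
            m_types[c] = type;
    }

    CharType TypeAt(std::size_t i) const
    {
        return m_types[static_cast<unsigned char>(m_text[i])];
    }

    std::string m_text;            // copy of the input
    std::size_t m_pos;             // current read position
    std::vector<CharType> m_types; // one entry per character code
};

// Splits 'text' with a small syntax: letters, digits and '_' form words.
std::vector<std::string> TokenizeAll(const std::string& text)
{
    MiniTokenizer t(text);
    t.WhiteSpaceChars(0, ' ');
    t.WordChars('a', 'z');
    t.WordChars('A', 'Z');
    t.WordChars('0', '9');
    t.WordChars('_', '_');

    std::vector<std::string> out;
    std::string tok;
    while (t.NextToken(tok))
        out.push_back(tok);
    return out;
}
```

In this sketch, calling OrdinaryChar('_') after the WordChars calls would make the underscore a separate one-character token again. With the real class, the matching setup would presumably start with ResetSyntax() followed by the same range calls.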

Notes

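The class uses the specified string directly, not a copy of it. That is why the sample above appends FILE_EOF to m_sSampleString itself and strips it again afterwards.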

If you find any bugs, please send a report to zolyfarkas@mail.usa.com.

That's it!