String Tokenizer class

Apr 9, 2000

A customizable string tokenizer.
Introduction
Here is a customizable string tokenizer class. You can attach it to a CString object and tokenize that string. The tokenizing is highly customizable: you can specify which characters make up words, which are whitespace, which form numbers, and so on. If you are familiar with the StreamTokenizer class from Java, you already know how to use this class.
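The customization described above is table-driven, and the idea can be sketched in standard C++. This is a minimal sketch, not the article's implementation. `CharTable`, `CharType` and `Tokenize` are hypothetical names. The sketch uses `std::string` instead of MFC's `CString` and supports only three character classes, leaving out numbers, quotes and comments. Each of the 256 byte values maps to a class, and the setter functions overwrite ranges of that table. That is what makes the syntax adjustable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical character classes, modelled loosely on Java's StreamTokenizer.
enum class CharType { Ordinary, Whitespace, Word };

class CharTable {
public:
    CharTable() { ResetSyntax(); }

    // Every character becomes "ordinary" (no special meaning).
    void ResetSyntax() { types_.fill(CharType::Ordinary); }

    // Mark the inclusive range [lo, hi] with the given class.
    void WordChars(int lo, int hi) { SetRange(lo, hi, CharType::Word); }
    void WhiteSpaceChars(int lo, int hi) { SetRange(lo, hi, CharType::Whitespace); }
    void OrdinaryChars(int lo, int hi) { SetRange(lo, hi, CharType::Ordinary); }

    CharType TypeOf(unsigned char c) const { return types_[c]; }

private:
    void SetRange(int lo, int hi, CharType t) {
        if (lo < 0) lo = 0;      // clamp to the byte range
        if (hi > 255) hi = 255;
        for (int c = lo; c <= hi; ++c) types_[c] = t;
    }
    std::array<CharType, 256> types_;
};

// Runs of word chars form one token, whitespace is skipped,
// and every ordinary char becomes a one-character token.
std::vector<std::string> Tokenize(const std::string& text, const CharTable& table) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        CharType t = table.TypeOf(static_cast<unsigned char>(text[i]));
        if (t == CharType::Whitespace) {
            ++i;
        } else if (t == CharType::Word) {
            std::size_t start = i;
            while (i < text.size() &&
                   table.TypeOf(static_cast<unsigned char>(text[i])) == CharType::Word)
                ++i;
            tokens.push_back(text.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return tokens;
}
```

For example, with `WordChars('a', 'z')` and `WhiteSpaceChars(0, ' ')`, the input `"x = foo;"` yields the tokens `x`, `=`, `foo` and `;`.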
The tokenizer class is called CStringTokenizer. By default, it is initialized with standard parsing parameters, but you can reset them and customize the class to work the way you want. To use the tokenizer, the string you pass must be terminated with the FILE_EOF character. This is because the algorithm is designed to work on streams. (If you modify the class, you can adapt it to read from a stream instead of a string.) For example, a simple parse loop looks like this:
```cpp
// The tokenizer requires the string to end with FILE_EOF
m_sSampleString += FILE_EOF;
CStringTokenizer tokenizer(m_sSampleString);

// Collect every token's string value, one per line, until end of input
while (TT_EOF != tokenizer.NextToken())
{
    m_sResultString += tokenizer.GetStrValue() + "\r\n";
}

// Remove the FILE_EOF character appended above
m_sSampleString = m_sSampleString.Left(m_sSampleString.GetLength() - 1);
```
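The terminating character can be understood through a small standard C++ sketch, which is not the article's code. It shows a reader that walks a string one character at a time and reports end of input when it meets a sentinel. This lets a stream-style scan loop run over a string without bounds checks. The sentinel value `'\x1A'` and the names `SentinelReader` and `CountChars` are assumptions made for this example. The actual value of FILE_EOF is defined in StringTokenizer.h and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sentinel; the article's FILE_EOF may have another value.
const char kSentinel = '\x1A';

// Reads a sentinel-terminated string like a stream, returning -1 (like EOF)
// once the sentinel is reached. No bounds check is needed because the
// caller guarantees that the string ends with kSentinel.
class SentinelReader {
public:
    explicit SentinelReader(const std::string& s) : s_(s) {}

    int Read() {
        char c = s_[pos_];
        if (c == kSentinel) return -1;  // stay on the sentinel, like a stream at EOF
        ++pos_;
        return static_cast<unsigned char>(c);
    }

private:
    const std::string& s_;   // the caller's string, not a copy
    std::size_t pos_ = 0;
};

// Mirrors the sample loop: append the sentinel, scan, then strip it again.
std::size_t CountChars(std::string& text) {
    text += kSentinel;
    std::size_t n = 0;
    {
        SentinelReader reader(text);
        while (reader.Read() != -1) ++n;
    }
    text.pop_back();  // restore the caller's original string
    return n;
}
```

A side effect of this design is that if the sentinel character also appears inside the text, scanning stops there early.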
You need to include StringTokenizer.h.
The class's public interface is:
```cpp
public:
    // Public functions

    // Constructor: specify the associated string
    CStringTokenizer(CString& string);
    virtual ~CStringTokenizer();          // Destructor

    double GetNumValue();                 // Get the numeric value of the token
    void PascalComments(BOOL bFlag);      // Enable/disable Pascal comments
    virtual CString GetStrValue();        // Get the string value of the token
    void QuoteChar(int ch);               // Specify the quote char
    int LineNo();                         // Get the current line number
    virtual void PushBack();              // Push back one token (can go back only once)
    virtual int NextToken();              // Get the next token
    void LowerCaseMode(BOOL bFlag);       // Enable/disable lower-case mode (case sensitivity)
    void SlSlComments(BOOL bFlag);        // Enable/disable // comments
    void SlStComments(BOOL bFlag);        // Enable/disable /* */ comments
    void EolIsSignificant(BOOL bFlag);    // Treat EOL as a token or not
    void ParseNumbers();                  // Set up the tokenizer to parse numbers

    // Character type setting functions

    // Reset the syntax so that no characters have special meanings
    void ResetSyntax();
    // Set the chars in the range as chars that can be used for words
    void WordChars(int cLow, int cHi);
    // Set the chars in the range as whitespace chars
    void WhiteSpaceChars(int cLow, int cHi);
    // Set the chars in the range as ordinary chars
    void OrdinaryChars(int cLow, int cHi);
    // Set the char as ordinary
    void OrdinaryChar(int ch);
    // Set the char as a comment char
    void CommentChar(int ch);
```
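The "can go back only once" rule of PushBack is easiest to see in code. Below is a much-reduced sketch in standard C++ of NextToken, PushBack and GetStrValue. It is hypothetical and not the article's implementation. The tokenizer remembers only the type and text of the current token, so PushBack just sets a flag that makes the next NextToken return that token again. The TT_* values are borrowed from Java's StreamTokenizer and are assumptions, since the article's header may define them differently. Words here are simply runs of letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token types (values taken from Java's StreamTokenizer).
enum TokenType { TT_EOF = -1, TT_WORD = -3, TT_NOTHING = -4 };

class PushBackTokenizer {
public:
    // Keeps a copy of the input for simplicity (unlike CStringTokenizer).
    explicit PushBackTokenizer(const std::string& s) : s_(s) {}

    int NextToken() {
        if (pushedBack_) {          // replay the current token exactly once
            pushedBack_ = false;
            return type_;
        }
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_])))
            ++pos_;
        if (pos_ >= s_.size()) {
            str_.clear();
            return type_ = TT_EOF;
        }
        unsigned char c = static_cast<unsigned char>(s_[pos_]);
        if (std::isalpha(c)) {      // a word: a run of letters
            std::size_t start = pos_;
            while (pos_ < s_.size() && std::isalpha(static_cast<unsigned char>(s_[pos_])))
                ++pos_;
            str_ = s_.substr(start, pos_ - start);
            return type_ = TT_WORD;
        }
        ++pos_;                     // any other char is its own token
        str_ = std::string(1, static_cast<char>(c));
        return type_ = c;
    }

    // Only one level is stored: calling PushBack twice equals calling it once.
    // Before the first NextToken there is no token to push back.
    void PushBack() {
        if (type_ != TT_NOTHING) pushedBack_ = true;
    }

    const std::string& GetStrValue() const { return str_; }

private:
    std::string s_;
    std::size_t pos_ = 0;
    int type_ = TT_NOTHING;
    std::string str_;
    bool pushedBack_ = false;
};
```

For the input `"let x;"`, calling NextToken, then PushBack, then NextToken again returns the word `let` twice before moving on to `x`.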
Notes
The class uses the specified string directly, not a copy of it, so the string must stay valid while the tokenizer is in use.
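The practical difference between keeping the caller's string and keeping a copy can be shown with two small standard C++ classes. `RefHolder` and `CopyHolder` are hypothetical names used only for illustration. A reference member sees every later change to the caller's string and dangles if that string is destroyed first. A copied member is isolated from such changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Like CStringTokenizer, keeps a reference to the caller's string.
class RefHolder {
public:
    explicit RefHolder(std::string& s) : s_(s) {}
    std::size_t Length() const { return s_.size(); }
private:
    std::string& s_;   // no copy: the caller's string must outlive this object
};

// For contrast, keeps its own copy and ignores later changes.
class CopyHolder {
public:
    explicit CopyHolder(const std::string& s) : s_(s) {}
    std::size_t Length() const { return s_.size(); }
private:
    std::string s_;
};
```

If a character is appended to the original string after both objects are constructed, `RefHolder::Length()` reports the new length while `CopyHolder::Length()` still reports the old one.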
If you find any bugs, please send me a report to: zolyfarkas@mail.usa.com.
That's it!