After finishing my XSearch article, I decided that I wanted to support quoted strings, just like web search engines. At first I planned to build in support for quoted strings; then I realized that this really should be generalized, because I could see other situations where I would want to use it.
So I decided to write a separate function to handle extracting quoted tokens, and at the same time do something about how painful it is to use strtok (i.e., having to loop to get all the tokens). I wrote down a list of what I would like the XTokenString function to do:
- No looping! Tokens must be returned in a CStringArray. This allows you to verify the number of tokens found immediately. Another input parameter lets you specify the maximum number of tokens to be returned, to eliminate runaway looping in case of bad data.
- The string must not be modified (e.g., by inserting nul characters). This allows the input parameter to be const, and avoids the need for casting.
- Specify delimiters, just like strtok.
- Optionally trim leading/trailing whitespace from returned tokens.
- Optionally handle quoted tokens. To take the example of web search engines, double quotes are used to indicate exact matches, and so a quoted token may include characters specified as token delimiters.
- Optionally handle escaped characters, like \" (or any of the token delimiter characters).
- Optionally return empty tokens. For example, for a CSV record where not all values are present:
213-555-1234
where empty tokens are returned for the missing fields (address, city, state, zip). It is also important to handle the special cases of a leading or trailing empty token, and of several consecutive empty tokens.
The implementation of XTokenString assumes that this option will not be used when the delimiters are whitespace. Otherwise, two consecutive spaces would produce an empty token in the returned array, which is probably not what you want.
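The core requirements in the list above (input left unmodified, a token limit, optional trimming, optional empty tokens) can be sketched in portable C++. This is a minimal illustration under stated assumptions, not the XTokenString source: the name SplitSketch is hypothetical, and std::string and std::vector stand in for CString and CStringArray so that the code compiles without MFC. Quoted and escaped tokens are left out here to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration; not the XTokenString source.
// Splits `input` on any character in `delimiters` without modifying `input`.
std::vector<std::string> SplitSketch(const std::string& input,
                                     const std::string& delimiters,
                                     std::size_t maxTokens = 0,   // 0 = no limit
                                     bool trimToken = false,
                                     bool returnEmptyToken = false)
{
    std::vector<std::string> tokens;
    std::size_t start = 0;

    // "<=" lets the loop see the empty field after a trailing delimiter.
    while (start <= input.size()) {
        if (maxTokens != 0 && tokens.size() >= maxTokens)
            break;                                  // token limit reached

        // Find the next delimiter, or treat the end of the string as one.
        std::size_t end = input.find_first_of(delimiters, start);
        if (end == std::string::npos)
            end = input.size();

        std::string token = input.substr(start, end - start);
        if (trimToken) {
            const char* ws = " \t\r\n";
            const std::size_t first = token.find_first_not_of(ws);
            const std::size_t last = token.find_last_not_of(ws);
            token = (first == std::string::npos)
                        ? std::string()
                        : token.substr(first, last - first + 1);
        }

        // Empty tokens are kept only when the caller asks for them.
        if (!token.empty() || returnEmptyToken)
            tokens.push_back(token);

        start = end + 1;                            // step past the delimiter
    }
    return tokens;
}
```

With this sketch, SplitSketch("a,,b,", ",") yields the two tokens a and b, while passing true for returnEmptyToken yields four tokens: a, an empty token, b, and a trailing empty token. This also shows why the empty-token option is a poor fit for whitespace delimiters: every extra space would add an empty entry.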
XTokenString In Action
The demo app allows you to compare the behavior of XTokenString with that of strtok. You can choose a built-in string or enter your own. The four checkboxes allow you to control how XTokenString will parse the string.
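For reference, the strtok side of that comparison looks roughly like the following. This is a small sketch, not code from the demo app; the helper name StrtokTokens is hypothetical. It shows the two habits of strtok that motivated XTokenString: the caller has to loop, and strtok writes nul characters into its input, so the sketch tokenizes a private copy.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper for illustration: collects tokens with the classic
// strtok loop. strtok overwrites delimiters with nul characters, so the
// function works on a writable copy instead of the caller's string.
std::vector<std::string> StrtokTokens(const std::string& input,
                                      const char* delimiters)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');                 // strtok needs a terminated buffer

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters))
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because strtok treats a run of delimiters as a single separator, StrtokTokens("a,,b,", ",") returns only a and b; the empty fields are lost. It also knows nothing about quotes, so a quoted value that contains a comma is split at the comma.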
XTokenString() - Parse string to extract tokens.
// Purpose:     XTokenString parses lpszString to find tokens delimited by
//              lpszDelimiters. The tokens are returned in the saTokens
//              array, until a maximum of nMaxTokens are found (0 = no max.).
//              Options may be used to select handling of escape characters,
//              double quotes, and empty tokens.
// Parameters:  lpszString          - address of string containing tokens
//              lpszDelimiters      - set of delimiter characters
//              saTokens            - array to hold returned tokens
//              nMaxTokens          - max no. of tokens to return; default
//                                    is 0, meaning no limit.
//              bTrimToken          - TRUE = trim left and right whitespace
//                                    from returned token. Default = FALSE.
//              bEnableEscapedChars - TRUE = enable handling of escape
//                                    character sequences such as \".
//                                    Default = FALSE.
//              bEnableDoubleQuote  - TRUE = enable handling of double quotes.
//                                    Default = FALSE.
//              bReturnEmptyToken   - TRUE = return empty tokens (delimiters
//                                    with no content). Used mostly for
//                                    CSV-type records, where number
//                                    of fields is fixed. Default = FALSE.
// Returns:     int - number of tokens returned; 0 if error.
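The effect of the bEnableEscapedChars and bEnableDoubleQuote options can be illustrated with a small portable sketch. It is not the XTokenString implementation: the name SplitQuotedSketch is hypothetical, standard C++ types replace the MFC ones, and two behaviors are assumptions made for the example, namely that the quote characters themselves are dropped from the returned token and that a backslash makes the next character literal.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration; not the XTokenString source.
// Assumptions: quote characters are removed from the returned tokens, and
// a backslash makes the following character literal. Empty tokens are skipped.
std::vector<std::string> SplitQuotedSketch(const std::string& input,
                                           const std::string& delimiters,
                                           bool enableEscapedChars = false,
                                           bool enableDoubleQuote = false)
{
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < input.size(); ++i) {
        const char ch = input[i];
        if (enableEscapedChars && ch == '\\' && i + 1 < input.size()) {
            current += input[++i];          // keep the escaped character as-is
        } else if (enableDoubleQuote && ch == '"') {
            inQuotes = !inQuotes;           // toggle quoted state, drop the quote
        } else if (!inQuotes && delimiters.find(ch) != std::string::npos) {
            if (!current.empty())
                tokens.push_back(current);  // delimiter outside quotes ends a token
            current.clear();
        } else {
            current += ch;                  // ordinary character, or a delimiter
                                            // protected by quotes
        }
    }
    if (!current.empty())
        tokens.push_back(current);          // flush the final token
    return tokens;
}
```

With double-quote handling enabled and a space as the delimiter, the input find "exact phrase" here produces three tokens: find, exact phrase, and here. With both options off, the same input is split at every space, which is the strtok-like behavior.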
How To Use
To integrate XTokenString into your app, you first need to add the following files to your project:
For details on how to use XTokenString, refer to the code in XTokenStringTestDlg.cpp.
Version 1.0 - 2005 August 2
This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify it or extend it, please consider posting the new code here for everyone to share. This software is provided "as is" with no expressed or implied warranty. I accept no liability for any damage or loss of business that this software may cause.