
XTokenString - a function to extract tokens from a string

Hans Dietrich, 4 Aug 2005
XTokenString extracts tokens from a string and returns them in a CStringArray. Tokens are defined by the delimiters you specify; double quotes may optionally group multiple words into a single token, and an option is provided for handling escaped characters.

Introduction

After finishing my XSearch article, I decided that I wanted to support quoted strings, just like web search engines. At first I planned to build in support for quoted strings; then I realized that this really should be generalized, because I could see other situations where I would want to use it.

So I decided to write a separate function to extract quoted tokens and, at the same time, to do something about how painful strtok is to use (you have to loop to get all the tokens). I wrote down a list of what I would like the XTokenString function to do:

  • No looping! Tokens must be returned in a CStringArray. This lets you check the number of tokens found immediately. An input parameter lets you specify the maximum number of tokens to return, to prevent runaway looping on bad data.
  • The string must not be modified (e.g., by inserting nul characters). This allows the input parameter to be const and avoids the need for casting.
  • Specify delimiters, just like strtok.
  • Optionally trim leading/trailing whitespace from returned tokens.
  • Optionally handle quoted tokens. As in web search engines, double quotes indicate an exact match, so a quoted token may include characters that are specified as token delimiters.
  • Optionally handle escaped characters, like \" (or any of the token delimiter characters).
  • Optionally return empty tokens. For example, a CSV record in which not all values are present:
    Dietrich,Hans,,,,,213-555-1234
    should return:
    Dietrich
    Hans
    <empty token>
    <empty token>
    <empty token>
    <empty token>
    213-555-1234
    where empty tokens are returned for the missing fields (address, city, state, zip). It is also important to handle the special cases of a leading or trailing empty token and of several consecutive empty tokens.

    The implementation of XTokenString assumes that this option will not be used when the delimiters are whitespace. Otherwise, two consecutive spaces would produce an empty token in the returned array, which is probably not what you want.
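
The empty-token requirement in the list above can be illustrated with a minimal standard C++ sketch. This is not the XTokenString code: SplitKeepEmpty is a hypothetical helper, and std::string and std::vector stand in for the MFC types. It only shows how leading, trailing, and consecutive empty fields can be preserved without modifying the input.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of XTokenString): split `text` on any
// character in `delimiters`, keeping empty fields. The input is const and
// is never modified.
std::vector<std::string> SplitKeepEmpty(const std::string& text,
                                        const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;)
    {
        // Find the next delimiter at or after `start`.
        const std::string::size_type end =
            text.find_first_of(delimiters, start);
        if (end == std::string::npos)
        {
            // Last field: everything after the final delimiter (may be empty).
            tokens.push_back(text.substr(start));
            break;
        }
        // Field between two delimiters (empty when they are adjacent).
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```

Called with the record Dietrich,Hans,,,,,213-555-1234 and "," as the delimiter set, this sketch yields seven fields, four of them empty. With " " as the delimiter it would also report an empty field between two consecutive spaces, which is the whitespace caveat described above.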

XTokenString In Action

The demo app allows you to compare the behavior of XTokenString with that of strtok. You can choose a built-in string or enter your own. The four checkboxes control how XTokenString parses the string.

screenshot
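
For reference, the strtok side of that comparison can be reproduced outside the demo app with a short sketch. StrtokTokens is a hypothetical wrapper written for this illustration, not code from the demo. It shows the loop that strtok requires and the writable copy it needs, because strtok overwrites delimiters with nul characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper: collect tokens with the classic strtok loop.
// strtok writes nul characters into its input, so we tokenize a private,
// writable copy instead of the caller's string.
std::vector<std::string> StrtokTokens(const std::string& text,
                                      const char* delimiters)
{
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters))
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because strtok treats a run of consecutive delimiters as a single separator, the record Dietrich,Hans,,,,,213-555-1234 comes back as three tokens here, with no trace of the four missing fields.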

XTokenString Function

  • XTokenString() - Parse string to extract tokens.
    /////////////////////////////////////////////////////////////////////////
    //
    // XTokenString()
    //
    // Purpose:     XTokenString parses lpszString to find tokens delimited by
    //              lpszDelimiters.  The tokens are returned in the saTokens
    //              array, until a maximum of nMaxTokens are found (0 = no max.).
    //              Options may be used to select handling of escape characters,
    //              double quotes, and empty tokens.
    //
    // Parameters:  lpszString          - address of string containing tokens
    //              lpszDelimiters      - set of delimiter characters
    //              saTokens            - array to hold returned tokens
    //              nMaxTokens          - max no. of tokens to return;  default
    //                                    is 0, meaning no limit.
    //              bTrimToken          - TRUE = trim left and right whitespace
    //                                    from returned token.  Default = FALSE.
    //              bEnableEscapedChars - TRUE = enable handling of escape
    //                                    character sequences such as \".
    //                                    Default = FALSE.
    //              bEnableDoubleQuote  - TRUE = enable handling of double quotes.
    //                                    Default = FALSE.
    //              bReturnEmptyToken   - TRUE = return empty tokens (delimiters
    //                                    with no content).  Used mostly for
    //                                    CSV-type records, where number of
    //                                    fields is fixed.  Default = FALSE.
    //
    // Returns:     int - number of tokens returned;  0 if error.
    //
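
How the documented options might interact can be sketched in standard C++. This is not the XTokenString source: TokenizeSketch and TrimSketch are hypothetical names, std::string and std::vector<std::string> stand in for the MFC types, and the parameter order simply mirrors the comment block above. Details such as dropping the quote characters from the returned token and honoring escapes inside quotes are assumptions made for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: remove leading and trailing whitespace.
static std::string TrimSketch(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical stand-in for XTokenString. Returns the number of tokens
// stored in `tokens`; `text` is never modified.
int TokenizeSketch(const std::string& text,
                   const std::string& delimiters,
                   std::vector<std::string>& tokens,
                   int maxTokens = 0,
                   bool trimToken = false,
                   bool enableEscapedChars = false,
                   bool enableDoubleQuote = false,
                   bool returnEmptyToken = false)
{
    tokens.clear();
    std::string current;
    bool inQuotes = false;

    // Store the finished token; returns false once maxTokens is reached.
    auto flush = [&]() -> bool {
        const std::string token = trimToken ? TrimSketch(current) : current;
        current.clear();
        if (!token.empty() || returnEmptyToken)
            tokens.push_back(token);
        return maxTokens <= 0 || static_cast<int>(tokens.size()) < maxTokens;
    };

    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        if (enableEscapedChars && c == '\\' && i + 1 < text.size())
        {
            current += text[++i];   // copy the escaped character literally
        }
        else if (enableDoubleQuote && c == '"')
        {
            inQuotes = !inQuotes;   // quotes group text but are not copied
        }
        else if (!inQuotes && delimiters.find(c) != std::string::npos)
        {
            if (!flush())           // delimiter outside quotes ends a token
                return static_cast<int>(tokens.size());
        }
        else
        {
            current += c;
        }
    }
    flush();                        // token after the last delimiter
    return static_cast<int>(tokens.size());
}
```

With double-quote handling enabled and a space as the delimiter, the input a "b c" d produces three tokens in this sketch, the second being b c. With bReturnEmptyToken-style behavior enabled and "," as the delimiter, the CSV record shown earlier produces seven tokens.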
    

How To Use

To integrate XTokenString into your app, you first need to add the following files to your project:
  • XTokenString.cpp
  • XTokenString.h

For details on how to use XTokenString, refer to the code in XTokenStringTestDlg.cpp.

Revision History

Version 1.0 - 2005 August 2

  • Initial public release

Usage

This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify or extend it, please consider posting the new code here for everyone to share. This software is provided "as is" with no express or implied warranty. I accept no liability for any damage or loss of business that this software may cause.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Hans Dietrich
Software Developer (Senior) Hans Dietrich Software
United States
I attended St. Michael's College of the University of Toronto, with the intention of becoming a priest. A friend in the University's Computer Science Department got me interested in programming, and I have been hooked ever since.
 
I recently moved to Los Angeles, where I am doing consulting and development work.
 
For consulting and custom software development, please see www.hdsoft.org.

Article Copyright 2005 by Hans Dietrich