Posted 5 Dec 2006

Template String Tokenizer

Nick Alexeev, 5 Jan 2007
A template string tokenizer class that works with both CStringArray and CStringList.


Often we need to parse a string and store the fragments in an array or a list. For example, we might need to parse a line from a comma-separated value (CSV) file or an NMEA string. MFC provides the CStringArray and CStringList classes for handling arrays and lists of strings, respectively. The idea of this submission is simple: a tokenizer class that inherits publicly from either CStringArray or CStringList, depending on a template parameter. Once the string is tokenized, the calling code can access the tokens through direct calls to the methods of the parent collection class.

Parameterized inheritance

Figure: OR-ed template inheritance (UML diagram)

Both of these collection classes inherit from CObject and therefore support MFC's run-time type identification (RTTI). The constructor uses it to check that CStringTokenizer is instantiated only with CStringArray or CStringList as the parent class.

template <class T> // T is either CStringArray or CStringList
class CStringTokenizer : public T
{
public:
 // OPTIONS enumerates the tokenization flags,
 // e.g. IGNORE_EMPTY_TOKENS and TERMINATING_STRING
 CStringTokenizer();
 UINT Tokenize(CString& strSrc, LPCTSTR pStrDelimit, 
       LPCTSTR pStrTerminate = NULL, UINT iOffset = 0);
 virtual ~CStringTokenizer() {}
 void AddOptions(OPTIONS iOptions)  {m_iFlags |= iOptions;}
 void RemoveOptions(OPTIONS iOptions) {m_iFlags &= ~iOptions;}
protected:
 // helper function with 2 separate implementations
 // (template specialization) for CStringArray and CStringList
 void Add(LPCTSTR pStrNew);
 UINT m_iFlags;   // tokenization options
 CMutex m_mtxTokenize;  // serves to make the Tokenize() method non-reentrant
};

// PURPOSE:   Initialize the new CStringTokenizer object.
// Check the type of the parent class T.
// PRECONDITIONS: T is either CStringArray or CStringList
template <class T>
CStringTokenizer<T>::CStringTokenizer()
 : m_mtxTokenize(FALSE)  // the mutex is free for the taking
{
 CRuntimeClass* pRTC = T::GetRuntimeClass();
 if (RUNTIME_CLASS(CStringArray) == pRTC) return; // admissible parent class
 if (RUNTIME_CLASS(CStringList) == pRTC)  return; // same as above
}
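
For comparison, a compiler that supports C++11 can reject an inadmissible parent class at compile time rather than at run time. The following is only a sketch of that alternative in standard C++, not the MFC code of this article: StringArray, StringList, Tokenizer and AddToken are hypothetical stand-ins built on std::vector and std::list.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical portable stand-ins for CStringArray and CStringList.
using StringArray = std::vector<std::string>;
using StringList  = std::list<std::string>;

// The tokenizer inherits publicly from whichever container it is given,
// so callers can use the container interface directly on the tokenizer.
template <class T>
class Tokenizer : public T
{
    // Rejects any other parent class at compile time.
    static_assert(std::is_same<T, StringArray>::value ||
                  std::is_same<T, StringList>::value,
                  "T must be StringArray or StringList");
public:
    // Appends one token; push_back exists on both standard containers.
    void AddToken(const char* token)
    {
        assert(token != nullptr); // precondition: a valid C string
        this->push_back(token);
    }
};
```

With this check, Tokenizer<std::vector<int>> fails to compile, whereas the RTTI test above only runs once an object is constructed. Note that standard containers have no virtual destructor, so such a derived class should not be deleted through a pointer to its base.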

Template specialization helps to smooth out the difference between CStringArray and CStringList

The addition of a new token to a collection is the only place where the tokenizer code has to interact with the parent collection class. Unfortunately, between CStringArray and CStringList, there isn't a method with a common name for adding a new member to a collection. CStringList has AddHead() and AddTail(), while CStringArray has Add().

At first, I tried to fix this problem with the RTTI built into the MFC framework: I wrote code that would choose the appropriate method at run time. This approach failed to compile. Then someone suggested that I try template specialization, and it worked! I declared my own Add() method and added two separate implementations, one for the case when CStringArray is the parent and one for CStringList.

// PURPOSE: A helper function, which does template specialization
// and resolves the difference between CStringArray and CStringList
template <>
void CStringTokenizer<CStringArray>::Add(LPCTSTR pStrNew) // [in] a string to be added to the array
{
 TRY
 {
  CStringArray::Add(pStrNew); // can throw CMemoryException
 }
 CATCH(CMemoryException, pExc)
 {
  THROW_LAST();    // rethrow the caught exception to the calling code
 }
 END_CATCH
}

// PURPOSE: A helper function, which does template
// specialization and resolves the difference
// between CStringArray and CStringList
template <>
void CStringTokenizer<CStringList>::Add(LPCTSTR pStrNew) // [in] a string to be added to the list
{
 TRY
 {
  CStringList::AddTail(pStrNew); // can throw CMemoryException
 }
 CATCH(CMemoryException, pExc)
 {
  THROW_LAST();    // rethrow the caught exception to the calling code
 }
 END_CATCH
}
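
The same pattern can be tried outside MFC. The sketch below is a portable illustration under stated assumptions: ArrayLike, ListLike, Tokenizer and AddToken are hypothetical names, and the two stand-in classes only mimic the fact that the MFC collections use different method names for appending.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-ins that, like the MFC classes, use different
// method names for appending an element.
struct ArrayLike
{
    std::vector<std::string> items;
    std::size_t Add(const std::string& s) { items.push_back(s); return items.size() - 1; }
};

struct ListLike
{
    std::list<std::string> items;
    void AddTail(const std::string& s) { items.push_back(s); }
};

template <class T>
class Tokenizer : public T
{
public:
    // Declared once in the primary template; defined only through
    // the explicit specializations below.
    void AddToken(const char* token);
};

// Chosen at compile time when T is ArrayLike.
template <>
inline void Tokenizer<ArrayLike>::AddToken(const char* token)
{
    assert(token != nullptr);
    ArrayLike::Add(token);
}

// Chosen at compile time when T is ListLike.
template <>
inline void Tokenizer<ListLike>::AddToken(const char* token)
{
    assert(token != nullptr);
    ListLike::AddTail(token);
}
```

Each specialization mentions only the method that its own parent provides. A single function body that branches at run time has to compile for every T, including the branch that names a method the parent does not have, which is one plausible reason why a run-time approach fails to compile.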


Call the Tokenize(...) function to tokenize a string. After this call, you can deal with the tokens through the methods of CStringArray and CStringList. Note that the new tokens are appended to the collection, and Tokenize(...) doesn't remove the old tokens.

// PURPOSE:   Tokenize the string and APPEND the tokens into the parent collection class
// POSTCONDITIONS: The original string remains intact. The method can throw CMemoryException.
template <class T>
UINT // Offset of the next character after the terminator.
     // The return value and the iOffset parameter
     // can be used for parsing one string with successive calls to Tokenize().
CStringTokenizer<T>::Tokenize(CString& strSrc,   // [in] a string that will be tokenized. 
         LPCTSTR pStrDelimit, // [in] a set of delimiting characters
         LPCTSTR pStrTerminate, // [in] a set of terminating characters, 
                                // or a terminating sequence, depending on options
         UINT  iOffset)  // [in] Tokenization will start at this offset. Defaulted to zero
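
The body of Tokenize() is not reproduced here. To illustrate the contract described above (append tokens, leave the source intact, return the offset after the terminator), here is a minimal sketch in standard C++. It is not the article's implementation: the free function, its std::string parameters and the explicit output vector are assumptions made for the example, and it always skips empty tokens and treats the terminators as a set of characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends the tokens of src to out and returns the offset of the
// character that follows the terminator (or src.size() if none is found).
// Empty tokens are skipped; the source string is not modified.
std::size_t Tokenize(const std::string& src,
                     const std::string& delimiters,
                     const std::string& terminators,
                     std::size_t offset,
                     std::vector<std::string>& out)
{
    assert(offset <= src.size()); // precondition: offset lies within src

    // Tokenization covers [offset, end), where end is the first terminator.
    const std::size_t termPos = src.find_first_of(terminators, offset);
    const std::size_t end = (termPos == std::string::npos) ? src.size() : termPos;

    std::size_t tokenStart = offset;
    while (tokenStart < end)
    {
        std::size_t tokenEnd = src.find_first_of(delimiters, tokenStart);
        if (tokenEnd == std::string::npos || tokenEnd > end)
            tokenEnd = end;
        if (tokenEnd > tokenStart) // skip empty tokens
            out.push_back(src.substr(tokenStart, tokenEnd - tokenStart));
        tokenStart = tokenEnd + 1; // step over the delimiter
    }
    return (termPos == std::string::npos) ? src.size() : termPos + 1;
}
```

Feeding the return value back in as the offset parses a multi-line string one line at a time: for "a b.c\nd e" with delimiters ". " and terminator "\n", the first call appends a, b and c and returns 6, and a second call starting at 6 appends d and e.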



If there are two delimiters in a row, the token between them is an empty string. By default, this token will be ignored. If RemoveOptions() is called with IGNORE_EMPTY_TOKENS, these tokens will be added to the collection (not ignored). This option can be useful for parsing CSV files and NMEA strings.
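
Why empty tokens matter for column-oriented data can be shown with a small standard C++ sketch. Split() and its ignoreEmpty parameter are hypothetical names chosen for this example; they only mirror the effect of the IGNORE_EMPTY_TOKENS option and are not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits src on any character in delimiters. When ignoreEmpty is false,
// the empty token between two adjacent delimiters is kept, so CSV
// fields stay in their columns.
std::vector<std::string> Split(const std::string& src,
                               const std::string& delimiters,
                               bool ignoreEmpty)
{
    assert(!delimiters.empty()); // precondition: at least one delimiter
    std::vector<std::string> tokens;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t pos = src.find_first_of(delimiters, start);
        const std::size_t stop = (pos == std::string::npos) ? src.size() : pos;
        if (stop > start || !ignoreEmpty)
            tokens.push_back(src.substr(start, stop - start));
        if (pos == std::string::npos)
            break;
        start = pos + 1; // step over the delimiter
    }
    return tokens;
}
```

For the made-up record "GPGGA,,N,5", ignoring empty tokens yields three fields and shifts N into the second position, while keeping them yields four fields with N still in the third.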


If the TERMINATING_STRING option is set, the tokenization stops when a terminating substring is encountered: Tokenize(...) treats pStrTerminate as an ordered substring. If this option is not set, the tokenization stops when any character from a set of terminating characters is encountered: Tokenize(...) treats pStrTerminate as an unordered set of characters.
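
The two interpretations of the terminator correspond to two different searches. The sketch below shows the distinction with std::string; FindTerminator() and its asSubstring parameter are hypothetical names for this example and are not taken from the class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the offset at which tokenization should stop, or
// std::string::npos if no terminator is present.
// asSubstring == true : terminator must appear as an ordered sequence.
// asSubstring == false: any single character of terminator stops the scan.
std::size_t FindTerminator(const std::string& src,
                           const std::string& terminator,
                           bool asSubstring,
                           std::size_t offset = 0)
{
    assert(offset <= src.size()); // precondition: offset lies within src
    if (terminator.empty())
        return std::string::npos; // nothing to look for
    return asSubstring ? src.find(terminator, offset)
                       : src.find_first_of(terminator, offset);
}
```

With the terminator "dinner" and the source "Mary had a little lamb... for dinner.", the substring search stops at the word "dinner", whereas the character-set search already stops at the 'r' in "Mary" (offset 2), because 'r' is one of the terminating characters.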

Thread safety notes

Even though the Tokenize(...) method is protected from re-entrancy with a mutex, the CStringTokenizer class is only partially thread-safe. MFC classes, including the parent collection classes CStringArray and CStringList, are thread-safe at the class level only: different threads can safely work on different objects, but access to a single shared object is not synchronized. Consequently, if a producer thread writes tokens to a CStringTokenizer object by calling Tokenize(...) while a consumer thread reads them through the accessor methods of the parent collection class, the consumer may see a mixture of old and new data.
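
One common way to avoid such mixed reads is to let the producer tokenize into a private collection and publish the finished result under the same lock that readers take. The following standard C++ sketch illustrates that idea only; SharedTokens, Publish() and Snapshot() are hypothetical names and nothing like them exists in the class presented here.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: the producer tokenizes into a private vector and
// publishes it in one step; consumers only ever copy a complete set.
class SharedTokens
{
public:
    // Producer side: replaces the published tokens while holding the lock,
    // so the change is atomic with respect to Snapshot().
    void Publish(std::vector<std::string> fresh)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tokens = std::move(fresh);
    }

    // Consumer side: returns a copy taken while the lock is held, so it
    // never contains a mix of old and new tokens.
    std::vector<std::string> Snapshot() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_tokens;
    }

private:
    mutable std::mutex m_mutex;        // guards m_tokens
    std::vector<std::string> m_tokens; // last fully published token set
};
```

The cost of this design is one copy per read. In exchange, the consumer never holds a reference into a collection that the producer is modifying.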

Demo application / Test bed

void TestTokenizer()
{
   TRACE("Beginning of template string tokenizer demo\n");
   // a string for parsing
   CString str1 = "She sells sea shells on a sea shore. \nShells  shine.";
   // declare a tokenizer class derived from CStringArray
   CStringTokenizer<CStringArray> strTokArray;
   // Don't ignore the empty tokens. By default, they are ignored.
   // tokenize words in the 1st line
   UINT iStartOffset = strTokArray.Tokenize(str1, ". ", "\n");
   TRACE("Tokens in the Array:\n");
   for (int i = 0; i < strTokArray.GetSize(); ++i)
   {
      // You can treat the tokenizer just like a regular CStringArray!
      TRACE("\t%s\n", (LPCTSTR)strTokArray[i]);    // dump the parsed fragments
   }
   // another string for parsing
   // declare a tokenizer class derived from CStringList
   CStringTokenizer<CStringList> strTokList;
   // tokenize into separate words
   strTokList.Tokenize(str1, ". ", "\n", iStartOffset);
   TRACE("Tokens in the List:\n");
   for (POSITION pos = strTokList.GetHeadPosition(); pos != NULL; )
   {
      // You can treat the tokenizer just like a regular CStringList!
      TRACE("\t%s\n", (LPCTSTR)strTokList.GetNext(pos)); // dump the parsed fragments
   }
   // and another string for parsing
   str1 = "Mary had a little lamb... for dinner.";
   // terminate when a given substring is encountered
   strTokList.Tokenize(str1, ". ", "dinner");  // tokenize
   TRACE("Tokens in the List:\n");
   // pos is declared again: the variable from the previous loop is out of scope here
   for (POSITION pos = strTokList.GetHeadPosition(); pos != NULL; )
   {
      // You can treat the tokenizer just like a regular CStringList!
      TRACE("\t%s\n", (LPCTSTR)strTokList.GetNext(pos)); // dump the parsed fragments
   }
   TRACE("End of template string tokenizer demo\n");
}


This idea seems very obvious, so similar code probably exists and I simply didn't look hard enough. However, Googling for 'parser tokenizer CStringArray CStringList template' didn't turn up anything similar.

Of course, there are loads of string tokenizers out there on the web. Most of them have an interface similar to Java's StringTokenizer. I didn't follow this de facto standard. Maybe I should have. On the other hand, my class preserves the original string.

As usual, suggestions, bug notes, comments etc., are most welcome!




  • 0.1: Initial submission: December 4, 2006.
  • 0.2: Added a mutex to prevent re-entrance; added thread-safety notes: December 29, 2006.
  • 0.3: Changed the tokenization algorithm code slightly; added the TERMINATING_STRING option and updated the demo app to exercise this option; added notes about the options: January 5, 2007


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Nick Alexeev
Systems Engineer, Prolitech
Redwood City, CA, United States

Article Copyright 2006 by Nick Alexeev