A C++ STL String Tokenizer

George Anescu, 13 Oct 2001, Ms-PL
A C++ STL tokenizer class that splits a string into tokens, with the set of separator characters specified by a predicate (for example, by another string)

Introduction

The CTokenizer class presented in this article tokenizes an STL string, with the set of separator characters specified by a predicate class. To keep it general, it is designed as a class template:

template <class Pred>
class CTokenizer { /*...*/ };

The separating (tokenizing) criterion is implemented in the predicate class Pred passed as the template argument. Predicate classes are usually derived from unary_function<char, bool> and implement operator(). I give only three example predicate classes: CIsSpace, where the set of separators contains the white-space characters 0x09-0x0D and 0x20; CIsComma, where the separator is the comma character ','; and CIsFromString, where the set of separators is given by the characters of an STL string. Other predicate classes can easily be added as needed.
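As a rough illustration of how a further predicate could be added, here is a minimal sketch. The class CIsPathSeparator and the helper FirstSeparator are hypothetical names invented for this example and are not part of the article's code. The sketch also assumes the predicate does not need to derive from unary_function (which was removed from the standard library in C++17), since only operator() is actually called.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical predicate (not part of the article's code): treats
// '/' and '\\' as separators, e.g. for splitting file paths.
class CIsPathSeparator
{
public:
  bool operator()(char c) const
  {
    return c == '/' || c == '\\';
  }
};

// Hypothetical helper: returns the index of the first separator in
// text, or text.size() if there is none. It uses std::find_if, the
// same algorithm the tokenizer uses to locate the end of a token.
inline std::string::size_type FirstSeparator(std::string const& text)
{
  std::string::const_iterator it =
      std::find_if(text.begin(), text.end(), CIsPathSeparator());
  return static_cast<std::string::size_type>(it - text.begin());
}

// Small self-check showing how the predicate behaves.
inline void CheckPathSeparator()
{
  assert(CIsPathSeparator()('/'));
  assert(!CIsPathSeparator()('a'));
  assert(FirstSeparator("usr/local") == 3);
  assert(FirstSeparator("readme") == 6);
}
```

A predicate written this way should be usable as the Pred template argument in the same manner as the three predicates described above.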

Implementation

First I will present the implemented predicates.

For the case where the separators are the white-space characters 0x09-0x0D and 0x20:

class CIsSpace : public unary_function<char, bool>
{
public:
  bool operator()(char c) const;
};

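inline bool CIsSpace::operator()(char c) const
{
  // isspace from <cctype> returns nonzero if c is a white-space character
  // (0x09-0x0D or 0x20 in the default "C" locale). The cast to unsigned
  // char avoids undefined behavior for negative char values.
  return isspace(static_cast<unsigned char>(c)) != 0;
}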

For the case where the separator is the comma character ',':

class CIsComma : public unary_function<char, bool>
{
public:
  bool operator()(char c) const;
};

inline bool CIsComma::operator()(char c) const
{
  return (',' == c);
}

For the case where the separator is a character from a set of characters given in a STL string:

class CIsFromString : public unary_function<char, bool>
{
public:
  //Constructor specifying the separators
  CIsFromString(string const& rostr) : m_ostr(rostr) {}
  bool operator()(char c) const;

private:
  string m_ostr;
};

inline bool CIsFromString::operator()(char c) const
{
  // find() returns string::npos when c is not one of the separators.
  // Comparing the result directly avoids truncating size_type to int.
  return m_ostr.find(c) != string::npos;
}

Finally, the string tokenizer class itself exposes a single static member function, Tokenize(). Notice that CIsSpace is the default predicate, so CTokenizer<> tokenizes on white space.

template <class Pred=CIsSpace>
class CTokenizer
{
public:
  //The predicate should evaluate to true when applied to a separator.
  static void Tokenize(vector<string>& roResult, string const& rostr, 
                       Pred const& roPred=Pred());
};

//The predicate should evaluate to true when applied to a separator.
template <class Pred>
inline void CTokenizer<Pred>::Tokenize(vector<string>& roResult, 
                                            string const& rostr, Pred const& roPred)
{
  //First clear the results vector
  roResult.clear();
  string::const_iterator it = rostr.begin();
  string::const_iterator itTokenEnd = rostr.begin();
  while(it != rostr.end())
  {
    //Eat separators, stopping at the end of the string
    while(it != rostr.end() && roPred(*it))
      ++it;
    //Find next token
    itTokenEnd = find_if(it, rostr.end(), roPred);
    //Append token to result
    if(it < itTokenEnd)
      roResult.push_back(string(it, itTokenEnd));
    it = itTokenEnd;
  }
}
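The skip-then-scan loop can also be tried in isolation. The following is a minimal sketch written as a free function template so that it compiles on its own. The names TokenizeSketch, IsBlank and CheckTokenizeSketch are illustrative assumptions, not part of the article's code. One detail worth checking in any loop of this shape is that the separator-skipping step stops at the end of the string, so input that ends in separators is never read past its end.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in predicate for this sketch: space or tab.
struct IsBlank
{
  bool operator()(char c) const { return c == ' ' || c == '\t'; }
};

// Illustrative free-function version of the skip-then-scan loop.
template <class Pred>
std::vector<std::string> TokenizeSketch(std::string const& text,
                                        Pred const& pred)
{
  std::vector<std::string> tokens;
  std::string::const_iterator it = text.begin();
  while (it != text.end())
  {
    // Skip separators, stopping at the end of the string.
    while (it != text.end() && pred(*it))
      ++it;
    // The token runs up to the next separator (or the end).
    std::string::const_iterator tokenEnd =
        std::find_if(it, text.end(), pred);
    // Only non-empty ranges become tokens.
    if (it != tokenEnd)
      tokens.push_back(std::string(it, tokenEnd));
    it = tokenEnd;
  }
  return tokens;
}

inline void CheckTokenizeSketch()
{
  // Leading, repeated and trailing separators produce no empty tokens.
  std::vector<std::string> t =
      TokenizeSketch(std::string("  ab \t cd  "), IsBlank());
  assert(t.size() == 2 && t[0] == "ab" && t[1] == "cd");
  // A string made only of separators yields no tokens.
  assert(TokenizeSketch(std::string(" \t "), IsBlank()).empty());
}
```

Returning the vector by value is a choice made for this sketch only; the article's Tokenize() fills a caller-supplied vector instead.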

How to use

The following code snippet shows simple usage examples, one for each of the implemented predicates:

//The Results Vector, reused by all three tests (Tokenize() clears it first)
vector<string> oResult;

//Test CIsSpace() predicate
cout << "Test CIsSpace() predicate:" << endl;
//Call Tokenizer
CTokenizer<>::Tokenize(oResult, " wqd \t hgwh \t sdhw \r\n kwqo \r\n  dk ");
//Display Results
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;

//Test CIsComma() predicate
cout << "Test CIsComma() predicate:" << endl;
//Call Tokenizer
CTokenizer<CIsComma>::Tokenize(oResult, "wqd,hgwh,sdhw,kwqo,dk", CIsComma());
//Display Results
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;

//Test CIsFromString predicate
cout << "Test CIsFromString() predicate:" << endl;
//Call Tokenizer
CTokenizer<CIsFromString>::Tokenize(oResult, ":wqd,;hgwh,:,sdhw,:;kwqo;dk,", 
                                          CIsFromString(",;:"));
//Display Results
cout << "Display strings:" << endl;
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;
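Under the separator rules described above, each of the three test inputs should break into the same five tokens: wqd, hgwh, sdhw, kwqo and dk. The sketch below cross-checks this with a different technique, std::string's find_first_not_of and find_first_of, rather than with CTokenizer itself. The helper names SplitOnAny and CheckSplitOnAny are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative helper (not from the article): splits text on any
// character in seps using std::string's own search functions.
inline std::vector<std::string> SplitOnAny(std::string const& text,
                                           std::string const& seps)
{
  std::vector<std::string> tokens;
  // Position of the first character that is not a separator.
  std::string::size_type start = text.find_first_not_of(seps);
  while (start != std::string::npos)
  {
    // Position of the separator that ends the token, or npos.
    std::string::size_type end = text.find_first_of(seps, start);
    std::string::size_type count =
        (end == std::string::npos) ? std::string::npos : end - start;
    tokens.push_back(text.substr(start, count));
    // Searching from npos returns npos, which ends the loop.
    start = text.find_first_not_of(seps, end);
  }
  return tokens;
}

inline void CheckSplitOnAny()
{
  const char* expected[] = { "wqd", "hgwh", "sdhw", "kwqo", "dk" };
  // The three inputs used in the usage examples, with their separators.
  std::vector<std::string> a =
      SplitOnAny(" wqd \t hgwh \t sdhw \r\n kwqo \r\n  dk ", " \t\r\n");
  std::vector<std::string> b = SplitOnAny("wqd,hgwh,sdhw,kwqo,dk", ",");
  std::vector<std::string> c =
      SplitOnAny(":wqd,;hgwh,:,sdhw,:;kwqo;dk,", ",;:");
  assert(a.size() == 5 && b.size() == 5 && c.size() == 5);
  for (std::vector<std::string>::size_type i = 0; i < 5; ++i)
    assert(a[i] == expected[i] && b[i] == expected[i] && c[i] == expected[i]);
}
```

This approach only covers separators that can be listed in a string, which is why the predicate-based design is the more general of the two.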

Conclusion

The project StringTok.zip attached to this article includes the source code of the presented CTokenizer class and some test code. I am interested in any opinions and new ideas about this implementation.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)

About the Author

George Anescu
Web Developer
Romania

Last Updated 14 Oct 2001
Article Copyright 2001 by George Anescu