A C++ STL String Tokenizer

George Anescu, 13 Oct 2001, Ms-PL
A C++ STL tokenizer class that can tokenize a string when the set of separator characters is specified by another string

Introduction

The CTokenizer class presented in this article tokenizes an STL string, with the set of separator characters specified by a predicate class. It is designed generically, as a class template:

template <class Pred>
class CTokenizer { /*...*/ };

The separating (tokenizing) criterion is implemented by the predicate class passed as the template argument Pred. Predicate classes are usually derived from unary_function<char, bool> and implement operator(), which returns true for a separator character. I give three examples of predicate classes: CIsSpace, where the set of separators consists of the white-space characters 0x09-0x0D and 0x20; CIsComma, where the separator is the comma character ','; and CIsFromString, where the set of separators is given by the characters of an STL string. Other predicate classes can easily be added as needed.
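
As a rough illustration of how a further predicate might be added, the sketch below defines a hypothetical CIsDigit predicate (not part of the article's code) that treats decimal digits as separators. It is written as a plain function object without a base class, since unary_function was deprecated in C++11 and removed in C++17. The helper CountSeparators is also an assumption, used here only to exercise the predicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical predicate (not from the article): a decimal digit is a separator.
class CIsDigit
{
public:
  bool operator()(char c) const { return c >= '0' && c <= '9'; }
};

// Hypothetical helper: counts the characters of rostr that the given
// predicate classifies as separators.
template <class Pred>
std::size_t CountSeparators(std::string const& rostr, Pred const& roPred)
{
  return static_cast<std::size_t>(
      std::count_if(rostr.begin(), rostr.end(), roPred));
}
```

Any class with a const operator()(char) returning bool can be used in the same way.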

Implementation

First I will present the implemented predicates.

For the case where the separators are the white-space characters 0x09-0x0D and 0x20:

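// Headers used by the code in this article
#include <algorithm>   // find_if
#include <cctype>      // isspace
#include <functional>  // unary_function
#include <iostream>
#include <string>
#include <vector>
using namespace std;

class CIsSpace : public unary_function<char, bool>
{
public:
  bool operator()(char c) const;
};

inline bool CIsSpace::operator()(char c) const
{
  // isspace returns non-zero if c is a white-space character
  // (0x09-0x0D or 0x20 in the default "C" locale); the cast avoids
  // passing a negative value when char is signed
  return isspace(static_cast<unsigned char>(c)) != 0;
}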
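
To check the separator set named above, the following sketch lists every 7-bit character code that std::isspace from <cctype> classifies as white space. The function name ListSpaceCodes is hypothetical, and the result assumes the program has not changed the default "C" locale.

```cpp
#include <cctype>
#include <vector>

// Collects the 7-bit character codes that std::isspace classifies as
// white space. In the default "C" locale these are 0x09-0x0D and 0x20.
std::vector<int> ListSpaceCodes()
{
  std::vector<int> oCodes;
  for (int c = 0; c < 128; ++c)
    if (std::isspace(c))
      oCodes.push_back(c);
  return oCodes;
}
```

When the argument is a char rather than an int, it is usual to cast it to unsigned char first, because passing a negative value other than EOF to std::isspace is undefined behavior.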

For the case where the separator is the comma character ',':

class CIsComma : public unary_function<char, bool>
{
public:
  bool operator()(char c) const;
};

inline bool CIsComma::operator()(char c) const
{
  return (',' == c);
}

For the case where the separator is a character from a set of characters given in a STL string:

class CIsFromString : public unary_function<char, bool>
{
public:
  //Constructor specifying the separators
  CIsFromString(string const& rostr) : m_ostr(rostr) {}
  bool operator()(char c) const;

private:
  string m_ostr;
};

inline bool CIsFromString::operator()(char c) const
{
  // find() returns string::npos when c is not one of the separators
  return m_ostr.find(c) != string::npos;
}
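
CIsFromString searches the separator string on every call. Where the separator set is large, one possible alternative is to expand it once into a lookup table. The class below, CIsFromTable, is a hypothetical variant and not part of the article's code; it assumes 8-bit characters.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical variant of CIsFromString (not from the article): the separator
// set is expanded once into a 256-entry table, so each test is one lookup.
// Assumes char is 8 bits wide.
class CIsFromTable
{
public:
  explicit CIsFromTable(std::string const& rostr)
  {
    for (std::size_t i = 0; i < rostr.size(); ++i)
      m_oTable.set(static_cast<unsigned char>(rostr[i]));
  }

  bool operator()(char c) const
  {
    return m_oTable.test(static_cast<unsigned char>(c));
  }

private:
  std::bitset<256> m_oTable; // one flag per possible unsigned char value
};
```

For the short separator strings used in this article the difference is unlikely to matter, so this is a design option rather than a recommendation.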

Finally, here is the string tokenizer class. Tokenize() is implemented as a static member function, and CIsSpace is the default predicate.

template <class Pred=CIsSpace>
class CTokenizer
{
public:
  //The predicate should evaluate to true when applied to a separator.
  static void Tokenize(vector<string>& roResult, string const& rostr, 
                       Pred const& roPred=Pred());
};

//The predicate should evaluate to true when applied to a separator.
template <class Pred>
inline void CTokenizer<Pred>::Tokenize(vector<string>& roResult, 
                                            string const& rostr, Pred const& roPred)
{
  //First clear the results vector
  roResult.clear();
  string::const_iterator it = rostr.begin();
  string::const_iterator itTokenEnd = rostr.begin();
  while(it != rostr.end())
  {
    //Eat separators, stopping at the end of the string
    while(it != rostr.end() && roPred(*it))
      ++it;
    //Find next token
    itTokenEnd = find_if(it, rostr.end(), roPred);
    //Append token to result
    if(it < itTokenEnd)
      roResult.push_back(string(it, itTokenEnd));
    it = itTokenEnd;
  }
}
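
The behavior of this loop at the edges (empty input, input made only of separators, trailing separators) can be explored with the sketch below. It restates the same skip-then-scan idea as a free function that returns its result by value. The names TokenizeSketch and IsCommaSketch are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch of the skip-then-scan loop as a free function (hypothetical name).
// The predicate must return true for separator characters.
template <class Pred>
std::vector<std::string> TokenizeSketch(std::string const& rostr, Pred oPred)
{
  std::vector<std::string> oResult;
  std::string::const_iterator it = rostr.begin();
  while (it != rostr.end())
  {
    // Skip separators, checking for the end before every dereference
    while (it != rostr.end() && oPred(*it))
      ++it;
    // The token runs up to the next separator or the end of the string
    std::string::const_iterator itTokenEnd =
        std::find_if(it, rostr.end(), oPred);
    if (it != itTokenEnd)
      oResult.push_back(std::string(it, itTokenEnd));
    it = itTokenEnd;
  }
  return oResult;
}

// Comma predicate used only to exercise the sketch
struct IsCommaSketch
{
  bool operator()(char c) const { return c == ','; }
};
```

With this approach adjacent separators collapse, so "a,,b" yields the two tokens "a" and "b" rather than three tokens with an empty one in the middle. Without the end check in the inner loop, an input that ends in a separator would dereference the end iterator.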

How to use

The following code snippet shows simple usage examples, one for each of the implemented predicates:

//The results vector, reused by all three tests (Tokenize() clears it first)
vector<string> oResult;

//Test CIsSpace() predicate
cout << "Test CIsSpace() predicate:" << endl;
//Call Tokenizer
CTokenizer<>::Tokenize(oResult, " wqd \t hgwh \t sdhw \r\n kwqo \r\n  dk ");
//Display results
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;

//Test CIsComma() predicate
cout << "Test CIsComma() predicate:" << endl;
//Call Tokenizer
CTokenizer<CIsComma>::Tokenize(oResult, "wqd,hgwh,sdhw,kwqo,dk", CIsComma());
//Display results
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;

//Test CIsFromString predicate
cout << "Test CIsFromString() predicate:" << endl;
//Call Tokenizer
CTokenizer<CIsFromString>::Tokenize(oResult, ":wqd,;hgwh,:,sdhw,:;kwqo;dk,", 
                                          CIsFromString(",;:"));
//Display results
cout << "Display strings:" << endl;
for(size_t i=0; i<oResult.size(); i++)
  cout << oResult[i] << endl;

Conclusion

The project StringTok.zip attached to this article includes the source code of the presented CTokenizer class and some test code. I am interested in any opinions and new ideas about this implementation.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)

About the Author

George Anescu
Web Developer
Romania

Last Updated 14 Oct 2001
Article Copyright 2001 by George Anescu