String Tokenizer Iterator Class

26 Jun 2002, Public Domain
A string tokenizer iterator class that works with std::string

Introduction

As part of a larger project, I had to write some basic string utility functions and classes. One of the things I needed was a flexible way of splitting strings into separate tokens.

As is often the case in programming, there are different ways to handle a problem like this. After reviewing my options, I decided that an iterator-based solution would be flexible enough for my needs.

Non-iterator-based solutions to this problem often have the disadvantage of tying the user to a particular container type. With an iterator-based tokenizer the programmer is free to choose any type of container (or no container at all). Many STL containers, such as std::list and std::vector, offer constructors that populate the container from a pair of iterators, which makes the tokenizer very easy to use.
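To illustrate the container-independence point with nothing but the standard library, here is a minimal sketch. It uses std::istream_iterator as a stand-in for the tokenizer (it only splits on whitespace), and the helper names words_into, count_words and demo are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): fills any container that has an
// iterator-range constructor with the whitespace-separated words of `text`.
// std::istream_iterator stands in for an iterator-based tokenizer here.
template <typename Container>
Container words_into(const std::string& text)
{
    std::istringstream in(text);
    std::istream_iterator<std::string> first(in), last;
    return Container(first, last);
}

// The same iterator pair can also be consumed with no container at all.
std::size_t count_words(const std::string& text)
{
    std::istringstream in(text);
    return static_cast<std::size_t>(
        std::distance(std::istream_iterator<std::string>(in),
                      std::istream_iterator<std::string>()));
}

// Small self-check: one helper, three different container types.
void demo()
{
    const std::string text = "one two three two";
    assert(words_into<std::vector<std::string> >(text).size() == 4);
    assert(words_into<std::list<std::string> >(text).front() == "one");
    assert(words_into<std::set<std::string> >(text).size() == 3); // duplicate "two" collapses
    assert(count_words(text) == 4);
}
```

The calling code picks the container; the token source never has to know about it. That is the property the tokenizer in this article aims for.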

Example usage

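// Needs <algorithm>, <iostream>, <iterator>, <string> and <vector>, plus the
// class from "The Code" below; the statements belong inside a function.
std::vector<std::string> s(string_token_iterator("one two three"),
                             string_token_iterator());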
std::copy(s.begin(),
          s.end(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// output:
// one
// two
// three

std::copy(string_token_iterator("one,two..,..three",",."),
          string_token_iterator(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// same output as above

The code has been tested with Visual C++ .NET and GCC 3.

The Code

#include <string>
#include <iterator>

struct string_token_iterator 
  : public std::iterator<std::input_iterator_tag, std::string>
{
public:
  // End-of-sequence iterator: every member, including separator, is zeroed.
  string_token_iterator() : separator(0), str(0), start(0), end(0) {}
  string_token_iterator(const std::string & str_, const char * separator_ = " ") :
    separator(separator_),
    str(&str_),
    end(0)
  {
    find_next();
  }
  string_token_iterator(const string_token_iterator & rhs) :
    separator(rhs.separator),
    str(rhs.str),
    start(rhs.start),
    end(rhs.end)
  {
  }

  string_token_iterator & operator++()
  {
    find_next();
    return *this;
  }

  string_token_iterator operator++(int)
  {
    string_token_iterator temp(*this);
    ++(*this);
    return temp;
  }

  std::string operator*() const
  {
    return std::string(*str, start, end - start);
  }

  bool operator==(const string_token_iterator & rhs) const
  {
    return (rhs.str == str && rhs.start == start && rhs.end == end);
  }

  bool operator!=(const string_token_iterator & rhs) const
  {
    return !(rhs == *this);
  }

private:

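  // Advance to the next token: skip leading separators, then locate the
  // token's end. When no token remains, reset to the end-iterator state.
  void find_next()
  {
    start = str->find_first_not_of(separator, end);
    if(start == std::string::npos)
    {
      start = end = 0;
      str = 0;
      return;
    }

    // end may be npos for the last token; the std::string constructor
    // used in operator* clamps the length to the rest of the string.
    end = str->find_first_of(separator, start);
  }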

  const char * separator;
  const std::string * str;
  std::string::size_type start;
  std::string::size_type end;
};
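The scanning logic in find_next() may be easier to follow outside the iterator. The following is a minimal sketch, assuming a hypothetical free function split_tokens that is not part of the class above. It repeats the same two searches (find_first_not_of, then find_first_of) in a loop and collects the tokens in a vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical free function (not part of the article's class) that repeats
// the two searches find_next() performs until the input is exhausted.
std::vector<std::string> split_tokens(const std::string& text,
                                      const char* separators = " ")
{
    assert(separators != 0);
    std::vector<std::string> tokens;
    std::string::size_type end = 0;
    for (;;)
    {
        // Skip any run of separators; npos means no token is left.
        const std::string::size_type start =
            text.find_first_not_of(separators, end);
        if (start == std::string::npos)
            break;

        // The token stops at the next separator, or at npos for the last token.
        end = text.find_first_of(separators, start);

        // substr clamps the count, so end == npos yields the rest of the string.
        tokens.push_back(text.substr(start, end - start));
    }
    return tokens;
}
```

Because every run of separators is skipped before a token starts, consecutive separators never produce empty tokens. This is why "one,two..,..three" with the separators ",." yields the same three tokens as "one two three".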

License

This article, along with any associated source code and files, is released under a Public Domain dedication.


Written By
Web Developer, Sweden

Comments and Discussions

 
Re: Problems with vector
Andreas Saurwein, 15-Sep-03 18:16

This does not seem to be a problem with VC6 itself but rather with the STL implementation that comes with VC6. I'm using STLport and it works nicely with the vector constructor.

Just in case you didn't notice: VC6's STL sucks.