String Tokenizer Iterator Class

26 Jun 2002, Public Domain
A string tokenizer iterator class that works with std::string

Introduction

As a part of a larger project I had to write some basic string utility functions and classes. One of the things needed was a flexible way of splitting strings into separate tokens.

As is often the case in programming, there are different ways to handle a problem like this. After reviewing my options I decided that an iterator-based solution would be flexible enough for my needs.

Non-iterator-based solutions to this particular problem often have the disadvantage of tying the user to a certain container type. With an iterator-based tokenizer the programmer is free to choose any type of container (or no container at all). Many STL containers, such as std::list and std::vector, offer constructors that populate the container from a pair of iterators. This makes the tokenizer very easy to use.
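Because the tokenizer is an ordinary input iterator, one iterator pair can fill any container that has an iterator-range constructor, or be consumed with no container at all. The following is a minimal sketch of that idea. It uses std::istream_iterator<std::string> as a stand-in, because it is a standard input iterator that needs no extra code; note that it only splits on whitespace, so it is not a replacement for the tokenizer in this article. The helper names words_as_list, words_as_set and count_words are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <set>
#include <sstream>
#include <string>

// Stand-in for a tokenizer: std::istream_iterator<std::string> is a standard
// input iterator that yields whitespace-separated words from a stream.
typedef std::istream_iterator<std::string> word_iterator;

// Fill a std::list straight from the iterator range.
std::list<std::string> words_as_list(const std::string & text)
{
  std::istringstream in(text);
  word_iterator first(in), last;  // last is the end-of-stream iterator
  return std::list<std::string>(first, last);
}

// The same kind of range can fill a std::set, which sorts and drops duplicates.
std::set<std::string> words_as_set(const std::string & text)
{
  std::istringstream in(text);
  word_iterator first(in), last;
  return std::set<std::string>(first, last);
}

// No container at all: walk the range and count the tokens.
std::ptrdiff_t count_words(const std::string & text)
{
  std::istringstream in(text);
  word_iterator first(in), last;
  return std::distance(first, last);
}
```

Each helper builds a fresh stream because an input iterator range can only be traversed once.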

Example usage

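#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// The temporary std::string built from each literal lives until the end of
// the full expression, so these one-line uses do not leave a dangling pointer.
std::vector<std::string> s(string_token_iterator("one two three"),
                             string_token_iterator());
std::copy(s.begin(),
          s.end(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// output:
// one
// two
// three

std::copy(string_token_iterator("one,two..,..three",",."),
          string_token_iterator(),
          std::ostream_iterator<std::string>(std::cout,"\n"));
// same output as above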

The code has been tested with Visual C++.NET and GCC 3.

The Code

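#include <cstddef>
#include <iterator>
#include <string>

struct string_token_iterator
{
public:
  // Iterator traits, declared directly because std::iterator is deprecated in C++17.
  typedef std::input_iterator_tag iterator_category;
  typedef std::string value_type;
  typedef std::ptrdiff_t difference_type;
  typedef const std::string * pointer;
  typedef std::string reference; // operator* returns each token by value

  // A default-constructed iterator marks the end of the token sequence.
  string_token_iterator() : separator(0), str(0), start(0), end(0) {}

  // Only the address of str_ is stored, so the string must outlive the iterator.
  string_token_iterator(const std::string & str_, const char * separator_ = " ") :
    separator(separator_),
    str(&str_),
    start(0),
    end(0)
  {
    find_next();
  }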
  string_token_iterator(const string_token_iterator & rhs) :
    separator(rhs.separator),
    str(rhs.str),
    start(rhs.start),
    end(rhs.end)
  {
  }

  string_token_iterator & operator++()
  {
    find_next();
    return *this;
  }

  string_token_iterator operator++(int)
  {
    string_token_iterator temp(*this);
    ++(*this);
    return temp;
  }

  std::string operator*() const
  {
    return std::string(*str, start, end - start);
  }

  bool operator==(const string_token_iterator & rhs) const
  {
    return (rhs.str == str && rhs.start == start && rhs.end == end);
  }

  bool operator!=(const string_token_iterator & rhs) const
  {
    return !(rhs == *this);
  }

private:

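  // Advance to the next token: skip separators starting where the previous
  // token ended, then find where this token ends. end is npos when the token
  // runs to the end of the string; operator* still works because the
  // std::string substring constructor clamps the length.
  void find_next()
  {
    start = str->find_first_not_of(separator, end);
    if(start == std::string::npos)
    {
      // No more tokens: become equal to a default-constructed (end) iterator.
      start = end = 0;
      str = 0;
      return;
    }

    end = str->find_first_of(separator, start);
  }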

  const char * separator;
  const std::string * str;
  std::string::size_type start;
  std::string::size_type end;
};
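The end-of-sequence test works because find_next() resets an exhausted iterator to exactly the state of a default-constructed one (null str, start and end both 0), so operator== reports them as equal. The sketch below is a self-contained program for checking that behavior on edge cases: empty input, separators only, leading and trailing separators, and a last token that runs to the end of the string (where end is npos). It reproduces the class in condensed form, with every member initialized in the constructors and the iterator typedefs declared directly; the tokenize() helper is hypothetical and not part of the article's code.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Condensed copy of string_token_iterator so this sketch compiles on its own.
struct string_token_iterator
{
  typedef std::input_iterator_tag iterator_category;
  typedef std::string value_type;
  typedef std::ptrdiff_t difference_type;
  typedef const std::string * pointer;
  typedef std::string reference; // tokens are returned by value

  // Default-constructed: the end-of-sequence iterator.
  string_token_iterator() : separator(0), str(0), start(0), end(0) {}

  // Only the address of str_ is kept; the string must outlive the iterator.
  string_token_iterator(const std::string & str_, const char * separator_ = " ") :
    separator(separator_), str(&str_), start(0), end(0)
  {
    find_next();
  }

  string_token_iterator & operator++() { find_next(); return *this; }

  string_token_iterator operator++(int)
  {
    string_token_iterator temp(*this);
    find_next();
    return temp;
  }

  // end may be npos for the last token; the substring constructor clamps it.
  std::string operator*() const { return std::string(*str, start, end - start); }

  bool operator==(const string_token_iterator & rhs) const
  {
    return rhs.str == str && rhs.start == start && rhs.end == end;
  }

  bool operator!=(const string_token_iterator & rhs) const { return !(rhs == *this); }

private:
  void find_next()
  {
    start = str->find_first_not_of(separator, end);
    if(start == std::string::npos)
    {
      // Exhausted: reset to exactly the state of a default-constructed iterator.
      start = end = 0;
      str = 0;
      return;
    }
    end = str->find_first_of(separator, start);
  }

  const char * separator;
  const std::string * str;
  std::string::size_type start;
  std::string::size_type end;
};

// Hypothetical helper: drives the iterator by hand until it equals the end iterator.
std::vector<std::string> tokenize(const std::string & text, const char * separators = " ")
{
  std::vector<std::string> tokens;
  for(string_token_iterator it(text, separators), last; it != last; ++it)
    tokens.push_back(*it);
  return tokens;
}
```

Passing a string literal to tokenize() is safe: the temporary std::string bound to text lives until the call returns, and no iterator escapes the function.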

License

This article, along with any associated source code and files, is released under a public domain dedication.


About the Author

Daniel Andersson
Web Developer
Sweden


Comments and Discussions

 
What license? (Michael Broida, 5-Oct-12 4:57)
Re: What license? (Daniel Andersson, 5-Oct-12 5:02)
Re: What license? (Michael Broida, 5-Oct-12 6:11)
My vote of 3 (cccfff777, 30-Jun-10 4:49)
Subtle Bug (mollevp, 31-Jul-06 4:59)
Re: Subtle Bug (Daniel Andersson, 13-Aug-06 22:37)
A little trouble with the basics? (mtwombley, 4-Jul-04 17:49)
Re: A little trouble with the basics? (Daniel Andersson, 4-Jul-04 21:53)
Re: A little trouble with the basics? (mtwombley, 5-Jul-04 9:39)
Re: A little trouble with the basics? (Daniel Andersson, 5-Jul-04 21:32)

Yeah, like I said in my comment, this is a problem whenever the user passes a const char* to the tokenizer constructor. Both string_token_iterator("abc") and string_token_iterator(argv[1]) fall into that case: a temporary std::string is created for the call and destroyed at the end of the full expression, so an iterator kept beyond that point refers to a destroyed string.

In retrospect I think it's safe to say that the iterator constructor should take a const std::string * as argument instead of a reference. It would make it a lot safer for people to use. The downside would be that std::vector<std::string> s(string_token_iterator(argv[1],";"), string_token_iterator()); would no longer work.
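The suggestion above is to take a const std::string * so that temporaries cannot be passed by accident. A different technique, available since C++11, keeps the reference signature and adds a deleted overload for rvalue strings, the same approach std::cref uses to reject temporaries. String literals, argv[1] and other temporaries then select the deleted overload and fail to compile, while named strings still bind to the const reference. This is a minimal sketch on a hypothetical safe_token_iterator struct that shows only the constructors, not the tokenizing logic.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stripped-down iterator: only the constructors are shown.
struct safe_token_iterator
{
  safe_token_iterator() : str(0), separator(0) {}

  // Named (lvalue) strings bind here; only their address is stored.
  safe_token_iterator(const std::string & str_, const char * separator_ = " ") :
    str(&str_), separator(separator_) {}

  // Rvalues, including the temporary std::string built from "abc" or argv[1],
  // prefer this overload, so passing one is a compile-time error.
  safe_token_iterator(const std::string &&, const char * = " ") = delete;

  const std::string * str;
  const char * separator;
};

// Compile-time checks of which arguments are accepted.
static_assert(std::is_constructible<safe_token_iterator, std::string &>::value,
              "named strings are accepted");
static_assert(!std::is_constructible<safe_token_iterator, const char *>::value,
              "string literals are rejected");
static_assert(!std::is_constructible<safe_token_iterator, char *, const char *>::value,
              "argv[1]-style arguments are rejected");
static_assert(!std::is_constructible<safe_token_iterator, std::string>::value,
              "temporary std::string objects are rejected");
```

This has the same downside as the pointer version: the one-line std::vector initialization from argv[1] no longer compiles, and the argument has to be copied into a named std::string first.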

Last Updated 27 Jun 2002
Article Copyright 2002 by Daniel Andersson
Everything else Copyright © CodeProject, 1999-2016