Click here to Skip to main content
13,348,514 members (90,898 online)
Click here to Skip to main content
Add your own
alternative version


20 bookmarked
Posted 28 Oct 2007

How to build an in-memory state engine at runtime

, 28 Oct 2007
Rate this:
Please Sign up or sign in to vote.
A C++ template for an efficient in-memory state engine.

tag tree


If you want to detect keywords or special tags in a data stream, you typically use a parser generator like yacc or ANTLR. Otherwise, you can use the power of Regular Expressions. In some cases, a handcrafted parser will fulfill the requirement. To simplify this task, I wrote the template tag_tree (as part of a larger library using boost) to create a kind of state engine in memory during runtime. It has a low memory footprint, and is more efficient than any tokenizer-string compare approach could be.


A handwritten parser/scanner needs a lot of states, and looks similar to the following piece of code:

boost::tribool parser::consume(char input)
  switch (state_)
  case state_1:
    if (input == 'P')
      state_ = state_2;
      return boost::indeterminate;
      return false;
  case state_2:
    if (input == 'O')
      return true;
    ... <snip> ...
    return false;

This is the implementation of a state engine, and is very error prone if you have to manage a lot of states. It works fine and fast for you, but there is a lot of work to modify the keywords. The following code snippet shows how you can solve this problem with the tag_tree template:

class parser
    typedef tag_tree < std::string, unsigned >     TagTree_;
    typedef typename TagTree_::MyPtr_    TagTreePtr_;
    TagTree_    kt_;

    : prev_( NULL )
    kt_.insertKeyWord( "POST", 10 );
    kt_.insertKeyWord( "HOST", 20 );
    kt_.insertKeyWord( "XONN", 30 );

unsigned parser::run()
    TagTreePtr_    prev = &kt_;
    for (;;)
        TagTreePtr_ p = prev->findNode( next() );
        if (p == NULL)
        prev = p;
    return (prev->isLeaf())
        ? prev->getValue()
        : -1;    //! nothing found

next() returns the next character from the data stream. While findNode() is matching, the method continues to loop. After breaking the loop, isLeaf() tests if a valid keyword was found and the method returns an identifier. It is quite possible to return the prev pointer itself. prev points in every case to a valid address, and contains all the information you need.

How does it work

The main idea is that each keyword builds a unique trace of linked nodes where each node represents a specific parser state. To build this trace, a tree of arrays will be used. The following figure shows a tree with the keywords PORT and POST:

tag tree

And here comes some comfort to you. If you prefer speed over memory, choose the maximum size for the arrays (256 for char, for example). Then choose the simplest hash algorithm that is possible - the ASCII code as the array index. To save memory, the array could be shrunken. But then we have to manage a collision list.

In the download section is a project named TagTree which contains a more sophisticated sample of usage for the tag_tree template.



Defines some typedefs and access methods for all node values. tag_tree_base is not restricted to strings. You can use any string-like container which defines a const_iterator and a value_type type.

tag_tree_base constructor and destructor

tag_tree_base::tag_tree_base( char_type c );
  • c: The character this node is representing.


value_type & tag_tree_base::getValue();
Return value:

Value of this node.

tag_tree constructor and destructor

tag_tree::tag_tree( char_type c );

tag_tree is designed for usage in a streaming parser. tag_tree can help you build compact state engines in memory, since each node represents a unique parser state.

  • c: The character this node is representing.

Setting all pointers to NULL is done by the containers.


tag_tree::MyPtr_ tag_tree::findNode( char_type c );
  • c: character to find.</a />
Return value

A pointer to a node containing the specified character c or NULL.


bool tag_tree::empty     (          )      const;
Return value

true if there are no outgoing node pointers.

The semantic is similar to the STL - container empty() method. If the node array nodeMap_ contains no pointers, then nodeList_ should not either. And it should be a leaf. Otherwise, something is going wrong.


size_t tag_tree::leaf_count( bool branch = false )     const;
Return value:

The count of keywords starting from this point.


void tag_tree::dump(     ostream_type_ &      os,
        size_t      depth = 0L,
        size_t      branch = 0L
    )             const;
  • os: an open output stream, std::cout, for example.

Prints a plain list of all nodes and their attributes. Useful for debugging purposes.


tag_tree::lookUp(    char_const_iterator from, char_const_iterator to ) const
  • from: An input iterator addressing the position of the first element in the keyword.
  • to: An input iterator addressing the position that is one past the final element in the keyword.
Return value:

The value of the keyword. If the keyword does not exist, a null value will be returned.

Looks up for the specified keyword.


  • 29 Oct., 2007 - Initial release.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer (Senior)
Switzerland Switzerland
No Biography provided

You may also be interested in...


Comments and Discussions

QuestionA trie? Pin
Robert Surtees30-Oct-07 13:53
memberRobert Surtees30-Oct-07 13:53 
AnswerRe: A trie? Pin
osy1-Nov-07 5:03
memberosy1-Nov-07 5:03 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.180111.1 | Last Updated 28 Oct 2007
Article Copyright 2007 by osy
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid