
A simple STL based XML parser

By David Hubbard, 16 May 2000
 
  • Download source files - 10 Kb
    This is a small XML parser based purely on the STL. There are two main classes, XmlStream and XmlParser. XmlParser.h contains most of the parsing code. It has several state variables, which can be split into two categories:

    1. Buffer state - Shows where we are parsing from
    2. Parsing state - Shows what we have found
    XmlParser makes extensive use of offsets to keep track of its state. This is by design: to maximize speed, it does not do any string copies.
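To illustrate the offsets-instead-of-copies idea, here is a minimal sketch. It is not taken from XmlParser.h: the names TagSpan, findNextTag and tagName are hypothetical, and the real parser tracks more state than this. The point is only that a tag can be located and remembered as two offsets into the caller's buffer, with a string copy made only when someone actually asks for the name.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration; these names do not come from XmlParser.h.
struct TagSpan {
    std::size_t start = 0;  // offset of '<'
    std::size_t end = 0;    // offset one past '>'
    bool found = false;
};

// Locate the next tag at or after `from`, recording offsets only (no copies).
TagSpan findNextTag(const char* buffer, std::size_t length, std::size_t from) {
    TagSpan span;
    if (buffer == nullptr || from >= length) return span;

    const void* lt = std::memchr(buffer + from, '<', length - from);
    if (lt == nullptr) return span;
    std::size_t start = static_cast<std::size_t>(static_cast<const char*>(lt) - buffer);

    const void* gt = std::memchr(buffer + start, '>', length - start);
    if (gt == nullptr) return span;

    span.start = start;
    span.end = static_cast<std::size_t>(static_cast<const char*>(gt) - buffer) + 1;
    span.found = true;
    return span;
}

// Copy the tag name out of the buffer only when a string is really needed.
std::string tagName(const char* buffer, const TagSpan& span) {
    std::size_t i = span.start + 1;        // skip '<'
    if (buffer[i] == '/') ++i;             // skip '/' of a closing tag
    std::size_t j = i;
    while (j < span.end - 1 && buffer[j] != '/' &&
           !std::isspace(static_cast<unsigned char>(buffer[j]))) {
        ++j;
    }
    return std::string(buffer + i, j - i);
}
```

Calling findNextTag repeatedly, each time starting from the previous span's end, walks the tags in order without allocating anything.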

    To start parsing, declare an instance of the class XmlStream and set up the buffer that you want to parse. An example is included in Parser.cpp. Call parse on the XmlStream, passing in a pointer to the buffer and the buffer's length. You will see screen output showing what has been found. This is simple debug output and can be turned off.

    XmlNotify is used as an interface class to notify a subscriber of nodes and elements being found. There is a pointer to a subscriber in the XmlStream class. The subscriber can be set using setSubscriber.
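The subscriber arrangement can be sketched roughly as follows. This is a stand-in, not the downloadable code: NotifySketch, RecordingSubscriber, StreamSketch and the callback names foundNode and foundElement are all assumptions made for illustration, and the real XmlNotify methods may have different names and signatures. Only setSubscriber is a name mentioned in the article.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a notification interface such as XmlNotify.
class NotifySketch {
public:
    virtual ~NotifySketch() = default;
    virtual void foundNode(const std::string& name, const std::string& attributes) = 0;
    virtual void foundElement(const std::string& name, const std::string& value) = 0;
};

// A subscriber that simply records everything it is told about.
class RecordingSubscriber : public NotifySketch {
public:
    void foundNode(const std::string& name, const std::string&) override {
        events.push_back("node:" + name);
    }
    void foundElement(const std::string& name, const std::string& value) override {
        events.push_back("element:" + name + "=" + value);
    }
    std::vector<std::string> events;
};

// Hypothetical stand-in for the stream side: it holds a non-owning pointer
// to the subscriber and forwards what the parser finds.
class StreamSketch {
public:
    void setSubscriber(NotifySketch* subscriber) { _subscriber = subscriber; }

    void reportNode(const std::string& name, const std::string& attributes) {
        if (_subscriber != nullptr) _subscriber->foundNode(name, attributes);
    }
    void reportElement(const std::string& name, const std::string& value) {
        if (_subscriber != nullptr) _subscriber->foundElement(name, value);
    }

private:
    NotifySketch* _subscriber = nullptr;
};
```

Keeping the pointer non-owning means the caller decides how long the subscriber lives, which suits a parser that only borrows the caller's buffer anyway.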

    Notice that neither an XML document declaration nor a schema is included. If those exist in your buffer, don't send them to the parser. Later, code to step through these will be added. Also note that this is a non-validating parser. There is one known bug in the parser: when an empty node is encountered, it is reported as an element. This will be fixed later. An example of this is included in the sample code.
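Until the parser steps over these itself, one possible workaround is to skip them before calling parse. The helper below is a sketch under assumptions: skipProlog is a hypothetical name, it only handles leading whitespace, <?...?> processing instructions and <!-- ... --> comments, and it does not attempt to handle a DOCTYPE declaration.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the downloadable source.
// Returns the offset of the first byte that is not leading whitespace,
// a <?...?> processing instruction, or a <!-- ... --> comment.
// Returns `length` if the buffer ends inside one of those constructs.
std::size_t skipProlog(const char* buffer, std::size_t length) {
    std::size_t pos = 0;
    while (pos < length) {
        if (std::isspace(static_cast<unsigned char>(buffer[pos]))) {
            ++pos;
            continue;
        }

        const char* terminator = nullptr;
        std::size_t openLen = 0;
        std::size_t remaining = length - pos;
        if (remaining >= 4 && std::memcmp(buffer + pos, "<!--", 4) == 0) {
            terminator = "-->";
            openLen = 4;
        } else if (remaining >= 2 && std::memcmp(buffer + pos, "<?", 2) == 0) {
            terminator = "?>";
            openLen = 2;
        } else {
            break;  // reached real content (or something this sketch does not handle)
        }

        // Search for the terminator after the opening marker.
        std::size_t termLen = std::strlen(terminator);
        std::size_t i = pos + openLen;
        bool closed = false;
        for (; i + termLen <= length; ++i) {
            if (std::memcmp(buffer + i, terminator, termLen) == 0) {
                closed = true;
                break;
            }
        }
        if (!closed) return length;
        pos = i + termLen;
    }
    return pos;
}
```

The caller would then pass buffer + offset and length - offset to the parser, so the original buffer is still not copied.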

    If you have any suggestions or improvements let me know.

    License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


    About the Author

    David Hubbard
    United States

    Comments and Discussions

     
    General | multiple entries of the same name | member burnettb31714 Jul '03 - 11:58
    Thanks for the great XML tools; they are very easy to use.
     
    I would like to have a table represented in XML but this implementation of the parser doesn't handle multiple entries of the same name, for example:
     
    <HOST>
    <NAME>cod</NAME>
    <DNSNAME>cod.somecorp.com</DNSNAME>
    <NIC>
       <MAC>0040f4612820</MAC>
       <IP>192.168.172.54</IP>
       <IP>192.168.173.54</IP>
    </NIC>
    </HOST>
    <HOST>
    <NAME>drum</NAME>
    <DNSNAME>drum.somecorp.com</DNSNAME>
    <NIC>
       <IP>192.168.172.47</IP>
    </NIC>
    </HOST>
     
    In this example the <HOST> and <IP> tags both have multiple entries. The XMLDialog displays the host cod for both entries, and only one IP address. It would be nice to be able to address these in the read/write methods like:
     
    "HOST[0]:NIC:IP[1]","192.168.173.54"
     

     
    --Ben Burnett
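The addressing scheme suggested above could be parsed along these lines. This is only a sketch of the path syntax from the post ("HOST[0]:NIC:IP[1]"); PathSegment and parseIndexedPath are hypothetical names, nothing like this exists in the posted parser, and a segment without an index is assumed to mean index 0.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of the posted parser.
struct PathSegment {
    std::string name;
    std::size_t index;  // assumed to be 0 when no [n] suffix is given
};

// Split "HOST[0]:NIC:IP[1]" into (name, index) pairs.
// Throws std::invalid_argument (from std::stoul) if an index is not a number.
std::vector<PathSegment> parseIndexedPath(const std::string& path) {
    std::vector<PathSegment> segments;
    std::size_t begin = 0;
    while (begin <= path.size()) {
        std::size_t end = path.find(':', begin);
        if (end == std::string::npos) end = path.size();

        std::string part = path.substr(begin, end - begin);
        PathSegment segment{part, 0};

        std::size_t open = part.find('[');
        if (open != std::string::npos && part.back() == ']') {
            segment.name = part.substr(0, open);
            segment.index = std::stoul(part.substr(open + 1, part.size() - open - 2));
        }

        segments.push_back(segment);
        begin = end + 1;
    }
    return segments;
}
```

A lookup would then walk the parsed document one segment at a time, picking the index-th child with the matching name at each level.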

    General | Re: multiple entries of the same name | member mikeguz | 14 Oct '05 - 17:47
    I found the same problem, and believe it is related to this code in XmlStream.cpp:
     
    // if new parse cur position is in the last parser
    // last tag position we are done with the node
    char * curPos      = parserNode.getCurPos();
    char * lastCurPos = parser.getLastTagPos();
    if ( curPos >= lastCurPos )
    {
         break;
    }
     
    This code seems to assume that if you hit the end of a node at a certain level of the XML hierarchy, you must also be done with that entire hierarchical level. (That is, once the first </HOST> is hit, it assumes there won't be any more <HOST> entries to follow.)
     
    I simply removed that code and allowed the above parser.parse() method to do what it does, and that seemed to fix the problem.
     
    I don't quite know if this will introduce other issues, though.
     
    - MikeG
     
    -- modified at 23:47 Friday 14th October, 2005
    General | Nodes have the same name BUG | member Ken Lee | 9 Apr '03 - 20:43
    I found a bug...
    When nearby nodes have the same name, the next node will be ignored.
    ex:
     
    The parser will only find the first node "n0".
    General | Re: Nodes have the same name BUG | member Ken Lee | 9 Apr '03 - 20:52
    sorry... :((
    example code here...
     
    <Node id="n0">
    <Node id="n1">
    <Element id="e1"/>
    </Node>
    </Node>
    General | pretty print of stream | member fazer77 | 7 Feb '03 - 18:00
    In general, I find my way of pretty-printing a bit easier to read:
     

    stringstream strm;
    strm
                << "<Contacts>" << endl
     
                << "<Contact/>" << endl
     
                << initContact( "Joe Blow", 100 )
                << initContact( "Mary Jane", 400, true, true )
     
                << "</Contacts>" << endl
    ;
     
    Another point:
    Even if it is only an example, I find it better to return a string that is then streamed, instead of writing directly to the stream.
     
    This approach saves an unnecessary argument and
    can be reused more easily.
    The only disadvantage is creating a temporary string for the return value, but I think that's OK because the entries are all pretty small.
     

    ---------------------
    string initContact ( LPCTSTR cname, long id,
                         bool hasAttributes = false,
                         bool showToDo = false )
    {
        stringstream strm;

        if ( hasAttributes )
            strm << "<Contact language=\"english\" hasEmail=\"false\">" << endl;
        else
            strm << "<Contact>" << endl;

        strm
            << "<name>" << cname << "</name>" << endl
            << "<id>" << id << "</id>" << endl
        ;

        if ( showToDo )
        {
            strm
                << "<ToDo>" << endl
                << "<Item></Item>" << endl
                << "<Item/>" << endl
                << "<Item>Call</Item>" << endl
                << "<Item>Say Hello</Item>" << endl
                << "<Item>Take Notes</Item>" << endl
                << "<Item>HangUp</Item>" << endl
                << "</ToDo>" << endl
            ;
        }

        strm << "</Contact>" << endl;
        return( strm.str() );
    }

    General | attributes | member fazer77 | 7 Feb '03 - 17:56
    To be well-formed XML, all attribute values must be enclosed in quotes, for example with ":
     
    strm << "<Contact language=\"english\" hasEmail=\"false\">" << endl;
     
    cu george.

    General | Bug fix for null nodes | member BenJeremy | 2 Jan '03 - 10:06
    Parser will 'kick out' of a parent node when a subnode is a null node entry:
     
    <Skin Name="CMX">
    <Information>
       <Author>John Doe</Author>
       <Copyright>2003</Copyright>
    </Information>
    <Resource Type="Font" Name="SmallFont" File="Font10.fnt"/>
    <Resource Type="Font" Name="NormalFont" File="Font16.fnt"/>
    <Resource Type="Font" Name="GameText" File="GameText.fnt"/>
    <Resource Type="Image" Name="Title" File="TitleScr.jpg"/>
    <Resource Type="Image" Name="MenuBack" File="MenuScr.jpg"/>
    </Skin>
     
    This will kick out on the first "Resource" node.
     
    The problem lies in xmlparser.h, the routine 'parse()':
     
              // if null tag no data or last tag
              if ( hasNullTag() )
              {
                   // update cur position
                   _current   = _firstTagEnd + idTagRightLength;
     
                   // done so show success
                   return true;
              }         
     
    needs to change to:
     
              // if null tag no data or last tag
              if ( hasNullTag() )
              {
                   // update cur position
                   _current   = _firstTagEnd + idTagRightLength;
                   _lastTagStart = _current;
                   _lastTagEnd   = _current;
     
                   // done so show success
                   return true;
              }         
     
    and the parser will work properly.
    General | Re: Bug fix for null nodes | member BenJeremy | 6 Jan '03 - 5:04
    Strange...
     
    The fix does work, after a fashion, but there seems to be no rhyme or reason as to whether a null tag is considered a node or an element.
     
    I've modified my code not to worry about it, but if anybody can figure this out, I'd greatly appreciate it.
    General | Annoying bug for an xml parser | member yarp | 21 Nov '02 - 20:42
    I've got an xml file with standard header and comments like this:
    <?xml version="1.0" ?>
    <!-- Ok ? -->
    <!-- Ok -->
    <Tag currentTag="0">
    ...
    
    The parser fails reading the xml file because of the header and comments.
    Well, good job anyway. I'm working on the fix, I'll post it when done.
     
    Yarp
    http://www.senosoft.com/
    General | Re: Annoying bug for an xml parser | member yarp | 21 Nov '02 - 21:17
    So here's a partial fix:
      bool hasNullTag ()
      {
        // get beginning of first tag
        char * firstTagBegin = _buffer + _firstTagStart;
        // get end of first tag
        char * firstTagEnd = _buffer + _firstTagEnd - 1;
     
        // if null tag marker
        if (( *firstTagBegin == '<' && *(firstTagBegin+1) == '?' ) ||
            ( *firstTagBegin == '<' && *(firstTagBegin+1) == '!' ) ||
            ( *firstTagEnd == '/' && *(firstTagEnd+1) == '>' ))
          return true;
        else
          return false;
      }
    
     
    Not perfect, since the XmlNotify class is still called, but this is easy to fix too:
    each time a _subscriber is called, just test that the parsed name isn't "?" or "!". Recall that the _subscriber is called by XmlStream to notify the caller of the XML parsing result.
     
    Yarp
    http://www.senosoft.com/
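As a variation on the partial fix above, the same first-character and last-character tests can be written as a standalone, bounds-checked classifier, which also makes it easy for a subscriber to ignore declarations and comments. This is a sketch only: TagKind and classifyTag are hypothetical names, and unlike the posted hasNullTag it takes the offsets of '<' and '>' as arguments instead of reading parser members.

```cpp
#include <cstddef>

// Hypothetical classification; not part of the posted parser.
enum class TagKind {
    ProcessingInstruction,   // starts with "<?", e.g. the XML declaration
    CommentOrDeclaration,    // starts with "<!", e.g. a comment
    SelfClosing,             // ends with "/>", the "null tag" case
    Ordinary,
    Invalid
};

// `start` is the offset of '<' and `end` is the offset of the matching '>'.
TagKind classifyTag(const char* buffer, std::size_t start, std::size_t end) {
    if (buffer == nullptr || end <= start + 1 ||
        buffer[start] != '<' || buffer[end] != '>') {
        return TagKind::Invalid;
    }
    if (buffer[start + 1] == '?') return TagKind::ProcessingInstruction;
    if (buffer[start + 1] == '!') return TagKind::CommentOrDeclaration;
    if (buffer[end - 1] == '/') return TagKind::SelfClosing;
    return TagKind::Ordinary;
}
```

With this split, only SelfClosing and Ordinary tags would be forwarded to the subscriber, and the other two kinds would just advance the current position.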

    Last Updated 17 May 2000
    Article Copyright 2000 by David Hubbard
    Everything else Copyright © CodeProject, 1999-2013