Portable Elmax: C++ XML DOM Parser

Shao Voon Wong, 12 Apr 2016, CPOL
A tutorial on a new cross-platform C++ XML DOM library

Introduction

Portable Elmax is a cross-platform, non-validating XML DOM parser written in C++. Before this edition, there was a non-portable edition based on MSXML. To avoid confusion, that edition is referred to as MS Elmax in this article. MS Elmax has superficial MFC CString support at the API boundary, meaning CString is converted to STL string before any string processing. Portable Elmax, by contrast, can be switched to use MFC CString natively by defining ELMAX_USE_MFC_CSTRING in the config.h file. This article is a short tutorial on Portable Elmax. Although Portable Elmax and MS Elmax have very similar API calls, Portable Elmax is not a drop-in replacement for MS Elmax. There are some crucial differences that users must be aware of to use the library correctly and effectively.

Writing Element

Let us see how to create an element and write an integer value to it. The explanation follows the code.

#include "../PortableElmax/Elmax.h"
#include <iostream>
#include <string>

void WriteElement(std::string& xml)
{
    using namespace Elmax;
    RootElement root("Products");

    root.Create("Product").Create("Qty").SetInt32(1234);

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;
}

The first line of the code includes the Elmax.h header, which brings in all the XML classes you need. There is no document class. Each Element object doubles as a document that can read and save XML from and to a file or string. The most important difference from MS Elmax is that the root must be given a name in the constructor. Without a name, an error results when resolving the element to retrieve. Unlike MS Elmax, there is no need to call SetDomDoc or SetConverter, because the library uses Boost lexical_cast to perform data type conversion. The [] operator always returns the first child; to retrieve the children, call GetChildren. Destroy must be called on any element that is detached from the root, and it deletes the internal XML tree. The only parameter of ToPrettyString is the indentation string used for pretty printing. The output is listed below.

<Products>
    <Product>
        <Qty>1234</Qty>
    </Product>
</Products>
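The minimal sketch below shows what the indentation string does. It is a recursive pretty printer that reproduces the listing above, and each nesting level is prefixed with one more copy of the indentation string. It uses a hypothetical Node struct instead of Elmax's own classes and does no escaping of special characters. It is not how ToPrettyString is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical element model (not Elmax's classes), used only to show
// how an indentation string shapes pretty-printed output.
struct Node
{
    std::string name;
    std::string text;            // text content; ignored if the node has children
    std::vector<Node> children;
};

// Appends 'node' to 'out', prefixing each nesting level with one more copy
// of 'indent'. No escaping of &, < or > is performed.
void PrettyPrint(const Node& node, const std::string& indent, int depth, std::string& out)
{
    std::string pad;
    for (int i = 0; i < depth; ++i)
        pad += indent;

    if (node.children.empty())
    {
        // Leaf element: open tag, text and close tag on a single line.
        out += pad + "<" + node.name + ">" + node.text + "</" + node.name + ">\n";
        return;
    }

    out += pad + "<" + node.name + ">\n";
    for (const Node& child : node.children)
        PrettyPrint(child, indent, depth + 1, out);
    out += pad + "</" + node.name + ">\n";
}
```

Passing an indentation of "\t" or "  " instead of four spaces changes only the prefix repeated at each level.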

Reading Element

Next, the XML saved in the previous example is parsed and the quantity is displayed.

void ReadElement(const std::string& xml)
{
    using namespace Elmax;
    RootElement root;
    root.ParseXMLString(xml);

    int qty = root["Product"]["Qty"].GetInt32(0);

    std::cout << "Qty:" << qty << std::endl;
}

Notice that the root is not given a name here, because its name is set when the XML string is parsed. Even if the root is given a name in the constructor, that name is overwritten once the XML string is parsed. The value of qty is displayed below.

Qty:1234

Writing Attribute

Let us see the code to create an element and write an attribute to it.

void WriteAttr(std::string& xml)
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem = root.Create("Product");
    elem.SetAttrInt32("Qty", 1234);

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;
}

An attribute can only be written to an element that exists, so the element is first created with Create. The resulting XML is shown below.

<Products>
    <Product Qty="1234"/>
</Products>

Reading Attribute

void ReadAttr(const std::string& xml)
{
    using namespace Elmax;
    RootElement root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty = elem.GetAttrInt32("Qty", 0);

    std::cout << "Qty:" << qty << std::endl;
}

Before reading the attribute, make sure the element exists; otherwise a runtime_error exception is thrown. More generally, Boost's bad_lexical_cast and std::exception-derived exceptions such as runtime_error can be thrown, so the code should be wrapped in a try-catch block. The output is displayed below.

Qty:1234
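The try-catch advice can be sketched without Elmax itself. In the minimal example below, the hypothetical ParseQty function stands in for a typed accessor such as GetAttrInt32. It uses std::stoi in place of Boost lexical_cast. Catching std::exception by const reference covers the exceptions mentioned above. std::runtime_error derives from std::exception, and boost::bad_lexical_cast derives from std::bad_cast, which also derives from std::exception.

```cpp
#include <cstddef>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a typed accessor: converts text to int and throws
// on bad input. std::stoi throws std::invalid_argument or std::out_of_range.
int ParseQty(const std::string& text)
{
    std::size_t pos = 0;
    int value = std::stoi(text, &pos);
    if (pos != text.size())
        throw std::runtime_error("trailing characters in \"" + text + "\"");
    return value;
}

// Wraps the conversion in try-catch; catching std::exception by reference
// handles every exception type ParseQty can throw.
bool TryReadQty(const std::string& text, int& qty)
{
    try
    {
        qty = ParseQty(text);
        return true;
    }
    catch (const std::exception& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
        return false;
    }
}
```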

Writing Comment

Comments can be added by calling AddComment. An XML comment starts with <!-- and ends with -->.

void WriteComment(std::string& xml)
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem = root.Create("Product");
    elem.SetAttrInt32("Qty", 1234);
    elem.AddComment("Qty must not be less than 100");

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;
}

This is what the comment looks like in XML.

<Products>
    <Product Qty="1234">
        <!--Qty must not be less than 100-->
    </Product>
</Products>
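XML 1.0 forbids the string -- inside a comment. It also forbids comment text that ends with -, because the closing delimiter would then read --->. The article does not say whether AddComment checks for this. The hypothetical helpers below (not part of Elmax) sketch a check a caller could run before adding a comment.

```cpp
#include <string>

// Hypothetical helper (not part of Elmax): returns true if 'text' can sit
// between <!-- and --> without violating the XML 1.0 comment grammar.
bool IsValidCommentText(const std::string& text)
{
    // "--" may not appear anywhere inside a comment.
    if (text.find("--") != std::string::npos)
        return false;
    // A trailing '-' would produce "--->", which is also invalid.
    if (!text.empty() && text.back() == '-')
        return false;
    return true;
}

// Builds the serialized form, e.g. <!--Qty must not be less than 100-->
std::string MakeComment(const std::string& text)
{
    return "<!--" + text + "-->";
}
```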

Reading Comment

The code example below shows how to retrieve a collection of comments under an element.

void ReadComment(const std::string& xml)
{
    using namespace Elmax;
    RootElement root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty = elem.GetAttrInt32("Qty", 0);

    std::vector<Comment> vec = elem.GetCommentCollection();

    std::cout << "Qty:" << qty << std::endl;

    if(vec.size()>0)
        std::cout << "Comment:" << vec[0].GetContent() << std::endl;
}

The output is shown below.

Qty:1234
Comment:Qty must not be less than 100

Writing CDATA Section

CDATA stands for (unparsed) character data. The text inside a CDATA section is not parsed as markup by the XML parser. CDATA can be added through AddCData. A CDATA section starts with <![CDATA[ and ends with ]]>.

void WriteCData(std::string& xml)
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem = root.Create("Product");
    elem.SetAttrInt32("Qty", 1234);
    elem.AddCData("Hello world!");

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;
}

The output is shown below:

<Products>
    <Product Qty="1234">
        <![CDATA[Hello world!]]>
    </Product>
</Products>
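A CDATA section ends at the first ]]>, so that sequence cannot appear inside a single section. The usual workaround is to close the section between ]] and > and open a new one. The hypothetical WrapCData helper below sketches this. The article does not say whether AddCData does any such splitting.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Elmax): wraps 'text' in CDATA, splitting
// every "]]>" across two adjacent sections so the terminator never appears
// inside a section. A parser concatenates the sections back into 'text'.
std::string WrapCData(const std::string& text)
{
    std::string out = "<![CDATA[";
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = text.find("]]>", start)) != std::string::npos)
    {
        // Emit up to and including "]]", close the section, then reopen it
        // so that the ">" begins the next section.
        out += text.substr(start, pos + 2 - start);
        out += "]]><![CDATA[";
        start = pos + 2;
    }
    out += text.substr(start);
    out += "]]>";
    return out;
}
```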

As a best practice, avoid storing binary data in a CDATA section, because there is a small chance that ]]> appears in the data. Carriage return and line feed also have special meanings, because of the way a text file library is used to read and write files. As a result, carriage returns are removed from the binary data. This is a limitation of using a text file library. To overcome these limitations, store binary data in Base64 format.
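Base64 output uses only letters, digits, +, / and the = padding character. It therefore cannot contain ]]>, carriage returns or line feeds. Below is a minimal encoder sketch using the standard RFC 4648 alphabet. It is not part of Elmax; the encoded string could then be stored like any other text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal Base64 encoder (RFC 4648 alphabet with '=' padding).
std::string Base64Encode(const std::vector<std::uint8_t>& data)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Each full 3-byte group becomes 4 characters of 6 bits each.
    for (; i + 2 < data.size(); i += 3)
    {
        std::uint32_t n = (std::uint32_t(data[i]) << 16) |
                          (std::uint32_t(data[i + 1]) << 8) |
                          std::uint32_t(data[i + 2]);
        out += table[(n >> 18) & 0x3F];
        out += table[(n >> 12) & 0x3F];
        out += table[(n >> 6) & 0x3F];
        out += table[n & 0x3F];
    }

    // Encode the remaining 1 or 2 bytes and pad to a multiple of 4.
    std::size_t rest = data.size() - i;
    if (rest > 0)
    {
        std::uint32_t n = std::uint32_t(data[i]) << 16;
        if (rest == 2)
            n |= std::uint32_t(data[i + 1]) << 8;
        out += table[(n >> 18) & 0x3F];
        out += table[(n >> 12) & 0x3F];
        out += (rest == 2) ? table[(n >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```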

Reading CDATA Section

The example below shows how to get a CDATA section by first retrieving the collection with GetCDataCollection.

void ReadCData(const std::string& xml)
{
    using namespace Elmax;
    RootElement root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty = elem.GetAttrInt32("Qty", 0);

    std::vector<CData> vec = elem.GetCDataCollection();

    std::cout << "Qty:" << qty << std::endl;

    if(vec.size()>0)
        std::cout << "CData:" << vec[0].GetContent() << std::endl;
}

The code above displays the following.

Qty:1234
CData:Hello world!

Namespace

Namespace support is minimal. To create an Element under a namespace, call Create with a namespace URI. For performance reasons, element resolution does not take namespaces into account. When retrieving an element through the [] operator, use the exact (prefixed) name as it appears in the XML.

void NamespaceUri()
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem = root.Create("Product").Create("Grocery:Item", "http://www.example.com");
    elem.SetInt32(1234);

    std::string xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;
}

This is the output of the above code example.

<Products>
    <Product>
        <Grocery:Item xmlns:Grocery="http://www.example.com">1234</Grocery:Item>
    </Product>
</Products>
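In the output, Grocery is the namespace prefix and Item is the local name. The prefix is bound to the URI through the xmlns:Grocery attribute. The hypothetical helpers below (not Elmax functions) show how a qualified name splits into its parts and how that declaration is formed. Because Elmax resolves names literally, this element has to be looked up as "Grocery:Item", not as "Item".

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits "Grocery:Item" into {"Grocery", "Item"}.
// A name without a colon has an empty prefix.
std::pair<std::string, std::string> SplitQName(const std::string& qname)
{
    std::size_t colon = qname.find(':');
    if (colon == std::string::npos)
        return {"", qname};
    return {qname.substr(0, colon), qname.substr(colon + 1)};
}

// Hypothetical helper: builds the namespace declaration attribute,
// e.g. xmlns:Grocery="http://www.example.com", or xmlns="..." without a prefix.
std::string MakeXmlnsAttr(const std::string& qname, const std::string& uri)
{
    std::string prefix = SplitQName(qname).first;
    std::string attr = prefix.empty() ? std::string("xmlns") : "xmlns:" + prefix;
    return attr + "=\"" + uri + "\"";
}
```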

Collection

There are two methods to retrieve a group of elements as a collection: AsCollection and GetChildren. AsCollection retrieves the elements at the same level with the same name. This is like getting the siblings, but the element itself is included. GetChildren, as its name suggests, retrieves the child elements.

void AsCollection()
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem1 = root.Create("Product");
    elem1.SetAttrInt32("Qty", 400);
    elem1.SetString("Shower Cap");
    Element elem2 = root.Create("Product");
    elem2.SetAttrInt32("Qty", 600);
    elem2.SetString("Soap");
    Element elem3 = root.Create("Product");
    elem3.SetAttrInt32("Qty", 700);
    elem3.SetString("Shampoo");

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::collection_t vec = root["Product"].AsCollection();

    for(size_t i=0;i<vec.size(); ++i)
    {
        std::cout << vec[i].GetString("") << ":" << vec[i].GetAttrInt32("Qty", 0) << std::endl;
    }
}

The output is displayed below:

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Shower Cap:400
Soap:600
Shampoo:700
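The difference between the two methods can be modelled with a hypothetical in-memory Node struct (not Elmax's RawElement). In the sketch below, ChildrenOf plays the role of GetChildren and returns every child. SameNameSiblings plays the role of AsCollection and returns the element together with its same-named siblings, in document order.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory node, used only to illustrate the semantics.
struct Node
{
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// GetChildren-like: every child of 'parent', whatever its name.
std::vector<const Node*> ChildrenOf(const Node& parent)
{
    std::vector<const Node*> result;
    for (const Node& child : parent.children)
        result.push_back(&child);
    return result;
}

// AsCollection-like: every child of 'parent' that shares the name of 'self',
// in document order; this includes 'self' when it is one of those children.
std::vector<const Node*> SameNameSiblings(const Node& parent, const Node& self)
{
    std::vector<const Node*> result;
    for (const Node& child : parent.children)
        if (child.name == self.name)
            result.push_back(&child);
    return result;
}
```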

A predicate, either a lambda or a functor, can be passed to AsCollection or GetChildren to keep only the elements that pass the predicate test.

void AsCollectionLambda()
{
    using namespace Elmax;
    RootElement root("Products");

    Element elem1 = root.Create("Product");
    elem1.SetAttrInt32("Qty", 400);
    elem1.SetString("Shower Cap");
    Element elem2 = root.Create("Product");
    elem2.SetAttrInt32("Qty", 600);
    elem2.SetString("Soap");
    Element elem3 = root.Create("Product");
    elem3.SetAttrInt32("Qty", 700);
    elem3.SetString("Shampoo");

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::collection_t vec = root["Product"].AsCollection([](Element elem){
        return (elem.GetAttrInt32("Qty", 0)>500);
    });

    for(size_t i=0;i<vec.size(); ++i)
    {
        std::cout << vec[i].GetString("") << ":" << vec[i].GetAttrInt32("Qty", 0) << std::endl;
    }
}

In the output, the XML still contains all three products, but only the products with a quantity greater than 500 are listed after it.

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Soap:600
Shampoo:700
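A predicate is any callable that takes an element and returns bool. The sketch below filters with std::copy_if over a hypothetical Product struct instead of Elmax elements. Its template parameter lets the same function accept either a lambda or a functor such as QtyGreaterThan. The predicate argument of AsCollection and GetChildren offers the same flexibility.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the Product elements above.
struct Product
{
    std::string name;
    int qty;
};

// Functor predicate: keeps products whose quantity exceeds a threshold.
struct QtyGreaterThan
{
    int threshold;
    bool operator()(const Product& p) const { return p.qty > threshold; }
};

// Generic filter that accepts any callable predicate (lambda or functor)
// and preserves the original order.
template <typename Pred>
std::vector<Product> Filter(const std::vector<Product>& items, Pred pred)
{
    std::vector<Product> result;
    std::copy_if(items.begin(), items.end(), std::back_inserter(result), pred);
    return result;
}
```

A functor is handy when the threshold is only known at runtime and the same test is reused in several places. A lambda is shorter for a one-off test.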

AsCollection and GetChildren are similar in usage, so a GetChildren code example is omitted.

Iterators

Instead of getting back a vector, we can iterate the collection with Element::Iterator.

void Iterators()
{
    using namespace Elmax;
    RootElement root(_TS("Products"));

    Element elem1 = root.Create("Product");
    elem1.SetAttrInt32("Qty", 400);
    elem1.SetString("Shower Cap");
    Element elem2 = root.Create("Product");
    elem2.SetAttrInt32("Qty", 600);
    elem2.SetString("Soap");
    Element elem3 = root.Create("Product");
    elem3.SetAttrInt32("Qty", 700);
    elem3.SetString("Shampoo");

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::Iterator it = root.Begin("*");

    for(;it!=root.End(); ++it)
    {
        std::cout << (*it).GetString("") <<
            ":" << (*it).GetAttrInt32("Qty", 0) << std::endl;
    }
}

Passing the "*" wildcard to Begin tells the element to iterate over all its child elements, regardless of their names. The output is shown below:

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Shower Cap:400
Soap:600
Shampoo:700
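The wildcard rule can be expressed as a simple name filter. The article only documents the "*" case. The hypothetical helpers below assume two things that should be treated as assumptions: any other name selects only children with exactly that name, and "*" is not a general glob pattern.

```cpp
#include <string>
#include <vector>

// Hypothetical filter: "*" matches every child name; any other filter
// (by assumption) must match the name exactly.
bool NameMatches(const std::string& filter, const std::string& name)
{
    return filter == "*" || filter == name;
}

// Hypothetical helper: returns the child names an iteration would visit.
std::vector<std::string> Visit(const std::vector<std::string>& childNames,
                               const std::string& filter)
{
    std::vector<std::string> visited;
    for (const std::string& name : childNames)
        if (NameMatches(filter, name))
            visited.push_back(name);
    return visited;
}
```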

C++ LINQ

With C++ LINQ by Marten Range, we can use LINQ-style queries to fill our data structures with data gleaned from XML. In the code example below, we create a group of book and author elements.

#include "cpplinq.hpp" // C++ LINQ by Marten Range

void CppLinq()
{
    using namespace Elmax;
    RootElement root("Bookstore");

    Element Books = root.Create("Books");
    Element Book1 = Books.Create("Book");
    Book1.SetAttrInt32("AuthorID", 1255);
    Book1["Title"].SetString("The Joy Luck Club");
    Element Book2 = Books.Create("Book");
    Book2.SetAttrInt32("AuthorID", 2562);
    Book2["Title"].SetString("The First Phone Call from Heaven");
    Element Book3 = Books.Create("Book");
    Book3.SetAttrInt32("AuthorID", 3651);
    Book3["Title"].SetString("David and Goliath");

    Element Authors = root.Create("Authors");
    Element Author1 = Authors.Create("Author");
    Author1.SetAttrInt32("AuthorID", 1255);
    Author1["Name"].SetString("Amy Tan");
    Author1["Gender"].SetString("Female");
    Element Author2 = Authors.Create("Author");
    Author2.SetAttrInt32("AuthorID", 2562);
    Author2["Name"].SetString("Mitch Albom");
    Author2["Gender"].SetString("Male");
    Element Author3 = Authors.Create("Author");
    Author3.SetAttrInt32("AuthorID", 3651);
    Author3["Name"].SetString("Malcolm Gladwell");
    Author3["Gender"].SetString("Male");

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

The XML produced by Elmax is listed below:

<Bookstore>
    <Books>
        <Book AuthorID="1255">
            <Title>The Joy Luck Club</Title>
        </Book>
        <Book AuthorID="2562">
            <Title>The First Phone Call from Heaven</Title>
        </Book>
        <Book AuthorID="3651">
            <Title>David and Goliath</Title>
        </Book>
    </Books>
    <Authors>
        <Author AuthorID="1255">
            <Name>Amy Tan</Name>
            <Gender>Female</Gender>
        </Author>
        <Author AuthorID="2562">
            <Name>Mitch Albom</Name>
            <Gender>Male</Gender>
        </Author>
        <Author AuthorID="3651">
            <Name>Malcolm Gladwell</Name>
            <Gender>Male</Gender>
        </Author>
    </Authors>
</Bookstore>

The rest of the CppLinq function, shown below, uses C++ LINQ to join the book and author elements on their common AuthorID attribute. The title and author name are returned in a vector of BookInfo structures, while the gender information is discarded.

    using namespace cpplinq;
    struct BookInfo
    {
        std::string title;
        std::string author;
    };
    
    auto result = 
        from (root["Books"].GetChildren("Book"))
        >> join (
        from (root["Authors"].GetChildren("Author")),
        // Selects the AuthorID on book element to join on
        [](const Element& b) {return b.GetAttrInt32("AuthorID", -1);},
        // Selects the AuthorID on author element to join on
        [](const Element& a) {return a.GetAttrInt32("AuthorID", -1);},
        // Gets book title and author name
        [](const Element& b, const Element& a) -> BookInfo
        { BookInfo info = {b["Title"].GetString(""), 
        a["Name"].GetString("")}; return info;}
        )
        >> to_vector();

    for(size_t i=0;i<result.size(); ++i)
    {
        std::cout << result[i].title << " is written by " << result[i].author << std::endl;
    }
}

This is the list of BookInfo displayed.

The Joy Luck Club is written by Amy Tan
The First Phone Call from Heaven is written by Mitch Albom
David and Goliath is written by Malcolm Gladwell
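For comparison, the same inner join can be written with the standard library alone: index the authors by AuthorID in a hash map, then look up each book. The sketch below uses hypothetical plain structs in place of Elmax elements and assumes author IDs are unique. It is not how cpplinq implements join.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical plain structs standing in for the Book and Author elements.
struct Book     { int authorId; std::string title; };
struct Author   { int authorId; std::string name; std::string gender; };
struct BookInfo { std::string title; std::string author; };

// Inner join of books and authors on authorId, keeping only the title and
// the author name (gender is discarded). Books without a matching author
// are dropped. Assumes author ids are unique.
std::vector<BookInfo> JoinBooksAuthors(const std::vector<Book>& books,
                                       const std::vector<Author>& authors)
{
    // Index authors by id so each book lookup is O(1) on average.
    std::unordered_map<int, const Author*> byId;
    for (const Author& a : authors)
        byId.emplace(a.authorId, &a);

    std::vector<BookInfo> result;
    for (const Book& b : books)
    {
        auto it = byId.find(b.authorId);
        if (it != byId.end())
            result.push_back({b.title, it->second->name});
    }
    return result;
}
```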

Predefined Macros

config.h contains several macros that control the behaviour of Portable Elmax. This section explains what each of them enables. For example, uncomment the macro below if you want to use wide characters for strings.

//#define ELMAX_USE_UNICODE

Define ELMAX_USE_MFC_CSTRING if you prefer to use MFC CString. Whether CStringA or CStringW is used depends on whether ELMAX_USE_UNICODE is defined. If ELMAX_USE_MFC_CSTRING is not defined, STL string is used.

//#define ELMAX_USE_MFC_CSTRING

The macros below are mutually exclusive and determine which container class is used to store attributes: map, unordered_map, list or vector. Only one of them should be defined; in the listing below, vector is selected.

//#define ELMAX_USE_MAP_FOR_ATTRS
//#define ELMAX_USE_UNORDERED_MAP_FOR_ATTRS
//#define ELMAX_USE_LIST_FOR_ATTRS
#define ELMAX_USE_VECTOR_FOR_ATTRS
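The hypothetical sketch below shows one way the preprocessor can enforce mutually exclusive macros like these and turn them into types. It is not Elmax's actual config.h or internals. The names ElmaxString and AttrContainer are invented for illustration, and the MFC CString branch is omitted so the example compiles anywhere.

```cpp
#include <list>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Mirrors the choice made in the listing above.
#define ELMAX_USE_VECTOR_FOR_ATTRS

// Reject configurations that select more than one attribute container.
#if (defined(ELMAX_USE_MAP_FOR_ATTRS) + defined(ELMAX_USE_UNORDERED_MAP_FOR_ATTRS) + \
     defined(ELMAX_USE_LIST_FOR_ATTRS) + defined(ELMAX_USE_VECTOR_FOR_ATTRS)) > 1
#error "Define at most one ELMAX_USE_*_FOR_ATTRS macro"
#endif

// Hypothetical string type chosen by the Unicode macro.
#ifdef ELMAX_USE_UNICODE
typedef std::wstring ElmaxString;
#else
typedef std::string ElmaxString;
#endif

// Hypothetical attribute container chosen by the container macros;
// falls back to a vector of name/value pairs.
#if defined(ELMAX_USE_MAP_FOR_ATTRS)
typedef std::map<ElmaxString, ElmaxString> AttrContainer;
#elif defined(ELMAX_USE_UNORDERED_MAP_FOR_ATTRS)
typedef std::unordered_map<ElmaxString, ElmaxString> AttrContainer;
#elif defined(ELMAX_USE_LIST_FOR_ATTRS)
typedef std::list<std::pair<ElmaxString, ElmaxString>> AttrContainer;
#else
typedef std::vector<std::pair<ElmaxString, ElmaxString>> AttrContainer;
#endif
```

A vector or list keeps attributes in insertion order. A map orders them by name, and an unordered_map trades ordering for average constant-time lookup.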

Conclusion

In this article, we briefly looked at how to write and read elements, attributes, comments, CDATA sections and so on. There are 105 unit tests; whenever you uncomment any of the predefined macros, remember to build and run them. The project is hosted on GitHub, so users should always download the latest source code from there. Portable Elmax will not be hosted on NuGet because of the many possible configurations, for example STL string or MFC CString, ASCII or Unicode, and so on. If you find any bugs, send me a copy of your config.h to help me narrow down the problem. If you have any feature requests, please let me know in the article forum. Thank you for reading!

History

  • 2015-06-14: Version 0.9.5 Beta. Migrated to GitHub
  • 2013-11-26: Initial release

Breaking Changes in 0.9.5

  • No implicit type conversion: The implicit type accessors and mutators are removed. Accessors and mutators must be called explicitly.
  • Use RootElement: Use RootElement for your root element to gain RAII destruction. RootElement is derived from Element.
  • Element is simplified: Element is now a lightweight wrapper whose only data member is the RawElement pointer.
  • Attribute class is removed: The Attr method for getting an Attribute no longer exists. Use the attribute accessors and mutators on the Element class instead.
  • [] does not support query: Elements can no longer be retrieved with a query such as elem["Products|Books"]; use elem["Products"]["Books"] instead.
  • [] operator is const: Since the [] operator does not modify data members, it is now const, so it can be used on const elements (as in the cpplinq example).
  • Create and CreateNew behaviour changed: Create and CreateNew used to create the element itself if the node did not exist. Now Create creates a new child element and requires a name. CreateNew has been removed.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Shao Voon Wong
Software Developer (Senior)
United States
IT Certifications

  • IT Infrastructure Library Foundational (ITIL v3)
  • Scrum Alliance Certified Scrum Master (CSM)
  • EC-Council Certified Secure Programmer (ECSP) .NET
  • EC-Council Certified Ethical Hacker (CEH)
  • EC-Council Certified Security Analyst (ECSA)
  • Certified Secure Software Lifecycle Professional (CSSLP)

Last Updated 12 Apr 2016
Article Copyright 2016 by Shao Voon Wong