Portable Elmax

By Wong Shao Voon, 27 Nov 2013

Introduction

Portable Elmax is a cross-platform, non-validating XML DOM parser written in C++. An earlier, non-portable edition based on MSXML also exists; to avoid confusion, this article refers to it as MS Elmax. MS Elmax supports MFC CString only superficially, at the API boundary (a CString is converted to an STL string before any string processing), whereas Portable Elmax can be switched to use MFC CString natively by defining ELMAX_USE_MFC_CSTRING in the config.h file. This article is a short tutorial on Portable Elmax. Although the two editions have very similar APIs, Portable Elmax is not a drop-in replacement for MS Elmax; there are some crucial differences that users must be aware of in order to use the library correctly and effectively.

Writing Element

Let us see how to create an element and write an integer value to it. The explanation follows the code.

#include "../PortableElmax/Elmax.h"
#include <iostream>
#include <string>

void WriteElement(std::string& xml)
{
    using namespace Elmax;
    Element root("Products");

    root["Product|Qty"]=1234;

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;

    root.Destroy();
}

The first line of the code includes the Elmax.h header, which pulls in all the XML classes you need. There is no document class; each Element object doubles as a document that can read and save XML to a file or a string. The most important difference from MS Elmax is that the root must be given a name in the constructor; without a name, resolving the element to retrieve results in an error. Unlike MS Elmax, there is no need to call SetDomDoc or SetConverter; the library uses Boost lexical_cast to perform data type conversion. The [] operator always returns the first matching child; to retrieve all children, call GetChildren. Destroy must be called on the root and on any element that has been detached from the root; it deletes the internal XML tree. As a side effect of element chaining, the library cannot automatically call Destroy in the destructor of an element without a parent. The only string parameter of ToPrettyString is the indentation used for pretty printing. The output is listed below.

<Products>
    <Product>
        <Qty>1234</Qty>
    </Product>
</Products>
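
The path string passed to the [] operator ("Product|Qty") names one element per level below the root. As a rough illustration of that idea only, the sketch below splits such a path on the | separator. SplitPath is a hypothetical helper written for this article's example; it is not part of Elmax and says nothing about how the library resolves paths internally.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an Elmax function): split a path such as
// "Product|Qty" into the element names visited below the root.
std::vector<std::string> SplitPath(const std::string& path, char sep = '|')
{
    std::vector<std::string> names;
    std::string::size_type start = 0;
    while (true)
    {
        const std::string::size_type pos = path.find(sep, start);
        if (pos == std::string::npos)
        {
            // Last (or only) segment.
            names.push_back(path.substr(start));
            break;
        }
        names.push_back(path.substr(start, pos - start));
        start = pos + 1;
    }
    return names;
}
```

Calling SplitPath("Product|Qty") yields the two names Product and Qty, which correspond to the two nested elements in the output above.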

Reading Element

Next, the XML saved by the previous example is read and the quantity is displayed.

void ReadElement(const std::string& xml)
{
    using namespace Elmax;
    Element root;
    root.ParseXMLString(xml);

    int qty = root["Product|Qty"];

    std::cout << "Qty:" << qty << std::endl;
    
    root.Destroy();
}

Notice that the root has no name here, because the name is set when the XML string is parsed. Even if the root is given a name in the constructor, it is overwritten once the XML string has been parsed. The value of qty is displayed below.

Qty:1234

Writing Attribute

Let us see the code to create an element and write an attribute.

void WriteAttr(std::string& xml)
{
    using namespace Elmax;
    Element root("Products");

    Element elem = root["Product"].Create();
    elem.Attr("Qty") = 1234;

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;

    root.Destroy();
}

Before an attribute can be written, the element must exist, so it has to be created with Create. Note that Create does not create a new element if one can already be found. If new elements must always be created, call CreateNew. The difference between this example and the Writing Element example is that the latter does not need to call Create explicitly; the element is created automatically if it does not exist. The resulting XML is displayed below.

<Products>
    <Product Qty="1234"/>
</Products>

Reading Attribute

void ReadAttr(const std::string& xml)
{
    using namespace Elmax;
    Element root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty=0;
    if(elem.Exists())
        qty = elem.Attr("Qty");

    std::cout << "Qty:" << qty << std::endl;

    root.Destroy();
}

Before reading the attribute, take care to ensure that the element exists; otherwise a runtime_error exception is thrown. Speaking of exception handling, Boost bad_lexical_cast and std::exception-derived exceptions such as runtime_error can be thrown, so the code should be placed in a try-catch block. The output is displayed below.

Qty:1234
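
The try-catch advice can be sketched without the library. In the example below, ReadQty and ReadQtyOr are hypothetical stand-ins, not Elmax calls: ReadQty throws std::runtime_error for a missing value and std::bad_cast for a failed conversion. std::bad_cast is used because boost::bad_lexical_cast derives from it, so a single handler for std::exception covers both kinds of failure.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a read that can fail (not an Elmax call).
// An empty string plays the role of a missing element.
int ReadQty(const std::string& text)
{
    if (text.empty())
        throw std::runtime_error("element does not exist");

    std::size_t used = 0;
    int value = 0;
    try
    {
        value = std::stoi(text, &used);
    }
    catch (const std::logic_error&)
    {
        // std::stoi reports failure with invalid_argument or out_of_range.
        throw std::bad_cast();
    }
    if (used != text.size())
        throw std::bad_cast(); // trailing characters: not a clean integer
    return value;
}

// Returns the parsed quantity, or fallback if any std::exception is thrown.
int ReadQtyOr(const std::string& text, int fallback)
{
    try
    {
        return ReadQty(text);
    }
    catch (const std::exception& e)
    {
        std::cerr << "Read failed: " << e.what() << std::endl;
        return fallback;
    }
}
```

Catching by const reference to std::exception keeps the handler short; separate handlers can be added ahead of it when the two failure kinds need different treatment.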

Writing Comment

Comments can be added by calling AddComment. An XML comment starts with <!-- and ends with -->.

void WriteComment(std::string& xml)
{
    using namespace Elmax;
    Element root("Products");

    Element elem = root["Product"].Create();
    elem.Attr("Qty") = 1234;
    elem.AddComment("Qty must not be less than 100");

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;

    root.Destroy();
}

This is what the comment looks like in XML.

<Products>
    <Product Qty="1234">
        <!--Qty must not be less than 100-->
    </Product>
</Products>

Reading Comment

The code example below shows how to retrieve the collection of comments under an element.

void ReadComment(const std::string& xml)
{
    using namespace Elmax;
    Element root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty=0;
    if(elem.Exists())
        qty = elem.Attr("Qty");

    std::vector<Comment> vec = elem.GetCommentCollection();

    std::cout << "Qty:" << qty << std::endl;

    if(vec.size()>0)
        std::cout << "Comment:" << vec[0].GetContent() << std::endl;

    root.Destroy();
}

The output is displayed below.

Qty:1234
Comment:Qty must not be less than 100

Writing CDATA Section

CDATA is (unparsed) character data: the text within it is not interpreted as markup by the XML parser. A CDATA section can be added through AddCData. In XML, a CDATA section starts with <![CDATA[ and ends with ]]>.

void WriteCData(std::string& xml)
{
    using namespace Elmax;
    Element root("Products");

    Element elem = root["Product"].Create();
    elem.Attr("Qty") = 1234;
    elem.AddCData("Hello world!");

    xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;

    root.Destroy();
}

The output is shown below.

<Products>
    <Product Qty="1234">
        <![CDATA[Hello world!]]>
    </Product>
</Products>

As a best practice, storing binary data in a CDATA section is not advisable, because there is a small possibility that the data itself contains ]]>. In addition, because a text file library is used to read and write files, carriage returns and linefeeds have special meanings: carriage returns are removed from the binary data. This is a limitation of using a text file library. To overcome these limitations, it is best to store the data in Base64 format.
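
To make the Base64 suggestion concrete, here is a minimal encoder sketch using the standard alphabet with = padding. Base64Encode is a hypothetical helper written for this illustration; the article does not state whether Elmax ships its own Base64 routine, so treat this as one possible way to prepare binary data before storing it as element text.

```cpp
#include <string>

// Minimal Base64 encoder sketch (standard alphabet, '=' padding).
// Hypothetical helper, not part of Elmax.
std::string Base64Encode(const std::string& data)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Encode each full 3-byte group as 4 output characters.
    for (; i + 2 < data.size(); i += 3)
    {
        const unsigned int n =
            (static_cast<unsigned char>(data[i]) << 16) |
            (static_cast<unsigned char>(data[i + 1]) << 8) |
            static_cast<unsigned char>(data[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }

    // Encode the 1 or 2 leftover bytes and pad to a multiple of 4.
    const std::size_t rest = data.size() - i;
    if (rest == 1)
    {
        const unsigned int n = static_cast<unsigned char>(data[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    }
    else if (rest == 2)
    {
        const unsigned int n =
            (static_cast<unsigned char>(data[i]) << 16) |
            (static_cast<unsigned char>(data[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The encoded text contains only letters, digits, +, / and =, so it can never contain ]]> or a carriage return; for example, the two bytes "\r\n" encode to DQo=.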

Reading CDATA Section

The example below shows how to read a CDATA section by first retrieving the collection with GetCDataCollection.

void ReadCData(const std::string& xml)
{
    using namespace Elmax;
    Element root;
    root.ParseXMLString(xml);

    Element elem = root["Product"];

    int qty=0;
    if(elem.Exists())
        qty = elem.Attr("Qty");

    std::vector<CData> vec = elem.GetCDataCollection();

    std::cout << "Qty:" << qty << std::endl;

    if(vec.size()>0)
        std::cout << "CData:" << vec[0].GetContent() << std::endl;

    root.Destroy();
}

The above code displays the following.

Qty:1234
CData:Hello world!

Namespace

Namespace support is minimal. To create an Element under a namespace, call Create or CreateNew with a namespace URI. For performance reasons, element resolution does not take namespaces into account. When retrieving an element through the [] operator, use the exact name as it appears in the XML.

void NamespaceUri()
{
    using namespace Elmax;
    Element root("Products");

    Element elem = root["Product|Grocery:Item"].Create("http://www.example.com");
    elem=1234;

    std::string xml = root.ToPrettyString("    ");

    std::cout << xml << std::endl;

    root.Destroy();
}

This is the output of the above code example.

<Products>
    <Product>
        <Grocery:Item xmlns:Grocery="http://www.example.com">1234</Grocery:Item>
    </Product>
</Products>

Collection

There are two methods for retrieving a group of elements as a collection: AsCollection and GetChildren. AsCollection retrieves the elements at the same level that have the same name; it is like getting the siblings, but the element itself is included as well. GetChildren is self-explanatory.

void AsCollection()
{
    using namespace Elmax;
    Element root("Products");

    Element elem1 = root["Product"].CreateNew();
    elem1.Attr("Qty") = 400;
    elem1 = "Shower Cap";
    Element elem2 = root["Product"].CreateNew();
    elem2.Attr("Qty") = 600;
    elem2 = "Soap";
    Element elem3 = root["Product"].CreateNew();
    elem3.Attr("Qty") = 700;
    elem3 = "Shampoo";

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::collection_t vec = root["Product"].AsCollection();

    for(size_t i=0;i<vec.size(); ++i)
    {
        std::cout << vec[i].GetString("") << ":" << vec[i].Attr("Qty").GetInt32(0) << std::endl;
    }

    root.Destroy();
}

The output is displayed below.

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Shower Cap:400
Soap:600
Shampoo:700

We can pass a predicate, either a lambda or a functor, to AsCollection or GetChildren to get only the elements that pass the predicate test.

void AsCollectionLambda()
{
    using namespace Elmax;
    Element root("Products");

    Element elem1 = root["Product"].CreateNew();
    elem1.Attr("Qty") = 400;
    elem1 = "Shower Cap";
    Element elem2 = root["Product"].CreateNew();
    elem2.Attr("Qty") = 600;
    elem2 = "Soap";
    Element elem3 = root["Product"].CreateNew();
    elem3.Attr("Qty") = 700;
    elem3 = "Shampoo";

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::collection_t vec = root["Product"].AsCollection([](Element elem){
        return (elem.Attr("Qty").GetInt32(0)>500);
    });


    for(size_t i=0;i<vec.size(); ++i)
    {
        std::cout << vec[i].GetString("") << ":" << vec[i].Attr("Qty").GetInt32(0) << std::endl;
    }

    root.Destroy();
}

In the output, only the products with a quantity greater than 500 are listed after the XML.

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Soap:600
Shampoo:700
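
The effect of the predicate can be reproduced with the standard library alone. The sketch below uses a plain Product struct as a stand-in for the elements and std::copy_if as the filter; both Product and Filter are hypothetical names for this illustration and do not describe how AsCollection is implemented.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Plain stand-in for a Product element (hypothetical, not an Elmax type).
struct Product
{
    std::string name;
    int qty;
};

// Keeps only the items for which pred returns true, preserving their order.
template <typename Pred>
std::vector<Product> Filter(const std::vector<Product>& items, Pred pred)
{
    std::vector<Product> kept;
    std::copy_if(items.begin(), items.end(), std::back_inserter(kept), pred);
    return kept;
}
```

With the three products from the example and the predicate [](const Product& p) { return p.qty > 500; }, Filter returns Soap and Shampoo, matching the output above.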

AsCollection and GetChildren are similar in usage, so a GetChildren code example is omitted.

Iterators

Instead of getting back a vector, we can use Element::Iterator to iterate over the collection.

void Iterators()
{
    using namespace Elmax;
    Element root(_TS("Products"));

    Element elem1 = root["Product"].CreateNew();
    elem1.Attr("Qty") = 400;
    elem1 = "Shower Cap";
    Element elem2 = root["Product"].CreateNew();
    elem2.Attr("Qty") = 600;
    elem2 = "Soap";
    Element elem3 = root["Product"].CreateNew();
    elem3.Attr("Qty") = 700;
    elem3 = "Shampoo";

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

    Element::Iterator it = root.Begin("*");

    for(;it!=root.End(); ++it)
    {
        std::cout << (*it).GetString("") << ":" << (*it).Attr("Qty").GetInt32(0) << std::endl;
    }

    root.Destroy();
}

By passing the "*" wildcard to Begin, I am telling the element to return all of its child elements, regardless of their names. The output is shown below.

<Products>
    <Product Qty="400">Shower Cap</Product>
    <Product Qty="600">Soap</Product>
    <Product Qty="700">Shampoo</Product>
</Products>

Shower Cap:400
Soap:600
Shampoo:700

C++ LINQ

With C++ LINQ by Marten Range, we can now use LINQ to fill our data structures with data gleaned from XML. In the code example below, we create a group of book and author elements.

void CppLinq()
{
    using namespace Elmax;
    Element root("Bookstore");

    Element Book1 = root["Books|Book"].CreateNew();
    Book1.Attr("AuthorID") = 1255;
    Book1["Title"] = "The Joy Luck Club";
    Element Book2 = root["Books|Book"].CreateNew();
    Book2.Attr("AuthorID") = 2562;
    Book2["Title"] = "The First Phone Call from Heaven";
    Element Book3 = root["Books|Book"].CreateNew();
    Book3.Attr("AuthorID") = 3651;
    Book3["Title"] = "David and Goliath";

    Element Author1 = root["Authors|Author"].CreateNew();
    Author1.Attr("AuthorID") = 1255;
    Author1["Name"] = "Amy Tan";
    Author1["Gender"] = "Female";
    Element Author2 = root["Authors|Author"].CreateNew();
    Author2.Attr("AuthorID") = 2562;
    Author2["Name"] = "Mitch Albom";
    Author2["Gender"] = "Male";
    Element Author3 = root["Authors|Author"].CreateNew();
    Author3.Attr("AuthorID") = 3651;
    Author3["Name"] = "Malcolm Gladwell";
    Author3["Gender"] = "Male";

    std::string xml = root.ToPrettyString("    ");
    std::cout << xml << std::endl;

The XML produced by Elmax is listed below.

<Bookstore>
    <Books>
        <Book AuthorID="1255">
            <Title>The Joy Luck Club</Title>
        </Book>
        <Book AuthorID="2562">
            <Title>The First Phone Call from Heaven</Title>
        </Book>
        <Book AuthorID="3651">
            <Title>David and Goliath</Title>
        </Book>
    </Books>
    <Authors>
        <Author AuthorID="1255">
            <Name>Amy Tan</Name>
            <Gender>Female</Gender>
        </Author>
        <Author AuthorID="2562">
            <Name>Mitch Albom</Name>
            <Gender>Male</Gender>
        </Author>
        <Author AuthorID="3651">
            <Name>Malcolm Gladwell</Name>
            <Gender>Male</Gender>
        </Author>
    </Authors>
</Bookstore>

Using C++ LINQ as shown below, the book and author elements are joined on their common AuthorID attribute. The title and author name are returned in a vector of BookInfo structures, while the gender information is discarded.

    using namespace cpplinq;
    struct BookInfo
    {
        std::string title;
        std::string author;
    };
    
    auto result = 
        from (root["Books"].GetChildren("Book"))
        >> join (
        from (root["Authors"].GetChildren("Author")),
        // Selects the AuthorID on book element to join on
        [](Element b) {return b.Attr("AuthorID").GetInt32(-1);},
        // Selects the AuthorID on author element to join on
        [](Element a) {return a.Attr("AuthorID").GetInt32(-1);},
        // Gets book title and author name
        [](Element b, Element a) -> BookInfo
        { BookInfo info = {b["Title"], a["Name"]}; return info;}
        )
        >> to_vector();

    for(size_t i=0;i<result.size(); ++i)
    {
        std::cout << result[i].title << " is written by " << result[i].author << std::endl;
    }

    root.Destroy();
}

The list of BookInfo entries is displayed below.

The Joy Luck Club is written by Amy Tan
The First Phone Call from Heaven is written by Mitch Albom
David and Goliath is written by Malcolm Gladwell
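
For readers unfamiliar with join, the same result can be computed by hand with a hash map. The sketch below is not how cpplinq implements join; Book, Author and JoinBooksToAuthors are hypothetical plain-data stand-ins used only to show what joining on AuthorID means.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Plain stand-ins for the XML elements (hypothetical, not Elmax types).
struct Book     { int authorId; std::string title; };
struct Author   { int authorId; std::string name; };
struct BookInfo { std::string title; std::string author; };

// Joins books to authors on authorId. Authors are indexed by ID first,
// then each book looks up its author. Books without a matching author
// are skipped; if two authors share an ID, the later one wins.
std::vector<BookInfo> JoinBooksToAuthors(const std::vector<Book>& books,
                                         const std::vector<Author>& authors)
{
    std::unordered_map<int, std::string> nameById;
    for (const Author& a : authors)
        nameById[a.authorId] = a.name;

    std::vector<BookInfo> result;
    for (const Book& b : books)
    {
        const auto it = nameById.find(b.authorId);
        if (it != nameById.end())
        {
            BookInfo info = {b.title, it->second};
            result.push_back(info);
        }
    }
    return result;
}
```

Fed with the three books and three authors from the XML above, this returns the same title and author pairs, in book order, as the C++ LINQ query.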

Predefined Macros

The config.h file contains some macros that enable certain behaviours in Portable Elmax. This section explains what each macro enables. For example, the macro below should be uncommented if you want to use wide characters for strings.

//#define ELMAX_USE_UNICODE
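
A macro like this typically switches a string alias and a literal prefix together. The sketch below shows that general pattern under stated assumptions: TextString and DEMO_TEXT are hypothetical names invented here, and the article does not show how config.h actually maps ELMAX_USE_UNICODE onto types. The _TS("Products") call in the Iterators example appears to be a literal macro of this kind, but its definition is not shown in this article.

```cpp
#include <string>

// Uncomment to switch the aliases below to wide characters.
//#define ELMAX_USE_UNICODE

#ifdef ELMAX_USE_UNICODE
typedef std::wstring TextString;   // hypothetical alias, not an Elmax name
#define DEMO_TEXT(x) L##x          // hypothetical wide-literal macro
#else
typedef std::string TextString;    // hypothetical alias, not an Elmax name
#define DEMO_TEXT(x) x             // narrow literals pass through unchanged
#endif

// Compiles unchanged in either configuration.
inline TextString RootName()
{
    return DEMO_TEXT("Products");
}
```

Code written against the alias and the literal macro does not need to change when the character width is flipped in one place.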

ELMAX_USE_MFC_CSTRING must be defined if you prefer to use MFC CString. Whether CStringA or CStringW is used depends on whether the macro ELMAX_USE_UNICODE is defined. If ELMAX_USE_MFC_CSTRING is not defined, STL strings are used.

//#define ELMAX_USE_MFC_CSTRING

Define the macro below if you use only | as the separator, and not \\ or /, for element resolution in the [] operator.

//#define ELMAX_DISABLE_FORWARD_BACKWARD_SEPARATOR

The mutually exclusive macros below determine which container class is used for attributes. The available choices are map, unordered_map, list and vector.

//#define ELMAX_USE_MAP_FOR_ATTRS
//#define ELMAX_USE_UNORDERED_MAP_FOR_ATTRS
//#define ELMAX_USE_LIST_FOR_ATTRS
#define ELMAX_USE_VECTOR_FOR_ATTRS
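
Because these four macros are mutually exclusive, a compile-time check can catch a config.h in which none or several of them are defined. The guard below is a hypothetical sketch; whether config.h already contains such a check is not stated in this article, and AttrContainerName is an illustrative helper, not an Elmax function.

```cpp
#include <string>

// Normally selected in config.h; defined here so the sketch stands alone.
#define ELMAX_USE_VECTOR_FOR_ATTRS

// Hypothetical guard: stop the build unless exactly one macro is defined.
#if (defined(ELMAX_USE_MAP_FOR_ATTRS) + defined(ELMAX_USE_UNORDERED_MAP_FOR_ATTRS) \
     + defined(ELMAX_USE_LIST_FOR_ATTRS) + defined(ELMAX_USE_VECTOR_FOR_ATTRS)) != 1
#error "Define exactly one ELMAX_USE_*_FOR_ATTRS macro"
#endif

// Hypothetical helper reporting which container the active macro selects.
inline std::string AttrContainerName()
{
#if defined(ELMAX_USE_MAP_FOR_ATTRS)
    return "map";
#elif defined(ELMAX_USE_UNORDERED_MAP_FOR_ATTRS)
    return "unordered_map";
#elif defined(ELMAX_USE_LIST_FOR_ATTRS)
    return "list";
#else
    return "vector";
#endif
}
```

The defined operator evaluates to 1 or 0 inside #if, so summing the four results counts how many of the macros are active.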

Conclusion

In this article, we briefly looked at how to write and read elements, attributes and so on. There are 169 unit tests. Whenever you uncomment any of the predefined macros, remember to build and run the unit tests. The project is hosted at Sourceforge, so users should always download the latest source code from there. Portable Elmax will not be hosted on Nuget because of the many possible configurations, for example STL string or MFC CString, ASCII or Unicode, and so on. If you find any bugs, send me a copy of your config.h to help me narrow down the problem. If you have any feature requests, please let me know in the article forum. Thank you for reading!

History

  • 2013-11-26: Initial Release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Wong Shao Voon
Software Developer McGraw-Hill Financial
Singapore Singapore

Currently into areas like 3D graphics and application security. Hoping to revisit the cryptography and design pattern topics if time permits.

Last Updated 27 Nov 2013
Article Copyright 2013 by Wong Shao Voon