This article is a short tutorial on Portable Elmax, a cross-platform C++ XML DOM library. We briefly look at how to write and read elements, attributes, comments, CDATA sections and collections.
Portable Elmax is a cross-platform, non-validating XML DOM parser written in C++. It was preceded by a non-portable edition based on MSXML; to avoid confusion, that edition is referred to as MS Elmax in this article. MS Elmax has only superficial MFC CString support at the API boundary (meaning CString is converted to an STL string before any string processing), while Portable Elmax can be switched to use MFC CString natively by defining ELMAX_USE_MFC_CSTRING in the config.h file. While Portable Elmax and MS Elmax are very similar in terms of API calls, Portable Elmax is not a drop-in replacement for MS Elmax; there are some crucial differences that the user must be aware of in order to use the library correctly and effectively.
Let us see how to create an element and write an integer value to it. The explanation follows the code.
#include <iostream>
#include <string>
#include "../PortableElmax/Elmax.h"
void WriteElement(std::string& xml)
{
using namespace Elmax;
RootElement root("Products");
root.Create("Product").Create("Qty").SetInt32(1234);
xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
}
The first line of the code includes the Elmax.h header, which pulls in all the XML classes you need. There is no document class: each Element object doubles as a document that can read and save XML to a file or string. The most important difference from MS Elmax is that root must be given a name in the constructor; without one, resolving the element to retrieve results in an error. Unlike MS Elmax, there is no need to call SetDomDoc or SetConverter; the library uses Boost lexical_cast to perform the data type conversion. The [] operator always returns the first child; to retrieve all children, call GetChildren. Destroy must be called on any element that is detached from root; Destroy deletes the internal XML tree. The only string parameter of the ToPrettyString function is the indentation used for pretty printing. The output is listed below:
<Products>
<Product>
<Qty>1234</Qty>
</Product>
</Products>
Next, the xml string saved by the previous example is read back and qty is displayed.
void ReadElement(const std::string& xml)
{
using namespace Elmax;
RootElement root;
root.ParseXMLString(xml);
int qty = root["Product"]["Qty"].GetInt32(0);
std::cout << "Qty:" << qty << std::endl;
}
Notice that root has no name here, because the name is set when the xml string is parsed. Even if root is given a name in the constructor, it is overwritten once the xml string has been parsed. The value of qty is displayed below:
Qty:1234
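The 0 passed to GetInt32 in the example reads like a fallback value. As a rough sketch of that idea only, and not the library's actual conversion code, the hypothetical helper ToInt32Or below converts text to a 32-bit integer with the standard library and returns the supplied default when the text is not a valid number. The helper name and its exact rules are assumptions made for illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Elmax): parse text as a 32-bit integer.
// Returns default_value when the text is empty, malformed, out of range,
// or followed by non-whitespace characters. Surrounding whitespace is tolerated.
std::int32_t ToInt32Or(const std::string& text, std::int32_t default_value)
{
    std::istringstream in(text);
    std::int32_t value = 0;
    if (!(in >> value))
        return default_value;      // nothing parsed, or the value overflowed
    char extra = 0;
    if (in >> extra)
        return default_value;      // trailing junk such as "12abc"
    return value;
}
```

With this sketch, ToInt32Or("1234", 0) yields 1234, while malformed text such as "abc" yields whatever default the caller passed in.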
Let us see the code to create an element and write an attribute to it.
void WriteAttr(std::string& xml)
{
using namespace Elmax;
RootElement root("Products");
Element elem = root.Create("Product");
elem.SetAttrInt32("Qty", 1234);
xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
}
Before writing to an attribute, the element must exist, so it must first be created with Create. The resulting XML is shown below.
<Products>
<Product Qty="1234"/>
</Products>
void ReadAttr(const std::string& xml)
{
using namespace Elmax;
RootElement root;
root.ParseXMLString(xml);
Element elem = root["Product"];
int qty = elem.GetAttrInt32("Qty", 0);
std::cout << "Qty:" << qty << std::endl;
}
Before reading the attribute, care must be taken to ensure that the element exists; otherwise a runtime_error exception will be thrown. Speaking of exception handling, Boost bad_lexical_cast and exceptions derived from std::exception, such as runtime_error, can be thrown, so the code should be placed in a try-catch block. The output is displayed below:
Qty:1234
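The try-catch advice above can be sketched without the library. The example below uses a plain std::map as a stand-in for an element's attributes; RequireAttr and AttrOr are hypothetical names, not Elmax functions. It shows that a single handler for const std::exception& covers runtime_error, and it would also cover boost::bad_lexical_cast, which derives from std::bad_cast.

```cpp
#include <exception>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for an element's attributes; NOT an Elmax type.
typedef std::map<std::string, int> AttrMap;

// Hypothetical strict lookup: throws std::runtime_error when the name is absent.
int RequireAttr(const AttrMap& attrs, const std::string& name)
{
    const AttrMap::const_iterator it = attrs.find(name);
    if (it == attrs.end())
        throw std::runtime_error("attribute not found: " + name);
    return it->second;
}

// Call-site pattern: catch by const reference to the common base class
// and fall back to a caller-supplied value.
int AttrOr(const AttrMap& attrs, const std::string& name, int fallback)
{
    try
    {
        return RequireAttr(attrs, name);
    }
    catch (const std::exception&)
    {
        return fallback;
    }
}
```

In real code the body of the try block would hold the Elmax calls, and the handler would typically log the message from what() rather than silently substitute a value.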
Comments can be added by calling AddComment. An XML comment starts with <!-- and ends with -->:
void WriteComment(std::string& xml)
{
using namespace Elmax;
RootElement root("Products");
Element elem = root.Create("Product");
elem.SetAttrInt32("Qty", 1234);
elem.AddComment("Qty must not be less than 100");
xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
}
This is what the comment looks like in XML:
<Products>
<Product Qty="1234">
<!--Qty must not be less than 100-->
</Product>
</Products>
The code example below shows how to retrieve a collection of comments under an element:
void ReadComment(const std::string& xml)
{
using namespace Elmax;
RootElement root;
root.ParseXMLString(xml);
Element elem = root["Product"];
int qty = elem.GetAttrInt32("Qty", 0);
std::vector<Comment> vec = elem.GetCommentCollection();
std::cout << "Qty:" << qty << std::endl;
if(vec.size()>0)
std::cout << "Comment:" << vec[0].GetContent() << std::endl;
}
Qty:1234
Comment:Qty must not be less than 100
CDATA stands for (unparsed) character data: the text inside a CDATA section is not interpreted as markup by the XML parser. A CDATA section can be added through AddCData. In XML, a CDATA section starts with <![CDATA[ and ends with ]]>.
void WriteCData(std::string& xml)
{
using namespace Elmax;
RootElement root("Products");
Element elem = root.Create("Product");
elem.SetAttrInt32("Qty", 1234);
elem.AddCData("Hello world!");
xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
}
The output is shown below:
<Products>
<Product Qty="1234">
<![CDATA[Hello world!]]>
</Product>
</Products>
As a best practice, it is not advisable to store binary data in a CDATA section, because there is a small possibility that ]]> will occur in the data. In addition, because a text file library is used to read and write files, carriage return and linefeed have special meanings: carriage returns will be removed from the binary data. To overcome these limitations, it is best to store the data in Base64 format.
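To make the Base64 suggestion concrete, here is a minimal encoder written against the standard library only. It is a sketch rather than anything shipped with Portable Elmax; the function name Base64Encode is chosen for illustration. The output uses only letters, digits, '+', '/' and '=', so it can never contain ]]>, a carriage return or a linefeed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encode raw bytes as Base64 text (standard alphabet, '=' padding).
std::string Base64Encode(const std::vector<unsigned char>& bytes)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Each full group of 3 bytes becomes 4 output characters.
    for (; i + 2 < bytes.size(); i += 3)
    {
        const unsigned int n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }

    // A trailing group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = bytes.size() - i;
    if (rest == 1)
    {
        const unsigned int n = bytes[i] << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    }
    else if (rest == 2)
    {
        const unsigned int n = (bytes[i] << 16) | (bytes[i + 1] << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The encoded string could then be stored as ordinary element text or in a CDATA section, and decoded again after reading.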
The example below shows how to read a CDATA section by first retrieving the collection with GetCDataCollection.
void ReadCData(const std::string& xml)
{
using namespace Elmax;
RootElement root;
root.ParseXMLString(xml);
Element elem = root["Product"];
int qty = elem.GetAttrInt32("Qty", 0);
std::vector<CData> vec = elem.GetCDataCollection();
std::cout << "Qty:" << qty << std::endl;
if(vec.size()>0)
std::cout << "CData:" << vec[0].GetContent() << std::endl;
}
The above code displays the following output:
Qty:1234
CData:Hello world!
Namespace support is minimal. To create an Element under a namespace, call Create with a namespace URI. For performance reasons, element resolution does not take namespaces into account. When retrieving an element through the [] operator, use the exact name as it appears in the XML.
void NamespaceUri()
{
using namespace Elmax;
RootElement root("Products");
Element elem = root.Create("Product").Create("Grocery:Item", "http://www.example.com");
elem.SetInt32(1234);
std::string xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
}
This is the output of the above code example:
<Products>
<Product>
<Grocery:Item xmlns:Grocery="http://www.example.com">1234</Grocery:Item>
</Product>
</Products>
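Because lookups go by the exact name, an element created as Grocery:Item has to be requested with the prefix included. If calling code needs the prefix and local name separately, it has to split the name itself. The helper below is a hypothetical illustration of that step and is not part of Elmax.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of Elmax): split "prefix:local" into its two
// parts at the first colon. A name without a colon yields an empty prefix.
std::pair<std::string, std::string> SplitQualifiedName(const std::string& name)
{
    const std::string::size_type colon = name.find(':');
    if (colon == std::string::npos)
        return std::make_pair(std::string(), name);
    return std::make_pair(name.substr(0, colon), name.substr(colon + 1));
}
```

For the example above, the name splits into the prefix Grocery and the local name Item, while a plain name such as Product comes back with an empty prefix.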
There are two methods to retrieve a group of elements as a collection: AsCollection and GetChildren. AsCollection retrieves the elements at the same level that share the same name; it is like getting siblings, except that the element itself is included as well. GetChildren, as its name suggests, returns the children of an element.
void AsCollection()
{
using namespace Elmax;
RootElement root("Products");
Element elem1 = root.Create("Product");
elem1.SetAttrInt32("Qty", 400);
elem1.SetString("Shower Cap");
Element elem2 = root.Create("Product");
elem2.SetAttrInt32("Qty", 600);
elem2.SetString("Soap");
Element elem3 = root.Create("Product");
elem3.SetAttrInt32("Qty", 700);
elem3.SetString("Shampoo");
std::string xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
Element::collection_t vec = root["Product"].AsCollection();
for(size_t i=0;i<vec.size(); ++i)
{
std::cout << vec[i].GetString("") << ":" << vec[i].GetAttrInt32("Qty", 0) << std::endl;
}
}
The output is displayed below:
<Products>
<Product Qty="400">Shower Cap</Product>
<Product Qty="600">Soap</Product>
<Product Qty="700">Shampoo</Product>
</Products>
Shower Cap:400
Soap:600
Shampoo:700
We can pass a predicate, either a lambda or a functor, to AsCollection or GetChildren to get only the elements that pass the predicate test.
void AsCollectionLambda()
{
using namespace Elmax;
RootElement root("Products");
Element elem1 = root.Create("Product");
elem1.SetAttrInt32("Qty", 400);
elem1.SetString("Shower Cap");
Element elem2 = root.Create("Product");
elem2.SetAttrInt32("Qty", 600);
elem2.SetString("Soap");
Element elem3 = root.Create("Product");
elem3.SetAttrInt32("Qty", 700);
elem3.SetString("Shampoo");
std::string xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
Element::collection_t vec = root["Product"].AsCollection([](Element elem){
return (elem.GetAttrInt32("Qty", 0)>500);
});
for(size_t i=0;i<vec.size(); ++i)
{
std::cout << vec[i].GetString("") << ":" << vec[i].GetAttrInt32("Qty", 0) << std::endl;
}
}
In the output, only those products with quantity more than 500 are displayed.
<Products>
<Product Qty="400">Shower Cap</Product>
<Product Qty="600">Soap</Product>
<Product Qty="700">Shampoo</Product>
</Products>
Soap:600
Shampoo:700
AsCollection and GetChildren are similar in usage, so a separate GetChildren code example is omitted.
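To illustrate the difference between the two retrieval styles, and how an optional predicate narrows the result, here is a toy model built on the standard library alone. Node, Children and SameNameSiblings are stand-ins invented for this sketch; they only mimic the behaviour described for GetChildren and AsCollection and say nothing about how Elmax implements them.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an XML element; NOT the Elmax Element class.
struct Node
{
    std::string name;
    int qty;
    std::vector<Node> children;
};

typedef std::function<bool(const Node&)> Predicate;

// All children of parent, optionally filtered by a predicate
// (models the idea behind GetChildren).
std::vector<Node> Children(const Node& parent, const Predicate& pred = Predicate())
{
    std::vector<Node> out;
    for (const Node& child : parent.children)
        if (!pred || pred(child))
            out.push_back(child);
    return out;
}

// Children of parent that carry the given name, optionally filtered
// (models the idea behind AsCollection: same level, same name).
std::vector<Node> SameNameSiblings(const Node& parent, const std::string& name,
                                   const Predicate& pred = Predicate())
{
    return Children(parent, [&](const Node& n) {
        return n.name == name && (!pred || pred(n));
    });
}
```

Given a Products node holding three Product children and one unrelated child, Children returns all four, SameNameSiblings with the name Product returns three, and adding a predicate such as qty > 500 keeps only the matching products.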
Instead of getting back a vector, we can use Element::Iterator to iterate over the collection.
void Iterators()
{
using namespace Elmax;
RootElement root(_TS("Products"));
Element elem1 = root.Create("Product");
elem1.SetAttrInt32("Qty", 400);
elem1.SetString("Shower Cap");
Element elem2 = root.Create("Product");
elem2.SetAttrInt32("Qty", 600);
elem2.SetString("Soap");
Element elem3 = root.Create("Product");
elem3.SetAttrInt32("Qty", 700);
elem3.SetString("Shampoo");
std::string xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
Element::Iterator it = root.Begin("*");
for(;it!=root.End(); ++it)
{
std::cout << (*it).GetString("") <<
":" << (*it).GetAttrInt32("Qty", 0) << std::endl;
}
}
By passing the "*" wildcard to Begin, I am telling the element to return all its child elements, regardless of their names. The output is shown below:
<Products>
<Product Qty="400">Shower Cap</Product>
<Product Qty="600">Soap</Product>
<Product Qty="700">Shampoo</Product>
</Products>
Shower Cap:400
Soap:600
Shampoo:700
With C++ LINQ by Marten Range, we can now use LINQ to fill our data structures with data gleaned from XML. In the code example below, we create a group of book and author elements.
void CppLinq()
{
using namespace Elmax;
RootElement root("Bookstore");
Element Books = root.Create("Books");
Element Book1 = Books.Create("Book");
Book1.SetAttrInt32("AuthorID", 1255);
Book1["Title"].SetString("The Joy Luck Club");
Element Book2 = Books.Create("Book");
Book2.SetAttrInt32("AuthorID", 2562);
Book2["Title"].SetString("The First Phone Call from Heaven");
Element Book3 = Books.Create("Book");
Book3.SetAttrInt32("AuthorID", 3651);
Book3["Title"].SetString("David and Goliath");
Element Authors = root.Create("Authors");
Element Author1 = Authors.Create("Author");
Author1.SetAttrInt32("AuthorID", 1255);
Author1["Name"].SetString("Amy Tan");
Author1["Gender"].SetString("Female");
Element Author2 = Authors.Create("Author");
Author2.SetAttrInt32("AuthorID", 2562);
Author2["Name"].SetString("Mitch Albom");
Author2["Gender"].SetString("Male");
Element Author3 = Authors.Create("Author");
Author3.SetAttrInt32("AuthorID", 3651);
Author3["Name"].SetString("Malcolm Gladwell");
Author3["Gender"].SetString("Male");
std::string xml = root.ToPrettyString(" ");
std::cout << xml << std::endl;
The XML produced by Elmax is listed below:
<Bookstore>
<Books>
<Book AuthorID="1255">
<Title>The Joy Luck Club</Title>
</Book>
<Book AuthorID="2562">
<Title>The First Phone Call from Heaven</Title>
</Book>
<Book AuthorID="3651">
<Title>David and Goliath</Title>
</Book>
</Books>
<Authors>
<Author AuthorID="1255">
<Name>Amy Tan</Name>
<Gender>Female</Gender>
</Author>
<Author AuthorID="2562">
<Name>Mitch Albom</Name>
<Gender>Male</Gender>
</Author>
<Author AuthorID="3651">
<Name>Malcolm Gladwell</Name>
<Gender>Male</Gender>
</Author>
</Authors>
</Bookstore>
Using C++ LINQ as shown below, the book and author elements are joined on the common AuthorID attribute. The title and author name are returned in a vector of BookInfo structures, while the gender information is discarded.
using namespace cpplinq;
struct BookInfo
{
std::string title;
std::string author;
};
auto result =
from (root["Books"].GetChildren("Book"))
>> join (
from (root["Authors"].GetChildren("Author")),
[](const Element& b) {return b.GetAttrInt32("AuthorID", -1);},
[](const Element& a) {return a.GetAttrInt32("AuthorID", -1);},
[](const Element& b, const Element& a) -> BookInfo
{ BookInfo info = {b["Title"].GetString(""),
a["Name"].GetString("")}; return info;}
)
>> to_vector();
for(size_t i=0;i<result.size(); ++i)
{
std::cout << result[i].title << " is written by " << result[i].author << std::endl;
}
}
This is the list of BookInfo entries displayed:
The Joy Luck Club is written by Amy Tan
The First Phone Call from Heaven is written by Mitch Albom
David and Goliath is written by Malcolm Gladwell
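For readers who do not use cpplinq, the same inner join can be approximated with standard containers. The sketch below works on plain structs rather than Elmax elements. The Book and Author types and the function name are assumptions made for illustration. It keeps only the first author per ID, which differs from a general join when IDs repeat.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct Book     { int author_id; std::string title; };
struct Author   { int author_id; std::string name; std::string gender; };
struct BookInfo { std::string title; std::string author; };

// Inner join of books and authors on author_id using a hash lookup.
std::vector<BookInfo> JoinBooksWithAuthors(const std::vector<Book>& books,
                                           const std::vector<Author>& authors)
{
    // Index the authors by ID; emplace keeps the first entry on duplicates.
    std::unordered_map<int, const Author*> by_id;
    for (const Author& a : authors)
        by_id.emplace(a.author_id, &a);

    std::vector<BookInfo> result;
    for (const Book& b : books)
    {
        const auto it = by_id.find(b.author_id);
        if (it != by_id.end())   // books without a matching author are dropped
            result.push_back(BookInfo{b.title, it->second->name});
    }
    return result;
}
```

The gender field is simply never copied into BookInfo, mirroring how the LINQ projection discards it.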
There are some macros in config.h that enable optional behaviour in Portable Elmax. This section sheds light on what those macros enable. For example, the macro below should be uncommented if you want to use wide characters for strings.
ELMAX_USE_MFC_CSTRING must be defined if you prefer to use MFC CString. Whether it is CStringA or CStringW depends on the presence of the ELMAX_USE_UNICODE macro. If ELMAX_USE_MFC_CSTRING is not defined, then STL string is used.
Below are the mutually exclusive macros that determine which container class is used for attributes. The available choices are map, unordered_map, list and vector.
#define ELMAX_USE_VECTOR_FOR_ATTRS
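How a compile-time switch like this might select a container can be sketched as follows. Only the ELMAX_USE_VECTOR_FOR_ATTRS name comes from the article. The AttrContainer typedef, the map fallback and the FindAttr helper are assumptions for illustration and may not match what config.h or the library does internally.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Macro name taken from the article; its effect as modelled here is an assumption.
#define ELMAX_USE_VECTOR_FOR_ATTRS

#if defined(ELMAX_USE_VECTOR_FOR_ATTRS)
// A vector of name/value pairs: compact, and fast enough for a few attributes.
typedef std::vector<std::pair<std::string, std::string> > AttrContainer;
#else
// Hypothetical fallback when the macro is not defined.
typedef std::map<std::string, std::string> AttrContainer;
#endif

// Linear lookup that compiles against either container choice, because both
// iterate as pairs with .first (name) and .second (value).
// Returns a pointer to the value, or nullptr when the name is absent.
const std::string* FindAttr(const AttrContainer& attrs, const std::string& name)
{
    for (const auto& kv : attrs)
        if (kv.first == name)
            return &kv.second;
    return nullptr;
}
```

A switch of this kind changes the type of every object that stores attributes, which is one reason to rebuild and rerun the unit tests after changing a macro.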
In this article, we briefly looked at how to write and read elements, attributes and other XML constructs. There are 105 unit tests; whenever you uncomment any of the predefined macros, remember to build and run them. The project is hosted on GitHub, so users should always download the latest source code from there. Portable Elmax will not be hosted on NuGet because of the many possible configurations, for example STL string or MFC CString, ASCII or Unicode, and so on. If you find any bugs, send me a copy of your config.h to help me narrow down the problem. If you have any feature requests, please let me know in the article forum. Thank you for reading!
- 2024-04-27: Version 0.9.9 fixed RawElement::ReadAttributeValue with PJ Arends' solution to accept single-quote-enclosed attributes, and fixed the unit test compiler and linker errors.
- 2022-06-28: Version 0.9.8 removed Boost lexical_cast.
- 2020-08-11: Version 0.9.7 fixed RawElement::PrettyTraverse, as reported by PJ Arends, by checking that the start element is written before writing the closing >.
- 2020-05-03: Version 0.9.6 with missing function implementations and bug fixes by PJ Arends.
- 2015-06-14: Version 0.9.5 Beta. Migrated to GitHub.
- 2013-11-26: Initial release.
- No implicit type conversion: Implicit type accessors and mutators have been removed. Accessors and mutators have to be called explicitly.
- Use RootElement: Use RootElement for your root element to gain RAII destruction. RootElement is derived from Element.
- Element is simplified: Element has dropped all other data members to become a lightweight wrapper; its only data member is the RawElement pointer.
- Attribute class is removed: The user cannot call the Attr method to get the Attribute. Use the attribute data accessors and mutators on the Element class instead.
- [] does not support query: The user cannot retrieve an element by querying, for example elem["Products|Books"]; use elem["Products"]["Books"] instead.
- [] operator is const: Since the [] operator does not modify data members, it now respects const correctness (in cpplinq).
- Create and CreateNew behaviour changed: Create and CreateNew used to create the element itself if the node did not exist. Now Create creates a new child element and a name must be supplied. CreateNew no longer exists.
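The RootElement and lightweight-wrapper points in the list above follow a common ownership pattern, sketched here with toy classes. RawNode, Handle and RootHandle are invented names. They only model the idea of a pointer-sized wrapper whose derived root type calls Destroy in its destructor, and they are not the Elmax classes. Making the root non-copyable is a choice of this sketch, not a documented property of RootElement.

```cpp
#include <string>

// Toy stand-ins; NOT the real Elmax classes.
struct RawNode { std::string name; };

int g_live_nodes = 0;   // counts live allocations so cleanup is observable

// Lightweight wrapper with a single pointer member; it does not own the node.
class Handle
{
public:
    explicit Handle(RawNode* raw = nullptr) : raw_(raw) {}
    bool IsValid() const { return raw_ != nullptr; }
    // Explicit cleanup, needed for handles that no root object owns.
    void Destroy()
    {
        if (raw_)
        {
            delete raw_;
            raw_ = nullptr;
            --g_live_nodes;
        }
    }
protected:
    RawNode* raw_;
};

// Root wrapper: allocates on construction and calls Destroy in its
// destructor, so the tree is released when the object leaves scope (RAII).
class RootHandle : public Handle
{
public:
    explicit RootHandle(const std::string& name) : Handle(new RawNode{name})
    {
        ++g_live_nodes;
    }
    ~RootHandle() { Destroy(); }
    RootHandle(const RootHandle&) = delete;             // avoid double delete
    RootHandle& operator=(const RootHandle&) = delete;
};
```

In this model a RootHandle declared in a function cleans up automatically at the closing brace, whereas a plain Handle holding a detached node must have Destroy called on it by hand.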