Introduction
Everyone needs to parse XML nowadays. I found it hard to find good example source code in C++: most of what I found was written in an old-fashioned style without templates, or was aimed at C# or Visual Basic. Hence, this article provides an example.
Parsing is done using MSXML, and I use ATL "smart pointers" to avoid the need to manually release everything. Note that MSXML is Unicode, through and through. It's a big waste of effort trying to use it with multi-byte/ASCII.
The accompanying source code has project files for eMbedded Visual C++ (.vcw .vcp), Visual C++ .NET (.sln .vcproj) and Borland C++Builder5 (.bpr .bpf). There is no project for Visual C++6, since that didn't ship with recent-enough MSXML headers.
PocketPC considerations: I use XML to store my configuration files. They have grown to about 80k each, and on the PocketPC, it takes 2 seconds to parse each one. Therefore, I actually parse each file into a more efficient memory-block structure, and write this memory block to disk. That way, I only need to re-parse when the XML file has changed.
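The "re-parse only when something changed" check can be as simple as comparing file timestamps. Here is a minimal sketch of that idea, not the code from the accompanying project. needsReparse and writeCache are hypothetical names, and the memory block is treated as an opaque byte vector. The sketch assumes a C++17 compiler with std::filesystem, which the toolchains listed above predate. On those you would make the same comparison with the platform's own file-time functions.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: write the already-parsed memory block to the cache file.
bool writeCache(const fs::path& cachePath, const std::vector<char>& block)
{
    std::ofstream out(cachePath, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(block.data(), static_cast<std::streamsize>(block.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: true when the cache is missing or older than the XML.
bool needsReparse(const fs::path& xmlPath, const fs::path& cachePath)
{
    std::error_code ec;
    if (!fs::exists(cachePath, ec)) return true;        // no cache yet
    const auto xmlTime = fs::last_write_time(xmlPath, ec);
    if (ec) return true;                                // can't stat the XML: be conservative
    const auto cacheTime = fs::last_write_time(cachePath, ec);
    if (ec) return true;
    return xmlTime > cacheTime;                         // XML edited after the cache was written
}
```

At startup you would call needsReparse(xmlPath, cachePath). If it returns false, read the cached block straight from disk. Otherwise parse the XML and call writeCache with the new block.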
Preliminaries
Setup depends on which development environment you're using:
- Visual Studio .NET -- fine as it is
- Borland C++ Builder -- under Project > Options > Directories, add $(BCB)\include\atl
- eMbedded Visual C++ (EVC) -- download the free STL port made by Giuseppe Govi, and put it in a subdirectory "stl_eVC" of your project
#include <windows.h>
#include <msxml.h>
#include <objsafe.h>
#include <objbase.h>
#include <atlbase.h>
#pragma warning( push )
#pragma warning( disable: 4018 4786)
#include <string>
#pragma warning( pop )
using namespace std;
(The warning-disabler is just for EVC, which generates spurious warnings otherwise.)
Also, call CoInitializeEx(NULL,COINIT_MULTITHREADED); beforehand (normally at the start of WinMain), and CoUninitialize(); afterwards (normally at the end of WinMain).
Actually, CoInitialize(NULL) is easier when compiling for desktop Win32, since it works on Windows 95 and hence doesn't require you to define _WIN32_WINNT. But it's not available on PocketPC.
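Pairing the two calls by hand gets fragile once WinMain has early returns. One option is a small guard object that runs the initializer in its constructor and the cleanup in its destructor, and skips the cleanup if initialization failed. The sketch below is my own illustration rather than anything from ATL or the accompanying source. ScopedInit is a hypothetical name, and it is written generically so that it compiles anywhere. The COM calls appear only in the comment.

```cpp
#include <functional>
#include <utility>

// Hypothetical guard: calls init() once on construction; calls cleanup() on
// destruction only if init() reported success.
class ScopedInit {
public:
    ScopedInit(const std::function<bool()>& init, std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)), ok_(init()) {}
    ~ScopedInit() { if (ok_) cleanup_(); }

    // Non-copyable, so cleanup can never run twice.
    ScopedInit(const ScopedInit&) = delete;
    ScopedInit& operator=(const ScopedInit&) = delete;

    bool ok() const { return ok_; }

private:
    std::function<void()> cleanup_;
    bool ok_;
};

// On Windows, at the top of WinMain, this might be used as:
//   ScopedInit com(
//       [] { return SUCCEEDED(CoInitializeEx(NULL, COINIT_MULTITHREADED)); },
//       [] { CoUninitialize(); });
//   if (!com.ok()) return 1;
```

Declare the guard before any CComPtr variables in the same scope. Locals are destroyed in reverse order, so the smart pointers release their interfaces before CoUninitialize runs.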
XML Parsing
This is how to load the XML document. It uses the magic of ATL's safe pointers, to avoid the need to Release() everything afterwards. (For simplicity, error-checking has been omitted.)
// Create the DOM document; CComPtr releases it automatically.
CComPtr<IXMLDOMDocument> iXMLDoc;
iXMLDoc.CoCreateInstance(__uuidof(DOMDocument));
#ifdef UNDER_CE
// On Windows CE, load synchronously and clear the safety options.
iXMLDoc->put_async(VARIANT_FALSE);
CComQIPtr<IObjectSafety,&IID_IObjectSafety> isafe(iXMLDoc);
if (isafe)
{ DWORD dwSupported, dwEnabled;
  isafe->GetInterfaceSafetyOptions(IID_IXMLDOMDocument,
    &dwSupported,&dwEnabled);
  isafe->SetInterfaceSafetyOptions(IID_IXMLDOMDocument,
    dwSupported,0);
}
#endif
// url is assumed to hold the path or URL of the XML file.
VARIANT_BOOL bSuccess=VARIANT_FALSE;
iXMLDoc->load(CComVariant(url),&bSuccess);
CComPtr<IXMLDOMElement> iRootElm;
iXMLDoc->get_documentElement(&iRootElm);
As for accessing the elements and iterating over them, I wrote a tiny helper class TElem. Here's the example XML document that I'll demonstrate it with:
<?xml version="1.0" encoding="utf-16"?>
<root desc="Simple Prog">
<text>Hello World</text>
<layouts>
<lay pos="15" bold="true"/>
<layoff pos="12"/>
<layin pos="17"/>
</layouts>
</root>
And this is how to use TElem:
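TElem eroot(iRootElm);
wstring desc = eroot.attr(L"desc");       // attribute of <root>
TElem etext = eroot.subnode(L"text");     // first child named <text>
wstring s = etext.val();                  // text content of <text>
s = eroot.subval(L"text");                // the same thing in one step
TElem elays = eroot.subnode(L"layouts");
for (TElem e=elays.begin(); e!=elays.end(); e++)   // each child of <layouts>
{ int pos = e.attrInt(L"pos",-1);         // -1 if missing or not a number
  bool bold = e.attrBool(L"bold",false);  // false if missing or unrecognised
  wstring id = e.name();                  // tag name of the child
}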
Again, there's no need to release TElem - that's done automatically. The full list of methods in TElem:
wstring TElem::name() const;
wstring TElem::val() const;
wstring TElem::attr(const wstring name) const;
bool TElem::attrBool(const wstring name, bool def) const;
int TElem::attrInt(const wstring name, int def) const;
TElem TElem::subnode(const wstring name) const;
wstring TElem::subval(const wstring name) const;
for (TElem c=e.begin(); c!=e.end(); c++) {...}
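The two accessors that take a default, attrInt and attrBool, boil down to plain string conversions. The sketch below restates them as free functions so the behaviour can be tried without MSXML. parseInt and parseBool are illustrative names of my own and are not part of TElem.

```cpp
#include <cwchar>
#include <string>

// Same rule as attrBool: only these four spellings are recognised;
// anything else (including an empty string) yields the default.
bool parseBool(const std::wstring& a, bool def)
{
    if (a == L"true" || a == L"TRUE") return true;
    if (a == L"false" || a == L"FALSE") return false;
    return def;
}

// Same rule as attrInt: %i accepts decimal, 0x-prefixed hexadecimal and
// 0-prefixed octal; if nothing could be converted, return the default.
int parseInt(const std::wstring& a, int def)
{
    int i = 0;
    return std::swscanf(a.c_str(), L"%i", &i) == 1 ? i : def;
}
```

Two consequences are worth knowing. Because of %i, a value written with a leading zero such as "010" is read as octal 8. And a mixed-case value such as "True" is not recognised, so it falls back to the default.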
Source Code for TElem
Note in this source code the use of CComPtr and CComQIPtr and CComBSTR. These are lovely "safe-pointers" provided by the ATL, and mean that we needn't bother with Release().
I'm a bit of a miser, and so included iterator functionality in TElem, rather than writing a separate TElemIterator class.
struct TElem
{ CComPtr<IXMLDOMElement> elem;
CComPtr<IXMLDOMNodeList> nlist; int pos; long clen;
TElem() :
elem(0), nlist(0), pos(-1), clen(0) {}
TElem(int _clen) :
elem(0),nlist(0),pos(-1),clen(_clen) {}
TElem(CComPtr<IXMLDOMElement> _elem) :
elem(_elem), nlist(0), pos(-1), clen(0) {get();}
TElem(CComPtr<IXMLDOMNodeList> _nlist) :
elem(0), nlist(_nlist), pos(0), clen(0) {get();}
void get()
{ if (pos!=-1)
{ elem=0;
CComPtr<IXMLDOMNode> inode;
nlist->get_item(pos,&inode);
if (inode==0) return;
DOMNodeType type; inode->get_nodeType(&type);
if (type!=NODE_ELEMENT) return;
CComQIPtr<IXMLDOMElement> e(inode);
elem=e;
}
clen=0; if (elem!=0)
{ CComPtr<IXMLDOMNodeList> iNodeList;
elem->get_childNodes(&iNodeList);
iNodeList->get_length(&clen);
}
}
wstring name() const
{ if (!elem) return L"";
CComBSTR bn; elem->get_tagName(&bn);
return wstring(bn);
}
wstring attr(const wstring name) const
{ if (!elem) return L"";
CComBSTR bname(name.c_str());
CComVariant val(VT_EMPTY);
elem->getAttribute(bname,&val);
if (val.vt==VT_BSTR) return val.bstrVal;
return L"";
}
bool attrBool(const wstring name,bool def) const
{ wstring a = attr(name);
if (a==L"true" || a==L"TRUE") return true;
else if (a==L"false" || a==L"FALSE") return false;
else return def;
}
int attrInt(const wstring name, int def) const
{ wstring a = attr(name);
int i, res=swscanf(a.c_str(),L"%i",&i);
if (res==1) return i; else return def;
}
wstring val() const
{ if (!elem) return L"";
CComVariant val(VT_EMPTY);
elem->get_nodeTypedValue(&val);
if (val.vt==VT_BSTR) return val.bstrVal;
return L"";
}
TElem subnode(const wstring name) const
{ if (!elem) return TElem();
for (TElem c=begin(); c!=end(); c++)
{ if (c.name()==name) return c;
}
return TElem();
}
wstring subval(const wstring name) const
{ if (!elem) return L"";
TElem c=subnode(name);
return c.val();
}
TElem begin() const
{ if (!elem) return TElem();
CComPtr<IXMLDOMNodeList> iNodeList;
elem->get_childNodes(&iNodeList);
return TElem(iNodeList);
}
TElem end() const
{ return TElem(clen);
}
TElem operator++(int)
{ if (pos!=-1) {pos++; get();}
return *this;
}
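bool operator!=(const TElem &e) const
{ // pos==-1 means "not iterating" (e.g. begin() on an empty TElem);
  // treat that as finished, otherwise such a loop would never end.
  return pos!=-1 && pos!=e.clen;
}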
};
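The trick of letting one type serve as element handle, iterator and end marker is easier to see without the COM plumbing. Below is a minimal sketch of the same pattern over a plain in-memory tree. Node and Cursor are hypothetical stand-ins for the MSXML DOM and TElem. The sketch uses pre-increment rather than TElem's e++, and it includes a guard so that looping over an empty handle terminates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory tree standing in for the MSXML DOM.
struct Node {
    std::wstring name;
    std::vector<Node> children;
};

// Hypothetical stand-in for TElem.
struct Cursor {
    const Node* node = nullptr;    // current element, or nullptr
    const Node* parent = nullptr;  // node whose children are being walked
    long pos = -1;                 // index into parent->children; -1 = not iterating
    long clen = 0;                 // child count of node; for a sentinel, the stop index

    Cursor() {}
    explicit Cursor(long stopAt) : clen(stopAt) {}            // end sentinel
    explicit Cursor(const Node* n) : node(n) { refresh(); }   // plain handle
    Cursor(const Node* p, long start) : parent(p), pos(start) { refresh(); }

    // Re-read the current element (when iterating) and its child count.
    void refresh() {
        if (pos != -1) {
            const long n = static_cast<long>(parent->children.size());
            node = (pos < n) ? &parent->children[static_cast<std::size_t>(pos)] : nullptr;
        }
        clen = node ? static_cast<long>(node->children.size()) : 0;
    }

    Cursor begin() const { return node ? Cursor(node, 0) : Cursor(); }
    Cursor end() const { return Cursor(clen); }
    Cursor& operator++() { if (pos != -1) { ++pos; refresh(); } return *this; }

    // Compares this cursor's position with the sentinel's stop index.
    // A cursor that is not iterating (pos == -1) counts as finished.
    bool operator!=(const Cursor& e) const { return pos != -1 && pos != e.clen; }
};
```

The saving is one class instead of two. The cost is that operator!= only makes sense in the form it != x.end(), since it compares a position on the left with a length on the right.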
Lucian studied theoretical computer science in Cambridge and Bologna, and then moved into the computer industry. Since 2004 he's been paid to do what he loves -- designing and implementing programming languages! The articles he writes on CodeProject are entirely his own personal hobby work, and do not represent the position or guidance of the company he works for. (He's on the VB/C# language team at Microsoft).