Introduction

Everyone needs to parse XML nowadays. I found it hard to find good example source code in C++: most of the code I came across was written in an old-fashioned style without templates, or was aimed at C# or Visual Basic. Hence, this article provides an example. Parsing is done using MSXML, and I use ATL "smart pointers" to avoid the need to manually release everything.

Note that MSXML is Unicode through and through. It's a big waste of effort trying to use it with multi-byte/ASCII.

The accompanying source code has project files for embedded Visual C++ (.vcw .vcp), Visual C++ .NET (.sln .vcproj) and Borland C++Builder5 (.bpr .bpf), but not for Visual C++6, since that didn't ship with recent-enough MSXML headers.

PocketPC considerations: I use XML to store my configuration files. They have grown to about 80k each, and on the PocketPC it takes 2 seconds to parse one of them. Therefore, I actually parse each file into a more efficient memory-block structure and write this memory block to disk. That way, I only need to re-parse if there have been any changes.

Preliminaries

Setup depends on which development environment you're using:
    #include <windows.h>
    #include <msxml.h>
    #include <objsafe.h>
    #include <objbase.h>
    #include <atlbase.h>
    #pragma warning( push )
    #pragma warning( disable: 4018 4786)
    #include <string>
    #pragma warning( pop )
    using namespace std;

(The warning-disabler is just for EVC, which generates spurious warnings otherwise.) Also, COM must be initialized (for example with CoInitialize) before any of the code below runs.

Actually, XML Parsing

This is how to load the XML document. It uses the magic of ATL's smart pointers to avoid the need to call Release().

    CComPtr<IXMLDOMDocument> iXMLDoc;
    iXMLDoc.CoCreateInstance(__uuidof(DOMDocument));
    #ifdef UNDER_CE
    // Following is a bugfix for PocketPC.
    iXMLDoc->put_async(VARIANT_FALSE);
    CComQIPtr<IObjectSafety,&IID_IObjectSafety> isafe(iXMLDoc);
    if (isafe)
    { DWORD dwSupported, dwEnabled;
      isafe->GetInterfaceSafetyOptions(IID_IXMLDOMDocument,
                                       &dwSupported,&dwEnabled);
      isafe->SetInterfaceSafetyOptions(IID_IXMLDOMDocument,
                                       dwSupported,0);
    }
    #endif
    // Load the file.
    VARIANT_BOOL bSuccess=VARIANT_FALSE;
    // Can load it from a url/filename...
    iXMLDoc->load(CComVariant(url),&bSuccess);
    // or from a BSTR...
    //iXMLDoc->loadXML(CComBSTR(s),&bSuccess);

    // Get a pointer to the root
    CComPtr<IXMLDOMElement> iRootElm;
    iXMLDoc->get_documentElement(&iRootElm);

    // Thanks to the magic of CComPtr, we never need call
    // Release() -- that gets done automatically.

As for accessing the elements and iterating over them, I wrote a tiny helper class, TElem. Take this sample document:

<?xml version="1.0" encoding="utf-16"?>
<root desc="Simple Prog">
<text>Hello World</text>
<layouts>
<lay pos="15" bold="true"/>
<layoff pos="12"/>
<layin pos="17"/>
</layouts>
</root>
And this is how to use TElem:

    TElem eroot(iRootElm);
    wstring desc = eroot.attr(L"desc");
    // returns "Simple Prog"

    TElem etext = eroot.subnode(L"text");
    wstring s = etext.val();
    // returns "Hello World"
    s = eroot.subval(L"text");
    // This is a shorter way to achieve the same thing

    TElem elays = eroot.subnode(L"layouts");
    for (TElem e=elays.begin(); e!=elays.end(); e++)
    { int pos = e.attrInt(L"pos",-1);
      bool bold = e.attrBool(L"bold",false);
      // we suggest defaults, in case the attribute is missing
      wstring id = e.name();
      // returns "lay" or "layoff" or "layin"
    }

Again, there's no need to release anything.

    // TElem -- a simple class to wrap up IXMLDOMElement
    // and to iterate its children.
    wstring TElem::name() const;
    //  in <item>stuff</item> it returns "item"
    wstring TElem::val() const;
    //  in <item>stuff</item> it returns "stuff"
    wstring TElem::attr(const wstring name) const;
    //  in <item name="hello">stuff</item> it returns "hello"
    //  int x=e.attrInt(L"a",2)
    //  bool b=e.attrBool(L"a",true),
    //  We supply defaults in case the attribute was absent.
    TElem TElem::subnode(const wstring name) const;
    //  in <item><a>hello</a><name>there</name></item>
    //  it returns the TElem <name>there</name>
    wstring TElem::subval(const wstring name) const;
    //  in <item><a>hello</a><name>there</name></item>
    //  it returns "there"
    for (TElem c=e.begin(); c!=e.end(); c++) {...}
    //  iterates over the subnodes

Source code for TElem

Note in this source code the use of the ATL wrappers CComPtr, CComQIPtr, CComBSTR and CComVariant, which clean up after themselves. I'm a bit of a miser, and so included iterator functionality in TElem itself rather than writing a separate iterator class.

    struct TElem
    {
      CComPtr<IXMLDOMElement> elem;    // the wrapped element (may be null)
      CComPtr<IXMLDOMNodeList> nlist;  // sibling list, when used as an iterator
      int pos;                         // index into nlist, or -1 if not iterating
      long clen;                       // number of child nodes of elem

      TElem() : elem(0), nlist(0), pos(-1), clen(0) {}
      TElem(int _clen) : elem(0), nlist(0), pos(-1), clen(_clen) {}
      TElem(CComPtr<IXMLDOMElement> _elem)
        : elem(_elem), nlist(0), pos(-1), clen(0) {get();}
      TElem(CComPtr<IXMLDOMNodeList> _nlist)
        : elem(0), nlist(_nlist), pos(0), clen(0) {get();}

      // Fetches the element at position pos (when iterating)
      // and counts its children.
      void get()
      { if (pos!=-1)
        { elem=0;
          CComPtr<IXMLDOMNode> inode;
          nlist->get_item(pos,&inode);
          if (inode==0) return;
          DOMNodeType type;
          inode->get_nodeType(&type);
          if (type!=NODE_ELEMENT) return;  // text, comments etc. are skipped
          CComQIPtr<IXMLDOMElement> e(inode);
          elem=e;
        }
        clen=0;
        if (elem!=0)
        { CComPtr<IXMLDOMNodeList> iNodeList;
          elem->get_childNodes(&iNodeList);
          iNodeList->get_length(&clen);
        }
      }

      wstring name() const
      { if (!elem) return L"";
        CComBSTR bn;
        elem->get_tagName(&bn);
        return wstring(bn);
      }

      wstring attr(const wstring name) const
      { if (!elem) return L"";
        CComBSTR bname(name.c_str());
        CComVariant val(VT_EMPTY);
        elem->getAttribute(bname,&val);
        if (val.vt==VT_BSTR) return val.bstrVal;
        return L"";
      }

      bool attrBool(const wstring name,bool def) const
      { wstring a = attr(name);
        if (a==L"true" || a==L"TRUE") return true;
        else if (a==L"false" || a==L"FALSE") return false;
        else return def;
      }

      int attrInt(const wstring name, int def) const
      { wstring a = attr(name);
        int i, res=swscanf(a.c_str(),L"%i",&i);
        if (res==1) return i; else return def;
      }

      wstring val() const
      { if (!elem) return L"";
        CComVariant val(VT_EMPTY);
        elem->get_nodeTypedValue(&val);
        if (val.vt==VT_BSTR) return val.bstrVal;
        return L"";
      }

      TElem subnode(const wstring name) const
      { if (!elem) return TElem();
        for (TElem c=begin(); c!=end(); c++)
        { if (c.name()==name) return c;
        }
        return TElem();
      }

      wstring subval(const wstring name) const
      { if (!elem) return L"";
        TElem c=subnode(name);
        return c.val();
      }

      TElem begin() const
      { if (!elem) return TElem();
        CComPtr<IXMLDOMNodeList> iNodeList;
        elem->get_childNodes(&iNodeList);
        return TElem(iNodeList);
      }

      TElem end() const
      { return TElem(clen);
      }

      // Note: this advances in place and returns the new position.
      TElem operator++(int)
      { if (pos!=-1) {pos++; get();}
        return *this;
      }

      // pos==-1 means begin() was called on an empty TElem; treating that
      // as "finished" stops a loop over a missing node from running forever.
      bool operator!=(const TElem &e) const
      { return pos!=-1 && pos!=e.clen;
      }
    };
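
The PocketPC note in the introduction describes caching the parsed configuration as a memory block on disk and re-parsing only when the XML has changed, but the article does not show that code. The following is only a rough sketch of the idea in portable C++11, not the author's actual structure. Every name in it (Settings, packSettings, unpackSettings, saveBlock, loadBlock, sourceStamp) is hypothetical, and it assumes the configuration can be flattened into name/value strings. The stamp is assumed to be something cheap to obtain from the XML file, such as its size or modification time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened configuration: name/value pairs taken from the XML.
typedef std::map<std::wstring, std::wstring> Settings;

// Append a 32-bit value to the block in native byte order.
static void putU32(std::vector<char>& b, std::uint32_t v) {
    const char* p = reinterpret_cast<const char*>(&v);
    b.insert(b.end(), p, p + sizeof v);
}

// Append a length-prefixed wide string to the block.
static void putStr(std::vector<char>& b, const std::wstring& s) {
    putU32(b, static_cast<std::uint32_t>(s.size()));
    const char* p = reinterpret_cast<const char*>(s.data());
    b.insert(b.end(), p, p + s.size() * sizeof(wchar_t));
}

// Read a 32-bit value at offset off; fails if the block is too short.
static bool getU32(const std::vector<char>& b, std::size_t& off, std::uint32_t& v) {
    assert(off <= b.size());
    if (b.size() - off < sizeof v) return false;
    std::memcpy(&v, &b[off], sizeof v);
    off += sizeof v;
    return true;
}

// Read a length-prefixed wide string at offset off.
static bool getStr(const std::vector<char>& b, std::size_t& off, std::wstring& s) {
    std::uint32_t n = 0;
    if (!getU32(b, off, n)) return false;
    if ((b.size() - off) / sizeof(wchar_t) < n) return false;
    s.resize(n);
    if (n) std::memcpy(&s[0], &b[off], n * sizeof(wchar_t));
    off += n * sizeof(wchar_t);
    return true;
}

// Build one contiguous block: stamp, entry count, then the string pairs.
std::vector<char> packSettings(const Settings& cfg, std::uint32_t sourceStamp) {
    std::vector<char> b;
    putU32(b, sourceStamp);
    putU32(b, static_cast<std::uint32_t>(cfg.size()));
    for (Settings::const_iterator it = cfg.begin(); it != cfg.end(); ++it) {
        putStr(b, it->first);
        putStr(b, it->second);
    }
    return b;
}

// Returns false if the block is stale (stamp mismatch) or damaged;
// the caller should then re-parse the XML and write a fresh block.
bool unpackSettings(const std::vector<char>& b, std::uint32_t sourceStamp, Settings& out) {
    std::size_t off = 0;
    std::uint32_t stamp = 0, count = 0;
    if (!getU32(b, off, stamp) || stamp != sourceStamp) return false;
    if (!getU32(b, off, count)) return false;
    Settings tmp;
    for (std::uint32_t i = 0; i < count; ++i) {
        std::wstring k, v;
        if (!getStr(b, off, k) || !getStr(b, off, v)) return false;
        tmp[k] = v;
    }
    out.swap(tmp);
    return true;
}

// Write the block to disk in one go.
bool saveBlock(const std::string& path, const std::vector<char>& b) {
    std::ofstream f(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!b.empty()) f.write(&b[0], static_cast<std::streamsize>(b.size()));
    f.flush();
    return f.good();
}

// Read the whole file back; a missing file yields an empty block.
std::vector<char> loadBlock(const std::string& path) {
    std::ifstream f(path.c_str(), std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(f)),
                             std::istreambuf_iterator<char>());
}
```

At start-up one would call loadBlock and unpackSettings with the current stamp of the XML file, and fall back to the MSXML parse (followed by packSettings and saveBlock) only when unpackSettings returns false. The block uses native byte order and the native wchar_t size, so it is only meant to be read back on the same kind of device that wrote it.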