
CXMLFile - A Simple C++ XML Parser

By darkoman, 19 Mar 2008
 

Introduction

This article is about a simple and fast C++ XML parser class. There is often a need for an effective XML parser that can load an XML document, validate it, and browse it. The .NET environment offers extensive native support for many types of XML documents, but the same native support is missing from plain C++, MFC, and similar frameworks. There is a COM alternative for parsing and handling XML files, but it takes some time to learn and to use correctly.

This article is a simple attempt to make a C++ developer's life a bit easier. It supports handling well-formed XML documents in the simplest possible way: load, validate, and browse. The following XML constructs are supported:

  • A simple TAG element, like <Element>
  • A simple ATTRIBUTE element, like Attribute="Value"
  • A simple TEXT element, like [Text]

Below is an example of a simple XML file that is supported:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
        <to>Tove</to>
        <from>Jani</from>

        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
    </note>

The presented XML classes can load this type of XML document, check whether it is well-formed, and browse its content. Only two classes provide this functionality.

The first class, CXMLFile, loads an XML file, validates its structure, and builds a collection of XML elements from its content. This collection represents the loaded XML file in system memory. It is then easy to modify the inner structure of this collection, that is, to modify the XML file itself. The class can load XML files either from the hard disk or from a memory stream; the latter is useful in special cases (e.g. on a web server). CXMLFile can also write the in-memory XML element collection back to a file on the hard disk.

The second class, CXMLElement, is used by CXMLFile, and the developer uses it when browsing or modifying the inner structure of an XML file in system memory, that is, the XML element collection. It provides basic support for appending elements to this collection and for browsing it, and it reports the name, type, or value of the current XML element.

Background

There are many CodeProject articles on this topic, and this is a small contribution to them. I hope readers and developers will find it useful in their everyday work.

Using the Code

Loading an XML document from the hard disk is easy, as the example below shows:

#include "XMLFile.h"

...

_TCHAR lpszXMLFilePath[] = _T("A path to the XML file here...");
CXMLFile xmlFile;
if (xmlFile.LoadFromFile(lpszXMLFilePath))
{
   // Success
}
else
{
   // Error
}

To load an XML document from a memory stream:

...

// lpData and dwDataSize are obtained elsewhere

CXMLFile xmlFile;
if (xmlFile.LoadFromStream(lpData, dwDataSize))
{
    // Success
}
else
{
    // Error
}
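The stream example leaves lpData and dwDataSize to be obtained elsewhere. As a minimal sketch, assuming the XML sits in a file (or has already been received into memory some other way), a buffer and its size can be produced with standard C++ alone. ReadAllBytes() is a hypothetical helper, not part of CXMLFile, and the article does not say whether LoadFromStream() expects a terminating zero or a particular encoding, so check the class before handing the buffer over.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into memory in binary mode.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> ReadAllBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    // Copy every byte from the stream buffer into the vector.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

The pointer and size would then be buffer.data() and buffer.size(), cast as needed to match the parameter types of LoadFromStream(), whose declaration is not shown in the article.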

To save the XML element collection to a file on the hard disk, do the following:

if (xmlFile.SaveToFile(lpszXMLFilePath))
{
    // Success
}
else
{
    // Error
}

Calling LoadFromFile(), a method of the CXMLFile class, validates and parses the XML file. If the file is well-formed, it is loaded into system memory as a collection of CXMLElement objects. This collection can be accessed through another CXMLFile method, GetRoot():

CXMLElement* pRoot = xmlFile.GetRoot();

With a pointer to the root element of the in-memory XML collection, several things can be done. The root element is of the CXMLElement class type, which provides the following methods:

// Returns the name of the current XML element
LPTSTR GetElementName();
// Returns the type of the current XML element
XML_ELEMENT_TYPE GetElementType();
// Returns the number of child elements of the current XML element
int GetChildNumber();
// Returns the first child element of the current XML element
CXMLElement* GetFirstChild();
// Returns the current child element of the current XML element
CXMLElement* GetCurrentChild();
// Returns the next child element of the current XML element
CXMLElement* GetNextChild();
// Returns the last child element of the current XML element
CXMLElement* GetLastChild();
// Sets the value of the current XML element (valid only for attribute elements)
void SetValue(LPTSTR lpszValue);
// Gets the value of the current XML element (valid only for attribute elements)
LPTSTR GetValue();
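The browsing methods suggest a cursor style of iteration: fetch the first child, then keep asking for the next one. The sketch below is a stand-in, not the article's CXMLElement: Element is a hypothetical class that borrows the same method names, and it assumes GetNextChild() returns a null pointer once the last child has been passed, which the article does not state explicitly. Dump() walks the tree recursively and lists one element per line, indented by depth.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum XML_ELEMENT_TYPE { XET_TAG, XET_ATTRIBUTE, XET_TEXT };

// Hypothetical stand-in for CXMLElement, reduced to the browsing methods.
// Each element keeps its own child cursor, as GetCurrentChild() suggests.
class Element
{
public:
    Element(std::string name, XML_ELEMENT_TYPE type)
        : name_(std::move(name)), type_(type) {}

    const std::string& GetElementName() const { return name_; }
    XML_ELEMENT_TYPE GetElementType() const { return type_; }
    int GetChildNumber() const { return static_cast<int>(children_.size()); }

    Element* GetFirstChild() { cursor_ = 0; return GetCurrentChild(); }
    Element* GetCurrentChild()
    {
        return cursor_ < children_.size() ? children_[cursor_].get() : nullptr;
    }
    // Assumption: returns nullptr once the last child has been passed.
    Element* GetNextChild()
    {
        if (cursor_ < children_.size())
            ++cursor_;
        return GetCurrentChild();
    }
    // Takes ownership of a heap-allocated child.
    void AppendChild(Element* child) { children_.emplace_back(child); }

private:
    std::string name_;
    XML_ELEMENT_TYPE type_;
    std::vector<std::unique_ptr<Element>> children_;
    std::size_t cursor_ = 0;
};

// Depth-first walk: one line per element, two spaces of indent per level.
std::string Dump(Element* e, int depth = 0)
{
    std::string out(depth * 2, ' ');
    out += e->GetElementName();
    out += '\n';
    for (Element* c = e->GetFirstChild(); c != nullptr; c = e->GetNextChild())
        out += Dump(c, depth + 1);
    return out;
}
```

Because the cursor lives inside the element, two loops over the same element's children cannot be nested without disturbing each other. If the real class stores its cursor the same way, that is worth keeping in mind; with it, the walk would start from xmlFile.GetRoot().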

Modify the inner structure of the XML element collection using the following methods:

// Creates a new XML element of the specified type
void Create(LPTSTR lpszElementName, XML_ELEMENT_TYPE type);
// Appends the new XML element to the end of the collection of the current XML element
void AppendChild(CXMLElement* lpXMLChild);

The first group of CXMLElement methods is used to browse the XML element collection; the second group is used to create new XML elements of different types and append them to existing ones.

The supported XML element types are:

XET_TAG // TAG element
XET_ATTRIBUTE // ATTRIBUTE element
XET_TEXT // TEXT element
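To show how the three element types map back to XML text, here is a sketch of a serializer over a tree in which a tag keeps its attributes and text as typed children, which is how the article describes its collection. Node and Write() are hypothetical names, not the CXMLFile implementation; the sketch assumes an attribute stores its name and value in separate fields and a text element stores its content in the value field. It writes empty tags as <x></x> rather than <x/>, because a reply in the discussion below says the parser does not accept the /> form, and it does not escape <, & or quotes.

```cpp
#include <string>
#include <vector>

enum XML_ELEMENT_TYPE { XET_TAG, XET_ATTRIBUTE, XET_TEXT };

// Hypothetical node: a tag keeps its attributes and its content as typed children.
struct Node
{
    std::string name;        // tag or attribute name (unused for text)
    std::string value;       // attribute value or text content (unused for tags)
    XML_ELEMENT_TYPE type;
    std::vector<Node> children;
};

// Serializes a XET_TAG node: attributes go inside the start tag, text goes
// between the tags, nested tags are indented two spaces per level.
// Special characters are NOT escaped in this sketch.
std::string Write(const Node& n, int depth = 0)
{
    const std::string indent(depth * 2, ' ');
    std::string out = indent + "<" + n.name;
    bool hasTagChild = false;
    for (const Node& c : n.children) {
        if (c.type == XET_ATTRIBUTE)
            out += " " + c.name + "=\"" + c.value + "\"";
        else if (c.type == XET_TAG)
            hasTagChild = true;
    }
    out += hasTagChild ? ">\n" : ">";
    for (const Node& c : n.children) {
        if (c.type == XET_TAG)
            out += Write(c, depth + 1);
        else if (c.type == XET_TEXT)
            out += hasTagChild ? indent + "  " + c.value + "\n" : c.value;
    }
    if (hasTagChild)
        out += indent;               // align the end tag with the start tag
    out += "</" + n.name + ">\n";
    return out;
}
```

Tags containing only text are written on one line (<to>Tove</to>), while tags with nested tags get their own indented block.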

Points of Interest

I always had trouble loading XML documents easily and manipulating them. Now I have useful classes that reduce my development time whenever this type of work is required. I can also now parse RSS feeds, which are used all over the Web, with little effort. I am planning to extend this basic support to HTML and to XML documents that are not so well-formed, soon (when I find some more free time).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

darkoman
Software Developer (Senior) Elektromehanika d.o.o. Nis
Serbia
Member
He holds a master's degree in Computer Science from the Faculty of Electronics in Nis (Serbia) and has worked as a C++/C# application developer for Windows platforms since 2001. He likes traveling, reading, and meeting new people and cultures.


Comments and Discussions

General: tinyXML (As_Sanya, 15 Dec '10, 21:41)
Maybe it would be better to look into tinyXML or a similar library instead of writing a new one?

With respect to darkoman, why reinvent the wheel?
Answer: Re: tinyXML (darkoman, 29 Dec '10, 20:04)
Hello,

this is just an example of how something can be done.
You might then ask: why all these articles on CodeProject when standard solutions exist?
It is not always about reinventing the wheel, but about helping yourself or others.

Or, let me put it this way: have you ever wondered how something works?
How to build your own scripting language or custom button control?
There are standard solutions for those, yet somehow you always decide to do part of the work yourself.

Why?
Because you learn by doing it, and you practice your skills as well.
 

Regards,
Darkoman
"Avaritia est radix omnium malorum..."

General: One type of tag is not handled... (bala666, 29 Nov '10, 23:18)
If the XML looks like:
<bala x=":)" />
then the parser fails. If I expand it to:
<bala x=":)">babaroga</bala>
then it succeeds. I tried to fix it, but I must say that your code is quite confusing;
using offsets instead of pointers is... ah, never mind.

It is great as drop-in code (although you could really handle attributes with separate
functions; this "child/type" story is a bit annoying), but from a coding perspective I would rather
use something else than try to fix your code. To be fair, I am giving you a 3.

Warm regards from Slovenia... ;)
Answer: Re: One type of tag is not handled... (darkoman, 12 Dec '10, 9:20)
Thanks,

and many thanks for the analysis and the suggestions for improvement.
I plan to release a new class (along with an accompanying article, of course) for working with HTML/XML documents, which I recently finished.
I hope it will be more useful to C++ programmers than this one, which (objectively) has a lot of shortcomings.


Regards,
Darkoman
"Avaritia est radix omnium malorum..."

General: My vote of 5 (RajaManikandan_R, 23 Sep '10, 18:15)
nice one
General: Using nested elements with the same names - unable to load file (GYuval, 6 Jul '10, 21:55)
Hi,
I can't load this xml file:

Big Test

Small Test


 
when I'm changing the inner to it works fine.
 
Does it support nested elements with the same name?
 
Thanks,
Yuval
General: A couple of problems (LEKV, 16 Feb '10, 5:17)
I ran into a couple problems during LoadFromFile in .NET 2003.
 
1. dwAttributeOffset was not initialized before use in ParseXMLElement(). I initialized it to zero at definition with no obvious problems.
 
2. Corrupted the stack. I haven't tracked this down yet. It was detected by the debug runtime at ParseXMLElement() exit.
 
In fairness, the XML file I tried was large and maybe not quite the model in mind during the code design. Nothing fancy in the XML. All simple tags (e.g., aaaaa) other than the first line.
 
Larry
Answer: Re: A couple of problems (darkoman, 16 Feb '10, 9:25)
Hello,
 
can you please submit the XML file you have tried to parse, or just a part of it?
Thank you.
 

Best regards,
Darkoman
"Avaritia est radix omnium malorum..."

General: Re: A couple of problems (LEKV, 16 Feb '10, 12:42)
I'll try to get a piece of the file that will still produce the problems. The file is currently about 15MB (I did say large).
 
Larry
General: Re: A couple of problems (LEKV, 17 Feb '10, 4:44)
If you have iTunes (or know someone locally who has it), you may be able to create a similar XML file. In iTunes, File > Export > Library produces an XML file with data about the items in the library (title, artist, composer, etc.) and playlists. I tried another small XML parser and it didn't like the <!DOCTYPE line. With this implementation I got the same failure with and without the DOCTYPE line.
 
I will still try to get a smaller file that still has the problems.
 
Larry
General: Now I saved to file, and... (liaterez, 18 Nov '09, 3:43)
1. There was no indentation.
2. I needed to create the root myself; I didn't get it automatically. Did I miss something?
3. The root was not written to the file (so Internet Explorer could not show it).
4. If I open it as a text file, there is no indentation.
General: Sorry for the confusion (1 and 4 are the same) (liaterez, 18 Nov '09, 3:44)
Sorry.
General: XML reader performance (liaterez, 17 Nov '09, 5:01)
Hello, Darko
I debugged your code and found myself looping over every inner tag again and again.
I have not tried to rewrite it yet, but I wonder whether you missed a performance issue and it could be done with better performance, or whether I am the one missing something.

Thanks
Answer: Re: XML reader performance (darkoman, 17 Nov '09, 10:43)
Hello,

thanks for the interest in this work.
Yes, you are right: I am using recursion to parse the XML file, although it could be done using just a plain stack.
Using the stack would certainly increase the overall performance of this XML parser.
I will try to find the time to implement this second solution you have suggested.
 

Best regards,
Darkoman
 
"Avaritia est radix omnium malorum..."

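The reply above mentions parsing with a plain stack instead of recursion. As a hedged illustration of that technique only (the CXMLFile parsing code is not shown in the article, so this is not a patch to it), the hypothetical CheckNesting() function below verifies that start and end tags are properly nested, keeping the currently open tag names in a std::vector used as a stack rather than on the call stack. It is deliberately simplified: it skips <?...?> declarations and ignores text, comments, CDATA, and self-closing tags. Its clearest benefit is that nesting depth is no longer limited by the call stack; whether it is also faster would need to be measured. Nested elements with the same name, as asked about in an earlier post, are no special case for a stack.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every <name ...> is closed by a matching </name> in order.
// Simplified: skips <? ... ?>, ignores text, no comments/CDATA/self-closing tags.
bool CheckNesting(const std::string& xml)
{
    std::vector<std::string> open;   // explicit stack of currently open tag names
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos)
            return false;                                   // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false;
        if (tag[0] == '?')
            continue;                                       // XML declaration
        if (tag[0] == '/') {                                // end tag: must match top
            if (open.empty() || open.back() != tag.substr(1))
                return false;
            open.pop_back();
        } else {
            open.push_back(tag.substr(0, tag.find(' ')));   // drop attributes
        }
    }
    return open.empty();                                    // nothing left unclosed
}
```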
General: Nice but useless for the rest of us (rajeshFeb02, 8 Apr '09, 22:39)
It would be nice if authors targeted the ANSI C++ standard when producing articles like this. Then "the rest of us" would also be able to use them. I do not hate MS Windows and its products; it is just a matter of taste. But, as you see, Mr darkoman, a brilliant author, has written this beautiful, very nice article, and yet it is useless for "the rest of us" who do not use CString, DWORD, and many more macros like these.
 
Rajesh Karan

General: Re: Nice but useless for the rest of us (darkoman, 15 Apr '09, 19:49)
Hello,

thanks for the interest in this article.
The main goal of this article was not to produce an "ANSI C++ standard XML file parser", but to help a certain number of developers working with MS Visual Studio. It might, however, be a mistake not to take into account developers who do not use this tool.
But this is also a "pilot article". If it lives long enough to see future versions, the final one will certainly be free of MFC dependencies.
That has often been my goal when writing other articles for CodeProject.
Thank you for the constructive criticism...
 

Best regards,
Darkoman
 
"Avaritia est radix omnium malorum..."

General: Re: XML Mis-parsing [modified] (Member 6038196, 25 Mar '09, 22:59)
Hello friend,
I used your class in my code. It is simple and easy to understand.
Using your class, I added some nodes.
But now I have a problem:

1. How can I delete a node?
2. How can I modify the content of a node?
For example:

<main>
<add1>
<new1>10</new1>
<new1>50</new1>
</add1>
</main>

How do I edit the content of a tag, e.g. change 10 to 20?
How do I delete a node?

General: XML Mis-parsing (owl-len, 11 Mar '09, 13:01)
It looks like the parser cannot handle reserved characters in the data of an element. Notice that the "=" at the end of the GUID causes the parser to stop collecting the data and to start collecting an attribute.
 

<objects xmlns="http://yahoo.com/objects">
kzpccGRdnX7t71ROn4PG0Q==
</objects>
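The report above is about context: "=" only separates an attribute name from its value inside a start tag, while in character data it is an ordinary character. A minimal sketch of a scanner that tracks that context, including quoted attribute values, is shown below. TextSegments() is a hypothetical helper, not a patch to CXMLFile, and it does not decode entities such as &amp; or handle comments and CDATA.

```cpp
#include <string>
#include <vector>

// Collects the non-blank character data found between tags.
std::vector<std::string> TextSegments(const std::string& xml)
{
    std::vector<std::string> segments;
    std::string text;       // character data collected since the last tag
    bool inTag = false;     // true between '<' and the matching '>'
    char quote = 0;         // '"' or '\'' while inside a quoted attribute value
    for (char ch : xml) {
        if (inTag) {
            if (quote != 0) {
                if (ch == quote)
                    quote = 0;               // end of attribute value
            } else if (ch == '"' || ch == '\'') {
                quote = ch;                  // start of attribute value
            } else if (ch == '>') {
                inTag = false;               // end of tag
            }
            // '=' needs no special case: it never ends a tag.
        } else if (ch == '<') {
            // Keep the segment unless it is whitespace only.
            if (text.find_first_not_of(" \t\r\n") != std::string::npos)
                segments.push_back(text);
            text.clear();
            inTag = true;
        } else {
            text += ch;                      // '=' is ordinary character data here
        }
    }
    return segments;
}
```

The whitespace around the GUID is kept, since XML treats it as part of the element's content; trimming it is a choice for the application.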
General: Remove nodes from file [modified] (bijumanjeri, 8 Jan '09, 3:03)
Hi dude, nice to see your code; it is so simple and easy to understand.
Using your class, I added some nodes :)
But now I have a problem: how can I delete a node?
1. How can I delete a node?
2. How can I modify the content of a node?
Thanks in advance... :) :-D

modified on Thursday, January 8, 2009 9:10 AM

General: Can't parse the XML file (gu@z, 23 Nov '08, 16:16)
Hi, I am very interested in this XML parser class. I am currently doing my school project and tried to use this parser to parse my XML file.

But it fails. My XML file opens in the IE browser, so the format should be correct. Is there any configuration I need to change?
 
XML file content:
<?xml version="1.0" encoding="UTF-8"?>
<skin>
<manifest name="SONcommunicator Gray" author="CaryCui" description="System defulat skin " version="1.0" type="Skin" />
<fonts>
<font name="Panel.Caption" face="Tohoma" size="12" weight="plain"/>
</fonts>
<colourScheme>
<colour name="System.Base.Window" value="FFFFFF"/>
<colour name="System.Base.Midtone" value="FFFFFF"/>
<colour name="System.Back.Selected" value="B9CFFF"/>
<colour name="System.Back.Checked.Selected" value="FFFFFF"/>
<colour name="System.Margin" value="FFFFFF"/>
<colour name="Panel.Caption.Text" value="000000"/>
<colour name="Panel.Caption.Back" value="000000"/>
<colour name="TaskPanel.Back" value="EEEEEE"/>
<colour name="System.Shadow" value="D6D4D7"/> <!-- menu icon mouseover hovers -->
<colour name="System.Border" value="7C7E7C"/> <!-- menu hover border colour -->
<colour name="System.Disabled" value="7C7E7C"/> <!-- item inactive/disabled colour -->
<colour name="System.Text" value="000000"/> <!-- active menu / item text colour -->
<colour name="MainDialog.Backgroud" value="FFFFFF"/>
<colour name="Dialog.Background" value="FFFFFF"/>
<colour name="MainHeader.Board" value="6A9FE1" />
<colour name="MainHeader.BodyStart" value="C2DAFC" />
<colour name="MainHeader.BodyEnd" value="E9F0FF" />
<colour name="MainHeader.BodyShadow" value="C2DAFC" />
<colour name="MainHeader.BodyGrid" value="E9F0FF" />
<colour name="MainTab.Board" value="E2E2DA" />
<colour name="MainTab.BodyFill" value="FFFFFF" />
<colour name="MainTab.BodyShadow" value="ECEBE6" />
<colour name="MainTab.ItemBoard" value="C8C6B7" />
<colour name="MainTab.ItemFill" value="E8E8E0" />
<colour name="FolderBarCtrl.BodyStart" value="C4DBFC" />
<colour name="FolderBarCtrl.BodyEnd" value="E9F0FF" />
</colourScheme>
<watermarks>
<watermark target="MainMenuBarBK" path="DialogToolBarBK.bmp"/>
<watermark target="DialogToolBarBK" path="DialogToolBarBK.bmp"/>
<watermark target="MainDialogBK" path="MainDialogBK.bmp"/>
</watermarks>
<WindowSkins>
<WindowSkin>
<image path="SONmobileGUI.bmp"/>
<image path="DialogBK.bmp" type="watermark"/>
<target window="MainDialog"/>
<target window="ChatDialog"/>
<parts>
<part name="TopLeft" rect="0,0,26,26"/>
<part name="Top" rect="30,0,60,26"/>
<part name="TopRight" rect="212,0,70,26"/>
<part name="Left" rect="0,26,2,530"/>
<part name="Right" rect="280,26,2,530"/>
<part name="BottomLeft" rect="0,561,10,3"/>
<part name="Bottom" rect="10,561,60,3"/>
<part name="BottomRight" rect="272,561,10,3"/>
<part name="MinimiseDown" rect="224,5,16,16"/>
<part name="MinimiseHover" rect="224,223,16,16"/>
<part name="MaximiseDown" rect="241,5,16,16"/>
<part name="MaximiseHover" rect="241,223,16,16"/>
<part name="CloseDown" rect="259,8,16,16"/>
<part name="CloseHover" rect="259,226,16,16"/>
</parts>
<anchors>
<anchor name="Icon" rect="8,6,16,16"/>
<anchor name="Close" rect="-23,8,16,16"/>
<anchor name="Maximise" rect="-41,5,16,16"/>
<anchor name="Minimise" rect="-58,5,16,16"/>
</anchors>
<region>
<shape type="rectangle" rect="0,5,-1,-1"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/>
</region>
<caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/>
</WindowSkin>
<WindowSkin>
<image path="SONmobileGUI.bmp"/>
<image path="DialogBK.bmp" type="watermark"/>
 
<target window="PopDialog"/>
<target window="Dialog"/>
<parts>
<part name="TopLeft" rect="0,0,26,26"/>
<part name="Top" rect="30,0,60,26"/>
<part name="TopRight" rect="254,247,24,26"/>
<part name="Left" rect="0,26,2,530"/>
<part name="Right" rect="280,26,2,530"/>
<part name="BottomLeft" rect="0,561,10,3"/>
<part name="Bottom" rect="10,561,60,3"/>
<part name="BottomRight" rect="272,561,10,3"/>
<part name="CloseDown" rect="259,8,16,16"/>
<part name="CloseHover" rect="259,226,16,16"/>
</parts>
<anchors>
<anchor name="Icon" rect="8,6,16,16"/>
<anchor name="Close" rect="-23,8,16,16"/>
</anchors>
<region>
<shape type="rectangle" rect="0,5,-1,-1"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/>
</region>
<caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/>
</WindowSkin>
<WindowSkin>
<image path="SONmobileGUI.bmp"/>
<image path="DialogBKMakeCall.bmp" type="watermark"/>
 
<target window="PopDialogMakeCall"/>
<parts>
<part name="TopLeft" rect="0,0,26,26"/>
<part name="Top" rect="30,0,60,26"/>
<part name="TopRight" rect="254,247,24,26"/>
<part name="Left" rect="0,26,2,530"/>
<part name="Right" rect="280,26,2,530"/>
<part name="BottomLeft" rect="0,561,10,3"/>
<part name="Bottom" rect="10,561,60,3"/>
<part name="BottomRight" rect="272,561,10,3"/>
<part name="CloseDown" rect="259,8,16,16"/>
<part name="CloseHover" rect="259,226,16,16"/>
</parts>
<anchors>
<anchor name="Icon" rect="8,6,16,16"/>
<anchor name="Close" rect="-23,8,16,16"/>
</anchors>
<region>
<shape type="rectangle" rect="0,5,-1,-1"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/>
</region>
<caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/>
</WindowSkin>
<WindowSkin>
<image path="SONmobileGUI.bmp"/>
<target window="MsgBox"/>
<parts>
<part name="TopLeft" rect="0,0,26,26"/>
<part name="Top" rect="30,0,60,26"/>
<part name="TopRight" rect="254,247,24,26"/>
<part name="Left" rect="0,26,2,530"/>
<part name="Right" rect="280,26,2,530"/>
<part name="BottomLeft" rect="0,561,10,3"/>
<part name="Bottom" rect="10,561,60,3"/>
<part name="BottomRight" rect="272,561,10,3"/>
<part name="CloseDown" rect="259,8,16,16"/>
<part name="CloseHover" rect="259,226,16,16"/>
</parts>
<anchors>
<anchor name="Icon" rect="8,6,16,16"/>
</anchors>
<region>
<shape type="rectangle" rect="0,5,-1,-1"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/>
<shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/>
</region>
<caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/>
</WindowSkin>
</WindowSkins>
</skin>
General: Re: Can't parse the XML file (darkoman, 25 Nov '08, 6:10)
Hello,
 
the following closing tag is not supported: />.
 

Best regards,
Darkoman
 
"Avaritia est radix omnium malorum..."

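The reply above confirms that the /> form is what fails, which matches the <bala x=":)" /> report further up. One way a scanner could recognize it is sketched with a hypothetical ClassifyTag() function that receives the text between < and > (an assumption about how the scanner is organized, not taken from CXMLFile): a start tag whose last non-blank character is / closes itself, so the parser should create the element and not wait for a matching end tag. The sketch also separates <!-- ... --> comments, which the skin file above contains, from other declarations such as <?xml ...?> and <!DOCTYPE ...>.

```cpp
#include <string>

enum TagKind { TAG_START, TAG_END, TAG_SELF_CLOSING, TAG_DECLARATION, TAG_COMMENT };

// Classifies the raw text between '<' and '>' (both excluded),
// e.g. "bala x=\":)\" /" for the input <bala x=":)" />.
TagKind ClassifyTag(const std::string& inner)
{
    if (inner.compare(0, 3, "!--") == 0)
        return TAG_COMMENT;                       // <!-- ... -->
    if (!inner.empty() && (inner[0] == '?' || inner[0] == '!'))
        return TAG_DECLARATION;                   // <?xml ...?>, <!DOCTYPE ...>
    if (!inner.empty() && inner[0] == '/')
        return TAG_END;                           // </name>
    // A start tag whose last non-blank character is '/' closes itself:
    // create the element, but do not wait for a matching end tag.
    std::string::size_type last = inner.find_last_not_of(" \t\r\n");
    if (last != std::string::npos && inner[last] == '/')
        return TAG_SELF_CLOSING;                  // <name ... />
    return TAG_START;                             // <name ...>
}
```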
Question: Is this MFC dependent? Can you make it an ANSI C/C++ class? (kedanz, 8 May '08, 3:43)
This is a very useful class. Thanks, good work.
I would like to know how difficult it would be to translate it into an ANSI C/C++ class.
Does it depend on anything that requires Windows?
Answer: Re: Is this MFC dependent? Can you make it an ANSI C/C++ class? (darkoman, 8 May '08, 6:45)
Hello,

thanks for your interest in the CXMLFile class.
No, it does not depend in any way on MFC, the .NET Framework, or any other third-party library.
It is written in ANSI C/C++.
 

Best regards,
Darkoman
 
"Avaritia est radix omnium malorum..."

General: Re: Is this MFC dependent? Can you make it an ANSI C/C++ class? (Dallin Wellington, 3 Sep '08, 16:53)
Actually... you may not have noticed, but you used types that are in the MFC header file:

BOOL
LPTSTR
LPBYTE
etc.
Answer: Re: Is this MFC dependent? Can you make it an ANSI C/C++ class? (darkoman, 4 Sep '08, 22:46)
Hello,
 
yes, but it can easily be re-defined by custom types.
 

Regards,
Darkoman
 
"Avaritia est radix omnium malorum..."

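The reply above says the Windows types can be redefined. A minimal sketch of such a shim is shown below, assuming a non-Windows compiler and a narrow-character (non-UNICODE) build; it is not shipped with the article, and a full port would also need replacements for any Win32 API calls the source makes, which this thread does not list.

```cpp
#include <cstdint>

// Hypothetical portability shim: on Windows the real definitions come from
// <windows.h> and <tchar.h>, so it is only compiled elsewhere.
#ifndef _WIN32
typedef int BOOL;
#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif
typedef unsigned char BYTE;
typedef BYTE* LPBYTE;
typedef std::uint32_t DWORD;   // Windows DWORD is 32 bits; 'unsigned long' is 64 bits on most 64-bit Unix systems
typedef char TCHAR;            // assumes a narrow (non-UNICODE) build
typedef TCHAR* LPTSTR;
// Names starting with an underscore and a capital letter are reserved in
// standard C++; a cleaner port would rename these two.
typedef TCHAR _TCHAR;
#define _T(x) x
#endif
```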

Last Updated 19 Mar 2008
Article Copyright 2008 by darkoman
Everything else Copyright © CodeProject, 1999-2013