Introduction
This article presents a simple and fast C++ XML parser class. There is often a need for an efficient XML parser that can load an XML document, validate it, and browse it. The .NET environment has extensive native support for handling many types of XML documents, but the same native support is missing from plain C++, MFC, etc. There is a COM alternative for parsing and handling XML files, but it takes some time to learn it and to use it correctly.
This article is a simple attempt to make a C++ developer's life a bit easier. It provides support for handling well-formed XML documents in the simplest possible way: load, validate, and browse. The following XML elements are supported:
- A simple TAG element, like <Element>
- A simple ATTRIBUTE element, like Attribute="Value"
- A simple TEXT element, like [Text]
Below is an example of a simple XML file that is supported:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The presented XML classes are able to load this type of XML document, check whether it is well-formed, and browse its content. Only two classes provide this functionality.
The first class is called CXMLFile. Its main purpose is to load an XML file, validate its structure, and create an XML element collection out of its content. This collection of XML elements represents the loaded XML file in system memory. It is then easy to modify the inner structure of this collection, that is, to modify the XML file itself. The class supports loading XML files from the hard disk or from a memory stream, which is useful in special cases (e.g. on a web server). The CXMLFile class can also write the XML element collection from system memory to a file on the hard disk.
The second class is called CXMLElement. It is used by the previous class, and the developer uses it when browsing or modifying the inner structure of an XML file in system memory, that is, when modifying the inner structure of the XML element collection. It has basic support for appending to this collection and browsing it. It can provide the name, type, or value of the current XML element in the collection.
Background
There are many articles on CodeProject covering this topic, and this is a small contribution to that collection. I hope that readers and developers will find it useful in their everyday work.
Using the Code
It's quite easy to load an XML document from the hard-disk. See an example below:
#include "XMLFile.h"
...
_TCHAR lpszXMLFilePath[] = _T("A path to the XML file here...");
CXMLFile xmlFile;
if (xmlFile.LoadFromFile(lpszXMLFilePath))
{
    // Success: the XML file was loaded and validated
}
else
{
    // Failure: loading or validation failed
}
To load an XML document from the memory stream:
...
CXMLFile xmlFile;
if (xmlFile.LoadFromStream(lpData, dwDataSize))
{
    // Success: the XML data was loaded and validated
}
else
{
    // Failure: loading or validation failed
}
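LoadFromStream() needs the whole document in memory. The following is a minimal sketch of one way to get there when the data starts out in a file. ReadFileToBuffer is a hypothetical helper and not part of the article's classes, and the exact parameter types of LoadFromStream() are not shown in the article, so the hand-over in the note below is an assumption.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of CXMLFile): reads a whole file into memory.
// Returns false if the file cannot be opened; the buffer is left untouched then.
bool ReadFileToBuffer(const std::string& path, std::vector<unsigned char>& buffer)
{
    assert(!path.empty());  // precondition: a path must be given

    // Binary mode keeps the bytes exactly as they are stored on disk.
    std::ifstream file(path.c_str(), std::ios::binary);
    if (!file)
    {
        return false;
    }

    // Copy every byte from the stream into the buffer.
    buffer.assign(std::istreambuf_iterator<char>(file),
                  std::istreambuf_iterator<char>());
    return true;
}
```

The buffer could then be handed over along the lines of xmlFile.LoadFromStream(buffer.data(), buffer.size()), with casts to whatever pointer and size types the method actually declares.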
To save the XML element collection to the file on the hard-disk, do the following:
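if (xmlFile.SaveToFile(lpszXMLFilePath))
{
    // Success: the XML element collection was written to the file
}
else
{
    // Failure: the file could not be written
}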
After the call to LoadFromFile(), a method of the CXMLFile class, the XML file has been validated and parsed. If the XML file is well-formed, it is loaded into system memory as a collection of CXMLElement elements. One can gain access to this collection using another method of the CXMLFile class called GetRoot(). See below:
CXMLElement* pRoot = xmlFile.GetRoot();
With the pointer to the root element of the XML collection in system memory, several things can be done. The root element of the collection is of the CXMLElement class type. Here are the methods available:
LPTSTR GetElementName();
XML_ELEMENT_TYPE GetElementType();
int GetChildNumber();
CXMLElement* GetFirstChild();
CXMLElement* GetCurrentChild();
CXMLElement* GetNextChild();
CXMLElement* GetLastChild();
void SetValue(LPTSTR lpszValue);
LPTSTR GetValue();
Modify the inner structure of the XML element collection using the following methods:
void Create(LPTSTR lpszElementName, XML_ELEMENT_TYPE type);
void AppendChild(CXMLElement* lpXMLChild);
Using the first group of CXMLElement class methods, one can browse the XML element collection. Using the second group of CXMLElement class methods, one can create new XML elements of different types and append them to existing ones.
The XML element types are listed below:
XET_TAG
XET_ATTRIBUTE
XET_TEXT
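The browsing methods and element types above can be combined into a recursive walk over the collection. The sketch below is an illustration only: MockElement is a hypothetical stand-in that mirrors a few of the method names listed above so the example compiles without XMLFile.h, AddChild is a helper of the stand-in alone, and the cursor behaviour (GetNextChild() returning a null pointer after the last child) is an assumption rather than documented behaviour of CXMLElement.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

enum XML_ELEMENT_TYPE { XET_TAG, XET_ATTRIBUTE, XET_TEXT };

// Hypothetical stand-in for CXMLElement, NOT the article's class.
class MockElement
{
public:
    MockElement(const std::string& name, XML_ELEMENT_TYPE type)
        : m_name(name), m_type(type), m_cursor(0) {}

    const std::string& GetElementName() const { return m_name; }
    XML_ELEMENT_TYPE GetElementType() const { return m_type; }

    // Stand-in helper only: creates a child, appends it, and returns it.
    MockElement* AddChild(const std::string& name, XML_ELEMENT_TYPE type)
    {
        m_children.push_back(std::unique_ptr<MockElement>(new MockElement(name, type)));
        return m_children.back().get();
    }

    // Assumed semantics: GetFirstChild() rewinds the cursor, GetNextChild()
    // advances it and returns nullptr once the children are exhausted.
    MockElement* GetFirstChild()
    {
        m_cursor = 0;
        return m_children.empty() ? nullptr : m_children[0].get();
    }
    MockElement* GetNextChild()
    {
        if (m_cursor + 1 >= m_children.size()) return nullptr;
        return m_children[++m_cursor].get();
    }

private:
    std::string m_name;
    XML_ELEMENT_TYPE m_type;
    std::vector<std::unique_ptr<MockElement> > m_children;
    std::size_t m_cursor;
};

// Writes one line per element, indented by two spaces per nesting level.
void DumpTree(MockElement* element, std::ostream& out, int depth = 0)
{
    assert(element != nullptr);
    static const char* const kTypeNames[] = { "TAG", "ATTRIBUTE", "TEXT" };

    out << std::string(static_cast<std::size_t>(depth) * 2, ' ')
        << kTypeNames[element->GetElementType()] << ": "
        << element->GetElementName() << '\n';

    // Each element keeps its own cursor, so recursing into a child
    // does not disturb the iteration over its siblings.
    for (MockElement* child = element->GetFirstChild(); child != nullptr;
         child = element->GetNextChild())
    {
        DumpTree(child, out, depth + 1);
    }
}
```

With the real classes, the same loop shape would start from the pointer returned by xmlFile.GetRoot(), provided the child cursor behaves as assumed here.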
Points of Interest
I always had a problem with loading XML documents easily and manipulating them. Now I have useful classes that will reduce my development time whenever this type of work is required. I can also now easily parse RSS feeds, which are used all over the Web. I am planning to extend this basic support to HTML, or to XML documents that are not so well-formed, soon (when I find some more free time).
Hi, I am very interested in this XML parser class. I am currently doing my school project and I am trying to use this parser to parse my XML file.
But it fails. My XML file opens in the IE browser, so the format should be correct. Is there any configuration I need to change?
XML file content: <?xml version="1.0" encoding="UTF-8"?> <skin> <manifest name="SONcommunicator Gray" author="CaryCui" description="System defulat skin " version="1.0" type="Skin" /> <fonts> <font name="Panel.Caption" face="Tohoma" size="12" weight="plain"/> </fonts> <colourScheme> <colour name="System.Base.Window" value="FFFFFF"/> <colour name="System.Base.Midtone" value="FFFFFF"/> <colour name="System.Back.Selected" value="B9CFFF"/> <colour name="System.Back.Checked.Selected" value="FFFFFF"/> <colour name="System.Margin" value="FFFFFF"/> <colour name="Panel.Caption.Text" value="000000"/> <colour name="Panel.Caption.Back" value="000000"/> <colour name="TaskPanel.Back" value="EEEEEE"/> <colour name="System.Shadow" value="D6D4D7"/> <!-- menu icon mouseover hovers --> <colour name="System.Border" value="7C7E7C"/> <!-- menu hover border colour --> <colour name="System.Disabled" value="7C7E7C"/> <!-- item inactive/disabled colour --> <colour name="System.Text" value="000000"/> <!-- active menu / item text colour --> <colour name="MainDialog.Backgroud" value="FFFFFF"/> <colour name="Dialog.Background" value="FFFFFF"/> <colour name="MainHeader.Board" value="6A9FE1" /> <colour name="MainHeader.BodyStart" value="C2DAFC" /> <colour name="MainHeader.BodyEnd" value="E9F0FF" /> <colour name="MainHeader.BodyShadow" value="C2DAFC" /> <colour name="MainHeader.BodyGrid" value="E9F0FF" /> <colour name="MainTab.Board" value="E2E2DA" /> <colour name="MainTab.BodyFill" value="FFFFFF" /> <colour name="MainTab.BodyShadow" value="ECEBE6" /> <colour name="MainTab.ItemBoard" value="C8C6B7" /> <colour name="MainTab.ItemFill" value="E8E8E0" /> <colour name="FolderBarCtrl.BodyStart" value="C4DBFC" /> <colour name="FolderBarCtrl.BodyEnd" value="E9F0FF" /> </colourScheme> <watermarks> <watermark target="MainMenuBarBK" path="DialogToolBarBK.bmp"/> <watermark target="DialogToolBarBK" path="DialogToolBarBK.bmp"/> <watermark target="MainDialogBK" path="MainDialogBK.bmp"/> </watermarks> 
<WindowSkins> <WindowSkin> <image path="SONmobileGUI.bmp"/> <image path="DialogBK.bmp" type="watermark"/> <target window="MainDialog"/> <target window="ChatDialog"/> <parts> <part name="TopLeft" rect="0,0,26,26"/> <part name="Top" rect="30,0,60,26"/> <part name="TopRight" rect="212,0,70,26"/> <part name="Left" rect="0,26,2,530"/> <part name="Right" rect="280,26,2,530"/> <part name="BottomLeft" rect="0,561,10,3"/> <part name="Bottom" rect="10,561,60,3"/> <part name="BottomRight" rect="272,561,10,3"/> <part name="MinimiseDown" rect="224,5,16,16"/> <part name="MinimiseHover" rect="224,223,16,16"/> <part name="MaximiseDown" rect="241,5,16,16"/> <part name="MaximiseHover" rect="241,223,16,16"/> <part name="CloseDown" rect="259,8,16,16"/> <part name="CloseHover" rect="259,226,16,16"/> </parts> <anchors> <anchor name="Icon" rect="8,6,16,16"/> <anchor name="Close" rect="-23,8,16,16"/> <anchor name="Maximise" rect="-41,5,16,16"/> <anchor name="Minimise" rect="-58,5,16,16"/> </anchors> <region> <shape type="rectangle" rect="0,5,-1,-1"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/> </region> <caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/> </WindowSkin> <WindowSkin> <image path="SONmobileGUI.bmp"/> <image path="DialogBK.bmp" type="watermark"/>
<target window="PopDialog"/> <target window="Dialog"/> <parts> <part name="TopLeft" rect="0,0,26,26"/> <part name="Top" rect="30,0,60,26"/> <part name="TopRight" rect="254,247,24,26"/> <part name="Left" rect="0,26,2,530"/> <part name="Right" rect="280,26,2,530"/> <part name="BottomLeft" rect="0,561,10,3"/> <part name="Bottom" rect="10,561,60,3"/> <part name="BottomRight" rect="272,561,10,3"/> <part name="CloseDown" rect="259,8,16,16"/> <part name="CloseHover" rect="259,226,16,16"/> </parts> <anchors> <anchor name="Icon" rect="8,6,16,16"/> <anchor name="Close" rect="-23,8,16,16"/> </anchors> <region> <shape type="rectangle" rect="0,5,-1,-1"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/> </region> <caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/> </WindowSkin> <WindowSkin> <image path="SONmobileGUI.bmp"/> <image path="DialogBKMakeCall.bmp" type="watermark"/>
<target window="PopDialogMakeCall"/> <parts> <part name="TopLeft" rect="0,0,26,26"/> <part name="Top" rect="30,0,60,26"/> <part name="TopRight" rect="254,247,24,26"/> <part name="Left" rect="0,26,2,530"/> <part name="Right" rect="280,26,2,530"/> <part name="BottomLeft" rect="0,561,10,3"/> <part name="Bottom" rect="10,561,60,3"/> <part name="BottomRight" rect="272,561,10,3"/> <part name="CloseDown" rect="259,8,16,16"/> <part name="CloseHover" rect="259,226,16,16"/> </parts> <anchors> <anchor name="Icon" rect="8,6,16,16"/> <anchor name="Close" rect="-23,8,16,16"/> </anchors> <region> <shape type="rectangle" rect="0,5,-1,-1"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/> </region> <caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/> </WindowSkin> <WindowSkin> <image path="SONmobileGUI.bmp"/> <target window="MsgBox"/> <parts> <part name="TopLeft" rect="0,0,26,26"/> <part name="Top" rect="30,0,60,26"/> <part name="TopRight" rect="254,247,24,26"/> <part name="Left" rect="0,26,2,530"/> <part name="Right" rect="280,26,2,530"/> <part name="BottomLeft" rect="0,561,10,3"/> <part name="Bottom" rect="10,561,60,3"/> <part name="BottomRight" rect="272,561,10,3"/> <part name="CloseDown" rect="259,8,16,16"/> <part name="CloseHover" rect="259,226,16,16"/> </parts> <anchors> <anchor name="Icon" rect="8,6,16,16"/> </anchors> <region> <shape type="rectangle" rect="0,5,-1,-1"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="or"/> <shape type="roundRect" rect="0,0,-1,-1" size="5,5" combine="and"/> </region> <caption rect="28,5,400,16" fontFace="Tahoma" fontSize="11" colour="FFFFFF" inactiveColour="CCCCFF" outlineColour="000080"/> </WindowSkin> </WindowSkins> </skin>
Hello,
the following closing tag is not supported: />.
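Until such tags are supported, one possible workaround is to rewrite them before the document reaches the parser. The following is a minimal sketch under that assumption: ExpandSelfClosingTags is a hypothetical helper and not part of CXMLFile. It turns <tag ... /> into <tag ...></tag>, copies comments, processing instructions, and CDATA sections through unchanged, and does not attempt to handle a DOCTYPE with an internal subset.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool IsSpace(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Appends input[pos..] up to and including the first `terminator` to `out`
// and returns the position just past it (input.size() if it is missing).
static std::size_t CopyThrough(const std::string& input, std::size_t pos,
                               const std::string& terminator, std::string& out)
{
    std::size_t end = input.find(terminator, pos);
    end = (end == std::string::npos) ? input.size() : end + terminator.size();
    out.append(input, pos, end - pos);
    return end;
}

// Hypothetical pre-processing step: expands <tag ... /> to <tag ...></tag>.
std::string ExpandSelfClosingTags(const std::string& input)
{
    std::string out;
    std::size_t i = 0;
    while (i < input.size())
    {
        if (input[i] != '<') { out += input[i++]; continue; }

        // Markup that must pass through untouched.
        if (input.compare(i, 4, "<!--") == 0) { i = CopyThrough(input, i, "-->", out); continue; }
        if (input.compare(i, 9, "<![CDATA[") == 0) { i = CopyThrough(input, i, "]]>", out); continue; }
        if (input.compare(i, 2, "<?") == 0) { i = CopyThrough(input, i, "?>", out); continue; }
        if (input.compare(i, 2, "</") == 0 || input.compare(i, 2, "<!") == 0)
        {
            i = CopyThrough(input, i, ">", out);
            continue;
        }

        // Start tag: find its closing '>' while skipping quoted attribute values.
        std::size_t j = i + 1;
        char quote = 0;
        while (j < input.size() && (quote != 0 || input[j] != '>'))
        {
            if (quote != 0) { if (input[j] == quote) quote = 0; }
            else if (input[j] == '"' || input[j] == '\'') quote = input[j];
            ++j;
        }
        if (j >= input.size())
        {
            out.append(input, i, std::string::npos);  // unterminated tag: copy as-is
            break;
        }
        if (input[j - 1] != '/')
        {
            out.append(input, i, j - i + 1);  // ordinary start tag
            i = j + 1;
            continue;
        }

        // Self-closing tag: extract the name, drop "/" and the spaces before it.
        std::size_t nameEnd = i + 1;
        while (nameEnd < j - 1 && !IsSpace(input[nameEnd])) ++nameEnd;
        const std::string name = input.substr(i + 1, nameEnd - (i + 1));
        std::size_t bodyEnd = j - 1;  // index of the '/'
        while (bodyEnd > nameEnd && IsSpace(input[bodyEnd - 1])) --bodyEnd;
        out.append(input, i, bodyEnd - i);
        out += "></" + name + ">";
        i = j + 1;
    }
    return out;
}
```

The rewritten text could then be passed to the parser through LoadFromStream(). Scanning for quotes matters because an attribute value may itself contain the characters "/>".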
Best regards, Darkoman
"Avaritia est radix omnium malorum..."
This is a very useful class. Thanks, good work. I would like to know how difficult it would be to translate it to an ANSI C/C++ class. Are there any dependencies that require it to run under Windows?
Hello,
thanks for your interest in the CXMLFile class. No, it does not depend in any way on MFC, the .NET Framework, or any other third-party library. It is written in ANSI C/C++.
Best regards, Darkoman
"Avaritia est radix omnium malorum..."
Actually... you may not have noticed, but you used types that are in the MFC header file:
BOOL, LPTSTR, LPBYTE, etc.
Hello,
yes, but they can easily be redefined as custom types.
Regards, Darkoman
"Avaritia est radix omnium malorum..."
Hi! Your classes are still very helpful to me, but I have a question. Suppose I want to create and append a new element. I find one specific element of the XML file and want to append a new element to it:
CXMLElement* selected_element;
CXMLElement* new_element;
LPTSTR name;
(...)
new_element->Create(name, XET_TAG);
selected_element->AppendChild(new_element);
But just before the function ends, debugging is interrupted and the debugger shows me the Destroy method of the CXMLElement class:
Unhandled exception at 0x00415f12 in XMLTestMFC.exe: 0xC0000005: Access violation reading location 0xfeeefeee.
Why is this happening? Thanks again for your help!
Hello,
can you please send me the complete body of your function (I see only a part now)? Thank you in advance!
Regards, Darkoman
"Avaritia est radix omnium malorum..."
First of all, thanks a lot for these classes. They seem to be a very neat approach to typical XML needs.
But about the question... I've read your article, the comments and the source code, and it seems there is no way to use an encoding different from "ISO-8859-1".
Hope my English is clear enough.
Greetings.
Hello,
thanks for your interest in this work. If I am right, there IS UNICODE support available for methods like GetValue() and SetValue(), so it should be possible to use any encoding. Or am I wrong?
Regards, Darkoman
"Avaritia est radix omnium malorum..."
Hi darkoman,
I'm not sure about it. I think that compiling for UNICODE will give UTF-16 encoding... but UTF-8...?
Anyway, your classes are very useful since they don't depend on external libraries, DLLs, or the like.
I'll try to use these classes over the next few days and will post any notable findings here.
Cheers.
Hi, I have to write an MFC application to read and process XML files with VC++ 2005. I am a total beginner and your work seemed very clear to me, so I am trying to use it. I have no problem opening and validating the XML file; the problem is when I try to get an element name or count the child nodes of an element. I use the following code:
void CXMLTestMFCView::OnFileOpen()
{
    CFileDialog fileDialog(TRUE, _T("*.xml"), NULL, OFN_HIDEREADONLY | OFN_OVERWRITEPROMPT, _T("XML files (*.xml)|*.xml||"), this);
    CXMLFile xmlFile;
    if (fileDialog.DoModal() == IDOK)
    {
        if (xmlFile.LoadFromFile((LPTSTR)((LPCTSTR)fileDialog.GetPathName())))
        {
            MessageBox(_T("File opened"), _T("Informació"), MB_OK);
            // Save the element collection to the file
            _TCHAR lpszCurrentDirectory[_MAX_PATH];
            GetCurrentDirectory(_MAX_PATH, lpszCurrentDirectory);
            _TCHAR lpszOutputFileName[_MAX_PATH];
            _stprintf(lpszOutputFileName, _T("%s\\output_file.xml"), lpszCurrentDirectory);
            if (xmlFile.SaveToFile(lpszOutputFileName))
            {
                MessageBox(_T("XML saved"), _T("Informació"), MB_OK);
            }
            else
            {
                MessageBox(_T("can't save it"), _T("Informació"), MB_OK);
            }
        }
        else
        {
            MessageBox(_T("Can't open it"), _T("Informació"), MB_OK);
        }
    }
    CXMLElement* pRoot = xmlFile.GetRoot();
    LPTSTR name;
    name = pRoot->GetElementName();
    int number = 0;
    number = pRoot->GetChildNumber();
    CString str;
    str.Format(_T("Element's child number= %d"), number);
    str.Format(_T("Element Name= %s"), name);
}
And the simple xml file is:
<?xml version="1.0" encoding="ISO-8859-1"?> <nota> <to_>Tove</to_> <from_>Jani</from_> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </nota>
The result after debugging is that the file is opened and saved only if it is a valid XML file, but this output is wrong:
"Element's child number = 3" // It should be 4
"Element's name = XML:Root" // It should be nota?
Please, what am I doing wrong?
Thanks for your time!
Hello,
thanks for your interest in this work. I will try to explain where the wrong assumption lies (it's not your mistake, it's more a lack of complete information). Here it is:
When you call the xmlFile.GetRoot() method, you get the first element of the collection. It is the XML:ROOT element. It has 3 children: a version attribute, an encoding attribute, and a nota element. You should take the last one, the nota element. It has 4 child elements, in order: the to_ element, the from_ element, the heading element, and the body element.
Always use the GetElementName() and GetElementType() methods to get the correct information about the current element.
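The lookup described above can be sketched as a search for the first child of type XET_TAG under the root. This is an illustration only: MockElement is a hypothetical stand-in that mirrors the method names from the article so the example compiles on its own, AddChild and FindFirstChildOfType are hypothetical helpers, and the behaviour of GetNextChild() at the end of the child list is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

enum XML_ELEMENT_TYPE { XET_TAG, XET_ATTRIBUTE, XET_TEXT };

// Hypothetical stand-in for CXMLElement, NOT the article's class.
class MockElement
{
public:
    MockElement(const std::string& name, XML_ELEMENT_TYPE type)
        : m_name(name), m_type(type), m_cursor(0) {}

    const std::string& GetElementName() const { return m_name; }
    XML_ELEMENT_TYPE GetElementType() const { return m_type; }
    int GetChildNumber() const { return static_cast<int>(m_children.size()); }

    // Stand-in helper only: creates a child, appends it, and returns it.
    MockElement* AddChild(const std::string& name, XML_ELEMENT_TYPE type)
    {
        m_children.push_back(std::unique_ptr<MockElement>(new MockElement(name, type)));
        return m_children.back().get();
    }

    // Assumed semantics: nullptr is returned when no (further) child exists.
    MockElement* GetFirstChild()
    {
        m_cursor = 0;
        return m_children.empty() ? nullptr : m_children[0].get();
    }
    MockElement* GetNextChild()
    {
        if (m_cursor + 1 >= m_children.size()) return nullptr;
        return m_children[++m_cursor].get();
    }

private:
    std::string m_name;
    XML_ELEMENT_TYPE m_type;
    std::vector<std::unique_ptr<MockElement> > m_children;
    std::size_t m_cursor;
};

// Returns the first direct child of the requested type, or nullptr if none.
MockElement* FindFirstChildOfType(MockElement* parent, XML_ELEMENT_TYPE type)
{
    assert(parent != nullptr);
    for (MockElement* child = parent->GetFirstChild(); child != nullptr;
         child = parent->GetNextChild())
    {
        if (child->GetElementType() == type)
        {
            return child;
        }
    }
    return nullptr;
}
```

Applied to the file in question, FindFirstChildOfType(pRoot, XET_TAG) would skip the version and encoding attributes and return the nota element, whose GetChildNumber() is then expected to be 4.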
Regards, Darkoman
"Avaritia est radix omnium malorum..."
Hi, I'm still working with your classes, but I have a problem. My application copies the XML file into a list box, so the user can select one element and check its attributes and values. But what if I want to display the content of a specific child element? I mean:
<?xml version="1.0" encoding="ISO-8859-1"?> <nota> <to_>Tove</to_> <from_>Jani</from_> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </nota>
In the example I can display that there's a child element called <body>, but I want to display its content, "Don't forget me this weekend". I think this is not possible with your classes; am I wrong? Do you know a way to do this?
Thanks again!
Hello,
to display the content of the <body></body> child element, call the GetFirstChild() method on it. You will get an element of type TEXT, with the name "Don't forget me this weekend".
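A minimal sketch of that lookup follows. It scans for the first TEXT child instead of taking the first child blindly, on the assumption that attributes may also appear among an element's children. MockElement is a hypothetical stand-in so the example compiles without XMLFile.h, and AddChild and GetTextContent are hypothetical helpers, not methods of CXMLElement.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

enum XML_ELEMENT_TYPE { XET_TAG, XET_ATTRIBUTE, XET_TEXT };

// Hypothetical stand-in for CXMLElement, NOT the article's class.
class MockElement
{
public:
    MockElement(const std::string& name, XML_ELEMENT_TYPE type)
        : m_name(name), m_type(type), m_cursor(0) {}

    const std::string& GetElementName() const { return m_name; }
    XML_ELEMENT_TYPE GetElementType() const { return m_type; }

    // Stand-in helper only: creates a child, appends it, and returns it.
    MockElement* AddChild(const std::string& name, XML_ELEMENT_TYPE type)
    {
        m_children.push_back(std::unique_ptr<MockElement>(new MockElement(name, type)));
        return m_children.back().get();
    }

    // Assumed semantics: nullptr is returned when no (further) child exists.
    MockElement* GetFirstChild()
    {
        m_cursor = 0;
        return m_children.empty() ? nullptr : m_children[0].get();
    }
    MockElement* GetNextChild()
    {
        if (m_cursor + 1 >= m_children.size()) return nullptr;
        return m_children[++m_cursor].get();
    }

private:
    std::string m_name;
    XML_ELEMENT_TYPE m_type;
    std::vector<std::unique_ptr<MockElement> > m_children;
    std::size_t m_cursor;
};

// Returns the text of the first TEXT child, or an empty string if there is none.
// As described in the reply above, the text is exposed as the child's *name*.
std::string GetTextContent(MockElement* element)
{
    assert(element != nullptr);
    for (MockElement* child = element->GetFirstChild(); child != nullptr;
         child = element->GetNextChild())
    {
        if (child->GetElementType() == XET_TEXT)
        {
            return child->GetElementName();
        }
    }
    return std::string();
}
```

For the <body> element of the example file, such a helper would be expected to return "Don't forget me this weekend".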
Regards, Darkoman
"Avaritia est radix omnium malorum..."
Hello! I think it would be good to handle XML_TAB in the same way as XML_ESC during processing:
1. In XMLFile.h:
.....
#define XML_ESC ' ' // Escape symbol
#define XML_TAB '\t' // Tab symbol
.....
2. In XMLFile.cpp:
2.1. ParseXMLElement procedure:
.....
case XML_ESC:
case XML_TAB: // !!!!!
{
    // Update flags
    dwFlags |= FLG_ESC;
    // Set new mode
    if (dwMode != MOD_DEF)
    {
        dwMode = MOD_CLS;
    }
}
break;
.....
2.2. ExtractAttribute procedure:
while ((dwNameOffset >= 0) &&
       (m_lpData[dwNameOffset] != XML_ESC) &&
       (m_lpData[dwNameOffset] != XML_TAB) && // !!!!
       (m_lpData[dwNameOffset] != XML_SOT) &&
       (m_lpData[dwNameOffset] != XML_EOT) &&
       (m_lpData[dwNameOffset] != XML_PRT) &&
       (m_lpData[dwNameOffset] != XML_SPT))
{
    if ((m_lpData[dwNameOffset] != XML_SQT) && (m_lpData[dwNameOffset] != XML_DQT))
    {
        lpszAttributeName[k++] = m_lpData[dwNameOffset];
    }
    dwNameOffset--;
}
Or maybe I'm wrong?
I have been looking for a tool that would enable me to read/write XML files from C++, so your article sparked my interest.
I guess you do not cater for the following XML file structure (hole element)?
<?xml version="1.0" encoding="ISO-8859-1" ?> <tourn> <course units="METRES"> <hole id="1" par="4" yards="360" name="" /> <hole id="2" par="4" yards="348" name="" /> </course> <miscellaneous> <mainmenuline1>Hyatt Regency Coolum</mainmenuline> </miscellaneous> </tourn>
Rob rmenegon@menmac.com.au
Hello,
well, in this release, the structure would have to be like this:
<tourn> <course units="METRES"> <hole id="1" par="4" yards="360" name=""></hole> <hole id="2" par="4" yards="348" name=""></hole> </course> <miscellaneous> <mainmenuline>Hyatt Regency Coolum</mainmenuline> </miscellaneous> </tourn>
Regards, Darkoman
"Avaritia est radix omnium malorum..." modified on Wednesday, March 26, 2008 6:51 AM
Hi Darkoman
Your last post does not seem to appear correctly; it just shows
Hyatt Regency Coolum
It seems to have lost your markup; looking at the page source, it's there.
Supporting <tag/> etc. would be a real bonus.
Brian
Hello,
I have altered it. Thank you. Yes, I will try to include support for empty closing tags as soon as possible.
Regards, Darkoman
"Avaritia est radix omnium malorum..."
Hi.
I'm successfully using your code to parse simple XML strings and I'd like to improve it to support <tag /> and <tag attr="" />, because my application will need it.
Could you shed any light on where to apply the changes? Any hint is welcome.
When I finish that change, I will send you the patch so you can update your own code.
Thanks a lot.
Alexandre
Hello,
thanks for your interest in this class. You would need to improve the following methods of the CXMLFile class:
BOOL ParseXMLElement(CXMLElement* lpXMLElement, DWORD dwStartOffset, DWORD dwEndOffset);
DWORD ExtractAttribute(DWORD offset, LPTSTR lpszAttributeName, LPTSTR lpszAttributeValue);
Regards, Darkoman
"Avaritia est radix omnium malorum..."