Parsing an XML file in a C/C++ program

15 Apr 2011 | CPOL | 3 min read
A DOM-based XML parser in C/C++.

Introduction

I often use CodeProject's articles and code samples as a source of information on programming-related subjects, so perhaps it is time to share some knowledge and experience of my own. With so many closely related technologies available, it is not easy to choose the right way to solve a coding problem, from the choice of platform and programming language to the many ways in which almost any programming task can be implemented. The best and unique feature of the CodeProject model is that a text on a given subject (not only on Microsoft’s technologies, as in MSDN, for example) is written by a programmer and accompanied by code samples in a given programming language. The following article, with its code sample (a console application), is meant to help when you need to process a large volume of structured information. Slightly adapted to your needs, the code can be used in your own C/C++ project to read and process structured data from an XML file.

Background

There are many ways to read a large volume of data and process it in your program. For example, you can use SQL to read structured information from a database, or use the functions of a given language and its libraries to read data from an Excel or CSV file. In all these cases, you have to write your own code to process (parse) the data, which is often an inefficient and error-prone solution. The Document Object Model (DOM) provides a standardized way to read and parse structured data from an XML file. It is “a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of documents” (www.w3.org/DOM/).

Using the code

DOM provides standard APIs for accessing the objects of an XML document by walking the tree of "nodes" created by the parser. This article uses the DOM implementation in Microsoft XML Core Services (MSXML); as MSDN says, “The DOM implementation is a part of the MSXML parser”. The parser builds a tree-like structure of the document, checks that it is well-formed, and validates it if requested.

This is the structure of elements and attributes in the XML file provided with the code sample (the stocks.xml file from the MSXML 4.0 SDK):

Image 1: Sample XML file

The MSXML 4.0 SDK documentation on MSDN says: “After a document is parsed, its nodes can be explored in any direction; they are not limited to straight-through text file processing. The DOMDocument object exposes properties and methods that allow you to navigate, query, and modify the content and structure of an XML document. Each of the following objects exposes methods and properties that enable you to gather information about the instance of the object, manipulate the value and structure of the object, and navigate to other objects within the tree. For developers using C, C++, these objects are exposed as the following COM interfaces”. The corresponding code starts by initializing COM and creating the document object:

C++
//Initialize COM Library:
CoInitialize(NULL);

//Create an instance of the DOMDocument object:
docPtr.CreateInstance(__uuidof(DOMDocument30));

Then load the document (XML file):

C++
_variant_t varXml(szFileName);//XML file to load
varResult = docPtr->load(varXml);

Collect all or selected nodes by tag name:

C++
NodeListPtr = docPtr->getElementsByTagName(strFindText);

//Output root node:
docPtr->documentElement->get_nodeName(&bstrItemText);

In the DOM model, “the node can be used to represent elements, attributes, textual content, comments, processing instructions, entities, CDATA sections, and document fragments”. In our sample, we work with elements and attributes only.
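To make the node model more concrete, here is a minimal, portable sketch that does not use MSXML at all. The `Node` struct, the `NodeType` enum and the `collectByTagName` function are hypothetical names invented for this illustration; they only mimic how a DOM tree holds elements with attributes, and how a lookup by tag name walks that tree in document order.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node kinds; a real DOM defines more of them.
enum class NodeType { Element, Text, Comment };

// Hypothetical in-memory node. This is NOT an MSXML interface.
struct Node {
    NodeType type;
    std::string name;   // tag name for elements, empty otherwise
    std::string value;  // text for text and comment nodes
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<Node> children;
};

// Collect descendant elements named `tag` in document (pre-order) order.
// "*" matches every element, following the DOM convention.
void collectByTagName(const Node& node, const std::string& tag,
                      std::vector<const Node*>& out) {
    for (const Node& child : node.children) {
        if (child.type == NodeType::Element &&
            (tag == "*" || child.name == tag)) {
            out.push_back(&child);
        }
        collectByTagName(child, tag, out);  // descend into the subtree
    }
}
```

Calling `collectByTagName(root, "*", found)` fills `found` with pointers to every element below `root`, skipping text and comment nodes. This is roughly the role the node list plays in the sample.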

Finally, in the for loop, we go through every node of type element and then through the attributes of every element (if any exist):

C++
for(i = 0; i < (NodeListPtr->length); i++)
{
    if (pIDOMNode) pIDOMNode->Release();
    NodeListPtr->get_item(i, &pIDOMNode);
    //Loop through the nodes:
    if(pIDOMNode )
    {
        pIDOMNode->get_nodeTypeString(&bstrNodeType);

        //We process only elements (nodes of "element" type): 
        BSTR temp = L"element";
        //some code here...

        //Loop through the number of attributes:
        for(j = 0; j < length; j++)
        {
            //some code here...
        }
    }
}
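The shape of this nested loop can also be shown without COM. The sketch below is an assumption-laden stand-in, not the code from the sample: `ListedNode` and `describeElements` are hypothetical names, and the `name="value"` line format is chosen only for illustration. It keeps the same two steps: filter the list down to nodes whose type string is "element", then iterate over each element's attributes.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat record standing in for one item of a node list.
struct ListedNode {
    std::string typeString;  // e.g. "element", "text", "comment"
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
};

// Outer loop: visit every node and keep only elements.
// Inner loop: append each attribute as name="value".
std::vector<std::string> describeElements(const std::vector<ListedNode>& nodes) {
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const ListedNode& node = nodes[i];
        if (node.typeString != "element") continue;  // skip non-elements
        std::string line = node.name;
        for (std::size_t j = 0; j < node.attributes.size(); ++j) {
            line += " " + node.attributes[j].first + "=\"" +
                    node.attributes[j].second + "\"";
        }
        lines.push_back(line);
    }
    return lines;
}
```

An element without attributes simply yields its name, which corresponds to the "if any exist" case in the text above.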

This is the output of the parsing:

Image 2: Output of parsing the stocks.xml file

History

  • 08.2009 - Built in Visual Studio C++ 6.0
  • 29.08.2010 - Recompiled in Visual Studio C++ 2008
  • 15.04.2011 - Added images and updated text

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Russian Federation

