Posted 17 Feb 2002

C++ Wrappers for the Expat XML Parser

The included class definitions provide complete and easy-to-use C++ wrappers for the Expat C API.


The Expat XML Parser is a fine and widely used event-based XML parser.  One of the nicer features of Expat is that its API can be used from C programs.  Even though many programmers use Expat in a C++ environment, the C-based API makes it easy to export the API from a DLL.

However, the fact that Expat has a C-based API doesn't mean we have to live without our C++ classes.  Luckily, Expat was designed so that it can be augmented with classes.

(Definition: Event-Based XML Parser - An XML parser that invokes methods (a.k.a. events) as XML constructs are parsed.  This differs from DOM (Document Object Model) style parsers, which parse the XML and then present the application with the XML data in its logical hierarchical format.)

Design Rationale

The primary considerations when designing the Expat wrapper classes were completeness, simplicity, and extensibility.  For completeness, almost all Expat API routines have been wrapped in the classes.  This includes even APIs such as XML_ExpatVersionInfo.  For simplicity, the wrapper classes only wrap the Expat API and provide no other features.  For extensibility, the wrapper classes make it easy to derive new classes that provide enhanced functionality.

The Expat wrappers consist of two classes: a template-based class (CExpatImpl <class _T>) and a virtual-function-based class (CExpat).  Each class has features that lend themselves to specific solutions.

The following table illustrates the relationship between the API and the two classes, from the top layer down.

CExpat (derived from CExpatImpl <class _T>)
CExpatImpl <class _T>
Expat C API

The template class CExpatImpl <class _T> provides the base layer of translation between C++ and the Expat C API.  The benefit of the template design is that if the application needs only a few of the Expat event routines, the code for the unused event routines is not compiled into the final executable.  Admittedly, the amount of space saved is minimal, but why waste it?
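To make the template approach concrete, here is a minimal sketch of the general technique: a template base class that calls into the class named in its template argument, without any virtual functions.  The names CEventSourceImpl, CNameRecorder, CSilent, and FireStartElement are hypothetical and exist only for this illustration; this is not the actual CExpatImpl source, and the real wrapper may differ in its details.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a template wrapper such as CExpatImpl <class _T>.
// It is NOT the real class; it only shows the dispatch technique.
template <class _T>
class CEventSourceImpl
{
public:
	// Pretend the underlying parser just saw a start tag
	void FireStartElement (const char *pszName)
	{
		// Resolved at compile time; no virtual table is involved
		static_cast <_T *> (this) ->OnStartElement (pszName);
	}

	// Default handler, used when _T does not supply its own
	void OnStartElement (const char *)
	{
	}
};

// Supplies its own handler, which hides the default one
class CNameRecorder : public CEventSourceImpl <CNameRecorder>
{
public:
	void OnStartElement (const char *pszName)
	{
		m_strLast = pszName;
	}

	std::string m_strLast;
};

// Supplies no handler, so the empty default is used
class CSilent : public CEventSourceImpl <CSilent>
{
};
```

Calling FireStartElement ("book") on a CNameRecorder stores "book" in m_strLast, while the same call on a CSilent does nothing.  Because member functions of a class template are only instantiated when used, a handler that is never referenced need not end up in the executable.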

The CExpat class is derived from the CExpatImpl <class _T> template class.  Apart from the default constructor, the only methods contained within this class are the event methods, all declared as virtual functions.  CExpat is intended for situations where virtual functions are preferable to templates.

Within reason, the two classes are interchangeable.  If you have a class that is derived from CExpat, it could easily be modified to use CExpatImpl <class _T>, or vice versa, without having to modify any other source.  See the "Implementation Notes" for more information about some implementation pitfalls with regard to more complex derived classes.

For the rest of this document, only the CExpatImpl <class _T> class will be discussed.  As stated previously, the two wrapper classes are almost 100 percent interchangeable.  Documenting both would be redundant.

Getting Started

The first step in using CExpatImpl <class _T> is deriving a new class that will provide the application-specific implementation.  Deriving a class is required.  As with Expat itself when no handlers are installed, a parser without a derived class would only verify that the XML is well formed.

As a starting point, let us define an XML parser that will display when an element begins, when it ends, and how much data is contained within the element.

class CMyXML : public CExpatImpl <CMyXML> 
{
public:

	// Constructor 
	CMyXML () 
	{
	}

	// Invoked by CExpatImpl after the parser is created
	void OnPostCreate ()
	{
		// Enable all the event routines we want
		EnableStartElementHandler ();
		EnableEndElementHandler ();
		// Note: EnableElementHandler will do both start and end
		EnableCharacterDataHandler ();
	}

	// Start element handler
	void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
	{
		printf ("We got a start element %s\n", pszName);
	}

	// End element handler
	void OnEndElement (const XML_Char *pszName)
	{
		printf ("We got an end element %s\n", pszName);
	}

	// Character data handler
	void OnCharacterData (const XML_Char *pszData, int nLength)
	{
		// note, pszData is NOT null terminated
		printf ("We got %d bytes of data\n", nLength);
	}
};
The CMyXML::OnPostCreate method will be invoked by CExpatImpl <class _T> after the Expat parser has been created.  This provides an easy method of enabling event routines.  The CMyXML::OnStartElement, CMyXML::OnEndElement, and CMyXML::OnCharacterData methods will be invoked by Expat while the XML text is being parsed.  These routines will not be invoked unless they are enabled.  The code inside CMyXML::OnPostCreate enables the three event routines.
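Because the data passed to a character data handler is not NUL terminated, and because Expat may deliver the text of a single element in several calls, a handler usually has to copy exactly nLength characters and append them to what it has already received.  The following is a minimal sketch of one way to do that.  CTextCollector, Reset, and GetText are hypothetical names that are not part of the wrappers, and the sketch assumes an ANSI build in which XML_Char is plain char.

```cpp
#include <cassert>
#include <string>

// Assumption for this standalone sketch: XML_Char is plain char (ANSI build)
typedef char XML_Char;

// Hypothetical helper class, not part of the wrapper classes
class CTextCollector
{
public:
	// Call when an element starts to discard text from earlier elements
	void Reset ()
	{
		m_strText .clear ();
	}

	// Same shape as OnCharacterData: append exactly nLength characters,
	// never relying on a terminating NUL
	void OnCharacterData (const XML_Char *pszData, int nLength)
	{
		if (nLength > 0)
			m_strText .append (pszData, static_cast <std::string::size_type> (nLength));
	}

	const std::string &GetText () const
	{
		return m_strText;
	}

private:
	std::string m_strText;
};
```

A derived parser class could hold such an object, forward its own OnCharacterData to it, and read the accumulated text in OnEndElement.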

Creating a Parser

Now that we have a derived class, we can use it to create an Expat parser.  Creating the parser is very easy.  First create an instance of the parser class, then invoke the Create method. 

The Create method has two arguments: the document encoding and the character used to separate a namespace from a name.  The encoding is the default encoding that will be used while parsing the XML document unless an encoding is specified in the XML document itself.  The namespace separator is used to separate the namespace from the name in calls such as OnStartElement.

For example, if the XML document contained the name SOAP_ENC:Envelope, the prefix SOAP_ENC was defined as being a namespace URI, and "#" was specified to Create, then OnStartElement would be invoked with a string consisting of that namespace URI, followed by "#", followed by Envelope.
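A handler that receives such a combined name often wants the two parts back.  The sketch below shows one possible way to split it.  SQualifiedName and SplitName are hypothetical helpers, not part of the wrappers or of Expat, and "urn:example:ns" in the usage note is a made-up namespace URI.  The split is made at the last occurrence of the separator, on the assumption that the separator cannot appear in the local name (a namespace URI, by contrast, may itself contain a "#").

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch
struct SQualifiedName
{
	std::string strNamespace;	// empty when the name has no namespace
	std::string strLocal;
};

// Hypothetical helper: split a name reported by the parser at the LAST
// occurrence of the separator that was passed to Create
SQualifiedName SplitName (const std::string &strName, char chSeparator)
{
	SQualifiedName sResult;
	std::string::size_type nPos = strName .rfind (chSeparator);
	if (nPos == std::string::npos)
	{
		// No separator: the name is not in any namespace
		sResult .strLocal = strName;
	}
	else
	{
		sResult .strNamespace = strName .substr (0, nPos);
		sResult .strLocal = strName .substr (nPos + 1);
	}
	return sResult;
}
```

For instance, SplitName ("urn:example:ns#Envelope", '#') yields the namespace "urn:example:ns" and the local name "Envelope", while SplitName ("Envelope", '#') yields an empty namespace.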

bool ParseSomeXML (LPCTSTR pszXMLText)
{
	CMyXML sParser;
	if (!sParser .Create ())
		return false;
	// do something useful
	return true;
}

Parsing a Simple Text String

Next, we actually need to send the XML document to the parser.  There are two different methods of sending the document to the XML parser: directly or by internal buffers.  The easier of the two is sending the data directly to the parser.  However, it is also just a bit slower.

To send a simple string to the parser, the application invokes the Parse (LPCTSTR pszBuffer, int nLength = -1, bool fIsFinal = true) method.  The first argument is a pointer to a string of data to be parsed.  A routine has been defined for both ANSI and UNICODE strings.  The second parameter is the length of the string in characters (char or wchar_t depending on ANSI or UNICODE).  If nLength is less than zero, then the string pointed to by pszBuffer must be NUL terminated and the length will be determined from the string.  If nLength is greater than or equal to zero, then the string need not be NUL terminated, and the length shouldn't include the NUL character if it exists.  The third parameter lets the XML parser know when there is no more data.  If the whole XML document can be contained within one simple string, then fIsFinal can be set to true on the first call.  Otherwise, fIsFinal should remain false while there is more data to be parsed.  Parse can be invoked with nLength set to zero and fIsFinal set to true after all data has been read in.
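The role of fIsFinal is easiest to see when a document is fed in several pieces.  The following sketch is one way such a loop might look.  ParseInChunks and SRecordingParser are hypothetical names introduced here: the helper works with any object that has a Parse (const char *, int, bool) method of the shape described above for an ANSI build, and the recording class is only a stand-in that notes what it was given and parses nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: feed strText to sParser in pieces of at most nChunk
// characters, passing fIsFinal = true only with the last piece
template <class _TParser>
bool ParseInChunks (_TParser &sParser, const std::string &strText, std::size_t nChunk)
{
	if (nChunk == 0)
		return false;

	// Every full piece that is followed by more data is not final
	std::size_t nOffset = 0;
	while (strText .size () - nOffset > nChunk)
	{
		if (!sParser .Parse (strText .data () + nOffset, static_cast <int> (nChunk), false))
			return false;
		nOffset += nChunk;
	}

	// The last (possibly empty) piece tells the parser there is no more data
	return sParser .Parse (strText .data () + nOffset,
		static_cast <int> (strText .size () - nOffset), true);
}

// Stand-in used only to observe the calls; it does not parse anything
struct SRecordingParser
{
	std::string strSeen;
	std::vector <bool> afFinal;

	bool Parse (const char *pszBuffer, int nLength, bool fIsFinal)
	{
		strSeen .append (pszBuffer, static_cast <std::string::size_type> (nLength));
		afFinal .push_back (fIsFinal);
		return true;
	}
};
```

Feeding the 12-character string "<a>hello</a>" with a chunk size of 5 results in three calls of 5, 5, and 2 characters, and only the third call has fIsFinal set to true.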

bool ParseSomeXML (LPCTSTR pszXMLText)
{
	CMyXML sParser;
	if (!sParser .Create ())
		return false;

	// Send this simple string to the parser
	return sParser .Parse (pszXMLText);
}

Parsing Using Internal Buffers

To reduce the number of extra memory copies, buffers internal to the Expat parser can be used instead of passing data into the parser just to have the Expat parser copy the data to internal buffers.  Using internal buffers takes three steps: requesting a buffer, reading data into the buffer, and submitting the data to the parser. 

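bool ParseSomeXML (LPCSTR pszFileName)
{
	const int nBufferSize = 256;

	// Create the parser 
	CMyXML sParser;
	if (!sParser .Create ())
		return false;

	// Open the file
	FILE *fp = fopen (pszFileName, "r");
	if (fp == NULL)
		return false;

	// Loop until the last block has been submitted
	bool fSuccess = true;
	bool fDone = false;
	while (!fDone && fSuccess)
	{
		LPSTR pszBuffer = (LPSTR) sParser .GetBuffer (nBufferSize); // REQUEST
		if (pszBuffer == NULL)
			fSuccess = false;
		else
		{
			int nLength = (int) fread (pszBuffer, 1, nBufferSize, fp); // READ
			// A short read means end of file (or a read error), so this
			// is the final block; testing only for nLength == 0 would
			// miss a final block that is partly filled
			fDone = nLength < nBufferSize;
			fSuccess = sParser .ParseBuffer (nLength, fDone); // PARSE
		}
	}

	// Close the file
	fclose (fp);
	return fSuccess;
}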

As you can see, this method is more complicated than the other, but if you modify the example in the previous section to read from a file, the difference in complexity is minimal.

Working With Event Routines

Event routines provide the actual information about what has been parsed to the application.  The method names inside the CExpatImpl <class _T> class have been selected to make it easy to know which routine applies to what Expat event.

In Expat:

Set the event handler routine: XML_Set[Event Name]Handler
Name of the event handler: Application specific

In CExpatImpl <class _T>:

Enable the event handler routine: Enable[Event Name]Handler
Name of the event handler: On[Event Name]
Name of the internal event handler: [Event Name]Handler

So, if you wish to receive StartElement events, you define a method called OnStartElement with the proper arguments and invoke EnableStartElementHandler with a true for the only argument.  The event routine can be later disabled by invoking EnableStartElementHandler again with false as the only argument.
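The three names can be seen working together in the following sketch.  It does not use Expat at all: SFakeParser and FakeParserSeesStartTag are a hypothetical stand-in for a C API that calls back through a function pointer with a user data pointer, and CFakeWrapperImpl and CCounter are hypothetical classes that follow the same naming scheme.  Whether the real CExpatImpl is implemented exactly this way is an assumption; the sketch only illustrates the pattern of a static internal handler forwarding to an On... method.

```cpp
#include <cassert>

// --- Stand-in for a C parser API (hypothetical, not Expat itself) ---
typedef void (*StartHandlerFn) (void *pvUserData, const char *pszName);

struct SFakeParser
{
	void *pvUserData;
	StartHandlerFn pfnStart;
};

// What the stand-in "parser" does when it sees a start tag
void FakeParserSeesStartTag (SFakeParser &sParser, const char *pszName)
{
	if (sParser .pfnStart != nullptr)
		sParser .pfnStart (sParser .pvUserData, pszName);
}

// --- Wrapper following the Enable... / On... / ...Handler naming ---
template <class _T>
class CFakeWrapperImpl
{
public:
	CFakeWrapperImpl ()
	{
		m_sParser .pvUserData = this;
		m_sParser .pfnStart = nullptr;
	}

	// Enable[Event Name]Handler: install or remove the internal handler
	void EnableStartElementHandler (bool fEnable = true)
	{
		m_sParser .pfnStart = fEnable ? &StartElementHandler : nullptr;
	}

	SFakeParser m_sParser;

protected:
	// [Event Name]Handler: static routine the C API can call; it recovers
	// the object from the user data and forwards to On[Event Name]
	static void StartElementHandler (void *pvUserData, const char *pszName)
	{
		CFakeWrapperImpl <_T> *pBase = static_cast <CFakeWrapperImpl <_T> *> (pvUserData);
		static_cast <_T *> (pBase) ->OnStartElement (pszName);
	}
};

class CCounter : public CFakeWrapperImpl <CCounter>
{
public:
	CCounter () : m_nStarts (0)
	{
	}

	// On[Event Name]: the application's event routine
	void OnStartElement (const char *)
	{
		++m_nStarts;
	}

	int m_nStarts;
};
```

Until EnableStartElementHandler is invoked, start tags seen by the stand-in parser leave m_nStarts at zero; after enabling, each start tag increments it, and invoking EnableStartElementHandler (false) stops the events again.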

The specifics of each of the event routines are beyond the scope of this document.  For more information about the events and the Expat parser itself, see the Expat documentation.  Almost everything described there has a counterpart of the same name in CExpatImpl <class _T>.

Implementation Notes

As stated earlier, there are some pitfalls applications will have to be aware of when creating complex derived class hierarchies.  Let us consider the example of an XML parser consisting of two classes, CMyXMLBase and CMyXML.  CMyXML is derived from CMyXMLBase, and CMyXMLBase is derived from one of the Expat class wrappers.

Consider the case where the classes are derived from the CExpatImpl <class _T> template class.

class CMyXMLBase : public CExpatImpl <CMyXMLBase> 
{
public:
	CMyXMLBase () {}

	void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs) 
	{
		// do useful stuff here... 
	}
};

class CMyXML : public CMyXMLBase
{
public:
	CMyXML () {}

	void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs) 
	{
		// do derived useful stuff here...
	}
};

In this case, the programmer expects CMyXML::OnStartElement to be invoked by the Expat parser.  However, due to the design of the CExpatImpl <class _T> class, only the methods of the class specified in the template argument list (here, CMyXMLBase) are invoked.  This is by design.
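The effect can be reproduced without Expat.  In the sketch below, CFakeImpl is a hypothetical stand-in for a template wrapper that dispatches to the class named in its template argument; CBaseParser and CDerivedParser play the roles of CMyXMLBase and CMyXML.  None of these names come from the wrappers themselves.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a template wrapper such as CExpatImpl <class _T>
template <class _T>
class CFakeImpl
{
public:
	// Pretend the parser saw a start tag; dispatch to the class named in _T
	void FireStartElement ()
	{
		static_cast <_T *> (this) ->OnStartElement ();
	}
};

// Plays the role of CMyXMLBase: names ITSELF as the template argument
class CBaseParser : public CFakeImpl <CBaseParser>
{
public:
	void OnStartElement ()
	{
		m_strCalled = "base";
	}

	std::string m_strCalled;
};

// Plays the role of CMyXML: its OnStartElement hides the base version,
// but CFakeImpl <CBaseParser> knows nothing about this class
class CDerivedParser : public CBaseParser
{
public:
	void OnStartElement ()
	{
		m_strCalled = "derived";
	}
};
```

Invoking FireStartElement on a CDerivedParser object leaves m_strCalled set to "base": the event reaches CBaseParser::OnStartElement and never the derived version.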

There are three different ways to fix this problem.  The first method would be to declare OnStartElement as being virtual in CMyXMLBase.  The second would be to derive CMyXMLBase from CExpat instead of CExpatImpl <class _T>.  The third method requires changing CMyXMLBase from a normal class to a template.  This change provides CExpatImpl <class _T> with the name of the class from which to locate the event routines.

template <class _T>
class CMyXMLBase : public CExpatImpl <_T> 
{
public:
	CMyXMLBase () {}

	void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs) 
	{
		// do useful stuff here... 
	}
};

class CMyXML : public CMyXMLBase <CMyXML>
{
public:
	CMyXML () {}

	void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs) 
	{
		// do derived useful stuff here...
	}
};
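A self-contained sketch of this third approach is shown below, again without Expat.  CDispatchImpl is a hypothetical stand-in for the template wrapper, and CSharedBase and CFinalParser are hypothetical names for the intermediate and final classes.  The derived handler here also calls the intermediate class's handler explicitly, which is one possible way to keep the shared behaviour; whether to do so is a design choice for the application.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a template wrapper such as CExpatImpl <class _T>
template <class _T>
class CDispatchImpl
{
public:
	// Pretend the parser saw a start tag; dispatch to the class named in _T
	void FireStartElement ()
	{
		static_cast <_T *> (this) ->OnStartElement ();
	}
};

// The intermediate class is now a template and passes _T straight through,
// so the wrapper learns the name of the most derived class
template <class _T>
class CSharedBase : public CDispatchImpl <_T>
{
public:
	void OnStartElement ()
	{
		m_strCalled = "base";
	}

	std::string m_strCalled;
};

class CFinalParser : public CSharedBase <CFinalParser>
{
public:
	void OnStartElement ()
	{
		// Optionally run the shared handler first, then add to it
		CSharedBase <CFinalParser>::OnStartElement ();
		m_strCalled += "+derived";
	}
};
```

Invoking FireStartElement on a CFinalParser object now reaches CFinalParser::OnStartElement, leaving m_strCalled set to "base+derived".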


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Tim Smith
Web Developer
Canada
Tim has been a professional programmer for way too long. He currently works in the tools department at BioWare Corp.


Comments and Discussions

Conflict between CExpatImpl and other code
kdd29@msstate.edu, 12-Apr-04 10:12
I have derived a class from CExpatImpl,
and I need to store a pointer in a Ulong
(unsigned long) class object as a data
member of my derived class.

When I try to do this, I can compile
my program, but during linking I encounter
many "undefined reference to: (Ulong related
methods)", when the methods are indeed defined.

The problem ONLY occurs when trying to instantiate
a Ulong object within the CExpatImpl derived
class. Not when instantiating the Ulong object
somewhere else in the same file and scope as where
a CExpatImpl derived object exists, and not when all
the header files are included.

Do you have any idea why I can use Ulong and
CExpatImpl in the same file, but can't use
Ulong inside of CExpatImpl? Or why trying to do
so makes the linker unable to find select bits of
code in the other libraries I'm linking with?

- Kyle Duncan

Last Updated 18 Feb 2002
Article Copyright 2002 by Tim Smith