Introduction

The Expat XML Parser is a fine and widely used event-based XML parser. One of the nicer features of Expat is that its API can be used by C programs. Even though many programmers use Expat in a C++ environment, the C-based API makes it easy to export the API from a DLL. However, the fact that Expat has a C-based API doesn't mean we have to live without our C++ classes. Luckily, Expat was designed so that it can be augmented with classes.

(Definition: Event-Based XML Parser - An XML parser that invokes methods (a.k.a. events) as XML constructs are parsed. This differs from DOM (Document Object Model) style parsers, which parse the XML and then present the application with the XML data in its logical hierarchical form.)

Design Rationale

The primary considerations when designing the Expat wrapper classes were completeness, simplicity, and extensibility. For completeness, almost all Expat API routines have been wrapped in the classes. This includes even API routines such as …

Basics

The Expat wrappers consist of two classes, a template-based class (CExpatImpl) and … The following table illustrates the relationship between the API and the two classes.
The template class … Within reason, the two classes are interchangeable. If you have a class that is derived from … For the rest of this document, only the CExpatImpl class is used.

Getting Started

The first step in using CExpatImpl is to derive a class from it. As a starting point, let us define an XML parser that will display when an element begins, when it ends, and the data contained within the element.

#include <stdio.h> // printf

class CMyXML : public CExpatImpl <CMyXML>
{
public:
    // Constructor
    CMyXML ()
    {
    }

    // Invoked by CExpatImpl after the parser is created
    void OnPostCreate ()
    {
        // Enable all the event routines we want
        EnableStartElementHandler ();
        EnableEndElementHandler ();
        // Note: EnableElementHandler will do both start and end
        EnableCharacterDataHandler ();
    }

    // Start element handler
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        printf ("We got a start element %s\n", pszName);
        return;
    }

    // End element handler
    void OnEndElement (const XML_Char *pszName)
    {
        printf ("We got an end element %s\n", pszName);
        return;
    }

    // Character data handler
    void OnCharacterData (const XML_Char *pszData, int nLength)
    {
        // note, pszData is NOT null terminated
        printf ("We got %d bytes of data\n", nLength);
        return;
    }
};
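
Because the character data handler receives a pointer and a length rather than a null-terminated string, and because Expat may deliver the text of a single element in several consecutive calls, a handler that needs an element's complete text usually accumulates it. The following is a minimal sketch of that idea, assuming XML_Char is a plain char (Expat's default when XML_UNICODE is not defined). CTextCollector is a hypothetical class, and its handlers are called by hand rather than by a real parser so the sketch compiles without Expat.

```cpp
#include <string>

// Assumption: XML_Char is plain char, as in a default (non-XML_UNICODE)
// Expat build. Defined here only so the sketch compiles without Expat.
typedef char XML_Char;

// Hypothetical collector. In a real parser these methods would live in a
// class derived from CExpatImpl and be invoked by the parser. This simple
// version is meant for leaf elements (elements with no child elements).
class CTextCollector
{
public:
    void OnStartElement (const XML_Char * /*pszName*/, const XML_Char ** /*papszAttrs*/)
    {
        // Start a fresh buffer for each element
        m_strText .clear ();
    }

    void OnCharacterData (const XML_Char *pszData, int nLength)
    {
        // pszData is NOT null terminated: copy exactly nLength characters
        m_strText .append (pszData, nLength);
    }

    void OnEndElement (const XML_Char *pszName)
    {
        // The element's complete text is now available
        m_strLastName = pszName;
        m_strLastText = m_strText;
    }

    std::string m_strText;
    std::string m_strLastName;
    std::string m_strLastText;
};
```

With this approach, two calls such as OnCharacterData ("Hello", 5) and OnCharacterData (", world", 7) between the start and end events leave m_strLastText holding "Hello, world" once OnEndElement runs.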

Creating a Parser

Now that we have a derived class, we can use it to create an Expat parser. Creating the parser is very easy: first create an instance of the parser class, then invoke the Create method. … For example, if in the XML document there was the name …

bool ParseSomeXML (LPCTSTR pszXMLText)
{
    CMyXML sParser;
    if (!sParser .Create ())
        return false;
    // do something useful
    return true;
}

Parsing a Simple Text String

Next, we actually need to send the XML document to the parser. There are two ways of sending the document to the XML parser: directly, or by way of internal buffers. The easier of the two is sending the data directly to the parser, although it is also a bit slower. To send a simple string to the parser, the application invokes the Parse method.

bool ParseSomeXML (LPCTSTR pszXMLText)
{
    CMyXML sParser;
    if (!sParser .Create ())
        return false;
    // Send this simple string to the parser
    return sParser .Parse (pszXMLText);
}

Parsing Using Internal Buffers

To reduce the number of extra memory copies, buffers internal to the Expat parser can be used instead of passing data to the parser only to have Expat copy it into its internal buffers. Using internal buffers takes three steps: requesting a buffer, reading data into the buffer, and submitting the data to the parser.

bool ParseSomeXML (LPCSTR pszFileName)
{
    // Create the parser
    CMyXML sParser;
    if (!sParser .Create ())
        return false;

    // Open the file in binary mode so the parser sees the raw bytes
    FILE *fp = fopen (pszFileName, "rb");
    if (fp == NULL)
        return false;

    // Loop until the final block has been submitted or an error occurs
    bool fSuccess = true;
    bool fIsFinal = false;
    while (fSuccess && !fIsFinal)
    {
        LPSTR pszBuffer = (LPSTR) sParser .GetBuffer (256); // REQUEST
        if (pszBuffer == NULL)
            fSuccess = false;
        else
        {
            int nLength = (int) fread (pszBuffer, 1, 256, fp); // READ
            // A short read means end of file (or a read error), so this
            // block must be submitted as the final one
            fIsFinal = nLength < 256;
            if (fIsFinal && ferror (fp))
                fSuccess = false;
            else
                fSuccess = sParser .ParseBuffer (nLength, fIsFinal); // PARSE
        }
    }

    // Close the file
    fclose (fp);
    return fSuccess;
}

As you can see, this method is more complicated than the other, but once the example in the previous section is modified to read a file, the difference in complexity is minimal.

Working With Event Routines

Event routines provide the application with the actual information about what has been parsed. The method names inside the …

In Expat: …

In CExpatImpl <class _T>: …

So, if you wish to receive StartElement events, you define a method called OnStartElement and enable it with EnableStartElementHandler, as in the OnPostCreate method of the Getting Started example.
The specifics of each event routine are beyond the scope of this document. For more information about the events and the Expat parser itself, see http://www.xml.com/pub/a/1999/09/expat/index.html.
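
The pairing of an Enable method with an OnXxx method can be illustrated with a small self-contained sketch of the general technique. A C-style parser can only call plain function pointers, passing back an opaque user-data pointer (real Expat provides XML_SetUserData for this), so a template base class registers a static "trampoline" that casts the pointer back to the derived type and calls its OnXxx method. Everything below (MiniParser, MiniFireStart, CDispatchImpl, CNameLogger) is hypothetical, a stand-in so the example compiles without Expat; it is not the actual CExpatImpl source.

```cpp
#include <string>
#include <vector>

// ---- Hypothetical stand-in for a C callback API shaped like Expat's ----
typedef char XML_Char; // assumption: default (non-Unicode) build
typedef void (*StartElementFn) (void *pUserData, const XML_Char *pszName,
                                const XML_Char **papszAttrs);

struct MiniParser
{
    void *pUserData = nullptr;
    StartElementFn pfnStart = nullptr;
};

// Fire a start-element event the way a C parser would: through a plain
// function pointer plus the opaque user-data pointer.
inline void MiniFireStart (MiniParser &sParser, const XML_Char *pszName)
{
    if (sParser .pfnStart != nullptr)
        sParser .pfnStart (sParser .pUserData, pszName, nullptr);
}

// ---- Hypothetical template wrapper routing callbacks to OnXxx methods ----
template <class T>
class CDispatchImpl
{
public:
    bool Create ()
    {
        m_sParser .pUserData = this; // plays the role of XML_SetUserData
        static_cast <T *> (this) ->OnPostCreate ();
        return true;
    }

    void EnableStartElementHandler (bool fEnable = true)
    {
        m_sParser .pfnStart = fEnable ? &CDispatchImpl::StartElementHandler : nullptr;
    }

    // Defaults; a derived class hides them with its own versions
    void OnPostCreate () {}
    void OnStartElement (const XML_Char *, const XML_Char **) {}

    MiniParser m_sParser;

protected:
    // Static trampoline: recover the object, then call the derived method
    static void StartElementHandler (void *pUserData, const XML_Char *pszName,
                                     const XML_Char **papszAttrs)
    {
        T *pThis = static_cast <T *> (static_cast <CDispatchImpl *> (pUserData));
        pThis ->OnStartElement (pszName, papszAttrs);
    }
};

// Example client, mirroring CMyXML from "Getting Started"
class CNameLogger : public CDispatchImpl <CNameLogger>
{
public:
    void OnPostCreate () { EnableStartElementHandler (); }
    void OnStartElement (const XML_Char *pszName, const XML_Char **)
    {
        m_vecNames .push_back (pszName);
    }
    std::vector <std::string> m_vecNames;
};
```

Because the derived type is a template parameter, the call from the trampoline to OnStartElement is resolved at compile time and no virtual functions are needed. After Create, firing "root" and then "child" through MiniFireStart leaves both names in m_vecNames, in that order.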
Almost all of the information contained within this document has a counterpart of the same name in …

Implementation Notes

As stated earlier, there are some pitfalls applications must be aware of when creating complex derived class hierarchies. Let us consider the example of an XML parser consisting of two classes, CMyXMLBase and CMyXML. Consider the case where the classes are derived as follows:

class CMyXMLBase : public CExpatImpl <CMyXMLBase>
{
public:
    CMyXMLBase ()
    {
    }
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        // do useful stuff here...
        return;
    }
};

class CMyXML : public CMyXMLBase
{
public:
    CMyXML ()
    {
    }
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        // do derived useful stuff here...
        return;
    }
};
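
In this case, the programmer expects the OnStartElement method in CMyXML to be invoked. However, CExpatImpl was instantiated with CMyXMLBase, so it calls CMyXMLBase::OnStartElement and the derived version is never reached. There are three different ways to fix this problem. The first method would be to declare CMyXMLBase as a template class that passes its template parameter on to CExpatImpl:

template <class _T>
class CMyXMLBase : public CExpatImpl <_T>
{
public:
    CMyXMLBase ()
    {
    }
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        // do useful stuff here...
        return;
    }
};

class CMyXML : public CMyXMLBase <CMyXML>
{
public:
    CMyXML ()
    {
    }
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        // do derived useful stuff here...
        return;
    }
};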
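
The difference between the two hierarchies above can be observed in a small self-contained sketch. CDispatchBase is a hypothetical stand-in for the compile-time dispatch a template wrapper performs (it is not the CExpatImpl source), and the other class names are likewise illustrative. Each handler records which version ran, so firing an event shows which OnStartElement the dispatcher actually reached.

```cpp
#include <string>

typedef char XML_Char; // assumption: default (non-Unicode) Expat build

// Hypothetical stand-in for the dispatch inside a template wrapper: the
// event is routed to T::OnStartElement, with T fixed at compile time.
template <class T>
class CDispatchBase
{
public:
    void FireStartElement (const XML_Char *pszName)
    {
        static_cast <T *> (this) ->OnStartElement (pszName, nullptr);
    }
};

// The pitfall: T is fixed to CBrokenBase, so CBrokenDerived's handler is
// never reached through the dispatcher.
class CBrokenBase : public CDispatchBase <CBrokenBase>
{
public:
    void OnStartElement (const XML_Char *, const XML_Char **)
    {
        m_strHandler = "base";
    }
    std::string m_strHandler;
};

class CBrokenDerived : public CBrokenBase
{
public:
    void OnStartElement (const XML_Char *, const XML_Char **)
    {
        m_strHandler = "derived";
    }
};

// The template fix: the intermediate class forwards the most-derived type
// to the dispatcher.
template <class T>
class CFixedBase : public CDispatchBase <T>
{
public:
    void OnStartElement (const XML_Char *, const XML_Char **)
    {
        m_strHandler = "base";
    }
    std::string m_strHandler;
};

class CFixedDerived : public CFixedBase <CFixedDerived>
{
public:
    void OnStartElement (const XML_Char *pszName, const XML_Char **papszAttrs)
    {
        // Chain to the base class's handler first, then add derived work
        CFixedBase <CFixedDerived>::OnStartElement (pszName, papszAttrs);
        m_strHandler += "+derived";
    }
};
```

Firing an event on a CBrokenDerived object leaves m_strHandler set to "base", while the same call on a CFixedDerived object yields "base+derived": the derived handler runs and can still reuse the base class's work by calling it explicitly.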

About the Author

Tim has been a professional programmer for way too long. He currently works at a company he co-founded that specializes in data acquisition software for industrial automation.