Generate an XML parser automatically

31 May 2004 · 3 min read
An article on generating XML parser code automatically.

Introduction

Your first thought about this article may be: "Oh, another tool for parsing XML, like MSXML." In fact, this article is built on MSXML. What I present is not a general XML parser, but a generator that creates a specific XML parser. My purpose is not to teach grammar parsing techniques, but to share an idea about automatic code generation, using an XML parser generator as the example. An XML parser may be of no use in your own programming area, but that does not matter: if you come away with a fresh idea at the end of the article, it will help in your future programming life.

Background

Faced with plenty of XML files, I have to write plenty of code to retrieve information from them. Even with the MSXML SDK and XPath, the work is hard. It is boring and error-prone to write code like the following:

C++
IXMLDOMNodePtr psNode = 
  m_pXmlDoc->selectSingleNode(_T("/rss/channel/description"));
psNode = m_pXmlDoc->selectSingleNode(_T("/rss/channel/language"));
...

The sample XML snippet is given below:

XML
<rss version="0.91">
 <channel>
   <description>XML.com features a rich mix of information 
       and services for the XML community.</description> 
   <language>en-us</language> 
   <item>
     <title>Normalizing XML, Part 2</title> 
   </item>
   <item>
     <title>The .NET Schema Object Model</title> 
   </item>
 </channel>
</rss>

The problem is that to get each XML node's value, I have to write out its XPath by hand. Every kind of XML file needs its own parser, which is very simple but quite boring to implement.

The solution - Auto Code Generation

As a programmer, I am both passionate and lazy. I am too lazy to write a single line of the boring code above, but passionate enough to figure out a way to generate it automatically. Here is my solution:

  • Write an algorithm to generate XPath expressions from XML files.
  • Write a template parser.
  • Fill the XPath expressions into the template parser.

Since the XML file's schema is often not at hand, it is difficult to decide whether an XML node (e.g., /rss/channel/item) belongs to a structure. The current solution is that if a node occurs more than once, it is treated as a structure; otherwise, it is treated as a single node.
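The occurrence rule can be sketched without MSXML. The following is a minimal illustration, not the generator's actual code: `CountElementPaths` and `IsStructure` are hypothetical names, and the tiny tag scanner assumes well-formed XML with no `>` inside comments, CDATA sections or attribute values.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: walks XML text and counts how often each element
// path occurs. Attributes are ignored; declarations, comments and
// processing instructions are skipped.
std::map<std::string, int> CountElementPaths(const std::string& xml) {
    std::map<std::string, int> counts;
    std::vector<std::string> stack;  // names of the currently open elements
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) break;  // truncated input
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;
        if (tag[0] == '/') {  // closing tag: leave the current element
            if (!stack.empty()) stack.pop_back();
            continue;
        }
        bool selfClosing = tag.back() == '/';
        std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        stack.push_back(name);
        std::string path;  // e.g. "/rss/channel/item"
        for (const std::string& part : stack) path += "/" + part;
        ++counts[path];
        if (selfClosing) stack.pop_back();
    }
    return counts;
}

// The rule from the text: more than one occurrence means "structure".
bool IsStructure(int occurrences) { return occurrences > 1; }
```

For the RSS sample above, /rss/channel/item is counted twice and so is classified as a structure, while /rss/channel/language is counted once. Note that descendants of a repeated element (such as /rss/channel/item/title) are also counted more than once; a real generator would presumably treat them as fields of the enclosing structure.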

Since my language is C++, each generated parser puts the structure nodes' values into an STL vector, while a single node's value is retrieved through a generated enum type. The programming language of the parser is not important; you can modify the generator to produce parsers in VB, Java, C#, or whatever you like :-)
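To make the "fill XPath into the template" step concrete, here is a rough sketch of how a generator might emit the enum for single nodes together with a matching XPath table. The naming scheme and the functions `EnumNameFromXPath` and `GenerateSingleNodeTable` are assumptions made for illustration; the downloadable generator may lay out its output differently.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Turns "/rss/channel/language" into "RSS_CHANNEL_LANGUAGE"
// (a hypothetical naming scheme).
std::string EnumNameFromXPath(const std::string& xpath) {
    std::string name;
    for (char ch : xpath) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            name += static_cast<char>(std::toupper(c));
        } else if (!name.empty() && name.back() != '_') {
            name += '_';  // any separator becomes a single underscore
        }
    }
    return name;
}

// Emits C++ source text: one enum value per single node, plus an array
// that maps each enum value to its XPath. Assumes a non-empty list.
std::string GenerateSingleNodeTable(const std::vector<std::string>& xpaths) {
    std::string out = "enum SingleNode {\n";
    for (const std::string& p : xpaths) {
        out += "    " + EnumNameFromXPath(p) + ",\n";
    }
    out += "    SINGLE_NODE_COUNT\n};\n\n";
    out += "static const char* const kSingleNodeXPath[] = {\n";
    for (const std::string& p : xpaths) {
        out += "    \"" + p + "\",\n";
    }
    out += "};\n";
    return out;
}
```

A generated parser could then look up kSingleNodeXPath[RSS_CHANNEL_LANGUAGE] and pass it to selectSingleNode, so the hand-written XPath strings from the Background section disappear from user code.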

Points of Interest

As I said at the beginning, what I present is not only code but, more importantly, an idea (automatic code generation) that saves your energy and makes programming life easier and more exciting. If you have had a similar experience (not limited to XML), please feel free to contact me; it will be my pleasure to offer my advice. As the saying goes: "You have an apple, I have an apple; if we exchange, then we still have one each. You have an idea, I have an idea; if we exchange, then we will have two each!"

History

The algorithm that decides whether a node is a structure was revised to handle cases such as "/items/item": even though such a node may appear only once in an XML file, I think it is more like a structure than a single node.
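The article does not spell out the revised rule, so the following is only one possible reading, stated as an assumption: a node also counts as a structure when its parent's name is its own name plus "s". The names `LooksLikeCollectionItem` and `IsStructureNode` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Assumed heuristic (not the author's confirmed rule): true for paths like
// "/items/item", where the parent element is named as the child's plural.
bool LooksLikeCollectionItem(const std::string& xpath) {
    std::size_t lastSlash = xpath.rfind('/');
    if (lastSlash == std::string::npos || lastSlash == 0) return false;
    std::size_t parentSlash = xpath.rfind('/', lastSlash - 1);
    if (parentSlash == std::string::npos) return false;
    std::string parent =
        xpath.substr(parentSlash + 1, lastSlash - parentSlash - 1);
    std::string child = xpath.substr(lastSlash + 1);
    return !child.empty() && parent == child + "s";
}

// Combines the original occurrence rule with the assumed naming heuristic.
bool IsStructureNode(const std::string& xpath, int occurrences) {
    return occurrences > 1 || LooksLikeCollectionItem(xpath);
}
```

With this sketch, "/items/item" is treated as a structure even when it occurs once, while "/rss/channel/language" with a single occurrence stays a single node.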

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By
Software Developer (Senior)
China
Albert Wang graduated from Shanghai Jiaotong Univ. with a master's degree in Computer Science.
He introduces himself as follows: "As a programmer, I am both passionate and lazy." He likes travel and piano. You can contact him at chunguolan@gmail.com
