First Posted 31 May 2004

Generate an XML parser automatically

By Albert Wang | 31 May 2004 | Article
An article on automatically generating XML parser code.

Introduction

Your first thought about this article may be: "Oh, another tool for parsing XML, like MSXML." In fact, this article is built on MSXML. What I present is not a general XML parser, but a generator that creates a specific XML parser. My purpose is not to teach grammar-parsing techniques, but to share the idea of automatic code generation, using an XML parser generator as the example. An XML parser may be of no use in your own area of programming, but that does not matter: if you come away from this article with a fresh idea, it may still help in your future programming life.

Background

Faced with plenty of XML files, I have to write plenty of code to retrieve information from them. Even with the MSXML SDK and XPath, the work is hard. It is boring and error-prone to write code like the following:

// Each value needs its own hand-written XPath query.
IXMLDOMNodePtr psNode =
  m_pXmlDoc->selectSingleNode(_T("/rss/channel/description"));
psNode = m_pXmlDoc->selectSingleNode(_T("/rss/channel/language"));
...

The sample XML snippet is given below:

<rss version="0.91">
 <channel>
   <description>XML.com features a rich mix of information 
       and services for the XML community.</description> 
   <language>en-us</language> 
   <item>
     <title>Normalizing XML, Part 2</title> 
   </item>
   <item>
     <title>The .NET Schema Object Model</title> 
   </item>
 </channel>
</rss>

The problem is that to get each XML node's value, I have to write out its XPath by hand. Every kind of XML file needs its own parser, which is very simple but quite boring to implement.

The solution - Auto Code Generation

As a programmer, I am both passionate and lazy. I am too lazy to write a single line of the boring code above, but passionate enough to figure out a way to generate it automatically. Here is my solution:

  • Write an algorithm to generate the XPaths from XML files.
  • Write a template parser.
  • Fill the XPaths into the template parser.
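
The third step can be sketched as a plain string substitution. This is only a minimal illustration, not the generator's actual code: the placeholder token `$XPATH_LIST$`, the template text, and the function name `FillTemplate` are all assumptions made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical placeholder token; the article does not name the one its generator uses.
const std::string kPlaceholder = "$XPATH_LIST$";

// Hypothetical template: a parser skeleton with one slot for the generated XPath table.
const std::string kParserTemplate =
    "static const TCHAR* g_xpaths[] = {\n"
    "$XPATH_LIST$"
    "};\n";

// Replace the first occurrence of the placeholder with one quoted line per XPath.
std::string FillTemplate(const std::string& tmpl,
                         const std::vector<std::string>& xpaths)
{
    std::string entries;
    for (const std::string& xpath : xpaths)
        entries += "    _T(\"" + xpath + "\"),\n";

    std::string result = tmpl;
    const std::string::size_type pos = result.find(kPlaceholder);
    if (pos != std::string::npos)
        result.replace(pos, kPlaceholder.size(), entries);
    return result;
}
```

Calling `FillTemplate(kParserTemplate, {"/rss/channel/description", "/rss/channel/language"})` returns the skeleton with both paths written into the table as `_T("...")` entries.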

Since an XML file's schema is often not at hand, it is difficult to tell whether an XML node (e.g., /rss/channel/item) belongs to a structure. The current solution is that a node that occurs more than once is treated as a structure; otherwise it is treated as a single node.
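
A rough sketch of the path generation and the occurrence rule follows. It makes several assumptions: the `Element` struct is a hypothetical in-memory stand-in for the MSXML DOM, all function names are invented for illustration, and "occurrence" is read as the number of same-named siblings under one parent, which is only one possible interpretation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element; a real generator would walk the MSXML DOM instead.
struct Element
{
    std::string name;
    std::vector<Element> children;
};

// For every child path, record the largest number of same-named siblings
// seen under a single parent (assumed meaning of "occurrence").
void CollectPaths(const Element& node, const std::string& path,
                  std::map<std::string, int>& maxSiblings)
{
    std::map<std::string, int> perName;
    for (const Element& child : node.children)
        ++perName[child.name];
    for (const auto& entry : perName) {
        int& best = maxSiblings[path + "/" + entry.first];
        if (entry.second > best)
            best = entry.second;
    }
    for (const Element& child : node.children)
        CollectPaths(child, path + "/" + child.name, maxSiblings);
}

// Returns every distinct path in the tree with its occurrence count.
std::map<std::string, int> GeneratePaths(const Element& root)
{
    std::map<std::string, int> result;
    const std::string rootPath = "/" + root.name;
    result[rootPath] = 1;  // a document has exactly one root element
    CollectPaths(root, rootPath, result);
    return result;
}

// The rule described above: more than one occurrence means a structure.
bool IsStructure(const std::map<std::string, int>& paths, const std::string& path)
{
    const auto it = paths.find(path);
    return it != paths.end() && it->second > 1;
}

// Builds the element tree of the RSS sample shown earlier (text values omitted).
Element MakeRssSample()
{
    Element item1{"item", {Element{"title", {}}}};
    Element item2{"item", {Element{"title", {}}}};
    Element channel{"channel",
                    {Element{"description", {}}, Element{"language", {}}, item1, item2}};
    return Element{"rss", {channel}};
}
```

For the RSS sample, /rss/channel/item appears twice under the one channel element, so it is classified as a structure, while /rss/channel/language and /rss/channel/item/title are classified as single nodes.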

Since my language is C++, each generated parser puts the structure nodes' values into an STL vector, while a single node's value is retrieved through a defined enum type. The programming language of the parser is not important; you can modify the generator to produce parsers in VB, Java, C#, or whatever you like :-)
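
To make that layout concrete, here is a guess at what a generated parser for the RSS sample might look like. The class name `RssParser` and all of its members are hypothetical and are not taken from the download; the MSXML loading code is left out, so `Set` and `Items` stand in for the part that would call `selectSingleNode` with each stored XPath.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generator output for the RSS sample; names are illustrative only.
class RssParser
{
public:
    // One enumerator per single node, in the same order as the XPath table.
    enum Field { Description, Language, FieldCount };

    // One struct per structure node; repeated occurrences go into a vector.
    struct Item
    {
        std::string title;
    };

    // XPath the generator recorded for a single node.
    static const char* XPathOf(Field f)
    {
        static const char* const kXPaths[FieldCount] = {
            "/rss/channel/description",
            "/rss/channel/language"
        };
        assert(f < FieldCount);
        return kXPaths[f];
    }

    // Single-node values are read and written through the enum.
    const std::string& Get(Field f) const
    {
        assert(f < FieldCount);
        return m_values[f];
    }

    void Set(Field f, const std::string& value)
    {
        assert(f < FieldCount);
        m_values[f] = value;
    }

    // Structure-node values are exposed as an STL vector.
    std::vector<Item>& Items() { return m_items; }
    const std::vector<Item>& Items() const { return m_items; }

private:
    std::string m_values[FieldCount];
    std::vector<Item> m_items;
};
```

A caller would then write `parser.Get(RssParser::Language)` instead of repeating the XPath string, and iterate over `parser.Items()` for the repeated item nodes.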

Points of Interest

As I said at the beginning, what I present is not only code but, more importantly, an idea (Auto Code Generation) that can save your energy and make our programming life easier and more exciting. If you have had a similar experience (not limited to XML), please feel free to contact me; it will be my pleasure to offer my advice. As the saying goes: "You have an apple, I have an apple; if we exchange, then we still have one each. You have an idea, I have an idea; if we exchange, then we will have two each!"

History

The algorithm that judges whether a node is a structure was revised to handle cases such as "/items/item": although such a node may appear only once in an XML file, I think it is more like a structure than a single node.
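
The article does not say how the revised check works. One possible rule, shown here purely as an assumption, is to also treat a node as a structure when its parent's name is the node's own name plus "s". The function name and the plural test are hypothetical and may differ from what the download actually does.

```cpp
#include <cassert>
#include <string>

// Hypothetical rule, not taken from the article's source code: a node is a
// structure if it repeats, or if its parent's name is its own name plus "s"
// (as in /items/item).
bool LooksLikeStructure(const std::string& parentName,
                        const std::string& nodeName,
                        int occurrences)
{
    if (occurrences > 1)
        return true;
    return parentName == nodeName + "s";
}
```

With this rule, an item node under items counts as a structure even when it occurs once, while a language node under channel remains a single node.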

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Albert Wang

Software Developer (Senior)

China

Albert Wang graduated from Shanghai Jiaotong Univ. with a master's degree in Computer Science.
He introduces himself as follows: "As a programmer, I am both passionate and lazy." He likes travel and piano. You can contact him at chunguolan@gmail.com

Comments and Discussions

 
A sample project (Vinay Patil, 22:17 28 Oct '06)
About the Common.h (Albert Wang, 16:09 8 Jun '04)
It is a good idea! (Michael Ning, 5:53 8 Jun '04)
Re: It is a good idea! (Albert Wang, 16:45 8 Jun '04)
Some changes I needed to make to get it to compile... (Jubjub, 13:52 1 Jun '04)
Re: Some changes I needed to make to get it to compile... (Albert Wang, 14:57 1 Jun '04)
I love this stuff :) (mvicky, 3:16 1 Jun '04)
Yes, it needs MSXML 4.0, here it is (Albert Wang, 2:27 1 Jun '04)
Re: Yes, it needs MSXML 4.0, here it is (LordAhriman, 7:00 1 Sep '05)
  oooopphhh... I've already ported it to the old one :-(
  It's not so easy, but now it works with old MSXML.
Error (Gilad Novik, 0:37 1 Jun '04)
Another error (Gilad Novik, 0:42 1 Jun '04)
Re: Another error (Albert Wang, 2:39 1 Jun '04)

Last Updated 1 Jun 2004
Article Copyright 2004 by Albert Wang