
Generate an XML parser automatically

By Albert Wang

An article on automatically generating XML parser code.
XML, VC6, VC7, VC7.1, .NET 1.0, .NET 1.1, Win2K, WinXP, Win2003, Visual Studio, STL, Dev
Posted: 31 May 2004

Introduction

Your first thought about this article may be: "Oh, another tool to parse XML, like MSXML." In fact, this article is built on MSXML. What I present is not a general XML parser but a generator that creates a parser for one specific kind of XML file. My purpose is not to teach grammar parsing techniques but to share an idea about automatic code generation, using an XML parser generator as the example. XML parsing may be of no use in your own area of programming, but that does not matter: if you come away from this article with a fresh idea, it will still help in your future programming life.

Background

Faced with plenty of XML files, I had to write plenty of code to retrieve information from them. Even with the MSXML SDK and XPath, the work is hard. It is boring and error-prone to write code like the following:

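// One hand-written XPath query per value; every new node needs more lines like these.
IXMLDOMNodePtr psNode = 
  m_pXmlDoc->selectSingleNode(_T("/rss/channel/description"));
_bstr_t description = psNode->text;  // psNode is NULL if the path does not match
psNode = m_pXmlDoc->selectSingleNode(_T("/rss/channel/language"));
_bstr_t language = psNode->text;
...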

The sample XML snippet is given below:

<rss version="0.91">
 <channel>
   <description>XML.com features a rich mix of information 
       and services for the XML community.</description> 
   <language>en-us</language> 
   <item>
     <title>Normalizing XML, Part 2</title> 
   </item>
   <item>
     <title>The .NET Schema Object Model</title> 
   </item>
 </channel>
</rss>

The problem is that for every XML node value I need, I have to write out its XPath by hand. Each kind of XML file needs its own parser, which is very simple but quite boring to implement.

The solution - Auto Code Generation

As a programmer, I am both passionate and lazy. I am too lazy to write a single line of the boring code above, but passionate enough to figure out a way to generate it automatically. Here is my solution:

  • Write an algorithm to generate XPaths from XML files.
  • Write a template parser.
  • Fill the XPaths into the template parser.
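
The three steps can be sketched in standard C++ as follows. This is only an illustration under stated assumptions, not the code from the download: Element is a hypothetical stand-in for an MSXML DOM node, and the template line with its @XPATH@ and @INDEX@ placeholders, as well as the ReadText and m_values names that appear in the generated text, are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element; the real generator reads MSXML nodes.
struct Element {
    std::string name;
    std::vector<Element> children;
};

// Step 1: walk the tree and count how often each absolute path occurs.
void CollectPaths(const Element& e, const std::string& parent,
                  std::map<std::string, int>& counts) {
    assert(!e.name.empty());
    const std::string path = parent + "/" + e.name;
    ++counts[path];
    for (const Element& child : e.children)
        CollectPaths(child, path, counts);
}

// Step 2: one line of a template parser with placeholders (hypothetical format).
const char kLineTemplate[] =
    "    m_values[@INDEX@] = ReadText(_T(\"@XPATH@\"));\n";

// Replaces every occurrence of 'from' in 'text' with 'to'.
std::string ReplaceAll(std::string text, const std::string& from,
                       const std::string& to) {
    for (std::string::size_type pos = text.find(from);
         pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
    return text;
}

// Step 3: fill each collected XPath into the template, one line per path.
std::string GenerateParserBody(const std::map<std::string, int>& counts) {
    std::string body;
    int index = 0;
    for (const auto& entry : counts) {
        std::string line = ReplaceAll(kLineTemplate, "@XPATH@", entry.first);
        body += ReplaceAll(line, "@INDEX@", std::to_string(index++));
    }
    return body;
}
```

For simplicity the sketch emits a line for every path, including container nodes such as /rss; the occurrence counts it collects are what a generator would use to decide which paths need different treatment.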

Since an XML file's schema is often not at hand, it is difficult to tell whether an XML node (e.g., /rss/channel/item) is a repeated structure. The current solution is: if a node's XPath occurs more than once, it is treated as a structure; otherwise it is treated as a single node.
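
A minimal sketch of this occurrence rule, assuming the XPaths have already been counted into a map. The NodeKind names are hypothetical, and treating everything below a structure as a field of that structure is my own assumption for the example; the article does not say how the generator handles such children.

```cpp
#include <cassert>
#include <map>
#include <string>

enum NodeKind { SingleNode, StructureNode, StructureField };

// counts maps each XPath to the number of times it occurred in the sample file.
std::map<std::string, NodeKind> ClassifyPaths(
        const std::map<std::string, int>& counts) {
    std::map<std::string, NodeKind> kinds;
    // std::map iterates in sorted order, so a parent path is always
    // visited before any of its children.
    for (const auto& entry : counts) {
        const std::string& path = entry.first;
        assert(entry.second >= 1);

        // Assumption: a path below an already known structure is one of
        // that structure's fields, whatever its own count is.
        bool insideStructure = false;
        for (const auto& seen : kinds) {
            const std::string prefix = seen.first + "/";
            if (seen.second == StructureNode &&
                path.compare(0, prefix.size(), prefix) == 0)
                insideStructure = true;
        }

        if (insideStructure)
            kinds[path] = StructureField;
        else if (entry.second > 1)      // the occurrence rule from the text
            kinds[path] = StructureNode;
        else
            kinds[path] = SingleNode;
    }
    return kinds;
}
```

With the RSS sample above, /rss/channel/item occurs twice and becomes a structure, its title becomes a field of that structure, and /rss/channel/language stays a single node.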

Since my language is C++, each generated parser puts the values of structure nodes into an STL vector, while the value of a single node is retrieved through a generated enum type. The programming language of the parser is not important; you can modify the generator to produce parsers in VB, Java, C#, or whatever you like :-)
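
To make the enum and vector idea concrete, here is a sketch of what a generated parser for the RSS sample might look like. All names (RssField, RssItem, RssParser, TextLookup) are hypothetical and not taken from the download. To keep the example free of MSXML, the XPath query is passed in as a callback and the number of items is passed in by the caller; a real generated parser would query the document for both.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Single nodes: one enum value per XPath.
enum RssField { RSS_CHANNEL_DESCRIPTION, RSS_CHANNEL_LANGUAGE, RSS_FIELD_COUNT };

const char* const kRssFieldXPaths[RSS_FIELD_COUNT] = {
    "/rss/channel/description",
    "/rss/channel/language",
};

// Structure node: each /rss/channel/item becomes one element of a vector.
struct RssItem {
    std::string title;
};

// Stand-in for an XPath text query such as selectSingleNode(...)->text.
typedef std::function<std::string(const std::string&)> TextLookup;

class RssParser {
public:
    void Load(const TextLookup& lookup, int itemCount) {
        for (int i = 0; i < RSS_FIELD_COUNT; ++i)
            m_values[i] = lookup(kRssFieldXPaths[i]);

        m_items.clear();
        for (int i = 1; i <= itemCount; ++i) {  // XPath positions are 1-based
            RssItem item;
            item.title = lookup("/rss/channel/item[" + std::to_string(i) + "]/title");
            m_items.push_back(item);
        }
    }

    // Single node values are retrieved by enum.
    const std::string& GetValue(RssField field) const {
        assert(field >= 0 && field < RSS_FIELD_COUNT);
        return m_values[field];
    }

    // Structure node values are retrieved as a vector.
    const std::vector<RssItem>& GetItems() const { return m_items; }

private:
    std::string m_values[RSS_FIELD_COUNT];
    std::vector<RssItem> m_items;
};
```

The caller then writes parser.GetValue(RSS_CHANNEL_LANGUAGE) or loops over parser.GetItems() and never types an XPath string by hand.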

Points of Interest

As I said at the beginning, what I present here is not only code but, more importantly, an idea (automatic code generation) that saves your energy and makes our programming life easier and more exciting. If you have had a similar experience (not limited to XML), please feel free to contact me; it will be my pleasure to offer my advice. As the saying goes: "You have an apple, I have an apple; if we exchange, we still have one each. You have an idea, I have an idea; if we exchange, we will have two each!"

History

The algorithm that judges whether a node is a structure was revised to handle cases such as "/items/item": even if it appears only once in an XML file, I think it is more like a structure than a single node.
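
The article does not state the revised rule exactly. One possible reading, sketched here purely as an assumption, is a naming check: a node whose parent's name is its own name plus "s" is treated as a structure even when it occurs only once. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// True when the last two path segments look like "<name>s/<name>",
// e.g. "/items/item". This naming rule is an assumption.
bool LooksLikeCollectionItem(const std::string& xpath) {
    const std::string::size_type last = xpath.rfind('/');
    if (last == std::string::npos || last == 0)
        return false;                       // no parent segment
    const std::string::size_type prev = xpath.rfind('/', last - 1);
    if (prev == std::string::npos)
        return false;
    const std::string parent = xpath.substr(prev + 1, last - prev - 1);
    const std::string child = xpath.substr(last + 1);
    return !child.empty() && parent == child + "s";
}

// Occurrence rule from the text, extended with the naming check above.
bool IsStructure(const std::string& xpath, int occurrences) {
    assert(occurrences >= 1);
    return occurrences > 1 || LooksLikeCollectionItem(xpath);
}
```

Under this reading, IsStructure("/items/item", 1) is true, while /rss/channel/language with a single occurrence remains a single node.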

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Albert Wang
Albert Wang graduated from Shanghai Jiaotong University with a master's degree in Computer Science.
He introduces himself as follows: "As a programmer, I am both passionate and lazy." He likes travel and piano. You can contact him at chunguolan@gmail.com
Occupation: Software Developer (Senior)
Location: China

Last Updated: 31 May 2004
Editor: Smitha Vijayan
Copyright 2004 by Albert Wang