
A simple XML Parser

Michael A. Barnhart, 28 Jan 2002 CPOL
A class to read and write non validated XML files


Introduction

Presented here is an element-centric class that supports reading and writing non-validated XML files. Although it is very close to a DOM-centric class (it does use the MSXML 4 DOM parser to read), it differs in that each element is an instance that may be used alone. This gives the programmer a great deal of flexibility and also responsibility. This class was created in 1995 to handle data transmittal for intranet applications. With the release of the XML specification, the structure was modified to conform to the non-validated format. The XML class includes two lists to hold the basic data: one for child XML elements and the other for attribute (formerly parameter) entries. The basic class is actually a root for an individual XML element. A proper XML document has a header line declaring that it is XML, with at least a version definition. Following that initial line is a single XML element that wraps the embedded data.

The class does have a Save/SaveAs function, and this can be used to create an XML document by using the class as the root element. My general practice has been to open a file, write the header line and the opening tag of the bounding element, and then use this file pointer and the XMLWrite function to write out the sequence of child elements. You then have to terminate the file with the closing tag for the bounding element.
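
The write sequence described above (header line, opening tag, child elements, closing tag) can be sketched in standard C++. This is only an illustration under stated assumptions: `Element`, `writeElement`, and `writeDocument` are hypothetical names, a `std::ostringstream` stands in for the MFC file pointer, and no escaping of special characters is attempted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one child element; not the article's CXML class.
struct Element {
    std::string tag;
    std::string text;
};

// Writes one element as <tag>text</tag>. No escaping is done in this sketch.
void writeElement(std::ostream& out, const Element& e) {
    out << '<' << e.tag << '>' << e.text << "</" << e.tag << ">\n";
}

// Header line, opening bounding tag, the children, then the closing tag.
std::string writeDocument(const std::string& rootTag,
                          const std::vector<Element>& children) {
    assert(!rootTag.empty());
    std::ostringstream out;
    out << "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n";
    out << '<' << rootTag << ">\n";
    for (const Element& child : children) {
        writeElement(out, child);  // plays the role of one XMLWrite call
    }
    out << "</" << rootTag << ">\n";
    return out.str();
}
```

The point of the sketch is the ordering: the hosting code owns the header and the bounding tags, and each child only writes itself.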

For me, making the element rather than the document the focal point of the code has given me flexibility in this class's usage that I have not seen elsewhere.

Because this code is very general purpose, it will be easy for some to use it outside of any paths that I have taken. If you find a bug, please drop me a line.

History

Jan 2002 - Migrated the V6 MFC String extension to V7. With this port the non-MFC code has been deleted. Also fixed a potential leak in the TextFile class.

March 2001 - This update fixes a couple of bugs (primarily leaks) and has a new MabString class. This new string class is no longer an extension of the MFC CString. This was done due to a conflict with the .Net beta. For large string blocks (several thousand characters) the CString is faster (see CString Management by Joseph M. Newcomer).

August 2000 - The primary differences between this release and the previous are:

  • The demo application has been changed to be a little more useful.
  • The parameter classes have been renamed to "Attribute" classes to be more in line with common XML notation.
  • With the conversion of VRML to XML, my usage now includes data with large attribute fields. Previously these fields were read one character at a time, which was totally unusable with fields of thousands of characters. The reading is now done with buffered character blocks. A second change related to these large attribute blocks was the addition of code in the MABString class that allows stepping through the child fields of an attribute by using the "StepField" function to start and the "NextField" function to step through the attribute data.
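
The field-stepping idea in the last item can be illustrated with a small standard C++ sketch. `FieldStepper`, `firstField`, and `nextField` are hypothetical names chosen for this example and are not the article's string class. The sketch assumes that each step resumes from a remembered position, so a large attribute is never rescanned from the start.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper, not the article's string class: walks the
// delimiter-separated fields of one attribute value.
class FieldStepper {
public:
    explicit FieldStepper(std::string data) : data_(std::move(data)) {}

    // Restart at the beginning and return the first field.
    std::string firstField(char delim) {
        pos_ = 0;
        return nextField(delim);
    }

    // Return the next field, or an empty string once the data is used up.
    std::string nextField(char delim) {
        if (pos_ > data_.size()) return std::string();
        std::string::size_type end = data_.find(delim, pos_);
        if (end == std::string::npos) end = data_.size();
        std::string field = data_.substr(pos_, end - pos_);
        pos_ = end + 1;  // skip the delimiter; past-the-end marks exhaustion
        return field;
    }

private:
    std::string data_;
    std::string::size_type pos_ = 0;
};
```

One limitation of this sketch is that an empty field and the end of the data both come back as an empty string, so a caller that needs to tell them apart would need an extra end-of-data check.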

XML Comments

To start with, one should be aware of "http://www.w3.org/TR/REC-xml", where you will find the definitions of valid and well-formed. I find that many misuse these terms. A valid document has either a DTD or a Schema associated with it and conforms to it. For internal documents that come from trusted sources, validation is an extra step that may not be required. For an external untrusted source this changes. Definitions of trust must be evaluated case by case. A non-validating parser such as this can easily be used to read a valid document, but it does not perform the validation step.

Other sites that I have found useful (of many):

and SoftwareAG has several white papers and presentations on their site at: http://www.softwareag.com.

Viewer Sample Comments

This sample shows the usage of the code to read in an existing file and display the hierarchy in a tree, with the attributes in a second pane.

General File Handling

Although Open and Save functions are provided, in general usage the hosting program takes responsibility for opening and closing the file and passes a pointer to it to the XMLRead and XMLWrite functions. This release assumes the file pointer is derived from the MFC CFile class. Internally this class only handles a single root element. However, no repositioning of the file pointer is done, so the hosting program may position the file pointer and the read function will start at that point to find the next valid block. If multiple roots are desired, you just start (with another variable based on the XML class) where the last read left the pointer.

MFC Dependencies

The enclosed code is dependent on MFC. The reading uses the CFile class and the two lists use the typed CObList class. The derivation of the XML and CParam classes from CObject is only done to be compatible with the lists. Traversal of each list may start at either end and move in either direction from the current position. When the end is passed, a NULL is returned and the current position is set to NULL. The hosting program must then restart with the get-first or get-last function calls.
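
The traversal rule above (run past either end, get NULL, restart from the first or last item) can be sketched without MFC. `CursorList` and its member names are hypothetical, and a `std::vector` stands in for the typed CObList; this is an illustration of the behaviour described, not the class's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor list; the article's class uses MFC's typed CObList.
template <typename T>
class CursorList {
public:
    void add(T* item) { items_.push_back(item); }

    T* getFirst() { return moveTo(0); }
    T* getLast() {
        return items_.empty() ? moveTo(kNone) : moveTo(items_.size() - 1);
    }
    // With no current position, stepping keeps returning nullptr
    // until the caller restarts with getFirst() or getLast().
    T* getNext() {
        return current_ == kNone ? nullptr : moveTo(current_ + 1);
    }
    T* getPrev() {
        return (current_ == kNone || current_ == 0) ? moveTo(kNone)
                                                    : moveTo(current_ - 1);
    }

private:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

    // Any out-of-range index clears the cursor and yields nullptr.
    T* moveTo(std::size_t index) {
        if (index >= items_.size()) {
            current_ = kNone;
            return nullptr;
        }
        current_ = index;
        return items_[index];
    }

    std::vector<T*> items_;
    std::size_t current_ = kNone;
};
```

A typical loop under this rule is `for (T* p = list.getFirst(); p != nullptr; p = list.getNext())`, after which the cursor is cleared and a new pass has to begin with `getFirst()` or `getLast()`.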

Read This - Embedded Text Data - Read This

For my applications, embedded text is not dependent on its position relative to child elements. Depending on your desired usage, this may be unsatisfactory. If you are not sure, I highly recommend that you experiment with either the MSXML parser or Apache's Xerces and get a feel for the child nodes that exist in a standard DOM parser. All of those formatting returns and spaces between child elements are a collection of text nodes. In this code you do not have that. The class here has a list of attributes, a list of child elements, and a single text block, whereas a DOM has a list of child nodes, one of which is the root element. That root element in turn has a list of child nodes, some of which are text, some elements, and so on. Written files have all text at the start of the element block. In the file that was read, this text may be scattered throughout the block.
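
To make the difference concrete, here is a rough standard C++ sketch of collapsing a DOM-style child list into one text block plus a list of child elements. `Node`, `Flattened`, `isBlank`, and `flatten` are hypothetical names, and dropping whitespace-only text runs and joining the rest with a newline are assumptions of this sketch, not documented behaviour of the class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical DOM-style child node: either a text run or an element tag.
struct Node {
    bool isText;
    std::string value;  // text content, or the tag name for an element
};

// Element-centric view: one text block plus the child tags in order.
struct Flattened {
    std::string text;
    std::vector<std::string> childTags;
};

// True when a text run holds only formatting spaces, tabs, and line breaks.
bool isBlank(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

Flattened flatten(const std::vector<Node>& nodes) {
    Flattened out;
    for (const Node& n : nodes) {
        if (!n.isText) {
            out.childTags.push_back(n.value);
        } else if (!isBlank(n.value)) {
            if (!out.text.empty()) out.text += '\n';
            out.text += n.value;  // position relative to the children is lost
        }
    }
    return out;
}
```

After flattening, there is no way to tell which text run sat before or after which child, which is exactly the trade-off described above.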

Public Functions

Construction Functions

CXML();
virtual ~CXML();
void ResetContents();

File handling

BOOL Open(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||", 
          LPCTSTR TagName="XML");
BOOL Save(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||");
BOOL SaveAs(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||");
void XMLWrite(CFile *pFile);
BOOL XMLRead(CFile *pFile, LPCTSTR TagName="XML");

Element Functions

LPCTSTR GetTagName();
void SetTagName(LPCTSTR NewName);
LPCTSTR GetText();
void SetText(LPCTSTR TextBlock);

Attribute access functions

Attributes must be unique.
If an attribute name exists its value is updated.
If a requested attribute does not exist a null or zero is returned.

long GetAttrCount();
void SetAttr(LPCTSTR Name, LPCTSTR Value);
void SetAttr(LPCTSTR Name, int Value);
void SetAttr(LPCTSTR Name, long Value);
void SetAttr(LPCTSTR Name, float Value);
void SetAttr(LPCTSTR Name, double Value);
void SetCurrentAttr(CAttribute *Attribute);
CAttribute* GetCurrentAttribute();
CAttribute* GetLastAttribute();
CAttribute* GetPrevAttribute();
CAttribute* GetNextAttribute();
CAttribute* GetFirstAttribute();
CAttribute* GetAttributePointer(LPCTSTR Name);
CMabString GetAttrText(LPCTSTR Name);
CMabString GetAttrTextUS(LPCTSTR Name);
CMabString GetTag();
CMabString GetAttr(LPCTSTR Name);
int GetAttrInt(LPCTSTR Name);
long GetAttrLong(LPCTSTR Name);
float GetAttrFloat(LPCTSTR Name);
double GetAttrDouble(LPCTSTR Name);
void RemoveCurrentAttr();
void AddAttr(CAttribute *Attribute);
void RemoveAttr(LPCSTR Name);
void SetAttrUS(LPCSTR Name, double value);
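
The attribute rules stated at the top of this section (unique names, update when the name already exists, null or zero when the name is missing) can be sketched in standard C++ as follows. `AttrSet` and its members are hypothetical and use a `std::map` purely for brevity; the article's class keeps CAttribute objects in a list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical attribute store, not the article's CAttribute list.
class AttrSet {
public:
    // Names are unique: setting an existing name replaces its value.
    void set(const std::string& name, const std::string& value) {
        attrs_[name] = value;
    }

    // A missing attribute yields an empty string.
    std::string getText(const std::string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? std::string() : it->second;
    }

    // A missing attribute yields zero; values are parsed as base-10.
    long getLong(const std::string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end()
                   ? 0L
                   : std::strtol(it->second.c_str(), nullptr, 10);
    }

    std::size_t count() const { return attrs_.size(); }

private:
    std::map<std::string, std::string> attrs_;
};
```

Because a missing attribute and an attribute whose value really is zero or empty look the same to the caller, code that cares about the difference would need a separate existence check, for which the listing above offers GetAttributePointer as a likely candidate.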

Child Access Functions

Children may have as many duplicated types as desired. The Children's order is maintained by the program.

void AddChildById(CXML *newchild);
void AddChildByName(CXML *newchild);
void SortById();
void SortByName();
CXML* FindByName(CString name);
CXML* FindByID(long id);
void InsertChild(CXML *Child);
long GetChildCount();
void AddChild(CXML *Child);
void SetCurrentChild(CXML *Child);
void RemoveCurrentChild();
CXML* GetCurrentChild();
CXML* GetLastChild();
CXML* GetPrevChild();
CXML* GetNextChild();
CXML* GetFirstChild();
void InsertChildBefore(int ZeroBasedPos, CXML *Child);
CXML* RemoveAt(int position);
CXML* GetChildAt(int index);
-- and data ---
CMabString GetField(char Delim, int FieldNum);
CMabString FirstField(char Delim);
CMabString NextField(char Delim);

Example input file:

<?xml version="1.0" encoding="utf-8" ?>
<&ROOT>
<People>
This is internal text.
LIne 2
<VALIDITY VALUE="TRUE" SUPPERCED=""/>
<Person NAME="LastName,First" ID="001.03"/>
<Person APointless="Entry is this" AnyData="Can this be"/>
<Members NAME="Section 1" CLASSID="MyClassId" VERSION="1,0,0,0">
<Person NAME="Michael" CLASSID="2.02" VERSION="0,0,0,1" DATE="12/02/1997"/>
</Members>

[...Some more lines]

<Members NAME="Typical" CLASSID="3.03" VERSION="1,0,0,0">
<Person NAME="George2" CLASSID="4.44" VERSION="0,0,0,1" DATE="12/02/1997"/>
</Members>
</People>
</&ROOT>

Example output file:

<?xml version="1.0" encoding="utf-8" ?>
<&ROOT>
<People>
This is internal text.
LIne 2
So more lines
<VALIDITY SUPPERCED="" VALUE="TRUE"/>
<Person ID="001.03" NAME="LastName,First"/>
<Person APointless="Entry is this" AnyData="Can this be" Gothis="Jack"/>
<Members CLASSID="MyClassId" NAME="Section 1" VERSION="1,0,0,0">
<Person CLASSID="2.02" DATE="12/02/1997" NAME="Michael" VERSION="0,0,0,1"/>
</Members>
<Members CLASSID="3.03" NAME="Typical" VERSION="1,0,0,0">
<Person CLASSID="4.44" DATE="12/02/1997" NAME="George2" VERSION="0,0,0,1"/>
</Members>
</People>
</&ROOT>

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Michael A. Barnhart
Systems Engineer
United States United States
Began programming in 1968 on a Wang 720. Moved to Fortran and began developing FEM (finite element model) applications on an IBM 360 in 1973. Developed custom FEM editors for most of my career until 1995. Since then I have been focusing on improving information flow and quality with web based communications (Web Services and SOA concepts), mostly in an evangelist role.

Last Updated 29 Jan 2002
Article Copyright 2000 by Michael A. Barnhart