Presented here is an element-centric class that supports reading and writing non-validated XML
files. Although it is very close to a DOM-centric class (it does use the MSXML 4
DOM parser to read), it differs in that each element is an instance that may be
used alone. This gives the programmer a great deal of flexibility and also responsibility.
This class was created in 1995 to handle data transmittal for intranet
applications. With the release of the XML specification the structure was modified
to conform to the non-validated format. The XML class includes two lists to hold
the basic data: one for child XML elements and the other for parameter (attribute) entries.
The basic class is actually a root for an individual XML element. A proper XML document
has a header line describing that it is XML with at least a version definition.
Following that initial line is a single XML element that wraps the embedded data.
The class does have a Save/SaveAs function and this can be used to create an XML document
by using the class as the root element. The general practice that I have been following
is to open a file, then write the header line and the opening tag of the bounding element. I then use
this file pointer and the XMLWrite function to write the sequence of child elements
out. The file must then be terminated with the closing tag for the bounding element.
For me, making the element rather than the document the focal point of the code has
given me flexibility in this class's usage that I have not seen elsewhere.
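The write sequence described above (header line, opening bounding tag, child elements, closing bounding tag) can be sketched in standard C++ without MFC. This is only a minimal sketch, not the class's actual code: `Element`, `writeElement` and `writeDocument` are hypothetical names, a `std::ostream` stands in for the MFC file pointer, and `writeElement` plays the role that XMLWrite has in the text.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's element class (not the real CXML).
struct Element {
    std::string tag;
    std::vector<std::pair<std::string, std::string>> attrs;
    std::vector<Element> children;
};

// Writes one element and its children; plays the role XMLWrite has in the text.
// Attribute values are written as-is (no escaping) to keep the sketch short.
void writeElement(std::ostream& out, const Element& e) {
    out << '<' << e.tag;
    for (const auto& a : e.attrs)
        out << ' ' << a.first << "=\"" << a.second << '"';
    if (e.children.empty()) {
        out << "/>\n";
        return;
    }
    out << ">\n";
    for (const auto& c : e.children)
        writeElement(out, c);
    out << "</" << e.tag << ">\n";
}

// Header line, opening bounding tag, child sequence, closing bounding tag.
void writeDocument(std::ostream& out, const std::string& boundingTag,
                   const std::vector<Element>& items) {
    out << "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n";
    out << '<' << boundingTag << ">\n";
    for (const auto& item : items)
        writeElement(out, item);
    out << "</" << boundingTag << ">\n";
}

// Usage: one Person element wrapped in a Members bounding element.
void usageExample() {
    Element person;
    person.tag = "Person";
    person.attrs.push_back(std::make_pair(std::string("NAME"), std::string("Michael")));
    std::ostringstream out;
    writeDocument(out, "Members", std::vector<Element>(1, person));
    assert(out.str() ==
           "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n"
           "<Members>\n"
           "<Person NAME=\"Michael\"/>\n"
           "</Members>\n");
}
```

Because the bounding tags are written by the hosting code, the same `writeElement` call can be reused for any number of child elements written to the same stream.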
Because this code is very general purpose, it will be easy for some to use it outside
of any paths that I have taken. If you find a bug please drop me a line.
Jan 2002 - Migrated the V6 MFC String extension to V7. With this port the non-MFC code
has been deleted. Also fixed a potential leak in the
March 2001 - This update fixes a couple of bugs (primarily leaks) and has a new
string class. This new string class is no longer an extension of the MFC
CString class. This was done due
to a conflict with the .Net beta. For large string blocks (several 1000 characters) the
is faster (see
CString Management by Joseph M. Newcomer.)
August 2000 - The primary differences between this release and the previous are:
- The demo application has been changed to be a little more useful.
- The parameter classes have been renamed to "Attribute" classes to be more in line with common XML notation.
- With the conversion of VRML to XML my usage
has added data that has large attribute fields. Previously the reading of these fields was done a character
at a time. This was totally unusable with fields of thousands of characters. The reading is now done with buffered
character blocks. A second change related to these large attribute blocks was the addition of code in the
attribute class that allows stepping through the child fields of an attribute: one function
starts the traversal and another steps through the Attribute data (listed later as
FirstField and NextField).
To start with one should be aware of "http://www.w3.org/TR/REC-xml". There you will find
the definitions of valid and well-formed. I find many misuse these terms. A valid
document is well-formed and also conforms to a DTD or a Schema associated with the document. For internal
documents that come from trusted sources the validation is an extra step that may
not be required. For an external untrusted source this changes. Definitions of trust
must be evaluated case by case. A non-validating parser such as this can easily be used
to read in a valid document, but it does not do the validation step.
Other sites that I have found useful (of many):
SoftwareAG has several white papers and presentations on their site at: http://www.softwareag.com.
Viewer Sample Comments
This sample shows the usage of the code to read in an existing file and display
the hierarchy in a tree with the attributes in a second pane.
General File Handling
Although Open and Save functions are provided, in general usage the hosting program takes
responsibility for opening and closing the file to read and write, and passes a
pointer to the file via the
XMLRead and XMLWrite functions. This
release assumes the file pointer is derived from the MFC CFile class.
Internally this class only handles a single root element. However, no repositioning of
the file pointer is done, so the hosting program may position the file pointer and the
read function will start at that point to find the next valid block. If multiple roots
are desired, just start the next read (with another variable based on the XML class) where the last
read left the pointer.
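The idea of reading several roots by continuing from wherever the last read left the file pointer can be illustrated with a plain `std::istream`. This is a simplified sketch under stated assumptions, not the class's reader: `readNextBlock` is a hypothetical helper, it only handles `<Tag ...> ... </Tag>` blocks, and it assumes elements with the same tag name are not nested.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads forward from the stream's current position until "</tag>" has been
// consumed, then returns the text from the first "<tag" onward in 'block'.
// Returns false when no complete block remains. Simplifications: same-named
// elements must not be nested, and self-closing roots are not handled.
bool readNextBlock(std::istream& in, const std::string& tag, std::string& block) {
    const std::string open = "<" + tag;
    const std::string close = "</" + tag + ">";
    std::string buf;
    char ch;
    while (in.get(ch)) {
        buf.push_back(ch);
        if (buf.size() >= close.size() &&
            buf.compare(buf.size() - close.size(), close.size(), close) == 0) {
            const std::size_t start = buf.find(open);
            if (start == std::string::npos) return false;
            block = buf.substr(start);
            return true;   // the stream now sits just past the closing tag
        }
    }
    return false;
}

// Usage: two roots in one stream, read one after the other.
void usageExample() {
    std::istringstream file("<A x=\"1\">one</A>\n<A x=\"2\">two</A>\n");
    std::string first, second, third;
    const bool gotFirst = readNextBlock(file, "A", first);
    const bool gotSecond = readNextBlock(file, "A", second);  // continues after the first
    const bool gotThird = readNextBlock(file, "A", third);
    assert(gotFirst && gotSecond && !gotThird);
    assert(first == "<A x=\"1\">one</A>");
    assert(second == "<A x=\"2\">two</A>");
}
```

Nothing in the helper rewinds or seeks, so positioning is entirely up to the caller, which mirrors the behaviour described in the paragraph above.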
The enclosed code is dependent on MFC. The reading uses an MFC file
class (XMLRead takes a CFile pointer) and the two lists use the typed
CObList class. The derivation of the XML and attribute classes from
CObject is only done to be compatible with the lists. Each list may be started at
either end and traversed in both directions from the current position.
When the end is passed a NULL is returned and the current position is set to NULL.
The hosting program must then restart with the get first or get last function calls.
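The traversal rule (start at either end, step in either direction, get NULL once an end is passed, then restart explicitly) can be sketched with a small cursor over a `std::vector`. `CursorList` and its member names are hypothetical and only illustrate the behaviour; the real class uses MFC lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor list: holds non-owning pointers plus a current position.
// A position of -1 stands for the NULL position described in the text.
template <typename T>
class CursorList {
public:
    void add(T* item) { items_.push_back(item); }

    T* first() { pos_ = 0; return current(); }
    T* last() { pos_ = static_cast<std::ptrdiff_t>(items_.size()) - 1; return current(); }

    // Stepping from the NULL position stays NULL until first()/last() is called.
    T* next() { if (pos_ >= 0) ++pos_; return current(); }
    T* prev() { if (pos_ >= 0) --pos_; return current(); }

private:
    T* current() {
        if (pos_ < 0 || pos_ >= static_cast<std::ptrdiff_t>(items_.size())) {
            pos_ = -1;          // passed an end: position becomes NULL
            return nullptr;
        }
        return items_[static_cast<std::size_t>(pos_)];
    }

    std::vector<T*> items_;
    std::ptrdiff_t pos_ = -1;
};

// Usage: walk forward to the end, then restart from the tail.
void usageExample() {
    int a = 1, b = 2, c = 3;
    CursorList<int> list;
    list.add(&a);
    list.add(&b);
    list.add(&c);

    int sum = 0;
    for (int* p = list.first(); p != nullptr; p = list.next())
        sum += *p;
    assert(sum == 6);

    int* afterEnd = list.next();   // still NULL: no implicit restart
    assert(afterEnd == nullptr);
    int* tail = list.last();       // explicit restart from the other end
    assert(tail == &c);
}
```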
Read This - Embedded Text Data - Read This
For my applications embedded text is not dependent on its position relative to
child elements. Depending on your desired usage this may be unsatisfactory.
If you are not sure, I highly recommend that you experiment with either the MSXML
parser or Apache's Xerces, and get a feel for the child nodes that exist in a
standard DOM parser. All of those formatting returns and spaces that are between
child elements are a collection of text nodes. In this code you do not have that.
The class here has a list of attributes, a list of child elements and a single text block, whereas
a DOM has a list of child nodes of which one is the root element. That root
element in turn has a list of child nodes, some of which are text, some elements, etc.
Written files have all text at the start of the element block. In the file being read
this text may be scattered throughout the block.
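To make the difference from a DOM concrete, the sketch below takes a DOM-style list of child nodes (text runs mixed with elements) and collapses it into the element-centric shape: one text block plus the child elements in order. The names `Node`, `Collapsed` and `collapse` are hypothetical, and joining the scattered text pieces with a single space is an assumption made for illustration; the article does not say how the class joins them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One DOM-style child node: a text run or an element (tag name only here).
struct Node {
    bool isText;
    std::string value;
};

// Element-centric view: a single text block plus the child elements in order.
struct Collapsed {
    std::string text;
    std::vector<std::string> childTags;
};

// Strips leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

Collapsed collapse(const std::vector<Node>& nodes) {
    Collapsed out;
    for (const Node& n : nodes) {
        if (!n.isText) {
            out.childTags.push_back(n.value);
            continue;
        }
        const std::string t = trim(n.value);
        if (t.empty()) continue;                 // formatting whitespace only
        if (!out.text.empty()) out.text += ' ';  // assumed separator
        out.text += t;
    }
    return out;
}

// Usage: text scattered around two child elements ends up in one block.
void usageExample() {
    std::vector<Node> dom;
    dom.push_back(Node{true, "\n  This is internal text.\n  "});
    dom.push_back(Node{false, "Person"});
    dom.push_back(Node{true, "\n  "});
    dom.push_back(Node{false, "Members"});
    dom.push_back(Node{true, "\n  Some more lines\n"});

    const Collapsed c = collapse(dom);
    assert(c.text == "This is internal text. Some more lines");
    assert(c.childTags.size() == 2);
}
```

The position of each text run relative to the child elements is lost in this step, which is exactly the trade-off the section warns about.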
BOOL Open(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||",
BOOL Save(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||");
BOOL SaveAs(LPCTSTR InFilter="XML Files (*.xml)| *.xml| All Files (*.*)| *.*||");
void XMLWrite(CFile *pFile);
BOOL XMLRead(CFile *pFile, LPCTSTR TagName="XML");
void SetTagName(LPCTSTR NewName);
void SetText(LPCTSTR TextBlock);
Attribute access functions
Attributes must be unique.
If an attribute name exists its value is updated.
If a requested attribute does not exist a null or zero is returned.
void SetAttr(LPCTSTR Name, LPCTSTR Value);
void SetAttr(LPCTSTR Name, int Value);
void SetAttr(LPCTSTR Name, long Value);
void SetAttr(LPCTSTR Name, float Value);
void SetAttr(LPCTSTR Name, double Value);
void SetCurrentAttr(CAttribute *Attribute);
CAttribute* GetAttributePointer(LPCTSTR Name);
CMabString GetAttrText(LPCTSTR Name);
CMabString GetAttrTextUS(LPCTSTR Name);
CMabString GetAttr(LPCTSTR Name);
int GetAttrInt(LPCTSTR Name);
long GetAttrLong(LPCTSTR Name);
float GetAttrFloat(LPCTSTR Name);
double GetAttrDouble(LPCTSTR Name);
void AddAttr(CAttribute *Attribute);
void RemoveAttr(LPCSTR Name);
void SetAttrUS(LPCSTR Name, double value);
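The three attribute rules stated above (names are unique, setting an existing name updates its value, a missing name reads back as empty or zero) can be sketched with a `std::map`. `AttrSet` and its members are hypothetical names chosen for this illustration and are not the class's API; only a few of the typed overloads are mirrored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Hypothetical attribute store: one value per unique name.
class AttrSet {
public:
    // Setting an existing name overwrites its value instead of adding a duplicate.
    void set(const std::string& name, const std::string& value) { attrs_[name] = value; }
    void set(const std::string& name, int value) { attrs_[name] = std::to_string(value); }
    void set(const std::string& name, double value) {
        std::ostringstream s;
        s << value;
        attrs_[name] = s.str();
    }

    // Missing names read back as an empty string or zero.
    std::string getText(const std::string& name) const {
        const auto it = attrs_.find(name);
        return it == attrs_.end() ? std::string() : it->second;
    }
    int getInt(const std::string& name) const {
        return static_cast<int>(std::strtol(getText(name).c_str(), nullptr, 10));
    }
    double getDouble(const std::string& name) const {
        return std::strtod(getText(name).c_str(), nullptr);
    }

    void remove(const std::string& name) { attrs_.erase(name); }
    std::size_t size() const { return attrs_.size(); }

private:
    std::map<std::string, std::string> attrs_;
};

// Usage: update in place, then read a missing attribute.
void usageExample() {
    AttrSet attrs;
    attrs.set("NAME", "Michael");
    attrs.set("NAME", "George2");      // same name: value is updated
    attrs.set("COUNT", 3);
    assert(attrs.size() == 2);
    assert(attrs.getText("NAME") == "George2");
    assert(attrs.getInt("COUNT") == 3);
    assert(attrs.getText("MISSING").empty());
    assert(attrs.getInt("MISSING") == 0);
}
```

A `std::map` also happens to iterate in name order, which matches the attribute order seen in the example output file further down, though the article does not state whether the class sorts its attributes.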
Child Access Functions
An element may have as many children of the same type as desired.
The children's order is maintained by the program.
void AddChildById(CXML *newchild);
void AddChildByName(CXML *newchild);
CXML* FindByName(CString name);
CXML* FindByID(long id);
void InsertChild(CXML *Child);
void AddChild(CXML *Child);
void SetCurrentChild(CXML *Child);
void InsertChildBefore(int ZeroBasedPos, CXML *Child);
CXML* RemoveAt(int position);
CXML* GetChildAt(int index);
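As a rough illustration of the child rules (duplicates allowed, order preserved, zero-based insertion and removal), here is a sketch using `std::vector` and `std::unique_ptr`. `Child`, `ChildList` and `makeChild` are hypothetical names, and the choices made here are assumptions rather than documented behaviour: lookups return the first match, an out-of-range insert appends, and an out-of-range removal returns null.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical child element: just a tag name and an id.
struct Child {
    std::string tag;
    long id;
};

std::unique_ptr<Child> makeChild(const std::string& tag, long id) {
    return std::unique_ptr<Child>(new Child{tag, id});
}

class ChildList {
public:
    void add(std::unique_ptr<Child> c) { items_.push_back(std::move(c)); }

    // Inserts before the zero-based position; past-the-end positions append.
    void insertBefore(std::size_t zeroBasedPos, std::unique_ptr<Child> c) {
        if (zeroBasedPos > items_.size()) zeroBasedPos = items_.size();
        items_.insert(items_.begin() + static_cast<std::ptrdiff_t>(zeroBasedPos),
                      std::move(c));
    }

    // Duplicated tags are allowed, so this returns the first match (or null).
    Child* findByName(const std::string& tag) const {
        for (const auto& c : items_)
            if (c->tag == tag) return c.get();
        return nullptr;
    }

    Child* getAt(std::size_t index) const {
        return index < items_.size() ? items_[index].get() : nullptr;
    }

    // Removes the child and hands ownership back to the caller.
    std::unique_ptr<Child> removeAt(std::size_t position) {
        if (position >= items_.size()) return nullptr;
        std::unique_ptr<Child> out = std::move(items_[position]);
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(position));
        return out;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<std::unique_ptr<Child>> items_;
};

// Usage: two Person children, then a Members child inserted at the front.
void usageExample() {
    ChildList kids;
    kids.add(makeChild("Person", 1));
    kids.add(makeChild("Person", 2));
    kids.insertBefore(0, makeChild("Members", 3));

    assert(kids.getAt(0)->tag == "Members");
    assert(kids.findByName("Person")->id == 1);

    const std::unique_ptr<Child> removed = kids.removeAt(1);
    assert(removed->id == 1);
    assert(kids.size() == 2);
    assert(kids.getAt(1)->id == 2);
}
```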
-- and data ---
CMabString GetField(char Delim, int FieldNum);
CMabString FirstField(char Delim);
CMabString NextField(char Delim);
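The field functions are meant for attribute values that pack several delimited items into one string, such as VERSION="0,0,0,1" in the example files. The sketch below shows one way such stepping could work in standard C++. `FieldStepper` is a hypothetical class, and the zero-based field numbering in `getField` is an assumption, since the article does not say whether field numbers start at zero or one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper that steps through delimited fields of one attribute value.
class FieldStepper {
public:
    explicit FieldStepper(std::string data) : data_(std::move(data)) {}

    // Random access; fieldNum is assumed to be zero-based. Missing fields give "".
    std::string getField(char delim, int fieldNum) const {
        std::size_t start = 0;
        for (int i = 0; i < fieldNum; ++i) {
            start = data_.find(delim, start);
            if (start == std::string::npos) return "";
            ++start;
        }
        const std::size_t end = data_.find(delim, start);
        return data_.substr(start, end == std::string::npos ? end : end - start);
    }

    // Sequential access: firstField restarts, nextField continues.
    std::string firstField(char delim) {
        pos_ = 0;
        return nextField(delim);
    }
    std::string nextField(char delim) {
        if (pos_ > data_.size()) return "";   // already stepped past the last field
        std::size_t end = data_.find(delim, pos_);
        if (end == std::string::npos) end = data_.size();
        const std::string field = data_.substr(pos_, end - pos_);
        pos_ = end + 1;
        return field;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Usage: step through a version string taken from the example files.
void usageExample() {
    FieldStepper version("0,0,0,1");
    const std::string f0 = version.firstField(',');
    const std::string f1 = version.nextField(',');
    const std::string f2 = version.nextField(',');
    const std::string f3 = version.nextField(',');
    const std::string past = version.nextField(',');
    assert(f0 == "0" && f1 == "0" && f2 == "0" && f3 == "1");
    assert(past.empty());
    assert(version.getField(',', 3) == "1");
}
```

Keeping a position between calls means each step only scans forward from where the previous one stopped, which matters for the very large attribute values mentioned in the August 2000 notes.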
Example input file:
<?xml version="1.0" encoding="utf-8" ?>
This is internal text.
<VALIDITY VALUE="TRUE" SUPPERCED=""/>
<Person NAME="LastName,First" ID="001.03"/>
<Person APointless="Entry is this" AnyData="Can this be"/>
<Members NAME="Section 1" CLASSID="MyClassId" VERSION="1,0,0,0">
<Person NAME="Michael" CLASSID="2.02" VERSION="0,0,0,1" DATE="12/02/1997"/>
[...Some more lines]
<Members NAME="Typical" CLASSID="3.03" VERSION="1,0,0,0">
<Person NAME="George2" CLASSID="4.44" VERSION="0,0,0,1" DATE="12/02/1997"/>
Example output file:
<?xml version="1.0" encoding="utf-8" ?>
This is internal text.
So more lines
<VALIDITY SUPPERCED="" VALUE="TRUE"/>
<Person ID="001.03" NAME="LastName,First"/>
<Person APointless="Entry is this" AnyData="Can this be" Gothis="Jack"/>
<Members CLASSID="MyClassId" NAME="Section 1" VERSION="1,0,0,0">
<Person CLASSID="2.02" DATE="12/02/1997" NAME="Michael" VERSION="0,0,0,1"/>
<Members CLASSID="3.03" NAME="Typical" VERSION="1,0,0,0">
<Person CLASSID="4.44" DATE="12/02/1997" NAME="George2" VERSION="0,0,0,1"/>