
XML class for processing and building simple XML documents

By Ben Bryant, 23 Sep 2003
 

This article has been re-written with the help of 2 years of feedback, and the new source code has benefited from all of the fixes and developments during that time period. See release notes below.

Introduction

Often you don't want to invest in learning a complex XML tool just to implement a little bit of XML processing in your application. CMarkup makes it easy: just add Markup.cpp and Markup.h to your Visual C++ MFC project, #include "Markup.h", and begin using it. There are no other dependencies.

Features

  • Light: one small class that maintains one single document string with a simple array of indexes
  • Fast: the parser builds the index array in one quick pass
  • Simple: EDOM methods make it ridiculously easy to create or process XML strings
  • Independent: compiles into your program without requiring MSXML or any tokenizer
  • UNICODE: can be compiled for UNICODE for Windows CE and NT/XP platforms (define _UNICODE)
  • UTF-8: when not in UNICODE or MBCS builds, it works with UTF-8, ASCII, or Windows extended sets
  • MBCS: can be compiled for Windows double-byte character sets such as Chinese GB2312 (define _MBCS)

XML for Everyday Data

We often need to store and/or pass information in a file, or send a block of information from computer A to computer B. And the issue is always the same: How shall I format this data? Before XML, you might have considered "env" style e.g. PATH=C:\WIN95; "ini" style (grouped in sections); comma-delimited or otherwise delimited; or fixed character lengths. XML is now the established answer to that question except that programmers are sometimes discouraged by the size and complexity of XML solutions when all they need is something convenient to help parse and format angle brackets. For good minimalist reading on the syntax rules for XML tags, I recommend Beginning XML - Chapter 2: Well-Formed XML posted here on the Code Project.

XML is better because of its flexible and hierarchical nature, plus its wide acceptance. Although XML uses more characters than delimited formats, it compresses down well if needed. The flexibility of XML becomes apparent when you want to expand the types of information your document can contain without requiring every consumer of the information to rewrite processing logic. You can keep the old information identified and ordered the same way it was while adding new attributes and elements.

CMarkup Lite Methods

CMarkup is based on the "Encapsulated" Document Object Model (EDOM), the key to simple XML processing. It's a set of methods for XML processing with the same general purpose as DOM (Document Object Model). But while DOM has numerous types of objects, EDOM defines only one object, the XML document. EDOM harks back to the original attraction of XML, which was its simplicity. To keep overhead low, CMarkup takes a very light, non-conforming, non-validating approach to XML, and it does not verify that the XML is well-formed.

The CMarkup "Lite" in this article is the free version of the CMarkup product sold at firstobject.com. CMarkup Lite implements a subset of EDOM methods for creating and parsing XML document strings. The Lite methods also encompass some modification functionality such as setting an attribute or adding additional elements to an existing XML document, but not changing the data of, or removing, XML elements. See the EDOM specification to compare the full CMarkup with CMarkup Lite. The full CMarkup is available in Evaluation (Educational) and licensed Developer versions with many more methods, STL and MSXML versions, Base64, and additional documentation. But this Lite version here at Code Project is more than adequate for parsing and creating simple XML strings in MFC.

The CMarkup Lite methods are grouped into Creation and Navigation categories listed below.

CMarkup Lite Creation Methods

CString GetDoc() const { return m_csDoc; };
bool AddElem( LPCTSTR szName, LPCTSTR szData=NULL );
bool AddChildElem( LPCTSTR szName, LPCTSTR szData=NULL );
bool AddAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
bool AddChildAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
bool SetAttrib( LPCTSTR szAttrib, LPCTSTR szValue );
bool SetChildAttrib( LPCTSTR szAttrib, LPCTSTR szValue );

GetDoc is used to get the document string after adding elements and setting attributes. The AddAttrib and SetAttrib methods do the same thing as each other (as do AddChildAttrib and SetChildAttrib). They will change the attribute's value if it already exists, and add the attribute if it doesn't.
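
The "change it if it exists, add it if it doesn't" rule can be illustrated without CMarkup at all. The following is only a sketch of that behaviour on a plain list of name/value pairs; AttribList and SetAttribSketch are hypothetical names, and CMarkup itself works directly on the document string rather than on a list like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one element's attributes (not CMarkup code).
typedef std::vector<std::pair<std::string, std::string> > AttribList;

// Mimics the documented SetAttrib/AddAttrib behaviour: overwrite the value
// if the attribute already exists, otherwise append a new attribute.
void SetAttribSketch( AttribList& attribs, const std::string& name,
                      const std::string& value )
{
    assert( !name.empty() ); // an attribute needs a name
    for ( std::size_t i = 0; i < attribs.size(); ++i )
    {
        if ( attribs[i].first == name )
        {
            attribs[i].second = value; // existing attribute: replace value
            return;
        }
    }
    attribs.push_back( std::make_pair( name, value ) ); // new attribute
}
```

Calling it twice with the same name leaves a single attribute holding the second value, which is why AddAttrib and SetAttrib can be used interchangeably.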

CMarkup Lite Navigation Methods

bool SetDoc( LPCTSTR szDoc );
bool IsWellFormed();
bool FindElem( LPCTSTR szName=NULL );
bool FindChildElem( LPCTSTR szName=NULL );
bool IntoElem();
bool OutOfElem();
void ResetChildPos();
void ResetMainPos();
void ResetPos();
CString GetTagName() const;
CString GetChildTagName() const;
CString GetData() const;
CString GetChildData() const;
CString GetAttrib( LPCTSTR szAttrib ) const;
CString GetChildAttrib( LPCTSTR szAttrib ) const;
CString GetError() const;

When you call SetDoc it parses the szDoc string and populates the CMarkup object. If it fails, it returns false, and you can call GetError for an error description. The IsWellFormed method returns true if the CMarkup object has at least a root element; it does not verify well-formedness.

Using CMarkup

The CMarkup class encapsulates the XML document text, structure, and current positions. It has methods both to add elements and to navigate and get element attributes and data. The locations in the document where operations are performed are governed by the current position and the current child position. This current positioning allows you to work with the XML document without instantiating additional objects that point into the document. At all times, the object maintains a string representing the text of the document which can be retrieved using GetDoc.

Check out the free firstobject XML editor which generates C++ source code for creating and navigating your own XML documents with CMarkup Lite.

Creating an XML Document

To create an XML document, instantiate a CMarkup object and call AddElem to create the root element. At this point, if you called AddElem("ORDER") your document would simply contain the empty ORDER element <ORDER/>. Then call AddChildElem to create elements under the root element (i.e. "inside" the root element, hierarchically speaking). The following example code creates an XML document and retrieves it into a CString:

CMarkup xml;
xml.AddElem( "ORDER" );
xml.AddChildElem( "ITEM" );
xml.IntoElem();
xml.AddChildElem( "SN", "132487A-J" );
xml.AddChildElem( "NAME", "crank casing" );
xml.AddChildElem( "QTY", "1" );
CString csXML = xml.GetDoc();

This code generates the following XML. The root is the ORDER element; notice that its start tag <ORDER> is at the beginning and end tag </ORDER> is at the bottom. When an element is under (i.e. inside or contained by) a parent element, the parent's start tag is before it and the parent's end tag is after it. The ORDER element contains one ITEM element. That ITEM element contains 3 child elements: SN, NAME, and QTY.

<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
</ORDER>

As shown in the example, you can create elements under a child element by calling IntoElem to move your current main position to where the current child position is so you can begin adding under what was the child element. CMarkup maintains a current position in order to keep your source code shorter and simpler. This same position logic is used when navigating a document.

Navigating an XML Document

The XML string created in the above example can be parsed into a CMarkup object with the SetDoc method. You can also navigate it right inside the same CMarkup object where it was created; just call ResetPos if you want to reset the current position back to the beginning of the document.

In the following example, after populating the CMarkup object from the csXML string, we loop through all ITEM elements under the ORDER element and get the serial number and quantity of each item:

CMarkup xml;
xml.SetDoc( csXML );
while ( xml.FindChildElem("ITEM") )
{
    xml.IntoElem();
    xml.FindChildElem( "SN" );
    CString csSN = xml.GetChildData();
    xml.FindChildElem( "QTY" );
    int nQty = _ttoi( xml.GetChildData() ); // _ttoi maps to atoi or _wtoi depending on the build
    xml.OutOfElem();
}

For each item we find, we call IntoElem before interrogating its child elements, and then OutOfElem afterwards. As you get accustomed to this type of navigation you will know to check in your loops to make sure there is a corresponding OutOfElem call for every IntoElem call.
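
One way to make the pairing automatic is a small scope guard. This is a sketch of a hypothetical helper, not something that ships with CMarkup: ElemScope calls IntoElem when it is constructed and OutOfElem when it goes out of scope. It is written as a template so it compiles on its own; FakeDoc is just a stand-in that counts depth.

```cpp
#include <cassert>

// Hypothetical helper, not part of CMarkup: IntoElem() on construction,
// OutOfElem() on destruction, so the two calls cannot get out of balance.
template <class Doc>
class ElemScope
{
public:
    explicit ElemScope( Doc& doc ) : m_doc( doc ), m_bInside( doc.IntoElem() ) {}
    ~ElemScope() { if ( m_bInside ) m_doc.OutOfElem(); }
    bool IsInside() const { return m_bInside; }
private:
    ElemScope( const ElemScope& );            // not copyable
    ElemScope& operator=( const ElemScope& );
    Doc& m_doc;
    bool m_bInside;
};

// Minimal stand-in so the sketch compiles without Markup.h.
struct FakeDoc
{
    int nDepth;
    FakeDoc() : nDepth( 0 ) {}
    bool IntoElem() { ++nDepth; return true; }
    bool OutOfElem() { assert( nDepth > 0 ); --nDepth; return true; }
};
```

With the real class you would presumably declare ElemScope<CMarkup> scope( xml ); as the first statement inside the while loop, so that a break or early return still leaves the element.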

Adding Elements and Attributes

The above example for creating a document only created one ITEM element. Here is an example that creates multiple items loaded from a previously populated data source, plus a SHIPMENT information element in which one of the elements has an attribute. This code also demonstrates that instead of calling AddChildElem, you can call IntoElem and AddElem. It means more calls, but some people find this more intuitive.

CMarkup xml;
xml.AddElem( "ORDER" );
xml.IntoElem(); // inside ORDER
for ( int nItem=0; nItem<aItems.GetSize(); ++nItem )
{
    xml.AddElem( "ITEM" );
    xml.IntoElem(); // inside ITEM
    xml.AddElem( "SN", aItems[nItem].csSN );
    xml.AddElem( "NAME", aItems[nItem].csName );
    CString csQty;
    csQty.Format( _T("%d"), aItems[nItem].nQty ); // Lite's AddElem takes string data
    xml.AddElem( "QTY", csQty );
    xml.OutOfElem(); // back out to ITEM level
}
xml.AddElem( "SHIPMENT" );
xml.IntoElem(); // inside SHIPMENT
xml.AddElem( "POC" );
xml.SetAttrib( "type", csPOCType );
xml.IntoElem(); // inside POC
xml.AddElem( "NAME", csPOCName );
xml.AddElem( "TEL", csPOCTel );

This code generates the following XML. The root ORDER element contains 2 ITEM elements and a SHIPMENT element. The ITEM elements both contain SN, NAME and QTY elements. The SHIPMENT element contains a POC element which has a type attribute, and NAME and TEL child elements.

<ORDER>
<ITEM>
<SN>132487A-J</SN>
<NAME>crank casing</NAME>
<QTY>1</QTY>
</ITEM>
<ITEM>
<SN>4238764-A</SN>
<NAME>bearing</NAME>
<QTY>15</QTY>
</ITEM>
<SHIPMENT>
<POC type="non-emergency">
<NAME>John Smith</NAME>
<TEL>555-1234</TEL>
</POC>
</SHIPMENT>
</ORDER>

Finding Elements

The FindElem and FindChildElem methods go to the next sibling element. If the optional tag name argument is specified, then they go to the next element with a matching tag name. The element that is found becomes the current element, and the next call to Find will go to the next sibling or matching sibling after that current position.

When you cannot assume the order of the elements, you must reset the position in between calling the Find method. Looking at the ITEM element in the above example, if someone else is creating the XML and you cannot assume the SN element is before the QTY element, then call ResetChildPos() before finding the QTY element.
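
The reason a reset is needed is that a find only looks forward from the current position. The sketch below models that with a plain list of tag names; ChildCursor is a hypothetical class for illustration only, and in this sketch a failed search leaves the position where it was.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a forward-only child position (not CMarkup code).
class ChildCursor
{
public:
    explicit ChildCursor( const std::vector<std::string>& tags )
        : m_tags( tags ), m_nNext( 0 ) {}

    // Like FindChildElem(name): search only AFTER the current position.
    bool Find( const std::string& name )
    {
        assert( !name.empty() );
        for ( std::size_t i = m_nNext; i < m_tags.size(); ++i )
        {
            if ( m_tags[i] == name )
            {
                m_nNext = i + 1; // found element becomes the current position
                return true;
            }
        }
        return false; // nothing further along matched
    }

    // Like ResetChildPos(): the next Find starts from the first child again.
    void Reset() { m_nNext = 0; }

private:
    std::vector<std::string> m_tags;
    std::size_t m_nNext;
};
```

If the children arrive as QTY, NAME, SN, then Find("SN") succeeds, a following Find("QTY") fails because QTY is behind the cursor, and it succeeds again after Reset().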

To find the item with a particular serial number, you can loop through the ITEM elements and compare the SN element data to the serial number you are searching for. This example differs from the original navigation example by calling IntoElem to go into the ORDER element and using FindElem("ITEM") instead of FindChildElem("ITEM"); either way is fine. Notice that by specifying the "ITEM" element tag name in the Find method we ignore all other sibling elements such as the SHIPMENT element.

CMarkup xml;
xml.SetDoc( csXML );
xml.FindElem(); // ORDER element is root
xml.IntoElem(); // inside ORDER
while ( xml.FindElem("ITEM") )
{
    xml.FindChildElem( "SN" );
    if ( xml.GetChildData() == csFindSN )
        break; // found
}

Encodings

ASCII refers to the character codes under 128 that we have come to depend on, programming in English. Conveniently if you are only using ASCII, UTF-8 encoding is the same as your common ASCII set.
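
If you want to confirm that a document really is plain ASCII before relying on that equivalence, a byte check is enough. This is a small sketch using std::string rather than CString; IsPureAscii and CheckIsPureAscii are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when every byte is below 128, i.e. the text is plain ASCII
// and therefore reads identically as UTF-8.
bool IsPureAscii( const std::string& text )
{
    for ( std::size_t i = 0; i < text.size(); ++i )
    {
        if ( static_cast<unsigned char>( text[i] ) >= 128 )
            return false;
    }
    return true;
}

// Usage check: "ORDER" is ASCII, while the UTF-8 bytes of the c-cedilla
// in "Curaçao" (0xC3 0xA7) are not.
void CheckIsPureAscii()
{
    assert( IsPureAscii( "ORDER" ) );
    assert( !IsPureAscii( "Cura\xC3\xA7" "ao" ) );
}
```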

If you are using a character set not corresponding to one of the Unicode sets UTF-8, UTF-16 or UCS-2, you really should declare it in your XML declaration for the sake of interoperability and viewing it properly in Internet Explorer. Character sets like ISO-8859-1 (Western European) assign characters to the values in a byte between 128 and 255, so that every character still only uses one byte. Windows double-byte character sets such as GB2312, Shift_JIS and EUC-KR use one or two bytes per character. For these Windows charsets, put _MBCS in your preprocessor definitions and make sure your user's Operating System is set to the corresponding code page.

To prefix your XML document with an XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>, pass it to SetDoc or the CMarkup constructor. Include a CRLF at the end as shown so that the root element goes on the next line.

xml.SetDoc( "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n" );
xml.AddElem( "island", "Curaçao" );

Depth First Traversal

You can use the following code to loop through every element in your XML document. In the part of the code where you process the element, every element in the document (except the root element) will be encountered in depth first order. For illustrative purposes, it gets the tag name of the element. If you were searching for a particular element tag name you could break out of the loop at this point. "Depth first" means that it traverses all of an element's children before going to its sibling.

BOOL bFinished = FALSE;
xml.ResetPos();
if ( ! xml.FindChildElem() )
    bFinished = TRUE;
while ( ! bFinished )
{
    // Process element
    xml.IntoElem();
    CString csTag = xml.GetTagName();

    // Next element (depth first)
    BOOL bFound = xml.FindChildElem();
    while ( ! bFound && ! bFinished )
    {
        if ( xml.OutOfElem() )
            bFound = xml.FindChildElem();
        else
            bFinished = TRUE;
    }
}
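
To make the visiting order concrete, here is a standalone sketch of the same depth-first walk over a tree stored as an array with parent, child and next-sibling indexes. It does not use CMarkup; Node, MakeNode and DepthFirstTags are hypothetical names, and index 0 is assumed to mean "none".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node with index links; 0 means "none" (slot 0 is unused).
struct Node
{
    std::string tag;
    int iParent;
    int iChild;
    int iNext;
};

Node MakeNode( const std::string& tag, int iParent, int iChild, int iNext )
{
    Node n;
    n.tag = tag;
    n.iParent = iParent;
    n.iChild = iChild;
    n.iNext = iNext;
    return n;
}

// Returns the tags of every element below iRoot in depth-first order.
std::vector<std::string> DepthFirstTags( const std::vector<Node>& nodes, int iRoot )
{
    assert( iRoot > 0 && static_cast<std::size_t>( iRoot ) < nodes.size() );
    std::vector<std::string> order;
    int i = nodes[iRoot].iChild;
    while ( i != 0 )
    {
        order.push_back( nodes[i].tag ); // process element
        if ( nodes[i].iChild != 0 )
        {
            i = nodes[i].iChild;         // go down, like IntoElem
            continue;
        }
        // No children: climb, like OutOfElem, until a next sibling exists
        while ( i != iRoot && nodes[i].iNext == 0 )
            i = nodes[i].iParent;
        i = ( i == iRoot ) ? 0 : nodes[i].iNext;
    }
    return order;
}
```

For the ORDER document with one ITEM and a SHIPMENT, the order comes out as ITEM, SN, NAME, QTY, SHIPMENT, POC, NAME, TEL: all of ITEM's children are visited before its sibling SHIPMENT.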

Loading and Saving Files

CMarkup Lite does not have Load and Save methods. To load a file, look in the CMarkupDlg::OnButtonParse method which loads a file into a string. Once you have it in a string, you can put it into the CMarkup object using SetDoc. To save it to a file, call GetDoc to get the string and then implement your own code to write the string to your file. When you need to implement any of your own project specific I/O error handling, streaming, permissions/locking, and charset conversion, it is actually good software design to keep this outside of the CMarkup class allowing CMarkup to remain a generic class.
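
As a starting point for your own I/O code, here is a minimal sketch that uses the standard C++ library instead of MFC file classes. LoadTextFile and SaveTextFile are hypothetical helper names; the sketch reads and writes raw bytes and does no charset conversion, so it assumes the file is already in the encoding your build expects.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Reads a whole file into strDoc; returns false if it cannot be opened.
bool LoadTextFile( const char* szPath, std::string& strDoc )
{
    assert( szPath != 0 );
    std::ifstream file( szPath, std::ios::in | std::ios::binary );
    if ( !file )
        return false;
    std::ostringstream buffer;
    buffer << file.rdbuf(); // copy all bytes unchanged (binary mode)
    strDoc = buffer.str();
    return true;
}

// Writes strDoc to the file, replacing any previous contents.
bool SaveTextFile( const char* szPath, const std::string& strDoc )
{
    assert( szPath != 0 );
    std::ofstream file( szPath, std::ios::out | std::ios::binary | std::ios::trunc );
    if ( !file )
        return false;
    file.write( strDoc.data(), static_cast<std::streamsize>( strDoc.size() ) );
    file.close();
    return !file.fail(); // report write or close errors
}
```

In a non-Unicode build the loaded text could then be handed to SetDoc via strDoc.c_str(); a Unicode build would need a conversion step first, which is exactly the kind of project-specific decision that is better kept outside CMarkup.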

The Test Dialog

The Markup.exe test bed for CMarkup is a Visual Studio 6.0 MFC project (it also compiles in VS .NET). When the dialog starts, it performs diagnostics in the RunTest function to test CMarkup in the context of the particular build options that have been selected. You can step through the RunTest function to see a lot of examples of how to use CMarkup. Use the Open and Parse button in the dialog to test a file.

In the following illustration, the Build Version is shown as "CMarkup Lite 6.5 Debug Unicode." This means that it is the debug version built with _UNICODE defined. The RunTest completed successfully. A parse error was encountered in the order_e.xml file. It also shows the load and parse times, and file size.

The Test Dialog keeps track of the last file parsed and the dialog screen position for convenience. This is kept in the registry under HKEY_CURRENT_USER\Software\First Objective Software\Markup\Settings.

How CMarkup Works

The CMarkup strategy is to leave the data in the document string and maintain a hierarchical arrangement of indexes mapping out the document.

  • increase speed: parse in one pass and maintain hierarchy of indexes
  • reduce overhead: do not copy or break up the text of the document

CMarkup parses the 250k play.xml sample document in about 40 milliseconds (1/25th of a second) on a 500 MHz machine, holding it as a single string, and allocating about 200k for a map of the 6343 elements. From then on, navigation does not require any parsing. As a rule of thumb, the map of indexes takes up approximately the same amount of memory as the document, so the memory footprint of the CMarkup object should settle down around 2 times the size of the document. For each element in the document a struct of eight integers (32 bytes) is maintained.

int nStartL;
int nStartR;
int nEndL;
int nEndR;
int nReserved;
int iElemParent;
int iElemChild;
int iElemNext;
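
The "about 200k" figure follows directly from this layout: 6343 elements at 32 bytes each is 202,976 bytes. The sketch below just redoes that arithmetic; ElemPosSketch and IndexMapBytes are names invented here, and the 32-byte size assumes a 4-byte int as on Win32.

```cpp
#include <cassert>
#include <cstddef>

// Same eight fields as listed above, grouped for brevity.
struct ElemPosSketch
{
    int nStartL, nStartR, nEndL, nEndR;
    int nReserved;
    int iElemParent, iElemChild, iElemNext;
};

// Bytes needed for the index map of nElems elements.
std::size_t IndexMapBytes( std::size_t nElems )
{
    assert( sizeof( ElemPosSketch ) == 8 * sizeof( int ) ); // no padding expected
    return nElems * sizeof( ElemPosSketch );
}
```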

Look at the start and end tags in <QTY>1</QTY>. The struct contains the offsets of the left and right of both the start and end tags (i.e. all the < and > signs). The reserved integer is not currently used but could be used for a delete flag and/or level (i.e. depth) in the hierarchy to support indentation. The other three integers are indexes to the structs for the parent, child and next elements.
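
For <QTY>1</QTY> on its own, the four offsets work out to 0, 4, 6 and 11, and the data is whatever lies between the second and third of them. The following sketch finds those offsets for a single flat element only; it is not CMarkup's parser, and TagOffsets, LocateFlatElem and FlatElemData are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The four tag offsets described above.
struct TagOffsets
{
    int nStartL, nStartR, nEndL, nEndR;
};

// Locates the tags of one flat element such as <QTY>1</QTY>.
// Returns false if the expected angle brackets are missing.
bool LocateFlatElem( const std::string& doc, TagOffsets& pos )
{
    const std::size_t npos = std::string::npos;
    std::size_t startL = doc.find( '<' );
    if ( startL == npos ) return false;
    std::size_t startR = doc.find( '>', startL );
    if ( startR == npos ) return false;
    std::size_t endL = doc.find( "</", startR );
    if ( endL == npos ) return false;
    std::size_t endR = doc.find( '>', endL );
    if ( endR == npos ) return false;
    pos.nStartL = static_cast<int>( startL );
    pos.nStartR = static_cast<int>( startR );
    pos.nEndL = static_cast<int>( endL );
    pos.nEndR = static_cast<int>( endR );
    return true;
}

// The data is the text between the start tag's > and the end tag's <.
std::string FlatElemData( const std::string& doc, const TagOffsets& pos )
{
    assert( pos.nStartR < pos.nEndL );
    return doc.substr( static_cast<std::size_t>( pos.nStartR + 1 ),
                       static_cast<std::size_t>( pos.nEndL - pos.nStartR - 1 ) );
}
```

Because only offsets are stored, getting the data is a substring of the original document; nothing is copied at parse time.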

When the document is first parsed an array of these structs is built, and then as elements are modified and inserted in the XML, the structs are modified and added. Rather than allocating structs individually, they are allocated in an array using a "grow-by" mechanism to reduce the number of allocations to a handful. That is why integer array indexes rather than pointers are used for the links. Once an element is assigned an index in the array, that index does not change, so the index can be used as a way of referring to and locating an element.
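
The point about indexes versus pointers can be shown with a std::vector that grows in chunks. This is a sketch of the idea rather than CMarkup's own array code: LinkSketch and ElemArraySketch are hypothetical, and reserving slot 0 so that 0 can mean "none" is an assumption of the sketch. When the vector reallocates, any pointer into it would dangle, but an index still names the same element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical link record: indexes into the array, 0 meaning "none".
struct LinkSketch
{
    int iElemParent;
    int iElemChild;
    int iElemNext;
};

class ElemArraySketch
{
public:
    explicit ElemArraySketch( std::size_t nGrowBy ) : m_nGrowBy( nGrowBy )
    {
        assert( nGrowBy > 0 );
        m_elems.reserve( nGrowBy );
        LinkSketch none = { 0, 0, 0 };
        m_elems.push_back( none ); // slot 0 doubles as virtual parent of the root
    }

    // Appends a new element as the last child of iParent; returns its index.
    int Add( int iParent )
    {
        assert( iParent >= 0 && static_cast<std::size_t>( iParent ) < m_elems.size() );
        if ( m_elems.size() == m_elems.capacity() )
            m_elems.reserve( m_elems.capacity() + m_nGrowBy ); // grow in chunks
        LinkSketch e = { iParent, 0, 0 };
        m_elems.push_back( e );
        int iNew = static_cast<int>( m_elems.size() ) - 1;
        int iLast = m_elems[iParent].iElemChild;
        if ( iLast == 0 )
            m_elems[iParent].iElemChild = iNew;  // first child
        else
        {
            while ( m_elems[iLast].iElemNext != 0 )
                iLast = m_elems[iLast].iElemNext; // walk to the last sibling
            m_elems[iLast].iElemNext = iNew;
        }
        return iNew;
    }

    const LinkSketch& At( int i ) const { return m_elems[static_cast<std::size_t>( i )]; }
    std::size_t Count() const { return m_elems.size(); }

private:
    std::vector<LinkSketch> m_elems;
    std::size_t m_nGrowBy;
};
```

An index returned by Add stays valid however many elements are added afterwards, whereas a reference obtained from At should not be held across a call to Add.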

Release Notes

The public methods of this release 6.5 of CMarkup Lite are backwards compatible with the previous release 6.1 posted here in August 2001, except for one rare usage of IntoElem. In 6.1, if you called IntoElem without a current child element, it would find the first child element. Now in 6.5, when there is no current child position, IntoElem puts the main position before the first child element so that a subsequent call to FindElem will not bypass the first element. So the quick way to check this when upgrading is to scan all occurrences of IntoElem and make sure the CMarkup navigation call just before each one is FindChildElem. If the child element was just created with AddChildElem, that is okay too, because AddChildElem also sets the current child position. For full details on this, see the IntoElem Changes in Release 6.3.

Other major changes since 6.1:

  • Fix: MBCS double-byte text x_TextToDoc *thanks knight_zhuge
  • Performance: parsing is roughly twice as fast
  • Debugging: see m_pMainDS and m_pChildDS class members while debugging to see string pointers showing current main and child positions
  • New Test Dialog interface with diagnostic results and load vs. parse times, and RunTest code for startup

License

CMarkup Lite is free for compiling into your commercial, personal and educational applications. Modify it as much as you like, but retain the copyright notice in the source code remarks. Redistribution of the modified or unmodified CMarkup Lite class source code is limited to your own development team and it cannot be made publicly available or distributable as part of any source code library or product, even if that offering is free. For source code products that derive from or utilize CMarkup Lite, please refer users to this article to obtain the source files for themselves. You are encouraged to discuss this source code and share enhancements here in the discussion board under this article. Enjoy!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ben Bryant
United States
Raised in Southern Ontario Canada. Bachelor of Science from the University of Toronto in Computer Science and Anthropology. Living near Washington D.C. in Virginia, USA.


Comments and Discussions

 
Question: How to delete the data from the XML file (Member 9274864, 25 Jul '12 - 23:08)
I have created an XML file for profiling different users using the CMarkup class, so can you help me write the code to delete a particular user's details?
Question: Is CMarkup 64bit safe? (DeepT, 24 May '12 - 11:03)
I compiled this for 64bit on VS 2008 and got these warnings:
 
\Markup.cpp(338) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
1>.\Markup.cpp(404) : warning C4267: '+=' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(535) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(711) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(729) : warning C4267: '+=' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(734) : warning C4267: '+=' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(787) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(1035) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
1>.\Markup.cpp(1036) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
 
I looked at some of them and they seemed like they would be safe as long as you didn't have a XML file over 4 gigs. I am not certain it is safe though.
Answer: Re: Is CMarkup 64bit safe? (Ben Bryant, 25 May '12 - 2:19)
yes, but actually 2 gigs is the limit because of the signed int. I doubt there are 64-bit issues. More recent versions have been tested for 64-bit and I have only added casts and tweaks mostly to get rid of warnings since I wanted to keep the 32-bit integers and not double the memory consumption of the indexes.
Question: how can i load xml from web using this class?? (lashalasha, 5 Feb '11 - 12:39)
how can i load xml from web using this class?? for example www.myweb.com/myxml.xml
Answer: Re: how can i load xml from web using this class?? (Ben Bryant, 5 Feb '11 - 14:43)
CMarkup will not load XML from the web. You would need to use something else to HTTP GET the XML and then put it into CMarkup using SetDoc. Unfortunately there is no convenient tool to do this. On Windows you can use MSXML.
General: Won't compile with VS2008 (Witte, 24 Oct '10 - 10:34)
\markup.cpp(725) : error C2440: '=' : cannot convert from 'const wchar_t *' to '_TCHAR *'
 

Any suggestions?
 
Thanks
General: Re: Won't compile with VS2008 (Ben Bryant, 24 Oct '10 - 16:47)
thanks for your post; please fix the declaration (9 lines above) to read:
const _TCHAR* pFound;
General: several other xml parsers (informative) (maplewang, 20 Oct '10 - 22:01)
for non-commercial CMarkup seems to be a good choice too.

XML: Include a Flexible Parser in Your C++ Applications
tinyxml
http://pugixml.org/documentation/
XmlBind: putting PugXML on steroïds !
XMLLib for PUGXML with XPath
PugXML - A Small, Pugnacious XML Parser
General: Cannot get Value of Elements containing Subelements (THaala, 29 Sep '10 - 0:33)
really usable....
 
The following context seems to be allowed and happens inside xml files
i was scanning with your class. The Value of "crafttype" B7375 seems unreachable.
Is there a way to get it ?
          <crafttype>
            B7375<type>
              <length>3101</length>
              <....>
            </type>
          </crafttype>
 
thank you in advance..
Knoepfle

General: Re: Cannot get Value of Elements containing Subelements (Ben Bryant, 29 Sep '10 - 3:23)
CMarkup *Lite* does not have the node methods to get to it. Yes, "mixed content" is allowed, though not recommended for this purpose, but I understand if you have to scan a file containing it, then you are stuck.
General: Re: Cannot get Value of Elements containing Subelements (THaala, 29 Sep '10 - 22:52)
i understand.
 
does your licensed *standard* edition solve this ?
Knoepfle

General: Re: Cannot get Value of Elements containing Subelements (Ben Bryant, 29 Sep '10 - 23:12)
Yes even the eval/free for non-commercial -- email info at firstobject for what the code would look like for your example.
Question: CMarkup Licensing (will6262, 10 Sep '10 - 9:48)
Hello Mr. Bryant. My company has incorporated your CMarkup class (which we downloaded from this website) into one of our products.
We would like to purchase the licensed version. Is the licensed version called "CMarkup Developers License" and is the website www.firstobject.com a valid location to obtain the same code that we got from CodeProject?
 
I just want to make sure we get your code, which has worked well for us.
 
Bill Patience
Answer: Re: CMarkup Licensing (Ben Bryant, 10 Sep '10 - 10:15)
Yes.
General: My vote of 2 (4you, 28 Aug '10 - 5:24)
I don't see what use a class is that can only create XML. Is this an ad?
General: Problem in VS 2008 (MFC App) (Member 3608330, 23 Jul '10 - 21:15)
I only made Object (CMarkup xmlSettings;)
 
in debug I had this message
 

1>test1Dlg.obj : error LNK2019: unresolved external symbol "public: bool __thiscall CMarkup::SetDoc(wchar_t const *)" (?SetDoc@CMarkup@@QAE_NPB_W@Z) referenced in function "public: __thiscall CMarkup::CMarkup(void)" (??0CMarkup@@QAE@XZ)
1>test1Dlg.obj : error LNK2019: unresolved external symbol "protected: void __thiscall CMarkup::x_InitMarkup(void)" (?x_InitMarkup@CMarkup@@IAEXXZ) referenced in function "public: __thiscall CMarkup::CMarkup(void)" (??0CMarkup@@QAE@XZ)
1>test1Dlg.obj : error LNK2019: unresolved external symbol "public: __thiscall CMarkup::~CMarkup(void)" (??1CMarkup@@QAE@XZ) referenced in function "public: virtual __thiscall CAboutDlg::~CAboutDlg(void)" (??1CAboutDlg@@UAE@XZ)
1>C:\Users\Mahdi\Documents\Visual Studio 2008\Projects\test1\Debug\test1.exe : fatal error LNK1120: 3 unresolved externals
 

what my mistake ?
General: Re: Problem in VS 2008 (MFC App) (Ben Bryant, 24 Jul '10 - 3:44)
It looks like you have not added Markup.cpp to your project. You have probably included the header correctly, but not added the source files to your project.
Question: Can you implement GetParentElem method? (YinBoChao, 1 Apr '10 - 5:21)
How can I get the parent of the current node?
Please help me, thanks!
Answer: Re: Can you implement GetParentElem method? (Ben Bryant, 1 Apr '10 - 5:59)
if you are at element CHILD and you want to go to the PARENT element that contains it use OutOfElem:
 
CString sTagName = xml.GetTagName(); // "CHILD"
if (xml.OutOfElem())
{
  sTagName = xml.GetTagName(); // "PARENT"
  xml.IntoElem();
}
 
I put an if statement around the OutOfElem because if you are already at the root element it will return false.
General: Re: Can you implement GetParentElem method? (YinBoChao, 1 Apr '10 - 14:38)
Thanks for your answer.

But I still have a problem.
When I am traversing all the children of a node and find a child, I need to get its parent's tag name.
If I use your method, then when I come back to the child level I cannot return to the previous position; I still get the first child, so it goes into an endless loop.
For example,

Parent
--child1
--child2
...
--childn

When I find child1, I need to get "Parent", and then find child2 (also needing "Parent"), and so on.

Could you please give me a good idea? Thanks.
General: Re: Can you implement GetParentElem method? (Ben Bryant, 1 Apr '10 - 15:54)
Actually I considered that. The OutOfElem method recalls the child element as the current child position, so that when you go back IntoElem, you are at the one you went out of, not back at the first child. There is also an alternative way of doing the looping you described where you keep the parent position and loop through the children with FindChildElem.
General: Re: Can you implement GetParentElem method? (YinBoChao, 4 Apr '10 - 5:28)
Thanks.
But from my debugging, I found that I should add xml.FindElem(XX) after xml.IntoElem();
without this line, it cannot get back to the position which I went out of.
General: Re: Can you implement GetParentElem method? (Ben Bryant, 4 Apr '10 - 6:03)
Try GetChildTagName after the OutOfElem and before your IntoElem. It should show you that the child position is maintained. FindElem([NULL]) will take you to the next sibling.
General: Re: Can you implement GetParentElem method? (YinBoChao, 4 Apr '10 - 5:32)
Hi, Sir. Could you implement a method which gets all the attributes of a node and returns them as a list?
Then even if we do not know the names of the attributes, we can still get them.
Question: using xQuery with xml (sairfan, 18 Feb '10 - 19:15)
I'm developing a web site where I need to use XQuery, but its code is printed as-is in the browser. Can you advise what the method of writing XQuery code is?
Thanks in advance.
Question: how to get xml version and encoding? (wheregone, 17 Jan '10 - 20:55)
thanks for the good class.
 
Just wonder how to get xml version and encoding?
especially encoding, sometimes we need to know it's encoding.
 
Hardware-OS-Software
===== Bridge =======

Answer: Re: how to get xml version and encoding? (Ben Bryant, 17 Jan '10 - 22:16)
this version of CMarkup doesn't help with looking at the XML declaration node.
General: excellent, thanx (adelezy, 3 Dec '09 - 20:33)
really handy codes, very useful
thank you a lot
Question: Several questions related to the compilation fatal error C1189: #error : WINDOWS.H already included. (ab5zn, 5 Oct '09 - 13:09)
Today I attempted to use CMarkup Lite in a project at work and ran into the WINDOWS.H already included error, plus the following error on Markup.cpp: error C2440: '=' : cannot convert from 'const wchar_t *' to '_TCHAR *'
 
I understand now from the questions and answers of other folks that CMarkup Lite requires MFC. Now I have a few questions:
 
1. I am using Visual Studio 2008. Where in VS do I change my application to an MFC app?
2. What are the consequences of changing my app to an MFC app?
3. If my team leader does not approve of my changing the app to an MFC app, is there something simple that can be done in the code to make it non-MFC?
4. If it turns out that I cannot use CMarkup Lite, what is a good, free-source C/C++ alternative to CMarkup Lite?
 
Thank you!
General: Error in vs2005 (dimoni69, 14 Sep '09 - 0:09)
Hello:
 
I have converted a project that uses cmarkup 6.5 lite from vs2003 to vs2005 and it has some errors:
 
Error 9 error C2440: '=' : cannot convert from 'const char *' to '_TCHAR *' e:\etraNET\SAEProduct\Duplo2009\src\DUPLOSERVER2\Markup.cpp 766
 
In the following line:
 
pFound=_tcschr(pFind,cSource)
 

Could someone help me?
 
thanks.
General: Re: Error in vs2005 (Ben Bryant, 14 Sep '09 - 1:43)
I wonder if it is a modified version of Markup.cpp because when I go to 766 it is very different. Maybe someone has introduced a const char* or LPCSTR pFound or pFind variable that was not discovered during a non-UNICODE compile. Use type LPCTSTR (with a T) instead.
General: Re: Error in vs2005 (Ben Bryant, 25 Oct '10 - 6:22)
see this link to add const to declaration
Question: How to add CDATA sections (Christian Frenz, 30 Jun '09 - 3:07)
Hi,
 
is there any possibility to add CDATA sections with this really great class?
 

Ciao
Christian
Answer: Re: How to add CDATA sections (Ben Bryant, 30 Jun '09 - 5:09)
This version CMarkup Lite 6.0 does not have a way to add CDATA sections. It does retrieve simple CDATA Section data inside of elements with the GetData method, but even there the CData Section handling is incomplete such as with split CData sections due to an interior ]]> and CData Sections combined with text nodes. See:
http://www.firstobject.com/dn_markcdatasections.htm
Answer: Re: How to add CDATA sections (adelezy, 3 Dec '09 - 20:35)
well, i got the 6.5 release, CDATA sections supported
General: Bug... (Jack12, 25 Nov '08 - 20:19)
hello .
i think First depth traverse code here has some bug.
 
it only traverses the node one time..
 
i mean
 




 
There are many
 
Basically, in the traverse code, the last OutOfElem() call always returns FALSE.

So it gets out of the while loop. As a result, it didn't work for me.
General | Re: Bug... | member Ben Bryant | 26 Nov '08 - 0:43
<videos>
  <video>  </video>
  <video>  </video>
</videos>
 
Once you are inside the videos element, just plain FindElem() should work to loop through the video elements. Now, going back to your other example:
 
<videos>
  <video videofilename="hh.avi"
    extension="avi" title="hh.avi">
    <genre>drama</genre>
    <rating>4</rating>
    <user_rating>5</user_rating>
    <summary>This is totally a blast!</summary>
    <details>The setting : The medieval</details>
    <year>1993</year>
    <director>Jack Nicolson</director>
    <studio>Paramount</studio>
    <runtime>128</runtime>
  </video>
  <video videofilename="test.ppm"
    extension="ppm" title="test.ppm">
    <genre>drama</genre>
    <rating>4</rating>
    <user_rating>5</user_rating>
    <summary>This is totally a blast!</summary>
    <details>The setting : The medieval</details>
    <year>1993</year>
    <director>Jack Nicolson</director>
    <studio>Paramount</studio>
    <runtime>128</runtime>
  </video>
</videos>
 
Here is an example that also loops through the child elements of the video elements grabbing tag names such as genre and rating plus element values.
 
xml.ResetPos();
xml.FindElem(); // videos
xml.IntoElem();
while ( xml.FindElem(_T("video")) )
{
  xml.IntoElem();
  // utilize child elements of video
  while ( xml.FindElem() )
  {
    strName = xml.GetTagName();
    strValue = xml.GetData();
  }
  xml.OutOfElem();
}

General | Re: Bug... | member Jack12 | 26 Nov '08 - 13:38
I see..
 
Thank you and
 
I will try as you posted and find what's wrong with my code..
 
Have a great day ~
Question | How can I simply traverse the sibling nodes? | member Jack12 | 25 Nov '08 - 19:53
Hello.

I don't know why it's not working.

I have been working on this for a few days... 24 hours...

It seemed to work, but it's not working again.
 
Could you guys please help me?
 
I want to traverse the sibling nodes of
Answer | Re: How can I simply traverse the sibling nodes? | member Ben Bryant | 26 Nov '08 - 0:29
Your tag name in the document is not capitalized, but you've specified a capitalized tag name to FindElem.
On CodeProject forums, use the Preview feature to ensure your XML is visible and not confused for HTML tags. I think I recovered your document somewhat:
 
<videos>
  <video videofilename="C:\Documents and Settings\Owner\Desktop\CartoonMaker\hh.avi" extension="avi" title="C:\Documents and Settings\Owner\Desktop\CartoonMaker\hh.avi">
    <genre>drama</genre>
    <rating>4</rating>
    <user_rating>5</user_rating>
    <summary>This is totally a blast!</summary>
    <details>The setting : The medieval</details>
    <year>1993</year>
    <director>Jack Nicolson</director>
    <studio>Paramount</studio>
    <runtime>128</runtime>
  </video>
  <video videofilename="C:\Documents and Settings\Owner\Desktop\CartoonMaker\test.ppm" extension="ppm" title="C:\Documents and Settings\Owner\Desktop\CartoonMaker\test.ppm">
    <genre>drama</genre>
    <rating>4</rating>
    <user_rating>5</user_rating>
    <summary>This is totally a blast!</summary>
    <details>The setting : The medieval</details>
    <year>1993</year>
    <director>Jack Nicolson</director>
    <studio>Paramount</studio>
    <runtime>128</runtime>
  </video>
</videos>

General | Re: How can I simply traverse the sibling nodes? | member Jack12 | 26 Nov '08 - 13:36
Thank you very much.

:)
 
Have a great day. I think it will work this time.
General | problems with binary content | member iquestor | 15 Aug '08 - 5:35
Hi Ben, great code, thanks. :)

My issue is that I am parsing an XML document that was exported with an ADO recordset in .NET. It contains binary content. If I parse it with the sample code, I get the error: "Error converting file to string (may contain binary data)". But if I open the file in Notepad, save it, and then re-run the demo, it works. This is because Notepad ignores the binary stuff and then doesn't save it.

Question: is there a programmatic way of handling this?
 
thanks!!
 

Bob Meads
General | Re: problems with binary content | member Ben Bryant | 15 Aug '08 - 8:19
Thanks for your question. CMarkup does not produce an error like "Error converting file to string (may contain binary data)" so I am not sure what you are referring to when you say "parse it with the sample code." Which sample code are you referring to? Best wishes. Ben
General | Re: problems with binary content | member iquestor | 15 Aug '08 - 8:59
Ben
 
Thanks for your reply. I will try to be more clear.
 
I am referring to this page:
 
http://www.codeproject.com/KB/cpp/markupclass.aspx?msg=2680465#xx2680465xx
 
The sample code was from the link at the top of the page: "Download release 6.5 lite source with exe - 326 Kb".

From this link, I downloaded the markupclass_demo.zip file, which includes the VC6 project. When I compiled and ran the project, I got a dialog with 'CMarkUp Lite' as the window heading. It allows you to browse for a file and then press a button that says 'parse'.

When I do this, the dialog reports the error message: "Error converting file to string (may contain binary data)"

In the code behind the parse button I can see where it fails. Here is a snippet:

	// If it is too short, assume it got truncated due to non-text content
	if ( csText.GetLength() < nFileLen / 2 - 20 )
	{
		OutputParseResults( _T("Error converting file to string (may contain binary data)") );
		return;
	}
 

 
Does this help?
 
thanks!!
 
Bob
General | Re: problems with binary content | member Ben Bryant | 15 Aug '08 - 9:27
Oh you are absolutely right, the demo project does generate this message in MarkupDlg.cpp. I have to assume the file somehow has a null in it or there is an encoding issue; there shouldn't be any unencoded binary in it since XML is supposed to be text. There's no quick fix I can think of in how it is loading the file into a string and without knowing exactly the reason I'd be guessing at a solution. If it is not sensitive you can send the problem XML file to me at info at firstobject.com to have a look and tell you exactly what the issue is. Thanks,
Ben
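
One way to check the guess above about an embedded null is to read the raw bytes of the file and look for the first zero byte, since a C-style string stops at that point and would come out much shorter than the file. This is only a diagnostic sketch in portable C++, not code from the demo project; ReadAllBytes and FindFirstNul are hypothetical names.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file as raw bytes (empty if it cannot be opened).
std::string ReadAllBytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: offset of the first NUL byte in the data, or -1 if none.
long FindFirstNul(const std::string& bytes)
{
    std::string::size_type pos = bytes.find('\0');
    return pos == std::string::npos ? -1 : static_cast<long>(pos);
}
```

If FindFirstNul(ReadAllBytes(path)) returns a small offset, the file probably contains null bytes or is stored in a wide-character encoding, which would need converting before it is handed to the parser as text.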
General | The problem with chinese tags. | member Vincent-Lin | 9 Aug '08 - 22:20
Dear Ben Bryant:

Thanks for this wonderful XML parser.

When I used it in my projects, I found a problem: if I use a Chinese tag name, I am not able
to use the FindElem function.

if ( !xml.FindElem("中文") ) // always returns false, but not when I use an English tag name
     return false;

Below is my XML file:

<?xml version="1.0" encoding="big5"?>
<中文>
</中文>

P.S. I also tried the UTF-8 encoding mode, and had the same problem.

I don't know if there is a solution, or if it is a bug.
Please help me, thanks.
General | Re: The problem with chinese tags. | member Ben Bryant | 10 Aug '08 - 7:28
Thanks for the message. For the benefit of others, I'll emphasize this is an issue with using a system locale charset build, not a wide char UNICODE build.

The problem in this case is *likely* a character encoding issue with all text in the document, rather than an issue only with non-ASCII tags (let me know if I'm wrong).

A couple of questions here are:

1. The encoding that your C++ source code file is saved in. In Visual Studio it will likely be treated as the system locale language for non-Unicode programs, which could be gb2312 and not big5.

2. The encoding that your XML file is actually saved in. Depending on your editor, there is a chance it is not big5 even if the XML declaration says so (if it looks fine in IE then it is correct big5).

If you have a file in a different encoding than the system locale charset, you need to do the conversion when loading and saving using MultiByteToWideChar and WideCharToMultiByte.
 
You can compile your project without MBCS defined to have CMarkup treat the document as Unicode UTF-8. But that still requires the file to be converted into UTF-8 when loaded.
 
What is your compiler version and system locale charset (big5, gb2312)?
General | Re: The problem with chinese tags. | member Vincent-Lin | 10 Aug '08 - 21:31
Thanks for the help.

The problem only happens when I add the non-ASCII tags.
I read the non-ASCII data properly.

Listing the information you needed below.

1. The XML file was saved in UTF-8. I also tried BIG5 to test with a different encoding.

2. I use the firstobject XML editor to create the XML files, and set the encoding mode to UTF-8.
I used MultiByteToWideChar and WideCharToMultiByte.

I list my code for you; I don't know if anything is wrong.
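		xmlText = (LPCSTR)pBuffer;
		const char *pchar = xmlText;
		int nStrLen = strlen(pchar);

		// UTF-8 -> UTF-16: there are never more wide chars than UTF-8 bytes
		wchar_t* pwchar = new wchar_t[nStrLen + 1];
		int nLen = MultiByteToWideChar( CP_UTF8, 0, pchar, -1, pwchar, nStrLen + 1 );

		// UTF-16 -> system code page: ask for the required byte count first,
		// because a double-byte character needs 2 bytes per wide char
		nLen = WideCharToMultiByte( CP_ACP, 0, pwchar, -1, NULL, 0, NULL, NULL );
		LPSTR lpsz = new CHAR[nLen + 1];
		nLen = WideCharToMultiByte( CP_ACP, 0, pwchar, -1, lpsz, nLen + 1, NULL, NULL );

		xmlText = lpsz;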
 
My compiler is VC6+SP6, and the system locale charset is BIG5.


A big thank you in advance. :)
Answer | Re: The problem with chinese tags. | member Ben Bryant | 13 Aug '08 - 22:24
I assume your conversion is working if non-ASCII data is coming through correctly. There may still be a source code file encoding issue.
 
If you can trace into the FindElem and then TokenPos::Match function and look at the bytes of &szDoc[nL] and szName it should help show the problem. If the bytes are different, you can post or send them and I will tell you what the encoding issue is.
 
Or just double check the byte values of the string being passed to FindElem.
 
Thanks!
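
For the byte comparison suggested above, a small hex dump makes an encoding mismatch easy to see. The following is a sketch in portable C++ and is not part of CMarkup; HexBytes is a hypothetical helper name. As a reference point, the tag name 中文 encoded as UTF-8 is the six bytes E4 B8 AD E6 96 87, while a double-byte charset such as Big5 would use four bytes for the same two characters.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte of s as two uppercase hex digits,
// separated by spaces, so two strings can be compared byte by byte.
std::string HexBytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        // Go through unsigned char so bytes above 0x7F are not sign-extended.
        std::snprintf(buf, sizeof(buf), "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(s[i])));
        if (i != 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```

Dumping both the name passed to FindElem and the tag bytes from the loaded document would show whether the source file and the document are really in the same encoding.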
General | Re: The problem with chinese tags. | member Vincent-Lin | 18 Aug '08 - 20:03
Dear Ben:

Thank you for your time.
I already checked the string that was passed, and I followed your advice to look at the Match function and szDoc. I don't know what's wrong, but I got wrong iPos values when I used the non-ASCII data. I also found some erroneous chars in my string, so I fixed them. However, the issue is still happening.

I will keep looking to see if I can find more information.

Anyway, I appreciate your help.
 
Regards,

Last Updated 24 Sep 2003
Article Copyright 2001 by Ben Bryant
Everything else Copyright © CodeProject, 1999-2013