UTF-8 encoded XML file/stream processing

5 Jan 2007
Process a UTF-8 encoded XML file or stream: read group and attribute values; write and delete groups, attributes, values and comments.

Sample Image

Introduction

This DLL provides routines to manipulate UTF-8 encoded XML files. The set is not exhaustive, but it is a small, useful collection. For example, several co-operating executables can share a common UTF-8 encoded XML file, each reading its own operating parameters from it and setting parameters for the others.

Background

Initially, only the read functions were implemented, to avoid the large overhead of incorporating a proprietary interface. From this grew a certain understanding of the mechanism. Later additions were: write and delete routines; stream routines that let the user program supply and recover the UTF-8 encoded XML data without using disk files; and some super (i.e. over-arching) routines to shrink the user's code.

Using the code

VC 6.0 projects: Place the XM8DLL.dll in a directory on your path variable. Add the library XM8DLL.lib to the project resources. Add the module XM8calls.h to the project. Use the routines therein.

VB 6.0 projects: Register the XM8DLL.dll with regsvr32. Add the module XM8DLL.bas to the project. Use the public routines therein.

//
// Sample source to produce the above file
//

  XM8_newFile("Order");

  XM8_getFrstGroup("Order",0);
  XM8_newAttPutVal("number","1234");

  XM8_pokeNewGrpPutVal("Date","2000/1/1");
  XM8_newGrpPutVal("Customer","Acme < & > \" ' Ltd");
  XM8_newAttPutVal("ID","1234A");

  XM8_getFrstGroup("Order",0);
  XM8_newGroup("ITEM");
  XM8_newGroup("ITEM");

  XM8_getFrstGroup("ITEM",1);
  XM8_newAttPutVal("ID","01");
  XM8_newGrpPutVal("Part-number","E16-25A");
  XM8_newAttPutVal("warehouse","Warehouse11");
  XM8_getFrstGroup("ITEM",1);
  XM8_pokeNewGrpPutVal("Description","Production-Class Widget A");
  XM8_newGrpPutVal("Quantity","16");

  XM8_getLastGroup("ITEM",1);
  XM8_newAttPutVal("ID","02");
  XM8_newGrpPutVal("Part-number","E23-45B");
  XM8_newAttPutVal("warehouse","Warehouse11");
  XM8_getLastGroup("ITEM",1);
  XM8_pokeNewGrpPutVal("Description","Production-Class Widget B");
  XM8_newGrpPutVal("Quantity","12");

  XM8_writeFile(fileName);
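
The Customer value in the sample deliberately contains the five characters that XML reserves (<, &, >, " and '). The article does not show how XM8DLL stores them, so the following is only a minimal sketch of the conventional approach, replacing each with its predefined XML entity reference. The function name escapeXmlValue is hypothetical and is not part of XM8DLL.

```cpp
#include <string>

// Hypothetical helper, NOT part of XM8DLL: replaces the five characters
// that XML reserves with their predefined entity references.
std::string escapeXmlValue(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        switch (raw[i]) {
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '&':  out += "&amp;";  break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += raw[i];   break;  // everything else is copied as-is
        }
    }
    return out;
}
```

Under this assumption, the string passed for "Customer" above would be written as Acme &lt; &amp; &gt; &quot; &apos; Ltd. Reading reverses the mapping.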

Points of Interest

  • Throughout this article, the acronym UTF means UTF-8.
  • Four 'conversion' routines are also supplied, in two pairs: XM8_UTFtoUCS and XM8_UCStoUTF; XM8_UTF8toUTF16 and XM8_UTF16toUTF8. They are not used internally by XM8DLL.
  • After installing the relevant character sets on W2K, I managed to display the Japanese streams.
  • For C/C++ only users, a static library can be built using workspace & project files provided.
  • The private routines in the XM8DLL.bas module are to get around C/C++ <-> VB differences.
  • The implementation of 'true' (C/C++ 1, VB -1); 'false' is 0 in both.
  • Passing VB string addresses to C/C++ routines.
  • VB return-string-parameter is handled in the DLL.
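
To illustrate what the conversion pairs listed above have to do, here is a minimal, self-contained sketch of decoding 1-4 byte UTF-8 into code points (UCS values) and re-encoding those as UTF-16. It is not the DLL's implementation. The names utf8ToCodePoints and codePointsToUtf16 are hypothetical, and the real XM8_ routines may have different signatures and error handling. The sketch assumes malformed input is replaced by U+FFFD and does not reject overlong forms.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long CodePoint;  // wide enough for 21-bit values

// Hypothetical helper: decode 1-4 byte UTF-8 into code points.
// A bad lead byte or a missing continuation byte yields U+FFFD.
std::vector<CodePoint> utf8ToCodePoints(const std::string& utf8)
{
    std::vector<CodePoint> out;
    std::string::size_type i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        int extra = 0;       // number of continuation bytes expected
        CodePoint cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= utf8.size()) { ok = false; break; }
            unsigned char c = static_cast<unsigned char>(utf8[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}

// Hypothetical helper: encode code points as UTF-16 code units,
// using a surrogate pair for anything above U+FFFF.
std::vector<unsigned short> codePointsToUtf16(const std::vector<CodePoint>& cps)
{
    std::vector<unsigned short> out;
    for (std::size_t n = 0; n < cps.size(); ++n) {
        CodePoint cp = cps[n];
        if (cp > 0x10FFFF) cp = 0xFFFD;  // outside the UTF-16 range
        if (cp < 0x10000) {
            out.push_back(static_cast<unsigned short>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Going through code points as an intermediate form keeps each direction simple, at the cost of an extra pass over the data.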

History

  • 1.9 Corrections to XMJ_deProfundis.
  • 1.8 XM8_sNew.cpp bug fixed in putThing.
  • 1.7 Encryption using TinyEncryptionAlgorithm (TEA).
    • XM8_crypt_vb.zip - demonstration of TEA applied to XML files.
    • Four encryption routines to implement TEA: XMLteaCryptKey, XMLteaEncrypt, XMLteaEncryptVal and XMLteaDecrypt.
  • 1.6 Default is now 1-4 byte UTF-8, 21 bit UNICODE usage.
    • New routine XM8_fullCODE reverts to 1-6 byte UTF-8, 31 bit usage.
  • 1.5 XM8_sNew.cpp new loop routine XM8_deProfundis.
    • Third VB demo. XLS files to XML files.
  • 1.4 XM8_sNew.cpp bug fixed in XM8_newStream.
  • 1.3 handles <, &, >, " and ' within values; both read & write.
    • XM8DLL.bas bug fixed in XM8_UTF8toUTF16.
  • 1.2 handles group to attribute & attribute to attribute white space.
    • What took 661 ms now takes 231 ms.
  • 1.1 XM8 handles ASCII encoded XML files because they are a sub-set of UTF-8. Therefore, XMJ may be replaced by XM8. Because XM8 works internally in UCS, it is about 30% slower than XMJ. Any observations on the code that might recover this loss will be much appreciated.
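
As a rough illustration of the 1.6 entry above: a 4-byte UTF-8 sequence carries 3 + 6 + 6 + 6 = 21 payload bits, while the original 1-6 byte scheme reaches 1 + 6 x 5 = 31 bits. The sketch below computes the encoded length of a code point under either scheme. The function utf8EncodedLength and its fullCode flag are hypothetical names chosen to mirror XM8_fullCODE; they are not the DLL's API.

```cpp
// Hypothetical helper, NOT part of XM8DLL.
// Returns the number of UTF-8 bytes needed for cp, or 0 if cp cannot be
// encoded under the chosen scheme.
//   fullCode == false : 1-4 byte sequences (up to 21 bits)
//   fullCode == true  : 1-6 byte sequences (up to 31 bits)
int utf8EncodedLength(unsigned long cp, bool fullCode)
{
    if (cp <= 0x7FUL)       return 1;  //  7 bits
    if (cp <= 0x7FFUL)      return 2;  // 11 bits
    if (cp <= 0xFFFFUL)     return 3;  // 16 bits
    if (cp <= 0x1FFFFFUL)   return 4;  // 21 bits
    if (!fullCode)          return 0;  // too large for the 1-4 byte scheme
    if (cp <= 0x3FFFFFFUL)  return 5;  // 26 bits
    if (cp <= 0x7FFFFFFFUL) return 6;  // 31 bits
    return 0;
}
```

Note that current Unicode restricts code points to U+10FFFF, so the 5 and 6 byte forms only matter for data that deliberately uses the older, wider range.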

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.