UTF-8 encoded XML file/stream processing






Jun 3, 2004
Process a UTF-8 encoded XML file or stream: read group and attribute values; write and delete groups, attributes, values and comments.
- Download demo C/C++ projects(2) - 49.7 Kb
- Download demo VB projects(2) - 52.5 Kb
- Download demo VB project(3rd) - 12.2 Kb
- Download crypt VB projects(2) - 21 Kb
- Download distribution - 40.4 Kb
- Download source - 34.1 Kb
- Download function documentation - 16.5 Kb
Introduction
This DLL provides routines to manipulate UTF-8 encoded XML files. The set provided is not all-singing-and-dancing but a useful, small collection. Several co-operating executables, living off a common UTF-8 encoded XML file, may find their operating parameters and set parameters for others.
Background
Initially, the read functions were implemented to avoid the large overhead of incorporating a proprietary interface. From this grew a certain understanding of the mechanism. Write and delete routines were added next, followed by stream routines that allow the user program to supply and recover the UTF-8 encoded XML data without using disk files, and finally some super (i.e. over-arching) routines to shrink the user's code.
Using the code
VC 6.0 projects: Place the XM8DLL.dll in a directory on your path variable. Add the library XM8DLL.lib to the project resources. Add the module XM8calls.h to the project. Use the routines therein.
VB 6.0 projects: Register the XM8DLL.dll with regsvr32. Add the module XM8DLL.bas to the project. Use the public routines therein.
//
// Sample source to produce the above file
//
XM8_newFile("Order");
XM8_getFrstGroup("Order",0);
XM8_newAttPutVal("number","1234");
XM8_pokeNewGrpPutVal("Date","2000/1/1");
XM8_newGrpPutVal("Customer","Acme < & > \" ' Ltd");
XM8_newAttPutVal("ID","1234A");
XM8_getFrstGroup("Order",0);
XM8_newGroup("ITEM");
XM8_newGroup("ITEM");
XM8_getFrstGroup("ITEM",1);
XM8_newAttPutVal("ID","01");
XM8_newGrpPutVal("Part-number","E16-25A");
XM8_newAttPutVal("warehouse","Warehouse11");
XM8_getFrstGroup("ITEM",1);
XM8_pokeNewGrpPutVal("Description","Production-Class Widget A");
XM8_newGrpPutVal("Quantity","16");
XM8_getLastGroup("ITEM",1);
XM8_newAttPutVal("ID","02");
XM8_newGrpPutVal("Part-number","E23-45B");
XM8_newAttPutVal("warehouse","Warehouse11");
XM8_getLastGroup("ITEM",1);
XM8_pokeNewGrpPutVal("Description","Production-Class Widget B");
XM8_newGrpPutVal("Quantity","12");
XM8_writeFile(fileName);
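The customer name in the sample contains all five characters that XML reserves (<, &, >, " and '), so they have to reach the file as entity references. The sketch below shows one way such escaping could be written in C. It is an illustration only: xml_escape is a hypothetical helper, not an XM8DLL routine, and it makes no claim about how the DLL handles these characters internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of XM8DLL: copy src to dst, replacing the
   five XML-reserved characters with their predefined entity references.
   Returns the length written (excluding the NUL), or (size_t)-1 if dst
   is too small to hold the result. */
size_t xml_escape(const char *src, char *dst, size_t dstSize)
{
    size_t used = 0;

    for (; *src != '\0'; ++src) {
        char single[2];
        const char *rep;
        size_t len;

        switch (*src) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:                    /* ordinary byte: copy unchanged */
            single[0] = *src;
            single[1] = '\0';
            rep = single;
            break;
        }

        len = strlen(rep);
        if (used + len + 1 > dstSize)   /* keep room for the NUL */
            return (size_t)-1;
        memcpy(dst + used, rep, len);
        used += len;
    }

    if (dstSize == 0)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

Because only ASCII bytes are replaced, the helper can be applied to UTF-8 text directly: the bytes of multi-byte sequences are all 0x80 or above and pass through untouched.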
Points of Interest
- Throughout this article, the acronym UTF means UTF-8.
- Four 'conversion' routines are also supplied: the pair XM8_UTFtoUCS, XM8_UCStoUTF and the pair XM8_UTF8toUTF16, XM8_UTF16toUTF8. These are not used internally by XM8DLL.
- After installing the relevant character sets on W2K, I managed to reveal the Japanese streams.
- For C/C++ only users, a static library can be built using workspace & project files provided.
- The private routines in the XM8DLL.bas module are to get around C/C++ <-> VB differences:
  - The representation of Boolean 'true' (C/C++ 1, VB -1).
  - VB string addresses to C/C++ routines.
  - VB return-string-parameter is handled in the DLL.
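The signatures of the conversion routines are not shown in this article, so the following is only a sketch of what a UTF-8 to UTF-16 conversion involves, not the XM8DLL code. The names utf8_decode and utf16_encode are hypothetical, and the sketch assumes the 1-4 byte form of UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, NOT the XM8DLL routine: decode one UTF-8 sequence
   (1-4 bytes) from the n bytes at s. Stores the code point in *cp and
   returns the number of bytes consumed, or 0 on malformed input. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 1..4 bytes. */
    static const uint32_t minimum[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or 5/6-byte lead */

    if (len > n) return 0;          /* truncated sequence */
    for (i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (c < minimum[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}

/* Encode one code point (assumed valid, <= U+10FFFF) as UTF-16.
   Returns the number of 16-bit units written: 1, or 2 for a surrogate pair. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
    return 2;
}
```

A whole buffer is converted by calling utf8_decode in a loop, advancing by the returned byte count and passing each code point to utf16_encode. For example, the three bytes E6 97 A5 decode to U+65E5 and encode as the single UTF-16 unit 0x65E5.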
History
- 1.9 Corrections to XMJ_deProfundis.
- 1.8 XM8_sNew.cpp bug fixed in putThing.
- 1.7 Encryption using the Tiny Encryption Algorithm (TEA).
  - XM8_crypt_vb.zip - demonstration of TEA applied to XML files.
  - Four encryption routines to implement TEA: XMLteaCryptKey, XMLteaEncrypt, XMLteaEncryptVal and XMLteaDecrypt.
- 1.6 Default is now 1-4 byte UTF-8, 21 bit UNICODE usage.
  - New routine XM8_fullCODE, revert to 1-6 byte UTF-8, 31 bit usage.
- 1.5 XM8_sNew.cpp new loop routine XM8_deProfundis.
  - Third VB demo. XLS files to XML files.
- 1.4 XM8_sNew.cpp bug fixed in XM8_newStream.
- 1.3 handles <, &, >, " and ' within values; both read & write.
  - XM8DLL.bas bug fixed in XM8_UTF8toUTF16.
- 1.2 handles group to attribute & attribute to attribute white space.
  - What took 661 ms now takes 231 ms.
- 1.1 XM8 handles ASCII encoded XML files because ASCII is a subset of UTF-8. Therefore, XMJ may be replaced by XM8. Because XM8 works internally in UCS, it is about 30% slower than XMJ. Any observations on the code that might recover this loss will be much appreciated.
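Version 1.6 distinguishes the 1-4 byte form of UTF-8 from the original 1-6 byte form. The sketch below illustrates the difference with a single encoder that takes the permitted maximum length as a parameter. It is not taken from XM8DLL: utf8_encode and its maxBytes parameter are hypothetical, and how XM8_fullCODE switches modes internally is not described in this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, NOT XM8DLL code: encode cp as UTF-8 into out.
   maxBytes is 4 for the restricted form or 6 for the original 31-bit form.
   Returns the number of bytes written, or 0 if cp does not fit in
   maxBytes. Surrogates and the U+10FFFF cap are not checked here. */
size_t utf8_encode(uint32_t cp, int maxBytes, unsigned char out[6])
{
    /* Largest value representable in a sequence of 1..6 bytes. */
    static const uint32_t limit[6] = {
        0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF
    };
    /* Lead-byte prefix for a sequence of 1..6 bytes. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    int len, i;

    /* Find the shortest sequence length that can hold cp. */
    for (len = 1; len <= maxBytes && len <= 6; ++len)
        if (cp <= limit[len - 1])
            break;
    if (len > maxBytes || len > 6)
        return 0;

    /* Fill continuation bytes from the end, six bits at a time. */
    for (i = len - 1; i > 0; --i) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[len - 1] | cp);
    return (size_t)len;
}
```

With maxBytes set to 4 the value 0x200000 is rejected, while with maxBytes set to 6 it is written as a five-byte sequence starting with 0xF8.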