UTF-8 encoded XML file/stream processing

5 Jan 2007, 2 min read
Process a UTF-8 encoded XML file or stream: read group and attribute values; write and delete groups, attributes, values and comments.

[Sample image: the XML file produced by the sample source below]

Introduction

This DLL provides routines to manipulate UTF-8 encoded XML files. The set is not all-singing-and-dancing, but it is a small, useful collection. Several co-operating executables that share a common UTF-8 encoded XML file can read their own operating parameters from it and set parameters for the others.

Background

Initially, the read functions were implemented to avoid the large overhead of incorporating a proprietary interface. From this grew a certain understanding of the mechanism. Write and delete routines were added next, followed by stream routines that let the user program supply and recover the UTF-8 encoded XML data without using disk files, and finally some 'super' (i.e. over-arching) routines to shrink the user's code.

Using the code

VC 6.0 projects: Place XM8DLL.dll in a directory on your PATH. Add the library XM8DLL.lib to the project. Add the header XM8calls.h to the project and call the routines declared in it.

VB 6.0 projects: Register the XM8DLL.dll with regsvr32. Add the module XM8DLL.bas to the project. Use the public routines therein.

C++
//
// Sample source to produce the above file
//
  XM8_newFile("Order");

  XM8_getFrstGroup("Order",0);
  XM8_newAttPutVal("number","1234");

  XM8_pokeNewGrpPutVal("Date","2000/1/1");
  XM8_newGrpPutVal("Customer","Acme < & > \" ' Ltd");
  XM8_newAttPutVal("ID","1234A");

  XM8_getFrstGroup("Order",0);
  XM8_newGroup("ITEM");
  XM8_newGroup("ITEM");

  XM8_getFrstGroup("ITEM",1);
  XM8_newAttPutVal("ID","01");
  XM8_newGrpPutVal("Part-number","E16-25A");
  XM8_newAttPutVal("warehouse","Warehouse11");
  XM8_getFrstGroup("ITEM",1);
  XM8_pokeNewGrpPutVal("Description","Production-Class Widget A");
  XM8_newGrpPutVal("Quantity","16");

  XM8_getLastGroup("ITEM",1);
  XM8_newAttPutVal("ID","02");
  XM8_newGrpPutVal("Part-number","E23-45B");
  XM8_newAttPutVal("warehouse","Warehouse11");
  XM8_getLastGroup("ITEM",1);
  XM8_pokeNewGrpPutVal("Description","Production-Class Widget B");
  XM8_newGrpPutVal("Quantity","12");

  XM8_writeFile(fileName);
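
The Customer value in the sample contains all five characters that XML reserves (<, &, >, " and '). The sketch below shows the standard entity escaping such a value needs before it can be written as XML text. It is an illustration only: the helper names escapeXmlValue and checkEscapeXmlValue are hypothetical and are not part of XM8DLL, whose own handling of these characters is not shown in the article.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT an XM8DLL routine): replace the five characters
// reserved by XML with their predefined entities.
std::string escapeXmlValue(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Usage check with the Customer value from the sample above.
void checkEscapeXmlValue() {
    assert(escapeXmlValue("Acme < & > \" ' Ltd") ==
           "Acme &lt; &amp; &gt; &quot; &apos; Ltd");
}
```

Reading a value back is the reverse mapping, with &amp; resolved last so that an escaped ampersand is not expanded twice.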

Points of Interest

  • Throughout this article, the acronym UTF means UTF-8.
  • Four 'conversion' routines are also supplied, in two pairs: XM8_UTFtoUCS / XM8_UCStoUTF and XM8_UTF8toUTF16 / XM8_UTF16toUTF8. They are not used internally by XM8DLL.
  • After installing the relevant character sets on Windows 2000, I managed to display the Japanese streams.
  • For C/C++ only users, a static library can be built using the workspace and project files provided.
  • The private routines in the XM8DLL.bas module work around C/C++ <-> VB differences:
    • The representation of 'true' (C/C++ 1, VB -1); 'false' is 0 in both.
    • Passing VB string addresses to C/C++ routines.
    • The VB return-string parameter is handled in the DLL.
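
To illustrate what a UTF-8 to UTF-16 conversion involves, here is a minimal sketch. It is not the DLL's XM8_UTF8toUTF16: the function name utf8ToUtf16, its signature and its error handling are assumptions made for this example. It decodes 1-4 byte sequences and emits a surrogate pair for code points above U+FFFF. It does not reject overlong encodings or surrogate code points in the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT the DLL routine): convert UTF-8 bytes to UTF-16.
std::u16string utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3Fu);  // append 6 payload bits
        }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            if (cp > 0x10FFFF)
                throw std::invalid_argument("beyond the UTF-16 range");
            cp -= 0x10000;  // 20 bits remain, split 10/10
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Usage check: U+20AC (euro sign) is three UTF-8 bytes, one UTF-16 unit.
void checkUtf8ToUtf16() {
    const std::u16string euro = utf8ToUtf16("\xE2\x82\xAC");
    assert(euro.size() == 1 && euro[0] == 0x20AC);
}
```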

History

  • 1.9 Corrections to XMJ_deProfundis.
  • 1.8 XM8_sNew.cpp bug fixed in putThing.
  • 1.7 Encryption using TinyEncryptionAlgorithm (TEA).
    • XM8_crypt_vb.zip - demonstration of TEA applied to XML files.
    • Four encryption routines to implement TEA: XMLteaCryptKey, XMLteaEncrypt, XMLteaEncryptVal and XMLteaDecrypt.
  • 1.6 Default is now 1-4 byte UTF-8, 21-bit Unicode usage.
    • New routine XM8_fullCODE reverts to 1-6 byte UTF-8, 31-bit usage.
  • 1.5 XM8_sNew.cpp new loop routine XM8_deProfundis.
    • Third VB demo. XLS files to XML files.
  • 1.4 XM8_sNew.cpp bug fixed in XM8_newStream.
  • 1.3 handles <, &, >, " and ' within values; both read & write.
    • XM8DLL.bas bug fixed in XM8_UTF8toUTF16.
  • 1.2 handles group to attribute & attribute to attribute white space.
    • What took 661 ms now takes 231 ms.
  • 1.1 XM8 handles ASCII encoded XML files because they are a sub-set of UTF-8. Therefore, XMJ may be replaced by XM8. Because XM8 works internally in UCS, it is about 30% slower than XMJ. Any observations on the code that might recover this loss will be much appreciated.
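
The 1.6 entry above distinguishes the 1-4 byte form of UTF-8 from the original 1-6 byte form. A 4-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits, and a 6-byte sequence carries 1 + 5 x 6 = 31. The sketch below shows that difference. The function encodeUtf8 and its fullCode flag are hypothetical; the flag only mimics the idea behind XM8_fullCODE and does not reproduce the DLL's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT an XM8DLL routine): encode one value as UTF-8.
// fullCode == false allows 1-4 bytes (up to 21 bits);
// fullCode == true also allows the old 5 and 6 byte forms (up to 31 bits).
std::string encodeUtf8(std::uint32_t cp, bool fullCode = false) {
    std::size_t len;
    if (cp < 0x80u)            len = 1;
    else if (cp < 0x800u)      len = 2;
    else if (cp < 0x10000u)    len = 3;
    else if (cp < 0x200000u)   len = 4;
    else if (cp < 0x4000000u)  len = 5;
    else if (cp < 0x80000000u) len = 6;
    else throw std::out_of_range("value needs more than 31 bits");

    if (len > 4 && !fullCode)
        throw std::out_of_range("value needs the 5-6 byte form");
    if (len == 1)
        return std::string(1, static_cast<char>(cp));

    // Lead-byte prefix for each length: 110xxxxx, 1110xxxx, 11110xxx, ...
    static const unsigned char leadMark[7] =
        {0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};

    std::string out(len, '\0');
    for (std::size_t i = len - 1; i > 0; --i) {
        out[i] = static_cast<char>(0x80u | (cp & 0x3Fu));  // 10xxxxxx
        cp >>= 6;
    }
    out[0] = static_cast<char>(leadMark[len] | cp);
    return out;
}

// Usage check: the largest 31-bit value needs all six bytes.
void checkEncodeUtf8() {
    assert(encodeUtf8(0x7FFFFFFFu, true).size() == 6);
}
```

Note that the current UTF-8 definition (RFC 3629) stops at U+10FFFF, so values that need the 5 and 6 byte forms lie outside today's Unicode range.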

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
United Kingdom
BSc (St.Andrews(1963-67))
MSCE
Systems Programmer 39+yrs
Married to first wife 35yrs & counting, four grown-up children
Religious opinions similar to MelG's
It is not the gnosis, but the praxis must be the fruit. (Aristotle)
