Posted 2 Jun 2004

UTF-8 encoded XML file/stream processing

Lymington, 5 Jan 2007
Process a UTF-8 encoded XML file or stream: read group and attribute values; write and delete groups, attributes, values and comments.

Sample Image


This DLL provides routines for manipulating UTF-8 encoded XML files. The set is not all-singing, all-dancing, but it is a small, useful collection. Several co-operating executables that share a common UTF-8 encoded XML file can read their own operating parameters from it and set parameters for one another.


Initially, the read functions were implemented to avoid the large overhead of incorporating a proprietary interface. From this grew a certain understanding of the mechanism. Write and delete routines were added next, followed by stream routines that let the user program supply and recover the UTF-8 encoded XML data without using disk files, and finally some super (i.e. over-arching) routines to shrink the user's code.

Using the code

VC 6.0 projects: Place XM8DLL.dll in a directory on your PATH. Add the library XM8DLL.lib to the project resources. Add the header XM8calls.h to the project. Use the routines declared therein.

VB 6.0 projects: Register XM8DLL.dll with regsvr32. Add the module XM8DLL.bas to the project. Use the public routines therein.

// Sample source to produce the above file


  XM8_newGrpPutVal("Customer","Acme < & > \" ' Ltd");


  XM8_pokeNewGrpPutVal("Description","Production-Class Widget A");

  XM8_pokeNewGrpPutVal("Description","Production-Class Widget B");
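The customer value in the sample contains all five characters that XML reserves. A minimal sketch of how such a value could be escaped before it is written is given here; xmlEscape is a hypothetical helper used for illustration only, not a routine from XM8DLL, and the DLL's own handling may differ.

```cpp
#include <string>

// Hypothetical helper (not part of XM8DLL): replace the five characters
// that XML reserves with their predefined entity references.
std::string xmlEscape(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Reading a value back would apply the reverse mapping, replacing each entity reference with its character.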


Points of Interest

  • Throughout this article, the acronym UTF means UTF-8.
  • Four 'conversion' routines are also supplied, in two pairs: XM8_UTFtoUCS with XM8_UCStoUTF, and XM8_UTF8toUTF16 with XM8_UTF16toUTF8. These are not used internally by XM8DLL.
  • After installing the relevant character sets on Windows 2000, I was able to display the Japanese streams.
  • For C/C++ only users, a static library can be built using workspace & project files provided.
  • The private routines in the XM8DLL.bas module work around C/C++ <-> VB differences:
    • The implementation of 'true' (C/C++ 1, VB -1).
    • VB string addresses passed to C/C++ routines.
    • The VB return-string parameter is handled in the DLL.
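The conversion pairs listed above move between a UCS code point and its UTF-8 or UTF-16 form. The following is a minimal sketch of that idea, assuming the 1-4 byte UTF-8 range of RFC 3629. The names encodeUtf8 and encodeUtf16 are hypothetical and are not the XM8DLL routines, whose signatures are not shown in this article.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true if cp is a code point that RFC 3629 allows
// (at most U+10FFFF and not a UTF-16 surrogate).
bool isEncodable(std::uint32_t cp) {
    return cp < 0x110000 && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: encode one code point as 1-4 bytes of UTF-8.
// Returns an empty string if the code point is not encodable.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (!isEncodable(cp)) return out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as one or two UTF-16 units.
// Code points above U+FFFF become a surrogate pair.
std::u16string encodeUtf16(std::uint32_t cp) {
    std::u16string out;
    if (!isEncodable(cp)) return out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

For example, U+20AC (the euro sign) becomes the three bytes E2 82 AC in UTF-8 and the single unit 20AC in UTF-16.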


History

  • 1.9 Corrections to XMJ_deProfundis.
  • 1.8 XM8_sNew.cpp bug fixed in putThing.
  • 1.7 Encryption using TinyEncryptionAlgorithm (TEA).
    • - demonstration of TEA applied to XML files.
    • Four encryption routines to implement TEA: XMLteaCryptKey, XMLteaEncrypt, XMLteaEncryptVal and XMLteaDecrypt.
  • 1.6 Default is now 1-4 byte UTF-8, 21 bit UNICODE usage.
    • New routine XM8_fullCODE reverts to 1-6 byte UTF-8, 31 bit usage.
  • 1.5 XM8_sNew.cpp new loop routine XM8_deProfundis.
    • Third VB demo. XLS files to XML files.
  • 1.4 XM8_sNew.cpp bug fixed in XM8_newStream.
  • 1.3 handles <, &, >, " and ' within values; both read & write.
    • XM8DLL.bas bug fixed in XM8_UTF8toUTF16.
  • 1.2 handles group-to-attribute and attribute-to-attribute white space.
    • What took 661 ms now takes 231 ms.
  • 1.1 XM8 handles ASCII encoded XML files because ASCII is a subset of UTF-8. Therefore, XMJ may be replaced by XM8. Because XM8 works internally in UCS, it is about 30% slower than XMJ. Any observations on the code that might recover this loss will be much appreciated.
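Version 1.7 above mentions the Tiny Encryption Algorithm. As a rough illustration of what TEA does to one 64-bit block, here is a sketch of the standard 32-cycle algorithm as published by Wheeler and Needham. The names teaEncryptBlock and teaDecryptBlock are hypothetical; they are not the XMLteaEncrypt and XMLteaDecrypt routines, whose signatures and padding scheme are not shown in this article.

```cpp
#include <cstdint>

// Standard TEA constant, derived from the golden ratio.
static const std::uint32_t kTeaDelta = 0x9E3779B9;

// Hypothetical helper: encrypt one 64-bit block (two 32-bit words) in place
// with a 128-bit key (four 32-bit words), using 32 cycles.
void teaEncryptBlock(std::uint32_t v[2], const std::uint32_t key[4]) {
    std::uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (int i = 0; i < 32; ++i) {
        sum += kTeaDelta;
        v0 += ((v1 << 4) + key[0]) ^ (v1 + sum) ^ ((v1 >> 5) + key[1]);
        v1 += ((v0 << 4) + key[2]) ^ (v0 + sum) ^ ((v0 >> 5) + key[3]);
    }
    v[0] = v0;
    v[1] = v1;
}

// Hypothetical helper: reverse teaEncryptBlock by undoing the cycles
// in the opposite order.
void teaDecryptBlock(std::uint32_t v[2], const std::uint32_t key[4]) {
    std::uint32_t v0 = v[0], v1 = v[1];
    std::uint32_t sum = 0xC6EF3720;  // kTeaDelta * 32, modulo 2^32
    for (int i = 0; i < 32; ++i) {
        v1 -= ((v0 << 4) + key[2]) ^ (v0 + sum) ^ ((v0 >> 5) + key[3]);
        v0 -= ((v1 << 4) + key[0]) ^ (v1 + sum) ^ ((v1 >> 5) + key[1]);
        sum -= kTeaDelta;
    }
    v[0] = v0;
    v[1] = v1;
}
```

Applying this to an XML value would also need a rule for splitting the text into 8-byte blocks and padding the last one; that part is left out here because the article does not describe how the DLL does it.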


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Web Developer
United Kingdom
BSc (St.Andrews(1963-67))
Systems Programmer 39+yrs
Married to first wife 35yrs & counting, four grown-up children
Religious opinions similar to MelG's
It is not the gnosis, but the praxis must be the fruit. (Aristotle)



Comments and Discussions

Question: Where is the file XM8DLL.bas? (michcron, 21-May-10 9:09)
Question: Supporting utf8 in svg files (migano, 5-Sep-07 7:29)
  Answer: Re: supporting utf8 in svg files (nidhin&salih, 24-Jun-11 0:13)
Question: < 22 bits? (Lars-Inge Tønnessen, 26-Jun-04 10:32)
  Answer: Re: < 22 bits? (Lymington, 27-Jun-04 10:52)
  Answer: Re: < 22 bits? (Lymington, 27-Jun-04 11:28)
  Answer: Re: < 22 bits? (Lymington, 8-Jul-04 5:19)
  General: Re: < 22 bits? (nidhin&salih, 24-Jun-11 0:07)

In the module XM8_UTFsbs.cpp

Modification one of two.
Last three lines in the routine UTFtoUCS:

  *lChar = lC;
  if (lC >= 0x00110000) return false; // conform to RFC 3629
  return true;

Modification two of two.
First three lines in the routine UCStoUTF:

  {
  byte bC; long lC = lChar;
  if (lC < 0) { sprintf(L.err,"Invalid UCS train"); return false; }
  if (lC < 0 || lC >= 0x00110000) return false; // conform to RFC 3629

(The sprintf is redundant, and I will remove it sometime.)
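The range check in the modifications above can be sketched in a standalone decoder. This is only an illustration under the RFC 3629 rules, not the XM8_UTFsbs.cpp source: decodeUtf8First is a hypothetical name, and the checks for overlong forms and surrogates go beyond what the comment itself describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the first UTF-8 sequence in s.
// On success, stores the code point in cp and the byte count in len.
// Returns false for malformed input or values that RFC 3629 forbids.
bool decodeUtf8First(const std::string& s, std::uint32_t& cp, std::size_t& len) {
    if (s.empty()) return false;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    std::uint32_t minValue = 0;  // smallest value allowed for this length
    if (b0 < 0x80) {
        cp = b0;
        len = 1;
        return true;
    } else if ((b0 & 0xE0) == 0xC0) {
        cp = b0 & 0x1F; len = 2; minValue = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        cp = b0 & 0x0F; len = 3; minValue = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        cp = b0 & 0x07; len = 4; minValue = 0x10000;
    } else {
        return false;  // stray continuation byte or a 5/6-byte lead byte
    }
    if (s.size() < len) return false;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < minValue) return false;                // overlong encoding
    if (cp >= 0x00110000) return false;             // beyond U+10FFFF
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
    return true;
}
```

The comparison against 0x00110000 is the same limit the comment adds; the other rejections are extra strictness that a caller could relax.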

Last Updated 5 Jan 2007
Article Copyright 2004 by Lymington
Everything else Copyright © CodeProject, 1999-2017