Summary
Many readers of The Code Project are familiar with the various types of XML parsers in the .NET environment. This article series introduces a new XML processing model called VTD-XML to The Code Project community. It goes significantly beyond those traditional models by overcoming many of the tough technical challenges that hamper SOA and enterprise XML application development. The first part of this series demonstrates the benefits of VTD-XML as a parser with integrated XPath and as an indexer. The second part shows you how to benefit from VTD-XML's cutting, editing and modifying capabilities, and introduces the concept of "document-centric" XML processing. The third part shows you how to code your application in the C version of VTD-XML.
Introduction
VTD-XML is a suite of open-source XML processing technologies centered around a "non-extractive" XML processing technique called "Virtual Token Descriptor." It is cross-platform and available in C#, C and Java. The latest version is 2.2, which is available for download. Depending on the perspective, VTD-XML can be viewed as one of the following:
Digging Into "Non-Extractive" Parsing
Let me quickly go over some of the new terms introduced in the section above. "Non-extractive" parsing means that the XML text is kept intact and un-decoded in memory, while tokens are represented exclusively by offsets and lengths (no string content is copied). This is in contrast to "extractive" parsing (on which DOM, SAX and other older XML processing models are based), which allocates small memory blocks (a.k.a. strings) and copies the actual token content into them. "Virtual Token Descriptor" (VTD), whose layout is shown in Figure 1, is a binary encoding format that applies the concept of "non-extractive" parsing to XML. A VTD record is a 64-bit integer that encodes the length, offset, nesting depth and type of an XML token. As of VTD-XML 2.2, the bit layout of a VTD record is further defined as follows:
Figure 1. Bit Layout of a VTD Record
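To make the idea of a packed, non-extractive token more concrete, here is a minimal sketch of how an offset, length, depth and type could share one 64-bit integer. The field widths, the token type code and the class name VtdSketch are assumptions chosen for illustration; they are not the actual VTD-XML 2.2 layout and the class is not part of the library.

```csharp
using System;
using System.Text;

// Illustrative only: the field widths below are hypothetical and are NOT
// the real VTD-XML bit layout. They simply add up to 64 bits.
public static class VtdSketch
{
    const int OffsetBits = 32; // assumed width of the starting offset
    const int LengthBits = 20; // assumed width of the token length
    const int DepthBits = 8;   // assumed width of the nesting depth

    // Pack the four fields into one 64-bit record: type | depth | length | offset.
    public static ulong Pack(int type, int depth, int length, int offset)
    {
        return ((ulong)type << (OffsetBits + LengthBits + DepthBits))
             | ((ulong)depth << (OffsetBits + LengthBits))
             | ((ulong)length << OffsetBits)
             | (ulong)(uint)offset;
    }

    public static int Offset(ulong r) => (int)(r & ((1UL << OffsetBits) - 1));
    public static int Length(ulong r) => (int)((r >> OffsetBits) & ((1UL << LengthBits) - 1));
    public static int Depth(ulong r) => (int)((r >> (OffsetBits + LengthBits)) & ((1UL << DepthBits) - 1));
    public static int Type(ulong r) => (int)(r >> (OffsetBits + LengthBits + DepthBits));

    // The token text is decoded from the untouched document bytes only when
    // it is actually requested; nothing is copied at parse time.
    public static string TokenText(byte[] doc, ulong r)
    {
        return Encoding.UTF8.GetString(doc, Offset(r), Length(r));
    }

    public static void Main()
    {
        byte[] doc = Encoding.UTF8.GetBytes("<a><b>hello</b></a>");
        // "hello" starts at byte 6 and is 5 bytes long; the type code 5 and
        // the depth value 1 are arbitrary values for this sketch.
        ulong record = Pack(5, 1, 5, 6);
        Console.WriteLine(TokenText(doc, record)); // prints "hello"
    }
}
```

The point of the sketch is that the record is a plain integer: an array of such records can be allocated in bulk, and the document bytes are never split into per-token strings.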
Understand the Benefits of VTD-XML
Simply put, VTD-XML solves a significant number of XML processing issues in the enterprise, ranging from the obvious ones that you experience every day to the hidden ones that prevent you from taking your SOA project to the next level. Below is a brief discussion of some of those issues:
To understand the benefits that VTD-XML brings to the table, here are the highlights of some of its features:
As you have probably guessed, VTD is the primary reason why VTD-XML is able to achieve all those feats simultaneously. A typical DOM parser allocates one unit of memory for each token in the XML input. This is costly in both memory (due to heap fragmentation) and time (because of the sheer number of allocation requests). VTD-XML simply stores a verbatim, unparsed copy of the XML in memory and then generates VTD records over it to allow for simple navigation and access. Because reading an XML file is by definition a read-only process, you do not need the flexibility of per-token allocation at this point in the parsing. Last, keep in mind that VTD-XML is technically a processing model rather than an API, and you can build your own API on top of a VTD-XML model.
There are a lot of articles written on various aspects of VTD-XML. They are available at the "Links and presentation page". Also, if your browser has the Java plug-in installed, you can view this demo to help you understand the basic concept of non-extractive parsing.
A Typical Use Case
Right now, many applications suffer from serious performance issues when sending large, complex-structured XML documents across your enterprise messaging backbone (using ESB, MQ or BizTalk server). With VTD-XML, you don't just solve the problem; there is more than one way to solve it. Because of its memory efficiency, random access and XPath support, VTD-XML in parsing mode allows your application to handle much larger documents at higher performance with less coding. In other words, the XML documents appear "smaller" with VTD processing. Moreover, when you send the VTD index along with the XML text, the application at the receiving end can directly perform application logic (e.g. XPath queries) with zero parsing overhead, further enhancing throughput and reducing latency.
Things get even better with VTD-XML when your applications start to modify the documents (to be discussed in the second part of this series). The rest of this article demonstrates how to use VTD-XML to parse XML documents, run XPath queries and work with indexes (both generating and loading them). Before running the code samples, you need to visit the VTD-XML project and download the full version of its C# port.
Hello World!
This example shows you how to parse a file, manually navigate to a desired node and then print out its text content. In the input XML, the text node " hello world! " is nested two levels deep in the hierarchy.
<ns1:a xmlns:ns1="someURL">
    <ns1:b> hello world! </ns1:b>
</ns1:a>
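The example first instantiates VTDGen, as the code below shows:
using System;
using System.Collections.Generic;
using System.Text;
using com.ximpleware;
namespace example1
{
    class Hello_World
    {
        static void Main(string[] args)
        {
            VTDGen vg = new VTDGen();
            // The second argument turns on namespace awareness
            if (vg.parseFile("test1.xml", true))
            {
                try
                {
                    VTDNav vn = vg.getNav();
                    // Move the cursor from the root element to its child ns1:b
                    if (vn.toElementNS(VTDNav.FIRST_CHILD, "someURL", "b"))
                    {
                        // Index of the text token, or -1 if there is none
                        int i = vn.getText();
                        if (i != -1)
                        {
                            Console.WriteLine(vn.toString(i));
                            Console.WriteLine(vn.toNormalizedString(i));
                        }
                    }
                }
                catch (NavException e)
                {
                    // Report navigation errors instead of silently ignoring them
                    Console.Error.WriteLine(e.Message);
                }
            }
        }
    }
}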
The output shows the difference between toString(), which preserves the surrounding whitespace, and toNormalizedString(), which trims it:
 hello world! 
hello world!
Running XPath Query
The second example shows you how to query the document using XPath. Below is the XML document:
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
    </items>
</purchaseOrder>
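To evaluate XPath queries, you need to instantiate AutoPilot, as the code below shows:
using System;
using System.Collections.Generic;
using System.Text;
using com.ximpleware;
namespace example2
{
    class Program
    {
        static void Main(string[] args)
        {
            VTDGen vg = new VTDGen();
            int i;
            // No namespaces in this document, so namespace awareness is off
            if (vg.parseFile("test2.xml", false))
            {
                try
                {
                    VTDNav vn = vg.getNav();
                    AutoPilot ap = new AutoPilot(vn);
                    // The expression must be a single string literal
                    ap.selectXPath(
                        "/purchaseOrder/items/item[@partNum=\"872-AA\"]/USPrice/text()");
                    // evalXPath() returns -1 once there are no more matches
                    while ((i = ap.evalXPath()) != -1)
                    {
                        Console.WriteLine(vn.toString(i));
                    }
                }
                catch (NavException e)
                {
                    // Report navigation errors instead of silently ignoring them
                    Console.Error.WriteLine(e.Message);
                }
            }
        }
    }
}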
The output simply echoes the matching text nodes:
148.95
148.95
Index Writing
This example shows you how to write the index file for an XML document to avoid repetitive parsing at a later time. This is mostly done by calling writeIndex(), as the code below shows:
using System;
using System.Collections.Generic;
using System.Text;
using com.ximpleware;
namespace example3
{
    public class writeIndex
    {
        public static void Main(string[] args)
        {
            VTDGen vg = new VTDGen();
            // Parse once, then save the XML together with its VTD index
            if (vg.parseFile("d:/C#_tutorial_by_code_examples/4/input.xml", true))
            {
                vg.writeIndex("d:/input.vxl");
            }
        }
    }
}
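Index Loading
To load the index file, call loadIndex(), as the code below shows:
using System;
using System.Collections.Generic;
using System.Text;
using com.ximpleware;
namespace example
{
    public class loadIndex
    {
        public static void Main(string[] args)
        {
            try
            {
                VTDGen vg = new VTDGen();
                // No parsing happens here; the index is read straight from disk
                VTDNav vn = vg.loadIndex("input.vxl");
                // do whatever you want here
            }
            catch (IndexReadException e)
            {
                // Report index errors instead of silently ignoring them
                Console.Error.WriteLine(e.Message);
            }
        }
    }
}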
Recap
DOM, SAX and streaming XML parsing have numerous technical problems, mostly caused by extractive parsing and excessive object creation. VTD-XML is faster, more memory-efficient and easier to use because it relies on non-extractive parsing to eliminate object creation. However, this article has only given a glimpse of what the future of XML processing looks like. In the second article of this series, I will show you more features of VTD-XML that will take your breath away.