A Simple XML Validator, using VOLE

16 Apr 2007
A simple command-line utility that validates XML files, implemented using MSXML via the VOLE COM/Automation driver library

Screenshot - LeadImage.png

Introduction

This article describes xmlValidator, a simple command-line utility that validates XML files, using MSXML. It is written using the VOLE C++/COM Automation driver library, allowing the application to be extremely succinct without binding it to a particular compiler (vendor).

Background

When working with a client recently, we had a requirement to validate the correctness of several hundred XML files, in verifying the configuration of an enterprise Java system. Naturally, we did not want to do this manually, so I knocked together the tool described in this article, xmlValidator, in just a few minutes, and we were able to run the validation in batch mode.

The application has very simple functionality: open and parse the XML file passed as its single command-line argument. If the XML file can be opened and is valid, the application follows the "Rule of Silence", and does nothing, and returns EXIT_SUCCESS. If the XML file cannot be opened, or is not XML, or has errors, then the application emits a description of the error and its location, and returns EXIT_FAILURE.

Because we were on a Windows system, we decided to use Microsoft's XML parser, MSXML. MSXML is a COM component. There are several ways to use it from a C++ program. You can, if you want the heartache, program directly to the COM interfaces - but that's a lot of work. It's a lot easier to use a wrapper library. One such library is VOLE, which is a compiler-independent, open-source project that I released earlier this year. By using VOLE, we were able to effect all the COM operations - create the MSXML XML Document object, cause it to parse an XML file, and elicit parse error information from it - in just six lines of code:

C++
object      xmlDocument =   object::create("Msxml2.DOMDocument");

bool        success     =   xmlDocument.invoke_method<bool>(L"load", argv[1]);

object      parseError  =   xmlDocument.get_property<object>(L"ParseError");
std::string reason      =   parseError.get_property<std::string>(L"reason");
long        line        =   parseError.get_property<long>(L"line");
long        linePos     =   parseError.get_property<long>(L"linepos");

Implementation

Application Structure

The basic structure of the application is as follows:

C++
int main(int argc, char** argv)
{
  /* Strategy:
   *
   *  1. Check arguments
   *  2. Initialise the COM libraries
   *  3. Declare VOLE components to be "use"d
   *  4. Create an instance of the MSXML document object
   *  5. Load the XML
   *  6. If succeeded, return success code to outside world
   *  7. If failed, elicit parsing error details and display
   */

  . . .
}

Includes

Naturally, we need to ensure we have all the requisite #includes. As well as including the main VOLE header file, vole/vole.hpp, we also include the header files for two components from the STLSoft libraries, and the requisite standard C and C++ header files:

C++
// VOLE Header Files
#include <vole/vole.hpp>                // for VOLE

// STLSoft Header Files
#include <comstl/util/initialisers.hpp> // for comstl::com_initialiser;
#include <winstl/error/error_desc.hpp>  // for winstl::error_desc

// Standard C++ Header Files
#include <iostream>                     // for std::cout, std::cerr,     
                                        //     std::endl;
#include <string>                       // for std::string

// Standard C Header Files
#include <stdlib.h>                     // for EXIT_SUCCESS, EXIT_FAILURE

Step 1: Check arguments

This is pretty boilerplate stuff:

C++
if(2 != argc)
{
    std::cerr << "USAGE: xmlValidator <xml-file>" << std::endl;
}
else try
{
  . . . // main functioning
}
catch( . . . )
{
  . . . // error handling
}
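
The "else try" arrangement means that every path that does not explicitly return success falls through to a single failure return at the end of main(). The following is a minimal, portable sketch of that control flow only; the run() function, the validate_fn callback and the three sample validators are hypothetical names introduced for illustration, and stand in for steps 2 to 7.

```cpp
#include <cstdlib>    // EXIT_SUCCESS, EXIT_FAILURE
#include <exception>  // std::exception
#include <iostream>   // std::cerr, std::endl
#include <stdexcept>  // std::runtime_error

// Hypothetical stand-in for steps 2 to 7: returns true if the file is valid,
// false if it has parse errors, and throws if the parser cannot be used.
typedef bool (*validate_fn)(char const* path);

// Mirrors the article's control flow: any path that does not explicitly
// return EXIT_SUCCESS falls through to the final EXIT_FAILURE.
int run(int argc, char const* const* argv, validate_fn validate)
{
    if(2 != argc)
    {
        std::cerr << "USAGE: xmlValidator <xml-file>" << std::endl;
    }
    else try
    {
        if(validate(argv[1]))
        {
            return EXIT_SUCCESS;
        }
    }
    catch(std::exception const& x)
    {
        std::cerr << "Validation failed: " << x.what() << std::endl;
    }

    return EXIT_FAILURE;
}

// Sample validators (hypothetical) that exercise the three outcomes.
inline bool always_valid(char const*)   { return true; }
inline bool always_invalid(char const*) { return false; }
inline bool always_throws(char const*)  { throw std::runtime_error("parser unavailable"); }
```

Calling run() with a wrong argument count, with always_invalid, or with always_throws yields EXIT_FAILURE in each case; only a validator that returns true yields EXIT_SUCCESS.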

Step 2: Initialize the COM libraries

This is done using the first of the STLSoft components, the com_initialiser from the COMSTL sub-project. As the comments explain, this ensures that the initialization and un-initialization of the COM libraries is handled appropriately.

This is done by creating a local instance of the comstl::com_initialiser component, which employs internally-initialized RAII to initialize the COM libs in its ctor, and un-initialize them (if successfully initialized) in its dtor.

C++
comstl::com_initialiser coinit; // Initialise COM, via RAII
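
For readers unfamiliar with the idiom, here is a rough, portable sketch of a scoped initialiser of this general kind. It is not the COMSTL implementation: scoped_initialiser, fake_library and failing_library are hypothetical names, and the two stand-in libraries merely count calls where a real one would call CoInitialize() and CoUninitialize().

```cpp
// Hypothetical stand-in for a library with paired start-up and shut-down
// calls (for COM, these would be CoInitialize() and CoUninitialize()).
struct fake_library
{
    static int& active_count() { static int n = 0; return n; }
    static bool startup()      { ++active_count(); return true; }
    static void shutdown()     { --active_count(); }
};

// A second hypothetical stand-in, whose start-up always fails.
struct failing_library
{
    static int& shutdown_calls() { static int n = 0; return n; }
    static bool startup()        { return false; }
    static void shutdown()       { ++shutdown_calls(); }
};

// Scoped initialiser: starts the library in its constructor and shuts it
// down in its destructor, but only if the start-up succeeded.
template <typename L>
class scoped_initialiser
{
public:
    scoped_initialiser() : m_initialised(L::startup()) {}
    ~scoped_initialiser() { if(m_initialised) { L::shutdown(); } }

    bool is_initialised() const { return m_initialised; }

private:
    // Non-copyable: a copy would shut the library down twice.
    scoped_initialiser(scoped_initialiser const&);
    scoped_initialiser& operator =(scoped_initialiser const&);

    bool const m_initialised;
};
```

Declaring a local scoped_initialiser<fake_library> raises the active count for exactly the lifetime of the enclosing scope, including when the scope is left via an exception.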

Step 3: Declare the VOLE components to be used

As with most C++ libraries, the VOLE components are defined within a namespace, vole. We use "using declarations" to save ourselves the eye-strain (and finger-ache) of having to qualify each use of a VOLE component.

The two main public types provided by VOLE are object and collection. vole::object is a generic wrapper for a COM server. vole::collection is a generic wrapper for a COM "collection", and provides STL-compatible iterators for enumerating the collection's elements. vole::collection is not needed in this utility, but I plan to write a follow up article illustrating its use. If you can't wait for that, feel free to check out the examples here.

C++
using vole::object;
using vole::of_type;  // Only required by "old" compilers, such as VC6

vole::of_type is a helper function template that is used by old compilers (i.e. VC++ 6) which have problems with the standard VOLE syntax usable by all modern compilers. That'll be explained in the coming sections. We'll discriminate between the alternate forms shown in the following code using the XMLVALIDATOR_USE_OLD_SYNTAX pre-processor symbol, defined as follows:

C++
#if defined(STLSOFT_COMPILER_IS_MSVC) && \
    _MSC_VER == 1200
# define XMLVALIDATOR_USE_OLD_SYNTAX
#endif /* compiler */

Step 4: Create the MSXML server and wrap it.

This is very simple, using the static method vole::object::create() with, in this case, a ProgId. If creation fails, a vole_exception will be thrown:

C++
object  xmlDocument =   object::create("Msxml2.DOMDocument");

This method has three overloads, allowing creation via a CLSID, a ProgId, or the string-form of a CLSID (e.g. "{F6D90F11-9C73-11D3-B32E-00C04F990BB4}"). Each overload has two additional defaulted parameters, with which you can specify the creation context (e.g. CLSCTX_ALL) and the "coercion level" - the degree of effort with which returned values will be coerced from the Automation type VARIANT to C++ types. Neither of these two will feature further in this article.
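
Since a ProgId and a string-form CLSID both arrive as strings, something has to tell them apart. The sketch below shows one plausible way to recognise the string form by its shape alone. It is an assumption for illustration, not VOLE's actual logic: looks_like_clsid_string is a hypothetical helper, and a Windows program would more likely rely on the COM API functions CLSIDFromString() and CLSIDFromProgID().

```cpp
#include <cctype>   // std::isxdigit
#include <string>   // std::string

// Hypothetical helper: returns true if s has the registry form of a CLSID,
// "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}", where each x is a hex digit.
inline bool looks_like_clsid_string(std::string const& s)
{
    static char const pattern[] = "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}";
    std::size_t const n = sizeof(pattern) - 1; // 38 characters, excluding the terminator

    if(s.size() != n)
    {
        return false;
    }

    for(std::size_t i = 0; i != n; ++i)
    {
        if('x' == pattern[i])
        {
            // Placeholder position: any hexadecimal digit is acceptable
            if(!std::isxdigit(static_cast<unsigned char>(s[i])))
            {
                return false;
            }
        }
        else if(s[i] != pattern[i])
        {
            // Punctuation position: braces and hyphens must match exactly
            return false;
        }
    }

    return true;
}
```

With this check, the CLSID string quoted above is recognised as such, whereas "Msxml2.DOMDocument" is not and would be treated as a ProgId.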

Step 5: Load the XML

Once again, this is a very simple operation, involving one line:

C++
bool    success     =   xmlDocument.invoke_method<bool>(L"load", argv[1]);

Unfortunately for users of Visual C++ 6.0, this syntax makes the compiler have a cow. This is where the vole::of_type() function template comes in. It is used for the sole purpose of providing a type-advisory to the vole::object::invoke_method and vole::object::get_property method templates. Hence, the actual code for step 5 is as follows:

C++
#ifdef XMLVALIDATOR_USE_OLD_SYNTAX

bool    success     =    xmlDocument.invoke_method(of_type<bool>(), L"load", argv[1]);

#else /* ? XMLVALIDATOR_USE_OLD_SYNTAX */

bool    success     =   xmlDocument.invoke_method<bool>(L"load", argv[1]);

#endif /* XMLVALIDATOR_USE_OLD_SYNTAX */
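
The type-advisory technique itself is ordinary tag dispatch: an empty object whose type carries T is passed as an argument, so the compiler can deduce T rather than needing it spelled out after the method name. The following portable sketch illustrates the idea under stated assumptions; type_tag, this of_type() and property_bag are hypothetical stand-ins written for this example, not VOLE's definitions.

```cpp
#include <map>
#include <sstream>
#include <string>

// Empty tag type whose only job is to carry T into overload resolution.
template <typename T>
struct type_tag {};

// Helper that creates the tag; analogous in spirit to vole::of_type<T>().
template <typename T>
type_tag<T> of_type() { return type_tag<T>(); }

// Hypothetical property store standing in for an Automation object.
class property_bag
{
public:
    void set(std::string const& name, std::string const& value)
    {
        m_values[name] = value;
    }

    // "Old" form: T is deduced from the tag argument.
    // The stored text is parsed as a single whitespace-delimited token.
    template <typename T>
    T get_property(type_tag<T>, std::string const& name) const
    {
        std::map<std::string, std::string>::const_iterator it = m_values.find(name);
        std::istringstream in(it != m_values.end() ? it->second : std::string());
        T result = T();
        in >> result;
        return result;
    }

    // "New" form: T is given explicitly, and forwarded to the tagged overload.
    template <typename T>
    T get_property(std::string const& name) const
    {
        return get_property(type_tag<T>(), name);
    }

private:
    std::map<std::string, std::string> m_values;
};
```

Both bag.get_property<long>("line") and bag.get_property(of_type<long>(), "line") then yield the same value; only the second avoids explicit template arguments on a member function template, which is the construct the older compiler rejects.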

Step 6: Parsing success

This is very simple. All we do is return EXIT_SUCCESS.

C++
if(success)
{
    return EXIT_SUCCESS;
}

Step 7: Parsing failure

This is the grist of our dissertation. If we fail, we need to elicit from the XML Document instance its ParseError object (also an automation object), and then elicit from it the details of the error. All the values are obtained from properties, via the vole::object::get_property method templates.

C++
else
{
    object      parseError  =   xmlDocument.get_property<object>(L"ParseError");
    std::string reason      =   parseError.get_property<std::string>(L"reason");
    long        line        =   parseError.get_property<long>(L"line");
    long        linePos     =   parseError.get_property<long>(L"linepos");

    std::cout << "Parse error at (" << line << ", " << linePos << "): " << reason << std::endl;
}

VOLE provides support for returning values of other COM objects, in the form of vole::object, and for most common C++ types, including long and std::string, so all the above code just works. If you wish to obtain a type not supported you can specialize the vole::com_return_traits traits class template; this is outside the scope of this discussion, but will be covered in a future article.
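
To give a flavour of how a return-type traits class can work in general, here is a small portable sketch. It makes no claim about the real interface of vole::com_return_traits: simple_variant, return_traits, get_as and the two make_ helpers are all hypothetical, and the "variant" holds only a number or a piece of text.

```cpp
#include <sstream>
#include <string>

// Hypothetical, much-simplified stand-in for an Automation VARIANT.
struct simple_variant
{
    enum kind_type { holds_number, holds_text };

    kind_type   kind;
    long        number;
    std::string text;
};

inline simple_variant make_number(long n)
{
    simple_variant v; v.kind = simple_variant::holds_number; v.number = n; return v;
}
inline simple_variant make_text(std::string const& s)
{
    simple_variant v; v.kind = simple_variant::holds_text; v.number = 0; v.text = s; return v;
}

// Primary template is declared but not defined, so requesting an
// unsupported return type is a compile-time error.
template <typename T>
struct return_traits;

// Conversion to long: use the number directly, or parse the text.
template <>
struct return_traits<long>
{
    static long convert(simple_variant const& v)
    {
        if(simple_variant::holds_number == v.kind) { return v.number; }
        long result = 0;
        std::istringstream in(v.text);
        in >> result;
        return result;
    }
};

// Conversion to std::string: use the text directly, or format the number.
template <>
struct return_traits<std::string>
{
    static std::string convert(simple_variant const& v)
    {
        if(simple_variant::holds_text == v.kind) { return v.text; }
        std::ostringstream out;
        out << v.number;
        return out.str();
    }
};

// Generic accessor: all type-specific work is delegated to the traits class.
template <typename T>
T get_as(simple_variant const& v)
{
    return return_traits<T>::convert(v);
}
```

Supporting a further return type in this scheme means adding one more specialisation of return_traits, without touching get_as().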

Just as we saw with the method call, the syntax shown above causes consternation with Visual C++ 6.0, but there is an alternate syntax that works with all compilers. So, once again, the application code for step 7 actually contains the following:

C++
else
{
#ifdef XMLVALIDATOR_USE_OLD_SYNTAX

    object      parseError  =   xmlDocument.get_property(of_type<object>(), L"ParseError");
    std::string reason      =   parseError.get_property(of_type<std::string>(), L"reason");
    long        line        =   parseError.get_property(of_type<long>(), L"line");
    long        linePos     =    parseError.get_property(of_type<long>(), L"linepos");

#else /* ? XMLVALIDATOR_USE_OLD_SYNTAX */

    object      parseError  =    xmlDocument.get_property<object>(L"ParseError");
    std::string reason      =    parseError.get_property<std::string>(L"reason");
    long        line        =    parseError.get_property<long>(L"line");
    long        linePos     =    parseError.get_property<long>(L"linepos");

#endif /* XMLVALIDATOR_USE_OLD_SYNTAX */

    std::cout << "Parse error at (" << line << ", " << linePos << "): " << reason << std::endl;
}

Handling errors

Because VOLE returns objects and values from its (method and property) functions, it indicates errors by throwing exceptions, those derived from vole::vole_exception. Thus, the last part of the application comprises two catch clauses, as follows:

C++
catch(vole::vole_exception &x)
{
    std::cerr << "Validation failed: " << x.what() << ": " << winstl::basic_error_desc<char>(x.hr()) << std::endl;
}
catch(std::exception &x)
{
    std::cerr << "Validation failed: " << x.what() << std::endl;
}

// All other code paths lead to a failure (non-0) return code

return EXIT_FAILURE;

The second is a generic clause that will catch all standard exceptions, including std::bad_alloc. The first is more interesting. It catches vole::vole_exception, which derives from the COMSTL exception class comstl::com_exception, which has an hr() accessor that returns the COM error code (HRESULT) associated with the error. This is then used with the ANSI/multibyte specialization of the WinSTL class template basic_error_desc, which is a helper Facade for the Win32 API function FormatMessage(). For example, if we change the ProgId to, say, Msxml99999.DOMDocument, the program prints out:

Validation failed: Could not create coclass: 800401f3, Invalid class string

which is a lot more useful than:

Validation failed: Could not create coclass
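
The ordering of the two catch clauses matters: the more derived exception type must come first, otherwise the generic clause would swallow it and the error code would be lost. The sketch below reproduces that arrangement in portable C++. It is an illustration only: coded_exception, describe_failure and the two throwing functions are hypothetical, and the textual description that FormatMessage() would supply is omitted, so only the hexadecimal code is appended.

```cpp
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries a 32-bit status code alongside
// its message, in the manner of an HRESULT-bearing COM exception.
class coded_exception : public std::runtime_error
{
public:
    coded_exception(std::string const& message, unsigned long code)
        : std::runtime_error(message)
        , m_code(code)
    {}

    unsigned long code() const { return m_code; }

private:
    unsigned long m_code;
};

// Runs fn() and returns the diagnostic text that the catch clauses would
// print, or an empty string if nothing was thrown.
template <typename F>
std::string describe_failure(F fn)
{
    std::ostringstream out;

    try
    {
        fn();
    }
    catch(coded_exception const& x)  // most-derived type first
    {
        out << "Validation failed: " << x.what() << ": " << std::hex << x.code();
    }
    catch(std::exception const& x)   // everything else from the standard hierarchy
    {
        out << "Validation failed: " << x.what();
    }

    return out.str();
}

// Sample failures (hypothetical) for the two clauses.
inline void throw_coded()   { throw coded_exception("Could not create coclass", 0x800401F3UL); }
inline void throw_generic() { throw std::runtime_error("out of memory"); }
```

Here describe_failure(throw_coded) produces the richer message including the code, while describe_failure(throw_generic) produces only the what() text.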

Note: VOLE is a new and still-developing library, and I've not yet completed the actual implementation of the rich exception hierarchy. Thus, all the VOLE exception types:

  • vole::vole_exception
  • vole::creation_exception
  • vole::invocation_exception
  • vole::type_conversion_exception

are currently just aliases for comstl::com_exception, so don't go trying multiple catch clauses involving different VOLE exception types just yet. (Of course, if you want to pitch in on the project to drive along this or any other remaining issues, you'll be most welcome.)

Setting up the environment

To build the program, you'll need to have access to the VOLE and STLSoft libraries. Both are open-source. Both use the modified BSD license. The latest version of VOLE is 0.2.2, and this requires STLSoft version 1.9.1 beta 47, or later.

Since both libraries are 100% header-only, setting up for their use involves nothing more than downloading and setting up the requisite environment variables. I suggest the environment variables VOLE (e.g. VOLE=C:\ThirdPartyLibs\VOLE\vole-0.2.2) and STLSOFT (e.g. STLSOFT=C:\ThirdPartyLibs\STLSoft\stlsoft-1.9.1-beta47). Then you can either incorporate the include paths VOLE/include and STLSOFT/include into your project settings, as in:

Screenshot - xmlValidator-VC71-setting.png

or on the command-line, as in:

C:\ThirdPartyTools\xmlValidator>cl -nologo -EHsc 
    -I%VOLE%/include -I%STLSOFT%/include -DWIN32 -D_CRT_SECURE_NO_DEPRECATE 
    ..\xmlValidator.cpp ole32.lib oleaut32.lib

Using the xmlValidator tool

The following two simple XML files illustrate how easy the tool is to use. First a correctly formed XML file.

good.xml:

XML
<?xml version="1.0" encoding="UTF-8"?>
<good>
 <no-problem-here />
</good>

Now a badly formed one.

bad.xml:

XML
<?xml version="1.0" encoding="UTF-8"?>
<bad>
 <problem-here>
</bad>

The following graphic shows the build command and the responses of the tool to these two XML files:

Screenshot - xmlValidator-example-useSMALL.png

More to come ...

I plan to write another article about VOLE soon, illustrating the vole::collection class's ability to provide STL iterators over a COM Collection's elements (via the IEnumXXXX protocol).

Your comments/criticisms/feature requests for VOLE are welcome via the VOLE project home here.

Your comments/criticisms/feature requests for STLSoft are welcome via the STLSoft newsgroup (here), which is kindly provided by Digital Mars, providers of free high-quality C/C++/D compilers.

History

7th April 2007: First version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Instructor / Trainer
Australia Australia
Software Development consultant, specialising in project remediation.

Creator of the FastFormat, Pantheios, STLSoft and VOLE open-source libraries.

Author of the books Extended STL, volume 1 (Addison-Wesley, 2007) and Imperfect C++ (Addison-Wesley, 2004).
