
A Dynamic XML API

28 Apr 2014 | CPOL | 6 min read
One API to rule them all (all XML file formats, that is)


Introduction

The objective is to provide one API that can be used to read all XML file formats. The goal is to be able to access the data in the XML as you would any C# object model, where an XML element is an object and its attributes and child elements are properties of that object, e.g. Object.Property.Subproperty.Name. The API consists of just one class, DynamicXmlNode, which is based on the dynamic dictionary sample from Microsoft. I'll be using the bookstore example XML format commonly used on W3Schools to describe how the API works. The XML is parsed using the XDocument API.

Background

This is my first time publishing anything up here so go easy on me :). A lot of my work involves parsing XML documents and I nearly always end up throwing together a custom API to read these files. Recently I’ve been playing around with C# dynamics and had the idea of creating a dynamic XML API. I did, and here it is.

Before publishing this article here I had a Google around to see what already existed out there. I didn’t find anything quite like what I’ve put together (in particular the handling of arrays) so I've decided it’s worth sharing.

Handling Repeated Elements (Arrays)

The dynamic API can handle repeating elements by detecting sibling elements with the same name and grouping them into a collection, but it can't tell whether an element belongs in a collection when only one instance of it is present in the XML instance document.

Solution

Let the user tell us at runtime. The user of the API knows the format of the file they're trying to parse and which elements belong in collections, so by introducing a property naming convention for accessing arrays we can have the user tell us which elements belong in collections. The convention I've chosen is __Array (with a double underscore). Whenever a requested property name ends with __Array, the API always returns an array of the elements named by the part before __Array, even if no such element exists (in which case the array is empty). This is very convenient, since you don't need to check the property for null before iterating over it.
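A minimal sketch of how the __Array rule could be implemented on top of DynamicObject and XElement follows. It is not the actual DynamicXmlNode source: the class name ArraySketchNode is hypothetical, and it assumes the collection is returned as a List<object> (so that indexing and Count work) and that names are matched case-insensitively against local element names.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, stripped-down node used only to illustrate the __Array rule.
public class ArraySketchNode : DynamicObject
{
    private const string ArraySuffix = "__Array";
    private readonly XElement _element;

    public ArraySketchNode(XElement element) => _element = element;

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        string name = binder.Name;

        if (name.EndsWith(ArraySuffix, StringComparison.OrdinalIgnoreCase))
        {
            // Strip the suffix and collect every child element with the remaining name.
            string elementName = name.Substring(0, name.Length - ArraySuffix.Length);
            List<object> items = ChildrenNamed(elementName).Select(Wrap).ToList();
            result = items; // empty (never null) when nothing matches
            return true;
        }

        // Any other name: the first matching child, or null when there is none.
        XElement child = ChildrenNamed(name).FirstOrDefault();
        result = child == null ? null : Wrap(child);
        return true;
    }

    // Case-insensitive match on the element's local name.
    private IEnumerable<XElement> ChildrenNamed(string name) =>
        _element.Elements().Where(e =>
            string.Equals(e.Name.LocalName, name, StringComparison.OrdinalIgnoreCase));

    // Leaf elements become strings; anything with attributes or children stays a node.
    private static object Wrap(XElement e) =>
        e.HasAttributes || e.HasElements ? new ArraySketchNode(e) : (object)e.Value;
}
```

Wrapping the root of the sample bookstore document this way, Book__Array would hold five entries, while Cd__Array would be an empty list rather than null.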

Handling complex elements that also contain a value

There’s a complication when an element has a value or text but also contains attributes or child elements. The dynamic API will create DynamicXmlNodes for these elements and therefore accessing the corresponding property will not return a string but a DynamicXmlNode. See an example of this situation below:

XML
<title lang="en">The Selfish Gene</title>

The element contains both an attribute and text. If you wanted to access the lang value you'd just write Book__Array[0].Title.Lang, but you can't get at the text the same way, because Book__Array[0].Title returns a DynamicXmlNode rather than a string.

Solution

I have three solutions for this, each of which returns the value of the title element:

  1. Implicit string operator: DynamicXmlNode includes an implicit conversion operator to string, so assigning a DynamicXmlNode to a string variable always yields the value of the wrapped element, e.g.
    C#
    string title = bookstore.Book__Array[0].Title;
  2. Book__Array[0].Title.ToString() also returns the value of the wrapped element.
  3. Book__Array[0].Title._ also returns the value of the wrapped element.
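The three options above could be wired up roughly like this. It is a sketch assuming a DynamicObject wrapper around an XElement; ValueSketchNode is a hypothetical name, and only attribute lookup is included to keep it short.

```csharp
using System;
using System.Dynamic;
using System.Xml.Linq;

// Hypothetical node showing three ways to reach the text of a complex element.
public class ValueSketchNode : DynamicObject
{
    private readonly XElement _element;

    public ValueSketchNode(XElement element) => _element = element;

    // 1. Implicit conversion: string s = node;
    //    (XElement.Value concatenates all descendant text nodes.)
    public static implicit operator string(ValueSketchNode node) => node?._element.Value;

    // 2. ToString() returns the wrapped element's value.
    public override string ToString() => _element.Value;

    // 3. The single-underscore member "_" also returns the value.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        if (binder.Name == "_")
        {
            result = _element.Value;
            return true;
        }

        // Attributes are matched case-insensitively by local name; unknown names give null.
        foreach (XAttribute attribute in _element.Attributes())
        {
            if (string.Equals(attribute.Name.LocalName, binder.Name, StringComparison.OrdinalIgnoreCase))
            {
                result = attribute.Value;
                return true;
            }
        }

        result = null;
        return true;
    }
}
```

Wrapping `<title lang="en">The Selfish Gene</title>` in a dynamic ValueSketchNode, `string s = title;`, `title.ToString()` and `title._` all give "The Selfish Gene", while `title.Lang` gives "en".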

Underscore Conventions

  • __Array: If the caller wants an array, they get an array! Users just have to append __Array (case-insensitive, double underscore) to the property name, e.g. bookstore.Book__Array
  • __PropertyName: Prefixing a property name with a double underscore returns the XML attribute or element that was used to generate that property, e.g. Book__Array[0].__Title
  • __: A double underscore on its own returns the XML element wrapped by the DynamicXmlNode, e.g. Book__Array[0].__
  • _: A single underscore always returns the value of the element, e.g. Book__Array[0].Title._
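One way to recognise these conventions is to classify the requested member name before doing any lookup. The sketch below is hypothetical (MemberKind and UnderscoreConventions are illustrative names, not part of the API), and the order of the checks, with the __Array suffix tested before the __ prefix, is an assumption.

```csharp
using System;

// Hypothetical classification of member names by underscore convention.
public enum MemberKind { Value, WrappedElement, Array, Source, Plain }

public static class UnderscoreConventions
{
    private const string ArraySuffix = "__Array";

    // Returns the convention used and, via baseName, the XML name left to look up.
    public static MemberKind Classify(string memberName, out string baseName)
    {
        baseName = memberName;

        if (memberName == "_")
            return MemberKind.Value;            // node._          -> element value
        if (memberName == "__")
            return MemberKind.WrappedElement;   // node.__         -> wrapped XML element
        if (memberName.Length > ArraySuffix.Length &&
            memberName.EndsWith(ArraySuffix, StringComparison.OrdinalIgnoreCase))
        {
            baseName = memberName.Substring(0, memberName.Length - ArraySuffix.Length);
            return MemberKind.Array;            // node.Book__Array -> collection of <book>
        }
        if (memberName.Length > 2 && memberName.StartsWith("__", StringComparison.Ordinal))
        {
            baseName = memberName.Substring(2);
            return MemberKind.Source;           // node.__Title    -> XML that produced Title
        }
        return MemberKind.Plain;                // node.Title      -> ordinary lookup
    }
}
```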

How it Works

All property names are case-insensitive.

Requesting a property that doesn't exist returns null, except when the requested name ends with __Array (as mentioned above), in which case an empty array is returned.

Any element that contains child elements or attributes is wrapped in a DynamicXmlNode object, while an element with neither is returned as its string value. This means you can easily drill down into the file, e.g. bookstore.Book__Array[0].Title.Lang
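The lookup rules above can be sketched as follows, assuming a DynamicObject over XElement. LookupSketchNode is a hypothetical name, and checking attributes before child elements is an assumption; the real DynamicXmlNode may resolve a clash between an attribute and a child element of the same name differently.

```csharp
using System;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical node illustrating the lookup rules: case-insensitive names,
// attributes and child elements as properties, null for anything missing.
public class LookupSketchNode : DynamicObject
{
    private readonly XElement _element;

    public LookupSketchNode(XElement element) => _element = element;

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // Attributes first, then child elements; both compared by local name, ignoring case.
        XAttribute attribute = _element.Attributes().FirstOrDefault(a => Matches(a.Name, binder.Name));
        if (attribute != null)
        {
            result = attribute.Value;
            return true;
        }

        XElement child = _element.Elements().FirstOrDefault(e => Matches(e.Name, binder.Name));
        result = child == null ? null : Wrap(child);
        return true; // true with a null result means "missing property -> null", not an exception
    }

    // Only elements with attributes or child elements are wrapped; leaves become plain strings.
    public static object Wrap(XElement element) =>
        element.HasAttributes || element.HasElements
            ? new LookupSketchNode(element)
            : (object)element.Value;

    private static bool Matches(XName xmlName, string memberName) =>
        string.Equals(xmlName.LocalName, memberName, StringComparison.OrdinalIgnoreCase);
}
```

For a book element such as `<book category="WEB"><title lang="en">Learning XML</title><year>2003</year></book>`, book.CATEGORY returns "WEB", book.Year returns the string "2003", book.Title.Lang returns "en", and book.Price returns null.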

Using the code

I'll use the bookstore example commonly used on W3Schools to demonstrate how to use the API. Here's the XSD schema for the bookstore:

[Figure: bookstore XSD schema]

As you can see, there are three places in the schema where an array can occur: books, book authors, and CDs. Here's a sample snippet from a bookstore instance document (XML file).

XML
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="POPULAR SCIENCE">
        <title lang="en">The Selfish Gene</title>
        <author>Richard Dawkins</author>
        <year>1976</year>
        <price>15.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

Let’s say we want to write the title of the first book to the console.

C#
// bookstoreFile: placeholder for the bookstore XML file passed to Load
dynamic bookstore = DynamicXmlNode.Load(bookstoreFile);
Console.WriteLine(bookstore.Book__Array[0].Title);

Console output: Everyday Italian

Note that Console.WriteLine converts the Title property to its string value for display, so the issue with complex elements described above is masked here.

Now, let’s say we want to write the title of the first book by Richard Dawkins to the console.

C#
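dynamic bookstore = DynamicXmlNode.Load(bookstoreFile);
// LINQ operators can't be dispatched on a dynamic source, hence the cast
// (needs using System.Linq and System.Collections.Generic)
Console.WriteLine
(
   (from book in bookstore.Book__Array as IEnumerable<dynamic>
    where book.Author == "Richard Dawkins"
    select book).First().Title
);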

Console output: The Selfish Gene

Now, let's say we want to find the first book with multiple authors and write its title to the console.

C#
Console.WriteLine
(
     (from book in bookstore.Book__Array as IEnumerable<dynamic>
     where book.Author__Array.Count > 1
     select book).First().Title
);

Console output: XQuery Kick Start
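For comparison, the same three queries can be written directly against XDocument, which is what the dynamic API abstracts away. This is a sketch with hypothetical class and method names; note that FirstTitleByAuthor checks every author element, which may differ from how book.Author behaves for books with several authors.

```csharp
using System.Linq;
using System.Xml.Linq;

// For comparison only: the same three queries written directly against XDocument.
public static class PlainLinqToXmlQueries
{
    // Title of the first book.
    public static string FirstTitle(XDocument doc) =>
        (string)doc.Root.Elements("book").First().Element("title");

    // Title of the first book with an <author> child matching the given name.
    public static string FirstTitleByAuthor(XDocument doc, string author) =>
        (string)doc.Root.Elements("book")
            .First(b => b.Elements("author").Any(a => a.Value == author))
            .Element("title");

    // Title of the first book with more than one <author> child.
    public static string FirstMultiAuthorTitle(XDocument doc) =>
        (string)doc.Root.Elements("book")
            .First(b => b.Elements("author").Count() > 1)
            .Element("title");
}
```

The element names have to be spelled exactly (LINQ to XML names are case-sensitive), which is part of the ceremony the dynamic API hides.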

Points of Interest

RuntimeBinderExceptions

When using dynamic, the DLR (Dynamic Language Runtime) first attempts to resolve member accesses by looking for statically defined members on the dynamic object. When it doesn't find any, a RuntimeBinderException is thrown (and handled internally) before the dynamic object's TryGetMember or TrySetMember method is called. These are just first-chance exceptions and nothing to worry about, but they can make debugging a nightmare when the debugger is configured to break on all thrown exceptions. A simple solution is to add Microsoft.CSharp.RuntimeBinder.RuntimeBinderException to the debugger's list of exceptions to break on and then uncheck it.


Element or Attribute naming clashes with conventions used

The underscore convention I'm using could potentially conflict with element names, but this scenario is very unlikely. For example, for the __Array convention to cause a clash, the XML schema being followed would have to use sibling elements with names like x and x__Array. It's much more likely that these elements would have a parent/child relationship, i.e. x__Array/x. Also, I've purposely chosen double underscores to reduce the chance of clashes.

I could have capitalized on the XML element naming restrictions (e.g. element names cannot begin with a digit or with "xml"), but this would look ugly in API usage.
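If you want to check a particular document for such clashes up front, a small diagnostic like the following could be used. It is a hypothetical helper, not part of the API; it simply flags any element or attribute name that is "_", starts with "__", or ends with "__Array" (ignoring case).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical diagnostic: lists XML names that could be confused with the
// underscore conventions ("_", a "__" prefix, or an "__Array" suffix).
public static class ConventionClashFinder
{
    public static IEnumerable<string> FindSuspiciousNames(XDocument doc)
    {
        // Every element name followed by that element's attribute names, in document order.
        IEnumerable<string> names = doc.Descendants()
            .SelectMany(e => e.Attributes()
                              .Select(a => a.Name.LocalName)
                              .Prepend(e.Name.LocalName));

        return names
            .Where(n => n == "_"
                     || n.StartsWith("__", StringComparison.Ordinal)
                     || n.EndsWith("__Array", StringComparison.OrdinalIgnoreCase))
            .Distinct(StringComparer.OrdinalIgnoreCase);
    }
}
```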

Element or Attribute names containing dots

It is perfectly valid for XML element or attribute names to contain the dot character ('.'); however, in C# the dot is the member access operator. Since we're using element and attribute names as C# property names, these dots need to be replaced with characters that are legal in C# identifiers. You'll never guess which character I've chosen to replace them with: the underscore! Here's an example scenario:

XML
<element some.attribute="12" />

Note how, in the code below, the dot in the attribute name has been replaced by an underscore:

C#
Element.Some_Attribute
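The mapping itself is a simple character replacement. The sketch below (DotNameSketchNode is a hypothetical name, and only attributes are handled to keep it short) applies it when matching member names, assuming the same case-insensitive comparison used elsewhere in the API.

```csharp
using System;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical node showing the dot-to-underscore mapping for member names.
public class DotNameSketchNode : DynamicObject
{
    private readonly XElement _element;

    public DotNameSketchNode(XElement element) => _element = element;

    // "some.attribute" is exposed as "some_attribute".
    public static string ToMemberName(string xmlLocalName) => xmlLocalName.Replace('.', '_');

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // Attributes only, to keep the sketch short; unknown names give null.
        XAttribute attribute = _element.Attributes().FirstOrDefault(a =>
            string.Equals(ToMemberName(a.Name.LocalName), binder.Name, StringComparison.OrdinalIgnoreCase));
        result = attribute?.Value;
        return true;
    }
}
```

The mapping is lossy: some.attribute and some_attribute both become Some_Attribute, so a document containing both names would be ambiguous.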

Handling Namespaces

The API works with each element's local name (i.e. the name without its namespace), so files containing namespaces are supported, but an element can be shadowed by a sibling with the same local name if the namespace was being used to distinguish them.
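The limitation can be seen with plain LINQ to XML. In the sketch below (LocalNameDemo is a hypothetical helper), matching on LocalName alone returns the first title regardless of namespace, whereas a namespace-qualified XName picks out the intended element.

```csharp
using System.Linq;
using System.Xml.Linq;

// Matching on LocalName alone cannot tell a:title from b:title.
public static class LocalNameDemo
{
    // What a local-name lookup sees: the first sibling with that local name.
    public static string FirstByLocalName(XElement parent, string localName) =>
        parent.Elements().First(e => e.Name.LocalName == localName).Value;

    // A namespace-qualified lookup distinguishes the siblings.
    public static string ByQualifiedName(XElement parent, XNamespace ns, string localName) =>
        (string)parent.Element(ns + localName);
}
```

For `<book xmlns:a="urn:a" xmlns:b="urn:b"><a:title>First</a:title><b:title>Second</b:title></book>`, the local-name lookup for "title" returns "First", so the b:title value is only reachable through the qualified name.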

Future Feature Ideas

Lazy Instantiation

Currently the complete XML file is loaded into memory. It may be desirable to have a lazy version of the API.
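One possible direction, sketched here as an assumption rather than a planned design, is to stream the document with XmlReader and materialise one subtree at a time with XNode.ReadFrom, so only the current element is held in memory. LazyBookReader and StreamElements are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Sketch of lazy reading: yields one matching element subtree at a time.
public static class LazyBookReader
{
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the whole subtree and leaves the reader on the
                    // following node, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each streamed XElement could then be wrapped in a dynamic node; how that would fit the existing DynamicXmlNode API is left open here.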

Write Capabilities

Dynamically reading is one thing, but dynamic writing is a much harder problem. You'd need to handle issues such as knowing whether a setter call should add an attribute or an element, handling namespaces, and satisfying element sequence constraints.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Engineer
Ireland
I’m an extremely talented, modest, software engineer from Dublin, Ireland. I've been working with C# and .NET for about 7 years now. The end.

Comments and Discussions

 
Question: Love using this API but I also need to know what type the XML element is. (martintr, 3-Mar-15 11:25)
Answer: Re: Love using this API but I also need to know what type the XML element is. (Alan Fay, 5-Mar-15 3:07)
General: Re: Love using this API but I also need to know what type the XML element is. Sorry for missing XML. (martintr, 5-Mar-15 11:18)
General: Re: Love using this API but I also need to know what type the XML element is. Sorry for missing XML. (Alan Fay, 9-Mar-15 2:57)
Question: Super Cool (Member 96720, 6-Feb-15 19:01)
Question: Nice Idea, however... (FatCatProgrammer, 24-Apr-14 7:19)
Answer: Re: Nice Idea, however... (Alan Fay, 24-Apr-14 10:51)
Question: Interesting, but not sure about the syntax (SteveTheThread, 23-Apr-14 23:40)
Answer: Re: Interesting, but not sure about the syntax (Alan Fay, 24-Apr-14 10:40)
General: Re: Interesting, but not sure about the syntax (SteveTheThread, 24-Apr-14 21:52)
Question: Any type of XML? (Shao Voon Wong, 23-Apr-14 21:23)
Answer: Re: Any type of XML? (Alan Fay, 24-Apr-14 10:12)
Hi Wong,
The purpose of the API is to abstract away the XML and provide a domain-specific API. Users of the API will know the names of the elements and attributes they want to access, so the user directs the API to the corresponding XML node. If you're interested in finding sibling elements and descendant elements, then you're better off sticking with your XML/DOM API.
Question: Very good article about one API that can be used to read ALL XML file formats (Volynsky Alex, 23-Apr-14 11:07)
Answer: Re: Very good article about one API that can be used to read ALL XML file formats (iFayer, 23-Apr-14 23:38)
General: Re: Very good article about one API that can be used to read ALL XML file formats (Volynsky Alex, 24-Apr-14 7:05)
Answer: Re: Very good article about one API that can be used to read ALL XML file formats (Alan Fay, 24-Apr-14 1:33)
General: Re: Very good article about one API that can be used to read ALL XML file formats (Volynsky Alex, 24-Apr-14 7:04)
