The XmlTextReader - A Beginner's Guide

Marc Clifton, 2 Sep 2006
An introduction to the XmlTextReader.

Introduction

The XmlTextReader class is not the most intuitive class to work with, as the methods and properties are very low level. While the XmlTextReader class is rich in properties and methods, I've found that most of what it provides isn't necessary for the average day-to-day job. So, in this article, I'm going to present a moderately thin wrapper class for the XmlTextReader, which should be a helpful guide to using the XmlTextReader for programmers not familiar with this class. This article is also an introduction to a variety of other disciplines that I feel a beginner should be aware of--code commenting, abstraction and architecture, and unit tests. So, hopefully, there's something here for everybody!

Why Use an XmlReader?

To summarize from the SoftArtisans site:

  • Use XmlTextReader if you need performance without schema validation or XPath/XSLT services.
  • Use XmlDocument if you need XPath services or need to update the XML.

Naturally, XmlTextReader is closer to the XML. It is a forward only reader, meaning that you can't go backwards. Contrast this with the XmlDocument class, which pulls in the entire document and allows you to random-access the XML tree. The XmlTextReader supports streams, and therefore should reduce memory requirements for very large documents.

Another advantage of the XmlTextReader is that it provides line and character position information, which can be very useful diagnostic information when there's a problem with the XML.
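As a minimal sketch of both points (forward-only reading and position information), the following walks a small document once and records where each node starts. The class name ForwardOnlyDemo and the sample XML are made up for illustration; only XmlTextReader and its WhitespaceHandling, LineNumber, and LinePosition members come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the article's download.
public static class ForwardOnlyDemo
{
    // Walks the XML once, front to back, recording each node and its position.
    public static List<string> Describe(string xml)
    {
        List<string> lines = new List<string>();

        using (XmlTextReader xtr = new XmlTextReader(new StringReader(xml)))
        {
            // Skip whitespace nodes so only the meaningful nodes are listed.
            xtr.WhitespaceHandling = WhitespaceHandling.None;

            // Read() only ever moves forward; there is no way to step back.
            while (xtr.Read())
            {
                lines.Add(String.Format("{0} '{1}' at line {2}, position {3}",
                    xtr.NodeType, xtr.Name, xtr.LineNumber, xtr.LinePosition));
            }
        }

        return lines;
    }

    public static void Main()
    {
        string xml = "<Root>\n  <Child Id=\"1\"/>\n</Root>";

        foreach (string line in Describe(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

If the XML is malformed, the XmlException that the reader throws carries the same kind of line and position information, which is what makes it useful for diagnostics.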

What Features Do I Want to Support?

There's a core set of features in XML that I want my reader to support:

  • XML declarations
  • Elements
  • Attributes
  • Namespaces
  • Element prefixes (both global and local)
  • Attribute prefixes
  • CDATA blocks
  • Inner text
  • Processing instructions
  • Node graphs (element children)

There are other features of XML, but these are the most common ones, and the ones I want to start with. The link cited above demonstrates a somewhat different approach from the one I take here, and it's useful to briefly discuss the difference. In the SoftArtisans link, many of the code snippets demonstrate looking for and evaluating specific elements and attributes of the XML, including optional ones. In other words, the application has expectations regarding the XML graph and content. The reader that I am presenting is tailored more to processing ad hoc XML, where there are no expectations regarding the graph and the content. Both approaches have their value, depending on what you need to accomplish.

The Unit Tests

The unit tests are written for my Advanced Unit Test application, downloadable here. The reason I'm using my unit test application instead of NUnit is because I want to take advantage of AUT's ability to execute tests in sequence, as I read through the XML. Yes, I could have instead written an XML fragment for each unit test, but I find this more convenient and more realistic, as I can work with the entire XML document.

The XML Test Document

Here's the XML test document, which illustrates each of the features described above:

<?xml version="1.0" encoding="utf-8"?>
<RootNode AnAttribute="AttributeValue" xmlns:foo="Some namespace">
 <ChildNode Attr1="1" Attr2="2" Attr3="3"/>
 <bar:Item xmlns:bar="LocalBar"/>
 <AttributeNamespace foo:MyAttribute="10"/>
 <![CDATA[some stuff]]>
 <!-- My comment -->
 <Element>Text</Element>
 <?do homework?>
 <ChildNode>
  <GrandchildNode Depth="3"/>
 </ChildNode>
</RootNode>

The Reader Architecture

Something this simple doesn't need an architecture, does it? In fact, it does. Even with something this simple, it's a good idea to consider what abstraction you might want (planning for the future) and helper objects that will make understanding and working with the code easier. And of course, we need to consider what kind of exceptions the reader will throw. As a side comment, it always surprises me how a good architecture, even for the simplest of functionality, practically eliminates monolithic code and helps to create nice small methods that are easily unit tested.

The IReader Interface

I potentially want to read formats other than XML, while staying within the constraints of an XML-ish structure. For example, a comma separated value file (CSV) is a good candidate for an alternative reader implementation. By abstracting the reader, I can support alternative formats without having to change the code that uses the reader. This is a design decision that is best made early on.

The reader implements an IReader interface that provides the necessary abstraction layer:

/// <summary>
/// Defines methods and attributes that a reader must implement.
/// An interpreter interfaces with the reader to read the
/// elements, attributes, and other aspects of, typically, the
/// xml. By implementing a custom reader, you can interpret other
/// formats, however, note that the NodeType is an XmlNodeType,
/// so even if you were to implement, say, a CSV reader, you would
/// need to map your node type to an XmlNodeType.
/// </summary>
public interface IReader
{
  /// <summary>
  /// If true, end elements are skipped during ReadNode.
  /// </summary>
  bool IgnoreEndElements { get; set;}

  /// <summary>
  /// Returns the number of attributes in the current element.
  /// </summary>
  int AttributeCount { get;}

  /// <summary>
  /// Returns the current depth of the element. Depth is a strange thing,
  /// as the element will be at depth n, but element attributes are at
  /// depth n+1. The root element is at depth 0.
  /// </summary>
  int Depth { get;}

  /// <summary>
  /// Thin wrapper for the underlying Value property, returns the CDATA text.
  /// </summary>
  string CData { get;}

  /// <summary>
  /// Thin wrapper for the underlying Value property, returns the comment text.
  /// The comment text is returned trimmed of leading and trailing whitespace.
  /// </summary>
  string Comment { get;}

  /// <summary>
  /// Thin wrapper for the underlying Value property, returns the element text.
  /// </summary>
  string Text { get;}

  /// <summary>
  /// Returns the line number and line position for the current reader position.
  /// </summary>
  LineInfo CurrentLineInfo { get;}

  /// <summary>
  /// Returns the element prefix, name, and current reader position.
  /// Unlike the XmlTextReader, the Name portion has any prefix stripped off.
  /// </summary>
  ElementNodeInfo Element { get;}

  /// <summary>
  /// Returns the attribute prefix, name, value, and current reader position.
  /// Unlike the XmlTextReader, the Name portion has any prefix stripped off.
  /// </summary>
  AttributeNodeInfo Attribute { get;}

  /// <summary>
  /// Returns a wrapper instance containing the processing instruction name and value.
  /// </summary>
  ProcessingInstructionInfo Instruction { get;}

  /// <summary>
  /// Returns the node type for the node at the current reader position.
  /// </summary>
  XmlNodeType NodeType { get;}

  /// <summary>
  /// Read the next node, optionally skipping end elements.
  /// </summary>
  XmlNodeType ReadNode();

  /// <summary>
  /// Reads the first attribute associated with the current element.
  /// </summary>
  /// <returns>Returns null if no first attribute exists.</returns>
  AttributeNodeInfo ReadFirstAttribute();

  /// <summary>
  /// Reads the next attribute associated with the current element.
  /// </summary>
  /// <returns>Returns null if attempting to read past the last attribute.</returns>
  AttributeNodeInfo ReadNextAttribute();

  /// <summary>
  /// Smart attribute reader, reading either the first attribute
  /// or the next attribute depending on the reader state.
  /// </summary>
  /// <returns>Returns null if no further attributes exist.</returns>
  AttributeNodeInfo ReadAttribute();
}

Since this is a beginning article, I want to emphasize something here--there is no excuse for not putting in at least basic comments in your code. None. It is a discipline that I myself have worked hard to achieve, but if you're writing a professional application that you or others may one day need to maintain, you simply have to force yourself to become disciplined about writing comments.

The interface:

  • Abstracts the reading of nodes and attributes.
  • Defines methods and properties that make it clear what is being read, rather than relying on the XmlTextReader's generic Value property.

Anyone interested in implementing a custom reader now knows what the custom reader needs to implement. An application needing a reader can now reference the reader via the IReader interface, and a factory pattern can be used to instantiate the appropriate reader.
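To make the factory idea concrete, here is a rough sketch under stated assumptions. IMiniReader is a trimmed-down stand-in for IReader with only two members, and MiniXmlReader and MiniReaderFactory are hypothetical names; none of them are in the download.

```csharp
using System;
using System.IO;
using System.Xml;

// Trimmed-down stand-in for the article's IReader (assumption: two members only).
public interface IMiniReader
{
    XmlNodeType NodeType { get; }
    XmlNodeType ReadNode();
}

// Hypothetical XML-backed implementation of the stand-in interface.
public class MiniXmlReader : IMiniReader
{
    private readonly XmlReader reader;

    public MiniXmlReader(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        reader = XmlReader.Create(new StringReader(xml), settings);
    }

    public XmlNodeType NodeType
    {
        get { return reader.NodeType; }
    }

    public XmlNodeType ReadNode()
    {
        reader.Read();
        return reader.NodeType;
    }
}

// Hypothetical factory: callers name a format and only ever see the interface.
public static class MiniReaderFactory
{
    public static IMiniReader Create(string format, string content)
    {
        switch (format.ToLowerInvariant())
        {
            case "xml":
                return new MiniXmlReader(content);
            default:
                // A CSV reader, for example, would be registered here.
                throw new NotSupportedException("No reader for format: " + format);
        }
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        IMiniReader reader = MiniReaderFactory.Create("xml", "<Root/>");
        Console.WriteLine(reader.ReadNode());
    }
}
```

The calling code never mentions MiniXmlReader, so adding another format only touches the factory.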

The Container Classes

There are several container classes that help encapsulate information relevant to all nodes and relevant to specific nodes. Creating classes that encapsulate fields improves code readability and provides a layer of separation from the underlying implementation (the Reader class, in this case). And no, none of the container classes are unit tested--you have to draw the line somewhere, and these classes are much too simple to spend the time on unit testing.

LineInfo

All XML nodes have line and character position information, which is encapsulated in the LineInfo class:

/// <summary>
/// A wrapper class for the node line information.
/// </summary>
public class LineInfo
{
  protected int lineNumber;
  protected int linePosition;

  /// <summary>
  /// Gets the line position.
  /// </summary>
  public int LinePosition
  {
    get { return linePosition; }
  }

  /// <summary>
  /// Gets the line number.
  /// </summary>
  public int LineNumber
  {
    get { return lineNumber; }
  }

  /// <summary>
  /// Constructor, requiring line number and position.
  /// </summary>
  public LineInfo(int lineNumber, int linePosition)
  {
    this.lineNumber = lineNumber;
    this.linePosition = linePosition;
  }
}

Since this class is instantiated strictly by the reader, the properties are read-only.

NodeInfo

NodeInfo is an abstract class that encapsulates the two common elements of just about every XML node (there are a few exceptions): the node name and the node prefix.

/// <summary>
/// Implements a wrapper for the information relevant to a node:
/// the node line info, name and the node prefix.
/// </summary>
public abstract class NodeInfo
{
  protected string name;
  protected string prefix;

  protected LineInfo lineInfo;

  /// <summary>
  /// Gets the LineInfo
  /// </summary>
  public LineInfo LineInfo
  {
    get { return lineInfo; }
  }

  /// <summary>
  /// Gets the prefix.
  /// </summary>
  public string Prefix
  {
    get { return prefix; }
  }

  /// <summary>
  /// Gets the name.
  /// </summary>
  public string Name
  {
    get { return name; }
  }

  /// <summary>
  /// Constructor.
  /// </summary>
  /// <param name="lineInfo">The LineInfo for the tag.</param>
  /// <param name="prefix">The tag prefix.</param>
  /// <param name="name">The tag name.</param>
  public NodeInfo(LineInfo lineInfo, string prefix, string name)
  {
    this.lineInfo = lineInfo;
    this.prefix = prefix;
    this.name = name;
  }
}

It's an abstract class because we want to make sure that the implementation utilizes an appropriate concrete class derived from NodeInfo. The concrete implementation improves readability (since it qualifies the type of node information), and usually provides additional fields specific to the node type.

ElementNodeInfo

This class is a concrete implementation of NodeInfo, and adds a local namespace field, as elements can have local namespaces:

/// <summary>
/// A concrete implementation for managing xml element nodes. This class adds a local
/// namespace property.
/// </summary>
public class ElementNodeInfo : NodeInfo
{
  protected string localNamespace;

  /// <summary>
  /// Gets the localNamespace.
  /// </summary>
  public string LocalNamespace
  {
    get { return localNamespace; }
  }

  public ElementNodeInfo(LineInfo lineInfo, string prefix, 
         string name, string namespaceUri)
  : base(lineInfo, prefix, name)
  {
    localNamespace = namespaceUri;
  }
}

AttributeNodeInfo

This class is a concrete implementation of NodeInfo, and adds a value field, as attributes have values:

/// <summary>
/// Implements a concrete attribute tag class, that adds a 
/// Value property for the attribute.
/// </summary>
public class AttributeNodeInfo : NodeInfo
{
  protected string val;

  /// <summary>
  /// Gets the attribute value.
  /// </summary>
  public string Value
  {
    get { return val; }
  }

  public AttributeNodeInfo(LineInfo lineInfo, 
         string prefix, string name, string val)
  : base(lineInfo, prefix, name)
  {
    this.val = val;
  }
}

ProcessingInstructionInfo

This class derives from AttributeNodeInfo. A processing instruction has a name and a value, like an attribute, but I've implemented a separate class to represent the concept of a processing instruction, even though it adds nothing to the AttributeNodeInfo class. This is merely a code readability decision.

/// <summary>
/// A placeholder for the processing instruction.
/// </summary>
public class ProcessingInstructionInfo : AttributeNodeInfo
{
  public ProcessingInstructionInfo(LineInfo lineInfo, 
         string name, string value)
  : base(lineInfo, String.Empty, name, value)
  {
  }
}

The XmlTextReader

Instead of talking about the XmlTextReader as a class and its methods, which you can easily read about yourself, I'm going to show you the XmlTextReader within the context of my Reader wrapper. This way, instead of just looking at documentation, you'll see the XmlTextReader in actual code, and I'll explain what I'm doing in the code and why.

Creating an XmlTextReader

Quite literally, the first stumbling block is in creating an XmlTextReader. It sounds simple, but according to Microsoft:

In the Microsoft .NET Framework version 2.0 release, the recommended practice is to create XmlReader instances using the Create method. This allows you to take full advantage of the new features introduced in this release.

Second, I want to control some aspects of the reading process, specifically, I almost always want to ignore whitespace. The default XmlTextReader returns all whitespace. So, to properly construct an XmlTextReader using Microsoft's recommended method and to have the ability to set some options, we have to do something like this:

/// <summary>
/// Constructor. This initializes an XmlReader that wraps the XmlTextReader and passes 
/// in the setting to ignore whitespace (but not comments).
/// </summary>
/// <param name="xml">The xml to interpret.</param>
public Reader(string xml)
{
  StringReader textStream = new StringReader(xml);
  XmlReaderSettings settings = new XmlReaderSettings();
  settings.IgnoreComments = false;
  settings.IgnoreWhitespace = true;
  xtr = new XmlTextReader(textStream);
  reader = XmlReader.Create(xtr, settings);
  firstAttribute = true;
}

Before I go further, note that this constructor is the one I use for the unit tests, and it takes an XML string. You might instead want a constructor that takes a stream or a TextReader; as you can see in the first line, I wrap the string in a StringReader (a TextReader) to feed the XmlTextReader.

The second line creates an XmlReaderSettings instance, and I explicitly (just to show you another useful property) choose not to ignore comments, but I do want to ignore whitespace. Next, I create the XmlTextReader from the stream. But that's not enough. I now have to create an XmlReader, passing in the XmlTextReader and the desired settings. Now, we have properly constructed a reader, complying with Microsoft's guidelines, and having the ability to configure the reader to ignore whitespace.

If you're wondering about the last line, we'll get to that later.
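If you need to read from something other than a string, XmlReader.Create also has overloads that accept a TextReader or a Stream directly. The following is only a sketch of that alternative, not the constructor used in the download; the class name ReaderCreation and its helper methods are hypothetical. Note that it skips the intermediate XmlTextReader entirely, so it would not fit the wrapper as written, which keeps a reference to xtr.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper showing other ways to obtain a configured XmlReader.
public static class ReaderCreation
{
    // Same settings as the article's constructor: keep comments, drop whitespace.
    private static XmlReaderSettings MakeSettings()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = false;
        settings.IgnoreWhitespace = true;
        return settings;
    }

    // For callers that already hold a StringReader, StreamReader, etc.
    public static XmlReader FromTextReader(TextReader input)
    {
        return XmlReader.Create(input, MakeSettings());
    }

    // For callers that hold a file, network, or memory stream.
    public static XmlReader FromStream(Stream input)
    {
        return XmlReader.Create(input, MakeSettings());
    }

    public static void Main()
    {
        string xml = "<Root>\n  <!-- note -->\n  <Child/>\n</Root>";
        byte[] bytes = Encoding.UTF8.GetBytes(xml);

        using (XmlReader r = FromStream(new MemoryStream(bytes)))
        {
            while (r.Read())
            {
                Console.WriteLine(r.NodeType);
            }
        }
    }
}
```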

The Constructor Unit Test

/// <summary>
/// Verifies that no errors occur during construction
/// and the reader is positioned on nothing.
/// </summary>
[Test, Sequence(0)]
public void ConstructorTest()
{
  reader = new Reader(UnitTestResources.ReaderTest);
  reader.IgnoreEndElements = true;
  Assertion.Assert(reader.NodeType == XmlNodeType.None, 
                   "Expected 'None' for the node type.");
}

The constructor test reveals that the XmlTextReader does not position itself on a valid node immediately after construction, as the NodeType is "None".

Reading the XML Declaration

Reading the XML declaration, as with every other node, requires calling ReadNode:

/// <summary>
/// Read the next node, optionally skipping end elements.
/// </summary>
public XmlNodeType ReadNode()
{
  do
  {
    reader.Read();
  } while (ignoreEndElements && (NodeType == XmlNodeType.EndElement));

  firstAttribute = true;
  return reader.NodeType;
}

My wrapper for the reader optionally skips end elements. If you don't do this, the reader will return EndElement node types, which, depending on what you are doing with the XML, may be superfluous. In the constructor unit test, this flag is set to true.
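To see what the flag changes, the sketch below runs the same do/while loop over a small document with and without skipping. EndElementDemo and ReadAll are hypothetical names for this illustration, and the sketch uses a plain XmlReader rather than the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo of the end-element skipping loop.
public static class EndElementDemo
{
    public static List<XmlNodeType> ReadAll(string xml, bool ignoreEndElements)
    {
        List<XmlNodeType> types = new List<XmlNodeType>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (true)
            {
                // Same loop shape as ReadNode: keep reading while on an end element.
                do
                {
                    reader.Read();
                } while (ignoreEndElements && reader.NodeType == XmlNodeType.EndElement);

                // Past the last node, NodeType reverts to None.
                if (reader.NodeType == XmlNodeType.None)
                {
                    break;
                }

                types.Add(reader.NodeType);
            }
        }

        return types;
    }

    public static void Main()
    {
        string xml = "<a><b>t</b></a>";
        Console.WriteLine(ReadAll(xml, false).Count); // includes the two end elements
        Console.WriteLine(ReadAll(xml, true).Count);  // end elements skipped
    }
}
```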

The XML Declaration Unit Test

/// <summary>
/// Validates reading the xml declaration.
/// </summary>
[Test, Sequence(1)]
public void XmlDeclarationTest()
{
  reader.ReadNode(); // Reads the xml declaration.
  Assertion.Assert(reader.NodeType == XmlNodeType.XmlDeclaration, 
                   "Expected xml declaration node type.");
  Assertion.Assert(reader.AttributeCount==2, "Expected 2 attributes.");
  AttributeNodeInfo ati1 = reader.ReadFirstAttribute();
  AttributeNodeInfo ati2 = reader.ReadNextAttribute();
  Assertion.Assert(ati1.Name == "version", "Expected version attribute.");
  Assertion.Assert(ati1.Value == "1.0", "Expected version number.");
  Assertion.Assert(ati2.Name == "encoding", "Expected encoding attribute.");
  Assertion.Assert(ati2.Value == "utf-8", "Expected encoding value.");
}

An XML declaration contains attributes just like an element node. I'll demonstrate the ReadFirstAttribute and ReadNextAttribute shortly.

Reading the Root Node and Other Elements

Immediately following the XML declaration should be the root node. My reader provides an Element property which returns an ElementNodeInfo instance that encapsulates the element name, prefix, and optional namespace. Looking at the implementation:

/// <summary>
/// Returns the element prefix, name, and current reader position.
/// Unlike the XmlTextReader, the Name portion has any prefix stripped off.
/// </summary>
public ElementNodeInfo Element
{
  get
  {
    if (NodeType != XmlNodeType.Element)
    {
      throw new ReaderException("Not on an element node.");
    }

    ElementNodeInfo el = new ElementNodeInfo(CurrentLineInfo, 
                         reader.Prefix, NameWithoutPrefix, reader.NamespaceURI);

    return el;
  }
}

You'll see that the ElementNodeInfo also consists of the reader's line and character position, and the element name is stripped of the prefix.
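The NameWithoutPrefix helper isn't listed in this article. One plausible way to get a prefix-free name (an assumption on my part, not necessarily how the download does it) is the reader's LocalName property, which sits alongside Prefix and NamespaceURI. ElementNameDemo is a hypothetical name:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo: splitting an element name into its parts.
public static class ElementNameDemo
{
    // Returns { prefix, local name, namespace URI } for the first element.
    public static string[] FirstElementParts(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skips the declaration, comments, and whitespace, and stops on the element.
            reader.MoveToContent();

            return new string[] { reader.Prefix, reader.LocalName, reader.NamespaceURI };
        }
    }

    public static void Main()
    {
        string[] parts = FirstElementParts("<bar:Item xmlns:bar=\"LocalBar\"/>");
        Console.WriteLine("prefix={0}, name={1}, namespace={2}",
            parts[0], parts[1], parts[2]);
    }
}
```

For an element without a prefix, Prefix is an empty string rather than null, which is why the root node test compares against "".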

Reading the Root Node Unit Test

Reading an element node is straightforward, as the unit test demonstrates:

/// <summary>
/// Validates reading the xml root node.
/// </summary>
[Test, Sequence(2)]
public void RootNodeTest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.Element, 
                   "Expected element node type.");
  ElementNodeInfo eti = reader.Element;
  Assertion.Assert(eti.Name == "RootNode", 
                   "Expected root node element.");
  Assertion.Assert(eti.Prefix == "", "Expected a blank prefix.");
  Assertion.Assert(reader.AttributeCount == 2, "Expected 2 attributes.");
}

The ReadNode method is called to move past the XML declaration node and onto the root node. The unit test verifies that this happened correctly.

There's another element test later on, which tests that a local namespace has been correctly read:

/// <summary>
/// Validates reading an element with a local namespace.
/// </summary>
[Test, Sequence(8)]
public void LocalNamespaceTest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.Element, 
                   "Expected element node type.");
  ElementNodeInfo eti = reader.Element;
  Assertion.Assert(eti.Prefix == "bar", 
                   "Unexpected prefix.");
  Assertion.Assert(eti.LocalNamespace == "LocalBar", 
                   "Unexpected namespace.");
}

Reading Attributes

Most XML elements contain attributes, and the root node includes two attributes, one of which is an XML namespace declaration. The XmlTextReader provides two methods for reading an attribute, MoveToFirstAttribute and MoveToNextAttribute, which return a boolean true if successful, false otherwise. I've modified this implementation slightly:

/// <summary>
/// Reads the first attribute associated with the current element.
/// </summary>
/// <returns>Returns null if no first attribute exists.</returns>
public AttributeNodeInfo ReadFirstAttribute()
{
  bool val = xtr.MoveToFirstAttribute();
  AttributeNodeInfo ret = null;

  if (val)
  {
    ret = Attribute;
    firstAttribute = false;
  }

  return ret;
}

and:

/// <summary>
/// Reads the next attribute associated with the current element.
/// </summary>
/// <returns>Returns null if attempting to read past the last attribute.</returns>
public AttributeNodeInfo ReadNextAttribute()
{
  bool val=xtr.MoveToNextAttribute();
  AttributeNodeInfo ret = null;

  if (val)
  {
    ret = Attribute;
  }

  return ret;
}

Both of these methods return an AttributeNodeInfo instance, encapsulating the reader's line and character position and the attribute name, prefix, and value. A null is returned if there are no further attributes to read. You can use these methods, or you can use another method that avoids having to figure out whether to call ReadFirstAttribute or ReadNextAttribute. My reader figures this out automatically for you, and here's where the firstAttribute boolean comes into play:

/// <summary>
/// Smart attribute reader, reading either the first attribute
/// or the next attribute depending on the reader state.
/// </summary>
/// <returns>Returns null if no further attributes exist.</returns>
public AttributeNodeInfo ReadAttribute()
{
  AttributeNodeInfo ret = null;

  if (firstAttribute)
  {
    ret = ReadFirstAttribute();
  }
  else
  {
    ret = ReadNextAttribute();
  }

  return ret;
}

The firstAttribute flag is set whenever ReadNode is called. It's cleared when the first attribute is read, either by calling ReadFirstAttribute or ReadAttribute.
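For comparison, this is roughly what attribute iteration looks like against the framework reader directly, without the wrapper. AttributeDemo and ReadAttributes are hypothetical names; MoveToFirstAttribute, MoveToNextAttribute, and MoveToElement are the framework methods.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo: iterating attributes with the raw reader methods.
public static class AttributeDemo
{
    // Returns "prefix|name|value" for each attribute of the first element.
    public static List<string> ReadAttributes(string xml)
    {
        List<string> result = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();

            // The caller has to remember to start with "first" and continue with "next".
            if (reader.MoveToFirstAttribute())
            {
                do
                {
                    result.Add(reader.Prefix + "|" + reader.LocalName + "|" + reader.Value);
                } while (reader.MoveToNextAttribute());

                // Step back from the last attribute to the owning element.
                reader.MoveToElement();
            }
        }

        return result;
    }

    public static void Main()
    {
        string xml = "<RootNode AnAttribute=\"AttributeValue\" xmlns:foo=\"Some namespace\"/>";

        foreach (string attribute in ReadAttributes(xml))
        {
            Console.WriteLine(attribute);
        }
    }
}
```

The first/next bookkeeping in this loop is exactly what ReadAttribute hides behind the firstAttribute flag.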

The Attribute Unit Tests

The following sequence of unit tests exercises the first, next, and "smart" attribute readers:

/// <summary>
/// Validate reading the first attribute.
/// </summary>
[Test, Sequence(3)]
public void ReadFirstAttributeTest()
{
  AttributeNodeInfo ati = reader.ReadFirstAttribute();
  Assertion.Assert(ati.Name == "AnAttribute", 
                   "Unexpected first attribute name.");
  Assertion.Assert(ati.Value=="AttributeValue", 
                   "Unexpected first attribute value.");
}

/// <summary>
/// Validate reading the second attribute.
/// </summary>
[Test, Sequence(4)]
public void ReadNextAttributeTest()
{
  AttributeNodeInfo ati = reader.ReadNextAttribute();
  // Note that we are stripping off the prefix from the name!
  Assertion.Assert(ati.Name == "foo", 
                   "Unexpected second attribute name.");
  Assertion.Assert(ati.Prefix == "xmlns", 
                   "Unexpected second attribute prefix.");
  Assertion.Assert(ati.Value == "Some namespace", 
                   "Unexpected second attribute value.");
}

/// <summary>
/// Verify that a null is returned attempting to read past the last attribute.
/// </summary>
[Test, Sequence(5)]
public void NoFurtherAttributeTest()
{
  AttributeNodeInfo ati=reader.ReadNextAttribute();
  Assertion.Assert(ati == null, "Expected a null after the last attribute.");
}

/// <summary>
/// Verify reading the next element.
/// </summary>
[Test, Sequence(6)]
public void ReadNextElement()
{
  reader.ReadNode();
  ElementNodeInfo eti = reader.Element;
  Assertion.Assert(eti.Name == "ChildNode", "Unexpected element.");
  Assertion.Assert(reader.Depth == 1, "Unexpected depth.");
}

/// <summary>
/// Test the smart attribute reader implementation, which automatically determines
/// whether to call ReadFirstAttribute or ReadNextAttribute.
/// </summary>
[Test, Sequence(7)]
public void SmartAttributeReaderTest()
{
  int i = 0;

  while (reader.ReadAttribute() != null)
  {
    ++i;
  }

  Assertion.Assert(i == 3, "Expected 3 attributes.");
}

/// <summary>
/// Validates reading an attribute with a prefix.
/// </summary>
[Test, Sequence(9)]
public void AttributePrefixTest()
{
  reader.ReadNode();
  AttributeNodeInfo ati=reader.ReadAttribute();
  Assertion.Assert(ati.Prefix == "foo", "Unexpected prefix.");
  Assertion.Assert(ati.Name == "MyAttribute", "Unexpected attribute.");
  Assertion.Assert(ati.Value == "10", "Unexpected value.");
}

Reading CDATA

A CDATA block lets you include freeform text in the XML, such as code. My reader provides a CData property which returns the CDATA text as a string:

/// <summary>
/// Thin wrapper for the underlying Value property, returns the CDATA text.
/// </summary>
public string CData
{
  get 
  {
    if (NodeType != XmlNodeType.CDATA)
    {
      throw new ReaderException("Not on a CDATA node.");
    }

    return reader.Value; 
  }
}

As you can see, the CData property validates the node type and then simply returns the underlying Value property.

The CDATA Unit Test

/// <summary>
/// Validates reading a CDATA block.
/// </summary>
[Test, Sequence(10)]
public void CDATATest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.CDATA, "Expected CDATA node.");
  Assertion.Assert(reader.CData == "some stuff", "Unexpected CDATA text.");
}

Reading Comments

Reading XML comments is just like getting the CDATA text. Once we know that the node is a comment node, we return the Value property which contains the comment text. The reader also trims any leading and trailing whitespace, which is often used to make the XML comments more readable.

/// <summary>
/// Thin wrapper for the underlying Value property, returns the comment text.
/// The comment text is returned trimmed of leading and trailing whitespace.
/// </summary>
public string Comment
{
  get
  {
    if (NodeType != XmlNodeType.Comment)
    {
      throw new ReaderException("Not on a comment node.");
    }

    return reader.Value.Trim();
  }
}

The Comment Unit Test

/// <summary>
/// Validates reading a comment block.
/// </summary>
[Test, Sequence(11)]
public void CommentTest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.Comment, 
                   "Expected comment node.");
  Assertion.Assert(reader.Comment == "My comment", 
                   "Unexpected comment text.");
}

Reading Inner Element Text

As the XmlTextReader moves through the XML, any inner text is its own Text node type. The reader's Text property is a thin wrapper for the XmlTextReader's Value property:

/// <summary>
/// Thin wrapper for the underlying Value property, returns the element text.
/// </summary>
public string Text
{
  get
  {
    if (NodeType != XmlNodeType.Text)
    {
      throw new ReaderException("Not on a text node.");
    }

    return reader.Value;
  }
}

The Text Unit Test

/// <summary>
/// Validates reading inner element text.
/// </summary>
[Test, Sequence(12)]
public void TextTest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.Element, "Expected element.");
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.Text, "Expected text node.");
  Assertion.Assert(reader.Text == "Text", "Unexpected text value.");
}

Reading Processing Instructions

Processing instructions are another kind of XML node. These may contain useful meta-instructions for the engine that is processing the XML. The reader provides a thin wrapper for getting the processing instruction:

/// <summary>
/// Returns a wrapper instance containing the processing instruction name and value.
/// </summary>
public ProcessingInstructionInfo Instruction
{
  get
  {
    if (NodeType != XmlNodeType.ProcessingInstruction)
    {
      throw new ReaderException("Not on a processing instruction node.");
    }

    ProcessingInstructionInfo proc = 
      new ProcessingInstructionInfo(CurrentLineInfo, reader.Name, reader.Value);

    return proc;
  }
}

The Processing Instruction Unit Test

/// <summary>
/// Validates reading a processing instruction.
/// </summary>
[Test, Sequence(13)]
public void ReadProcessingInstructionTest()
{
  reader.ReadNode();
  Assertion.Assert(reader.NodeType == XmlNodeType.ProcessingInstruction, 
                   "Expected processing instruction.");
  ProcessingInstructionInfo pii = reader.Instruction;
  Assertion.Assert(pii.Name == "do", "Unexpected name.");
  Assertion.Assert(pii.Value == "homework", "Unexpected value.");
}

Working with the XML Graph

Lastly, one of the important things about XML is that it is hierarchical. The reader provides a thin wrapper to the XmlTextReader's Depth property (a very thin wrapper):

/// <summary>
/// Returns the current depth of the element. Depth is a strange thing,
/// as the element will be at depth n, but element attributes are at
/// depth n+1. The root element is at depth 0.
/// </summary>
public int Depth
{
  get { return reader.Depth; }
}

The wrapper adds nothing here, but the property must still be part of the abstraction so that any class implementing IReader provides it.

The Depth Unit Test

/// <summary>
/// Validates node depth. Note how the third ReadNode returns a depth of 0,
/// as this is the end of the xml. The root node is at depth 0; as soon as
/// the reader is positioned on an attribute of a node or a child element,
/// the depth is incremented.
/// </summary>
[Test, Sequence(14)]
public void DepthTests()
{
  reader.ReadNode(); // Reads the ChildNode element.
  Assertion.Assert(reader.Depth == 1, "Depth should be 1.");
  reader.ReadNode(); // Reads the GrandchildNode element.
  Assertion.Assert(reader.Depth == 2, "Depth should be 2.");
  reader.ReadAttribute(); // Reads the Depth attribute.
  Assertion.Assert(reader.Depth == 3, "Depth should be 3.");
  reader.ReadNode(); // Reads to end.
  Assertion.Assert(reader.Depth == 0, "Depth should be 0.");
}

This unit test reveals one of the side-effects of ignoring the XML end element node type, which is that the depth can pop several levels. This should be taken into consideration when writing an application that actually does something with the XML.
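The sketch below traces the depth of every node, end elements included, so you can see the values that get skipped when end elements are ignored. DepthDemo and Trace are hypothetical names, and the sample XML is a cut-down version of the tail of the test document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo: tracing node depth, including end elements.
public static class DepthDemo
{
    public static List<string> Trace(string xml)
    {
        List<string> trace = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                trace.Add(reader.NodeType + " " + reader.Name + " depth=" + reader.Depth);
            }
        }

        return trace;
    }

    public static void Main()
    {
        string xml = "<Root><Child><Grandchild/></Child></Root>";

        foreach (string entry in Trace(xml))
        {
            Console.WriteLine(entry);
        }
    }
}
```

An application that builds a tree from the reader while ignoring end elements can compare the new depth with the previous one and pop as many parents as the difference indicates.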

About the Download

I've created a solution that contains the following projects:

  • Reader, consisting only of the Reader.cs file. This is the concrete implementation of the IReader interface.
  • ReaderCommon, consisting of the IReader interface and the container classes.
  • ReaderUnitTests, consisting of the unit tests.
  • UnitTestLib, the files necessary for compiling the above unit test project.

These comprise all the pieces necessary to compile the project without error. To actually run the unit tests, you'll need to download the Advanced Unit Test application mentioned earlier.

Wrapping it Up

The Reader is a fairly thin wrapper around the XmlTextReader class. The class is intended to be used by an application that reads nodes, inspects the node type, and determines what to do given the node type. This is an ad hoc approach to reading XML, whereas the SoftArtisans link provided at the beginning of the article shows a more directed approach in which the application expects the XML to have a certain format. Hopefully, this article and the SoftArtisans link provide you with a better understanding of how to work with the XmlTextReader. The reader also provides an abstraction that decouples the application from the specific document type, which is one of the goals that I had in mind.
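As a closing illustration of the read, inspect, and dispatch style, here is a minimal sketch written against the framework's XmlReader rather than the wrapper, so that it stands alone. DispatchDemo and Summarize are hypothetical names; an application using the wrapper would call ReadNode and the typed properties (Element, Text, CData, Comment, Instruction) in the same places.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo: ad hoc processing driven purely by node type.
public static class DispatchDemo
{
    public static List<string> Summarize(string xml)
    {
        List<string> summary = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        summary.Add("element " + reader.LocalName);
                        break;
                    case XmlNodeType.Text:
                        summary.Add("text " + reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        summary.Add("cdata " + reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        summary.Add("comment " + reader.Value.Trim());
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        summary.Add("instruction " + reader.Name + " " + reader.Value);
                        break;
                    default:
                        // Declarations and end elements are ignored in this sketch.
                        break;
                }
            }
        }

        return summary;
    }

    public static void Main()
    {
        string xml = "<R><![CDATA[some stuff]]><!-- My comment -->"
                   + "<E>Text</E><?do homework?></R>";

        foreach (string entry in Summarize(xml))
        {
            Console.WriteLine(entry);
        }
    }
}
```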

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Marc Clifton

United States
Marc is the creator of two open source projects, MyXaml, a declarative (XML) instantiation engine, and the Advanced Unit Testing framework, as well as Interacx, a commercial n-tier RAD application suite. Visit his website, www.marcclifton.com, where you will find many of his articles and his blog.
 
Marc lives in Philmont, NY.

Last Updated 2 Sep 2006
Article Copyright 2006 by Marc Clifton