Introduction
This beginner's tutorial shows four different ways to represent the
same data in XML and how to select that data using XPath. The data
represented is the page size of a census recording, which depends on
the country and the year. Each census has two page sizes, a large size and a
small size, and the two may be the same.
<?xml version="1.0" encoding="utf-8"?>
<STUFF>
  <TYPE1>
    <CENSUS COUNTRY="USA" YEAR="1930">
      <PAGE SIZE="SMALL">17x11</PAGE>
      <PAGE SIZE="LARGE">27x19</PAGE>
    </CENSUS>
    <CENSUS COUNTRY="USA" YEAR="1880">
      <PAGE SIZE="SMALL">17x11</PAGE>
      <PAGE SIZE="LARGE">19x25</PAGE>
    </CENSUS>
    <CENSUS COUNTRY="UK" YEAR="1871">
      <PAGE SIZE="SMALL">9.5x15</PAGE>
      <PAGE SIZE="LARGE">9.5x15</PAGE>
    </CENSUS>
    <CENSUS COUNTRY="UK" YEAR="1891">
      <PAGE SIZE="SMALL">11x16</PAGE>
      <PAGE SIZE="LARGE">11x16</PAGE>
    </CENSUS>
  </TYPE1>
  <TYPE2>
    <CENSUS>
      <COUNTRY>USA</COUNTRY>
      <YEAR>1930</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>17x11</SMALL>
          <LARGE>27x19</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    <CENSUS>
      <COUNTRY>USA</COUNTRY>
      <YEAR>1880</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>17x11</SMALL>
          <LARGE>19x25</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    <CENSUS>
      <COUNTRY>UK</COUNTRY>
      <YEAR>1871</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>9.5x15</SMALL>
          <LARGE>9.5x15</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
    <CENSUS>
      <COUNTRY>UK</COUNTRY>
      <YEAR>1891</YEAR>
      <PAGE>
        <SIZE>
          <SMALL>11x16</SMALL>
          <LARGE>11x16</LARGE>
        </SIZE>
      </PAGE>
    </CENSUS>
  </TYPE2>
  <TYPE3>
    <CENSUS>
      <USA YEAR="1930">
        <PAGE SIZE="SMALL">17x11</PAGE>
        <PAGE SIZE="LARGE">27x19</PAGE>
      </USA>
      <USA YEAR="1880">
        <PAGE SIZE="SMALL">17x11</PAGE>
        <PAGE SIZE="LARGE">19x25</PAGE>
      </USA>
      <UK YEAR="1871">
        <PAGE SIZE="SMALL">9.5x15</PAGE>
        <PAGE SIZE="LARGE">9.5x15</PAGE>
      </UK>
      <UK YEAR="1891">
        <PAGE SIZE="SMALL">11x16</PAGE>
        <PAGE SIZE="LARGE">11x16</PAGE>
      </UK>
    </CENSUS>
  </TYPE3>
  <TYPE4>
    <CENSUS>
      <COUNTRY>
        USA
        <YEAR>
          1930
          <PAGE>
            <SIZE TYPE="SMALL">17x11</SIZE>
            <SIZE TYPE="LARGE">27x19</SIZE>
          </PAGE>
        </YEAR>
        <YEAR>
          1880
          <PAGE>
            <SIZE TYPE="SMALL">17x11</SIZE>
            <SIZE TYPE="LARGE">19x25</SIZE>
          </PAGE>
        </YEAR>
      </COUNTRY>
      <COUNTRY>
        UK
        <YEAR>
          1871
          <PAGE>
            <SIZE TYPE="SMALL">9.5x15</SIZE>
            <SIZE TYPE="LARGE">9.5x15</SIZE>
          </PAGE>
        </YEAR>
        <YEAR>
          1891
          <PAGE>
            <SIZE TYPE="SMALL">11x16</SIZE>
            <SIZE TYPE="LARGE">11x16</SIZE>
          </PAGE>
        </YEAR>
      </COUNTRY>
    </CENSUS>
  </TYPE4>
</STUFF>
Background
Deciding when to use an element or an attribute to represent XML data is
confusing for us beginners. Even more confusing is how to select the data when
it is represented in different forms.
Using the code
Create a new C# console application called ConsoleXMLTest and replace the
body of Class1.cs with the following code. Create a file called data.xml,
place the above XML into it, and put the file in a directory where your
application can locate it. I set a build event under the project properties to
copy data.xml from the project directory to the output directory automatically,
as follows:
copy "$(ProjectDir)data.xml" "$(TargetDir)"
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Collections;

namespace ConsoleXMLTest
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            string fileName = "data.xml";
            FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);

            // Each test gets a fresh reader, so rewind the stream before creating it.
            XmlTextReader reader = new XmlTextReader(fs);
            TestOne(reader);
            fs.Seek(0, SeekOrigin.Begin);
            reader = new XmlTextReader(fs);
            TestTwo(reader);
            fs.Seek(0, SeekOrigin.Begin);
            reader = new XmlTextReader(fs);
            TestThree(reader);
            fs.Seek(0, SeekOrigin.Begin);
            reader = new XmlTextReader(fs);
            TestFour(reader);
            fs.Close();
        }

        // TYPE1: country and year are attributes of CENSUS.
        static void TestOne(XmlTextReader reader)
        {
            System.Console.WriteLine("TestOne");
            XPathDocument xdoc = new XPathDocument(reader);
            XPathNavigator nav = xdoc.CreateNavigator();
            XPathNodeIterator nodeItor = nav.Select(
                "STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE");
            nodeItor.MoveNext();
            TraverseSiblings(nodeItor);
            System.Console.WriteLine();
        }

        // TYPE2: country and year are child elements of CENSUS.
        static void TestTwo(XmlTextReader reader)
        {
            System.Console.WriteLine("TestTwo");
            XPathDocument xdoc = new XPathDocument(reader);
            XPathNavigator nav = xdoc.CreateNavigator();
            XPathNodeIterator nodeItor = nav.Select(
                "STUFF/TYPE2/CENSUS[COUNTRY='USA' and YEAR='1930']/PAGE/SIZE");
            nodeItor.MoveNext();
            TraverseChildren(nodeItor);
            System.Console.WriteLine();
        }

        // TYPE3: the country is the element name and the year is an attribute.
        static void TestThree(XmlTextReader reader)
        {
            System.Console.WriteLine("TestThree");
            XPathDocument xdoc = new XPathDocument(reader);
            XPathNavigator nav = xdoc.CreateNavigator();
            XPathNodeIterator nodeItor = nav.Select(
                "STUFF/TYPE3/CENSUS/USA[@YEAR='1930']/PAGE");
            nodeItor.MoveNext();
            TraverseSiblings(nodeItor);
            System.Console.WriteLine();
        }

        // TYPE4: country and year are text values of elements that also have children.
        static void TestFour(XmlTextReader reader)
        {
            System.Console.WriteLine("TestFour");
            XPathDocument xdoc = new XPathDocument(reader);
            XPathNavigator nav = xdoc.CreateNavigator();
            XPathNodeIterator nodeItor = nav.Select(
                "STUFF/TYPE4/CENSUS/COUNTRY[normalize-space(text())='USA']" +
                "/YEAR[normalize-space(text())='1930']/PAGE/SIZE");
            nodeItor.MoveNext();
            TraverseSiblings(nodeItor);
            System.Console.WriteLine();
        }

        // Prints the current node and every sibling that follows it.
        static void TraverseSiblings(XPathNodeIterator nodeItor)
        {
            XPathNodeIterator igor = nodeItor.Clone();
            bool more = false;
            do
            {
                PrintNode(igor.Current);
                more = igor.Current.MoveToNext();
            } while (more);
        }

        // Prints every child of the current node.
        static void TraverseChildren(XPathNodeIterator nodeItor)
        {
            XPathNodeIterator igor = nodeItor.Clone();
            igor.Current.MoveToFirstChild();
            bool more = false;
            do
            {
                PrintNode(igor.Current);
                more = igor.Current.MoveToNext();
            } while (more);
        }

        // Non-recursive traversal: prints every leaf below the current node.
        static void Traverse(XPathNodeIterator nodeItor)
        {
            Stack nodeStack = new Stack();
            nodeStack.Push(nodeItor.Clone());
            while (nodeStack.Count > 0)
            {
                XPathNodeIterator igor = (XPathNodeIterator)nodeStack.Pop();
                if (igor.Current.HasChildren == false)
                {
                    PrintNode(igor.Current);
                }
                else
                {
                    XPathNodeIterator egor = igor.Clone();
                    egor.Current.MoveToFirstChild();
                    // Collect the children, then reverse them onto nodeStack
                    // so they are popped in document order.
                    Stack reverseStack = new Stack();
                    reverseStack.Push(egor.Clone());
                    while (egor.Current.MoveToNext() == true)
                    {
                        reverseStack.Push(egor.Clone());
                    }
                    while (reverseStack.Count > 0)
                    {
                        nodeStack.Push(reverseStack.Pop());
                    }
                }
            }
        }

        static void PrintNode(XPathNavigator nav)
        {
            System.Console.WriteLine(nav.Name + ":" + nav.Value +
                " Type : " + nav.NodeType.ToString());
        }
    }
}
Points of Interest
Learning how to select nodes using XPath is not very difficult. Since I like to
learn by example, I wrote this code to reinforce the things I learned from
studying MSDN and various web sites.
To select a node that has a particular attribute:
XPathNodeIterator nodeItor = nav.Select(
"STUFF/TYPE1/CENSUS[@COUNTRY='USA' and @YEAR='1930']/PAGE");
The above query selects all PAGE children of the CENSUS element whose COUNTRY
attribute is USA and whose YEAR attribute is 1930.
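The attribute predicate can be combined with a second predicate on PAGE to pick a single size. The following is a minimal sketch, not part of the original program: the class name AttributeSelectDemo and the helper FindPageSize are hypothetical, and a trimmed copy of the TYPE1 data is inlined so that data.xml is not needed.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class AttributeSelectDemo
{
    // Trimmed copy of the TYPE1 data, inlined for the sketch.
    const string Xml =
        "<STUFF><TYPE1>" +
        "<CENSUS COUNTRY='USA' YEAR='1930'>" +
        "<PAGE SIZE='SMALL'>17x11</PAGE><PAGE SIZE='LARGE'>27x19</PAGE>" +
        "</CENSUS>" +
        "<CENSUS COUNTRY='UK' YEAR='1871'>" +
        "<PAGE SIZE='SMALL'>9.5x15</PAGE><PAGE SIZE='LARGE'>9.5x15</PAGE>" +
        "</CENSUS>" +
        "</TYPE1></STUFF>";

    // Hypothetical helper: returns the page size text, or null when nothing matches.
    public static string FindPageSize(string country, string year, string size)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        // Predicates on CENSUS filter by country and year; the one on PAGE filters by size.
        string query = "STUFF/TYPE1/CENSUS[@COUNTRY='" + country +
            "' and @YEAR='" + year + "']/PAGE[@SIZE='" + size + "']";
        XPathNavigator page = nav.SelectSingleNode(query);
        return page == null ? null : page.Value;
    }

    public static void Main()
    {
        Console.WriteLine(FindPageSize("USA", "1930", "LARGE")); // prints 27x19
        Console.WriteLine(FindPageSize("UK", "1871", "SMALL"));  // prints 9.5x15
    }
}
```

Building the query by string concatenation is fine for fixed values like these, but it would need escaping if the values came from user input.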
To select a node that has a particular value:
XPathNodeIterator nodeItor = nav.Select(
"STUFF/TYPE2/CENSUS[COUNTRY='USA' and YEAR='1930']/PAGE/SIZE");
The above query selects the SIZE child of PAGE for every CENSUS element whose
COUNTRY and YEAR child elements have the values USA and 1930, respectively.
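As a variation, the query can end in SIZE/* so that the SMALL and LARGE elements are selected directly and the iterator can be walked with MoveNext instead of a separate child traversal. This is a sketch under assumptions: the class name ElementValueSelectDemo and the helper SizesFor are hypothetical, and only the two USA records of the TYPE2 data are inlined.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public class ElementValueSelectDemo
{
    // Trimmed copy of the TYPE2 data, inlined for the sketch.
    const string Xml =
        "<STUFF><TYPE2>" +
        "<CENSUS><COUNTRY>USA</COUNTRY><YEAR>1930</YEAR>" +
        "<PAGE><SIZE><SMALL>17x11</SMALL><LARGE>27x19</LARGE></SIZE></PAGE></CENSUS>" +
        "<CENSUS><COUNTRY>USA</COUNTRY><YEAR>1880</YEAR>" +
        "<PAGE><SIZE><SMALL>17x11</SMALL><LARGE>19x25</LARGE></SIZE></PAGE></CENSUS>" +
        "</TYPE2></STUFF>";

    // Hypothetical helper: returns "NAME:value" for each size element of one census.
    public static List<string> SizesFor(string country, string year)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        // COUNTRY='...' compares the text of the COUNTRY child element.
        XPathNodeIterator sizes = nav.Select(
            "STUFF/TYPE2/CENSUS[COUNTRY='" + country + "' and YEAR='" + year + "']/PAGE/SIZE/*");
        List<string> result = new List<string>();
        while (sizes.MoveNext())
        {
            result.Add(sizes.Current.Name + ":" + sizes.Current.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string line in SizesFor("USA", "1880"))
        {
            Console.WriteLine(line); // prints SMALL:17x11 then LARGE:19x25
        }
    }
}
```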
TestFour is of particular interest because its XML elements have both a
text value and child elements. While studying XML I didn't come across any
examples of this, and at first I didn't think it could be done. Here is the XML
data for TestFour.
<TYPE4>
  <CENSUS>
    <COUNTRY>
      USA
      <YEAR>
        1930
        <PAGE>
          <SIZE TYPE="SMALL">17x11</SIZE>
          <SIZE TYPE="LARGE">27x19</SIZE>
        </PAGE>
      </YEAR>
      <YEAR>
        1880
        <PAGE>
          <SIZE TYPE="SMALL">17x11</SIZE>
          <SIZE TYPE="LARGE">19x25</SIZE>
        </PAGE>
      </YEAR>
    </COUNTRY>
    <COUNTRY>
      UK
      <YEAR>
        1871
        <PAGE>
          <SIZE TYPE="SMALL">9.5x15</SIZE>
          <SIZE TYPE="LARGE">9.5x15</SIZE>
        </PAGE>
      </YEAR>
      <YEAR>
        1891
        <PAGE>
          <SIZE TYPE="SMALL">11x16</SIZE>
          <SIZE TYPE="LARGE">11x16</SIZE>
        </PAGE>
      </YEAR>
    </COUNTRY>
  </CENSUS>
</TYPE4>
When I selected the YEAR node and displayed its value, I got all of the
whitespace around the value as well, so a plain comparison against the text
did not match. The XPath normalize-space function solves this:
XPathNodeIterator nodeItor = nav.Select(
"STUFF/TYPE4/CENSUS/COUNTRY[normalize-space(text())='USA']"+
"/YEAR[normalize-space(text())='1930']/PAGE/SIZE");
The query selects the COUNTRY node that has the text equal to USA with the
whitespace stripped away. It does the same for the YEAR.
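The effect of normalize-space can be seen by running the same predicate with and without it. The sketch below is illustrative only: the class name NormalizeSpaceDemo and the helper CountMatches are hypothetical, and the inlined XML is a cut-down COUNTRY element with the same kind of surrounding whitespace as the TYPE4 data.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class NormalizeSpaceDemo
{
    // A COUNTRY element whose text value is surrounded by newlines and spaces.
    const string Xml =
        "<COUNTRY>\n  USA\n  <YEAR>\n    1930\n  </YEAR>\n</COUNTRY>";

    static XPathNavigator CreateNavigator()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // Hypothetical helper: number of nodes an XPath expression selects.
    public static int CountMatches(string xpath)
    {
        return CreateNavigator().Select(xpath).Count;
    }

    public static void Main()
    {
        // The raw text node is "\n  USA\n  ", so an exact comparison fails.
        Console.WriteLine(CountMatches("COUNTRY[text()='USA']"));                  // prints 0
        // normalize-space trims the ends and collapses inner runs of whitespace.
        Console.WriteLine(CountMatches("COUNTRY[normalize-space(text())='USA']")); // prints 1
        // Without text(), the element's value also includes its descendants' text.
        Console.WriteLine(CreateNavigator().Evaluate("normalize-space(COUNTRY)")); // prints USA 1930
    }
}
```

The last line shows why the query uses text() rather than the element itself: the string value of COUNTRY includes the text of its YEAR descendants.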
Additionally, I have written several recursive routines to traverse an XML
tree during my studies. In this code I decided to use a non-recursive solution
using a Stack and a while loop. This is the Traverse method, which the four
tests do not call.
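The same stack-based idea can be sketched with cloned XPathNavigator objects instead of iterator clones. This is an alternative illustration rather than the method used in the program above: the class name StackTraversalDemo and the helper LeafValues are hypothetical, and the generic Stack<T> and List<T> are swapped in for the non-generic Stack.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public class StackTraversalDemo
{
    const string Xml =
        "<PAGE><SIZE TYPE='SMALL'>17x11</SIZE><SIZE TYPE='LARGE'>27x19</SIZE></PAGE>";

    // Hypothetical helper: visits the tree without recursion and returns
    // "NodeType:Value" for every leaf, in document order.
    public static List<string> LeafValues(XPathNavigator start)
    {
        List<string> result = new List<string>();
        Stack<XPathNavigator> pending = new Stack<XPathNavigator>();
        pending.Push(start.Clone());
        while (pending.Count > 0)
        {
            XPathNavigator node = pending.Pop();
            if (!node.HasChildren)
            {
                result.Add(node.NodeType + ":" + node.Value);
                continue;
            }
            // Collect the children first, then push them in reverse so the
            // first child ends up on top of the stack.
            List<XPathNavigator> children = new List<XPathNavigator>();
            XPathNavigator child = node.Clone();
            child.MoveToFirstChild();
            do
            {
                children.Add(child.Clone());
            } while (child.MoveToNext());
            for (int i = children.Count - 1; i >= 0; i--)
            {
                pending.Push(children[i]);
            }
        }
        return result;
    }

    public static void Main()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        foreach (string line in LeafValues(nav))
        {
            Console.WriteLine(line); // prints Text:17x11 then Text:27x19
        }
    }
}
```

The leaves here are text nodes, which have no name, so the sketch prints the node type in place of the name.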
Notice that I have a habit of naming iterator variables igor. It came from
seeing so many named itor and I couldn't help but think of Igor from Young
Frankenstein. So you will see some Igors and some Egors in the code.
The code produces the following output:
TestOne
PAGE:17x11 Type : Element
PAGE:27x19 Type : Element
TestTwo
SMALL:17x11 Type : Element
LARGE:27x19 Type : Element
TestThree
PAGE:17x11 Type : Element
PAGE:27x19 Type : Element
TestFour
SIZE:17x11 Type : Element
SIZE:27x19 Type : Element
Master's degree in C.S. .NET, Unix, Macintosh (OS X, 9, 8...), PC server side, and MFC. 17 years' experience. Graphics, distributed processing, object-oriented methods and models.
Java, C#, C++. Web services. XML. Real name is Geoffrey Slinker.