XML for Beginners

28 Aug 2007
This article will help those who want to learn XML from scratch

Introduction

The Extensible Markup Language (XML) is a general-purpose markup language. It is called extensible because it allows users to define their own tags; in software engineering, an extensible system is one that can be modified by changing or adding features. XML's primary purpose is to facilitate the sharing of data across different information systems, particularly via the Internet.

XML is a World Wide Web Consortium (W3C) Recommendation and a fee-free open standard. The W3C Recommendation specifies both the lexical grammar and the requirements for parsing.

The basic difference between HTML and XML is: 

  • HTML was designed to display data and to focus on how data looks
  • XML is designed to describe data and to focus on what data is

A Simple XML Document

<?xml version="1.0" encoding="UTF-8"?> 
<!-- Sample Program --> 
<Note> 
<to>Irshad</to> 
<from>Farhan</from> 
<heading>Test Application</heading> 
<body>Don't forget me this weekend!</body> 
</Note>

Note

A well-formed XML document must have properly matched opening and closing tags.

Data can be stored in child elements or in attributes, e.g.:

<Note to="Irshad"> <!-- XML attribute -->
<from>Farhan</from> <!-- XML child element -->
<heading>Test Application</heading> 
<body>Don't forget me this weekend!</body> 
</Note> 

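Attributes are handy in HTML, but in XML a common guideline is to use them sparingly: an attribute cannot hold nested structure or repeated values, and it is harder to extend later. Use child elements when the information is part of the data itself, and reserve attributes for metadata about the data.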
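To make the difference concrete, here is a minimal sketch of how a program reads the same value from an attribute and from a child element. It uses Python's standard xml.etree.ElementTree module purely for illustration; the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The recipient stored as an attribute of <Note>
as_attribute = ET.fromstring(
    '<Note to="Irshad"><from>Farhan</from></Note>'
)

# The recipient stored as a child element of <Note>
as_child = ET.fromstring(
    '<Note><to>Irshad</to><from>Farhan</from></Note>'
)

# Attributes are read by name from the element itself
recipient_from_attribute = as_attribute.get("to")

# Child elements are looked up and their text content is read
recipient_from_child = as_child.findtext("to")

print(recipient_from_attribute)
print(recipient_from_child)
```

Both forms carry the same information; the child element form can later grow sub-elements (for example a name and an address) without changing how the parent is read.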

XML Validation

There are two levels of correctness of an XML document:

  1. XML with correct syntax is Well Formed XML
  2. XML validated against a DTD is Valid XML

A Well Formed XML document is a document that conforms to the XML syntax rules like:

  • XML documents must have a root element
  • XML elements must have a closing tag 
  • XML tags are case sensitive 
  • XML elements must be properly nested 
  • XML attribute values must always be quoted 
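The rules above can be tried out with any XML parser, since a conforming parser rejects text that is not well formed. The following is a small sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are invented for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first one follows all of them
samples = {
    "properly nested": "<Note><to>Irshad</to></Note>",
    "missing closing tag": "<Note><to>Irshad</Note>",
    "case mismatch in tags": "<Note><to>Irshad</To></Note>",
    "improper nesting": "<Note><to><b>Irshad</to></b></Note>",
    "unquoted attribute value": "<Note to=Irshad></Note>",
    "more than one root element": "<Note></Note><Note></Note>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Note that this only checks syntax. Whether the document uses the right elements in the right places is a matter of validity, described next.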

A Valid XML document is a Well Formed XML document that also conforms to the rules of a Document Type Definition (DTD).

Document Type Definition (DTD)

The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.

A DTD can be declared inline inside an XML document, or as an external reference.

Internal DTD Declaration:

If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:

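<?xml version="1.0"?> 
<!DOCTYPE Note [ 
<!ELEMENT Note (to,from,heading,body)> 
<!ELEMENT to (#PCDATA)> 
<!ELEMENT from (#PCDATA)> 
<!ELEMENT heading (#PCDATA)> 
<!ELEMENT body (#PCDATA)> ]>
<Note>
<to>Irshad</to>
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</Note>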

The DTD above is interpreted like this:

  • !DOCTYPE Note defines that the root element of this document is Note
  • !ELEMENT Note defines that the Note element contains four elements, in this order: to, from, heading, body
  • !ELEMENT to defines the to element to be of the type #PCDATA (parsed character data, that is, text)
  • !ELEMENT from defines the from element to be of the type #PCDATA, etc.

External DTD Declaration:

If the DTD is declared in an external file, the XML document refers to that file from a DOCTYPE declaration with the following syntax:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Irshad</to>
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</note> 

And this is the file note.dtd which contains the DTD:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)> 
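To show what a validating parser enforces for this DTD, here is a hand-written sketch of the same rules in Python. It is not a DTD validator: the parsers in Python's standard library do not validate against a DTD, so the content model is hard-coded, and the function name matches_note_model is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hard-coded from: <!ELEMENT note (to,from,heading,body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_model(text):
    """Check a document against the rules that note.dtd expresses."""
    root = ET.fromstring(text)
    if root.tag != "note":
        return False
    children = list(root)
    # The comma-separated content model requires exactly these
    # children, in exactly this order
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain elements
    return all(len(child) == 0 for child in children)

valid_note = """<note>
<to>Irshad</to>
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</note>"""

# Well formed, but <from> and <to> are swapped
wrong_order = """<note>
<from>Farhan</from>
<to>Irshad</to>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</note>"""

print(matches_note_model(valid_note))
print(matches_note_model(wrong_order))
```

The second document illustrates the gap between the two levels of correctness: it is well formed, yet it would not be valid against note.dtd.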

XML Schema

XML Schema is used to define the legal building blocks of an XML document, just like a DTD. XML Schemas are the successors of DTDs; the schema language is also referred to as XML Schema Definition (XSD).

XML Schemas are now widely used as a replacement for DTDs, for the following reasons:

  • XML Schemas are extensible to future additions
  • XML Schemas are richer and more powerful than DTDs
  • XML Schemas are written in XML
  • XML Schemas support data types
  • XML Schemas support namespaces

Example

An example of a very simple XML Schema Definition to describe a country is given below:

<xs:schema
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="country" type="Country"/>
 <xs:complexType name="Country">
  <xs:sequence>
   <xs:element name="name" type="xs:string"/>
   <xs:element name="population" type="xs:decimal"/>
  </xs:sequence>
 </xs:complexType>
</xs:schema> 

An example of an XML document that conforms to this schema is given below (assuming the schema file name is country.xsd and both files are in the same directory):

<country
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="country.xsd">
 <name>France</name>
 <population>59.7</population>
</country>

XQuery

The best way to explain XQuery is to say that:

"XQuery is to XML what SQL is to database tables"

XQuery was designed to query XML data. XQuery is also known as XML Query.

The mission of the XML Query project is to provide flexible query facilities to extract data from real and virtual documents on the World Wide Web, thereby providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases.

XQuery uses XPath expression syntax to address specific parts of an XML document. (XPath is a language for finding information in an XML document; it is used to navigate through elements and attributes.) XQuery supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
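To give a feel for what the five clauses do, the sketch below walks through them in plain Python, with the roughly corresponding XQuery shown in a comment. This is an analogy rather than XQuery itself: Python's standard library has no XQuery engine, and the small bookstore document (including the third book) is sample data made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for this sketch
bookstore = ET.fromstring("""
<bookstore>
  <book><title>Bang-e-Dara</title><price>100.00</price></book>
  <book><title>Chocolate Factory</title><price>50.00</price></book>
  <book><title>Alif Laila</title><price>60.00</price></book>
</bookstore>
""")

# Roughly the same logic as this FLWOR expression:
#   for $b in doc("books.xml")/bookstore/book
#   let $p := xs:decimal($b/price)
#   where $p < 70
#   order by $b/title
#   return data($b/title)
matches = []
for book in bookstore.findall("book"):            # FOR: iterate over books
    price = float(book.findtext("price"))         # LET: bind a value
    if price < 70:                                # WHERE: filter
        matches.append(book)
matches.sort(key=lambda b: b.findtext("title"))   # ORDER BY: sort
titles = [b.findtext("title") for b in matches]   # RETURN: build the result

print(titles)
```

In real XQuery the whole pipeline is a single expression evaluated by the query engine, which is what makes it comparable to a SQL SELECT statement.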

Example

Consider the following XML document, saved as books.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="Poetry">
  <title>Bang-e-Dara</title>
  <author>Allama Iqbal</author>
  <year>1930</year>
  <price>100.00</price>
</book>
<book category="Children">
  <title>Chocolate Factory</title>
  <author>Amra Alam</author>
  <year>2007</year>
  <price>50.00</price>
</book>
</bookstore>

A simple XQuery can be written to extract a record out of this XML document like:

doc("books.xml")/bookstore/book[price<70]

The XQuery above will extract the following:

<book category="Children">
  <title>Chocolate Factory</title>
  <author>Amra Alam</author>
  <year>2007</year>
  <price>50.00</price>
</book>
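If no XQuery processor is at hand, the effect of the query can be reproduced with an ordinary XML parser. The sketch below is one way to do it in Python, under two assumptions worth stating: the document is embedded as a string instead of being read from books.xml, and the price comparison is done in Python code because the XPath subset in the standard xml.etree.ElementTree module does not support numeric comparisons such as price<70.

```python
import xml.etree.ElementTree as ET

# The bookstore document from above, embedded as a string
# (XML declaration omitted for this sketch)
BOOKS_XML = """<bookstore>
<book category="Poetry">
  <title>Bang-e-Dara</title>
  <author>Allama Iqbal</author>
  <year>1930</year>
  <price>100.00</price>
</book>
<book category="Children">
  <title>Chocolate Factory</title>
  <author>Amra Alam</author>
  <year>2007</year>
  <price>50.00</price>
</book>
</bookstore>"""

root = ET.fromstring(BOOKS_XML)  # root is the <bookstore> element

# Equivalent of /bookstore/book[price<70]:
# select the <book> children, then keep those priced under 70
cheap_books = [
    book for book in root.findall("book")
    if float(book.findtext("price")) < 70
]

for book in cheap_books:
    print(book.get("category"), "-", book.findtext("title"))
```

The selected element is the same Children book that the XQuery returns; ET.tostring(book) would serialize it back to XML if the markup itself is needed.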

XML Validation Against XSD

An XML document can be validated against an XML Schema (XSD). Validation checks the document's complete structure and reports an error if a data type does not match or a required element is missing.

Below is sample code written in C# which takes an XML document and an XSD document as input and validates the XML document:

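using System;
using System.Xml;
using System.Xml.Schema;

// The members below belong inside an ASP.NET Web Forms page class:
// Server and Response come from the page, and Summary is assumed to be
// a text control declared on it.
private int error_count = 0;

public void validateXML()
{
    string strXMLFileName = Server.MapPath("XMLDoc.xml");
    string strXSDFileName = Server.MapPath("GMACApplicationTypes.xsd");

    // XmlValidatingReader and XmlSchemaCollection are obsolete since
    // .NET 2.0; XmlReaderSettings with an XmlSchemaSet replaces them.
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler += ValidationCallBack;

    try
    {
        // A null namespace means: use the schema's own targetNamespace
        settings.Schemas.Add(null, strXSDFileName);

        using (XmlReader vr = XmlReader.Create(strXMLFileName, settings))
        {
            // Reading the whole document triggers validation;
            // each problem found is reported to ValidationCallBack
            while (vr.Read())
            {
            }
        }
    }
    catch (Exception ee)
    {
        // Reached for errors other than validation failures,
        // such as a missing file or XML that is not well formed
        Response.Write(ee.GetType().Name + ":" + ee.Message);
    }
}

public void ValidationCallBack(object sender, ValidationEventArgs args)
{
    // Summary collects the text of every validation error
    Summary.Text += "\nValidation error:\n";
    Summary.Text += args.Exception.ToString();
    error_count++;
}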

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
