Introduction
The Extensible Markup Language (XML) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own tags. In software engineering, an extensible system is one that can be modified by changing or adding features. The primary purpose of XML is to facilitate the sharing of data across different information systems, particularly via the Internet.
XML is a World Wide Web Consortium (W3C) Recommendation and a fee-free open standard. The W3C Recommendation specifies both the lexical grammar and the requirements for parsing.
The basic difference between HTML and XML is:
- HTML was designed to display data and to focus on how data looks
- XML is designed to describe data and to focus on what data is
A Simple Program using XML
<?xml version="1.0" encoding="UTF-8"?>
<Note>
<to>Irshad</to>
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</Note>
Note
A well-formed XML document must have matching opening and closing tags.
Data can be stored in child elements or in attributes, e.g.:
<Note to="Irshad"> <!-- XML attribute -->
<from>Farhan</from> <!-- XML child element -->
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</Note>
Attributes are handy in HTML, but in XML it is usually better to avoid them. Use child elements whenever the information feels like data.
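To make the difference concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. It parses the Note document above and reads "to" from an attribute and the other fields from child elements. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The Note document from above, with "to" stored as an attribute.
NOTE_XML = """<Note to="Irshad">
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</Note>"""

note = ET.fromstring(NOTE_XML)

# An attribute is read from the element's attribute mapping.
recipient = note.get("to")

# A child element is looked up by tag name; findtext returns its text.
sender = note.findtext("from")
heading = note.findtext("heading")

print(recipient, sender, heading)
```

Both styles are equally easy to read back here; the advice to prefer child elements matters more once a value needs structure of its own, which an attribute cannot hold.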
XML Validation
There are two levels of correctness of an XML document:
- XML with correct syntax is Well Formed XML
- XML validated against a DTD is Valid XML
A Well Formed XML document is a document that conforms to the XML syntax rules, such as:
- XML documents must have a root element
- XML elements must have a closing tag
- XML tags are case sensitive
- XML elements must be properly nested
- XML attribute values must always be quoted
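These rules can be tried out with any XML parser, since a conforming parser must reject a document that is not well formed. The following is a small sketch in Python, assuming the standard-library xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One well-formed sample, then one sample breaking each rule listed above.
samples = {
    "properly nested": "<Note><to>Irshad</to></Note>",
    "two root elements": "<Note></Note><Note></Note>",
    "missing closing tag": "<Note><to>Irshad</Note>",
    "case mismatch": "<Note><to>Irshad</To></Note>",
    "improper nesting": "<Note><to><b>Irshad</to></b></Note>",
    "unquoted attribute": "<Note to=Irshad></Note>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others violates exactly one of the rules and is reported as not well formed.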
A Valid XML document is a "Well Formed" XML document that also conforms to the rules of a Document Type Definition (DTD).
Document Type Definition (DTD)
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
A DTD can be declared inline inside an XML document, or as an external reference.
Internal DTD Declaration:
If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:
<?xml version="1.0"?>
<!DOCTYPE Note [
<!ELEMENT Note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)> ]>
The DTD above is interpreted like this:
- !DOCTYPE Note defines that the root element of this document is Note
- !ELEMENT Note defines that the Note element contains four elements, in this order: "to,from,heading,body"
- !ELEMENT to defines the to element to be of the type #PCDATA (parsed character data, i.e. text)
- !ELEMENT from defines the from element to be of the type #PCDATA, etc.
External DTD Declaration:
If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the following syntax:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Irshad</to>
<from>Farhan</from>
<heading>Test Application</heading>
<body>Don't forget me this weekend!</body>
</note>
And this is the file note.dtd which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
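To illustrate what "valid" adds on top of "well formed", the sketch below hand-codes the single content model that note.dtd declares. Python's standard-library parser does not validate against a DTD, so this is not a DTD validator, only an approximation of the check a validating parser would perform for this one DTD. The function name matches_note_dtd and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content model from note.dtd: <!ELEMENT note (to,from,heading,body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_dtd(text):
    """Hand-coded check mirroring note.dtd; not a general DTD validator."""
    root = ET.fromstring(text)  # raises ParseError if not well formed
    if root.tag != "note":
        return False
    # The children must appear exactly in the declared order.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # Only whitespace may appear between the children of note.
    between = [root.text] + [child.tail for child in root]
    if any(t and t.strip() for t in between):
        return False
    # (#PCDATA) means text only: no elements nested inside the children.
    return all(len(child) == 0 for child in root)

in_order = ("<note><to>Irshad</to><from>Farhan</from>"
            "<heading>Test Application</heading>"
            "<body>Don't forget me this weekend!</body></note>")
swapped = ("<note><from>Farhan</from><to>Irshad</to>"
           "<heading>Test Application</heading>"
           "<body>Don't forget me this weekend!</body></note>")

print(matches_note_dtd(in_order))
print(matches_note_dtd(swapped))
```

The second sample is well formed but lists from before to, so it does not follow the declared sequence and would not be valid against note.dtd.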
XML Schema
XML Schema is used to define the legal building blocks of an XML document, just like a DTD. XML Schemas are the successors of DTDs and are also referred to as XML Schema Definitions (XSD).
XML Schemas are now used in many Web applications as a replacement for DTDs, for the following reasons:
- XML Schemas are extensible to future additions
- XML Schemas are richer and more powerful than DTDs
- XML Schemas are written in XML
- XML Schemas support data types
- XML Schemas support namespaces
Example
An example of a very simple XML Schema Definition to describe a country is given below:
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="country" type="Country"/>
<xs:complexType name="Country">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="population" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
An example of an XML document that conforms to this schema is given below (assuming the schema file name is country.xsd and both files are in the same directory):
<country
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="country.xsd">
<!-- the name and population elements required by the schema go here -->
</country>
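The data-type support mentioned above is the main thing a DTD cannot express. Python's standard library has no XSD validator, so the sketch below hand-codes the two constraints that country.xsd places on a document: a name followed by a population, where the population must be written as an xs:decimal. It approximates what a schema validator would report and is not a substitute for one. The function name check_country and the sample values are made up.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of xs:decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def check_country(text):
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(text)
    if root.tag != "country":
        return ["root element must be <country>"]
    # xs:sequence requires name first, then population.
    if [child.tag for child in root] != ["name", "population"]:
        return ["expected <name> followed by <population>"]
    population = (root.findtext("population") or "").strip()
    if not DECIMAL_RE.fullmatch(population):
        return ["population is not an xs:decimal: %r" % population]
    return []

# Hypothetical sample documents; the values carry no real-world meaning.
good = "<country><name>Examplia</name><population>12.5</population></country>"
bad = "<country><name>Examplia</name><population>many</population></country>"

print(check_country(good))
print(check_country(bad))
```

A DTD could only say that population contains text; the schema additionally rejects the second sample because "many" is not a decimal number.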
XQuery
The best way to explain XQuery is to say that:
"XQuery is to XML what SQL is to database tables"
XQuery was designed to query XML data. XQuery is also known as XML Query.
The mission of the XML Query project is to provide flexible query facilities to extract data from real and virtual documents on the World Wide Web, thereby providing the needed interaction between the Web world and the database world. Ultimately, collections of XML files will be accessed like databases.
XQuery uses XPath expression syntax to address specific parts of an XML document. (XPath is a language for finding information in an XML document by navigating through its elements and attributes.) XQuery supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
Example
Consider the following XML document, saved as books.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="Poetry">
<title>Bang-e-Dara</title>
<author>Allama Iqbal</author>
<year>1930</year>
<price>100.00</price>
</book>
<book category="Children">
<title>Chocolate Factory</title>
<author>Amra Alam</author>
<year>2007</year>
<price>50.00</price>
</book>
</bookstore>
A simple XQuery can be written to extract a record out of this XML document:
doc("books.xml")/bookstore/book[price<70]
The XQuery above will extract the following:
<book category="Children">
<title>Chocolate Factory</title>
<author>Amra Alam</author>
<year>2007</year>
<price>50.00</price>
</book>
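For readers without an XQuery processor, the same selection can be approximated in Python. This is a rough analogue, not XQuery: the standard-library xml.etree.ElementTree module supports only a subset of XPath and cannot evaluate the price<70 predicate, so the filter is written as an ordinary Python condition. The comprehension loosely mirrors the FOR, WHERE, ORDER BY and RETURN clauses of a FLWOR expression.

```python
import xml.etree.ElementTree as ET

# The books.xml content from above, inlined so the sketch is self-contained.
BOOKS_XML = """<bookstore>
<book category="Poetry">
<title>Bang-e-Dara</title>
<author>Allama Iqbal</author>
<year>1930</year>
<price>100.00</price>
</book>
<book category="Children">
<title>Chocolate Factory</title>
<author>Amra Alam</author>
<year>2007</year>
<price>50.00</price>
</book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS_XML)

# Roughly: for $b in /bookstore/book where $b/price < 70
#          order by $b/title return $b/title
cheap_titles = sorted(
    book.findtext("title")                    # RETURN
    for book in bookstore.findall("book")     # FOR
    if float(book.findtext("price")) < 70     # WHERE
)

print(cheap_titles)
```

Only the book priced at 50.00 passes the filter, matching the result of the XQuery above.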
XML Validation Against XSD
An XML document can be validated against an XML Schema (XSD). A validating parser checks the document's complete structure against the schema and reports an error if a data type does not match or a required element is missing.
Below is sample code written in C# which takes an XML document and an XSD document as input and validates the XML document:
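using System;
using System.Xml;
using System.Xml.Schema;
// Fragment from the code-behind of an ASP.NET page: Server and Response come
// from the page, and Summary is assumed to be a text control on that page.
private int error_count = 0;
public void validateXML()
{
    string strXMLFileName = Server.MapPath("XMLDoc.xml");
    string strXSDFileName = Server.MapPath("GMACApplicationTypes.xsd");
    // XmlReaderSettings is used here in place of XmlValidatingReader and
    // XmlSchemaCollection, which are obsolete since .NET Framework 2.0.
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler += ValidationCallBack;
    try
    {
        // A null namespace means "use the targetNamespace declared in the schema".
        settings.Schemas.Add(null, strXSDFileName);
        using (XmlReader vr = XmlReader.Create(strXMLFileName, settings))
        {
            // Reading the whole document triggers validation of every node.
            while (vr.Read())
            {
            }
        }
    }
    catch (Exception ee)
    {
        // Reached for problems such as a missing file or XML that is not well formed.
        Response.Write(ee.GetType().Name + ":" + ee.Message);
    }
}
// Called once for each validation error found while reading.
public void ValidationCallBack(object sender, ValidationEventArgs args)
{
    Summary.Text += "\nValidation error:\n";
    Summary.Text += args.Exception.ToString();
    error_count++;
}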