The XML Basics Handbook
Authored by: Itech Consulting
©www.itechconsulting.co.in
Contents:
Introduction
XML Rules
Elements and Attribute values
Root Element
Empty Element
Nesting Elements
Comments in XML
XML Browsers
Escape Characters
XML Namespaces
XML Data-Binding
Introduction:
This handbook covers the basics of XML. The examples can be written and run with Notepad, Altova's XML Spy or EditPlus. The handbook focuses on basic XML concepts and is aimed at novice readers, although some portions are intended for intermediate XML programmers. If you find that this document requires any modifications, please let us know and we will incorporate the necessary changes.
Please send all feedback to training@itechconsulting.co.in
Sample xml document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss chapter="6">
  <topic1>Introduction to RSS</topic1>
  <topic2>RSS Syntax</topic2>
  <topic3>RSS Channel</topic3>
  <topic4>RSS Item</topic4>
  <!--Topic 5 -->
  <topic5>
    <name>Comments in RSS</name>
    <description type="short">Learn how to put comments in an RSS document</description>
  </topic5>
  <topic6>Optional Elements in RSS Channel and Item</topic6>
  <topic7>Validate your RSS</topic7>
</rss>
Example 1
The above XML document is simple and self-explanatory. It gives information about the topics covered under RSS.
The first line in the document is the XML declaration, which defines the XML version and the character encoding being used. It means that this document conforms to version 1.0 of the XML specification and uses the ISO-8859-1 (Latin-1, Western European) character set. The next line opens the root element of the document. Here, “rss” is the root element, and its child elements (topic1 to topic7) describe the topics covered under RSS.
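As a rough illustration of the structure just described, the following Python sketch parses a shortened copy of Example 1 and reports the root element, its attribute and its child elements. The use of Python and its standard-library xml.etree.ElementTree module is an assumption made for this illustration, as are the variable names; the handbook itself does not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Example 1, passed as bytes so that the parser
# honours the encoding named in the XML declaration.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<rss chapter="6">
  <topic1>Introduction to RSS</topic1>
  <topic2>RSS Syntax</topic2>
  <!--Topic 5 -->
  <topic5>
    <name>Comments in RSS</name>
    <description type="short">Learn how to put comments in an RSS document</description>
  </topic5>
</rss>"""

root = ET.fromstring(SAMPLE)
child_tags = [child.tag for child in root]

print(root.tag)     # name of the root element
print(root.attrib)  # attributes of the root element
print(child_tags)   # direct children; the default parser drops comments
```

The sub-child elements are reached through their parent, for example root.find("topic5/description").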
2) Rules:
XML documents must follow certain syntax rules. Taking Example 1 as reference:
1. All XML documents must have a root element. In Example 1, “rss” is the root element.
2. All XML elements must have a closing tag. In Example 1, the root element (rss) and all the child elements (topic1 to topic7) have both a start tag and an end tag. Omitting the end tag results in an error.
3. XML tags are case-sensitive, i.e. the start tag and the end tag must be written in the same case. It is good practice to write all the tags in an XML document in the same case so that the document looks consistent.
4. XML elements must be properly nested. Refer to the child element “topic5” and its sub-child elements “name” and “description”.
5. Attribute values must always be enclosed in quotation marks. Refer to the attributes of the root element and of the sub-child element “description” under the child element “topic5”. Violating this rule results in an error.
6. As in HTML, comments in an XML document are written within
<!-- comment -->
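The rules above can be tried out with a parser. The sketch below is only an illustration: it assumes Python's standard-library xml.etree.ElementTree module, and the helper name is_well_formed and the small test documents are made up for this example. It feeds one correct document and three documents that each break one rule to the parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical test documents, each violating at most one rule.
checks = {
    "well-formed": '<rss chapter="6"><topic1>RSS</topic1></rss>',
    "missing closing tag": '<rss chapter="6"><topic1>RSS</rss>',
    "case mismatch": '<rss chapter="6"><topic1>RSS</Topic1></rss>',
    "unquoted attribute": '<rss chapter=6><topic1>RSS</topic1></rss>',
}

results = {label: is_well_formed(doc) for label, doc in checks.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```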
3) Elements:
An XML document consists of elements. Elements describe the data that the document contains. Elements can contain attributes and other elements. The root element is mandatory; it is followed by child elements, which can in turn contain sub-child elements, and so on.
An element name must start with a letter or an underscore; it cannot start with a digit, a hyphen or a period, and names beginning with the letters ‘xml’ (in any case) are reserved. An element name cannot contain spaces. If an element name needs to combine two words, either put an underscore between the two words or capitalise the first letter of the second word, as in the element ‘myForumId’ in Example 2.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--Registration Form -->
<myForumId>
  <name age="27" gender="male">todd</name>
  <address>USA</address>
  <forumRecog Id="todd_1977" password="password"></forumRecog>
  <!--Birthday Details-->
  <birthday>Optional
    <day>10</day>
    <month>Dec</month>
    <year>1970</year>
  </birthday>
</myForumId>
Example 2
Elements in xml can have different content types. The types of content in an xml element can be – element, simple, mixed or empty.
In example 2,
· Element ‘myForumId’ has element content because it contains other elements.
· Element ‘address’ has simple content because it contains only plain text.
· Element ‘forumRecog’ has no content, only attributes.
It can also be written as:
<forumRecog Id="todd_1977" password="password" />
‘forumRecog’ is called an empty element (or empty tag) because it has no content. An empty element does not need a separate end tag: as shown above, the start tag can simply end with a forward slash, optionally preceded by a space.
· Element ‘birthday’ contains mixed content because it contains text as well as other elements.
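The four content types can be told apart programmatically. The following is a minimal sketch, assuming Python's xml.etree.ElementTree module; the function name content_type is hypothetical, and treating whitespace-only text as "no text" is a simplifying assumption of this sketch rather than a rule of XML.

```python
import xml.etree.ElementTree as ET

def content_type(elem):
    """Classify an element as 'empty', 'simple', 'element' or 'mixed'."""
    has_children = len(elem) > 0
    # Text can sit before the first child (elem.text) or after any child (child.tail).
    pieces = [elem.text] + [child.tail for child in elem]
    # Assumption: whitespace used only for indentation does not count as text.
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

DOC = """<myForumId>
  <name age="27" gender="male">todd</name>
  <address>USA</address>
  <forumRecog Id="todd_1977" password="password" />
  <birthday>Optional <day>10</day> <month>Dec</month> <year>1970</year></birthday>
</myForumId>"""

root = ET.fromstring(DOC)
kinds = {elem.tag: content_type(elem) for elem in [root, *root]}
for tag, kind in kinds.items():
    print(f"{tag}: {kind}")
```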
4) Attributes:
Attributes are simple name/value pairs associated with an element. They are used to provide additional information about the elements. Attributes are attached to the “start tag” of the element.
Attributes must have values, and each value must be enclosed in either single or double quotes. If the value itself contains double quotes, enclose it in single quotes; if it contains single quotes, enclose it in double quotes.
Example 3
<course name="'Delphi' for beginners"></course>
Example 3A
<course name='"Delphi" for beginners'></course>
The naming rules for an attribute are similar to that of an element. Also, two attributes cannot be of the same name in an element.
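The quoting and uniqueness rules for attributes can be checked with a short sketch. It assumes Python's xml.etree.ElementTree module; the variable names, and the second age value used to provoke the duplicate-attribute error, are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Single quotes inside a double-quoted value, and the reverse.
single_inside = ET.fromstring('''<course name="'Delphi' for beginners"></course>''')
double_inside = ET.fromstring("""<course name='"Delphi" for beginners'></course>""")

print(single_inside.get("name"))
print(double_inside.get("name"))

# Two attributes with the same name on one element are rejected.
try:
    ET.fromstring('<name age="27" age="28">todd</name>')
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```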
Take a look at example 2 again,
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--Registration Form -->
<myForumId>
  <name age="27" gender="male">todd</name>
  <address>USA</address>
  <forumRecog Id="todd_1977" password="password"></forumRecog>
  <!--Birthday Details-->
  <birthday>Optional
    <day>10</day>
    <month>Dec</month>
    <year>1970</year>
  </birthday>
</myForumId>
Here the “name” element carries two attributes, age and gender, which represent the user's personal details.
Guidelines on using attributes:
1. Use attributes if your data is simple, default or fixed.
2. If the size of your XML file becomes a problem, you can convert some of the child elements to attributes.
3. Otherwise, prefer child elements to attributes. Child elements help you expand your data and manage it easily.
5) Root Element:
Every XML document has a root element. A well-formed XML document has exactly one root element: a document with more than one root element results in an error, and so does a document with no root element. These rules are illustrated in the following examples.
Example 4 – A well-formed XML document with the single element ‘hello’.
<hello>Hi, this is xml</hello>
Example 4A – A well-formed XML document containing one root element, ‘employee’, and its child elements.
<employee>
  <empId>001</empId>
  <empName>todd</empName>
  <salary>55555</salary>
</employee>
Example 4B – Erroneous XML document containing no root element.
Example 4C – Erroneous XML document containing more than one root element.
<hello>Hi, this is xml</hello>
<hello>I am a markup language</hello>
<hello>I am used for data exchange</hello>
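The root element rules can be observed with a parser. This sketch assumes Python's xml.etree.ElementTree module; the helper name parse_error is hypothetical, and the plain-text input used for the "no root element" case is an assumption, since the handbook gives no sample for it. The exact wording of the error messages depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def parse_error(text):
    """Return the parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

one_root = parse_error("<hello>Hi, this is xml</hello>")
no_root = parse_error("Hi, this is xml")  # assumed sample: text with no element at all
many_roots = parse_error(
    "<hello>Hi, this is xml</hello>"
    "<hello>I am a markup language</hello>"
)

print("one root:  ", one_root)
print("no root:   ", no_root)
print("many roots:", many_roots)
```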
6) Empty Elements:
An element that has no content is called an empty element or empty tag. An empty element can still carry attributes, typically to hold some kind of fixed-value data. An XML document can contain zero or more empty elements.
Example 5
<employees>
  <engineer id="0001" name="todd"></engineer>
  <!--OR-->
  <engineer id="0001" name="todd" />
  <department />
  <manager></manager>
</employees>
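The two ways of writing an empty element carry the same information. As a small sketch, assuming Python's xml.etree.ElementTree module and a made-up helper named summary, both forms of the ‘engineer’ element can be parsed and compared:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<engineer id="0001" name="todd"></engineer>')
short_form = ET.fromstring('<engineer id="0001" name="todd" />')

def summary(elem):
    """Collect what the parser knows about an element."""
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))

same = summary(long_form) == summary(short_form)
print(summary(long_form))
print("identical after parsing:", same)
```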
7) Nesting Elements:
Elements in an XML document must be properly nested. If a child's start tag is inside the content of its parent element, then the child's end tag must also be inside that parent's content. Elements that are not properly nested result in an error. An XML document can contain many elements representing blocks of data; nesting them correctly avoids errors, and indenting them properly helps the reader understand the structure of the document.
Example 6 - A well-formed document with elements nested properly
<book>
  <chapter>
    <topic>xml</topic>
  </chapter>
  <chapter>
    <topic1>xsl</topic1>
    <topic2>xsd</topic2>
  </chapter>
  <chapter>
    <topic>rss</topic>
  </chapter>
</book>
Example 6A – Elements not properly nested, generates error
<book>
  <chapter>
    <topic1> xsl <topic2> </topic1> xsd </topic2>
  </chapter>
</book>
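A parser can also report where a nesting error occurs, which helps when a document is long. The sketch below assumes Python's xml.etree.ElementTree module, whose ParseError exposes a (line, column) position; the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The improperly nested document, one element per line so that
# the reported line number is meaningful.
BAD = """<book>
  <chapter>
    <topic1> xsl <topic2> </topic1> xsd </topic2>
  </chapter>
</book>"""

try:
    ET.fromstring(BAD)
    error_line = None
except ET.ParseError as err:
    # err.position is a (line, column) tuple locating the error
    error_line, error_column = err.position
    print(f"nesting error on line {error_line}: {err}")
```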
8) Comments in XML:
Comments can be placed almost anywhere in an XML document: before the start tag of the root element, inside the root element, or inside child elements, sub-child elements and so on. A comment cannot appear inside a tag or before the XML declaration. Care must be taken not to use “--” (a double hyphen) within the content of a comment, as this results in an error.
Example 7
<!--This is an xml document representing data of the employees -->
<employees>
  <engineer id="0001" name="todd"></engineer>
  <!--OR-->
  <engineer id="0001" name="todd" />
  <department />
  <!-- The manager has not yet been appointed -->
  <manager></manager>
</employees>
<!--End - - of -- Xml Document --> => not valid
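The double-hyphen restriction can be confirmed with a parser. This is a minimal sketch, assuming Python's xml.etree.ElementTree module; the helper name is_well_formed and the three small documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

plain_comment = "<employees><!-- The manager has not yet been appointed --><manager/></employees>"
spaced_hyphens = "<employees><!-- End - - of document --><manager/></employees>"
double_hyphen = "<employees><!-- End -- of document --><manager/></employees>"

comment_results = {
    "plain comment": is_well_formed(plain_comment),
    "hyphens separated by a space": is_well_formed(spaced_hyphens),
    "double hyphen inside comment": is_well_formed(double_hyphen),
}
for label, ok in comment_results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```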
9) XML Browsers:
Besides supporting XML itself, an XML browser must support a style language such as CSS or XSL and a scripting language such as VBScript or JavaScript. Only a few Internet browsers provide full support for XML.
A few of them are listed below:
1. Internet Explorer 6: Internet Explorer 6 (IE6) is the most powerful Internet browser supporting XML. IE6 can display XML documents, supports scripting languages such as JavaScript and VBScript, and provides full support for style sheet languages.
2. Netscape Navigator 6: Netscape also supports XML to a good extent and has good style sheet support.
3. Mozilla: Supports XML, XSLT and CSS.
4. Firefox: Has full XML support along with XSLT and CSS.
5. Opera: Supports XML and CSS on MS Windows and Linux.
10) Escape Characters:
The elements and data in an XML file are parsed by an XML parser before being displayed in an Internet browser. However, certain characters are illegal in the content of a document: the parser does not accept them and generates an error.
The characters “<” and “&” are treated as illegal in XML content. See the following example:
<conditions>
  <check>check for conditions one & two</check>
  <one>age < 20</one>
  <two>age < 30</two>
</conditions>
In the above example, an error is generated for the content of the <check> tag because of the illegal character “&”, and for the <one> and <two> tags because of the illegal character “<”. Replacing “&” with “&amp;” and “<” with “&lt;” eliminates the error.
Text that would otherwise need many escape characters can be written within a CDATA section:
<function>
<![CDATA[
function addition()
{
  if(a<b)
  {
    c=a+b
  }
}
]]>
</function>
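Escaping does not have to be done by hand. The following sketch assumes Python's standard library: xml.sax.saxutils.escape replaces the special characters with entity references, and xml.etree.ElementTree turns them back into the original text on parsing. The sample strings and variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "check for conditions one & two, age < 20"
escaped = escape(raw)  # replaces &, < and > with entity references
parsed = ET.fromstring(f"<check>{escaped}</check>").text

print(escaped)
print(parsed)

# Inside a CDATA section the same characters need no escaping.
cdata_doc = "<function><![CDATA[ if (a<b) { c=a+b } ]]></function>"
cdata_text = ET.fromstring(cdata_doc).text
print(cdata_text)
```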
11) XML Namespaces:
Namespaces in xml are used to group related information and to avoid element name conflict. They define a mechanism to uniquely name elements and attributes so that different terminologies can be mixed in an xml document without name conflicts.
Syntax:
xmlns:namespace-prefix="namespaceURI"
Where,
xmlns is a special attribute placed in the start tag of an element.
namespace-prefix is a short name that stands for the namespace URI.
namespaceURI is a Uniform Resource Identifier that uniquely identifies the namespace.
Namespace URIs must be unique. Elements belonging to the same namespace must carry the same prefix. A namespace is declared in the start tag of the parent element, and all child elements with the same prefix are associated with that namespace.
<n:contact xmlns:n="www.itechconsulting.co.in/training">
  <n:name>todd</n:name>
  <n:age>27</n:age>
  <n:address>texas</n:address>
  <n:phone>7777777777</n:phone>
</n:contact>
In the above example, element names with prefix are qualified names.
Default Namespace:
<contact xmlns="http://www.itechconsulting.co.in/training">
  <name>todd</name>
  <address>texas</address>
  <age>27</age>
</contact>
In the above example, the element names are not prefixed; hence they come under the default namespace mentioned. Here, element names without prefix are called local names.
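To a namespace-aware parser, a prefixed element and an unprefixed element in a default namespace are the same element as long as the namespace URI is the same. The sketch below assumes Python's xml.etree.ElementTree module, which expands every element name to the form {namespaceURI}localname. For the comparison, both documents are given the same URI, which is an assumption of this illustration; the lookup prefix "t" is arbitrary.

```python
import xml.etree.ElementTree as ET

NS = "http://www.itechconsulting.co.in/training"

prefixed = ET.fromstring(
    f'<n:contact xmlns:n="{NS}"><n:name>todd</n:name></n:contact>'
)
default = ET.fromstring(
    f'<contact xmlns="{NS}"><name>todd</name></contact>'
)

print(prefixed.tag)  # expanded name: {namespaceURI}localname
print(default.tag)

# A lookup needs the URI; the prefix used in the search is arbitrary.
name_text = default.find("t:name", {"t": NS}).text
print(name_text)
```

The prefix is only shorthand within the document; what identifies the element is the pair of namespace URI and local name.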
Scenario:
Suppose that your XML document needs to refer to an element of the same name that is defined in two separate DTDs with different terminologies. Namespaces let you identify which terminology a given occurrence of that element refers to.
For example:
one.dtd:
<!ELEMENT state (#PCDATA) >
<!ATTLIST state type (solid | gas) #REQUIRED >
two.dtd:
<!ELEMENT state (#PCDATA) >
<!ATTLIST state type (Maharashtra | Gujarat | MP) #REQUIRED >
Using namespaces in your xml document will help you differentiate which state element you refer to at all times.
three.dtd:
<!ENTITY % onedtd SYSTEM "one.dtd" >
%onedtd;
<!ENTITY % twodtd SYSTEM "two.dtd" >
%twodtd;
xml document:
<?xml version="1.0"?>
<!DOCTYPE geography SYSTEM "three.dtd">
<geography>
  <state>Maharashtra</state>
  <particle>
    <state>gas</state>
  </particle>
</geography>
In the above example, the two ‘state’ elements are ambiguous. To resolve this conflict, use namespaces as below:
one.dtd:
<!ELEMENT state (#PCDATA) >
<!ATTLIST state type (solid | gas) #REQUIRED>
<!ATTLIST particle xmlns CDATA #FIXED "www.itechconsulting.co.in/particle">
two.dtd:
<!ELEMENT state (#PCDATA) >
<!ATTLIST state type (Maharashtra | Gujarat | MP) #REQUIRED>
<!ATTLIST geogstate xmlns CDATA #FIXED "www.itechconsulting.co.in/geogstate">
three.dtd:
<!ENTITY % onedtd SYSTEM "one.dtd" >
%onedtd;
<!ENTITY % twodtd SYSTEM "two.dtd" >
%twodtd;
xml document:
<?xml version="1.0"?>
<!DOCTYPE geography SYSTEM "three.dtd">
<geography>
  <geogstate:state xmlns:geogstate="www.itechconsulting.co.in/geogstate"> Maharashtra </geogstate:state>
  <particles>
    <particle:state xmlns:particle="www.itechconsulting.co.in/particle"> gas </particle:state>
  </particles>
</geography>
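Once the two ‘state’ elements are in different namespaces, a program can select each one without ambiguity. The sketch below assumes Python's xml.etree.ElementTree module and leaves out the DOCTYPE line, since this parser does not load external DTDs; the lookup prefixes "g" and "p" are arbitrary names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<geography>
  <geogstate:state xmlns:geogstate="www.itechconsulting.co.in/geogstate">Maharashtra</geogstate:state>
  <particles>
    <particle:state xmlns:particle="www.itechconsulting.co.in/particle">gas</particle:state>
  </particles>
</geography>"""

# Map arbitrary lookup prefixes to the namespace URIs used in the document.
ns = {
    "g": "www.itechconsulting.co.in/geogstate",
    "p": "www.itechconsulting.co.in/particle",
}

root = ET.fromstring(DOC)
geographic_states = [e.text for e in root.iterfind(".//g:state", ns)]
particle_states = [e.text for e in root.iterfind(".//p:state", ns)]

print(geographic_states)
print(particle_states)
```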
12) XML Data-binding:
XML data-binding draws XML data from an XML data island and presents it in the browser. It is the process of binding a web page to an XML data island (XML data). Data islands are a feature of Microsoft Internet Explorer. A data island can be internal or external. An internal data island is part of the HTML web page. An external data island separates the data from its presentation and thereby eases maintenance. This process does not involve any scripting and is unlike the transformation process.
To create a data island in an HTML document, the HTML element ‘xml’ is used, which embeds XML in HTML. This is achieved as follows:
I) Implementing External Data Island:
1. External xml document: contacts.xml – sample xml document.
<?xml version="1.0"?>
<contact>
  <name>todd</name>
  <age>27</age>
  <sex>male</sex>
  <_address>texas</_address>
  <phone>7777777777</phone>
</contact>
2. Create an external data island in the HTML page using the ‘xml’ HTML tag:
<xml id="contacts" src="contacts.xml"></xml>
The xml tag creates an XML data island in HTML. The ‘id’ attribute names the data island (“contacts” in our example). The ‘src’ attribute specifies the location of the XML document file.
3. Embedding XML into HTML: contacts.html
The HTML page below demonstrates XML data binding. Because contacts.xml and contacts.html are assumed to reside in the same directory/folder, the ‘src’ attribute of the xml tag gives only the file name, without a path. The XML data is presented in an HTML table: the datasrc (data source) attribute of the table points to the data island, and the datafld attribute of each span names the element in the XML document whose value is displayed.
<html>
<body>
<xml id="contacts" src="contacts.xml"></xml>
<font face="verdana" size="2">
<table datasrc="#contacts" border="1">
  <th>Name</th>
  <th>Age</th>
  <th>Sex</th>
  <th>Address</th>
  <th>Contact No.</th>
  <tr>
    <td><span datafld="name"></span></td>
    <td><span datafld="age"></span></td>
    <td><span datafld="sex"></span></td>
    <td><span datafld="_address"></span></td>
    <td><span datafld="phone"></span></td>
  </tr>
</table>
</font>
</body>
</html>
II) Implementing Internal Data Island:
<html>
<body>
<xml id="contacts">
<?xml version="1.0"?>
<contacts>
  <name>todd</name>
  <age>27</age>
  <sex>male</sex>
  <_address>texas</_address>
  <phone>7777777777</phone>
</contacts>
</xml>
<font face="verdana" size="2">
<table datasrc="#contacts" border="1">
  <th>Name</th>
  <th>Age</th>
  <th>Sex</th>
  <th>Address</th>
  <th>Contact No.</th>
  <tr>
    <td><span datafld="name"></span></td>
    <td><span datafld="age"></span></td>
    <td><span datafld="sex"></span></td>
    <td><span datafld="_address"></span></td>
    <td><span datafld="phone"></span></td>
  </tr>
</table>
</font>
</body>
</html>