This article assumes that you have at least a basic working knowledge of HTML.
By now, you must have a fair idea of what XHTML is and why it should be adopted. If not, read Paul Watson’s article on XHTML, from an HTML starting point.
Most pages on the web are currently authored in HTML, and not very good HTML at that. So, how do you go about transforming poorly written HTML into XHTML?
This was the question I faced while redesigning a federal government website. The website was huge (40,000 pages), and I was working on the top 30 page templates. This is what I learnt from this challenging and exciting project:
The DOCTYPE is an SGML-derived declaration that identifies the version of the markup language the document uses. It should be the first line of your XHTML document unless you begin with an XML declaration that specifies the character encoding, such as <?xml version="1.0" encoding="UTF-8"?>. In that case, the DOCTYPE should immediately follow the XML declaration. If no encoding is declared, XML processors assume UTF-8 or UTF-16.
XHTML 1.0 defines three Document Type Definitions (DTDs): Strict, Transitional and Frameset.
The Strict DTD should be referenced when the markup contains no deprecated presentational elements or attributes (such as font or center) and does not use frames.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The Transitional DTD is the best choice when moving from HTML to XHTML, because it still permits the deprecated presentational markup that most existing HTML relies on.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The Frameset DTD is referenced when the page layout uses frames.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
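When you are converting many templates, generating the prolog can be less error-prone than typing it by hand. What follows is a minimal Python sketch and is not part of any standard tool. The helper name `xhtml_prolog` and its parameters are hypothetical. It assembles the optional XML declaration and the DOCTYPE for the chosen DTD, in the order described above.

```python
from typing import Optional

# Public and system identifiers of the three XHTML 1.0 DTDs.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}


def xhtml_prolog(dtd: str = "transitional", encoding: Optional[str] = None) -> str:
    """Return the lines that precede <html> (hypothetical helper).

    If an encoding is given, the XML declaration comes first and the
    DOCTYPE follows it immediately; otherwise the DOCTYPE is the first line.
    """
    public_id, system_id = DOCTYPES[dtd]
    lines = []
    if encoding is not None:
        lines.append(f'<?xml version="1.0" encoding="{encoding}"?>')
    lines.append(f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">')
    return "\n".join(lines)


print(xhtml_prolog("transitional", encoding="UTF-8"))
```

Leaving `encoding` out yields the DOCTYPE alone, which then becomes the first line of the document.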
For the document to be valid XHTML (more on that later), the root html element must declare the XHTML namespace so that XML tools recognize its elements as XHTML.
Change your opening html (<html>) tag to look like this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
There is no change to the closing html (</html>) tag.
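To see what the namespace declaration changes, you can parse the markup with an XML parser. This Python sketch uses the standard library's `xml.etree.ElementTree`, and the helper name `declares_xhtml_namespace` is hypothetical. ElementTree reports a namespaced element as `{namespace-URI}local-name`, so the root is recognized as XHTML's html element only when the `xmlns` attribute is present.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def declares_xhtml_namespace(markup: str) -> bool:
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(markup)
    # Namespaced tags come back in "{uri}local" form.
    return root.tag == "{%s}html" % XHTML_NS


with_ns = ('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
           '<head><title>Test</title></head><body></body></html>')
without_ns = '<html><head><title>Test</title></head><body></body></html>'

print(declares_xhtml_namespace(with_ns))     # True
print(declares_xhtml_namespace(without_ns))  # False
```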
You may be used to writing your HTML tags in uppercase to distinguish them from your content, but XHTML requires element and attribute names to be written in lowercase. XHTML is case-sensitive, so <HTML> is not considered the same as <html>, which is the correct way of writing it.
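An XML parser makes the case sensitivity concrete. This sketch uses Python's standard `xml.etree.ElementTree`. The helper `is_well_formed` is hypothetical and only reports whether the parser accepts the markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XML names are case-sensitive, so <HTML> cannot be closed by </html>.
print(is_well_formed("<html><body></body></html>"))  # True
print(is_well_formed("<HTML><body></body></html>"))  # False
```

A parser only catches mismatched case. A document written consistently in uppercase, such as `<HTML><BODY></BODY></HTML>`, is well-formed XML but still not valid XHTML, and only a validator against the DTD will flag it.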
We know that in HTML, some tags must be closed while others, like the <img> tag, need not be. In XHTML, every element must be closed. Empty elements such as <br> should now be written as <br />, with a space between the tag name and the forward slash and no space between the slash and the closing angle bracket. The space preserves backward compatibility with older browsers. Elements that contain data, such as <p>, are closed with the matching </p> end tag. The img element is also an empty element and is closed as below:
<img src="myimage.jpg" alt="The author with the Pope" />
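The closing rules can be checked mechanically, because an XML parser refuses any element that is left open. This is a small Python sketch using the standard `xml.etree.ElementTree`. The helper `parses` is hypothetical, and each fragment is wrapped in a `<div>` so that it has the single root element XML requires.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """True if the fragment is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parses("<div><p>Line one<br>Line two</p></div>"))    # False: <br> is never closed
print(parses("<div><p>Line one<br />Line two</p></div>"))  # True
print(parses("<div><p>First<p>Second</p></div>"))          # False: first <p> is never closed
print(parses("<div><p>First</p><p>Second</p></div>"))      # True
```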
Tags must also be properly nested: the innermost tag is closed first, followed by the one opened before it, and so on.
<p> <b> This is wrong </p> </b>
It should be correctly written as:
<p> <b> This is right </b> </p>
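An XML parser also reports where the nesting breaks, which is handy on long templates. This Python sketch relies on the `position` attribute (a line, column pair) of `xml.etree.ElementTree.ParseError`. The helper name `first_error` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_error(markup: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


print(first_error("<p> <b> This is right </b> </p>"))  # None
print(first_error("<p> <b> This is wrong </p> </b>"))  # position near the stray </p>
```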
In XHTML, all attribute values must be enclosed in quotes, so the following is wrong:
<img src=myimage.jpg />
Also, every image on the page should carry an alt attribute. For informative images, it is good practice to give it meaningful, descriptive text, while spacers and purely decorative images can use an empty value (alt=""). The correct way to write the example above is:
<img src="myimage.jpg" alt="The author with the Pope" />
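Both rules can be checked in one pass over a template. This is a minimal Python sketch using the standard `xml.etree.ElementTree`, and the function name `images_missing_alt` is hypothetical. Unquoted attribute values make the markup unparseable, so the function returns `None` in that case. Otherwise it lists the `src` of every image without an alt attribute. An empty `alt=""` on a decorative image counts as present.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def images_missing_alt(markup: str) -> Optional[List[str]]:
    """Return the src of every <img> lacking an alt attribute,
    or None if the markup is not well-formed (e.g. an unquoted value)."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return [img.get("src", "") for img in root.iter("img") if img.get("alt") is None]


print(images_missing_alt('<p><img src=myimage.jpg /></p>'))    # None: unquoted value
print(images_missing_alt('<p><img src="myimage.jpg" /></p>'))  # ['myimage.jpg']
print(images_missing_alt('<p><img src="myimage.jpg" alt="The author with the Pope" />'
                         '<img src="spacer.gif" alt="" /></p>'))  # []
```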
XHTML 1.0 deprecates the name attribute in favour of the id attribute. For backward compatibility, however, it is best to keep the name attribute and add an id attribute wherever name is used, that is, on the a, applet, form, frame, iframe, img, and map elements. The id attribute is written just like name, and the two should carry identical values, for example name="pope" id="pope".
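On a large site, adding the id attributes by hand is tedious. This is a rough Python sketch of automating it with the standard `xml.etree.ElementTree`. The function `add_ids` and the `NAME_ELEMENTS` set are hypothetical. The set follows the XHTML 1.0 list of elements whose name attribute is deprecated (a, applet, form, frame, iframe, img and map). The sketch assumes an un-namespaced, well-formed fragment. ElementTree's serializer also drops any DOCTYPE and comments, so treat this as a starting point rather than a migration tool.

```python
import xml.etree.ElementTree as ET

# Elements on which XHTML 1.0 deprecates name in favour of id.
NAME_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}


def add_ids(markup: str) -> str:
    """Copy each name value into a matching id (hypothetical helper).

    name is kept for older browsers; elements that already have an id
    are left alone.
    """
    root = ET.fromstring(markup)
    for el in root.iter():
        name = el.get("name")
        if el.tag in NAME_ELEMENTS and name is not None and el.get("id") is None:
            el.set("id", name)
    return ET.tostring(root, encoding="unicode")


print(add_ids('<div><a name="top">Top</a>'
              '<img src="myimage.jpg" alt="The author with the Pope" name="pope" /></div>'))
```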
Now that you are done, how do you know that you got it all right? That's where validation comes in (remember I said I would talk about it later?). Now that you have an XHTML document, it is time to check that it really follows all the rules listed above. One way to do this is to go over the entire document and read every line of code, but that has two drawbacks: it is boring, and it doesn't guarantee anything.
A better option is to use a validator, such as the one the W3C provides at http://validator.w3.org. You can upload a file from your local machine or enter the URL of the page to be validated.
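Another quick check is to save the file as an XML file (filename.xml) and open it in your browser. If it displays without an error, the document is at least well-formed XML. If not, the browser will usually point you to the line where the problem lies. Well-formed is not quite the same as valid, because the browser does not check the document against the DTD, so use the validator for the final check. There you go: it's so simple, you have no excuse to put it off anymore.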
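The "open it as XML" check can also be scripted, which helps when there are dozens of templates to go through. This is a minimal Python sketch using the standard `xml.etree.ElementTree`. The function names and the script name in the usage comment are hypothetical. Like the browser check, it tests well-formedness only. ElementTree also does not load the DTD, so named entities such as `&nbsp;` are reported as errors even though they are legal XHTML. Use a numeric reference such as `&#160;` or rely on the W3C validator for those.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Union


def check_markup(markup: Union[str, bytes], label: str = "<string>") -> str:
    """Report whether markup is well-formed XML, with the first error's position."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return f"{label}: not well-formed at line {line}, column {column}"
    return f"{label}: well-formed"


def check_file(path: str) -> str:
    # Read raw bytes so the parser can honour any encoding in the XML declaration.
    with open(path, "rb") as f:
        return check_markup(f.read(), path)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_xhtml.py page1.html page2.html
    for path in sys.argv[1:]:
        print(check_file(path))
```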