XHTML, from an HTML starting point

29 Jan 2002
A starting point for people familiar with HTML who want to start using XHTML

Summary

This article will help people familiar with HTML to start producing XHTML 1.0 compliant documents. The article's approach is very simple, as is XHTML. However, it is this very simplicity that baffles some web developers and designers.

This article is not an in-depth look into XML or why XHTML has replaced HTML.

Requirements

A basic understanding of HTML and CSS is recommended for this article. No JavaScript, ASP or XML knowledge is required, though XML knowledge will help in understanding the XHTML approach and the reasons for it.

Why change to XHTML?

In this article I won't be going into the whys of using XHTML or the benefits involved. That will be a topic for a later article. However if you want some good reasons to use XHTML then check these links out:

Won't XHTML break my sites in visitors' browsers?

No, put simply. XHTML is very backwards compatible and a page coded using XHTML 1.0 Transitional will work in all browsers that support HTML 4.01. The W3C have done a very good job of moving web documents closer to XML but without breaking compatibility or sending more web developers over the proverbial cliff.

HTML vs. XHTML Examples

So you want to get started, either creating new XHTML compliant documents or converting your current HTML 4.01 documents into XHTML 1.0 documents. Let's start with some actual HTML vs. XHTML, and then move on to the differences in point form.

An HTML document

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title>HTML to XHTML Example: HTML page</title>
        <link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen">
        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    </head>
    <body>
        <p>This is the HTML page. It works and is encoded just like any HTML page you    
         have previously done. View <a href="htmltoxhtml2.htm">the XHTML version</a> of 
         this page to view the difference between HTML and XHTML.</p>
        <p>You will be glad to know that no changes need to be made to any of your CSS files.</p>
        <hr>
        <h1>Standards</h1>
        <p>Standards are important for, and this is only one reason, the simple fact that with a 
         standardised web you will only have to code your site once and it will work on all 
         browsers, on all platforms and on all devices.</p>
        <p>Following are some useful web standards links.</p>
        <h2>Useful Links</h2>
        <table cellpadding="0" cellspacing="0">
            <tr class="tblheader">
                <td>Name</td>
                <td>Link</td>
            </tr>
            <tr>
                <td class="tbldata">Web Standards Project, WASP</td>
                <td class="tbldata"><a href="http://www.webstandards.org">webstandards.org</a></td>
            </tr>
            <tr>
                <td class="tbldata">The W3C</td>
                <td class="tbldata"><a href="http://www.w3c.org">w3c.org</a></td>
            </tr>
            <tr>
                <td class="tbldata">XHTML, HTML Validator</td>
                <td class="tbldata"><a 
                 href="http://validator.w3.org/">validator.w3.org/</a></td>
            </tr>
            <tr>
                <td class="tbldata">New York Public Library Style Guide</td>
                <td class="tbldata"><a 
                 href="http://www.nypl.org/styleguide/">nypl.org/styleguide/</a></td>
            </tr>
            <tr>
                <td class="tbldata">Standards Evangelist, Paul Watson</td>
                <td class="tbldata"><a 
                 href="mailto:paulmwatson@email.com">paulmwatson@email.com</a></td>
            </tr>
        </table>
        <hr>
        <p>
            <a href="http://validator.w3.org/check/referer"><img border="0" 
             src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" 
             height="31" width="88"></a>
        </p>
    </body>
</html>
This is a well formed and valid HTML 4.01 Transitional document. You can validate it against the W3C HTML Validator Service.

An XHTML document

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>HTML to XHTML Example: XHTML page</title>
        <link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen" />
        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    </head>
    <body>
        <p>This is the XHTML page. As you can see the result between the two pages 
         is identical, even though one is in HTML 4.01 and the other is in XHTML 1.0. View 
         <a href="htmltoxhtml.htm">the HTML version</a> of this page to view the difference 
         between HTML and XHTML.</p>
        <hr />
        <h1>Standards</h1>
        <p>Standards are important for, and this is only one reason, the simple fact that 
         with a standardised web you will only have to code your site once and it will work 
         on all browsers, on all platforms and on all devices.</p>
        <h2>Useful Links</h2>
        <p>Following are some useful web standards links.</p>
        <table cellpadding="0" cellspacing="0">
            <tr class="tblheader">
                <td>Name</td>
                <td>Link</td>
            </tr>
            <tr>
                <td class="tbldata">Web Standards Project, WASP</td>
                <td class="tbldata"><a 
                  href="http://www.webstandards.org">webstandards.org</a></td>
            </tr>
            <tr>
                <td class="tbldata">The W3C</td>
                <td class="tbldata"><a href="http://www.w3c.org">w3c.org</a></td>
            </tr>
            <tr>
                <td class="tbldata">XHTML, HTML Validator</td>
                <td class="tbldata"><a 
                  href="http://validator.w3.org/">validator.w3.org/</a></td>
            </tr>
            <tr>
                <td class="tbldata">New York Public Library Style Guide</td>
                <td class="tbldata"><a href="http://www.nypl.org/styleguide/">nypl.org/styleguide/</a></td>
            </tr>
            <tr>
                <td class="tbldata">Standards Evangelist, Paul Watson</td>
                <td class="tbldata"><a 
                  href="mailto:paulmwatson@email.com">paulmwatson@email.com</a></td>
            </tr>
        </table>
        <hr />
        <p>
            <a href="http://validator.w3.org/check/referer"><img border="0" 
             src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" 
             height="31" width="88" /></a>
        </p>
    </body>
</html>    
This is a well formed and valid XHTML 1.0 Transitional document. You can validate it against the W3C HTML Validator Service.

The Differences

Frankly, the difference between HTML 4.01 and XHTML 1.0 is almost laughable. Don't think you are missing something important just because it is so easy. You aren't; it really is that easy. I will list the differences and then explain each one in detail:

  • DOCTYPE reference has changed
  • xmlns reference in the HTML tag
  • All tags in lowercase
  • Valid structure
  • Attribute quotes are mandatory
  • "Empty" tags must be closed now
That is it, nothing very earth-shattering at all. Let's get into the details.

DOCTYPE

Naturally from HTML 3 to HTML 4.01 your DOCTYPE changed. Similarly from HTML 4.01 to XHTML 1.0 your DOCTYPE must change.

What is a DOCTYPE? It is a declaration at the top of your document that states which standard or specification the document conforms to, e.g. XHTML 1.0 or HTML 4.01. The web browser, or a validator, can then take advantage of this knowledge. It is becoming very important for you to use a DOCTYPE declaration, and in fact it is mandatory for XHTML 1.0. If you leave it out, your document is not valid XHTML 1.0, and browsers that choose a rendering mode from the DOCTYPE will fall back to their older, less standards-compliant behaviour.

If you are writing ASP pages then put the DOCTYPE just under the <%@ Language=VBScript %> declaration. Essentially the client's web browser must see the DOCTYPE at the very top of the web document, before the html tag.

An HTML 4.01 DOCTYPE looks like this: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

So for your XHTML documents simply put <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> at the top of your page.
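
Putting that together, the top of an XHTML 1.0 Transitional page might look like the sketch below. The title and paragraph text are placeholders, and the system identifier is written as the full W3C URL, on the assumption that a relative path to the DTD will not resolve on your own server.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <!-- Placeholder title; a title element is required in the head -->
        <title>Placeholder page</title>
    </head>
    <body>
        <p>Nothing but whitespace comes before the DOCTYPE.</p>
    </body>
</html>
```

In an ASP page the only line above this would be the <%@ Language=VBScript %> directive, which the server processes and does not send to the browser.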

xmlns

The xmlns, or XML namespace, declaration tells the browser that the tags in the document belong to the XHTML vocabulary defined by the W3C. This declaration is carried over from XML and has no equivalent in HTML 4.01. People familiar with VML will recognise this usage.

You should place xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" in the html tag, like so:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

All tags in lowercase

Since XHTML is an application of XML, it is case sensitive. This means that <STRONG> is not the same thing as <strong>.

What this all means to you is that from now on you should put all tag and attribute names in lowercase, not a mix or just uppercase.

*On this topic: As with the English language there are exceptions to every rule. In this case ensure that your DOCTYPE declaration has DOCTYPE in uppercase. If you don't, then it is not valid and the browser or validator won't pick the declaration up. I found this out the hard way :)
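
To make the rule concrete, here is a small sketch of the same fragment written the old way and the XHTML way. The class name, file name and text are made up for illustration.

```html
<!-- HTML 4.01 habit: uppercase and mixed-case names are tolerated -->
<P Class="intro">Read the <A HREF="standards.htm">standards page</A>.</P>

<!-- XHTML 1.0: element and attribute names in lowercase -->
<p class="intro">Read the <a href="standards.htm">standards page</a>.</p>
```

Only the names change case here; the attribute values are left exactly as they were.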

Valid structure

A lot of web developers create invalidly structured HTML; I know I used to. For instance this snippet:

<p><b>This is invalid</p></b>
is not valid because the paragraph tag is closed inside the bold tag, while the bold tag is opened inside the paragraph tag. However, HTML 4.01 browsers let you off without even a warning.

XHTML 1.0 however will crack down on this and your web document will not be valid. To be valid you should maintain a valid structure, like so:

<p><b>This is valid</b></p>
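
The same rule applies however deep the nesting goes. As a further sketch (the link and wording are only an example), compare an overlapping pair of tags with a properly nested pair:

```html
<!-- Overlapping: <em> opens inside <a> but closes after it -->
<p>See the <a href="http://validator.w3.org/"><em>validator</a></em> for details.</p>

<!-- Properly nested: the last tag opened is the first tag closed -->
<p>See the <a href="http://validator.w3.org/"><em>validator</em></a> for details.</p>
```

A handy way to check your own markup is to read the closing tags in order: they should mirror the opening tags in reverse.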

Mandatory attribute quotes

Attribute quotes are the quotes around the value of an attribute. For instance the src attribute of an image must have its value surrounded by quotes, like so: src="images/bob.gif"

Culprits like Microsoft Visual Interdev do not put quotes around attribute values, and web browsers allow this (though Netscape can sometimes get confused, as it is wont to do). An XML parser, however, will reject your document if you do not use quotes. Single quotes are also legal in XML, btw, but double quotes are the convention used throughout this article.

So for XHTML 1.0 never do <p style=font-weight: bold>Where are your quotes?</p> but rather do <p style="font-weight: bold">Ahhh, there they are!</p>
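
The rule covers every attribute, including numeric ones that many tools leave bare. A short sketch, reusing the tbldata class from the sample pages (the colspan value is just an illustration):

```html
<!-- Unquoted values: tolerated by HTML browsers, not allowed in XHTML -->
<td class=tbldata colspan=2>Name</td>

<!-- Quoted values: required in XHTML 1.0 -->
<td class="tbldata" colspan="2">Name</td>
```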

Close "empty" tags

An empty tag is a tag such as <img> or <br>. Essentially it is a tag without a closing tag.

Because XHTML is an application of XML, all tags must be closed, either with a separate closing tag, as in <p>closed</p>, or within the tag itself, as in <br />.

So for XHTML all you need to do is make sure you put a / before the closing bracket of any empty tags.

It must be noted that you should also put a space in between the / and the rest of the tag's attributes, like so: <img src="images/bob.gif" width="50" height="50" alt="Bob, cavorting" />. The reason for this is that Netscape will definitely fall over if you put the / in without a space.
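
As a quick reference, here are the empty tags used in this article's sample pages, first as HTML 4.01 and then as XHTML 1.0. This is a sketch of the common cases rather than a complete list of empty tags.

```html
<!-- HTML 4.01: empty tags left open -->
<br>
<hr>
<img src="images/bob.gif" width="50" height="50" alt="Bob, cavorting">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen">

<!-- XHTML 1.0: the same tags, closed with a space and a slash -->
<br />
<hr />
<img src="images/bob.gif" width="50" height="50" alt="Bob, cavorting" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="Stylesheet" href="htmltohxhtml.css" type="text/css" media="screen" />
```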

Wrapping Up

Yes, I am dead serious. That is all there is to it.

Remember to use a DOCTYPE, put in your xmlns, use lowercase for attributes and tags, always use valid structure, put attribute values in quotes and always close empty tags. Once you do that, you are well ahead of the curve and preparing your web documents for the promises of XML.

Please note that this article is based on the Transitional XHTML spec and not the Strict spec. The reason for this choice is that the Strict spec is nowhere near as backwards compatible as the Transitional spec.

So XHTML is really simple and really only involves a bit more dedication and concentration from web developers. If you want another article on the why of XHTML please write to me and I will do it.

I learnt XHTML through zeldman.com and the incredibly to-the-point New York Public Library Style Guide.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By
Web Developer, Caliber AI
South Africa
My name is Paul Watson and I have been a professional web-developer since 1997. I live and work remotely from Cape Town, South Africa.

I have many years of experience with HTML, CSS, JavaScript, PostgreSQL, and Ruby on Rails. I am capable in Python and Machine Learning too.

Currently I am the CTO of CaliberAI. Formerly I worked with Kinzen (CTO & co-founder), Storyful (CTO, acquired by News Corp), FeedHenry (co-founder, acquired by Red Hat), and ChangeX.

Now that you know a bit about me why not say hello.

Comments and Discussions

 
Question: How about javascript etc.?
Anonymous, 6-Mar-02 6:51

Answer: Re: How about javascript etc.?
Paul Watson, 13-Mar-02 6:00

Anonymous wrote:
Does that remain the same?

JavaScript is a separate standard and is not affected by XHTML. You just have to ensure that your script tags follow the XHTML rules.


Anonymous wrote:
And a note on the advantages would be nice, since if (almost) nothing changed, why bother with it?

If you read the links provided in the article there are some pretty obvious benefits to using XHTML. The bottom line is that by using XHTML you make your HTML pages XML compliant, which will in the future aid moving to a pure XML/XSL web.

regards,
Paul Watson