I Learned this Summer: XHTML

January 01, 2002

As summer begins, so does our next major topic: XHTML. This latest member of the standard HTML family tree offers some features and a number of nits and bothersome details. As we'll see in the coming weeks, you already know most of XHTML, and absorbing the rest will be easy.


To understand XHTML, you must first know a bit about XML, the eXtensible Markup Language.

As we all know by now, HTML was cobbled together over the years with a less-than-rigorous eye towards standards. As it matured, HTML was firmed up and standardized, so that the latest 4.01 standard is fairly consistent and well-defined using SGML, the Standard Generalized Markup Language. SGML is very rich, very powerful, and very difficult for mere mortals to use. As a result, it is not a good tool for most people to create new markup languages or to define extensions to HTML.

HTML has its shortcomings, and lots of people would like to extend HTML to handle new kinds of markup, like chemical formulae or musical notation. With SGML so difficult to use, the W3C has created a subset of SGML known as XML, the Extensible Markup Language. XML keeps the best features of SGML, drops the hard and confusing stuff, and is intended to be the standard language for describing new kinds of HTML-like markup languages.

XML has a special syntax that is used to define the elements of a markup language: the beginning and end tags, the attributes, and the correct way to arrange these elements. You place these rules in a Document Type Definition (DTD). You can then use the elements defined in a DTD to create a document, and a browser (or "processing agent," as the W3C likes to say) can use the DTD to figure out how to parse and process your document.
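As a rough illustration of the kind of rules a DTD holds, the sketch below embeds a tiny DTD in a document and reads it with Python's standard library. The `note` vocabulary, its element names, and the `priority` attribute are invented for this example and do not come from any real standard. Keep in mind that `xml.etree.ElementTree` only checks that the document is well-formed; it reads past the DTD without validating against it, so enforcing the rules would take a validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical "note" markup language. The internal DTD declares
# which elements exist, how they nest, and which attributes are allowed.
NOTE_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA #IMPLIED>
]>
<note priority="high">
  <to>Reader</to>
  <body>See you next week.</body>
</note>"""


def summarize(document):
    """Parse the document and report its root element, attributes, and children."""
    root = ET.fromstring(document)
    return {
        "element": root.tag,
        "attributes": dict(root.attrib),
        "children": [child.tag for child in root],
    }


if __name__ == "__main__":
    print(summarize(NOTE_DOCUMENT))
```

The DTD here says that a `note` must contain a `to` followed by a `body`, and may carry a `priority` attribute; the document beneath it is one instance that follows those rules.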


While you can use XML to create almost any kind of markup language, the first important task using XML at the W3C was to rewrite the HTML 4.01 standard using XML. The resulting markup language is known as XHTML 1.0, the Extensible Hypertext Markup Language. XHTML 1.0 is very similar to HTML, with a few notable differences that we'll cover in the coming weeks.
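One practical consequence of XHTML being defined with XML is that an XHTML page can be read by an ordinary XML parser. The sketch below is a minimal check along those lines, using Python's standard library. The DOCTYPE and namespace strings are taken from the XHTML 1.0 specification rather than from this column, and the page content is made up for the example. The parser tests well-formedness only; it does not fetch the DTD or validate against it.

```python
import xml.etree.ElementTree as ET

# A small XHTML 1.0 page: every element is closed, including <br />.
XHTML_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body><p>First line<br />second line</p></body>
</html>"""

# The same page written in looser HTML style: <br> is never closed.
HTML_STYLE_PAGE = XHTML_PAGE.replace("<br />", "<br>")


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("XHTML page:", is_well_formed(XHTML_PAGE))
    print("HTML-style page:", is_well_formed(HTML_STYLE_PAGE))
```

The unclosed `<br>` is perfectly acceptable HTML 4.01, but an XML parser rejects it, which gives a small taste of the kind of difference between the two languages.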

Is XHTML important? Yes. Clearly, there are billions of existing Web pages that do not, and never will, conform to the XHTML 1.0 standard. Most of these pages don't even conform to the HTML 4.01 standard! To be honest, no one has the time or inclination to convert all these pages to XHTML, especially if the end user will never be able to tell the difference.

In spite of all those legacy HTML pages, there are many more pages yet to be created. There is no reason why those pages cannot be created using XHTML, especially if they are built using authoring tools that emit XHTML automatically. While explicit browser support for XHTML is nonexistent at this point, that will change over time, making XHTML a more attractive Web authoring language. What's more, XHTML syntax is so similar to HTML that existing browsers have little difficulty reading XHTML documents. And what they don't understand, they ignore, which allows you to put XHTML 1.0 to use today.
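To get a feel for why HTML software copes with XHTML, the sketch below feeds both styles of line break to a lenient HTML parser and compares what it sees. Python's `html.parser` stands in for a browser here, which is an assumption of this example: real browsers have their own parsers, and this shows only the general idea. The `TagCollector` class and `start_tags` function are names invented for the illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    """Return the start tags found in the markup, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(start_tags("<p>one<br>two</p>"))    # HTML-style line break
    print(start_tags("<p>one<br />two</p>"))  # XHTML-style line break
```

Both inputs yield the same list of tags, because the HTML parser treats the XHTML-style `<br />` as an ordinary `br` element.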

The bottom line is that XHTML is here to stay. All future versions of HTML will be built upon the XHTML 1.0 standard, making HTML 4.01 the end of the line for the original language of the Web. A good Web author will learn XHTML now, understand the differences between XHTML and HTML, and design pages that are compatible with both standards whenever possible. Your easiest path to all this new knowledge is right in this column for most of the summer. Get ready to start next week, as we cover the basic syntax of every XHTML document.

Chuck is the author of the bestselling HTML: The Definitive Guide. He also writes on a variety of Internet and Web-related topics for a number of online magazines.

Previously in Tag of the Week:

Controlling Element Properties - the "box model" for managing space around elements.

Using the Color Property - using the "color property" to engage your visitors.

Condensing Background Properties - how to use the single background property.

Custom Bullets - Get ready, aim, and fire off a few rounds of custom bullets.
