
XHTML: Our Last, Best Hope for Clean Code


What is XHTML?

• XHTML is a stricter, cleaner version of HTML.

• Why do we need cleaner code? Current browsers are bloated with code to handle sloppy or proprietary HTML. As more devices read web content (handhelds, set-top boxes, pagers), we'll need more compact browsers.

• XHTML element and attribute names must be lowercase.

• All tags, including empty elements, must be closed.
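
The closing-tag rule above can be checked mechanically, because markup that follows it is also well-formed XML. The snippet below is a minimal sketch that uses Python's standard XML parser; the helper name `is_well_formed` and both sample strings are illustrative assumptions, not part of the article.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Loose HTML: uppercase opening tag, unclosed <br>, unclosed <P>.
loose = "<div><P>First line<br>Second line</div>"

# The same content written to XHTML rules: lowercase names, every tag closed.
strict = "<div><p>First line<br />Second line</p></div>"

print(is_well_formed(loose))   # False
print(is_well_formed(strict))  # True
```

This only tests well-formedness. A consistently uppercase pair such as `<P>...</P>` would still parse as XML, so the lowercase rule needs validation against the XHTML DTD, which this parser does not perform.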

XML is a hot topic among web developers. Its promise of a standardized markup that separates display and layout code from syntax really hits home if you've ever experienced the frustration of parsing loose code.

But that's just one example of how a strict, standardized markup can make programming easier. As we watch the growing trend of portable web-enabled devices, we realize they will require compact code written in standardized markup. Getting to that point, however, isn't going to be easy.

Whether your site has 10 pages or 10,000, it's likely that the HTML code is a mix of standard HTML and browser-specific, proprietary markup. If you've been thinking about making the transition to XML, or even just standardizing your HTML code, here's the solution: XHTML (Extensible Hypertext Markup Language).

XML + HTML = XHTML (sort of)

Let's take a quick look at how these markup languages fit together.

  • HTML is a markup language described in SGML (Standard Generalized Markup Language).
  • XML is a restricted form of SGML, removing many of SGML's more complex features, but preserving most of SGML's power and commonly used features.
  • XHTML is the reformulation of HTML 4.0 as an application of XML.

The W3C (World Wide Web Consortium) has taken the logical step of expressing the HTML 4.0 standard in XML instead of using the more complicated SGML.

The minute details aren't too important for the average web coder, but the main difference is found in the document type definitions (DTDs) used by HTML and XHTML. A DTD, according to the W3C, is "a collection of declarations that, as a collection, defines the legal structure, elements, and attributes that are available for use in a document that complies to the DTD."

In other words, it's a definition of what is legal syntax in HTML (or XHTML) and what isn't. The DTD for XHTML is more restrictive than the DTD for HTML because XML is more restrictive than SGML.
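
A document states which DTD it follows in its doctype declaration. As a rough sketch, the snippet below reads that declaration back with Python's standard `xml.dom.minidom`. The identifiers shown are those of the XHTML 1.0 Strict DTD from the W3C specification, not from this article, and the sample page is made up for illustration.

```python
from xml.dom import minidom

# A small XHTML 1.0 Strict page; the DOCTYPE names the DTD it claims to follow.
document = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>"""

doctype = minidom.parseString(document).doctype

print(doctype.name)      # html
print(doctype.publicId)  # -//W3C//DTD XHTML 1.0 Strict//EN
print(doctype.systemId)  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
```

Note that this parser is non-validating: it reports which DTD the document names but does not check the markup against it. Full validation would need a validating parser or the W3C validator.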

The W3C gives two main reasons for recommending XHTML as the next step from HTML 4.0. First, XHTML, since it's an XML application, is designed to be extensible (that's the "X" in all the acronyms). This means that new tags, or "elements" in the official W3C jargon, can be added without altering the entire DTD that the document is based on.
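
The article does not say which mechanism it has in mind for adding elements. XML namespaces are one way XML lets a document mix vocabularies, and the sketch below assumes that approach. The `r:setting` element and its namespace URI are hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RECIPE_NS = "http://example.com/ns/recipe"  # hypothetical vocabulary, illustration only

document = f"""<html xmlns="{XHTML_NS}" xmlns:r="{RECIPE_NS}">
  <body>
    <p>Toast settings:</p>
    <r:setting level="3">medium</r:setting>
  </body>
</html>"""

root = ET.fromstring(document)


def classify(element):
    """Split ElementTree's "{namespace}localname" tag and label its origin."""
    namespace, _, local_name = element.tag[1:].partition("}")
    return local_name, ("XHTML" if namespace == XHTML_NS else "extension")


for element in root.iter():
    print(*classify(element))
```

The standard XHTML elements and the added one stay distinguishable by namespace, so a program that only understands XHTML can skip what it does not recognize.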

Second, XHTML is designed for portability. Web browsers have become behemoths of code bloat. You name it, there's code in the newest browsers to do it. But according to some estimates cited by the W3C, by 2002, 75 percent of web document viewing will be through non-desktop devices like palm computers, televisions, toasters, and other alternative platforms, not through browsers on PCs. Your web-enabled toaster will have less room for bloated code, and its browser will need to be able to count on standards-based documents. It may not be able to display current incarnations of HTML because of the non-standard code involved.

To help you see the main differences between HTML and XHTML, we've included a number of examples in the following section, "Differences." You'll see that most of the variances are simply stricter definitions of common HTML tags.

There are, however, some new features, which we cover in the third section, "What's New."

XHTML Part 2: Differences
What are the differences when coding in XHTML? For many developers the changes are minor, unless you love uppercase tags.

XHTML Part 3: What's New
XHTML does require a doctype declaration and won't allow unclosed <br> tags, but the changes are smaller than you might think.
