Dr. Dobb's is part of the Informa Tech Division of Informa PLC


XML Abuse

Tool Abuse
It's said that a craftsman can abuse a tool in at least five different ways. XML confirms this adage: I've witnessed my fair share of XML bloopers and compiled a list of annoying, pointless or inefficient ways to employ the language. While some of these designs may allow the architect to claim "buzzword compliance," they're unlikely to assist in the development effort.

XML + Notepad = IDE
I can't count the number of vendor presentations I've attended where the answer to flexibility and configurability was "This is all driven by an XML file." We know that one of the strong points of XML is human readability, but we may forget that human readability does not imply human writability! Even the W3C reminds us that "XML files are text files that people shouldn't have to read, but may when the need arises" (www.w3.org/XML/1999/XML-in-10-points.html.en).

We're starting to see some improvement in metadata-aware XML editors (as long as you have a DTD or a Schema), but in most cases, the desperate attempts to make any meaningful updates to such files still result in the copy-paste methodology: Find a piece of XML that looks similar to what you want to do, copy it, make a few changes and see what happens.

Extra, Extra Large
XML-based configuration files do have advantages. They integrate well into source-control systems, can be converted into HTML documentation via XSL and—if need be—can be inspected by humans. It's a shame, then, that many XML configuration files come only in one size: Extra, Extra Large. A single XML file consisting of thousands of lines eliminates most of the potential benefits by making version control and concurrent editing by multiple developers impossible. Do we store all our Java code in a single file?

Loose Cannons
Loose coupling is a valuable architectural principle. Loosely coupled systems make fewer assumptions about each other and can be implemented in different languages or on different platforms. Due to its platform and language independence, XML data exchange supports loosely coupled architectures. However, the various architectural advantages of loose coupling can turn into development disadvantages. If all my methods simply take a string argument, which is supposed to contain an XML document, it's clear that I won't ever have to change the syntax of my function calls. Sounds good, right? Well, maybe not. The reason I won't have to change the syntax of my methods is that I decoupled syntax and semantics. The method signature (syntax) tells me nothing about the meaning of the method (semantics) or what data I'm supposed to pass. If I'm lucky, I can look up the data format in some cryptic DTD or I get an example XML document. Either way, I lose any compile-time validation—if I pass an invalid document, I won't find out until runtime, when I get an obscure error message or things simply don't work.

With explicit, strongly typed value (or transfer) objects, on the other hand, my IDE can offer me a drop-down list of all the properties and methods as I type (nice!). Better yet, I can define custom types and constraints for each field (the semantics!). If I try to pass invalid data types, my compiler will warn me before my code goes into testing.
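As a minimal sketch of that contrast, consider the Java below (it assumes Java 16 or later for records). All names in it (OrderService, Order, Quantity) are hypothetical and chosen only for illustration. The String overload accepts any text at compile time, while the typed overload lets the compiler check the argument types and lets the Quantity type enforce its own constraint when it is constructed.

```java
import java.util.Objects;

/** Hypothetical custom type that carries its own constraint: must be positive. */
record Quantity(int value) {
    Quantity {
        // This constraint is checked when the object is constructed.
        if (value <= 0) {
            throw new IllegalArgumentException("quantity must be positive: " + value);
        }
    }
}

/** Hypothetical value object: its fields document exactly what data is expected. */
record Order(String customerId, String sku, Quantity quantity) {
    Order {
        Objects.requireNonNull(customerId, "customerId");
        Objects.requireNonNull(sku, "sku");
        Objects.requireNonNull(quantity, "quantity");
    }
}

class OrderService {
    /** Loosely coupled: the signature never changes, but says nothing about the payload. */
    static String submit(String xml) {
        // Any string compiles; a wrong document can only be detected at runtime.
        return "received " + xml.length() + " characters";
    }

    /** Strongly typed: the compiler rejects a call that passes the wrong types. */
    static String submit(Order order) {
        return "order for " + order.quantity().value() + " x " + order.sku()
                + " from " + order.customerId();
    }
}

class TypedOrderDemo {
    public static void main(String[] args) {
        // Compiles without complaint even though the document is nonsense.
        System.out.println(OrderService.submit("<oder><qty>lots</qty></oder>"));

        // OrderService.submit(new Order("C42", "SKU-1", "lots")); // would not compile
        System.out.println(OrderService.submit(new Order("C42", "SKU-1", new Quantity(3))));
    }
}
```

The typed version costs a few more lines up front, but a misspelled element or a non-numeric quantity can no longer travel unnoticed through a method call.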

Loose coupling has its place in enterprise architecture. However, consider the trade-offs—I don't have to loosely couple every object I call within my application. If you're working with an XML data interchange across systems, consider using XML data binding frameworks such as JAXB (http://java.sun.com/xml/jaxb) or Castor (http://castor.exolab.org) to create strongly typed objects that represent the XML document.
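To show the shape of what a binding framework produces, here is a hand-written sketch that uses only the DOM parser bundled with the JDK, so it runs without extra libraries (it assumes Java 16 or later for records). It is not JAXB or Castor code; a framework would generate or automate this mapping. The Customer type, the CustomerBinder class and the document layout are assumptions made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical typed view of a <customer> document. */
record Customer(String id, String name, int creditLimit) {}

class CustomerBinder {
    /** Converts XML text into a Customer once, at the system boundary. */
    static Customer fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            if (!"customer".equals(root.getTagName())) {
                throw new IllegalArgumentException(
                        "expected <customer>, got <" + root.getTagName() + ">");
            }
            return new Customer(
                    root.getAttribute("id"),
                    text(root, "name"),
                    Integer.parseInt(text(root, "creditLimit")));
        } catch (IllegalArgumentException e) {
            throw e; // includes NumberFormatException from a non-numeric limit
        } catch (Exception e) {
            // Parser and I/O failures are reported as one unchecked exception.
            throw new IllegalArgumentException("not a valid customer document", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given name. */
    private static String text(Element parent, String tag) {
        var nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }
}

class BindingDemo {
    public static void main(String[] args) {
        Customer c = CustomerBinder.fromXml(
                "<customer id=\"C42\"><name>Ada</name><creditLimit>5000</creditLimit></customer>");
        // From here on, the rest of the application works with typed fields.
        System.out.println(c.name() + " may spend up to " + c.creditLimit());
    }
}
```

The XML stays at the edge of the system, where the loose coupling is useful, and everything behind that edge gets compiler-checked field access.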

Integration is a hot topic these days. Most applications that want to sport the label of an "enterprise" application must offer some form of integration. All too often, this integration takes the form of "We can receive XML data, so we can interface with any system." I liken this to the use of the Roman alphabet: Just because I type this article in characters of that script doesn't mean that every person in the Western hemisphere can actually read and understand what I write; many languages use the same characters. XML is similar: It solves many issues related to data representation, but some of the stickiest problems in integration are structural transformations and semantic transformations (comparable to having to translate this article into Danish). To be fair, XML wasn't meant to address all these problems, so let's stop pretending that it does. On the upside, XSL helps us quite a bit in implementing transformations, but there's little doubt that integration and transformation remain difficult problems that are usually solved not by XML magic, but by plain, old hard work.
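A small structural transformation might look like the sketch below, which runs an XSLT 1.0 stylesheet through the javax.xml.transform API that ships with the JDK (the text block needs Java 15 or later). The two formats are invented for the example: a hypothetical system A sends a customer element with separate name parts, and a hypothetical system B expects a party element with a single name attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CustomerToParty {
    // Hypothetical mapping: system A sends <customer>, system B expects <party>.
    static final String STYLESHEET = """
            <xsl:stylesheet version="1.0"
                            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:output method="xml" omit-xml-declaration="yes"/>
              <xsl:template match="/customer">
                <party name="{lastName}, {firstName}"/>
              </xsl:template>
            </xsl:stylesheet>
            """;

    /** Applies the stylesheet to the source document and returns the result as text. */
    static String transform(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Stylesheet or input errors are reported as one unchecked exception.
            throw new IllegalStateException("transformation failed", e);
        }
    }
}

class TransformDemo {
    public static void main(String[] args) {
        System.out.println(CustomerToParty.transform(
                "<customer><firstName>Ada</firstName><lastName>Lovelace</lastName></customer>"));
    }
}
```

The stylesheet only moves data from one structure to another. Deciding that a "customer" in one system really means a "party" in the other, and that the name should be written last name first, is the semantic part, and that still has to be worked out by people.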

Metadata = More Than Data
One of the great boons of modern programming languages such as Java, C# or Smalltalk is the ability to use reflection, allowing programs access to other objects' and classes' metadata, such as a list of all methods and the parameters they accept. Many development environments, compilers and linkers (yep, my age is showing) use this feature to make the programmer's life easier. In the case of XML data structures, the metadata is defined separately in the form of Document Type Definitions (DTDs) or XML Schemas.
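For comparison, this is roughly what the reflection side looks like in Java. The Account class is a made-up example; the sketch simply asks the class for its public methods and their parameter types, which is the kind of metadata an XML document can only get from a separate DTD or Schema.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class used only as something to inspect.
class Account {
    public void deposit(long cents) {}
    public boolean withdraw(long cents, String memo) { return true; }
}

class ReflectionDemo {
    /** Lists each public method declared on a class with its parameter types. */
    static List<String> describe(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(m -> Modifier.isPublic(m.getModifiers()))
                .map(m -> m.getName() + Arrays.stream(m.getParameterTypes())
                        .map(Class::getSimpleName)
                        .collect(Collectors.joining(", ", "(", ")")))
                .sorted() // getDeclaredMethods() returns methods in no particular order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describe(Account.class).forEach(System.out::println);
    }
}
```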

Metadata is a critical part of XML. XML documents without an associated DTD or Schema are not very useful for developers: we can't be sure whether our documents are valid or which constraints apply. Again, too often, we rely on the "interface contract by example" method: If your XML looks like this, it's probably valid.
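The alternative to "contract by example" is to let a schema decide. The sketch below uses the javax.xml.validation API from the JDK to check documents against a small XML Schema (the text block needs Java 15 or later). The schema and the order format are assumptions invented for the example, not taken from any real system.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderSchemaCheck {
    // Hypothetical schema: an <order> holds a <sku> and a positive integer <quantity>.
    static final String XSD = """
            <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
              <xs:element name="order">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="sku" type="xs:string"/>
                    <xs:element name="quantity" type="xs:positiveInteger"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:schema>
            """;

    /** Returns empty if the document is valid, otherwise the validator's message. */
    static Optional<String> validate(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler installed, the first violation is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        String good = "<order><sku>A-1</sku><quantity>3</quantity></order>";
        String bad = "<order><sku>A-1</sku><quantity>lots</quantity></order>";
        System.out.println("good: " + OrderSchemaCheck.validate(good).orElse("valid"));
        System.out.println("bad:  " + OrderSchemaCheck.validate(bad).orElse("valid"));
    }
}
```

With the schema in place, "it looks like the example" is replaced by an answer the tooling can give, including which constraint was violated.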

Bruiser Interface
The UI-agnostic application is one of the most publicized uses of XML. As long as our data is represented as XML documents, we'll be able to support new user interfaces such as cell phones, PDAs, speech synthesis, brain waves, what have you. I actually render my website (see www.enterpriseintegrationpatterns.com) from XML source documents and am working on rendering PDF files from the same source. XML does a great job there, but then I'm dealing with written documents, not an interactive online application.

I also render to quite similar presentation media (HTML and PDF). Rendering a complete, interactive user interface to a variety of devices generally requires a user interface redesign to ensure ease of use, plus some heavy-duty transformation work. Once the requirement for multiple access paths becomes real (for example, via Web services support), it makes sense to evaluate the use of XML. Until then, carefully weigh the trade-offs between XML and native data representation.

Not Everything Is a Nail
With a tool as widely applicable as XML, let's focus on using it where it makes sense. In most cases, "XML everywhere" is not the best choice. And when the next software application vendor tells you that "We have an XML file," tell them to start reading Software Development!

Gregor Hohpe is a senior architect with ThoughtWorks, an Internet systems integrator and consulting company. His current interests include agile methodologies and patterns of enterprise integration. Reach him at [email protected].
