Bil Lewis

Dr. Dobb's Bloggers

What is XML good for, anyway?

March 12, 2009

I've never quite figured out XML.

I mean, it's obviously a serialization format for objects, but there
are lots of serialization formats. Why should I care about this
particular one?

It's semi-human readable/writable.

For very small files, you can read it directly, but it's painful and
very error-prone (i.e., it's unlikely you'd notice a typo). There are
some editor packages that will do a little bit of formatting and error
checking for you, but they don't do much. And that's OK with me,
because I have no intention of writing XML files by hand.

So, what is it good for?

I don't see it as a persistence format. We have object databases and
object front ends to relational databases (e.g., Hibernate) that work
quite well.

As a serialization format, it... isn't very impressive. The format is
very bulky and... I don't think there's anything else to be said. What
else do I care about?

In a vague sense it has a small advantage when dealing with different
programming languages. It'll be easier to debug than a binary format I
suppose. But I don't intend to get into the XML debugging business. I
just want to serialize my objects and read 'em back in.

The RMI serialization format works fine for me. I don't know anything
about it and I don't have to. And that is exactly what I want out of
my serialization functions. I never want to see the format. I just
want it to work, so I can spend my time analyzing Ribosomal genes.
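To make the "I never want to see the format" point concrete, here is a minimal sketch of a round trip through Java's built-in object serialization, the mechanism RMI uses to marshal its arguments. The Gene class, its fields, and the sample values are hypothetical stand-ins, not anything from a real project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSketch {
    // Serializes any Serializable object graph to bytes; the wire format is never inspected.
    static byte[] write(Object obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush(); // push buffered data into the byte array before copying it
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back whatever object was written; the caller casts to the expected type.
    static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Gene original = new Gene("example-gene", 1500); // hypothetical sample values
        Gene copy = (Gene) read(write(original));
        // Fields are plain instance variables again; no keys, no string lookups.
        System.out.println(copy.name + " " + copy.length);
    }
}

// Hypothetical domain class; implementing Serializable is the only requirement.
class Gene implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int length;

    Gene(String name, int length) {
        this.name = name;
        this.length = length;
    }
}
```

Nothing here names a tag, a key, or a field in the format. The class declaration is the whole schema.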

With XML, I don't get the automatic generation of reader/writer
either. I have to go in and write it myself. Huh? The XML parsers I've
seen expect you to do string searches for keys, while in RMI I simply
access an instance variable.

String searches: for example, with XML I might write

Node node1 = doc.findNode("Tank");
Tank tank1 = Tank.convertXML(node1);

whereas with RMI I'd simply write

Tank tank1 = configuration.getBattalion(0).getTank(0);

This does not fill me with warm fuzzies.
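For comparison, here is roughly what the string-search style looks like with the DOM parser that ships with the JDK. This is only a sketch: the XML layout, the element and attribute names, and the helper method are assumptions made up for illustration. The findNode and convertXML calls above are shorthand, not real library methods, so the sketch uses getElementsByTagName and getAttribute instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlLookupSketch {
    // Hypothetical layout; these element and attribute names are assumptions.
    static final String SAMPLE =
          "<Configuration>"
        + "<Battalion commander=\"Abrams\">"
        + "<Tank model=\"Sherman\" tactics=\"Tactics1\" count=\"6\"/>"
        + "</Battalion>"
        + "</Configuration>";

    // Parses an XML string into a DOM tree using the JDK's default parser.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Looks up an attribute of the first <Tank> element by name; null if there is no <Tank>.
    static String firstTankAttribute(Document doc, String attribute) {
        NodeList tanks = doc.getElementsByTagName("Tank");
        if (tanks.getLength() == 0) {
            return null;
        }
        Element tank = (Element) tanks.item(0);
        return tank.getAttribute(attribute);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(firstTankAttribute(doc, "model"));
        // A misspelled key compiles fine; getAttribute just returns an empty string.
        System.out.println(firstTankAttribute(doc, "modle").isEmpty());
    }
}
```

The compiler cannot check any of the string keys, so a typo in "Tank" or "model" only shows up at run time, if it shows up at all.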

I notice that XML is a big favorite in the configuration file world. I
don't quite understand why it's so popular there. And I don't quite
understand why they have so much configuration anyway.

I wrote an application server for a class I was teaching at Tufts a
couple of years ago. It required exactly ZERO configuration--no
configuration files, no annotations, nothing. You wrote your
application, jar'd it up, put it in the server's application directory,
and it ran.

So I have no idea why Tomcat wants all that redundant (?) information.

But I'm getting off the point.


Let's say we do want a configuration file. Would XML be a good format
for it?

What do I want from a configuration file?

I want it to encode some data. We can think of it as being a single
object. (I actually parse my configuration files directly into
Configuration objects and then pull values from there.)

Do I want it to be human readable/writable?

If I never looked at it with a text editor, I wouldn't care.

But I do want to edit it directly. That's the whole point. I want to
be able to change values in the file. I may well have a pile of
similar files that I want to change in a uniform fashion, and using
EMACS beats the heck out of some specialized configuration editor,
where I might have to edit each of my 600 configuration files
one at a time.

So, human readable/writable is valuable here.

Is XML the way to go?

Here's a typical configuration file I used for an AI testbed:


Dimensions: 500, 500
Map: /home/bil/maps/

# One King Tiger against 6 Shermans is about even.
BattalionCommander: Abrams
Tank: Sherman Tactics1 6
Tank: Grant Tactics2 10

BattalionComander: Rommel
Tank: PKW_4 Tactics1 5
Tank: King_Tiger Tactics6 1
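A line-oriented format like this takes very little code to read into a Configuration object. The following is a minimal sketch, not the actual testbed parser: the class names, the field names, and the reading of each Tank line as "model, tactics, count" are all assumptions. It rejects any key it does not recognize, which is about the cheapest typo check available.

```java
import java.util.ArrayList;
import java.util.List;

class ConfigSketch {
    // Hypothetical value classes; the field meanings are guesses from the sample file.
    static class TankGroup {
        final String model;
        final String tactics;
        final int count;
        TankGroup(String model, String tactics, int count) {
            this.model = model;
            this.tactics = tactics;
            this.count = count;
        }
    }

    static class Battalion {
        final String commander;
        final List<TankGroup> tanks = new ArrayList<>();
        Battalion(String commander) { this.commander = commander; }
    }

    static class Configuration {
        int width;
        int height;
        String mapPath;
        final List<Battalion> battalions = new ArrayList<>();
    }

    // Parses "Key: value" lines; blank lines and lines starting with '#' are skipped.
    static Configuration parse(String text) {
        Configuration config = new Configuration();
        Battalion current = null;
        int lineNumber = 0;
        for (String raw : text.split("\\R")) {
            lineNumber++;
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("line " + lineNumber + ": missing ':'");
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "Dimensions": {
                    String[] parts = value.split("\\s*,\\s*");
                    config.width = Integer.parseInt(parts[0]);
                    config.height = Integer.parseInt(parts[1]);
                    break;
                }
                case "Map":
                    config.mapPath = value;
                    break;
                case "BattalionCommander":
                    current = new Battalion(value);
                    config.battalions.add(current);
                    break;
                case "Tank": {
                    if (current == null) {
                        throw new IllegalArgumentException("line " + lineNumber + ": Tank before any battalion");
                    }
                    String[] parts = value.split("\\s+");
                    current.tanks.add(new TankGroup(parts[0], parts[1], Integer.parseInt(parts[2])));
                    break;
                }
                default:
                    // Unknown keys are errors, so a misspelled key cannot slip through silently.
                    throw new IllegalArgumentException("line " + lineNumber + ": unknown key '" + key + "'");
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Configuration c = parse("Dimensions: 500, 500\nBattalionCommander: Abrams\nTank: Sherman Tactics1 6\n");
        System.out.println(c.battalions.get(0).tanks.get(0).model);
    }
}
```

Once parsed, values come out as ordinary typed fields, in the spirit of configuration.getBattalion(0).getTank(0).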

The same information in XML would look about like this:


<?XML header stuff>

  <! One King Tiger against 6 Shermans is about even.>

So which one would you rather edit? 

(I trust you noticed there was one typo in each file?)

So if XML isn't good for configuration either, what is it good for?

Is there something major that I'm missing??

