Channels ▼


XML, SQL, and C

Source Code Accompanies This Article. Download It Now.

Using xmlToSql

xmlToSql converts an XML file into a directory full of tab-separated files and SQL statements for loading a relational database. The program is fast and efficient. On a 334 megabase XML dump I recently received from a gene imaging project, xmlToSql ran in 59 seconds. The tab-separated output size was a mere 23 megabases. xmlToSql's input is an XML file, and DTD and .Stats files produced by autoDtd on the same XML file. xmlToSql does need to be run on a machine with enough RAM to store the entire output database in memory. Because the output is so much smaller than the input, this generally should not be a problem.

Like autoXml, xmlToSql converts XML tag names into mixed-case names legal in both C and SQL. If necessary, it will also rename tag names that correspond to reserved words in either language. If the conversion results in a name that is already used, the program appends successive integers to the name until it is unique.

In general, xmlToSql maintains a one-to-one relationship between tables and tags, and between columns and attributes. There are a few exceptions. xmlToSql needs each table to contain a field it can use as a primary key. If there is an integer attribute or a string attribute 12 characters or less long, and this attribute is unique in each instance of the tag, xmlToSql uses it as the primary key. Otherwise, xmlToSql creates a new integer field ID to use as the primary key. xmlToSql also creates a "foreign key" field in a parent table to link to the child table. This foreign key has the same name as the child table when possible. This field contains 0 or the empty string to indicate a missing optional child. If the parent can have more than one child of this type, instead of a child field in the parent, the program creates a parentToChild table. This table has one field for the parent and one for the child.

The other exception to the one-to-one rule is for nonattributed tags that surround short stretches of text. Rather than creating a separate table for these, xmlToSql treats them as attributes of the parent. By default, tags surrounding text no more than 32 bytes long are turned into attributes, but this is adjustable from the command line.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.