Using xmlToSql
xmlToSql converts an XML file into a directory full of tab-separated files and SQL statements for loading a relational database. The program is fast and efficient. On a 334 megabase XML dump I recently received from a gene imaging project, xmlToSql ran in 59 seconds. The tab-separated output size was a mere 23 megabases. xmlToSql's input is an XML file, and DTD and .Stats files produced by autoDtd on the same XML file. xmlToSql does need to be run on a machine with enough RAM to store the entire output database in memory. Because the output is so much smaller than the input, this generally should not be a problem.
Like autoXml, xmlToSql converts XML tag names into mixed-case names legal in both C and SQL. If necessary, it will also rename tag names that correspond to reserved words in either language. If the conversion results in a name that is already used, the program appends successive integers to the name until it is unique.
In general, xmlToSql maintains a one-to-one relationship between tables and tags, and between columns and attributes. There are a few exceptions. xmlToSql needs each table to contain a field it can use as a primary key. If there is an integer attribute or a string attribute 12 characters or less long, and this attribute is unique in each instance of the tag, xmlToSql uses it as the primary key. Otherwise, xmlToSql creates a new integer field ID to use as the primary key. xmlToSql also creates a "foreign key" field in a parent table to link to the child table. This foreign key has the same name as the child table when possible. This field contains 0 or the empty string to indicate a missing optional child. If the parent can have more than one child of this type, instead of a child field in the parent, the program creates a parentToChild table. This table has one field for the parent and one for the child.
The other exception to the one-to-one rule is for nonattributed tags that surround short stretches of text. Rather than creating a separate table for these, xmlToSql treats them as attributes of the parent. By default, tags surrounding text no more than 32 bytes long are turned into attributes, but this is adjustable from the command line.