Digital Libraries & XML-Relational Data Binding


April, 2005: Digital Libraries & XML-Relational Data Binding

Avoiding semantic hardcoding in XML-relational conversion

Rene is an associate professor and Venkata a computer science graduate student at Oregon State University. Brandon is a business analyst at the University of Washington. They can be contacted at reitsmar@bus.oregonstate.edu.


Digital libraries are digitally accessible, organized collections of knowledge. Although under this broad definition any digitally accessible data set might be considered a digital library, the term is generally reserved for collections whose structures are carefully documented and made available in the form of so-called "metadata." For any item in the library, structured summary information about the item—title, author(s), date of creation, date of last modification, size, ownership, copyright information, list of keywords, quality control information, or any other item of metainformation—might be stored and made accessible.

Such organization is, of course, what traditionally characterizes any sort of library, regardless of whether the contents are stored digitally. Since metadata can be efficiently generated from a digital source and likewise stored digitally, documents in digital libraries can be described with very rich sets of metadata. In turn, highly organized sets of such metadata facilitate the implementation of powerful and flexible search engines, which improve the likelihood that patrons will find the information they seek. Digital libraries also offer another advantage over traditional, materials-based libraries: the catalogs of metadata, as well as the items that the metadata describe, can be accessed from anywhere. Finally, unlike materials-based libraries, digital libraries do not have to physically store collections. Instead, collections can be distributed across the Internet, provided that metadata can be electronically harvested for inclusion in the libraries' metadata catalog.

National Science Digital Library: Dublin Core and OAI

Anticipating the rapid, world-wide growth of digital libraries, the U.S. National Science Foundation funds and guides the development of the National Science Digital Library (NSDL; http://www.nsdl.org/). The NSDL covers a wide variety of topics, ranging from a collection of digital cartographic resources (Alexandria Digital Library, http://www.alexandria.ucsb.edu/) and a collection of animal diversity information (Animal Diversity Web, http://animaldiversity.ummz.umich.edu/site/index.html) to a collection of information on MIT theses (http://theses.mit.edu/). Our development group at Oregon State University is part of a collaboration that includes the University of Colorado at Boulder; Colorado School of Mines; Duke University; Worcester Polytechnic Institute; and the American Society for Engineering Education (ASEE). This collaboration is developing TeachEngineering, an NSDL-funded collection of standards-based, K-12 math and science curricula that employs problem solving and engineering as its core learning tools (http://www.teachengineering.com/).

Of the 405 collections currently registered to NSDL, only about 50 have been directly funded by the NSDL program. The reason that the remaining collections can still be registered with NSDL rests in the protocols used for publishing and collecting metadata. NSDL, for instance, supports the Dublin Core (DC; http://dublincore.org/) standard for creating a collection's metadata and the Open Archives Initiative Protocol for Metadata Harvesting (OAIPMH; http://www.openarchives.org/) for disseminating it. An alternative standard for exposing metadata, although not supported by NSDL, is the National Information Standards Organization's Z39.50 (http://www.niso.org/z39.50/z3950.html). XML representations of both the OAIPMH and Z39.50 protocols and a variety of open source OAIPMH and Z39.50 client/server implementations that generate ("expose") and consume ("harvest") these representations are readily available. Listing One is an XML fragment of Dublin Core metadata for a document in the TeachEngineering.com collection entitled "A House is a House for Me."

Searching Here, Searching There...

Although standards such as OAIPMH and Z39.50 have made it relatively easy to expose and harvest a collection's metadata, collection developers must weigh the costs and benefits of exposing high-quality metadata for others to use in search engine implementations against implementing collection-specific, native search engines. Search engines of generic libraries, such as nsdl.org, generally do not offer the same richness offered by collection-specific search engines. For instance, a TeachEngineering lesson document contains attributes such as a K-12 grade band, cost estimates, group size, duration, and a set of state and national educational standards that the lesson supports. Accordingly, K-12 instructors may wish to search the TeachEngineering library for curricular materials designed for fifth-grade students that can support a group of 10 students, can be completed in under an hour, and are aligned with a specific educational standard. Clearly, even if all the necessary search parameters were made available through published metadata, the likelihood that a harvester of that metadata would make available a user interface and search engine that supports such a query is rather small. Of course, generic web search engines such as Google or Yahoo, and even nsdl.org, might be able to search documents using keywords, but such a search is a poor substitute for what library patrons often need. Consequently, many digital collections offer their own native search engine and user interface, yet expose their metadata to others so that their collections can be found and accessed through other libraries as well. This is also the case for TeachEngineering. As a result, the TeachEngineering infrastructure consists of seven integrated components that support a diverse range of data demands:

  • A document authoring facility that includes a document structure specification (XML Schema), a document authoring tool (Altova's Authentic), and a document repository (CVS).
  • A document spider (Java with JDOM and XPath [Jaxen]) for extracting searchable data from the documents and a relational database (MySQL) for storing data used by the (native) search facilities.
  • A metadata generator (PHP with DOM-XML) and an OAIPMH (Perl) server.
  • A web-based search engine (PHP with SQL).
  • A web-based document rendering engine (PHP with DOM-XML).
  • A web service for programmatic searching (PHP-SOAP).
  • Quality assurance and control tools.

XML-Relational Mapping

Following the practice of many digital library implementations, we chose to encode and store our documents using XML. XML effectively separates presentation from content, which ensures that data remain platform independent and flexible. Further, using XML Schema, one can specify flexible and adaptable data structures, which allow for the representation of a wide variety of phenomena. Finally, XML documents can be easily validated for syntactic compliance using XML DTDs or XML Schemas, which provide valuable quality control mechanisms for digital library collection items.

Although XML documents can be readily parsed with technologies such as SAX, Document Object Model (DOM), and XPath, searching a large collection of XML documents in real time, for instance by a search engine, is impractical. Hence, a common approach to quickly finding data originally stored in XML is to build a second, more efficiently queried representation of the information, such as a relational database, and use this second representation when searching a collection. The XML representation is then only used as the original source of the information and, if so desired, when rendering the item when requested by a library patron.

Conversion from XML to the relational model, however, can be problematic in that the conversion code typically contains the semantics of the XML model, those of the relational model, or both. Listing Two is pseudocode that provides an example of such semantic hardwiring. Listing Two contains both the names of the elements to be extracted from the source XML document—represented by the title and author strings—and the names of both the relational table (documentTable) and the columns in the table where these elements must be stored (titleData and authorData). Clearly, this approach is not ideal as it implies that every time the XML or relational schema changes, the code must be modified accordingly. Although one might correctly object that the likelihood of such changes is an inverse function of the care taken to design the schemas, it would still be rather nice if we could consider the XML-relational mapping rules as data consumed by the mapping program rather than embedding them into the program code itself.

In this article, we present a method that avoids this semantic hard coding entirely. Our approach to treating both the XML and relational semantics as data is to store both sets of information external to the mapping program; in our case, in a relational database. The mapping program retrieves this information as data and uses it to decide which elements to extract from the XML document and where and how to store them in the database.

Most forms of XML data manipulation involve programming specific methods using a variety of XML technologies including SAX, DOM-XML, JAXP, and JAXB. Of these, DOM-XML (http://www.w3.org/DOM/) is probably the most commonly understood. DOM requires that the XML document be loaded into memory as a tree, whereby every element of the document is treated as a node in that tree. XML documents can, of course, always be represented as trees because XML requires that document elements be properly nested, and that all documents have a single root node. Some nodes can contain children whereas others are terminal.

Unfortunately, implementing data extraction routines that exclusively rely on DOM implies the use of static methods. For example, consider Listing Three—a simplified version of a TeachEngineering Activity XML document activity.xml. The document's root element is activity. The activity element in turn contains four child element types: a title, an author, a pub_year, and two text_section elements. The text_section elements themselves each contain two text_block elements.

To fetch the text from the title element in this simple XML document using DOM only, you would use something like this:

title = xmlDocument.getRootNode().
    getElementsByTagName("title").getChildNode(0).
    getTextNode(0).getValue();

To fetch the second text_block of the first text_section, one would use:

text = xmlDocument.getRootNode().
    getElementsByTagName("text_section").
    getChildNode(0).
    getElementsByTagName("text_block").
    getChildNode(1).
    getTextNode(0).getValue();

Hence, processing begins at the root of the document, and nodes have to be fetched in order.

Although less obvious, the aforementioned code still contains semantic hard-coding, albeit in the form of processing the DOM tree. XPath technology (http://www.w3c.org/TR/xpath/) solves this problem. XPath is a syntax for traversing and extracting information from XML documents using simple strings similar to file paths. For example, the XPath expression /activity/author would gather a list of all author elements in a document whose parent element is activity, which is the root node of the document, as denoted by the single / character. Because the sample XML document in Listing Three contains only one <author> element, XPath returns a list containing a single string with the value University of Colorado at Boulder. Although this is a simple example, the XPath syntax provides for the selection of any component or group of components in an XML document.

Unlike using static DOM methods, fetching a document's data can be parameterized when using XPath methods. Consider this abstract method in which the variable doc holds an XML document stored as a DOM tree:

values[] extractData(Document doc, String XPathExpression) {
    XPathEvaluator evaluator = new XPathEvaluator(XPathExpression);
    return evaluator.evaluate(doc);
}

An evaluator object is created using an XPath expression as a string input parameter. The evaluator is then passed a reference to the XML document stored as a DOM tree in memory. The function always returns an array (values[]). Hence, we have a single function that can extract any data from any XML file provided that the appropriate XPath expression is passed in. Both the XML files and the XPath expressions used to evaluate the files can be loaded as data at runtime, thereby creating the ability to extract XML data dynamically.
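The idea of a single, parameterized extraction function can be tried out with a minimal Python sketch. This is an illustration only, not the authors' Java implementation: the helper name extract_data, the sample document, and the /activity/reviewer expression are assumptions. Python's standard-library ElementTree supports only a subset of XPath and evaluates paths relative to an element, so the sketch hangs the document root under a synthetic parent to emulate absolute paths.

```python
import xml.etree.ElementTree as ET


def extract_data(doc_root, xpath_expression):
    """Evaluate an absolute path and always return a list of strings."""
    # ElementTree paths are relative to an element, so place the document
    # root under a synthetic parent and prefix the expression with ".".
    wrapper = ET.Element("wrapper")
    wrapper.append(doc_root)
    return [node.text for node in wrapper.findall("." + xpath_expression)]


# Both the document and the expressions are plain data; in the article
# they would come from XML files and from the database, respectively.
xml_source = (
    "<activity>"
    "<title>A TeachEngineering Activity</title>"
    "<author>University of Colorado at Boulder</author>"
    "<pub_year>2004</pub_year>"
    "</activity>"
)
expressions = [
    "/activity/title",
    "/activity/author",
    "/activity/pub_year",
    "/activity/reviewer",  # hypothetical element that is not in the document
]

doc = ET.fromstring(xml_source)
results = {expr: extract_data(doc, expr) for expr in expressions}
for expr, values in results.items():
    print(expr, values)
```

An expression that matches nothing yields an empty list rather than an error, which keeps the calling code uniform.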

Considering that we want to avoid hard coding the XML extraction methods, we would also like to avoid hard coding the database structure. Fortunately, SQL lets database tables be created and manipulated at runtime with statements such as CREATE TABLE and ALTER TABLE. To create a table, at a minimum, we need to specify the table name, a primary key, and a variable number of columns, where each column has a column name, a data type, and a flag indicating whether it is allowed to contain NULL values. For example, this SQL CREATE TABLE statement generates a table to hold some of the information for the simplified TeachEngineering Activity defined in Listing Three:

CREATE TABLE activity (
id INT(10) NOT NULL AUTO_INCREMENT,
title VARCHAR(200) NOT NULL,
author VARCHAR(100) NULL,
year INT(10) NOT NULL,
PRIMARY KEY (id))

The first column, id, is a self-incrementing integer that serves as the primary key. The title column is a string of up to 200 characters that may not be NULL, and so on.

Finally, data can be entered into the database at runtime using SQL INSERT statements. For instance, this statement would enter the sample document from Listing Three into the activity table:

INSERT INTO activity (title, author, year)
VALUES ('A TeachEngineering Activity',
'University of Colorado at Boulder', 2004)

More important than the statements themselves, however, is the fact that the CREATE TABLE and INSERT statements can be built and executed dynamically using parameterized string manipulation functions. This, once again, prevents us from including the specifics of collection documents, be it their XML or their relational semantics, in the mapping code.
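As a rough illustration of building these statements from data, the following Python sketch generates a CREATE TABLE and an INSERT statement from a list of column descriptions. It is a sketch under stated assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, the function names build_create_table and build_insert are hypothetical, and the id column uses SQLite's INTEGER PRIMARY KEY instead of AUTO_INCREMENT.

```python
import sqlite3


def build_create_table(table, columns):
    """Build CREATE TABLE text from (name, datatype, nullable) tuples."""
    parts = ["id INTEGER PRIMARY KEY"]  # SQLite assigns ids automatically
    for name, datatype, nullable in columns:
        constraint = "" if nullable else " NOT NULL"
        parts.append(f'"{name}" {datatype}{constraint}')
    body = ", ".join(parts)
    return f'CREATE TABLE "{table}" ({body})'


def build_insert(table, column_names):
    """Build an INSERT with one placeholder per column."""
    cols = ", ".join(f'"{c}"' for c in column_names)
    marks = ", ".join("?" for _ in column_names)
    return f'INSERT INTO "{table}" ({cols}) VALUES ({marks})'


# Column descriptions are data, not code; here they mirror the example table.
columns = [
    ("title", "VARCHAR(200)", False),
    ("author", "VARCHAR(100)", True),
    ("year", "INT(10)", False),
]

conn = sqlite3.connect(":memory:")
create_sql = build_create_table("activity", columns)
conn.execute(create_sql)

insert_sql = build_insert("activity", [name for name, _, _ in columns])
conn.execute(
    insert_sql,
    ("A TeachEngineering Activity", "University of Colorado at Boulder", 2004),
)
row = conn.execute("SELECT id, title, author, year FROM activity").fetchone()
print(create_sql)
print(row)
```

The sketch binds values through placeholders rather than concatenating them into the statement, which avoids quoting problems in document text. Table and column names cannot be bound this way, so they are assumed to come from a trusted source.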

The first step in developing the XML-relational mapping is the specification of a relational metamodel for storing the information needed to build both the XPath data extraction expressions and the database to store the extracted data. Consider the definitions in Table 1 in relation to the structure of the XML document of Listing Three.

The types table, Table 1(a), stores information about specific document elements; for example, an activity, title, or text_section. The relation table, Table 1(b), stores information about the relationships between document elements. The values in the group and component columns of the relation table correspond to values in the name column of the types table. For example, an activity contains a title element, an author element, and one or more text_section elements. A text_section, in turn, contains a text_section_name value (the name attribute in Listing Three) and one or more text_block elements. Although the values in the group and component columns of the relation table could use the id column of the types table for primary/foreign key relationships, the values in the name column are used to improve the readability of the relation table. Together, the types and relation tables represent a metamodel that contains the information needed to dynamically generate database tables, extract data from XML documents, and store that information in the database tables.
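To make the metamodel concrete, here is a small Python sketch that builds the two tables in an in-memory SQLite database and queries them. The column layout is an assumption inferred from the queries and handler arguments shown in this article (name, expression, cast, datatype, nullable, and group/component), and the rows follow the object structure of Listing Four. Because group and cast are SQL keywords, the sketch quotes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    expression TEXT NOT NULL,
    "cast" TEXT NOT NULL,
    datatype TEXT,
    nullable TEXT
);
CREATE TABLE relation (
    "group" TEXT NOT NULL,
    component TEXT NOT NULL
);
""")

conn.executemany(
    'INSERT INTO types (name, expression, "cast", datatype, nullable) '
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("activity", "/activity", "root", None, None),
        ("title", "/title", "string", "varchar(200)", "no"),
        ("author", "/author", "string", "varchar(100)", "yes"),
        ("year", "/pub_year", "number", "int(10)", "no"),
        ("text_section", "/text_section", "group", None, None),
        ("text_section_name", "/@name", "string", "varchar(50)", "yes"),
        ("text_block", "/text_block", "group", None, None),
        ("text", "", "string", "text", "yes"),
    ],
)
conn.executemany(
    'INSERT INTO relation ("group", component) VALUES (?, ?)',
    [
        ("activity", "title"),
        ("activity", "author"),
        ("activity", "year"),
        ("activity", "text_section"),
        ("text_section", "text_section_name"),
        ("text_section", "text_block"),
        ("text_block", "text"),
    ],
)

# Which types are document roots?
roots = conn.execute(
    'SELECT name, expression FROM types WHERE "cast" = ?', ("root",)
).fetchall()

# Which string/number components belong directly to the activity group?
fields = conn.execute(
    """
    SELECT t.name, t.expression, t.nullable, t."cast", t.datatype
    FROM relation r JOIN types t ON r.component = t.name
    WHERE r."group" = ? AND t."cast" IN ('string', 'number')
    ORDER BY t.id
    """,
    ("activity",),
).fetchall()

print(roots)
for field in fields:
    print(field)
```

Adding a new searchable element then amounts to inserting one row into each table, with no change to the code that reads them.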

Data Extraction and Storage Algorithm

At this point, we have a relational metamodel that represents both the data to be extracted from the XML documents and the structure of the database into which these data will be stored. All that remains is the construction of a coding framework that uses the data stored in the types and relation tables to construct the relational database tables and then extracts and stores data in the tables.

Our algorithm consists of three distinct actions: table creation, data extraction, and data storage. We start by defining the following four classes:

  • XPathRootHandler handles entries in the types table with cast set to root.
  • XPathGroupHandler handles entries in the types table with cast set to group.
  • XPathExpressionHandler handles entries in the types table with cast set to string or number.
  • DataManager is the class that oversees the process of instantiating objects of the aforementioned classes and invoking their methods.

Processing the document in Listing Three begins with a SQL query in DataManager for all entries in the types table that have their cast set to root. For each entry returned, an instance of the XPathRootHandler class is created.

Query:
SELECT name, expression
FROM types WHERE cast = 'root'
Result:
name: 'activity'
expression: '/activity'
Objects:
XPathRootHandler('activity', '/activity')

Following this first (activity) case, XPathRootHandler creates an instance of the XPathExpressionHandler class for each entry in the types table where the group column of the relation table equals activity and the cast column of the types table equals string or number.

Query:
SELECT name, expression,
types.nullable as nullable, cast, datatype
FROM relation, types
WHERE (relation.component = types.name)
AND (relation.group = 'activity')
AND ((types.cast = 'string')
OR (types.cast = 'number'))
Result:
name: 'title'; expression: '/title'; nullable: 'no';
cast: 'string'; datatype: 'varchar(200)'
name: 'author'; expression: '/author'; nullable: 'yes';
cast: 'string'; datatype: 'varchar(100)'
name: 'year'; expression: '/pub_year'; nullable: 'no';
cast: 'number'; datatype: 'int(10)'
Objects:
XPathRootHandler ('activity', '/activity')
XPathExpressionHandler
('title', '/title', 'no', 'string','varchar(200)')
XPathExpressionHandler
('author', '/author', 'yes', 'string', 'varchar(100)')
XPathExpressionHandler
('year', '/pub_year', 'no', 'number', 'int(10)')

Next, XPathRootHandler builds the table creation SQL statement using data stored by each new instance of XPathExpressionHandler, and executes the statement to create a root table; in our case, the activity table. Root tables contain two additional columns: id, which serves as the primary key for the table; and filename, which stores the filename of the document from which data will be extracted (either as a file path or a URL).

Query:
CREATE TABLE activity (
id int(10) UNSIGNED NOT NULL,
filename varchar(250) NOT NULL,
title varchar(200) NOT NULL,
author varchar(100) NULL,
year int(10) NOT NULL)

Once the table is created, XPathRootHandler creates an XPathGroupHandler for each entry in the types table where the group column of the relation table equals activity and the type indicated in the cast column of the types table equals group.

Query:
SELECT name, expression
FROM relation, types
WHERE (relation.component = types.name)
AND (relation.group = 'activity')
AND (types.cast = 'group')
Result:
name: 'text_section'; expression: '/text_section'
Objects:
XPathRootHandler ('activity', '/activity')
XPathExpressionHandler
('title', '/title', 'no', 'string', 'varchar(200)')
XPathExpressionHandler
('author', '/author', 'yes', 'string', 'varchar(100)')
XPathExpressionHandler
('year', '/pub_year', 'no', 'number', 'int(10)')
XPathGroupHandler
('text_section', '/text_section')

As each XPathGroupHandler object is created, it executes the same process as that of the XPathRootHandler class with one difference. Whereas each root table contains an id column and a filename column, each group table contains id, parentName, parentID, and index columns. The id column again serves as a primary key for the group table. The parentName column contains the name of the parent table. For example, the parentName of entries in the text_section table would be activity. The name of the parent table is included for each row because entries specified in the types table may be reused as the component part of more than one group/component relationship in the relation table. For example, if we introduced another root type called lesson, both activities and lessons could contain data stored in the text_section table. The parentID column is a foreign key referencing the primary key in the table indicated by the parentName column. Finally, the index column indicates which occurrence of the component appears in relation to the parentID. For example, if a document described in the activity table contains four text_section elements, the first text_section is stored in the text_section table with an index of 1, the second with an index of 2, and so on.

The object instantiation process continues recursively, with each XPathGroupHandler instance creating child XPathExpressionHandler and XPathGroupHandler instances until all entries described by the types and relation tables have been defined. When the process is complete, we will have the object structure in Listing Four. We also have the (empty) database tables listed in Table 2.
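The recursive instantiation can be sketched in a few lines of Python. This is a simplified illustration, not the authors' classes: the metamodel is held in plain in-memory structures (TYPES and RELATION, both hypothetical names) instead of database tables, and handlers are represented as dictionaries rather than objects.

```python
# name -> (expression, cast, datatype, nullable); values follow Listing Four.
TYPES = {
    "activity": ("/activity", "root", None, None),
    "title": ("/title", "string", "varchar(200)", "no"),
    "author": ("/author", "string", "varchar(100)", "yes"),
    "year": ("/pub_year", "number", "int(10)", "no"),
    "text_section": ("/text_section", "group", None, None),
    "text_section_name": ("/@name", "string", "varchar(50)", "yes"),
    "text_block": ("/text_block", "group", None, None),
    "text": ("", "string", "text", "yes"),
}
# (group, component) pairs.
RELATION = [
    ("activity", "title"),
    ("activity", "author"),
    ("activity", "year"),
    ("activity", "text_section"),
    ("text_section", "text_section_name"),
    ("text_section", "text_block"),
    ("text_block", "text"),
]


def build_handler(name):
    """Create a handler for `name` and, recursively, for its components."""
    expression, cast, _datatype, _nullable = TYPES[name]
    children = [comp for group, comp in RELATION if group == name]
    return {
        "name": name,
        "expression": expression,
        "cast": cast,
        "fields": [build_handler(c) for c in children
                   if TYPES[c][1] in ("string", "number")],
        "groups": [build_handler(c) for c in children
                   if TYPES[c][1] == "group"],
    }


def describe(handler, depth=0):
    """Return indented lines naming the handler class each node stands for."""
    kinds = {"root": "XPathRootHandler", "group": "XPathGroupHandler"}
    kind = kinds.get(handler["cast"], "XPathExpressionHandler")
    label = f"{kind}({handler['name']!r}, {handler['expression']!r})"
    lines = ["  " * depth + label]
    for child in handler["fields"] + handler["groups"]:
        lines.extend(describe(child, depth + 1))
    return lines


tree = build_handler("activity")
print("\n".join(describe(tree)))
```

The recursion ends naturally at entries that have no components in the relation data, which is the case for every string or number entry.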

At this point, all of the necessary tables and objects exist to support a second algorithm that extracts data from XML source files using XPath and that stores the extracted data in the newly created tables. We will illustrate the process using the document of Listing Three.

The process begins with the DataManager, which calls the extractData method of the XPathRootHandler class, which in turn receives the XML to be processed, stored in memory as a DOM-XML tree, as well as the filename or URL of the file:

String xmlFile = "activity.xml";
Document xmlTree = DOMLoadFile(xmlFile);
rootHandler.extractData(xmlTree, xmlFile);

Next, the method queries the database to determine the largest primary key id in the activity table, so that a unique primary key can be generated: 1 if the table is empty; otherwise, max(id)+1.
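The key-generation rule can be expressed as a single query. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for the project's database; the function name next_id and the two-column table are assumptions made for illustration.

```python
import sqlite3


def next_id(conn, table):
    """Return 1 for an empty table, otherwise max(id) + 1."""
    # Table names cannot be bound as parameters, so `table` is assumed
    # to come from trusted metamodel data.
    query = f'SELECT COALESCE(MAX(id), 0) + 1 FROM "{table}"'
    (value,) = conn.execute(query).fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER NOT NULL, filename TEXT NOT NULL)")

first = next_id(conn, "activity")
conn.execute(
    "INSERT INTO activity (id, filename) VALUES (?, ?)", (first, "activity.xml")
)
second = next_id(conn, "activity")
print(first, second)
```

This scheme assumes a single writer, as in a batch conversion run; concurrent writers could compute the same key.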

To create the entry in the activity table, the XPathRootHandler's extractData method must generate and evaluate XPath expressions using fragments from each XPathExpressionHandler object contained by the XPathRootHandler object. The XPathRootHandler activity object contains three child instances of the XPathExpressionHandler class; one each for title, author, and year. As the extractData method of each child XPathExpressionHandler is called, the XPath expression fragment of the parent (/activity) is passed along with the document tree resulting in this parameterized invocation:

XPathExpressionHandler.extractData
(xmlTree, "/activity")

In each child extractData method, the parent XPath fragment is combined with an index (a default of 1 is used if none is supplied by the parent so that a unique XML element is always selected), and the child XPath fragment to create a complete expression, which is then evaluated on the document tree. The result is returned to the parent:

String expression = "/activity" + "[1]" + "/title";
XPathEvaluator evaluator = new XPathEvaluator(expression);
String value = evaluator.evaluate(xmlTree);
return value;
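The same composition of parent fragment, index, and child fragment can be tried in Python. This is a sketch only: the function name extract_value is hypothetical, and because ElementTree evaluates paths relative to an element, the document root is placed under a synthetic parent to emulate the absolute expression.

```python
import xml.etree.ElementTree as ET

ACTIVITY_XML = """<activity>
  <title>A TeachEngineering Activity</title>
  <author>University of Colorado at Boulder</author>
  <pub_year>2004</pub_year>
</activity>"""


def extract_value(doc_root, parent_fragment, child_fragment, index=1):
    """Join parent fragment, index, and child fragment, then evaluate."""
    expression = f"{parent_fragment}[{index}]{child_fragment}"
    wrapper = ET.Element("wrapper")
    wrapper.append(doc_root)
    node = wrapper.find("." + expression)
    value = node.text if node is not None else None
    return expression, value


doc = ET.fromstring(ACTIVITY_XML)
extracted = [
    extract_value(doc, "/activity", fragment)
    for fragment in ("/title", "/author", "/pub_year")
]
for expression, value in extracted:
    print(expression, repr(value))
```

Note that the value of pub_year comes back as the string "2004"; converting it to a number according to the cast column is left to the caller.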

After the extractData methods of all child XPathExpressionHandler objects have been called, these XPath expressions (results in parentheses) have been evaluated:

/activity[1]/title
("A TeachEngineering Activity")
/activity[1]/author
("University of Colorado at Boulder")
/activity[1]/pub_year (2004)

These data are then combined with the primary key and filename or URL to create a SQL INSERT statement for storing the information in the activity table:

INSERT INTO activity
(id, filename, title, author, year)
VALUES (1, 'activity.xml',
'A TeachEngineering Activity',
'University of Colorado at Boulder', 2004)

After executing this statement, the extractData method of the XPathRootHandler activity object calls the extractData method of each child instance of the XPathGroupHandler class, passing the XPath expression fragment of the parent (/activity), along with the document tree and the primary key of the row just inserted into the activity table of the database (see Table 3(a)):

XPathGroupHandler.extractData
(xmlTree, "/activity", 1)

The extractData method of the XPathGroupHandler class works similarly to that of the XPathRootHandler class. However, whereas a well-formed XML document can have only one instance of the root element (that is, only one activity element), it can have more than one instance of group elements (two text_section elements). Therefore, a mechanism is added to the extractData method of the XPathGroupHandler class to process multiple occurrences of a group within an XML document.

Like the XPathExpressionHandler class, the XPathGroupHandler class is below the top level of the object tree and, therefore, must build its XPath expression by combining its own fragment with the one from its parent class. Using the text_section's XPathGroupHandler object as an example, the extractData method combines the fragment passed by the parent, /activity, with its own fragment, /text_section, to create a base expression:

/activity/text_section

The base expression is then used to form an XPath count query to determine how many occurrences of the current group appear in the source XML tree:

count(/activity/text_section)
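A count query of this kind can be approximated in Python as follows. ElementTree does not implement the XPath count() function, so this sketch substitutes the length of the list of matched elements; the helper name count_matches and the trimmed sample document are assumptions.

```python
import xml.etree.ElementTree as ET

SECTIONS_XML = """<activity>
  <text_section name="Section 1">
    <text_block>A block of text in section 1.</text_block>
    <text_block>Another block of text in section 1.</text_block>
  </text_section>
  <text_section name="Section 2">
    <text_block>A block of text in section 2.</text_block>
    <text_block>Another block of text in section 2.</text_block>
  </text_section>
</activity>"""


def count_matches(doc_root, expression):
    """Stand-in for XPath count(): the number of matched elements."""
    wrapper = ET.Element("wrapper")
    wrapper.append(doc_root)
    return len(wrapper.findall("." + expression))


doc = ET.fromstring(SECTIONS_XML)
base = "/activity" + "/text_section"
occurrences = count_matches(doc, base)

# One indexed expression per occurrence, using 1-based XPath positions.
indexed = [f"{base}[{i}]" for i in range(1, occurrences + 1)]
print(occurrences, indexed)
```

The resulting count gives the number of 1-based positions that a handler can visit for that group.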

In our example, there are two text_section elements in the document, so the aforementioned XPath expression returns the value 2 when evaluated. Once this value is determined, the process just described for XPathRootHandler.extractData is completed inside a loop, with a few modifications. First, the expression that is passed to the extractData method of child XPathGroupHandler and XPathExpressionHandler objects has an index based on the loop appended:

"/activity/text_section" + "[1]" =>
"/activity/text_section[1]"

Second, the SQL INSERT statement includes the name of the parent class, parentName, which was passed to the constructor of the current class, and the primary key of the entry in the parent table, parentID, which was passed to the extractData method when it was called:

INSERT INTO text_section
(id, parentName, parentID,
index, text_section_name)
VALUES (1, 'activity', 1, 1, 'Section 1')

Finally, calls to the extractData method of a child XPathGroupHandler object include the current loop value as an index. This parameter has a default value of 1.

XPathGroupHandler.extractData
(xmlTree, "/activity/text_section[1]", 1, 1)

This process continues recursively as long as XPathGroupHandler objects contain child instances of the XPathGroupHandler class. When the process is complete, the XPath queries in Listing Five (results in parentheses) and the SQL INSERT statements in Listing Six will have been executed. We will also have the data in Table 3 in the database. The extractData method of the XPathExpressionHandler class is the only place where data are actually extracted from the source XML tree. The resulting data are always returned to the calling extractData method of the XPathRootHandler and XPathGroupHandler classes, where SQL INSERT queries are built and executed.
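The whole traversal can be condensed into a short Python sketch. It is a simplification under several assumptions: handlers are plain tuples (a hypothetical stand-in for the handler classes), rows are collected in a list instead of being written with INSERT statements, the root is treated as a group that occurs once, and a trailing /@name step is handled by hand because ElementTree cannot select attributes directly.

```python
import xml.etree.ElementTree as ET

# (name, fragment, [(field, fragment), ...], [child group handlers])
TEXT_BLOCK = ("text_block", "/text_block", [("text", "")], [])
TEXT_SECTION = ("text_section", "/text_section",
                [("text_section_name", "/@name")], [TEXT_BLOCK])
ACTIVITY = ("activity", "/activity",
            [("title", "/title"), ("author", "/author"), ("year", "/pub_year")],
            [TEXT_SECTION])

ACTIVITY_XML = """<activity>
  <title>A TeachEngineering Activity</title>
  <author>University of Colorado at Boulder</author>
  <pub_year>2004</pub_year>
  <text_section name="Section 1">
    <text_block>A block of text in section 1.</text_block>
    <text_block>Another block of text in section 1.</text_block>
  </text_section>
  <text_section name="Section 2">
    <text_block>A block of text in section 2.</text_block>
    <text_block>Another block of text in section 2.</text_block>
  </text_section>
</activity>"""


def select(wrapper, expression):
    """Return string values for an element path or a trailing /@attr step."""
    element_path, _, attribute = expression.partition("/@")
    nodes = wrapper.findall("." + element_path)
    return [n.get(attribute) if attribute else n.text for n in nodes]


def extract(wrapper, handler, parent_expr, parent_name, parent_id, rows, next_ids):
    """Collect one row per occurrence of the handler's element, recursively."""
    name, fragment, fields, groups = handler
    base = parent_expr + fragment
    for index in range(1, len(select(wrapper, base)) + 1):  # count(base)
        indexed = f"{base}[{index}]"
        row_id = next_ids.get(name, 1)  # per-table key counter
        next_ids[name] = row_id + 1
        values = {field: (select(wrapper, indexed + frag) or [None])[0]
                  for field, frag in fields}
        rows.append((name, row_id, parent_name, parent_id, index, values))
        for group in groups:
            extract(wrapper, group, indexed, name, row_id, rows, next_ids)


wrapper = ET.Element("wrapper")
wrapper.append(ET.fromstring(ACTIVITY_XML))
rows = []
extract(wrapper, ACTIVITY, "", None, None, rows, {})
for row in rows:
    print(row)
```

Each collected tuple holds the table name, id, parentName, parentID, index, and extracted values, in the same order as the statements of Listing Six.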

TeachEngineering Application

The methods described here have been applied in the TeachEngineering digital library. At this writing, the library contains 335 XML documents, subdivided into 13 subjects, 19 curricular units, 106 lessons, and 197 hands-on engineering activities. The relation and types tables required to process these documents and relationally store their searchable content contain 57 and 29 rows, respectively. Since multiple organizations contribute and modify documents at random times, the conversion process runs periodically; currently, once daily. Contributors check their documents into a central (CVS) repository. Processing starts with a checkout of the documents to a location under the control of an (Apache) web server. After checkout and some additional preprocessing, the conversion program reads the relation and types tables and recursively processes all XML documents, generating a new database containing the documents' searchable information. The process is implemented in Java and is available under the GNU General Public License at http://www.TeachEngineering.com/download/spider/.

The methods described here are aimed at avoiding semantic hardcoding in XML-relational conversion. For our project, this has paid off quite nicely, especially in cases where users request changes to the library's search engine. For instance, if users request additional search criteria, these must be captured from the XML documents and stored in the relational database. Using this approach, all that is required is the addition of one or more records in the types and/or relation tables and a fresh run of the conversion program. Naturally, in such a case, the search engine itself has to be extended with facilities to search using these new criteria, but on the data collection side, accommodating these changes using our method is quite simple and does not require any code changes.

DDJ



Listing One

<?xml version="1.0" ?> 
<rdf:Description rdf:about="" xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title xml:lang="en-US">A House is a House for Me</dc:title> 
<dc:creator>TeachEngineering.com</dc:creator> 
<dc:subject>adobe, climate, region, structural design, hut, igloo, lodge,
pagoda, tepee, treehouse, wigwam</dc:subject>
<dc:description xml:lang="en-US">Students brainstorm and discuss the 
different types of materials used to build houses in various 
climates. Small models of houses are built and tested against 
different climates.</dc:description> 
<dc:publisher>TeachEngineering.com</dc:publisher> 
<dc:type xsi:type="dcterms:DCMIType">Collection</dc:type> 
<dc:type>Activity</dc:type> 
<dc:format xsi:type="dcterms:IMT">text/xml</dc:format> 
<dc:identifier xsi:type="dcterms:URI">http://www.teachengineering.com/collection/wpi_/activities/wpi_a_house_for_me/a_house_for_me.xml</dc:identifier>
<dc:language xsi:type="dcterms:RFC1766">en-US</dc:language> 
<dc:coverage xsi:type="dcterms:TGN">United States</dc:coverage> 
<dc:rights>TeachEngineering.com</dc:rights> 
<dc:created xsi:type="dcterms:W3CDTF">2004-12-14</dc:created> 
</rdf:Description>


Listing Two
XML2Relational(XMLDocument) {
String titleData = XMLParse("title", XMLDocument);
String authorData = XMLParse("author", XMLDocument);
SQLTransaction("INSERT INTO documentTable VALUES ('" + titleData + "', 
'" + authorData + "')");
} 


Listing Three
<?xml version="1.0" encoding="UTF-8"?>
<activity>
...
<title>A TeachEngineering Activity</title>
<author>University of Colorado at Boulder</author>
<pub_year>2004</pub_year>
...
<text_section name="Section 1">
<text_block>A block of text in section 1.</text_block>
<text_block>Another block of text in section 1.</text_block>
</text_section>
...
<text_section name="Section 2">
<text_block>A block of text in section 2.</text_block>
<text_block>Another block of text in section 2.</text_block>
</text_section>
...
</activity>


Listing Four
XPathRootHandler ('activity', '/activity')
    XPathExpressionHandler ('title', '/title', 'no', 'string', 'varchar(200)')
    XPathExpressionHandler ('author', '/author', 'yes', 'string', 'varchar(100)')
    XPathExpressionHandler ('year', '/pub_year', 'no', 'number', 'int(10)')
    XPathGroupHandler ('text_section', '/text_section')
        XPathExpressionHandler ('text_section_name', '/@name', 'yes', 'string', 'varchar(50)')
        XPathGroupHandler ('text_block', '/text_block')
            XPathExpressionHandler ('text', '', 'yes', 'string', 'text')


Listing Five
/activity[1]/title ("A TeachEngineering Activity")
/activity[1]/author ("University of Colorado at Boulder")
/activity[1]/pub_year (2004)
count(/activity/text_section) (2)
/activity/text_section[1]/@name ("Section 1")
count(/activity/text_section[1]/text_block) (2)
/activity/text_section[1]/text_block[1] ("A block of text in section 1.")
/activity/text_section[1]/text_block[2] ("Another block of text in section 1.")
/activity/text_section[2]/@name ("Section 2")
count(/activity/text_section[2]/text_block) (2)
/activity/text_section[2]/text_block[1] ("A block of text in section 2.")
/activity/text_section[2]/text_block[2] ("Another block of text in section 2.")
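
As a rough illustration of how values like those in Listing Five follow from the document in Listing Three, the Python sketch below evaluates equivalent paths with the standard library's xml.etree.ElementTree. It is not the converter's code: the function name evaluate_paths is hypothetical, the elided parts ("...") of Listing Three are simply left out, and the section label is read from the name attribute as it appears in Listing Three.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Listing Three document (elided parts omitted).
ACTIVITY_XML = """<activity>
  <title>A TeachEngineering Activity</title>
  <author>University of Colorado at Boulder</author>
  <pub_year>2004</pub_year>
  <text_section name="Section 1">
    <text_block>A block of text in section 1.</text_block>
    <text_block>Another block of text in section 1.</text_block>
  </text_section>
  <text_section name="Section 2">
    <text_block>A block of text in section 2.</text_block>
    <text_block>Another block of text in section 2.</text_block>
  </text_section>
</activity>"""


def evaluate_paths(xml_text):
    """Return (path, value) pairs for the scalar fields and repeating groups."""
    root = ET.fromstring(xml_text)  # root is the <activity> element
    results = []
    # Single-valued fields directly under the root.
    for tag in ("title", "author", "pub_year"):
        results.append((f"/activity[1]/{tag}", root.findtext(tag)))
    # Repeating groups: count them first, then visit each by position.
    sections = root.findall("text_section")
    results.append(("count(/activity/text_section)", len(sections)))
    for i, section in enumerate(sections, start=1):
        base = f"/activity/text_section[{i}]"
        results.append((f"{base}/@name", section.get("name")))
        blocks = section.findall("text_block")
        results.append((f"count({base}/text_block)", len(blocks)))
        for j, block in enumerate(blocks, start=1):
            results.append((f"{base}/text_block[{j}]", block.text))
    return results


if __name__ == "__main__":
    for path, value in evaluate_paths(ACTIVITY_XML):
        print(path, repr(value))
```

Counting each group before indexing into it is what lets the same walk handle any number of sections and blocks.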


Listing Six
INSERT INTO activity (id, filename, title, author, year) 
VALUES (1, 'activity.xml', 'A TeachEngineering Activity', 
'University of Colorado at Boulder', 2004)
INSERT INTO text_section (id, parentName, parentID, index, text_section_name) 
VALUES (1, 'activity', 1, 1, 'Section 1')
INSERT INTO text_block (id, parentName, parentID, index, text) 
VALUES (1, 'text_section', 1, 1, 'A block of text in section 1.')
INSERT INTO text_block (id, parentName, parentID, index, text) 
VALUES (2, 'text_section', 1, 2, 'Another block of text in section 1.')
INSERT INTO text_section (id, parentName, parentID, index, text_section_name) 
VALUES (2, 'activity', 1, 2, 'Section 2')
INSERT INTO text_block (id, parentName, parentID, index, text) 
VALUES (3, 'text_section', 2, 1, 'A block of text in section 2.')
INSERT INTO text_block (id, parentName, parentID, index, text) 
VALUES (4, 'text_section', 2, 2, 'Another block of text in section 2.')
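
Rows like those in Listing Six can be derived mechanically from a declarative specification rather than from hardcoded element names. The Python sketch below shows one way this might work; it is an illustration under stated assumptions, not the converter described in the article. The nested SPEC dictionary is a hypothetical stand-in for the handler specification of Listing Four, the nesting of the groups is inferred from the parentName values in Listing Six, and the filename column is omitted.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Listing Three document (elided parts omitted).
ACTIVITY_XML = """<activity>
  <title>A TeachEngineering Activity</title>
  <author>University of Colorado at Boulder</author>
  <pub_year>2004</pub_year>
  <text_section name="Section 1">
    <text_block>A block of text in section 1.</text_block>
    <text_block>Another block of text in section 1.</text_block>
  </text_section>
  <text_section name="Section 2">
    <text_block>A block of text in section 2.</text_block>
    <text_block>Another block of text in section 2.</text_block>
  </text_section>
</activity>"""

# Hypothetical specification: each field is (column, relative path, cast).
# "" means the element's own text; "@x" means attribute x.
SPEC = {
    "table": "activity",
    "fields": [("title", "title", str), ("author", "author", str),
               ("year", "pub_year", int)],
    "groups": [{
        "table": "text_section", "path": "text_section",
        "fields": [("text_section_name", "@name", str)],
        "groups": [{
            "table": "text_block", "path": "text_block",
            "fields": [("text", "", str)],
            "groups": [],
        }],
    }],
}


def extract(node, path):
    """Resolve the tiny path notation used in SPEC against one element."""
    if path == "":
        return node.text
    if path.startswith("@"):
        return node.get(path[1:])
    return node.findtext(path)


def emit_rows(node, spec, parent, counters, rows):
    """Append one row for node, then recurse into its repeating groups."""
    table = spec["table"]
    counters[table] = counters.get(table, 0) + 1  # per-table running id
    row = {"id": counters[table]}
    if parent is not None:
        row["parentName"], row["parentID"], row["index"] = parent
    for column, path, cast in spec["fields"]:
        row[column] = cast(extract(node, path))
    rows.append((table, row))
    for group in spec["groups"]:
        children = node.findall(group["path"])
        for position, child in enumerate(children, start=1):
            emit_rows(child, group, (table, row["id"], position), counters, rows)


def xml_to_rows(xml_text, spec):
    """Return a list of (table name, row dict) pairs in document order."""
    rows = []
    emit_rows(ET.fromstring(xml_text), spec, None, {}, rows)
    return rows


if __name__ == "__main__":
    for table, row in xml_to_rows(ACTIVITY_XML, SPEC):
        print(table, row)
```

Because the element names live only in SPEC, supporting a different document type would mean writing a new specification rather than new conversion code. Turning each row into an INSERT statement is left out here; a real loader would likely use parameterized queries rather than string concatenation.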

