XSLT Querying & XML Documents

This XSLT-based method of searching XML documents is easier to use, more flexible, and has better performance than the DOM- and SAX-based methods you're probably familiar with.


December 01, 2002
URL:http://www.drdobbs.com/web-development/xslt-querying-xml-documents/184407890

Dec02: XSLT Querying & XML Documents

Giuseppe is a software developer and technical writer in London (UK) and can be reached at [email protected].


The Document Object Model (DOM) and Simple API for XML (SAX) are standard methods by which you can search XML documents without writing a specific parser in each programming language. In this article, I propose an alternative method that takes advantage of Extensible Stylesheet Language Transformations (XSLT) features to build a Java component that performs queries on XML documents. The advantages of this approach over the classic SAX/DOM strategies include ease of use, greater flexibility, and better performance.

XSLT was designed to transform XML into other forms such as HTML, XHTML, or other XML documents. An XSLT processor, using one or more eXtensible Stylesheet Language (XSL) stylesheets, performs the transformation. The stylesheets are also XML documents, written according to the XSL specification, containing tags interpretable by the XSLT processor. Usually, XSLT is applied in web-based systems to present data coming from XML in a more readable form. In this context, the web browser represents the client side of the system. It deals only with the HTML that the XSLT processor produces from the XSL stylesheet and the XML document; see Figure 1.

The XML code in Listing One (employee.xml) contains a list of employees. Each attendance tag contains the start and end work times of an employee for one day. Suppose you want to display this attendance information inside a web browser in a user-friendly format. To do this, I wrote the XSL stylesheet in Listing Two (employee.xsl), which tells the XSLT processor to create an HTML file containing all the employee details together with the attendance information.

For the purposes of this article, I am interested in a particular feature of the for-each statement. Thanks to its select attribute, you can indicate the search criteria for extracting nodes. For example, if you substitute <xsl:for-each select="employees/employee"> with <xsl:for-each select="employees/employee[surname='White']">, then only the employees with the surname "White" are shown. This lets you perform a query on an XML document using XSLT.

To specify the search criteria on the XML nodes, insert a predicate inside square brackets after the node name in the select attribute of the for-each statement. XSLT provides diverse operators and functions to compare values. In the previous example, I used the "=" operator. You can also use the contains function, which returns true if its first argument contains the second as a substring. By using <xsl:for-each select="employees/employee[contains(surname,'White')]">, you get each employee whose surname contains the string "White." Numeric operators can also be used. If you are looking for employees whose age is greater than 30, you can write the following, assuming each employee had an age child element: <xsl:for-each select="employees/employee[age &gt; 30]">.
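The predicates described above can be tried out with a few lines of JAXP. The following is only a minimal sketch: the class name PredicateDemo, the names method, and the cut-down sample data are assumptions made for illustration, and because Listing One has no age element, the numeric comparison is shown on the id attribute instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the article's code.
class PredicateDemo {
    // Cut-down sample data modelled on Listing One.
    static final String XML =
        "<employees>"
      + "<employee id='10808'><surname>White</surname><name>Liza</name></employee>"
      + "<employee id='10990'><surname>Hill</surname><name>James</name></employee>"
      + "<employee id='11145'><surname>Hill</surname><name>Marion</name></employee>"
      + "</employees>";

    // Runs a for-each over the nodes matched by 'select' and returns
    // the first names of the matches, each followed by ';'.
    static String names(String select) {
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select=\"" + select + "\">"
          + "<xsl:value-of select='name'/><xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            trans.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Equality test on a child element: prints James;Marion;
        System.out.println(names("employees/employee[surname='Hill']"));
        // Substring test: prints Liza;
        System.out.println(names("employees/employee[contains(surname,'Whi')]"));
        // Numeric comparison on the id attribute: prints James;Marion;
        System.out.println(names("employees/employee[@id &gt; 10900]"));
    }
}
```

The stylesheet uses the text output method only to keep the result easy to read on the console; the predicates behave the same way with HTML output.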

To take advantage of XSLT features for performing a query on XML nodes, you need to implement a Java class (say, XMLQueryManager) whose instances are related one-to-one to XML documents. The class has a method (called query) that performs a query on the related XML. The input is a string representing the query statement written in XSL syntax. The method creates, on the fly, an XSL stylesheet that performs the required query.

Suppose that the instance of XMLQueryManager works on the employee.xml file. You could pass the employees/employee[surname='White'] statement to the query method, which creates an XSL stylesheet similar to that of Listing Two, except that the for-each command on the employee node will be <xsl:for-each select="employees/employee[surname='White']">. Next, the query method calls the XSLT processor, which performs the transformation into HTML code. Finally, the HTML code is returned. You can now imagine the following scenario: a web-based application accepts requests from the web browser (for example, "take all the employees whose surname is White"). It then sends this information to a servlet (or JSP) that, using XMLQueryManager, returns the HTML code showing the result; see Figure 2.

The XMLQueryManager Class

Before you start to implement XMLQueryManager, you have to make some choices. First, you have to decide which XSLT processor to use. In this implementation, I used Xalan 2, an open-source Java XSLT processor produced by the Apache XML Project (http://www.apache.org/). This XSLT processor follows the Sun JAXP 1.2 specifications and is no longer compatible with the first Xalan version. JAXP 1.2 is also included in the new J2SE 1.4. Thus, if you use this version of Java, you don't need to download Xalan 2.

You must also have a way to generate the XSLT stylesheet on the fly. One approach would be to hard-code XSL commands into the query method, but this would not be a good general-purpose solution. A better solution is to store the XSL code in a template file: an XSL file in which the query statement is replaced at run time by the XMLQueryManager class. (The complete implementation of the XMLQueryManager class is available electronically; see "Resource Center," page 5.) To get an XSL template file from the stylesheet in Listing Two, you replace the statement <xsl:for-each select="employees/employee[surname='White']"> with <xsl:for-each select="$">. The $ symbol is replaced at run time with the search-criteria string. The resulting XSL stylesheet and XML file become inputs to the XSLT processor. The class has two private attributes, _xmlSource and _xmlFileTemplate, which the constructor initializes.

The XMLQueryManager constructor accepts the name of the XML file and the name of the XSL template file as input. With the first parameter, the constructor creates a StreamSource object and stores it in the _xmlSource private attribute. The constructor also stores the xslFile parameter in the _xmlFileTemplate attribute, which is later used by the query method.

The query method accepts the XSL query statement (xslStatement) and creates the concrete XSL stylesheet calling the makeXSL private method. This method returns the name of the XSL stylesheet created from the XSL template file and the query statement. The name of the XSLT stylesheet is then used to build the StreamSource object that the XSLT processor needs.

Before you call the processor, you must specify the result format. I used a file to store the result of the transformation. The name of the file is built on the fly by the query method, which concatenates the string "result" with the system time in milliseconds so that file names from successive calls do not clash. The file name is used to create a StreamResult object. This class implements the Result interface and plays the same role as StreamSource; the only difference is that StreamResult is used for the output, whereas StreamSource is used for the input.

At this point, you are ready to invoke the XSLT processor. The code that makes it possible is:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer trans = factory.newTransformer(xslSource);
trans.transform(_xmlSource, result);

Thanks to the transform method, the JAXP Transformer class performs the XSLT processing. It needs a Source and a Result as input. To get a Transformer instance, you first obtain a TransformerFactory by calling TransformerFactory.newInstance, then call its newTransformer method, passing the Source that represents the concrete XSL stylesheet. After the invocation of the transform method, you can find the output file in the directory where your application is working. Finally, the query method reads this file into a string using the fileToString private method and returns it.
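The file-based flow just described (stylesheet file in, result file out, result read back into a string) might look roughly like the sketch below. It is not the XMLQueryManager code, which is only available electronically; the class name FileTransformSketch and its transform and demo methods are hypothetical, and the result file is created with Files.createTempFile rather than named "result" plus the system time.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; names and structure are assumptions.
class FileTransformSketch {
    // Transforms xmlFile with xslFile through a temporary result file
    // and returns the content of that file as a string.
    static String transform(Path xmlFile, Path xslFile) {
        try {
            Path resultFile = Files.createTempFile("result", ".out");
            try {
                TransformerFactory factory = TransformerFactory.newInstance();
                Transformer trans = factory.newTransformer(new StreamSource(xslFile.toFile()));
                try (OutputStream out = Files.newOutputStream(resultFile)) {
                    trans.transform(new StreamSource(xmlFile.toFile()), new StreamResult(out));
                }
                return new String(Files.readAllBytes(resultFile), StandardCharsets.UTF_8);
            } finally {
                // The result file is only an intermediate artifact.
                Files.deleteIfExists(resultFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes a tiny XML document and stylesheet to temporary files and transforms them.
    static String demo() {
        String xml = "<employees><employee><surname>White</surname></employee></employees>";
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text' encoding='UTF-8'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='employees/employee'><xsl:value-of select='surname'/></xsl:for-each>"
          + "</xsl:template></xsl:stylesheet>";
        try {
            Path xmlFile = Files.createTempFile("employee", ".xml");
            Path xslFile = Files.createTempFile("employee", ".xsl");
            try {
                Files.write(xmlFile, xml.getBytes(StandardCharsets.UTF_8));
                Files.write(xslFile, xsl.getBytes(StandardCharsets.UTF_8));
                return transform(xmlFile, xslFile);
            } finally {
                Files.deleteIfExists(xmlFile);
                Files.deleteIfExists(xslFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints White
    }
}
```

Writing the result through an OutputStream and deleting it in a finally block keeps the temporary file from lingering if the transformation fails.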

Before returning, the query method deletes the XSL stylesheet that was temporarily generated for the XSLT processor. The fileToString method also deletes the result file after its content is placed in the string.

In summary, the makeXSL method creates a concrete XSL stylesheet starting from the XSL template, replaces the $ symbol with the xslStatement input string (see the replace method), and then returns the name of the created file.
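The template substitution summarized above can be sketched as follows. This is an illustration under stated assumptions, not the author's makeXSL: the class TemplateFiller and the methods fill and makeXsl are hypothetical names, and the concrete stylesheet is written to a file created with Files.createTempFile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; names are assumptions, not the article's code.
class TemplateFiller {
    // Replaces the first "$" placeholder in the template text with the query statement.
    static String fill(String template, String xslStatement) {
        int pos = template.indexOf('$');
        if (pos < 0) {
            throw new IllegalArgumentException("template has no $ placeholder");
        }
        return template.substring(0, pos) + xslStatement + template.substring(pos + 1);
    }

    // Reads the template file, fills in the statement, writes the concrete
    // stylesheet to a new temporary file, and returns the path of that file.
    static Path makeXsl(Path templateFile, String xslStatement) {
        try {
            String template = new String(Files.readAllBytes(templateFile), StandardCharsets.UTF_8);
            Path concrete = Files.createTempFile("query", ".xsl");
            Files.write(concrete, fill(template, xslStatement).getBytes(StandardCharsets.UTF_8));
            return concrete;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "<xsl:for-each select=\"$\">";
        // prints <xsl:for-each select="employees/employee[surname='White']">
        System.out.println(fill(line, "employees/employee[surname='White']"));
    }
}
```

The sketch uses indexOf and substring instead of String.replaceAll because "$" has a special meaning in Java regular expressions. Since the statement lands inside a double-quoted attribute, it must not itself contain a double quote.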

The Employee Servlet

The XMLQueryManager class can be used from any Java application. You can make a traditional server application using the XSL template file and the XML file as input, and send the result of the XSL transformation to a client application. This type of application can be useful in web programming, where the server application making the transformation could be a servlet and the client could be the web browser.

For example, I use a straightforward web-based application that, by means of an HTML page, takes an input string representing the surname of an employee, then passes this string to a servlet. The servlet builds the right XSL query statement starting with the surname of the employee, and creates an instance of XMLQueryManager associated with the employee XML document using the XSL template in Listing Two with the $ symbol instead of the query statement.

The generated XSL query statement is passed to the query method of the XMLQueryManager class. It returns a string representing the result of the transformation that is sent to the web browser as XML or HTML code, depending on the implementation of the XSL template. In this case, the code is HTML.

For this application, you start by implementing the HTML page that lets the user enter the surname of the employee. It is a small HTML page that shows a submission form (available electronically). When the Submit button is pressed, the string in the text field is sent to the EmployeeServlet servlet using the HTTP GET method.

The code for the EmployeeServlet servlet is available electronically. The servlet sets the content type of the response object to text/html and gets an instance of PrintWriter (variable out). Then it gets the surname parameter sent by the web browser. At this point, the servlet builds an instance of XMLQueryManager. The constructor takes the XML file name and the XSL template file name (employee.xml and employee.xsl), and then the query method is called. The XSL query statement is built inside the servlet using the surname parameter: "employees/employee[contains(surname,'" + surname + "')]", which makes the XSLT processor include in the result only the employees/employee nodes whose surname element contains the string held in the surname Java variable.
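Because the surname comes straight from the browser and is spliced into both an XPath string literal and an XML attribute, it is worth checking it before building the statement. The sketch below shows one possible way to do so; the class SurnameQuery and its build method are hypothetical, and the servlet code available electronically may handle this differently.

```java
// Hypothetical helper; not part of the article's servlet.
class SurnameQuery {
    // Builds the query statement for a surname, rejecting characters that
    // would break the XPath literal or the select attribute it is placed in.
    static String build(String surname) {
        if (surname == null || surname.isEmpty()) {
            throw new IllegalArgumentException("surname is required");
        }
        for (char c : surname.toCharArray()) {
            // ' would end the XPath string literal, " would end the attribute,
            // and < or & are not allowed unescaped in an XML attribute value.
            if (c == '\'' || c == '"' || c == '<' || c == '&') {
                throw new IllegalArgumentException("unsupported character in surname: " + c);
            }
        }
        return "employees/employee[contains(surname,'" + surname + "')]";
    }

    public static void main(String[] args) {
        // prints employees/employee[contains(surname,'White')]
        System.out.println(build("White"));
    }
}
```

Rejecting the apostrophe is a deliberate simplification: XPath 1.0 string literals have no escape syntax, so a surname such as O'Brien would need a different construction (for example, the concat function) that this sketch does not attempt.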

Extendibility

I introduced XMLQueryManager to present the possibilities of using an XSLT-based component to perform queries on XML documents. It is straightforward and much can be done to improve this method.

For example, you can add multiquery parameter features to the XMLQueryManager class. As you have seen, XMLQueryManager only works with one query statement, which replaces the $ symbol in the XSL template file. You can modify the class to accept two or more query statements; in the XSL template, you can put more parameters ($1, $2, $3, and so on). A setQueryStatement method would accept a number and a query-statement string, storing these pairs in an array. In this way, the query method would be able to replace the parameters in the XSL template file with the values present in the array.
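One way the numbered placeholders could be handled is sketched below. Everything here is an assumption made for illustration: the class MultiQueryTemplate, the fill method, and the use of a Map instead of an array to hold the number/statement pairs. The template is scanned once from left to right so that $1 is never mistaken for the start of $10.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the multiquery extension; names are assumptions.
class MultiQueryTemplate {
    private final Map<Integer, String> statements = new HashMap<>();

    // Associates a query statement with the placeholder $number.
    void setQueryStatement(int number, String statement) {
        statements.put(number, statement);
    }

    // Replaces every $N placeholder in the template with its statement.
    // A "$" that is not followed by a digit is copied through unchanged.
    String fill(String template) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '$') {
                int j = i + 1;
                while (j < template.length() && Character.isDigit(template.charAt(j))) {
                    j++;
                }
                if (j > i + 1) {
                    int number = Integer.parseInt(template.substring(i + 1, j));
                    String statement = statements.get(number);
                    if (statement == null) {
                        throw new IllegalStateException("no statement set for $" + number);
                    }
                    out.append(statement);
                    i = j;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MultiQueryTemplate t = new MultiQueryTemplate();
        t.setQueryStatement(1, "employees/employee[surname='Hill']");
        t.setQueryStatement(2, "attendance[date='01/03/2002']");
        System.out.println(t.fill("<xsl:for-each select=\"$1\"><xsl:for-each select=\"$2\"/></xsl:for-each>"));
    }
}
```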

Another improvement would be to accept the input and return the output in other forms, such as a String, a DOM tree, or SAX events.

DDJ

Listing One

//  The employee.xml file
<?xml version="1.0" encoding="UTF-8"?>
<employees>
<employee id="10808">
  <surname>White</surname>
  <name>Liza</name>
  <birthday>02/28/1976</birthday>
  <attendance>
    <date>01/02/2002</date>
    <starttime>9:08 am</starttime>
    <endtime>18:45 pm</endtime>
  </attendance>
  <attendance>
    <date>01/03/2002</date>
    <starttime>8:54 am</starttime>
    <endtime>17:55 pm</endtime>
  </attendance>
  <attendance>
    <date>01/04/2002</date>
    <starttime>9:13 am</starttime>
    <endtime>19:21 pm</endtime>
  </attendance>
</employee>

<employee id="10990">
  <surname>Hill</surname>
  <name>James</name>
  <birthday>01/16/1979</birthday>
  <attendance>
    <date>01/02/2002</date>
    <starttime>7:43 am</starttime>
    <endtime>17:25 pm</endtime>
  </attendance>
  <attendance>
    <date>01/03/2002</date>
    <starttime>8:02 am</starttime>
    <endtime>17:11 pm</endtime>
  </attendance>
  <attendance>
    <date>01/04/2002</date>
    <starttime>8:11 am</starttime>
    <endtime>17:41 pm</endtime>
  </attendance>  
</employee>
<employee id="11145">
  <surname>Hill</surname>
  <name>Marion</name>
  <birthday>03/04/1968</birthday>
  <attendance>
    <date>01/02/2002</date>
   <starttime>7:50 am</starttime>
    <endtime>17:12 pm</endtime>
  </attendance>
  <attendance>
    <date>01/03/2002</date>
    <starttime>8:45 am</starttime>
    <endtime>17:54 pm</endtime>
  </attendance>
  <attendance>
    <date>01/04/2002</date>
    <starttime>8:34 am</starttime>
    <endtime>17:34 pm</endtime>
  </attendance>  
</employee>
</employees>


Listing Two

// The employee.xsl file
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/">
  <html>
  <body>    
    <xsl:for-each select="$">
     <h2>
     <xsl:value-of select="@id"/>
     <xsl:text> </xsl:text>
     <xsl:value-of select="name"/>
     <xsl:text> </xsl:text>
     <xsl:value-of select="surname"/>
     <xsl:text> </xsl:text>   
     <xsl:value-of select="birthday"/>     
     </h2>
     <table border="2" width="50%">
     <tr bgcolor="yellow">
     <td width="34%"><b>Date</b></td>
     <td width="33%"><b>Start Time</b></td>
     <td width="33%"><b>End Time</b></td>
     </tr>
     <xsl:for-each select="attendance">
       <tr>
       <td><xsl:value-of select="date"/></td>
       <td><xsl:value-of select="starttime"/></td>
       <td><xsl:value-of select="endtime"/></td>
       </tr>
     </xsl:for-each>    
    </table>
    <br/><br/> 
    </xsl:for-each>    
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>


Figure 1: The XSLT processor has, as input, the XSL stylesheet and XML document.


Figure 2: Proposed XSLT-servlet architecture.
