January 01, 2002
URL:http://www.drdobbs.com/xsl-the-extensible-style-language-web-te/184414038
Example 1
<?xml version='1.0'?>
<doc>
  <title>My Document</title>
  <para>This is a <em>short</em> document.</para>
  <para>It only exists to <em>demonstrate a <em>simple</em> XML document</em>.</para>
  <figure>
    <title>My Figure</title>
    <graphic fileref="myfig.gif"/>
  </figure>
</doc>
Table 1
Some equivalences between CSS2 selectors and XSL patterns.
CSS2 SELECTOR | XSL PATTERN | DEFINITION |
title | title | Any title |
doc>title | doc/title | A document title (title is a direct child of doc) |
doc title | doc//title | A title that is any descendant of doc |
em[role] | em[role] | An em element with a role attribute |
em[role="bold"] | em[role="bold"] | An em element when role is "bold" |
listitem:first-child | listitem[first-of-type()] | The first item of a list |
n/a | listitem[last-of-type()] | The last item of a list |
n/a | equation[child(title)] | An equation that has a title |
corpauthor + author | n/a | An author preceded by a corpauthor |
Example 2
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template pattern="doc">
    <HTML>
      <HEAD>
        <TITLE>A Document</TITLE>
      </HEAD>
      <BODY>
        <xsl:process-children/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template pattern="title">
    <H1>
      <xsl:process-children/>
    </H1>
  </xsl:template>
  <!-- this stylesheet handles only a subset of the sample document -->
</xsl:stylesheet>
Example 3
<HTML>
  <HEAD>
    <TITLE>A Document</TITLE>
  </HEAD>
  <BODY>
    <H1>My Document</H1>
    <P>This is a <I>short</I> document.</P>
    <P>It only exists to <I>demonstrate a <B>simple</B> XML document</I>.</P>
    <DIV>
      <B>Figure 1.</B><BR/>
      <IMG src="myfig.gif"/><BR/>
      <B>My Figure</B>
    </DIV>
  </BODY>
</HTML>
XSL The Extensible Style Language
Styling XML Documents
By Norman Walsh
From the earliest days of the Web, we've been using essentially the same set of tags in our documents. Web pages written in HTML use HTML tags and the meaning of those tags is well understood: <H1> makes a heading, <IMG> loads a graphic, <OL> starts an ordered list, and so on. The number of tags has slowly grown, and there have been numerous browser-compatibility issues, but the basic tag set is still the same.
There's a significant benefit to a fixed tag set with fixed semantics: portability. A Web page that uses the standard tags can be viewed by just about any browser, anywhere in the world. However, HTML is very confining; Web designers want more control over presentation and many processes would benefit from more descriptive tagging.
Enter XML. With XML, we can use any tags we want. We can write documents using our own tag names -- names that are meaningful in the context of our subject matter and offer the possibility of far greater control over presentation. But this freedom comes at a price: XML tag names have no predefined semantics. An <H1> might just as legitimately identify a tall hedge as a first-level heading. Is <IMG> an image, or an imaginary number? Who knows?
The style sheet knows. From the very beginning of the XML effort, it was recognized that in order to successfully send XML documents over the Web, it would be necessary to have a standard mechanism for describing how they were to be presented. That's why we need style sheets.
The Extensible Style Language (XSL) is the style language for XML. At the time of this writing (October 1998), XSL is under active development by the W3C. On August 18, 1998, the XSL Working Group (WG) released its first Working Draft. This article introduces XSL as described in that document. (Visit www.w3.org/TR/WD-xsl to view the Working Draft for yourself.)
By the time this article is published, a second Working Draft may be available. It doesn't seem likely that any of the topics covered here will change substantially between the first and second Working Drafts, but it's always possible.
What Does a Style Sheet Do?
In simplest terms, a style sheet contains instructions that tell a processor (such as a Web browser, print composition engine, or document reader) how to translate the logical structure of a source document into a presentational structure.
Style sheets typically contain instructions like these:
- Display hypertext links in blue.
- Start chapters on a new, left-hand page.
- Number figures sequentially throughout the document.
- Speak emphasized text in a slightly louder voice.
Many style-sheet languages augment the presentation of elements that have a built-in semantic meaning. For example, a Microsoft Word paragraph style can change the presentation of a paragraph, but even without the style, Word knows that the object in question is a paragraph.
The challenge for XSL is slightly greater. Because there's no underlying semantic to augment for XML, XSL must specify how each element should be presented and what the element is. For this reason, XSL defines not only a language for expressing style sheets, but also a vocabulary of "formatting objects" that have the necessary base semantics.
For the purpose of this article, we're going to consider a simple XML document, shown in Example 1.
This document contains only a few elements: doc defines the document element; title defines titles; para defines paragraphs; em indicates emphasis; figure and graphic define external graphics.
How Does XSL Work?
Before discussing XSL in more detail, it's necessary to consider the XSL processing model. An XSL processor begins with a style sheet and a "source tree." The source tree is the tree representation of the parsed XML source document. All XML documents can be represented as trees.
Conceptually, the XSL processor begins at the root node in the source tree and processes it by finding the template in the style sheet that describes how that element should be displayed. Each node is then processed in turn until there are no more nodes left to be processed. (In fact, it's a little more complicated than this because each template can specify which nodes to process, so some nodes may be processed more than once and some may not be processed at all. We'll examine this later.)
The product of all this processing is a "result tree." If the result tree is composed of XSL formatting objects, then it describes how to present the source document. It's a feature of XSL that the result tree doesn't have to be composed of XSL formatting objects -- it can be composed of any elements. One common alternative to XSL formatting objects will be HTML element names. When HTML is used in the result tree, XSL will transform an XML source document into an XML document that looks very much like HTML. It's important to realize, however, that the result is XML, not HTML. In particular, empty elements will use the XML empty-element syntax, and it's impossible to produce documents that are not well-formed XML.
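The processing model described in this section can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions, not an XSL processor: templates are plain Python functions keyed by element name (the names TEMPLATES, process, and process_children are ours), the result is built as a string rather than a tree, and an element with no template simply has its children processed, which is our assumption for the sketch.

```python
import xml.etree.ElementTree as ET

# A shortened version of the sample document, used only for this sketch.
SOURCE = (
    "<doc><title>My Document</title>"
    "<para>This is a <em>short</em> document.</para></doc>"
)


def process_children(node, templates):
    """Process the text and child elements of `node` in document order."""
    parts = [node.text or ""]
    for child in node:
        parts.append(process(child, templates))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


def process(node, templates):
    """Find the template registered for `node` and instantiate it."""
    # Assumed fallback for this sketch: no template means "just recurse".
    template = templates.get(node.tag, process_children)
    return template(node, templates)


# Hypothetical stand-ins for xsl:template elements: each wraps the
# processed children in some literal result markup.
TEMPLATES = {
    "doc": lambda n, t: "<HTML><BODY>" + process_children(n, t) + "</BODY></HTML>",
    "title": lambda n, t: "<H1>" + process_children(n, t) + "</H1>",
    "para": lambda n, t: "<P>" + process_children(n, t) + "</P>",
    "em": lambda n, t: "<I>" + process_children(n, t) + "</I>",
}

result = process(ET.fromstring(SOURCE), TEMPLATES)
print(result)
```

Each function plays the part of one template, and the call to process_children marks the spot where the results for the child nodes are spliced in.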
What Does XSL Look Like?
XSL style sheets are XML documents. A short XSL style sheet can be seen in Example 2. This style sheet transforms source documents like the XML document in Example 1 into HTML. A style sheet is contained within a stylesheet element and contains template elements. (Style sheets can contain a small handful of other elements in addition to template, but most style sheets consist mostly of templates.)
Don't worry if this looks a little confusing at first. There's a lot going on. We'll revisit this style sheet in the "Understanding XSL" section.
One thing that stands out in an XSL style sheet is the use of namespaces. Defined in the W3C's "Namespaces in XML" (www.w3.org/TR/WD-xml-names), namespaces are what all the colon-delimited prefixes are about.
In XSL, there can be no reserved element names, so it's necessary to use some other mechanism to distinguish between elements that have XSL semantics and other elements. This is the problem that namespaces were designed to solve.
If you're not familiar with namespaces, here are some simple guidelines:
- The prefix is significant when comparing element names; therefore xsl:template and template are different.
- The prefix string is arbitrary. What's important is the association of a prefix string with a URI. That's the function of the "xmlns:" attribute on the stylesheet element.
- The attribute xmlns:xsl="http://www.w3.org/TR/WD-xsl" associates the namespace prefix "xsl" with the URI that follows it ("http://www.w3.org/TR/WD-xsl"). If it were instead xmlns:xyzzy="http://www.w3.org/TR/WD-xsl" then the prefix xyzzy: would replace every instance of xsl: in the example, and the style sheet would be exactly the same.
- From the preceding points, it follows that xsl:template and xyz:template are different (unless the two namespace prefixes are associated with the same URI).
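These guidelines can be checked with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which reports each element name with its namespace URI in place of the prefix. The three tiny style sheets are made up for the illustration.

```python
import xml.etree.ElementTree as ET

XSL_URI = "http://www.w3.org/TR/WD-xsl"

# The same style sheet skeleton written with two different prefixes,
# both bound to the same namespace URI.
with_xsl = ET.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSL_URI}">'
    '<xsl:template pattern="doc"/></xsl:stylesheet>'
)
with_xyzzy = ET.fromstring(
    f'<xyzzy:stylesheet xmlns:xyzzy="{XSL_URI}">'
    '<xyzzy:template pattern="doc"/></xyzzy:stylesheet>'
)
# No prefix and no namespace declaration at all.
no_prefix = ET.fromstring('<stylesheet><template pattern="doc"/></stylesheet>')

# ElementTree expands names to "{namespace-uri}local-name".
print(with_xsl[0].tag)

same_name = with_xsl[0].tag == with_xyzzy[0].tag       # prefix string is arbitrary
different_name = with_xsl[0].tag != no_prefix[0].tag   # unqualified name differs
```

The comparison is made on the URI and the local name together, which is why xsl:template and xyzzy:template match here while the unprefixed template does not.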
Comparing XSL and CSS
XSL and Cascading Style Sheets (CSS) have similar goals, and it's useful to compare them. XSL is more powerful than CSS in many ways, but it's also more complex. XSL and CSS are not competitors. For some common applications (like HTML+ documents that use mostly HTML but have a few extra non-HTML tags thrown in), CSS will be the easiest solution. For others, the manipulative power of XSL will be required.
Although very different, XSL and CSS have two things in common: Each provides a mechanism for selecting elements and for specifying how the selected elements are to be presented. CSS uses selectors and properties in this way:
selector { properties; }
XSL uses patterns and formatting objects:
<xsl:template pattern="pattern">
<formatting objects/>
</xsl:template>
Selectors and Patterns. CSS2 (which is considerably more complex than CSS1 with respect to selectors) and XSL each provide a fairly rich set of features for selecting elements. Table 1 compares a few CSS2 selectors and XSL patterns.
Much more complex XSL patterns are also possible. For example, this XSL pattern selects an item, other than the first, of a bulleted list in an appendix:
appendix//list[type="bullet", child(title)]/listitem[not-first-of-type()]
Properties and Formatting Objects. CSS properties let you specify a wide range of display characteristics for an element. These properties are "decoration" on the source tree. However, in XSL, you must specify both the result object and its properties.
For example, the following CSS fragment formats a quote as an indented block with some font changes:
quote { display: block; font-size: 90%; margin-left: 0.5in; margin-right: 0.5in; }
In XSL, the same formatting could be achieved with XSL formatting objects using this template:
<xsl:template pattern="quote">
<fo:block font-size="90%" indent-start="0.5in" indent-end="0.5in">
<xsl:process-children/>
</fo:block>
</xsl:template>
The advantage of both constructing a new object and applying properties to it can be seen when you consider the things that you can't do with CSS properties alone:
- change the order of elements for display;
- process elements more than once;
- suppress elements in one place and present them in another;
- add generated text to the presentation (CSS2 introduced a simple form of pre- and post-element generated text, but falls short of solving the general problem).
Consider the task of presenting names in "Last, First" format. Given this source element:
<author>
<firstname>Norman</firstname>
<surname>Walsh</surname>
</author>
You need the powerful capabilities of XSL to obtain the desired result:
<xsl:template pattern="author">
<xsl:sequence>
<xsl:process select="surname"/>
<xsl:text>, </xsl:text>
<xsl:process select="firstname"/>
</xsl:sequence>
</xsl:template>
With CSS, you can apply properties to the <firstname> and <surname> elements, but there is no way to reorder them.
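To make the reordering concrete, here is a rough Python equivalent of what the author template does. It is a sketch rather than XSL, and format_author is a name we made up: it picks out the two children by name and emits them in a different order from the source, with generated text in between.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring(
    "<author><firstname>Norman</firstname><surname>Walsh</surname></author>"
)


def format_author(node):
    """Emit the surname first, then generated text, then the first name."""
    surname = node.findtext("surname", default="")
    firstname = node.findtext("firstname", default="")
    return f"{surname}, {firstname}"


formatted = format_author(author)
print(formatted)
```

The output order is decided by the template, not by the order of the elements in the source, and that is the capability CSS properties alone cannot provide.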
XSL formatting objects are being developed in coordination with the Cascading Style Sheets and Formatting Properties (CSS/FP) Working Group (www.w3.org/Style/Activity). The goal of this coordinated effort is to define a single formatting model for both systems. Using these formatting objects, it will be possible to write style sheets that can be rendered on many different devices with reasonably comparable results.
At present, the Working Draft does little more than lay the groundwork for future drafts. It describes a number of formatting objects and outlines their formatting semantics. Most of the formatting objects draw their semantics from a combination of the Document Style Semantics and Specification Language (DSSSL, defined by ISO/IEC 10179:1996) and CSS formatting models. With considerable effort and substantial success, a first attempt at harmonizing these two formatting models has been completed. Over subsequent drafts, these semantics will be harmonized further.
When XSL is complete, XSL formatting objects will provide a device-independent representation for online and print publishing that will include support for sophisticated features such as layout-driven formatting.
The following is a list of common formatting objects defined by the first XSL Working Draft:
- page-sequence defines a sequence of pages. The formatting of pages in a sequence is described by the page master. Currently only a simple-page-master is defined, sufficient for simple, single-column Web or print publishing.
- queue gathers content for later insertion into an area or set of areas.
- sequence is a general wrapper for inline or block content. A sequence provides a wrapper on which shared, inherited properties can be hung.
- block represents a block of text. Paragraphs, titles, and figure captions are all examples of blocks.
- list defines a list. List elements contain list-item elements which further contain a list-item-label and a list-item-body.
- graphic holds an image or vector graphic.
- link defines a link. A link-end-locator defines the target of a link.
Understanding XSL
With that background, let's take a closer look at the style sheet in Example 2. XSL contains many more features than can be covered in an article of this size. We'll consider just the features needed to write a simple style sheet for the sample XML document in Example 1.
In order to display the sample document, we must handle five cases:
1. the document element,
2. the document title,
3. paragraphs,
4. emphasis (can be nested),
5. figures.
In this example, we'll use XSL to transform our XML document into HTML (see Example 3). Each template in our style sheet "instantiates" a small part of the result tree. XSL knits all of these fragments together to form the complete result tree.
The Document Element. Since we know that the document element, doc, always comes first, we'll use it to build the basic structure of our HTML page. That's what the following rule does:
<xsl:template pattern="doc">
<HTML>
<HEAD>
<TITLE>A Document</TITLE>
</HEAD>
<BODY>
<xsl:process-children/>
</BODY>
</HTML>
</xsl:template>
Every element in the template is either an XSL processing instruction or is copied literally into the result tree. In this rule, each element is copied into the result tree until xsl:process-children is encountered.
When xsl:process-children is encountered, the XSL processor processes each of the children of the current node. For each node, it finds the matching template and instantiates it. The sequence of instantiated templates is placed in the result tree at the location of the xsl:process-children element in the template.
It's perfectly legitimate for a template to contain more than one occurrence of xsl:process-children. However, the same processing is performed each time.
The Document Title. For the document title, we simply want to output an <H1>:
<xsl:template pattern="doc/title">
<H1>
<xsl:process-children/>
</H1>
</xsl:template>
Note that we've used the pattern "doc/title", which distinguishes document titles from figure titles.
Example 2 can be extended with the following templates. A style sheet that incorporates all the templates listed is available online; see "Source-Code Availability" on page 3.
Paragraphs. Formatting paragraphs is easy:
<xsl:template pattern="para">
<P>
<xsl:process-children/>
</P>
</xsl:template>
Emphasis. Designating emphasis is a little more interesting because it can be nested. The following template handles the simple, unnested case:
<xsl:template pattern="em">
<I>
<xsl:process-children/>
</I>
</xsl:template>
If this is the only template for em, the result will be nested <I> tags in the output. We could rely on the browser to handle this case, but let's not. The following rule applies boldface to text that is nested within an already emphasized text segment:
<xsl:template pattern="em/em">
<B>
<xsl:process-children/>
</B>
</xsl:template>
If necessary, additional rules could be added for triply nested emphasis and beyond.
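The effect of having both an "em" and an "em/em" template can be sketched in Python. The sketch assumes, as the output in Example 3 implies, that the more specific pattern is the one applied to a nested em. The render function and its parent_tag argument are our own illustration, not part of XSL.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para>It only exists to <em>demonstrate a <em>simple</em>"
    " XML document</em>.</para>"
)


def render(node, parent_tag=None):
    """Render `node`, choosing markup from its name and its parent's name."""
    # Process the children first, passing this node's name down as context.
    inner = (node.text or "") + "".join(
        render(child, node.tag) + (child.tail or "") for child in node
    )
    if node.tag == "em" and parent_tag == "em":  # plays the role of "em/em"
        return f"<B>{inner}</B>"
    if node.tag == "em":                         # plays the role of "em"
        return f"<I>{inner}</I>"
    if node.tag == "para":                       # plays the role of "para"
        return f"<P>{inner}</P>"
    return inner


html = render(para)
print(html)
```

The only difference between the two em cases is the parent test, which is exactly what the extra step in the "em/em" pattern expresses.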
Figures. Presentation of figures involves a bit more processing. The goal is to enumerate the figures in a document and present the figure title as a caption below the graphic (although it appears before the graphic in the source document).
Here's the template for figure:
<xsl:template pattern="figure">
<DIV>
<B>Figure <xsl:number level="any" count="figure"/>.</B><BR/>
<xsl:process select="graphic"/>
<xsl:process select="title"/>
</DIV>
</xsl:template>
The figure template begins by constructing a DIV. Every template must construct a single fragment of the result tree, so there must be a top level wrapper for everything in the figure template. In HTML, DIV and SPAN are reasonable wrappers; in XSL, sequence serves this role.
Next we output the word "Figure" and use xsl:number to output the figure number. The xsl:number processing instruction counts elements in the source tree. With xsl:number you can select single or multilevel numbering, which nodes to count, where to start counting, and the format of the resulting number. In this case, we're counting figure nodes anywhere in the document (preceding the current node). If our document were divided into sections or chapters, we might wish to count figures only within the current section. The result will be an Arabic number (1, 2, and so on) since we did not specify a format.
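The counting that level="any" performs can be approximated as follows. This Python sketch assumes that the number is the count of figure elements up to and including the current one in document order, which is what the "Figure 1." in Example 3 suggests. The figure_number function and the small test document are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up document with figures at different depths.
doc = ET.fromstring(
    "<doc><para/><figure id='a'/>"
    "<section><figure id='b'/></section>"
    "<figure id='c'/></doc>"
)


def figure_number(root, target, count="figure"):
    """Count `count` elements in document order, up to and including `target`."""
    number = 0
    for node in root.iter():  # iter() walks the tree in document order
        if node.tag == count:
            number += 1
        if node is target:
            return number
    raise ValueError("target is not in the tree")


numbers = [figure_number(doc, fig) for fig in doc.iter("figure")]
print(numbers)
```

Because the walk ignores nesting depth, the figure inside the section is numbered in sequence with the others; counting only within the current section would mean starting the walk at that section instead of at the root.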
The xsl:process instruction processes only selected children (or selected nodes from elsewhere in the tree). The xsl:process element has a required select attribute. All of the elements in the source tree that match the pattern specified in the select attribute are processed, and their instantiated templates are inserted into the result tree at the location of the xsl:process element. By default, the select pattern is "anchored" at the current node, but there are facilities for relative and absolute positioning to move the anchor elsewhere in the tree.
First the graphic element is processed, then the title. Technically, these elements process all graphics and all titles within the figure. If multiple graphics or titles were provided, a more complex select pattern would be required to process only the first. (See the "Suggested Exercises" section.)
Formatting Graphics. The graphic element must be transformed into an IMG tag. Note that the IMG tag is empty and must therefore use XML empty-element syntax:
<xsl:template pattern="graphic">
<SPAN>
<IMG
src="{attribute(fileref)}"/>
<BR/>
</SPAN>
</xsl:template>
The interesting point here is the use of curly braces in the src attribute. XSL provides the xsl:value-of instruction for computing generated text. Since elements cannot occur in attributes, curly braces in an attribute value are treated as calls to xsl:value-of.
The xsl:value-of instruction takes an expression (implicitly the content of the curly braces), and returns the content of the element or attribute located by that expression. So the template above places the value of the fileref attribute on graphic into the src attribute on IMG.
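The curly-brace substitution can be pictured with a small Python sketch. It handles only the attribute(name) form used in the template above and is not a general expression evaluator. The function name expand_attribute_template is our own.

```python
import re
import xml.etree.ElementTree as ET


def expand_attribute_template(template, node):
    """Replace each {attribute(name)} with that attribute's value on `node`."""

    def lookup(match):
        # Assumption for this sketch: a missing attribute yields an empty string.
        return node.get(match.group(1), "")

    return re.sub(r"\{attribute\((\w+)\)\}", lookup, template)


graphic = ET.fromstring('<graphic fileref="myfig.gif"/>')
src = expand_attribute_template("{attribute(fileref)}", graphic)
print(src)
```

Text outside the braces is left untouched, so a value such as images/{attribute(fileref)} would keep its literal prefix under the same rule.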
Formatting Titles. Finally, the title of the figure must be formatted. Like the document title template, the pattern on this template must be qualified:
<xsl:template pattern="figure/title">
<B>
<xsl:process-children/>
</B>
</xsl:template>
Suggested Exercises
If you're inspired by the examples you've seen so far, here are a few exercises to consider. Some of them will require additional tools not covered here, but described in the first Working Draft.
1. Rewrite the select patterns in the figure template to process only the first graphic or title.
2. Correctly handle the HTML TITLE element in the HEAD so that it contains the proper document title rather than a fixed, literal string.
3. Write the style sheet using XSL formatting objects. Using formatting objects will allow your document to be rendered equally well in a variety of media, rather than simply with a Web browser.
Conclusion
The first XSL Working Draft substantially defines the XSL language. Although there is still a long way to go, one only has to look at the original XSL submission (www.w3.org/TR/NOTE-XSL-970910) to see how far we've come.
In this article, I've tried to present some of the motivations for XSL, to demonstrate in a small way its expressive power, and to whet your appetite to review the Working Draft.
The XSL Working Group will continue to make changes to XSL, some of which will not be backwards compatible, but it seems likely that the general direction of XSL can be well understood from the first Working Draft. There are many important and complex issues that must still be resolved, among them: interactivity, support (if any) for a more powerful scripting language, further harmonization of the formatting object semantics, and the definition of many additional formatting objects.
(Get the source code for this article here.)
Norm is a senior application analyst at ArborText (www.arbortext.com). He serves as ArborText's alternate representative on the XSL Working Group. He is also the principal author of DocBook: The Definitive Guide, an O'Reilly & Associates book under development. You can reach him at [email protected].
"The Extensible Style Language (XSL)" by Norman Walsh
Web Techniques, January 1999
Web Techniques grants permission to use these listings (and code) for private or commercial use provided that credit to Web Techniques and the author is maintained within the comments of the source. For questions, contact [email protected].
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template pattern="doc">
    <HTML>
      <HEAD>
        <TITLE>A Document</TITLE>
      </HEAD>
      <BODY>
        <xsl:process-children/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template pattern="title">
    <H1>
      <xsl:process-children/>
    </H1>
  </xsl:template>
  <xsl:template pattern="para">
    <P>
      <xsl:process-children/>
    </P>
  </xsl:template>
  <xsl:template pattern="em">
    <I>
      <xsl:process-children/>
    </I>
  </xsl:template>
  <xsl:template pattern="em/em">
    <B>
      <xsl:process-children/>
    </B>
  </xsl:template>
  <xsl:template pattern="figure">
    <DIV>
      <B>Figure <xsl:number level="any" count="figure"/>.</B><BR/>
      <SPAN><xsl:process select="graphic"/><BR/></SPAN>
      <B><xsl:process select="title"/></B>
    </DIV>
  </xsl:template>
  <xsl:template pattern="figure/title">
    <H3>
      <xsl:process-children/>
    </H3>
  </xsl:template>
</xsl:stylesheet>