Self-Service Syndication with ICE
Building Informative Web Pages and Catalogs Automatically
By Dan R. Greening
Newspapers, product retailers, and Web portals face a common problem: How can they provide the most up-to-date content? They can invest in developing their own original content, as does Web Techniques magazine, or they can assemble material from several outside sources and rebrand it under their own name. The San Francisco Chronicle newspaper assembles its comic page by buying comic strips from King Features and Marvel. Wyle Electronics creates product-information Web pages by assembling data sheets from electronics manufacturers. Excite buys news from Reuters and UPI. Checkout.com buys movie, music, and game information from All Media Guide and offers it along with DVDs, CDs, and games in an online store, creating an "entertainment buying experience."
Providing the best possible information on products in a timely fashion is an important form of customer service. All other factors being equal, customers tend to shop with retailers that give the best presale product information. This is forcing online retailers to become information portals where consumers can go to find out more about products (and incidentally, buy those products).
But the retailers don't have the best information -- manufacturers do. They have the greatest incentive to create highly informative product collateral -- presale brochures, specifications, manuals, and rebate coupons -- to help customers buy and use a product. If manufacturers syndicate collateral on the Internet and let retailers subscribe to it, customers can obtain enormous detail on a product before and after the sale. With syndication, online retailers should have a great advantage over brick-and-mortar stores, where maintaining detailed product information is very costly.
At present, online syndicators usually provide content through proprietary or roll-your-own solutions. Some subscribers assemble and rebrand syndicated content by placing links to a syndicator's Web site. Others perform a nightly download from syndicator FTP sites. And sometimes syndicators create and maintain Web sites under different names (kind of an inverted syndication) to which the subscribers can safely link.
These ad hoc forms of syndication are fraught with technical problems. Usually a subscriber is assembling material from several syndicators. On the subscriber side, adding content from a new syndicator involves engineering and Web-design effort. On the syndicator side, every new subscriber has to be trained to use the syndication system, deployment becomes much more dependent on high-demand Web skills, and in the end customers usually get less information.
Ad hoc approaches typically require human involvement to negotiate the simplest operational issues. Does the subscriber have to credit the author or syndicator? Can the syndicator push the content at a particular time of day? Can the subscriber edit the material? Is there an incremental charge for each piece of content? Can the subscriber download only some of the content offered by the syndicator?
In short, without a standard protocol, syndication doesn't scale very well on the Web.
To address these problems, a consortium of application server and content companies, led by Vignette, created a standard syndication protocol based on XML: Information and Content Exchange (ICE). XML is a simple standard for representing data hierarchies with familiar HTML-style tags. The ICE protocol standardizes the following functions:
- A potential subscriber requests a catalog of subscription offers.
- A syndicator responds with a subscription-offer catalog, each offer detailing the type of content, usage restrictions, and available delivery methods, times, and frequencies.
- A subscriber subscribes to one or more offers, negotiating for specific delivery methods and times.
- A subscriber pulls a package of subscribed content from a syndicator.
- A syndicator pushes a package of subscribed content to a subscriber.
- A syndicator or subscriber cancels or changes a subscription.
ICE is one of the most innovative uses of XML, in part because ICE is mainly a protocol with a little bit of data, while most XML standards focus solely on data. ICE doesn't even specify the format of the syndicated content data. In XML terms, it's just bytes inside an ice-item entity. The character data inside an ice-item can be structured using one of the industry-specific XML data representations -- BizTalk, RosettaNet, CommerceOne, WDDX, and so on.
Other standards efforts are often lumped together with ICE, because Web publishing is only now becoming automated. The World Wide Web Consortium's Resource Description Framework (RDF) and related standards specify how descriptive terms can be attached to content files, so that content can be identified and selected according to filtering or sorting criteria. These frameworks can be used together with ICE, letting subscribers select offers using complex criteria.
The syndicate/subscribe model ICE defines is almost the same as what computer scientists call "publish/subscribe." And it turns out that ICE is most similar to binary publish/subscribe protocol standards, such as CORBA and DCOM. But ICE messages are encoded in XML and typically carried over an HTTP connection, rather than through a lower-level binary protocol. ICE is much easier to read and use, but it is also much more verbose. If you're constrained by network bandwidth, either compress the ICE packets or use something else. Finally, ICE defines many typical syndication operations and constraints that CORBA and DCOM leave to vertical industry implementations.
ICE provides only minimal security and access-control facilities. Typically the HTTP transport layer handles security, using such technology as SSL. Access control is handled on the syndication server, which performs authentication through the usual Web server password-access system, and provides different subscription offers to different subscribers.
The ICE protocol defines a set of request-response pairs coded in XML. The ICE standard doesn't specify the underlying transfer protocol, but does suggest an implementation using the HTTP POST/response mechanism, called "ICE/HTTP." The body of the HTTP POST contains the ice-request, and the associated HTTP response contains the ICE response. As far as I know, all current ICE implementations use ICE/HTTP. This article assumes ICE/HTTP is the transport.
Each ice-request is contained within an ice-payload, which identifies the ICE version and the sender, and provides a request-id. Listing One shows a sample request payload. Most of the header is devoted to debugging information.
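To make the nesting concrete, here is a minimal Python sketch that builds an ice-payload wrapping an ice-get-catalog request and packages it as an ICE/HTTP POST. The element names ice-payload, ice-request, and ice-get-catalog come from the text. The ice-header and ice-sender elements, the attribute names (ice.version, payload-id, sender-id, role), the Content-Type header, and the syndicator URL are assumptions for illustration, so check them against the ICE specification before relying on them.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_get_catalog_payload(sender_id: str, sender_name: str, request_id: str) -> bytes:
    """Build an ice-payload that wraps a single ice-get-catalog request.

    The header element and all attribute names are assumptions made for
    this sketch, not quoted from the ICE specification.
    """
    payload = ET.Element("ice-payload", {"ice.version": "1.0", "payload-id": request_id})
    header = ET.SubElement(payload, "ice-header")
    ET.SubElement(
        header,
        "ice-sender",
        {"sender-id": sender_id, "name": sender_name, "role": "subscriber"},
    )
    request = ET.SubElement(payload, "ice-request", {"request-id": request_id})
    ET.SubElement(request, "ice-get-catalog")
    return ET.tostring(payload, encoding="utf-8")


def make_ice_http_request(url: str, body: bytes) -> urllib.request.Request:
    """Wrap an ICE payload in an HTTP POST, the transport ICE/HTTP suggests."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/xml"},  # assumed media type
        method="POST",
    )


body = build_get_catalog_payload("sub-42", "Example Subscriber", "req-0001")
request = make_ice_http_request("https://syndicator.example.com/ice", body)
print(request.get_method(), request.full_url)
print(body.decode("utf-8"))
```

The usage lines build the request without sending it; passing it to urllib.request.urlopen would deliver it to a live syndicator, whose HTTP response body would hold the ICE response.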
Upon receiving a request, the respondent creates a response payload, which contains whatever the subscriber asked for, if available. The response payload contains many of the same header tags as the request payload.
Before syndication can occur, the syndicator must configure a syndication server, specifying which offers are available to which subscribers at what time. Figure 1 shows a syndication console that lets a syndicator add and delete catalogs, offers, delivery policies, and subscribers.
For a subscriber to know what content it can subscribe to, it needs to obtain a catalog of offers from the syndicator. This can be handled in two ways. In the old-fashioned way, the subscriber telephones the syndicator, asks "What's available?" and the syndicator provides a list. In the modern ICE way, the subscriber sends an ice-get-catalog request to the syndicator, which responds with a collection of offers and offer groups.
Listing Two shows an example ice-get-catalog request, followed by a response. The ICE catalog first provides contact information for a person who can provide more details on the catalog. Then it provides a set of product offers.
Each offer includes the name of the content and the copyright owner. It can also include several typical usage constraints:
- atomic-use indicates that all items in the subscription must be presented to the user; otherwise, individual items may be omitted from the presentation.
- editable indicates that the subscriber can modify the content.
- ip-status indicates the intellectual-property rights status of the subscription.
- showcredit indicates that the subscriber must display the copyright owner with the content.
- usage-required indicates that information about viewers of the content must be provided to the syndicator.
Other constraints can be encoded in a mutually agreed upon format and included in the offer.
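A subscriber can check a planned page against an offer's constraints before publishing it. The following Python sketch is hypothetical: the UsageConstraints class and check_presentation function are illustrations, not part of ICE, and they cover only the three constraints that affect presentation (atomic-use, editable, showcredit). ip-status and usage-required carry legal and reporting obligations that a page check can't verify.

```python
from dataclasses import dataclass


@dataclass
class UsageConstraints:
    """Hypothetical in-memory form of an offer's presentation constraints."""
    atomic_use: bool = False   # all items must be presented together
    editable: bool = False     # subscriber may modify the content
    show_credit: bool = False  # copyright owner must be displayed


def check_presentation(constraints, offered_items, shown_items, edited, credit_shown):
    """Return a list of constraint violations for a planned page (empty if none)."""
    problems = []
    if constraints.atomic_use and set(shown_items) != set(offered_items):
        problems.append("atomic-use: every item in the subscription must be shown")
    if edited and not constraints.editable:
        problems.append("editable: the content may not be modified")
    if constraints.show_credit and not credit_shown:
        problems.append("showcredit: the copyright owner must be displayed")
    return problems


rules = UsageConstraints(atomic_use=True, show_credit=True)
print(check_presentation(rules, ["strip-1", "strip-2"], ["strip-1"], edited=False, credit_shown=True))
```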
Offers can be organized into a display hierarchy for convenient navigation using ice-offer-group tags. A subscriber can't subscribe to an ice-offer-group; the name of the offer group is only a mnemonic. Subscriptions are made using the name in the ice-offer tag, even if the offer is embedded inside an ice-offer-group.
There are four offers shown in Listing Two, among them Local:Art:Daily:Electroluminescent. The last two are organized under an ice-offer-group.
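Because offer groups exist only for navigation, a subscriber reading a catalog can simply collect every ice-offer, however deeply it is nested. This Python sketch assumes, hypothetically, that each ice-offer and ice-offer-group carries its name in a name attribute; the sample catalog is invented for illustration and is not Listing Two.

```python
import xml.etree.ElementTree as ET

# Invented catalog fragment; the "name" attributes are assumptions.
SAMPLE_CATALOG = """
<ice-catalog>
  <ice-offer name="Comics:Daily"/>
  <ice-offer-group name="Art">
    <ice-offer name="Art:Daily:Sketches"/>
    <ice-offer-group name="Local">
      <ice-offer name="Local:Art:Daily:Photos"/>
    </ice-offer-group>
  </ice-offer-group>
</ice-catalog>
"""


def subscribable_offers(catalog_xml: str) -> list:
    """Return the name of every ice-offer, however deeply it is nested.

    ice-offer-group elements are skipped: they only organize the display,
    and nobody can subscribe to a group.
    """
    root = ET.fromstring(catalog_xml.strip())
    # iter() walks the whole tree in document order, so nesting depth doesn't matter.
    return [offer.get("name") for offer in root.iter("ice-offer")]


print(subscribable_offers(SAMPLE_CATALOG))
```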
Every offer comes with a set of delivery policies, including the delivery-mode, availability dates, and more detailed availability information.
Content can be delivered in pull or push mode. If the subscriber specifies pull delivery, the subscriber always makes the requests and the syndicator always responds -- content is delivered only when the subscriber requests it. Pull delivery makes programming a subscriber quite simple: You get what you ask for when you want it.
Push delivery, on the other hand, requires the subscriber to run a Web server to handle pushed deliveries, which arrive as HTTP requests from the syndicator. Each request could contain a large payload with one or more articles. The subscriber usually confirms receipt in its HTTP response.
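A push subscriber is essentially a small Web server that accepts POSTed payloads. The Python sketch below uses the standard library's http.server; the handle_push function, the in-memory received_packages list, and the confirmation body are hypothetical stand-ins, since a real subscriber would process each package and return the ICE response format the specification defines.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

received_packages = []  # stand-in for real package processing


def handle_push(body: bytes) -> bytes:
    """Record every ice-package in a pushed payload; return a confirmation."""
    root = ET.fromstring(body)
    received_packages.extend(root.iter("ice-package"))
    # Hypothetical confirmation body, not the specification's exact format.
    return b"<ice-response><ice-code>OK</ice-code></ice-response>"


class PushHandler(BaseHTTPRequestHandler):
    """Accepts deliveries that the syndicator POSTs to this subscriber."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        reply = handle_push(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port: int = 8080) -> None:
    """Run the push endpoint until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("", port), PushHandler).serve_forever()
```

Calling serve() starts the endpoint; the syndicator also needs to know this endpoint's URL before it can push anything to it.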
There are some fairly complex rules for specifying when and how often a subscriber obtains new content from a subscription. In Listing Two, one of the offers can be pulled at most once per day (maxcount="1") between midnight (starttime="00:00:00") and 4:00 a.m. (duration="P14400S" specifies 14,400 seconds, or four hours, from the start time).
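The arithmetic behind such a window is simple enough to check in code. This Python sketch (the PullWindow class is hypothetical) accepts a pull only inside the daily window and only while that day's maxcount hasn't been used up. It treats the window end as exclusive, assumes the window doesn't cross midnight, works on naive local times, and parses only the seconds-only duration form used in the listing.

```python
import re
from datetime import datetime, time, timedelta


def parse_seconds_duration(value: str) -> timedelta:
    """Parse a seconds-only duration such as 'P14400S' (or ISO 8601 'PT14400S')."""
    match = re.fullmatch(r"PT?(\d+)S", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(seconds=int(match.group(1)))


class PullWindow:
    """Enforces starttime/duration/maxcount for pulls (window must not cross midnight)."""

    def __init__(self, starttime: str, duration: str, maxcount: int):
        self.start = time.fromisoformat(starttime)
        self.length = parse_seconds_duration(duration)
        self.maxcount = maxcount
        self.pulls_by_day = {}  # date -> number of pulls already made that day

    def try_pull(self, when: datetime) -> bool:
        """Record a pull at `when` and return True if the window and quota allow it."""
        window_start = datetime.combine(when.date(), self.start)
        if not window_start <= when < window_start + self.length:
            return False
        used = self.pulls_by_day.get(when.date(), 0)
        if used >= self.maxcount:
            return False
        self.pulls_by_day[when.date()] = used + 1
        return True


window = PullWindow("00:00:00", "P14400S", maxcount=1)
print(window.try_pull(datetime(2000, 1, 1, 1, 30)))  # inside the window, first pull
print(window.try_pull(datetime(2000, 1, 1, 2, 0)))   # same day, quota already used
```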
To subscribe to an offer, the subscriber simply sends an ice-offer back to the syndicator in an ice-request. The offer is usually taken verbatim from an ice-catalog, although the subscriber can modify any field that the ice-catalog marks as negotiable. If the syndicator responds with OK, the negotiated offer is accepted. If the syndicator responds with Sorry, the offer has been rejected with no further information. If the syndicator responds with a different ice-offer, the subscriber can treat it as a counteroffer and submit it back to the syndicator in an ice-request, fairly confident it will be accepted.
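The three possible replies map naturally onto a small decision procedure. In this Python sketch, the Outcome enum, the send callable, and the use of an ice-code element carrying OK or Sorry are all assumptions for illustration; the specification's actual response structure may differ. Resubmitting a counteroffer automatically is a policy choice, and a cautious subscriber might review it first.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    COUNTEROFFER = "counteroffer"


def interpret_reply(reply_xml: str) -> Tuple[Outcome, Optional[ET.Element]]:
    """Classify a syndicator's reply to a proposed ice-offer."""
    root = ET.fromstring(reply_xml)
    counter = root.find(".//ice-offer")  # a different offer means "counteroffer"
    if counter is not None:
        return Outcome.COUNTEROFFER, counter
    code = root.find(".//ice-code")  # hypothetical status element
    if code is not None and (code.text or "").strip() == "OK":
        return Outcome.ACCEPTED, None
    return Outcome.REJECTED, None  # "Sorry" or anything unrecognized


def subscribe(send: Callable[[str], str], offer_xml: str, max_rounds: int = 3) -> Outcome:
    """Propose an offer, resubmitting any counteroffer verbatim, up to max_rounds times."""
    for _ in range(max_rounds):
        outcome, counter = interpret_reply(send(offer_xml))
        if outcome is not Outcome.COUNTEROFFER:
            return outcome
        offer_xml = ET.tostring(counter, encoding="unicode")
    return Outcome.REJECTED  # give up rather than negotiate forever
```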
A catalog of ice-offers can also be presented "the old-fashioned way," by sending offer information not through an ice-catalog, but via the Web, email, fax, or voice. People who process subscriptions then click on the offers they choose, or cut and paste them, to subscribe.
Figure 2 shows an example Web-page catalog from the National Semiconductor subscription site (ice.national.com). In this case, it is a "catalog of catalogs": It contains various National Semiconductor catalogs to which my site can subscribe. The NSC Product Folders #1(XML) subscription, for example, presents product information in XML on every National Semiconductor product. Clicking on an offer establishes a subscription.
To fulfill subscriptions, the syndicator sends directives in an ice-package to update the subscriber's copy of the subscribed content: ice-item-group entities specify additions, and ice-item-remove entities specify deletions.
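On the subscriber side, applying a package amounts to adding and deleting entries in a local content store. This Python sketch assumes, hypothetically, that each ice-item and ice-item-remove identifies its target with an item-id attribute, and it keeps the store as a plain dictionary from item ID to content. Operations run in document order, so a package that adds and then removes the same item ends without it.

```python
import xml.etree.ElementTree as ET


def apply_package(store: dict, package_xml: str) -> None:
    """Apply one ice-package's additions and deletions to `store` in document order.

    The item-id attribute is an assumption of this sketch.
    """
    package = ET.fromstring(package_xml)
    for element in package.iter():
        if element.tag == "ice-item":
            # Items arrive inside ice-item-group elements; each one is an addition
            # (or a replacement, if the ID is already present).
            store[element.get("item-id")] = element.text or ""
        elif element.tag == "ice-item-remove":
            store.pop(element.get("item-id"), None)


content = {"item-7": "old data sheet"}
apply_package(
    content,
    "<ice-package>"
    "<ice-item-group><ice-item item-id='item-8'>new data sheet</ice-item></ice-item-group>"
    "<ice-item-remove item-id='item-7'/>"
    "</ice-package>",
)
print(content)
```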
If a package includes an activation field, the subscriber must not perform the operations before the specified time. For news items, this is typically a release time. A business might want earnings news released after the stock market closes. A politician might want a speech transcript released after a press conference. News editors tend to respect these time constraints, in part to ensure future access to "hot news." ICE can automate the delivery process.
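A subscriber can honor activation times by holding packages in a queue. In this Python sketch (the PendingPackages class is hypothetical), a package with no activation time is ready at once. The queue releases packages strictly in arrival order and stops at the first one whose activation time hasn't arrived, so a later package is never applied ahead of an earlier one.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Any, List, Optional


class PendingPackages:
    """Holds received packages until their activation times have passed."""

    def __init__(self) -> None:
        self._queue = deque()  # (activation time or None, package), in arrival order

    def receive(self, package: Any, activation: Optional[datetime] = None) -> None:
        self._queue.append((activation, package))

    def release(self, now: datetime) -> List[Any]:
        """Return the packages that may be applied at `now`, oldest first."""
        ready = []
        # Stop at the first package that isn't active yet, preserving order.
        while self._queue and (self._queue[0][0] is None or self._queue[0][0] <= now):
            ready.append(self._queue.popleft()[1])
        return ready


market_close = datetime(2000, 1, 3, 21, 0, tzinfo=timezone.utc)
pending = PendingPackages()
pending.receive("earnings report", activation=market_close)
pending.receive("analyst commentary")
print(pending.release(datetime(2000, 1, 3, 18, 0, tzinfo=timezone.utc)))
print(pending.release(market_close))
```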
Packages have several other attributes drawn from typical syndication requirements. Some of these fields share names with ice-offer fields; on a package they apply to that package only, while the ice-offer fields apply to the entire subscription. Among the parameters that can appear on both the offer and the package entity is an exclusion clause. For example, the Miss Manners column might have an exclusionary clause stating that the article can't be used unless the author's picture is displayed.
ICE forces packages to be processed in the order specified by the syndicator. The state of a subscriber can be defined by a single value, the package sequence identifier (PSI). Each package carries an "old PSI" (the state the subscriber must be in before receiving the package) and a "new PSI" (the state of the subscriber after receiving the package). Before subscribing, the subscriber is in the "empty state," indicated by "ICE-INITIAL". If it doesn't matter what the previous state was, a package can be sent with "ICE-ANY" as its old PSI.
Using sequence IDs reduces or eliminates the state information stored on the syndicator side. Subscription management becomes the purview of each subscriber. The syndicator has to remember a subscriber's state only if it supports push delivery.
PSI strings are opaque to the subscriber, which only ever compares them for equality. This gives the syndicator the flexibility to use an implementation-specific state encoding. For example, the implementation might use integers, time stamps, or a proprietary database key as the PSI.
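The subscriber's side of the PSI rule fits in a few lines. In this Python sketch (the Subscription class and SequenceError are hypothetical), the subscriber starts at ICE-INITIAL, accepts a package only if its old PSI matches the current state or is ICE-ANY, and then adopts the package's new PSI. The time-stamp PSIs in the example are arbitrary strings chosen for illustration; the subscriber never interprets them.

```python
ICE_INITIAL = "ICE-INITIAL"
ICE_ANY = "ICE-ANY"


class SequenceError(Exception):
    """Raised when a package does not follow from the subscriber's current state."""


class Subscription:
    def __init__(self) -> None:
        self.psi = ICE_INITIAL  # empty state before any package arrives

    def accept(self, old_psi: str, new_psi: str) -> None:
        """Check a package's old PSI against the current state, then advance.

        PSIs are opaque: they are only ever compared for equality.
        """
        if old_psi != ICE_ANY and old_psi != self.psi:
            raise SequenceError(f"subscriber is at {self.psi!r}, package expects {old_psi!r}")
        self.psi = new_psi


sub = Subscription()
sub.accept(ICE_INITIAL, "1999-06-01T00:00")          # first package
sub.accept("1999-06-01T00:00", "1999-06-02T00:00")   # next package in sequence
print(sub.psi)
```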
National Semiconductor offers thousands of complex parts through distributors and retailers. By syndicating product information through ICE, National's distributors can provide the most up-to-date product information to customers. Syndication is very handy for such vendors.
Listing Three shows an example subscription item from the National Semiconductor XML catalog. The ice-item 1333, which appears at the beginning, is an XML description of part 100301. Figure 3 shows the same item integrated into a distributor's Web page.
It's tempting to say this article describes the tip of the iceberg, but ICE is a fairly simple standard. Most of the ICE features omitted in this article relate to error handling and offer negotiation. For more details on ICE or vendors providing ICE applications, refer to the boxes titled "ICE Standardization" and "ICE Resources."
The ICE standard makes it easier for syndicators to deliver information in a controlled way to subscribers. In traditional publishing, writers, artists, composers, and producers "outsource" contract and delivery issues to syndicators, allowing the artisans to focus on their craft. Syndicators then achieve economies of scale by performing the same function for multiple artisans.
Some Web companies follow this traditional model. iSyndicate, for example, assembles content from individual Web pages, packages it, and allows portals to subscribe. If you're an author, it's easy to publish your material through iSyndicate. Using ICE, portals will be able to easily subscribe to syndicated content from iSyndicate.
As usual, the Internet changes traditional definitions, because it makes automated negotiation possible and speeds information transfer. Syndication is no exception. On the Internet, a syndicator can be anyone with a large collection of uniformly structured data made available to multiple subscribers. This means parts manufacturers, icon collections, free software aggregators, stock-photo libraries, and even Web-traffic analyzers can be syndicators.
Now that the Web has matured, and people seek useful information among too much noise, obtaining the best-quality information for your visitors becomes more challenging. Typically, this means subscribing to, assembling, and controlling multiple sources of news and data. The ICE protocol makes it possible for site developers to do this using a single system, allowing everyone to spend more time on creative tasks.
Dan holds a Ph.D. in computer science from UCLA. He is currently chief technology officer at Andromedia. He can be reached at email@example.com.