Bottom-Up Architecture

By Lou Rosenfeld, January 01, 2002

WebReview.com:Bottom-Up Architecture

At a Glance

Information Architecture for the World Wide Web
by Lou Rosenfeld and Peter Morville

Past Web Architect columns have, to a large degree, been focused on top-down architecture. I think that's why Peter and I were suffering writer's block. We'd already written quite a bit about this topic. The funny thing was that our actual real-life consulting work had already taken a hard turn toward bottom-up architecture, just as the overall market has.

Hmmm...maybe bottom-up architecture is what many of our future Web Architect columns should cover? The writer's block has been broken!

The last article we wrote came out in November, 1997, which means it was probably written in October. In looking at my calendar, I see that it's now August, 1998 -- I guess you could say we've been gone for a while.

So, what's been happening over the last nine months? Well, lots, but what has interested us the most has been XML. For once, something that is getting a lot of publicity actually seems to hold real, long-term promise, unlike such previous mega-hyped flashes-in-the-pan as VRML, push, and channels.

XML, the eXtensible Markup Language, holds a lot of promise because it supports defining the logical meaning of documents. Instead of marking them up primarily for display purposes, as HTML does, we can define tags and mark up documents for specific types of needs. For example, we can define custom tagsets designed to enable efficient transmission of patient records around the medical industry. Or, if we wanted to perform linguistic analysis on a set of documents -- let's say we wanted to compare the Iliad and the Odyssey to see if they really had the same author -- we could mark them up noun-by-noun, verb-by-verb, and so on.
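To make the patient-record idea a little more concrete, here is a minimal sketch in Python. The tagset (patientRecord, allergy, visit, diagnosis) is invented purely for illustration and is not any real medical standard; the point is only that once the markup describes meaning rather than display, a program can ask for "the allergies" directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagset: these element names are assumptions made up for
# this example, not part of any real patient-record standard.
RECORD = """
<patientRecord id="p-001">
  <name>Jane Doe</name>
  <allergy>penicillin</allergy>
  <allergy>latex</allergy>
  <visit date="1998-08-03">
    <diagnosis>sprained ankle</diagnosis>
  </visit>
</patientRecord>
"""


def allergies(xml_text):
    """Return the text of every <allergy> element in a record."""
    root = ET.fromstring(xml_text.strip())
    return [node.text for node in root.findall("allergy")]


def diagnoses_by_date(xml_text):
    """Map each visit date to the diagnosis recorded for that visit."""
    root = ET.fromstring(xml_text.strip())
    return {v.get("date"): v.findtext("diagnosis") for v in root.findall("visit")}


print(allergies(RECORD))
print(diagnoses_by_date(RECORD))
```

Try getting the same answers out of an HTML page where the allergies are just bold text in a table cell, and the appeal of logical markup becomes obvious.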

XML is really a sign of things to come. There are many new technologies that are enabling Web developers to get better at dealing with the guts of their sites: the actual content. Document management systems are getting better at using metadata to allow us to track important aspects of our content, such as document versions, sources, workflow, and, through indexing, topics. Database application vendors are getting better at understanding that data (e.g., numbers and facts) and information (e.g., abstract concepts and ideas) aren't the same thing, and are doing more to support information retrieval in their products. Incredibly, such high-falutin' terms as "knowledge management" and "information architecture" are at least becoming recognized, if not commonplace.

The Trend: Top-down, bottom-up

Thanks to the benefits of these improved technologies, not to mention our increased collective experience, we're all getting better at what we call "bottom-up" information architecture. The challenge of bottom-up architecture is to reduce the ambiguity inherent in large, nebulous bodies of content. In other words, bottom-up architecture is about getting your arms around the content so you can organize and manage it better, and so that users can find what they're looking for in these huge messy blobs of content. And messing around with the literal guts of your content will increasingly mean working with technologies like XML.

This is very different from "top-down" information architecture, which involves reducing the ambiguity inherent in nebulous situations, such as when a large organization tries to develop a user-centric information system, like a corporate intranet, against the backdrop of turf politics. (I hear you snickering.) The top-down approach tries to get at the mission of the site -- an understanding of its major audiences, and the scope of its content and functionality -- in order to determine how the site should be organized, browsed, and searched.

Promises and warnings

Now that we've broken through, I'd like to describe a theme that will likely pervade many of our upcoming columns. It's true that all the new technologies and tools that have recently arrived upon the scene are wonderful. But benefiting from them won't be easy. There are three major reasons for this:

The process of marking up documents logically (a la XML) is really, really hard.
So is the process of manually indexing documents.
Things that are really hard to do are generally really, really expensive.

Figuring out how you will structure documents requires a lot of forethought and planning. First, you need to have an idea of how to do the structuring. What are the important logical components that you want to get at? And how far do you want to go? For example, is it enough to create tags for a document's author? Or is it important to create separate tags for authors, editors, proofreaders, and others involved in the process of document creation?

Determining which tags to apply takes significant intellectual effort, and the more numerous and more complex the tags are, the more time the entire markup process will take. So you may be between a rock and a hard place, trying to balance the costs of applying a technology like XML with its benefits. And then there's the process of actually going through and manually tagging your documents.
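The "how far do you want to go?" question can be sketched in a few lines of Python. Both tagsets below are hypothetical, as are the names in them. The thing to notice is that the finer-grained markup can always be collapsed into the coarser one later, while going the other way means someone has to revisit every document.

```python
import xml.etree.ElementTree as ET

# Two hypothetical ways to mark up the same document. The element and
# attribute names are assumptions for illustration only.
COARSE = "<doc><author>A. Writer</author><author>B. Editor</author></doc>"
FINE = """<doc>
  <contributor role="author">A. Writer</contributor>
  <contributor role="editor">B. Editor</contributor>
</doc>"""


def people_with_role(fine_xml, role):
    """Answer a question only the fine-grained markup can answer."""
    root = ET.fromstring(fine_xml)
    return [c.text for c in root.findall("contributor") if c.get("role") == role]


def collapse_to_coarse(fine_xml):
    """Throw away the role distinction, producing the coarse form."""
    root = ET.fromstring(fine_xml)
    doc = ET.Element("doc")
    for c in root.findall("contributor"):
        ET.SubElement(doc, "author").text = c.text
    return ET.tostring(doc, encoding="unicode")


print(people_with_role(FINE, "editor"))
print(collapse_to_coarse(FINE) == COARSE)
```

The extra detail costs something every time a document is tagged, which is exactly the rock-and-hard-place trade-off described above.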

Manually indexing documents is similarly challenging. According to last October's The Forrester Report, the cost of indexing a 15,000-page site would be $960,000 during 1998; and for various reasons, this figure would creep past $2 million by 2001. Argus' own experience in manually indexing thousands of documents for a major telecommunications firm's intranet supports Forrester's figures.
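It helps to turn those figures into a per-page number. The arithmetic below is a back-of-the-envelope sketch: it uses only the two figures quoted above, treats "$2 million" as a floor, and assumes three years between 1998 and 2001. It says nothing about how Forrester arrived at its estimates or whether the page count stays fixed.

```python
def cost_per_page(total_cost, pages):
    """Average indexing cost per page, in dollars."""
    return total_cost / pages


def implied_annual_growth(start_cost, end_cost, years):
    """Compound annual growth rate that takes start_cost to end_cost."""
    return (end_cost / start_cost) ** (1 / years) - 1


# Figures quoted in the text; the three-year span is an assumption.
per_page_1998 = cost_per_page(960_000, 15_000)
growth = implied_annual_growth(960_000, 2_000_000, 3)

print(f"1998 cost per page: ${per_page_1998:.2f}")
print(f"Implied annual growth, 1998-2001: at least {growth:.1%}")
```

That works out to $64 per page in 1998, with the total growing by a little under 28 percent a year to reach the $2 million mark.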

Again, there are related planning issues: It is very difficult to determine the appropriate set, or "vocabulary," of terms one should use to index a collection of documents. The terms need to match both the language of users and the concepts represented in the content. Unfortunately, the more individuals you have applying these terms, the less consistently they will be applied, thereby reducing the utility of the entire effort.
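Here is a rough Python sketch of why a shared vocabulary matters. The vocabulary, the terms, and the two indexers are all made up for illustration, and the overlap measure (Jaccard similarity) is just one simple way to put a number on consistency, not a method drawn from the projects mentioned above.

```python
# Hypothetical controlled vocabulary: variant term -> preferred term.
VOCABULARY = {
    "cell phone": "mobile phones",
    "cellular": "mobile phones",
    "mobile phones": "mobile phones",
    "invoice": "billing",
    "billing": "billing",
}


def normalize(terms):
    """Map free-text terms to preferred terms, dropping unknown ones."""
    cleaned = (t.lower().strip() for t in terms)
    return {VOCABULARY[t] for t in cleaned if t in VOCABULARY}


def consistency(terms_a, terms_b):
    """Jaccard overlap between two term sets (1.0 means identical)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two indexers describe the same document in their own words.
indexer_a = ["Cell phone", "invoice"]
indexer_b = ["cellular", "billing"]

print(consistency(indexer_a, indexer_b))
print(consistency(normalize(indexer_a), normalize(indexer_b)))
```

Left to their own words, the two indexers share no terms at all; mapped through the same vocabulary, they agree completely. The hard part, of course, is building that vocabulary in the first place and getting everyone to use it.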

New tools and technologies that support the development of bottom-up architectures will definitely make a huge impact on the development of all information systems -- Web-enabled or otherwise -- over the coming years. But they will work effectively only if we understand and are willing to make the non-trivial investments necessary to structure and index our content. If we are unwilling to make these intellectual investments, we might be better off sticking with plain ol' HTML.

As always, please let me know what you think.
