Dr. Dobb's | Bottoms Up | November 12, 2002

Bottoms Up

Reductionism is popular, but it makes for incohesive sites. Forget top-down, use bottom-up information architecture design to make your Web site whole.

URL:http://www.drdobbs.com/bottoms-up/184411741

Web design is under attack. Our enemy is a dangerous meme known as reductionism. This devious adversary is spreading the notion that we can fully understand Web sites as a combination of simpler components, and that we can break the process of design into lots of quick steps and clearly defined deliverables.

You can easily identify the infected. They'll tell you they need a taxonomy, or they're building a thesaurus. Ask them about the purpose of the taxonomy or thesaurus and they'll give you a blank stare. They have no interest in the bigger picture. At this point, you should smile politely, slowly turn away, and then run like hell. Reductionism is highly contagious, and there's no easy cure.

Increasingly, our sites are larger and more sophisticated. And yet, we shouldn't let our sites become small pieces badly joined. The cost of giving up is too high. One way to solve the problem is to take a bottom-up approach.

As we designed the first Web sites and intranets, we had holistic views of our projects and designed with a top-down approach.

We could easily define goals and strategy. We described intended audiences, as well as anticipated information and services. And then we designed a site hierarchy to serve as a container for content and as a navigation framework for users. Those were the days when we had big thoughts about small sites. We could comfortably fit a model of the entire site inside our heads and mull it over while we worked on components. Our design of the parts was informed by an understanding of the whole.

Content Contamination

We all know what happened then. Our sites grew and grew and grew. Twenty-page brochure sites soon became complex information and transaction systems with thousands of pages and dozens of interactive functions. Content production was increasingly decentralized.

At some point, the complexity of our sites overwhelmed us. It became distinctly uncomfortable to fit a holistic model inside our heads. This discomfort made reductionism seductive; why not divide the site into more easily digestible modules?

At first, reductionism was healthy. As top-down redesigns became too large for one person, or even one team, managers broke large projects into smaller projects and assigned teams to tackle specific tasks. Initially, team members resisted any separation from the whole. They insisted on understanding how their work fit into the bigger picture. They insisted on interdisciplinary collaboration, and so we had teams of specialists working toward shared goals.

But sites keep growing and reductionism is a slippery slope. Increasingly, people are simply giving up on the big picture. They act locally but don't think globally. These individuals now design their parts in utter ignorance of the whole.

Content Analysis

At the very bottom of each site sits the content that users spend time trying to find. We must immerse ourselves in this messy reality before we can craft workable solutions. A bottom-up approach will help us do that.

Begin content analysis by gathering a representative sample of your site's content. There's no need to get overly scientific with the sample definition. Rather, adopt a Noah's Ark approach: Try to capture a couple of each type of animal. The "Gathering Content" section at the end of this article should help you distinguish one beast from another and create a diverse, useful sample.
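For readers who keep a content inventory in a spreadsheet or script, the Noah's Ark idea can be sketched in a few lines of Python. This is only an illustration: the function name, the "document_type" key, and the sample URLs are invented here, and "a couple" is taken to mean two per type.

```python
from collections import defaultdict

def noahs_ark_sample(content_objects, type_key="document_type", per_type=2):
    """Keep at most `per_type` objects of each type, in the order encountered."""
    kept = defaultdict(list)
    for obj in content_objects:
        bucket = kept[obj[type_key]]
        if len(bucket) < per_type:
            bucket.append(obj)
    # Flatten the per-type buckets back into one sample list.
    return [obj for bucket in kept.values() for obj in bucket]

# Hypothetical inventory; the URLs and types are made up for illustration.
inventory = [
    {"url": "/press/2002-01", "document_type": "press release"},
    {"url": "/press/2002-02", "document_type": "press release"},
    {"url": "/press/2002-03", "document_type": "press release"},
    {"url": "/docs/widget-spec", "document_type": "product specification"},
    {"url": "/support/faq", "document_type": "FAQ"},
]

sample = noahs_ark_sample(inventory)
for obj in sample:
    print(obj["document_type"], obj["url"])
```

The third press release is dropped while the lone specification and FAQ both survive, which is the point: breadth of types matters more than volume.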

When you examine each content object, ask basic questions such as: What is this content object? How can I describe it? What distinguishes it from other content objects? How can I make it findable?

Once you've reviewed a few dozen content objects, patterns and relationships will emerge that inform your definition of structural, descriptive, and administrative metadata.
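One hedged way to let those patterns surface is to record the answers for each content object under the three metadata groups and then count which attributes keep recurring. The class, field names, and sample records below are assumptions made for this sketch, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ContentObject:
    url: str
    structural: dict = field(default_factory=dict)      # e.g. title, sections
    descriptive: dict = field(default_factory=dict)     # e.g. topic, audience
    administrative: dict = field(default_factory=dict)  # e.g. owner, review date

def attribute_frequency(objects, group):
    """Count how often each attribute name appears in one metadata group."""
    counts = Counter()
    for obj in objects:
        counts.update(getattr(obj, group).keys())
    return counts

# Hypothetical notes from reviewing two content objects.
reviewed = [
    ContentObject("/docs/widget-spec",
                  structural={"title": "Widget Spec"},
                  descriptive={"topic": "widgets", "product": "Widget"},
                  administrative={"owner": "engineering"}),
    ContentObject("/press/2002-01",
                  structural={"title": "Widget Launch"},
                  descriptive={"topic": "launch", "audience": "press"},
                  administrative={"owner": "marketing"}),
]

descriptive_counts = attribute_frequency(reviewed, "descriptive")
print(descriptive_counts.most_common())
```

Attributes that show up on nearly every object are candidates for required metadata fields; those that appear only occasionally may belong to a single document type.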

We'll return to metadata in a moment, but first let's acknowledge what we haven't done. We haven't allowed the overwhelming volume of documents and applications to scare us away from looking at the content itself. While we may not be able to see the whole picture, our designs will at least be informed by parts of the whole.

Information Ecologies

Now let's survey the broader research terrain. Our goal is to learn enough about the existing system to make real improvements during the redesign. An important step toward success is the recognition that we really can make things worse if we're not careful.

Ecologies offer a useful analogy. In the natural world, researchers have opened our eyes to the interdependence of systems. While we might love to rid ourselves of pestilential mosquitoes, we now understand the critical role they play as a source of food for ants, bats, birds, and so on.

This type of hidden interdependence is also present in our information ecologies. Changes to a single interface or subsystem may have a ripple effect with harmful consequences. To avoid inadvertently damaging something, we need a well-rounded research program designed to study the information ecology from top to bottom. Core elements should include: stakeholder interviews, user interviews, usability testing, search log analysis, and content analysis.
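Of these research methods, search log analysis is the easiest to sketch in code. The example below assumes a simplified log of (query, number of results) pairs, which is an invented format; real search engines log in their own ways. It reports the most common queries and the queries that found nothing.

```python
from collections import Counter

def summarize_search_log(entries, top_n=3):
    """Return the most common queries and the queries that returned nothing."""
    # Normalize case and stray whitespace so trivial variants are counted together.
    normalized = [(query.strip().lower(), hits) for query, hits in entries]
    popular = Counter(query for query, _ in normalized).most_common(top_n)
    zero_hits = sorted({query for query, hits in normalized if hits == 0})
    return popular, zero_hits

# Hypothetical intranet log entries: (query as typed, number of results).
log = [
    ("Vacation policy", 12),
    ("vacation policy ", 12),
    ("401k", 0),
    ("expense report", 4),
    ("401K", 0),
]

popular, zero_hits = summarize_search_log(log)
print(popular)
print(zero_hits)
```

Queries that are both popular and fruitless are a strong hint about missing content or missing variant terms.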

Blind Taxonomies

Once you've done content analysis and researched the heck out of your information ecology, you can finally put on blinders and build the taxonomy, right? Yes, absolutely, as long as you're not bothered by creating something that's unusable and unsustainable. The first rule in resisting reductionism is that you never really get to put blinders on. Focus and discipline are good. Willful ignorance and obliviousness are not.

I have been flabbergasted in recent months by taxonomy construction projects in Fortune 500 companies. Some completely lack user research, and there is often a fierce resistance to discussing how the taxonomy will be used. Let's just focus on the taxonomy, they say. We don't want to get distracted by implementation details.

It appears that reductionists have co-opted the taxonomy. While they may have the best intentions, they know just enough to be dangerous. By ignoring the broader context, they are crafting taxonomies in the dark. There will certainly be a backlash when these costly taxonomies shrivel up in the light of day.

Faceted Classification

Faceted classification is a hallmark of the bottom-up approach and suggests yet another reason why the phrase "build the taxonomy" is ill-conceived. Inspired by Yahoo and encouraged by portal software vendors, many Web and intranet managers have embarked on a long, painful, and doomed journey to build a single, all-purpose enterprise taxonomy. In a world where sites grow but budgets shrink, these monolithic top-down taxonomies will eventually be exposed as unwieldy and unusable.

The bottom-up approach suggests a very different way to classify content. When populating a top-down taxonomy, the central question is "where do I put this?" but at the heart of the bottom-up approach is the question "how do I describe this?" By asking this subtly different question, you'll wind up in a dramatically different destination. Where the top-down question evokes a single answer, the bottom-up question suggests many answers. You may describe a particular document according to any or all of the following categories:

Topic: What is this document about? What are the major subjects?
Product: Which of our products is featured in this document? How about our competitors' products?
Document Type: What is the format of the document? Is it a technical report, a white paper, a news article, an e-service application, a FAQ, a product specification?
Source: Who created this document? Which department was responsible for its creation?
Intended Audience: For whom is this document intended or appropriate? Which segments of our customers or employees may or may not be interested?
Geography: Is this document only applicable to people in specific regions, countries, or locations?
Price: Is there a price associated with this document or the products it describes?

You can undoubtedly come up with many more ways to describe a document. What's important is that the bottom-up approach leads you toward the identification of many facets, and eventually the creation of multiple taxonomies.
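To make the contrast concrete, here is a minimal Python sketch of "how do I describe this?" in practice. Each document carries several facets, each facet may hold more than one value, and any combination of facets can be used to find it. The document titles, facet values, and the matches helper are all hypothetical.

```python
def matches(document, **required):
    """True if the document carries every requested facet value."""
    return all(value in document["facets"].get(facet, set())
               for facet, value in required.items())

# Hypothetical documents described along several facets at once.
documents = [
    {"title": "Widget 2000 White Paper",
     "facets": {"topic": {"performance"},
                "product": {"Widget 2000"},
                "document_type": {"white paper"},
                "audience": {"customers"},
                "geography": {"North America", "Europe"}}},
    {"title": "Widget 2000 Installation FAQ",
     "facets": {"topic": {"installation"},
                "product": {"Widget 2000"},
                "document_type": {"FAQ"},
                "audience": {"customers", "employees"}}},
]

european_widget_docs = [d["title"] for d in documents
                        if matches(d, product="Widget 2000", geography="Europe")]
print(european_widget_docs)  # ['Widget 2000 White Paper']
```

No document had to be "put" in exactly one place; the same white paper is reachable by product, by geography, by audience, or by any mix of them.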

To bring this discussion of faceted classification down to earth, it may be helpful to consider the more established world of databases. All we're doing here is applying the principles of relational database design and the notion of fields within a database to the muddier world of Web sites and intranets. Facets are fields. And, in most contexts, you'll want to define multiple fields, rather than lumping apples, oranges, and papayas into one big container.
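The database analogy can be shown literally with Python's built-in sqlite3 module. The table layout and rows below are invented for illustration, and the sketch simplifies by giving each facet a single value per document; multi-valued facets would call for separate linking tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per facet, rather than a single catch-all "category" column.
conn.execute("""
    CREATE TABLE documents (
        title TEXT, topic TEXT, product TEXT,
        document_type TEXT, source TEXT, audience TEXT
    )
""")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Widget 2000 White Paper", "performance", "Widget 2000",
         "white paper", "marketing", "customers"),
        ("Widget 2000 Spec", "performance", "Widget 2000",
         "product specification", "engineering", "customers"),
        ("Travel Policy", "travel", None,
         "policy", "human resources", "employees"),
    ],
)

# Because facets are separate fields, they can be combined freely in a query.
rows = conn.execute(
    "SELECT title FROM documents "
    "WHERE product = ? AND document_type = ? ORDER BY title",
    ("Widget 2000", "white paper"),
).fetchall()
print(rows)  # [('Widget 2000 White Paper',)]
```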

Controlled Vocabularies

Where facets are fields, controlled vocabularies are acceptable values. For each concept within a facet, you'll need to define a preferred term (i.e., acceptable value) and one or more variant or equivalent terms. This will enable your system to manage synonyms, homonyms, misspellings, abbreviations and acronyms, and other ambiguities of language and categorization.
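A minimal sketch of such a vocabulary, assuming a simple mapping from each preferred term to its variants, might look like this in Python. The class name and the sample terms are hypothetical. The sketch covers synonyms, abbreviations, and spelling variants; homonyms need extra context (such as which facet is being searched) and are not handled here.

```python
class ControlledVocabulary:
    """Maps variant terms onto the preferred term for one facet."""

    def __init__(self, entries):
        # entries: {preferred term: [variant terms]}
        self._lookup = {}
        for preferred, variants in entries.items():
            for term in (preferred, *variants):
                self._lookup[self._key(term)] = preferred

    @staticmethod
    def _key(term):
        # Ignore case and collapse runs of whitespace.
        return " ".join(term.lower().split())

    def preferred(self, term):
        """Return the preferred term, or None if the term is not recognized."""
        return self._lookup.get(self._key(term))

# Hypothetical vocabulary for the Document Type facet.
document_types = ControlledVocabulary({
    "FAQ": ["frequently asked questions", "faqs", "f.a.q."],
    "white paper": ["whitepaper", "white papers"],
})

print(document_types.preferred("Frequently  Asked Questions"))  # FAQ
print(document_types.preferred("Whitepaper"))                   # white paper
print(document_types.preferred("memo"))                         # None
```

Terms that come back as None are worth logging: they are either noise or candidates for new variants.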

Some of these facets will be flat lists. For example, you may have a single flat list of ten to twenty logical document types. Others, such as geography, may be hierarchical taxonomies, specifying office locations within cities within states within countries within continents.
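A hierarchical facet can be sketched as a table of parent links, which is enough to answer questions such as "is this office within that country?" The place names below are illustrative only, and the helper names are invented for this example.

```python
# Each term points to its parent; None marks the top of the hierarchy.
GEOGRAPHY = {
    "North America": None,
    "United States": "North America",
    "Michigan": "United States",
    "Ann Arbor": "Michigan",
    "Ann Arbor Office": "Ann Arbor",
    "Europe": None,
    "Germany": "Europe",
}

def path_to_root(term, parents=GEOGRAPHY):
    """Return the term followed by each of its ancestors, up to the top level."""
    path = []
    while term is not None:
        path.append(term)
        term = parents[term]
    return path

def is_within(term, ancestor, parents=GEOGRAPHY):
    """True if `ancestor` is the term itself or one of its broader terms."""
    return ancestor in path_to_root(term, parents)

print(path_to_root("Ann Arbor Office"))
print(is_within("Ann Arbor Office", "United States"))  # True
print(is_within("Germany", "North America"))           # False
```

A flat facet such as document type is simply the degenerate case in which every term's parent is None.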

As you're developing these controlled vocabularies, it's critical to work on the details with a view of the whole. Design of a topical taxonomy, for example, should be influenced by the existence (or lack thereof) of a product taxonomy or a geographic taxonomy. This awareness will make all the difference between focused, complementary vocabularies and ones that are confused and overlapping.
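One rough check for confused, overlapping vocabularies is to look for the same term turning up in more than one facet. The following Python sketch does only that; it assumes each vocabulary is a plain list of terms, and the sample terms are invented. It is a prompt for human review, not a substitute for it.

```python
def overlapping_terms(vocabularies):
    """Map each term found in more than one facet to the facets that use it."""
    seen = {}
    for facet, terms in vocabularies.items():
        for term in terms:
            seen.setdefault(term.lower(), set()).add(facet)
    return {term: sorted(facets)
            for term, facets in seen.items() if len(facets) > 1}

# Hypothetical vocabularies in which the topic facet has absorbed
# terms that belong to the product and geography facets.
vocabularies = {
    "topic": ["Pricing", "Installation", "Widget 2000", "Europe"],
    "product": ["Widget 2000", "Gadget Pro"],
    "geography": ["Europe", "North America"],
}

overlaps = overlapping_terms(vocabularies)
print(overlaps)
```

Here the topical vocabulary duplicates a product name and a region, a sign that it was designed without a view of its neighbors.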

Content and Metadata Management

This brings us to the unfulfilled promise of content management systems (CMS). In the late 1990s, many of us in the information architecture community saw CMS as an exciting opportunity to really make use of the controlled vocabulary metadata produced by the bottom-up approach. In short, where others saw software for controlling digital assets, we saw metadata management systems.

Unfortunately, the others are winning. Instead of enabling distributed solutions for capturing and utilizing controlled vocabulary metadata, many CMS installations have been focused purely on controlling the publishing process and allowing the repurposing of a limited suite of digital assets.

Many companies have continued to use the same top-down process for building Web sites despite their ownership of CMS software. This conjures images of a modern pyramid construction project, with thousands of sweaty laborers lugging blocks of stone while a hydraulic crane with a 16,000-ton lifting capacity sits idle.

Companies that are able to marginalize the "you watch your assets and I'll watch mine" mentality can really tap the power of content and metadata management systems to strike an intelligent balance between centralization and decentralization. Centralized teams can focus on designing broad, shallow enterprise-wide controlled vocabularies. Departmental teams can use the enterprise infrastructure as a starting point for deep dives into particular subject or product vocabularies. A unified metadata registry can provide global rules while allowing for local extensions. Once again, we can use the bottom-up approach to define digestible chunks without losing a sense of the whole.
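The balance between global rules and local extensions can be sketched as a small registry class. Everything here is an assumption made for illustration, including the class name, the rule that only the central team defines facets, and the sample departments and terms; it does not describe any particular CMS product.

```python
class MetadataRegistry:
    """Enterprise-wide facets and terms, plus per-department extensions."""

    def __init__(self, enterprise_vocabularies):
        # Global rule: only the central team defines which facets exist.
        self._enterprise = {facet: set(terms)
                            for facet, terms in enterprise_vocabularies.items()}
        self._extensions = {}  # (department, facet) -> set of local terms

    def extend(self, department, facet, terms):
        """Let a department add deeper terms to an existing enterprise facet."""
        if facet not in self._enterprise:
            raise KeyError(f"unknown facet: {facet}")
        self._extensions.setdefault((department, facet), set()).update(terms)

    def allowed(self, department, facet):
        local = self._extensions.get((department, facet), set())
        return self._enterprise[facet] | local

    def validate(self, department, metadata):
        """Return the (facet, value) pairs that break the registry's rules."""
        problems = []
        for facet, value in metadata.items():
            if (facet not in self._enterprise
                    or value not in self.allowed(department, facet)):
                problems.append((facet, value))
        return problems

registry = MetadataRegistry({
    "topic": ["products", "support"],
    "document_type": ["FAQ", "white paper"],
})
registry.extend("engineering", "topic", ["firmware", "thermal design"])

ok = registry.validate("engineering", {"topic": "firmware", "document_type": "FAQ"})
bad = registry.validate("marketing", {"topic": "firmware"})
print(ok)   # []
print(bad)  # [('topic', 'firmware')]
```

Engineering's deep-dive terms are valid for engineering content, while marketing still sees only the broad, shallow enterprise vocabulary until it defines extensions of its own.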

Searching and Browsing Systems

None of this bottom-up work will have value if the systemic perspective isn't carried through into the interface design process. After all, a solid foundation is only as good as the house it supports. And the design of good houses requires an understanding of both the construction materials and the behavior of real humans.

Figure 1: Because search quality depends on systems of a site working together, you might consider search as a window into the site's soul. If search works, you've probably got a healthy site.

We know, from decades of research in the fields of library science and information retrieval, that the information-seeking behavior of humans is iterative and interactive. People often don't know exactly what they're looking for and their experience with the information system can change their very goals and expectations repeatedly.

Consequently, an understanding of construction materials and human behavior leads us to the conceptual model of the search system pictured in Figure 1.

A successful user experience requires harmony between these components. It's not sufficient to choose a great search engine. And it's not enough to pursue a content-centric or business-centric or user-centric process. We must take a systemwide view if we are to tap the strength of the bottom-up approach to support powerful, flexible searching and browsing, while simultaneously supporting an efficient, distributed model for designing and managing those complex adaptive systems known as Web sites.
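As a rough illustration of searching and browsing working together, the Python sketch below runs a keyword search, shows how the results break down by one facet, and then lets the user narrow by a value from that breakdown. The matching is deliberately naive (a substring test on titles), and the function names and documents are hypothetical.

```python
from collections import Counter

def search(documents, keyword, **filters):
    """Keyword match on the title, narrowed by any facet filters supplied."""
    return [d for d in documents
            if keyword.lower() in d["title"].lower()
            and all(d.get(facet) == value for facet, value in filters.items())]

def facet_counts(results, facet):
    """Count results per facet value so the user can see how to narrow."""
    return Counter(d[facet] for d in results if facet in d)

# Hypothetical documents with single-valued facets.
documents = [
    {"title": "Widget 2000 White Paper", "document_type": "white paper", "audience": "customers"},
    {"title": "Widget 2000 FAQ", "document_type": "FAQ", "audience": "customers"},
    {"title": "Widget 2000 Service FAQ", "document_type": "FAQ", "audience": "employees"},
    {"title": "Gadget Pro FAQ", "document_type": "FAQ", "audience": "customers"},
]

first_pass = search(documents, "widget")              # a broad opening query
options = facet_counts(first_pass, "document_type")   # what the interface offers next
second_pass = search(documents, "widget", document_type="FAQ")

print(len(first_pass), dict(options))
print([d["title"] for d in second_pass])
```

The second query exists only because the first one showed the user what was available, which is the iterative, interactive behavior the interface has to support.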

Sound difficult? It is. But if we shy away from the challenge and ignore the forest for the trees, we will all be diminished by our lack of vision.

Gathering Content

This excerpt from Information Architecture for the World Wide Web, Second Edition (O'Reilly & Associates), by Louis Rosenfeld and Peter Morville, details key issues involved in analyzing content.

Format: Aim for a broad mix of formats, such as textual documents, software applications, video and audio files, and archived email messages. Try to include offline resources such as books, people, facilities, and organizations that are represented by surrogate records within the site.

Document Type: Capturing a diverse set of document types should be a top priority. Examples include product catalog records, marketing brochures, press releases, news articles, annual reports, technical reports, white papers, forms, online calculators, presentations, spreadsheets, and the list goes on.

Source: Your sample should reflect the diverse sources of content. In a corporate Web site or intranet, this will mirror the organization chart. You'll want to make sure you've got samples from engineering, marketing, customer support, finance, human resources, sales, research, and so on. This is not just useful, it's also politically astute. If your site includes third-party content such as electronic journals or ASP services, grab those too.

Subject: This is a tricky one, since you may not have a topical taxonomy for your site. You might look for a publicly available classification scheme or thesaurus for your industry. It's a good exercise to represent a broad range of subjects or topics in your content sample, but don't force it.

Existing Architecture: Used together with these other dimensions, the existing structure of the site can be a great guide to diverse content types. Simply by following each of the major category links on the main page or in the global navigation bar, you can often reach a wide sample of content. However, keep in mind that you don't want your analysis to be overly influenced by the old architecture.

Peter is president of Semantic Studios (www.semanticstudios.com), a strategy and information architecture consultancy, and coauthor of Information Architecture for the World Wide Web (O'Reilly & Associates).