True Desktop Search

Finding what you want when you want it is often easier said than done. Luckily the lines between the desktop and the Web are blurring—and the race is on for the best desktop search tool.


May 19, 2006
URL:http://www.drdobbs.com/database/true-desktop-search/188100593

Eric J. Bruno is a consultant in New York who has worked extensively in developing real-time trading and financial applications. Contact him at [email protected].


Before the advent of the World Wide Web in the early 1990s, performing searches on desktop computers often involved the familiar DOS command, dir, with the /s command switch. You could, for instance, have searched for all text files on a computer with the dir *.txt /s command. In 1995, with the Web growing rapidly, Digital Equipment Corporation (DEC) created the AltaVista search site. Used mainly to showcase DEC's Alpha hardware, AltaVista consisted of two main servers—Alta and Vista—which would scour the Web's HTML pages. As a result, AltaVista pioneered many of today's indexing and search standards.

It wasn't long before Yahoo! became a leader in Web search, and, thanks in part to its partnership with Yahoo!, Google soon took the lead. The power and popularity of Web search far exceeded what was available for the desktop: it was more difficult to find information on your computer's hard drive than in the vastness of the global Web. This was the driving factor behind the introduction of desktop search tools, and the ensuing battle (similar to the browser wars of the late '90s) for desktop presence.

Inside Desktop Search

Desktop search tools, such as Google Desktop Search, quietly index all of the content on a computer's hard drive, including the contents of e-mail messages, text files, Web-browser history, Microsoft Office documents, instant message conversations, audio and video files and so on. For example, Google Desktop's preferences page (see Figure 1) lists the different types of files that can be indexed, and lets you specify which ones to include. The contents of the local index file are typically kept private, but are available to local users to locate content on a computer, or even shared network drives.

Figure 1
Google Desktop indexing preferences.

When a desktop search is performed, it's executed against the search engine's index of all content on the local computer (translation: it's fast!). The results of a search using most desktop search tools appear as they do with Web searches. Google Desktop results are broken out by content type (marked by a unique icon on the left), with thumbnail previews provided on the right for applicable results (Figure 2).

Figure 2
Google Desktop Search results.

Some of the more popular desktop search tool providers include Google, Yahoo!, Microsoft, Copernic and Ask.

All desktop search tools work in roughly the same way. First, all of the content on the computer's hard drives is indexed for quick lookup. This is typically performed by a set of filters, sometimes implemented as plug-in components, which understand different types of content (Figure 3). With a plug-in architecture, support for new content types is added by installing additional plug-ins. Most desktop search tools provide an API that allows developers to extend the index engine to support proprietary content. For instance, indexing media content (video, audio and images) involves the metadata associated with the content, which is included as part of the media format. If a new media format is invented, the vendor should also create filters for the common desktop search tools to let the content appear in a user's search results.

Figure 3
Files are indexed by filter components that understand the file's content.
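The plug-in idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not any vendor's API: the FILTERS registry, the register_filter decorator and the two sample filters are hypothetical names invented for this example.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical registry: maps a file extension to a filter function that
# turns raw file bytes into indexable text.
FILTERS: Dict[str, Callable[[bytes], str]] = {}


def register_filter(extension: str):
    """Decorator that installs a filter for one file extension."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        FILTERS[extension.lower()] = func
        return func
    return decorator


@register_filter(".txt")
def plain_text_filter(data: bytes) -> str:
    # Plain text needs no parsing, only decoding.
    return data.decode("utf-8", errors="replace")


@register_filter(".csv")
def csv_filter(data: bytes) -> str:
    # Treat commas as word separators so each cell is indexed as text.
    return data.decode("utf-8", errors="replace").replace(",", " ")


def extract_text(filename: str, data: bytes) -> Optional[str]:
    """Return indexable text, or None if no filter handles this file type."""
    extension = os.path.splitext(filename)[1].lower()
    handler = FILTERS.get(extension)
    return handler(data) if handler else None
```

Supporting a new content type in this sketch means registering one more function; the indexing code that calls extract_text does not change.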

The index is basically a word list, where each word (or phrase) is associated with the set of files in which it appears, along with the word's location within each file. Creating this index takes time, and most of the common desktop search tools let you adjust when and how the index is created. Once the index is created, it takes relatively little time and processing power to keep it up-to-date as files are changed, added and deleted from your hard drive.
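A word list of this kind is usually called an inverted index. The Python sketch below is a simplified illustration, assuming whole files fit in memory and that a query matches only files containing every query word; the function names are made up for the example and do not come from any of the tools discussed.

```python
import re
from typing import Dict, List, Set

# word -> {file name -> [positions of the word within that file]}
Index = Dict[str, Dict[str, List[int]]]


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_file(index: Index, name: str, text: str) -> None:
    """Record every word of one file together with its position."""
    for position, word in enumerate(tokenize(text)):
        index.setdefault(word, {}).setdefault(name, []).append(position)


def remove_file(index: Index, name: str) -> None:
    """Drop one file from the index, e.g. after it is deleted from disk."""
    for word in list(index):
        index[word].pop(name, None)
        if not index[word]:
            del index[word]


def search(index: Index, query: str) -> Set[str]:
    """Return the names of files that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result
```

Keeping the index current when a file changes amounts to calling remove_file followed by add_file for that one file, which is why updates are cheap compared with the initial build.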

Next, searches can be executed against the indexed content, with meaningful results presented to users. Although the mechanics of searching are similar for all of the desktop search tools, the presentation of the search results is where they diverge. Google Desktop, for instance, displays its results within a browser in similar fashion to a www.google.com Web search (Figure 2). When you click on a result, such as an e-mail message, the associated application (Outlook, for instance) launches with the associated content.

Other desktop search tools, such as Copernic and Ask, have a richer user interface, and include a higher level of application integration (Figure 4). Ask integrates with installed desktop applications, such as Office, and allows you to search for content (or files) you wish to edit instead of looking through directories on your hard drive. This is a paradigm shift: it's the content that's searched for and located, not the file that contains it. It's subtle, but it may help boost productivity for many users.

Figure 4
Copernic includes a rich user interface with application integration for the embedded display of content.

Ask Desktop is unique because it integrates with Web services such as MyStuff and the Ajax application, Writely. This lets you index and search documents created with Writely, even though the files don't live on your desktop. This is a key point, as it illustrates how Ajax is extending the local desktop to the Web, thereby creating a "virtual desktop" that will need to be integrated and searched. It also illustrates how Web-service integration will be crucial for future desktop applications.

Apple, Linux and Open-Source Support

Most of the available desktop search tools are Windows-centric. Google, Yahoo!, Copernic and Ask, for example, don't offer non-Windows versions of their tools. There are tools available for other operating systems, and there are even open-source desktop search tools available. For instance, Mac OS X Tiger comes with Spotlight (www.apple.com/macosx/features/spotlight), an excellent desktop search tool that's integrated with the Tiger menu bar.

For Linux, Beagle is an open-source desktop tool that searches content for common Linux applications such as OpenOffice, KMail and Evolution for e-mail and calendaring, Gaim for instant messages, Firefox, Epiphany and Konqueror for browser history, as well as common video, audio and image file formats. Beagle is written in C# with the Mono project (.NET for Linux), and uses a C# version of Lucene (http://lucene.apache.org) as the indexer and search engine.

As for other open-source desktop search tools, there are two that look promising: The File Seeker and Lucene Nutch (both in beta form). The File Seeker, available from SourceForge (http://sourceforge.net/projects/fileseeker) in C++ and Delphi, supports Windows versions from Windows 95 to Windows XP. Nutch (http://lucene.apache.org/nutch) builds on Lucene, which is written in Java and is consequently multiplatform. It adds support for indexing, crawling and parsing content in multiple formats.

Some organizations may be interested in using open-source search tools because they provide complete transparency into the algorithms used to rank search results. Most commercial vendors view this as intellectual property, and therefore don't expose their algorithms. This leads to the suspicion that results can be ranked based on vendor partnership or other, commercial, agreements. The use of open-source tools avoids this, and allows organizations with specific requirements to rank results as they see fit.
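To make the point about ranking concrete, here is a small Python sketch of a ranking step that is fully visible and replaceable. It is not taken from any of the open-source tools mentioned; the scoring functions and the spreadsheet boost are hypothetical examples of a rule an organization might choose.

```python
from typing import Callable, Dict, List

# A scorer takes a file name and the number of times the query term occurs
# in that file, and returns a score (higher ranks first).
Scorer = Callable[[str, int], float]


def by_term_count(name: str, count: int) -> float:
    """Baseline rule: more occurrences of the term means a higher rank."""
    return float(count)


def prefer_spreadsheets(name: str, count: int) -> float:
    """Hypothetical house rule: double the score of .xls files."""
    boost = 2.0 if name.lower().endswith(".xls") else 1.0
    return count * boost


def rank(matches: Dict[str, int], scorer: Scorer) -> List[str]:
    """Order matching files by descending score, then by name for ties."""
    return sorted(matches, key=lambda name: (-scorer(name, matches[name]), name))
```

For matches of {"memo.doc": 3, "budget.xls": 2, "notes.txt": 1}, the baseline scorer puts memo.doc first, while prefer_spreadsheets moves budget.xls to the top. Because the rule is ordinary code, anyone can read exactly why a result ranked where it did.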

Programming Desktop Search

Although you may not be interested in customizing an open-source desktop search tool, you may be interested in extending one of the more popular tools, such as Google Desktop. For this reason, most desktop search providers include an API that allows you to extend their tools. Most of the tools let you at least add new content-type filters. This enables you to extend the tools to index and search content for your own custom applications.

However, some of the tools, such as Google Desktop and Windows Desktop Search, go further. Google, for example, lets you hook various indexing and search events. Both Google Desktop and Windows Desktop Search allow you to integrate their index and search engines right into your own applications. Let's take a look at the APIs available from the desktop search tool providers and examine the capabilities in each.

Beagle API

The Beagle API contains two main types of components you can develop to extend the indexing and query capabilities of Beagle. These components are called Beagle Filters and Beagle Backend components.

Beagle Filters are components that extract pertinent information from an item to be indexed (such as an e-mail or an OpenOffice document). Filter components rely on Beagle Backend components to retrieve data items and stream the indexable content to them.

Beagle Backend components can be divided into two subtypes. Indexable components have programmed-in knowledge of how to locate specific data items for indexing and inform the Beagle engine of items available to be indexed. Queryable components know how to query, at search time, other data sources that aren't feasible to index. For instance, there is a Google backend component that queries Google and returns the results to the Beagle engine.
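The split between the two backend subtypes can be modeled roughly as follows. Beagle itself is written in C#, and this Python sketch does not reproduce its actual classes; every name here (IndexableBackend, QueryableBackend, Engine and the two sample backends) is a hypothetical stand-in chosen to show the division of labor.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set, Tuple


class IndexableBackend(ABC):
    """Knows where items live and hands them to the engine for indexing."""

    @abstractmethod
    def items(self) -> Iterable[Tuple[str, str]]:
        """Yield (uri, text) pairs for everything that should be indexed."""


class QueryableBackend(ABC):
    """Answers queries at search time for data that is never indexed."""

    @abstractmethod
    def query(self, term: str) -> List[str]:
        """Return the URIs that match the term."""


class NotesBackend(IndexableBackend):
    def __init__(self, notes: Dict[str, str]) -> None:
        self._notes = notes

    def items(self) -> Iterable[Tuple[str, str]]:
        return self._notes.items()


class RemoteStubBackend(QueryableBackend):
    """Stand-in for a live source such as a Web search service."""

    def __init__(self, canned: Dict[str, List[str]]) -> None:
        self._canned = canned

    def query(self, term: str) -> List[str]:
        return list(self._canned.get(term.lower(), []))


class Engine:
    def __init__(self, indexables: Iterable[IndexableBackend],
                 queryables: Iterable[QueryableBackend]) -> None:
        self._index: Dict[str, Set[str]] = {}  # word -> URIs containing it
        for backend in indexables:
            for uri, text in backend.items():
                for word in text.lower().split():
                    self._index.setdefault(word, set()).add(uri)
        self._queryables = list(queryables)

    def search(self, term: str) -> List[str]:
        # Indexed hits come first, then results from live queryable sources.
        hits = sorted(self._index.get(term.lower(), set()))
        for backend in self._queryables:
            hits.extend(backend.query(term))
        return hits
```

The engine treats both kinds of backend uniformly from the caller's point of view: a single search returns locally indexed items followed by whatever the live sources report.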

Windows Desktop Search Interfaces and SDK

Microsoft lets you enhance its desktop search tool to add support for new file types to be indexed, add new data sources to locate content to be indexed or searched, and integrate desktop search results into other applications. For instance, a human resources application can be extended to display employee information from a corporate database, as well as information about that employee from a representative's local hard drive or some other private store.

Windows Desktop Search components are built as COM objects that implement specific interfaces. For instance, to add support for a new file type, your component must implement the IFilter interface. The result is a component that knows how to extract the searchable information from the file type in question, can be invoked by the desktop search engine and can be integrated with Windows Explorer.

Here's a complete list of COM interfaces and their function:

Copernic Plug-Ins

Copernic can use both built-in and custom plug-in components, called file extractors, to obtain the contents of varying file types to be indexed and displayed within its user interface. File extractors are written as Microsoft COM objects, and there is support for .NET and C#. The only COM interface required is ICopernicDesktopSearchFileExtractor, which defines these methods:

Google Desktop Search SDK

The Google SDK contains several APIs (also in the form of COM interfaces), which let you extend the Google index engine, search engine, search results presentation and the Google Sidebar display.

The Index and Query APIs are straightforward and extend Google Desktop in ways similar to the other tools discussed. However, the other APIs let you customize the behavior of Google Desktop. The Event API, for example, lets you build components that receive certain events while content is being indexed. By intercepting these events, you can filter out certain types of content from being indexed.
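The event-interception idea can be illustrated without the real SDK. The actual Google Desktop Event API is exposed through COM interfaces; the Python sketch below only mimics the pattern, and the Indexer class, the event dictionary layout and the "im" content-type label are assumptions made for this example.

```python
from typing import Callable, Dict, List

# A listener receives an event describing the item about to be indexed
# and returns False to veto it.
Listener = Callable[[Dict[str, str]], bool]


class Indexer:
    def __init__(self) -> None:
        self._listeners: List[Listener] = []
        self.indexed: List[str] = []  # URIs that were accepted

    def subscribe(self, listener: Listener) -> None:
        """Register a component that wants to see indexing events."""
        self._listeners.append(listener)

    def index(self, uri: str, content_type: str) -> bool:
        """Index the item unless any listener vetoes the event."""
        event = {"uri": uri, "type": content_type}
        if all(listener(event) for listener in self._listeners):
            self.indexed.append(uri)
            return True
        return False


def skip_instant_messages(event: Dict[str, str]) -> bool:
    """Example listener: keep instant-message content out of the index."""
    return event["type"] != "im"
```

After subscribing skip_instant_messages, a call such as index("im://7", "im") is rejected while e-mail and document items pass through unchanged.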

The Display API lets you tailor the way content is displayed to the user in the Google Sidebar and Alert windows (Figure 5). For instance, a search that returns weather information is displayed in the Sidebar using appropriate graphics—sun, clouds, rain and so on. This API lets you choose the best and most creative ways to display your custom content.

Figure 5
This Sidebar, from the Google SDK documentation, displays various content types and can be extended to display your custom content.

Beyond custom result presentation, the Action API lets you control precisely what happens when a user clicks on a result, in either the Google results window or the Sidebar. Custom action components implement the appropriate COM interfaces, such as IGoogleDesktopRegistrar and IGoogleDesktopCustomAction, to perform their actions.

Conclusion

The battle is on for desktop presence, with vendors such as Google and Microsoft competing for installations. Google, for instance, has partnered with both Sun and Dell. When you download the Java runtime or buy a new Dell computer, the Google Toolbar and Google Desktop are also installed. Dell has also agreed to let Google integrate with its website.

Additionally, the lines between a computer's desktop and the Web continue to blur, and it's becoming important for desktop applications to integrate with Web services.

You should take advantage of SDKs that let you integrate your own applications with other applications that your users find useful. If, as a result, users perceive an increase in value for both applications, it will further the success of your application.
