On Thursday, the three companies plan to reveal support for Sitemaps 0.90, a protocol that lets Web publishers inform search engines about content on their sites.
"We just wanted to make it easy for site owners to have one file they could maintain on the site," says Vanessa Fox, product manager for Webmaster Central at Google.
"For a long time, most people in the search industry have thought it would be great to have a unified format to submit content to search engines," says Tim Mayer, senior director of global Web search at Yahoo. "There are a lot of sites that are difficult to crawl, like shopping sites or those with dynamic URLs. This enables a publisher to give us a feed of all the URLs for a specific site so we can access them."
A Sitemap is simply an XML file that lists the URLs on a Web site, optionally with details such as when each page last changed. Beyond identifying fresh content for search engines, Sitemaps help save bandwidth for publishers and search engines alike by avoiding unnecessary crawls to identify new or changed files.
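To make the format concrete, here is a minimal sketch in Python of how a publisher might generate such a file. The namespace and the urlset, url, loc and lastmod elements come from the published Sitemaps protocol; the build_sitemap helper and the example.com addresses are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is a date string such as "2006-11-15", or None to omit it.
    This helper is a hypothetical illustration, not part of any library.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required: the page's URL
        if lastmod:
            # optional: when the page last changed
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with one static page and one dynamic URL.
xml_text = build_sitemap([
    ("http://www.example.com/", "2006-11-15"),
    ("http://www.example.com/catalog?item=12", None),
])
print(xml_text)
```

A publisher would typically save the result as a file on the site and let the search engines know where it is; a real generator would draw the page list from the site's own database or file system rather than a hard-coded list.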
"The quality of your index is predicated by the quality of your sources, and Windows Live Search is happy to be working with Google and Yahoo on Sitemaps to not only help Webmasters, but also help consumers by delivering more relevant search results so they can find what they're looking for faster," said Ken Moss, general manager of Windows Live Search at Microsoft, in a statement. "I am sure this will be the first of many industry initiatives you will see us working and collaborating on."
Sitemaps augment rather than replace search engine crawls, and they do not limit the visibility of files to search engines. That's the job of the robots.txt file, which tells crawlers which files to stay away from (though not all crawlers obey it).
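As a rough illustration of the robots.txt side, the sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler might check a site's rules before fetching a page. The rules, the "ExampleBot" crawler name and the example.com addresses are made up for this example, and compliance is voluntary: nothing here stops a crawler that chooses to ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # load the rules from text instead of the network

# A polite crawler asks before fetching each URL.
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")

print(blocked)  # False: the path falls under the Disallow rule
print(allowed)  # True: no rule covers this path
```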
Publishers have been asking for standardization of how search crawlers and other bots deal with directives in a robots.txt file, says Mayer, who indicates that the large search engines are open to input on the matter.
Further information can be found at Sitemaps.org.