1. Load an initial set of URLs for a query from Google
2. Begin Parallel
   * Lock the URL table
   * Find an unprocessed URL in the URL table
   * Set the status of the URL to v (visiting)
   * Unlock the URL table
   * Get the text and links for the URL
   * If the text contains the query keywords
     * Add the links from the URL to the URL and link tables
   * Else
     * Delete the URL and all associated links from the URL and link tables
   * Check for termination
3. End Parallel
4. Wait until all spiders processed in the parallel section complete
5. Clean up and remove URLs and links that have not been processed
Example 1: The smart spider algorithm.
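
The steps in Example 1 can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions, not the original implementation: the URL and link tables are in-memory structures rather than database tables, `FAKE_WEB` and `fake_fetch` are hypothetical stand-ins for real page downloads, the seed URLs are passed in instead of being loaded from Google, a page counts as relevant only if it contains all of the keywords, and the termination check (which the listing leaves unspecified) is assumed to be "no work left, or a page budget `max_pages` is spent".

```python
import threading

# Hypothetical in-memory "web" so the sketch runs without network access:
# URL -> (page text, outgoing links).
FAKE_WEB = {
    "a": ("smart spider overview", ["b", "c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("parallel spider design", ["e"]),
    "d": ("more recipes", []),
    "e": ("spider traps", []),
}


def fake_fetch(url):
    """Stand-in for downloading a page; returns (text, links)."""
    return FAKE_WEB.get(url, ("", []))


def crawl(seed_urls, keywords, fetch=fake_fetch, num_spiders=4, max_pages=100):
    # Step 1: the seed URLs stand in for the initial result set.
    urls = {u: "u" for u in seed_urls}  # u = unprocessed, v = visiting, p = processed
    links = set()                       # (source, target) pairs
    cond = threading.Condition()        # plays the role of the URL table lock
    fetched = 0

    def spider():
        nonlocal fetched
        while True:
            with cond:  # lock the URL table
                while True:
                    if fetched >= max_pages:
                        return  # assumed termination rule: page budget spent
                    url = next((u for u, s in urls.items() if s == "u"), None)
                    if url is not None:
                        break
                    if "v" not in urls.values():
                        return  # nothing unprocessed and no spider still working
                    cond.wait()  # a visiting spider may still add new URLs
                urls[url] = "v"
                fetched += 1
            # The table is unlocked here, so other spiders can run during the fetch.
            text, out = fetch(url)
            with cond:
                if all(k.lower() in text.lower() for k in keywords):
                    urls[url] = "p"
                    for target in out:
                        links.add((url, target))
                        urls.setdefault(target, "u")
                else:
                    # Irrelevant page: drop it and every link that touches it.
                    del urls[url]
                    links.difference_update({p for p in links if url in p})
                cond.notify_all()

    # Steps 2 to 4: run the spiders in parallel and wait for all of them.
    threads = [threading.Thread(target=spider) for _ in range(num_spiders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Step 5: keep only processed URLs and links between processed URLs.
    processed = {u for u, s in urls.items() if s == "p"}
    kept = {(s, t) for s, t in links if s in processed and t in processed}
    return processed, kept
```

With the toy data above, `crawl(["a"], ["spider"])` returns the URL set `{"a", "c", "e"}` and the link set `{("a", "c"), ("c", "e")}`: page `b` fails the keyword test, so it and the link `a -> b` are deleted, and `d` is never discovered. Fetching outside the locked region is what lets the spiders overlap their downloads, while every read and write of the two tables stays under the lock.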