As of 2003, most modern commercial search engines use this technique. Companies such as Google use thousands of individual computers in multiple locations to crawl the Web.
Newer projects are attempting to use a less structured, more ad-hoc form of collaboration by enlisting volunteers to join the effort using, in many cases, their home or personal computers. LookSmart is the largest search engine to use this technique in its Grub distributed web-crawling project.
One proposed solution to this problem (it is unclear whether Grub or any other project actually uses this algorithm) is to have every computer connected to the Internet crawl some Internet addresses (URLs) in the background. After downloading the pages, each client compresses the new pages and sends them back, together with a status flag (changed, new, down, redirected), to powerful central servers. The servers manage a large database and send out new URLs to be tested to all clients.
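The exchange described above can be illustrated with a minimal Python sketch. It is not taken from Grub or any other real project: the names `Status`, `Report`, `crawl_one`, and `Coordinator` are hypothetical, the `fetch` callable stands in for a real HTTP client, and two details are assumptions added for illustration (change detection by SHA-256 digest, and sending nothing back when a page is unchanged).

```python
import hashlib
import zlib
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional, Tuple


class Status(Enum):
    """The four status flags named in the text."""
    CHANGED = "changed"
    NEW = "new"
    DOWN = "down"
    REDIRECTED = "redirected"


@dataclass
class Report:
    """What a client sends back to the central server (hypothetical format)."""
    url: str
    status: Status
    payload: bytes = b""  # zlib-compressed page body, empty if no body is sent
    digest: str = ""      # SHA-256 of the uncompressed body (assumption)
    location: str = ""    # redirect target, if any


# Assumed fetch interface: returns (HTTP status code, body, redirect target).
Fetch = Callable[[str], Tuple[int, bytes, str]]


def crawl_one(url: str, known_digest: Optional[str], fetch: Fetch) -> Optional[Report]:
    """Client side: download one URL and build a report, or None if unchanged."""
    try:
        code, body, location = fetch(url)
    except OSError:
        return Report(url, Status.DOWN)
    if 300 <= code < 400:
        return Report(url, Status.REDIRECTED, location=location)
    if code >= 400:
        return Report(url, Status.DOWN)
    digest = hashlib.sha256(body).hexdigest()
    if digest == known_digest:
        return None  # assumption: unchanged pages are not sent back
    status = Status.NEW if known_digest is None else Status.CHANGED
    return Report(url, status, zlib.compress(body), digest)


class Coordinator:
    """Server side: holds the URL queue and an in-memory stand-in for the database."""

    def __init__(self, seeds: Iterable[str]) -> None:
        self.queue = deque(seeds)
        self.digests: Dict[str, str] = {}
        self.pages: Dict[str, bytes] = {}

    def next_url(self) -> Optional[str]:
        """Hand out the next URL to be tested, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None

    def known_digest(self, url: str) -> Optional[str]:
        return self.digests.get(url)

    def accept(self, report: Report) -> None:
        """Store new or changed pages; queue redirect targets for a later crawl."""
        if report.status in (Status.NEW, Status.CHANGED):
            self.pages[report.url] = zlib.decompress(report.payload)
            self.digests[report.url] = report.digest
        elif report.status is Status.REDIRECTED and report.location:
            self.queue.append(report.location)
```

In this sketch a client would repeatedly ask the coordinator for `next_url()`, call `crawl_one` with the digest the server already holds, and pass any resulting report to `accept`. A real deployment would also need to address scheduling, politeness toward crawled sites, and verification of results returned by untrusted volunteers, none of which is modelled here.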