Distributed web crawling

Distributed web crawling is a technique in which Internet search engines employ many computers to perform the web crawling needed to index the Internet. The idea is to spread the required computing power and bandwidth across many computers and network connections.

As of 2003, most modern commercial search engines use this technique. Companies such as Google use thousands of individual computers in multiple locations to crawl the Web.

Newer projects are attempting a less structured, more ad hoc form of collaboration, enlisting volunteers who join the effort using, in many cases, their home or personal computers. LookSmart is the largest search engine to use this technique, through its Grub distributed web-crawling project.

One proposed approach, though it is unclear whether Grub or any other project actually uses it, is to have every computer connected to the Internet crawl some Internet addresses (URLs) in the background. After downloading the pages, each client compresses the new pages and sends them back, together with a status flag (changed, new, down, redirected), to powerful central servers. The servers manage a large database and send out new URLs to be tested to all clients.
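The client/server exchange described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the protocol of Grub or any real crawler: the names `Coordinator`, `crawl_batch`, `Report`, `FetchResult` and `fake_fetch` are hypothetical, the use of SHA-256 digests to tell "new" from "changed" is an assumption, and the extra `unchanged` flag is not in the text but is added so that a client can report a page it checked without resending it. Network access is replaced by an in-memory stand-in so the sketch runs on its own.

```python
import hashlib
import zlib
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

# Status flags named in the text, plus UNCHANGED (an assumption of this sketch).
NEW, CHANGED, DOWN, REDIRECTED, UNCHANGED = (
    "new", "changed", "down", "redirected", "unchanged")


@dataclass
class FetchResult:
    """Hypothetical result of downloading one URL."""
    body: Optional[bytes]              # None when no page could be fetched
    redirect_to: Optional[str] = None  # set when the URL redirects elsewhere


@dataclass
class Report:
    """What a client sends back to the central server for one URL."""
    url: str
    status: str
    payload: bytes = b""               # zlib-compressed page, empty if not sent
    target: Optional[str] = None       # redirect target, if any


def crawl_batch(batch, fetch: Callable[[str], FetchResult]):
    """Client side: test each (url, known_digest) pair and build reports."""
    reports = []
    for url, known_digest in batch:
        result = fetch(url)
        if result.redirect_to is not None:
            reports.append(Report(url, REDIRECTED, target=result.redirect_to))
        elif result.body is None:
            reports.append(Report(url, DOWN))
        else:
            digest = hashlib.sha256(result.body).hexdigest()
            if known_digest is None:
                status = NEW
            elif digest != known_digest:
                status = CHANGED
            else:
                status = UNCHANGED
            # Only new or changed pages are compressed and sent back.
            payload = zlib.compress(result.body) if status != UNCHANGED else b""
            reports.append(Report(url, status, payload))
    return reports


class Coordinator:
    """Server side: hands out URLs to test and stores what comes back."""

    def __init__(self, seeds):
        self.pending = deque(seeds)
        self.pages = {}    # url -> page body (stand-in for the large database)
        self.digests = {}  # url -> SHA-256 digest of the stored body
        self.down = set()

    def next_batch(self, size):
        batch = []
        while self.pending and len(batch) < size:
            url = self.pending.popleft()
            batch.append((url, self.digests.get(url)))
        return batch

    def accept(self, reports):
        for r in reports:
            if r.status in (NEW, CHANGED):
                body = zlib.decompress(r.payload)
                self.pages[r.url] = body
                self.digests[r.url] = hashlib.sha256(body).hexdigest()
            elif r.status == REDIRECTED:
                self.pending.append(r.target)  # schedule the redirect target
            elif r.status == DOWN:
                self.down.add(r.url)


# In-memory stand-in for the Web, used instead of real HTTP requests.
FAKE_WEB = {
    "http://a.example/": FetchResult(body=b"<html>A</html>"),
    "http://b.example/": FetchResult(body=None, redirect_to="http://a.example/new"),
    "http://a.example/new": FetchResult(body=b"<html>A2</html>"),
}


def fake_fetch(url):
    return FAKE_WEB.get(url, FetchResult(body=None))


def run_demo():
    server = Coordinator(
        ["http://a.example/", "http://b.example/", "http://c.example/"])
    while True:
        batch = server.next_batch(2)
        if not batch:
            break
        server.accept(crawl_batch(batch, fake_fetch))
    return server
```

Calling `run_demo()` drives a single client through the seed list in batches of two. In a real deployment many clients would call `next_batch` and `accept` over the network, and the server would also need to handle clients that never report back, which this sketch leaves out.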

All Wikipedia text is available under the terms of the GNU Free Documentation License

 