Web crawler

A web crawler (also known as a web spider) is a program that browses the World Wide Web in a methodical, automated manner. Web crawlers typically keep a copy of all the pages they visit for later processing, for example by a search engine.

In general, a web crawler starts with a list of URLs to visit. As it visits each of these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit. The process ends either when it is stopped manually or after a certain number of links have been followed.
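
This core loop can be illustrated with a short Python sketch. The names below (crawl, LinkExtractor, the seed list, and the max_pages limit) are illustrative, not taken from any particular crawler:

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collect the href targets of <a> tags in a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed_urls, max_pages=100):
        """Breadth-first crawl starting from seed_urls."""
        frontier = deque(seed_urls)  # URLs still to visit
        visited = set()              # URLs already requested
        pages = {}                   # url -> page text, kept for later processing
        while frontier and len(pages) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            visited.add(url)
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue  # skip unreachable or malformed URLs
            pages[url] = html
            extractor = LinkExtractor()
            extractor.feed(html)
            for link in extractor.links:
                frontier.append(urljoin(url, link))  # resolve relative links
        return pages

A real crawler would also add the politeness measures described in the next two paragraphs.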

Web crawlers typically take great care to spread their visits to a particular site over a period of time. Because a crawler requests many more pages than a normal (human) user, rapid repeated requests to the same site can make it appear slow to its other users.
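
One common way to spread out visits is to enforce a minimum delay between successive requests to the same host. The sketch below assumes a fixed per-host delay; the class name PolitenessThrottle and the 5-second default are arbitrary illustrative choices:

    import time
    from urllib.parse import urlparse

    class PolitenessThrottle:
        """Enforce a minimum delay between requests to the same host."""
        def __init__(self, delay_seconds=5.0):
            self.delay = delay_seconds
            self.last_access = {}  # host -> time of the last request to it

        def wait(self, url):
            """Sleep just long enough to honor the per-host delay."""
            host = urlparse(url).netloc
            last = self.last_access.get(host)
            if last is not None:
                remaining = self.delay - (time.monotonic() - last)
                if remaining > 0:
                    time.sleep(remaining)
            self.last_access[host] = time.monotonic()

A crawler would call wait(url) immediately before each fetch in its main loop.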

For similar reasons, web crawlers are expected to obey the robots exclusion protocol (robots.txt), with which web site owners can indicate which pages should not be spidered.
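
Python's standard library provides urllib.robotparser for reading this file. A minimal sketch, assuming a hypothetical helper name allowed_to_fetch and a placeholder user agent string:

    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    def allowed_to_fetch(url, user_agent="ExampleCrawler"):
        """Ask the site's robots.txt whether user_agent may fetch url."""
        parts = urlparse(url)
        robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
        parser = RobotFileParser()
        parser.set_url(robots_url)
        try:
            parser.read()  # download and parse the site's robots.txt
        except (OSError, ValueError):
            return True  # robots.txt unreachable; this sketch assumes permission
        return parser.can_fetch(user_agent, url)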

See also: spider, Google, PageRank


