Fiddling about with a Perl script to count the number of pages, I get the following counts, as of the morning of 12th December.
Selecting "comma pages" (pages with at least enough text to include a comma, which filters out redirects and a load of one-liner pages), we have:
19931 comma pages found, of which:
- 4 were /Talk subpages
- 472 were author pages (other than those already excluded above)
- 261 were Wikipedia pages (other than those already excluded above)
19194 remaining
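The comma-page filter and the ordered exclusions above can be sketched in Python. This is not the original Perl script, which isn't reproduced here. The page representation (a dict mapping title to text), the `/Talk` suffix test, the `Wikipedia` title prefix and the `author_titles` set are all hypothetical stand-ins for whatever rules the script actually used. Checking the exclusions in a fixed order is what puts each page in exactly one bucket, matching "other than those already excluded above".

```python
from __future__ import annotations

from collections import Counter


def is_comma_page(text: str) -> bool:
    """A "comma page" has at least enough text to contain a comma.

    This cheaply filters out redirects and most one-line pages.
    """
    return "," in text


def classify(title: str, text: str, author_titles: set[str]) -> str:
    """Put a page into exactly one bucket, checking exclusions in order.

    The title tests are hypothetical stand-ins for the real script's rules.
    """
    if not is_comma_page(text):
        return "not_comma"
    if title.endswith("/Talk"):  # hypothetical: Talk subpages end in /Talk
        return "talk_subpage"
    if title in author_titles:  # hypothetical: a known list of author pages
        return "author"
    if title.startswith("Wikipedia"):  # hypothetical: meta pages by prefix
        return "wikipedia"
    return "remaining"


def count_pages(pages: dict[str, str], author_titles: set[str]) -> Counter:
    """Tally how many pages fall into each bucket."""
    return Counter(classify(t, body, author_titles) for t, body in pages.items())


# Tiny made-up sample with one page per bucket.
pages = {
    "Physics": "Physics is the study of matter, energy and motion.",
    "Physics/Talk": "Fair point, I'll fix it.",
    "Example User": "Hello, I'm a contributor.",
    "Wikipedia FAQ": "Questions, and some answers.",
    "Phys": "#REDIRECT Physics",
}
counts = count_pages(pages, author_titles={"Example User"})
comma_total = sum(n for bucket, n in counts.items() if bucket != "not_comma")
print(comma_total, counts["remaining"])  # 4 1
```

Because the author and Wikipedia tests only run on pages that survived the earlier checks, the per-bucket counts can simply be subtracted from the comma-page total, as the figures above do.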
Subtract 27 for the Biographical Listing indexes, which are just lists of links even though they contain commas, and 36 for the Complete list of encyclopedia topics pages and subpages, and that leaves 19131 articles. At the current rate of addition of new articles, barring the Wikipedia server going down in the next few hours, there will be more than 19000 by Thursday 13th December.
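The arithmetic behind these figures can be checked in a few lines; every number below is one reported above, and nothing else is assumed.

```python
# Counts reported for the 12 December run.
comma_pages = 19931
excluded_by_type = {
    "/Talk subpages": 4,
    "author pages": 472,
    "Wikipedia pages": 261,
}
remaining = comma_pages - sum(excluded_by_type.values())

# Index pages that contain commas but are really just lists of links.
index_pages = {
    "Biographical Listing indexes": 27,
    "Complete list of encyclopedia topics (pages and subpages)": 36,
}
articles = remaining - sum(index_pages.values())

print(remaining, articles)  # 19194 19131
```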
Notes:
Possible pages to exclude to refine the counting
I'll say we have "over 6,000 articles" on the main page. Any objections? --LMS
A very interesting acquaintance of mine had a reputation for conservative estimates. Where some people boasted, he "reverse-boasted". This was always to deprecate his own ability and was, at least for a while, done with some semblance of humility (now it's an in-joke, but that's a completely different story). For this he is renowned and celebrated within his peer group.
I see no harm in Wikipedia underestimating the number of articles that have been written. It would be a refreshing change from the usual advertising hyperbole bombarding us from every direction.
Even having 4,000-5,000 articles is really quite a feat!
You're right; I was going to up the count this morning. -- Malcolm Farmer
I tried to use the old search engine to find articles with at least 2,000 characters (about three paragraphs), per the instructions on Wikipedia Announcements/March 2001, but somehow wasn't successful. I'm not sure what that number is, but there were about 500 articles with 2,000 characters when there were about 2,000 comma articles (roughly one in four). Given that there are now something like 12,000 comma articles, it seems to follow, if that proportion has held, that there are now something like 3,000 articles with 2,000 characters.
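The 3,000 figure is a simple proportional extrapolation. A minimal sketch, assuming (as the argument does) that the share of comma articles with at least 2,000 characters has stayed roughly constant; `estimate_long_articles` is a hypothetical helper name.

```python
def estimate_long_articles(long_then: int, comma_then: int, comma_now: int) -> int:
    """Scale the earlier share of long articles to the current comma-page count.

    Assumes the fraction of comma pages with at least 2,000 characters
    has not changed, which is the key (unverified) assumption.
    """
    share = long_then / comma_then  # 500 / 2000 = 0.25, about one in four
    return round(share * comma_now)


print(estimate_long_articles(500, 2000, 12000))  # 3000
```

A direct count would replace the assumption with a measurement, which is what the failed search-engine query was meant to provide.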
Why do I care? See Wikipedia commentary/Kill the Stub Pages. Recent criticisms of Wikipedia on K5 and Usenet make it clear that our PR might be improved by counting the more substantial articles separately ("three paragraphs" is obviously arbitrary, but it's reasonably credible). So we could say "We have 11,000 articles, of which 3,000 have three or more paragraphs."
If we decide to do this, the threshold we choose to advertise on the front page will matter psychologically, because it will set a length benchmark that makes an article seem officially "substantial." It might be better, instead, for somebody to (finally!) program a statistics page that gives various estimates of the number of articles. Then, on the front page, we could say just "11,000 articles" but link that to the statistics page, where the real deal would be stated.
Ideas??? --Larry Sanger
Perhaps the most informative solution would be a histogram of page sizes, with cumulative totals working backwards, e.g. 10 pages of 10k or more, 300 pages of 3k or more, 1000 pages of 2k or more, etc. -J
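Such a cumulative histogram can be computed by sorting the page sizes once and, for each threshold, binary-searching for the first page at least that large. A minimal sketch: the sizes below are made-up values, sizes are assumed to be measured in characters with "k" taken as 1,000, and `cumulative_size_counts` is a hypothetical helper.

```python
from bisect import bisect_left


def cumulative_size_counts(sizes: list[int], thresholds: list[int]) -> dict[int, int]:
    """For each threshold, count the pages whose size is >= that threshold.

    Sorting once lets each threshold be answered with one binary search;
    results are keyed from the largest threshold down ("working backwards").
    """
    ordered = sorted(sizes)
    n = len(ordered)
    # bisect_left returns the index of the first size >= t.
    return {t: n - bisect_left(ordered, t) for t in sorted(thresholds, reverse=True)}


# Made-up page sizes, in characters.
example = cumulative_size_counts(
    [500, 1200, 2100, 2500, 3300, 10400, 12000],
    [1000, 2000, 3000, 10000],
)
for threshold, count in example.items():
    print(f"{count} pages of {threshold // 1000}k or more")
# 2 pages of 10k or more
# 3 pages of 3k or more
# 5 pages of 2k or more
# 6 pages of 1k or more
```

A statistics page could publish this table as-is, leaving readers to pick whichever length they consider "substantial".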
Ye Olde "500 word essay" springs to mind... That would seriously address the "just hype" criticism of our numbers. Also, maybe we should exclude CIA Factbook text as well as pages from the 1911 encyclopedia, to be fairer? Regarding the conservatism argument, I agree; if our numbers are in error, it might be better to err on the small side.
-- BryceHarrington
Many CIA pages were edited a lot after import. --Taw