This article attempts to compare the size of Wikipedia with other encyclopedias and information collections.
See Wikipedia:Size of Wikipedia and Wikipedia:Statistics for estimates of Wikipedia's article count and article statistics, from which the following snapshot was taken:
Snapshots of Wikipedia's size:
- As of September 2002, Wikipedia had approximately 42,000 "articles", using very crude criteria for what constitutes an article. Of those, perhaps half were "encyclopedia size" articles. The mean article size was about 1997 bytes, or roughly 332 words; the median article size was smaller, at roughly 980 bytes, or roughly 163 words. Combining the mean article size with the article count gives a very approximate character count of 83.9 megabytes, or 14 million words. (Beware: these figures combine different measures taken on different days in September, and assume the average word to be 5 letters and a space.)
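The arithmetic behind this snapshot can be reproduced with a short sketch. The constant and function names below are illustrative, not taken from any Wikipedia tool, and the conversion assumes six bytes per word (five letters plus a space), rounding down as the quoted figures appear to do.

```python
# Figures quoted in the September 2002 snapshot.
ARTICLE_COUNT = 42_000
MEAN_ARTICLE_BYTES = 1997
MEDIAN_ARTICLE_BYTES = 980

# Assumption stated in the text: an average word is 5 letters plus a space.
BYTES_PER_WORD = 6


def bytes_to_words(n_bytes: int) -> int:
    """Convert a byte count to an approximate word count, rounding down."""
    return n_bytes // BYTES_PER_WORD


total_bytes = ARTICLE_COUNT * MEAN_ARTICLE_BYTES
total_words = bytes_to_words(total_bytes)

print(f"mean article:   {bytes_to_words(MEAN_ARTICLE_BYTES)} words")
print(f"median article: {bytes_to_words(MEDIAN_ARTICLE_BYTES)} words")
print(f"total size:     {total_bytes / 1e6:.1f} million bytes")
print(f"total words:    {total_words / 1e6:.1f} million")
```

Because the total is built from the mean rather than the median, a small number of very long articles can inflate it considerably, which is one reason to treat the result as approximate.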
So, by estimated word count as of September 2002, Wikipedia is about a quarter of the size of Britannica 2002, and by "encyclopedia adjusted" article count it is also about a quarter of the size. However, Wikipedia already has about half the number of topics of Britannica, measured by raw topic count.
Update: as of early March 2003, Wikipedia has roughly 108,000 articles. Of those, perhaps 36,000 are "data dumped" gazetteer entries about towns and cities in the USA. Ignoring these for the moment, and assuming that the mean article size is still the same, this leaves approximately 72,000 non-gazetteer articles of an estimated average of 332 words, or 23.9 million words, roughly half (about 43%) the size of the Encyclopædia Britannica's 2002 edition.
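The comparisons with Britannica in the two snapshots above can be checked with a few divisions. This is only a sketch: the variable names are illustrative, the Britannica figures are the advertised ones quoted in this article, and the March 2003 word count assumes the September 2002 mean of 332 words per article still holds.

```python
# Britannica 2002 figures as quoted in this article.
BRITANNICA_WORDS = 55_000_000
BRITANNICA_ARTICLES = 85_000

# Assumed unchanged since September 2002.
MEAN_ARTICLE_WORDS = 332

# September 2002 snapshot.
sept_articles = 42_000
sept_encyclopedic = sept_articles // 2  # "perhaps half" are encyclopedia size
sept_words = 14_000_000

# March 2003 update: drop the gazetteer entries, then estimate words.
march_non_gazetteer = 108_000 - 36_000
march_words = march_non_gazetteer * MEAN_ARTICLE_WORDS

ratios = {
    "Sept 2002, word count": sept_words / BRITANNICA_WORDS,
    "Sept 2002, adjusted article count": sept_encyclopedic / BRITANNICA_ARTICLES,
    "Sept 2002, raw article count": sept_articles / BRITANNICA_ARTICLES,
    "March 2003, word count": march_words / BRITANNICA_WORDS,
}

for label, ratio in ratios.items():
    print(f"{label:<36}{ratio:6.1%}")
```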
Not bad for an encyclopedia which is only two years old. But we must do better! Many of the articles are still of poor quality. As Wikipedia grows more comprehensive, efforts are expected to move more towards increasing the quality and scope of existing articles, rather than the creation of new articles. It is also anticipated that Wikipedia may grow to include a global gazetteer as part of its function.
See Wikipedia:Modelling Wikipedia's growth for more educated guesses about the potential growth of Wikipedia.
Comparison figures:
- The advertisements for Encyclopædia Britannica's 2002 edition proudly proclaim they have over 85,000 articles. A claimed word count of 55 million words, at an assumed average 5 letters per word and a space, gives an estimated character count of 330 million characters, or a crudely estimated mean article length of 3882 characters.
- The Columbia Encyclopedia, Sixth Edition, is cited as having 51,000 articles and having 6.5 million words. Assuming an average word length of five characters, and allowing for one space character per word, this gives a mean article length of very roughly 765 characters per article for the Columbia Encyclopedia.
- Microsoft's Encarta Encyclopedia 2002 is cited as having 26 million words.
- Microsoft Encarta Deluxe 2002 is cited as having "over 60,000 articles, 10,000 historical archives, and over 40 million words".
- Grolier Multimedia Encyclopedia Online claims 11 million words and 39,200 articles.
- American Jurisprudence, 1st ed., is an 83-volume collection of American common law; the 2nd ed. runs to 231 volumes!
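For the encyclopedias above that quote both a word count and an article count, the mean article length can be estimated in the same way as for Britannica and Columbia. The sketch below is illustrative only: the dictionary layout and function name are assumptions, the "over N" claims are treated as exact, and a word is again taken to be five letters plus a space.

```python
# Assumption used throughout the article: 5 letters per word plus a space.
CHARS_PER_WORD = 6

# (claimed words, claimed articles), as quoted in the list above.
ENCYCLOPEDIAS = {
    "Britannica 2002": (55_000_000, 85_000),
    "Columbia, 6th ed.": (6_500_000, 51_000),
    "Encarta Deluxe 2002": (40_000_000, 60_000),
    "Grolier Multimedia Online": (11_000_000, 39_200),
}


def mean_chars_per_article(words: int, articles: int) -> float:
    """Estimate the mean article length in characters."""
    return words * CHARS_PER_WORD / articles


mean_chars = {
    name: mean_chars_per_article(words, articles)
    for name, (words, articles) in ENCYCLOPEDIAS.items()
}

for name, chars in mean_chars.items():
    print(f"{name:<28}{chars:8.0f} characters per article")
```

The Encarta Deluxe figure is especially rough, since its word count may also cover the historical archives that are listed separately from its articles.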
Sizes of other non-encyclopedia information collections, for comparison. Note that Wikipedia is neither a dictionary, nor a web index: these figures are just for order-of-magnitude comparison.
- The New Oxford Dictionary of English claims 350,000 definitions, and four million words.
- The NIMA (http://www.nima.mil/) GEOnet Names Server contains approximately 3.88 million named geographical features outside the United States, with 5.34 million names.
- The USGS Geographic Names Information System claims to have almost 2 million physical and cultural geographic features within the United States.
- The OUP's New Dictionary of National Biography has a target size of 50,000 articles on famous Britons, in 50 million words (implying an average article size of 1,000 words). If a country of 60 million people has 50,000 famous people in its history, a world of six billion people should have 5,000,000 famous people in its history.
- The old Dictionary of National Biography had 36,500 articles in 33 million words.
- The New Grove Dictionary of Music and Musicians, 2nd edition, claims "25 million words with over 29,000 articles" about the subject of music alone.
- The Merck Index Subscription Edition has over 10,000 monographs on chemical compounds.
- The Beilstein database claims entries on "8 million organic and 1.4 million inorganic and organometallic compounds".
- Each human being is estimated to have 30,000 to 40,000 genes, each of which probably deserves an article.
- The freedb database holds information for around 703,270 compact discs.
- The dmoz web index claims to have over 460,000 categories (for a total of over 3.8 million websites, but the categories are what is important here).
- The Guide Star Catalog II has entries on 998,402,801 distinct astronomical objects.
- The British Library claims that it holds over 150 million items.
- The Library of Congress claims that it holds approximately 119 million items.
- The World Resources Institute claims that approximately 1.4 million species have been named, out of an unknown number of total species (estimates range between 2 and 100 million species).
- As of 2003, there are about six billion human beings, each with their own life story. Billions more have lived and died in the past, although most of their lives are lost to history.
- It is accepted by astrophysicists that the number of particles in the universe is in the 10^85 range, much less than a googol (a 1 followed by 100 zeroes).
- Black's Law Dictionary 7th ed. has 24,500 common law legal terms.
- Online Mendelian Inheritance in Man (external link (http://www.ncbi.nlm.nih.gov/Omim/)) has 14,520 entries as of June 7, 2003; see the site statistics (http://www.ncbi.nlm.nih.gov/Omim/Stats/mimstats).
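Two of the order-of-magnitude arguments in the list above are simple enough to check directly. The sketch below uses the same naive proportional scaling as the Dictionary of National Biography entry, and compares the quoted particle count with a googol. The function and variable names are illustrative, and the population figures are the round numbers used in the list.

```python
# Round figures quoted in the list above.
DNB_ARTICLES = 50_000
UK_POPULATION = 60_000_000
WORLD_POPULATION = 6_000_000_000


def extrapolate(count: int, from_population: int, to_population: int) -> int:
    """Scale a count in proportion to population (a deliberately naive model)."""
    return count * to_population // from_population


world_notables = extrapolate(DNB_ARTICLES, UK_POPULATION, WORLD_POPULATION)
print(f"extrapolated famous people worldwide: {world_notables:,}")

# Python integers are arbitrary precision, so these can be compared exactly.
particles = 10**85
googol = 10**100
print(f"a googol is 10^{len(str(googol // particles)) - 1} times larger")
```

The proportional model treats present-day population as a stand-in for a country's whole history, so it is best read as an order-of-magnitude guess only.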