I estimate our median article size as 1000 bytes, because that's the size of our 18943rd longest page according to long pages. (18943 would be the median of 37886 total articles.) To my mind, a conservative count of articles would impose a 1000-byte minimum, rather than a 1000-byte median, which would cut our total article count in half. But no matter how we count articles, let us at least prominently post the median size of the articles which are included in the count. And please, please don't call the count "unimpeachable". (For reference, my little tirade (including this sentence) is 1367 bytes long, i.e. rather longer than our median article.)
--Fritzlein 02:55 Aug 17, 2002 (PDT)
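The rank-based estimate above can be sketched in a few lines of Python. This is only an illustration: `page_sizes` is a made-up list standing in for the long pages listing, and both function names are hypothetical.

```python
def median_by_rank(sizes):
    """Size of the page halfway down a longest-first listing.

    With 37886 pages this picks the 18943rd longest, as in the estimate above.
    """
    ranked = sorted(sizes, reverse=True)   # longest page first
    return ranked[len(ranked) // 2 - 1]    # rank n/2, converted to a 0-based index


def count_with_minimum(sizes, min_bytes=1000):
    """Number of pages that would still count under a byte minimum."""
    return sum(1 for s in sizes if s >= min_bytes)


# Made-up sizes in bytes, purely for illustration.
page_sizes = [120, 480, 950, 1000, 1000, 2300, 5200, 8100]

print(median_by_rank(page_sizes))       # size at rank 4 of 8
print(count_with_minimum(page_sizes))   # pages of at least 1000 bytes
```

Setting the minimum equal to the median keeps roughly the top half of the listing, which is where the "cut in half" figure comes from.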
I agree.
It would be good to have the info on this page, or a link on this page to where it can be found.
If I get an answer - this time I'll try to keep the info! David Martland 15:20 Dec 13, 2002 (UTC)
Don't know if this is what you want, but:
$ ls -l
total 2801904
-rw-rw---- 1 mysql mysql       8852 Aug  9 19:27 archive.frm
-rw-rw---- 1 mysql mysql   21077440 Dec 13 17:19 archive.MYD
-rw-rw---- 1 mysql mysql       1024 Dec 13 17:19 archive.MYI
-rw-rw---- 1 mysql mysql       8586 Jul 20 19:30 brokenlinks.frm
-rw-rw---- 1 mysql mysql    7599784 Dec 13 17:58 brokenlinks.MYD
-rw-rw---- 1 mysql mysql    6398976 Dec 13 17:58 brokenlinks.MYI
-rw-rw---- 1 mysql mysql       9114 Nov 22 08:21 cur.frm
-rw-rw---- 1 mysql mysql  451396440 Dec 13 17:59 cur.MYD
-rw-rw---- 1 mysql mysql  201449472 Dec 13 17:59 cur.MYI
-rw-rw---- 1 mysql mysql       8756 Jul 20 19:30 image.frm
-rw-rw---- 1 mysql mysql       8586 Jul 20 19:30 imagelinks.frm
-rw-rw---- 1 mysql mysql     192136 Dec 13 17:51 imagelinks.MYD
-rw-rw---- 1 mysql mysql     175104 Dec 13 17:51 imagelinks.MYI
-rw-rw---- 1 mysql mysql     457448 Dec 13 15:53 image.MYD
-rw-rw---- 1 mysql mysql     215040 Dec 13 15:53 image.MYI
-rw-rw---- 1 mysql mysql       8706 Jul 20 19:30 ipblocks.frm
-rw-rw---- 1 mysql mysql       7300 Dec 12 15:23 ipblocks.MYD
-rw-rw---- 1 mysql mysql       3072 Dec 12 15:23 ipblocks.MYI
-rw-rw---- 1 mysql mysql       8582 Jul 20 19:30 links.frm
-rw-rw---- 1 mysql mysql   41151856 Dec 13 17:59 links.MYD
-rw-rw---- 1 mysql mysql   26686464 Dec 13 17:59 links.MYI
-rw-rw---- 1 mysql mysql       8898 Nov 22 08:43 old.frm
-rw-rw---- 1 mysql mysql       8790 Jul 20 19:30 oldimage.frm
-rw-rw---- 1 mysql mysql      54436 Dec 13 09:51 oldimage.MYD
-rw-rw---- 1 mysql mysql      11264 Dec 13 09:51 oldimage.MYI
-rw-rw---- 1 mysql mysql 2082194432 Dec 13 17:59 old.MYD
-rw-rw---- 1 mysql mysql   19528704 Dec 13 17:59 old.MYI
-rw-rw---- 1 mysql mysql       8598 Jul 20 18:49 random.frm
-rw-rw---- 1 mysql mysql      85000 Dec 13 04:47 random.MYD
-rw-rw---- 1 mysql mysql       1024 Dec 13 04:47 random.MYI
-rw-rw---- 1 mysql mysql       8964 Oct 28 08:06 recentchanges.frm
-rw-rw---- 1 mysql mysql    3014196 Dec 13 17:59 recentchanges.MYD
-rw-rw---- 1 mysql mysql    1556480 Dec 13 17:59 recentchanges.MYI
-rw-rw---- 1 mysql mysql       8700 Jul 20 18:49 site_stats.frm
-rw-rw---- 1 mysql mysql         29 Dec 13 17:59 site_stats.MYD
-rw-rw---- 1 mysql mysql       2048 Dec 13 17:59 site_stats.MYI
-rw-rw---- 1 mysql mysql       8874 Aug 24 02:54 user.frm
-rw-rw---- 1 mysql mysql    1838736 Dec 13 17:48 user.MYD
-rw-rw---- 1 mysql mysql     174080 Dec 13 17:48 user.MYI
-rw-rw---- 1 mysql mysql       8590 Nov 27 15:15 watchlist.frm
-rw-rw---- 1 mysql mysql     291006 Dec 13 17:37 watchlist.MYD
-rw-rw---- 1 mysql mysql     471040 Dec 13 17:37 watchlist.MYI
I'd be glad if someone could point me in the direction of that info or the relevant page again. I'll save the info this time!
It'd be good to have that info on this page - or a link - too. David Martland 15:18 Dec 13, 2002 (UTC)
There really needs to be a more conservative total article count, one that distinguishes between encyclopedia articles and almanac articles and that excludes certain additional pages. What particularly troubles me is that there are now thousands of year almanac pages, and most of them can't even be considered almanac articles because they are just templates.
I therefore propose the following, in addition to the current criteria:
1) Any page linked to centuries should be excluded from the total article count and given its own line in special:statistics, at least until most of these pages become almanac articles (the vast majority are either templates or templates with one or two entries).
2) Any page with a link to Wikipedia:Disambiguation should be excluded from the count.
3) Any page of less than 500 bytes should be excluded from the count (E. coli is 610 bytes).
4) There should be three "total article counts" for everything not excluded by the above: one for anything with the string list, chart, timeline or table in its title (these would be "almanac-like" articles), one for everything left over (these would be "encyclopedia articles"), and one grand total that would still be the number displayed on the Main Page.
Our current count exaggerates the true number of articles we have and is harming the project as a result. We need to be honest with our article counts and very conservative -- otherwise we will lose credit with passers-by who are at first impressed by our article count but then find out that it is bloated. --mav 13:36 Aug 28, 2002 (PDT)
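The four-part proposal above could be sketched roughly as follows. Everything here is hypothetical: the `Page` record, the function names, and in particular the reading of "linked to centuries" as "the page links to a title ending in century" are assumptions made for illustration, not how the software actually counts.

```python
from collections import Counter
from dataclasses import dataclass, field

ALMANAC_WORDS = ("list", "chart", "timeline", "table")
MIN_BYTES = 500


@dataclass
class Page:
    title: str
    size: int                                 # page length in bytes
    links: set = field(default_factory=set)   # titles this page links to


def classify(page):
    """Apply the proposed rules in order and return a category name."""
    # Rule 1 (assumed reading): pages linking to a century page get their own line.
    if any(target.lower().endswith("century") for target in page.links):
        return "year page"
    # Rule 2: disambiguation pages are excluded.
    if "Wikipedia:Disambiguation" in page.links:
        return "excluded"
    # Rule 3: pages under the byte minimum are excluded.
    if page.size < MIN_BYTES:
        return "excluded"
    # Rule 4: split what is left by title keyword.
    if any(word in page.title.lower() for word in ALMANAC_WORDS):
        return "almanac"
    return "encyclopedia"


def tally(pages):
    """Per-category counts plus the grand total of almanac and encyclopedia pages."""
    counts = Counter(classify(p) for p in pages)
    counts["grand total"] = counts["almanac"] + counts["encyclopedia"]
    return counts


# Made-up pages, purely for illustration.
pages = [
    Page("E. coli", 610),
    Page("1902", 300, {"20th century"}),
    Page("Mercury", 800, {"Wikipedia:Disambiguation"}),
    Page("List of rivers", 2000),
    Page("Some stub", 120),
]
print(tally(pages))
```

Note that a plain substring test on titles is crude: "table" also matches "Vegetable" and "list" matches "Socialist", so a real implementation would probably want whole-word matching.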
I don't think 1500 characters is a particularly useful number -- there are many subjects for which 500 characters would suffice as a minimum article size (although something in the range of 500 - 1000 characters wouldn't bother me too much). Could you maybe run some quick numbers to see how much of a reduction would occur if my proposal were enacted (this could be done easily if there were a page count in "what links here")? I was thinking of a reduction of 5 - 7 (maybe 10) thousand. Even if it is more than that, I don't think a temporary reduction in the total article count would hurt. We've already been through one round of this, back when we upgraded to phase II, and it didn't hurt anybody's morale that I know of. All we have to do is write up an announcement that we are enacting a far more conservative definition of what we consider to be an article, as far as automatic detection goes. --mav
Total articles          45179 (including without comma)
<500                   -14314
Disambiguation           -289
Year in review          -1292 (pages with numbers as title)
'list' in title          -386
'century' in title        -60
'timeline' in title       -59
'table' in title          -55
TOTAL                   28903
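As a cross-check, the figures above can simply be subtracted in Python. The dictionary keys are just labels chosen here. Straight subtraction leaves 28724, which is 179 fewer than the posted TOTAL; one possible explanation, and it is only a guess since the post does not say, is that pages matching more than one filter were removed only once.

```python
total_articles = 45179

# Exclusion counts exactly as posted above.
exclusions = {
    "under 500 bytes": 14314,
    "disambiguation": 289,
    "year in review (numeric titles)": 1292,
    "'list' in title": 386,
    "'century' in title": 60,
    "'timeline' in title": 59,
    "'table' in title": 55,
}

posted_total = 28903
remaining = total_articles - sum(exclusions.values())

print(remaining)                  # result of straight subtraction
print(posted_total - remaining)   # gap between the posted TOTAL and that result
```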
I also don't think we would ever be able to teach a computer how to determine just what is, or is not, a 'useful' article. --mav
   =0:     2
  <16:     3  (    1-15:     1)
  <31:    21  (   16-30:    18)
  <63:   111  (   31-62:    90)
 <125:  1222  (  63-124:  1111)
 <250:  4646  ( 125-249:  3424)
 <500: 14138  ( 250-499:  9492)
<1000: 25474  ( 500-999: 11336)
<2000: 34739  (1000-1999: 9265)
<4000: 40849  (2000-3999: 6110)
total: 45172  (   4000+:  4323)
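A table of this shape (cumulative count below each bound, with the per-bucket count alongside) could be produced from a list of page sizes along these lines. This is a sketch only: `size_table` and the sample sizes are hypothetical, and the bounds are copied from the table above, with "=0" expressed as "below 1".

```python
from bisect import bisect_left

# Upper bounds taken from the table above; "=0" is the same as "<1".
BOUNDS = [1, 16, 31, 63, 125, 250, 500, 1000, 2000, 4000]


def size_table(sizes, bounds=BOUNDS):
    """Return (bound, pages below bound, pages in this bucket) rows.

    The last row has bound None and holds the total and the open-ended top bucket.
    """
    ordered = sorted(sizes)
    rows = []
    previous = 0
    for bound in bounds:
        below = bisect_left(ordered, bound)   # pages strictly smaller than bound
        rows.append((bound, below, below - previous))
        previous = below
    rows.append((None, len(ordered), len(ordered) - previous))
    return rows


# Made-up sizes in bytes, purely for illustration.
for bound, cumulative, in_bucket in size_table([0, 10, 20, 20, 700, 5000]):
    label = "total" if bound is None else f"<{bound}"
    print(f"{label:>6}: {cumulative:>3} ({in_bucket})")
```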
Can we have some figures for mean article word/character counts (ignoring markup and HTML), please? This would enable better comparisons with existing encyclopedias: see the article text for comparisons.
Character count... bah. That's unreliable too. Can I suggest a simple, effective, and _working_ solution? Since Wikipedia is user-moderated, why not add an option for registered users, or maybe unregistered ones too, to vote on how useful they found an article, including a reason why? Think Slashdot: (-7, too short), (+5, well written), etc. It wouldn't be hard to implement, I think it would be good, and then you could count the "real" articles based on their user approval. Of course, this is unfair to articles that are voted really low and then majorly updated but still carry the low score. That's why I think ratings should be cleared every time there is a major update (e.g. an edit not marked minor, plus some type of change comparison with the diff). Ideas, anyone? I'd be willing to implement it if people like the idea. I don't know how long it would take me, because I'm not familiar with the codebase, but I'm very proficient in PHP and DB work as well as other programming languages, and I am already registered on SF too. So if anyone likes this idea, leave a comment here, on my talk page, or [e-mail me].
Lightning, Sept 29 3:17
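To make the rating idea concrete, here is a minimal sketch in Python (not PHP, and not tied to the real codebase). Every name in it is hypothetical, as are the 25% threshold for what counts as a major update and the choice to leave unrated articles out of the count.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    votes: list = field(default_factory=list)   # (score, reason) pairs

    def vote(self, score, reason):
        """Record one reader's score together with the reason given."""
        self.votes.append((score, reason))

    def record_edit(self, minor, changed_fraction, major_threshold=0.25):
        """Clear all ratings when an edit is not minor and changes enough text."""
        if not minor and changed_fraction >= major_threshold:
            self.votes.clear()

    def average(self):
        """Mean score, or None while the article has no votes."""
        if not self.votes:
            return None
        return sum(score for score, _ in self.votes) / len(self.votes)


def count_real_articles(articles, min_average=0):
    """Count articles whose average rating is at least min_average."""
    averages = (a.average() for a in articles)
    return sum(1 for avg in averages if avg is not None and avg >= min_average)


good = Article("E. coli")
good.vote(5, "well written")
stub = Article("Some stub")
stub.vote(-7, "too short")

print(count_real_articles([good, stub]))   # only the well-rated article counts

stub.record_edit(minor=False, changed_fraction=0.8)   # major rewrite clears the old score
stub.vote(3, "much improved")
print(count_real_articles([good, stub]))
```

One consequence of this design is that a freshly rewritten article drops out of the count until someone votes on it again, which may or may not be what is wanted.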