Redirected from Wikipedia commentary/Proposal for an Encyclopdian Recycling Endeavor
In my opinion, these articles greatly augment Wikipedia with necessary data that is unlikely to just "happen" to be entered by visitors. Consider, for example, Alphonso X of Spain, a medieval Spanish king. Certainly worthy of mention, both as a world leader and because of his early involvement in astronomy. But would an entry on this fellow just happen to show up through normal Wikipedia processes? Maybe, but probably not. Needless to say, there are thousands of moderately important people like Alphonso who *should* be listed in Wikipedia, yet most likely *won't* be, at least not anytime soon.
I don't mean to disparage Wikipedia; quite the contrary. Wikipedia has a number of strengths that the proprietary encyclopedias will likely *never* have. Of these many strengths, let me choose just one for elaboration: timeliness. Let me digress a bit with a little example.
This week a new planetoid-thingee was discovered out in the comet belt. Very important scientific discovery, but let's say within a few weeks your daughter needs to write a report on it for high school, and needs more in-depth info than is available in those terse CNN news items. That hard-bound dead-tree encyclopedia might have some useful articles on asteroids and the solar system, and probably will only be a few years out of date, but it certainly won't have anything useful on this newly discovered planetoid. Fortunately you bought your daughter a new computer today and it came with a CD-ROM encyclopedia. Unfortunately, due to space constraints, this encyclopedia's asteroid article is extremely terse (though it does have a photo of an asteroid, but it's copyrighted with full legal protections of course). And since the CD-ROM was published months before the new planetoid-thingee was discovered, you're not likely to find it there either. You decide to try an online encyclopedia, yet these appear to be just online versions of the CD-ROM you bought; maybe the pay-to-view site can afford to stay more up to date, but you've paid for two encyclopedias, and aren't too thrilled about paying for another. Surely there must be another option...
Your daughter giggles at you. "Silly, just go to Wikipedia!" You do so, and she looks at the Recent Changes page to see if anyone's been keeping up to date with the news. Sure enough: just today (the 26th of August, only a few days after the discovery), folks have been busily posting away on all manner of astronomical topics. Asteroid is rather terse, but at least includes mention of that planetoid-thingee, "asteriod found in 2001, identified as 2001 KX76". talk:Asteroid includes some extra interesting info not yet incorporated into the main article. Trans-Neptunian object, Kuiper Belt, Planet, Near-Earth asteroid, Solar system, and Planet X have also been updated (or newly added). Clicking around, she's able to find much more information on astronomers, other recent discoveries, and historical information to fill in her report. She's even able to make contact with some other students and teachers interested in this newly found body, and thereby learn of further sources of information on it, available through the web. When she is finally done and turns in her report, she decides to also post it up to Wikipedia, as article 2001 KX76. :-)
Wonderful! Wikipedia comes to the rescue and serves its role in the passing of knowledge to those who need it.
But you notice two things that are kind of odd. First, of course, there are very few photos in Wikipedia, but that's a whole 'nother topic of discussion. At least there's a photo of Galileo on his page. Second, and perhaps more importantly, many of the supporting articles seem to be rather terse. For example, you compare your proprietary encyclopedia's lengthy dissertation on Pluto with Wikipedia's dinky Pluto entry. Saturn is not much better. Charon doesn't even exist... Hmm...
Timeliness may be a strength of Wikipedia, but depth may be its weakness. Certainly we can expect better articles on the planets to come; after all, they're big and will always be there, and new things are likely to be discovered about them. But what about other, older topics? Say you needed to know about the origins of astronomy in the 13th century. Luckily there's that aforementioned article on Alphonso X of Spain, but what of historical figures with names starting with B-Z?
The 'A' encyclopedia was digitized by hand, by someone who happened to have a 1911 edition on hand. Digitization is a lot of work, but it can be done; Project Gutenberg's been at it for years, and when you think about it, they're not so much different, organizationally, than we are.
So here is my proposal.
I think we could turn our distributed, collaborative talents and processes towards a mini-Project Gutenberg endeavor to digitize and copy into Wikipedia a full set of out-of-copyright encyclopedias.
I think in the interests of practicality and to make distribution of efforts a bit simpler, we may want to allow variation in years... We could have the 1911 A, 1922 B, 1909 C, etc. I think if we allow this, it eliminates a lot of need for coordination. There will of course be variability in where one volume stops and another starts, and we might have a few articles slip through, but if those articles are important, we may "pick them up" through usual Wikipedia evolution.
There are several steps we'd need to take:
a) First, we would need to determine roughly where the cut-off date for copyright lies. Is 1911 the only year open to us, or could a 1920 or 1930 edition be used?
b) Next, we need folks to keep an eye out, as they go about their lives, for old encyclopedia sets. Look in grandparents' closets and bookshelves, a dusty corner office in an old college, book-heavy garage sales, and used bookstores. The set needn't be complete, but it's important that the print quality be good enough to scan. See if you can buy, or be given, one or more of the books.
c) Now, even if an encyclopedia is in the right date range, we still need to pause and verify that the particular edition in hand *is* in the public domain. Copyright law can be complicated, and Wikipedia *certainly* doesn't want to do anything that could risk a lawsuit from a jealous encyclopedia company some day.
d) I think the easiest and fastest way to scan an encyclopedia volume is also rather destructive: tear off the cover, break the binding, and cut the pages into loose-leaf, then run them through a scanner. I bet a multi-page feeding scanner would letcha get through a bunch of volumes at once. I suppose one could justify the ruining of an antique book in the knowledge that it's probably near the end of its life in paper form anyway and is bound for a new and even more meaningful life in electronic form. Note that by leveraging the US Postal Service (or FedEx), this step need not necessarily be done by the same fellow who did step b. :-)
e) Next is the hard part: proofreading. But maybe this step could be skipped, if the scanner is good enough. I've noticed that with the volume 'A' articles, spelling and format correction is quick to occur once the article appears in Wikipedia. So maybe this step could be just a quick QA to ensure the page isn't garbled and in need of re-scanning.
f) With the article digitized, the next step is to get it into Wikipedia. This is the step we already know how to do very well, so nothing more need be said. Judging by how quickly articles have been submitted lately, I'm guessing someone has developed a tool or process we could reuse.
Steps b-f can proceed in parallel; someone in New Jersey could be working on volume 9 of a 1919 encyclopedia, while someone else does volume 21 of a 1912 edition. Coordination can be done peer-to-peer, as folks ask who is working on what, and can see from what's *not* in Wikipedia what still needs to be done.
BryceHarrington
I think that if we indeed get the scanning project going, we should also donate it to Project Gutenberg, since they gave us volume 1. This means we release it into the public domain, I guess.
However, I'm wondering if Project Gutenberg is scanning the other volumes right now. Does anybody know who scanned volume 1? Those people should be contacted. --AxelBoldt
The rest of the 1911 Encyclopedia Britannica is available on CD-ROM as image files. See http://www.classiceb.com/ So the physical scanning part is already done (no destruction of books required :-), and all that remains is the OCR. The files could be then given to Project Gutenberg and also used here. --Alan Millar
The problem with using Wikipedia to correct things is that it is unlikely to result in an exact transcription of EB, which is what Project Gutenberg would want.
Finally, I might note that a lot of texts that are public domain (such as EB1911) in the US may still be copyrighted elsewhere, due to past differences in copyright law -- but then so long as Wikipedia is located on a US server, that shouldn't be a problem.
Also, http://www.classiceb.com claims their scanned-in images are copyrighted. As per talk:Public Domain Resources, they are (probably) wrong -- mere scanning is not of sufficient novelty to create copyright. But they still might cause legal hassles -- so maybe we'd better just scan it in ourselves.
However, while the original text of these older editions of the Encyclopedia Britannica are in the public domain, the ClassicEB® CDs are copyrighted by ClassicEB.com in all respects except for the original text. This means that purchasers of any ClassicEB® CDs may not copy our CDs in any way under penalty of violating copyright laws. Users may, of course, print off on paper any or all images contained on the CDs, in unlimited amounts, without violating ClassicEB.com's rights or copyright laws.
If you own your own physical set of the public domain editions of the Encyclopedia Britannica, you may scan your set onto CD and offer them yourself without being in violation of copyright laws. But you may not copy the images contained on the ClassicEB® CDs, nor may you use in any way the manual or any of the index tools contained on the CDs which were designed by ClassicEB.com. Violators of ClassicEB.com's rights will be prosecuted.
One thing nobody has mentioned is that a lot of things have changed since 1911, leading to a lot of misleading ideas in articles scanned directly. A disclaimer at the bottom of the page helps, but knowing that it's an old text doesn't fix everything, and it can be hard to expunge all the wrong data. Articles on physics, for example, would need to be carefully reviewed by somebody fully up-to-date, lest we bring relativity back into debate...
Even things like the article on King Alphonso might now be considered apocryphal - it's my understanding that history gets revised now and again (even by non-revisionists :))
Working on a project like this gets me seriously annoyed with the current state of copyright law. I have a World Book Encyclopedia set from the 1970s that I can't even give away. The company that makes World Book won't make a cent off of this material, but because of the copyright, we can't use it for Wikipedia; instead we can only reuse material that hasn't been current for almost a century. *sigh* -- Stephen Gilbert