Redirected from Data compression/LZW
|
The key insight of the method is that it is possible to automatically build a dictionary of previously seen strings in the text being compressed. The dictionary does not have to be transmitted with the compressed text, since the decompressor can build it the same way the compressor does, and if coded correctly, will have exactly the same strings that the compressor dictionary had at the same point in the text.
The dictionary starts off with 256 entries, one for each possible single byte string. Every time a string already in the dictionary is seen, a longer string consisting of that string with the single character following it, is stored in the dictionary.
The output consists of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary grows, can increase to up to 16 bits. A special symbol is reserved for "flush the dictionary" which takes the dictionary back to the original 256 entries, and 9 bit indices. This is useful if compressing a text which has variable charactistics, so that a dictionary of early material is no longer much use later in the text.
This use of variable increasing index sizes is one of Welch's contributions. Another was to specify an efficient data structure to store the dictionary.
The method became moderately widely used in the program "compress" which became a more or less standard utility in Unix systems circa 1986. (It has since disappeared from many for both legal and technical reasons.) Several other popular compression utilities also used the method, or closely related ones.
It became very widely used after it became part of the GIF image format in 1987. It may also (optional) be used in TIFF-files.
LZW compression provided a better compression ratio, in most applications, than any well known method available up to that time. It became the first widely used general purpose data compression method on computers. On large English texts, it typically compressed to about half original size. Other kinds of data were also quite usefully compressed in many cases.
LZW compression has now been superseded for most purposes by DEFLATE and Burrows-Wheeler transform methods, partly because these newer methods compress better in most cases, and partly for legal reasons. However the GIF image format with LZW compression is still in widespread use in 2001.
Various patents have been issued in the USA and other countries for LZW and similar algorithms. LZ78 was covered by US patent 4,464,650 by Lempel, Ziv, Cohn, and Eastman, assigned to Sperry Corporation, later Unisys Corporation, filed on 10 August 1981 and presumably now expired. Two US patents were issued for the LZW algorithm: 4,814,746 by Victor S. Miller and Mark N. Wegman and assigned to IBM, originally filed on 1 June 1983, and 4,558,302 by Welch, assigned to Sperry Corporation, later Unisys Corporation, filed on 20 June 1983.
US patent 4,558,302 is the one that has caused the most controversy. Unisys at one time granted royalty-free patent licenses to developers of free software and freely available proprietary software; the company terminated those licenses in August 1999. Many legal experts have concluded that the patent does not cover devices that can only uncompress LZW data and cannot compress it; for this reason, the popular Gzip program can read .Z files but cannot write them.
It was reported in Debian Weekly News (http://www.debian.org/News/weekly/2002/45/) based on a comp.compression thread (http://groups.google.com/groups?&threadm=a5aa8dd0.0208271613.3cd18da6%40posting.google.com), that the Unisys patent in the USA expired on December 20, 2002 - 17 years and 10 days after it was granted. Most other sources claim the patent is due to expire in June 2003, 20 years after it was filed, because 35 USC §154(c)(1) (http://www4.law.cornell.edu/uscode/35/154) specifies that patents subsisting as of six months after the enactment of the Uruguay Round Agreements Act[?] last for the greater of 17 years after grant and 20 years after filing.
Search Encyclopedia
|
Featured Article
|