ISO 8859

ISO 8859 is a group of related ISO standards for 8-bit character encodings for use by computers. These standards are based on ASCII, the most widely used 7-bit character encoding.

While the 128 ASCII characters are sufficient to exchange information in English, most other languages that use the Roman alphabet need additional symbols not covered by ASCII, such as ß (German), å (Swedish and other Nordic languages), etc. ISO 8859 sought to remedy this problem by extending 7-bit ASCII to eight bits, allowing positions for another 128 characters. However, the characters needed by all these languages could not fit in a single 8-bit character encoding, so several encodings were developed. All of them encode the first 128 positions (0 to 127) in the same way as each other and as ASCII. Positions 128 to 159 contain control characters. The upper 96 code points (160 to 255) differ from one ISO 8859 encoding to another.
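The layout described above can be checked with a short sketch. It assumes Python's bundled codecs (named `iso8859_1`, `iso8859_2`, and so on) follow the standards' code charts; the selection of parts and the helper name `decode_range` are chosen here for illustration only.

```python
# Parts chosen for illustration; the names are Python's codec spellings.
PARTS = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7", "iso8859_15"]

def decode_range(codec, start, stop):
    """Decode the single-byte values start..stop-1 with the given codec."""
    return bytes(range(start, stop)).decode(codec)

ascii_text = decode_range("ascii", 0, 128)

# Positions 0-127 should decode exactly as ASCII does in every part.
lower_matches = {p: decode_range(p, 0, 128) == ascii_text for p in PARTS}

# Positions 128-159 should decode to the control characters U+0080..U+009F.
c1_controls = "".join(chr(c) for c in range(0x80, 0xA0))
controls_match = {p: decode_range(p, 128, 160) == c1_controls for p in PARTS}

# The same upper-range byte can stand for a different letter in each part.
byte_e4 = {p: bytes([0xE4]).decode(p) for p in PARTS}

print(lower_matches)
print(controls_match)
print(byte_e4)
```

Byte 0xE4 is a convenient probe because it is a Latin letter in the Latin parts, a Cyrillic letter in ISO 8859-5, and a Greek letter in ISO 8859-7.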

The ISO 8859 standards are designed for reliable information exchange, not typography. As a result, the standards omit symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. High-quality typesetting systems therefore often use proprietary or idiosyncratic extensions on top of the ISO 8859 standards, or use Unicode instead.

As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it didn't get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French didn't get its œ and Œ ligatures because French speakers had not previously needed them enough to demand them on their keyboards.
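As a rough illustration of this rule of thumb, the following sketch asks Python's `iso8859_1` codec which of a few sample characters it can represent. The helper `fits_latin1` and the sample list are hypothetical names introduced for this example.

```python
def fits_latin1(ch):
    """Return True if the character has a position in ISO 8859-1."""
    try:
        ch.encode("iso8859_1")
        return True
    except UnicodeEncodeError:
        return False

# Guillemets, curly double quotes, the oe ligatures, and two letters
# mentioned earlier in the article.
samples = ["«", "»", "“", "”", "œ", "Œ", "ß", "å"]

for ch in samples:
    status = "present" if fits_latin1(ch) else "absent"
    print(f"U+{ord(ch):04X} {ch} {status}")
```

The guillemets, ß, and å are reported as present, while the curly quotation marks and the œ/Œ ligatures are reported as absent.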

The ISO 8859 encodings provide the diacritic marks required for various European languages. They also provide non-Roman alphabets: Greek, Cyrillic (used by Russian, Bulgarian, and other languages), Hebrew, and Arabic. However, the standard makes no provision for the scripts of East Asian languages such as Chinese or Japanese, as these highly ideographic writing systems require many thousands of code points, many more than can be placed in a single 8-bit plane.

The encodings defined by ISO 8859 include:

ISO 8859-1 (aka Latin-1)---perhaps the most widely used ISO 8859 standard, covering most Western European languages: Albanian, Basque, Catalan, Danish, Dutch (partial), English, Faeroese, Finnish (partial), French (partial), German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish, Spanish, Kurdish, and Swedish, as well as the African languages Afrikaans and Swahili. There is no Euro symbol. There is a small ÿ but the capital Ÿ is absent, presumably because, among the supported languages, ÿ is used only in French, where it had become traditional mostly not to place accents on capital letters.
ISO 8859-2 (aka Latin-2)---supports those Eastern European languages that use a Roman alphabet, including Polish, Czech, Slovak, and Hungarian. German is also supported.
ISO 8859-3 (aka Latin-3 or "South European")---Turkish, Maltese, and Esperanto; largely superseded by ISO 8859-9 for Turkish and Unicode for Esperanto.
ISO 8859-4 (aka Latin-4 or "North European")---Estonian, Latvian, Lithuanian, Greenlandic, and Lappish.
ISO 8859-5---Cyrillic.
ISO 8859-6---Arabic.
ISO 8859-7---Greek.
ISO 8859-8---Hebrew.
ISO 8859-9 (aka Latin-5)---largely the same as ISO 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
ISO 8859-10 (aka Latin-6)---a rearrangement of Latin-4.
ISO 8859-11---Thai.
There is no ISO 8859-12.
ISO 8859-13 (aka Latin-7)---Baltic Rim.
ISO 8859-14 (aka Latin-8)---Celtic.
ISO 8859-15 (aka Latin-9)---a revision of 8859-1 that removes some little-used symbols, replacing them with the Euro symbol € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French and Finnish.
ISO 8859-16 (aka Latin-10)---in the making; targeted at South-Eastern European languages, including Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovenian, but also Finnish, French, German and Irish Gaelic (new orthography). The focus is more on letters than on symbols.
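The relationship between ISO 8859-1 and its two close relatives in the list above can be made concrete with a small sketch. It assumes Python's `iso8859_1`, `iso8859_9`, and `iso8859_15` codecs match the published code charts; `differing_positions` is a hypothetical helper written for this example.

```python
def differing_positions(codec_a, codec_b):
    """Map each byte value to (char_a, char_b) where the two codecs disagree."""
    diffs = {}
    for b in range(256):
        a = bytes([b]).decode(codec_a)
        c = bytes([b]).decode(codec_b)
        if a != c:
            diffs[b] = (a, c)
    return diffs

latin1_vs_latin9 = differing_positions("iso8859_1", "iso8859_15")
latin1_vs_latin5 = differing_positions("iso8859_1", "iso8859_9")

for label, diffs in [("8859-1 vs 8859-15", latin1_vs_latin9),
                     ("8859-1 vs 8859-9", latin1_vs_latin5)]:
    pairs = ", ".join(f"0x{b:02X}: {a} -> {c}" for b, (a, c) in diffs.items())
    print(f"{label}: {pairs}")
```

Under that assumption, the first comparison lists eight positions (the Euro sign plus Š, š, Ž, ž, Œ, œ, and Ÿ), and the second lists six positions where Icelandic letters give way to Turkish ones.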

The ISO 8859 standards are usually inadequate when one wishes to use multiple languages at once. For example, none of the above standards simultaneously supports Polish and Russian.
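The Polish and Russian example can be tested mechanically. The sketch below tries to encode sample text with every ISO 8859 codec that ships with Python; the helper `parts_covering` and the sample strings are illustrative choices, not part of any standard.

```python
# Python codec names for parts 1-16; there is no part 12.
ALL_PARTS = [f"iso8859_{n}" for n in range(1, 17) if n != 12]

def parts_covering(text):
    """Return the ISO 8859 codecs able to encode every character of text."""
    covering = []
    for codec in ALL_PARTS:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        covering.append(codec)
    return covering

polish = "Łódź"      # a Polish city name
russian = "Москва"   # "Moscow" in Russian

print(parts_covering(polish))
print(parts_covering(russian))
print(parts_covering(polish + " " + russian))
```

Each string alone is covered by at least one part, but the combined string is covered by none, so the last call prints an empty list.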

An alternative character set standard called Unicode was developed to unify coverage of the other character sets. It supports over a million code points (far more than the 256 positions of any ISO 8859 encoding), which are represented through several character encodings built on 8-bit, 16-bit, or variable-length units. Because Unicode does away with the limitations of 8-bit character encodings, it is often preferred for new applications. However, ISO 8859 has the advantage of being well-established, and simpler software is needed to manipulate it: the equation of one byte to one character holds, there are no combining characters or variant forms, and fonts remain conveniently small.
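The "one byte, one character" property can be seen in a brief sketch. It compares the encoded length of a sample word under ISO 8859-1 and under UTF-8 (one of the Unicode encodings), then shows that a decomposed letter using a combining mark has no ISO 8859-1 form. The sample word and variable names are arbitrary choices for illustration.

```python
import unicodedata

word = "Größe"  # five characters, two of them outside ASCII

latin1_bytes = word.encode("iso8859_1")
utf8_bytes = word.encode("utf-8")

# In ISO 8859-1 the byte count equals the character count.
print(len(word), len(latin1_bytes), len(utf8_bytes))

# Unicode can also write å as "a" followed by a combining ring (U+030A).
decomposed = unicodedata.normalize("NFD", "å")

try:
    decomposed.encode("iso8859_1")
    combining_fits = True
except UnicodeEncodeError:
    # The combining mark has no position in ISO 8859-1.
    combining_fits = False

print(len(decomposed), combining_fits)
```

Software that handles only ISO 8859 text can therefore index characters by byte offset, whereas Unicode-aware software has to account for multi-byte sequences and combining marks.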

External References

Descriptions and code charts for most ISO 8859 standards are found in "ISO 8859 Alphabet Soup": http://czyborra.com/charsets/iso8859

All Wikipedia text is available under the terms of the GNU Free Documentation License
