ISO 8859-1

ISO 8859-1, more formally cited as ISO/IEC 8859-1 or less formally as Latin-1, is part 1 of ISO/IEC 8859, a multi-part standard for 8-bit character encodings defined by ISO. It encodes what it calls Latin alphabet no. 1, which consists of 191 characters (Latin letters plus digits, punctuation and symbols). Each character is encoded as a single 8-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages: Albanian, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish and Swedish. Other languages covered include Afrikaans and Swahili. As a result, this character encoding has been widely used throughout the Americas, Western Europe, Australia and much of Africa.

ISO/IEC 8859-1 has a number of deficiencies. It omits a few characters needed for French (Œ, œ and Ÿ), and it has no euro sign. For this reason ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1 that adds the missing characters. To make room, it removes eight less frequently used characters of ISO/IEC 8859-1, including the fraction symbols and several standalone diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½ and ¾.
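The replacement can be checked with Python's built-in codecs. The sketch below assumes Python 3.9 or later and uses Python's codec names "latin-1" and "iso8859-15" as stand-ins for the two standards. The helper `differing_positions` is a hypothetical name introduced only for this illustration. It decodes every byte value with both codecs and reports the positions where they disagree.

```python
def differing_positions(codec_a: str, codec_b: str) -> dict[int, tuple[str, str]]:
    """Return {byte value: (char in codec_a, char in codec_b)} wherever two
    single-byte codecs decode the same byte to different characters."""
    diffs: dict[int, tuple[str, str]] = {}
    for value in range(256):
        byte = bytes([value])
        a, b = byte.decode(codec_a), byte.decode(codec_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


# Python's "latin-1" and "iso8859-15" codecs stand in for the two standards.
diffs = differing_positions("latin-1", "iso8859-15")
for value, (old, new) in diffs.items():
    print(f"{value:02X}: {old} -> {new}")
```

With these codecs the loop reports eight positions: A4, A6, A8, B4, B8, BC, BD and BE. Each line pairs a removed ISO/IEC 8859-1 character with its ISO/IEC 8859-15 replacement.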

All 191 characters encoded by ISO/IEC 8859-1 are graphic characters that most web browsers can display, so they can be shown as glyphs in the following table. The row and column headings give the two hexadecimal digits of each 8-bit code value. For example, "L" is in row 4x and column xC, so its code value is hex 4C, or binary 01001100.

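ISO/IEC 8859-1
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   (unused)
1x   (unused)
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8x   (unused)
9x   (unused)
Ax   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
Cx   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
Dx   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
Ex   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
Fx   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ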

In the table above, 20 is the regular SPACE (SP) and A0 is the NO-BREAK SPACE (NBSP). AD is the SOFT HYPHEN (SHY), which some web browsers do not display at all.
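The row and column arithmetic can be reproduced in a few lines of Python. This is a sketch, not part of the standard. It assumes Python 3.9 or later and uses Python's built-in "latin-1" codec to obtain the code value. The names `code_value` and `table_position` are hypothetical helpers. The high four bits of a code value select the row and the low four bits select the column.

```python
def code_value(ch: str) -> int:
    """Return the 8-bit ISO/IEC 8859-1 code value of a single character.

    Uses Python's "latin-1" codec, which raises UnicodeEncodeError for
    characters outside the table (the euro sign, for example).
    """
    (value,) = ch.encode("latin-1")  # unpacking fails unless exactly one byte
    return value


def table_position(value: int) -> tuple[str, str]:
    """Split a code value into row (high nibble) and column (low nibble) headings."""
    return f"{value >> 4:X}x", f"x{value & 0xF:X}"


v = code_value("L")
print(f"hex {v:02X}, binary {v:08b}, row/column {table_position(v)}")
# hex 4C, binary 01001100, row/column ('4x', 'xC')
```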

Code values 00-1F, 7F, and 80-9F are not assigned to characters by ISO/IEC 8859-1.
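The assigned ranges can be expressed as a simple predicate. The function name `is_assigned_in_iso_iec_8859_1` is hypothetical. The ranges come directly from the text, which lists 00-1F, 7F and 80-9F as unassigned. Counting the remaining values reproduces the total of 191 characters.

```python
def is_assigned_in_iso_iec_8859_1(value: int) -> bool:
    """True if ISO/IEC 8859-1 assigns a graphic character to this code value.

    Unassigned: 00-1F, 7F and 80-9F. Assigned: 20-7E (the ASCII graphic
    characters, including SPACE) and A0-FF (including NO-BREAK SPACE).
    """
    if not 0x00 <= value <= 0xFF:
        raise ValueError(f"{value!r} is not an 8-bit code value")
    return 0x20 <= value <= 0x7E or 0xA0 <= value <= 0xFF


assigned = [v for v in range(256) if is_assigned_in_iso_iec_8859_1(v)]
print(len(assigned))  # 191 = 95 values in 20-7E + 96 values in A0-FF
```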

ISO 8859-1 vs ISO-8859-1

The IANA has registered ISO-8859-1 (note the extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet. This character map (also called a character set or code page) supplements the assignments made by ISO/IEC 8859-1 by mapping control characters to code values 00-1F, 7F and 80-9F. It thus assigns a character to every one of the 256 possible 8-bit values.
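Python's built-in "latin-1" codec behaves like the IANA ISO-8859-1 character set in this respect: it decodes all 256 byte values. The sketch below relies only on the standard library's `unicodedata` module. It decodes every byte and uses the Unicode general category "Cc" to pick out the control characters. These turn out to be exactly the code values that ISO/IEC 8859-1 leaves unassigned.

```python
import unicodedata

# Every possible 8-bit value, 00 through FF.
all_bytes = bytes(range(256))

# Python's "latin-1" codec never fails here: each byte maps to one character.
decoded = all_bytes.decode("latin-1")

# Unicode general category "Cc" marks control characters.
controls = [b for b, ch in zip(all_bytes, decoded) if unicodedata.category(ch) == "Cc"]

print(len(decoded))   # 256
print(len(controls))  # 65 = 32 (00-1F) + 1 (7F) + 32 (80-9F)
```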

The IANA allows all of the following aliases for ISO-8859-1 to be used case-insensitively:

  • ISO_8859-1:1987
  • ISO_8859-1
  • ISO-8859-1
  • iso-ir-100
  • csISOLatin1
  • latin1
  • l1
  • IBM819
  • CP819
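Software usually accepts these aliases through its own alias tables. As an illustration only, Python's `codecs.lookup` normalizes case and punctuation before consulting its alias table. It happens to resolve every alias in the list to the same built-in codec, whose canonical Python name is "iso8859-1". Other libraries keep their own alias lists, which may differ.

```python
import codecs

# The IANA aliases for ISO-8859-1, spelled as in the list above.
IANA_ALIASES = [
    "ISO_8859-1:1987", "ISO_8859-1", "ISO-8859-1", "iso-ir-100",
    "csISOLatin1", "latin1", "l1", "IBM819", "CP819",
]

# codecs.lookup() lower-cases the name and folds punctuation before
# consulting Python's alias table, so spelling variants resolve alike.
resolved = {alias: codecs.lookup(alias).name for alias in IANA_ALIASES}

for alias, name in resolved.items():
    print(f"{alias:<16} -> {name}")
```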

The hyphenated name Latin-1 is an informal alias. ISO calls the repertoire "Latin alphabet no. 1", and the IANA registers latin1 without a hyphen. The hyphenated form is nevertheless accepted by some computer software.

The following table shows ISO-8859-1, with the control characters given by their standard abbreviations (NUL, SOH and so on).

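ISO-8859-1
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8x   PAD  HOP  BPH  NBH  IND  NEL  SSA  ESA  HTS  HTJ  VTS  PLD  PLU  RI   SS2  SS3
9x   DCS  PU1  PU2  STS  CCH  MW   SPA  EPA  SOS  SGCI SCI  CSI  ST   OSC  PM   APC
Ax   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
Cx   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
Dx   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
Ex   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
Fx   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ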

There are additional parts of the ISO/IEC 8859 standard with corresponding IANA-registered character sets. For example, ISO/IEC 8859-10 (Latin alphabet no. 6) corresponds to the character set ISO-8859-10. All ISO/IEC 8859-x parts encode characters in the same general way. They include the ASCII graphic characters (hex 20-7E) and place up to 96 additional characters in the A0-FF range, for a total of at most 191 characters. Each corresponding ISO-8859-x set adds the C0 control characters at 00-1F, DEL at 7F and the C1 control characters at 80-9F. For a part such as ISO/IEC 8859-1 that fills every position from A0 to FF, this yields a set of 256 characters. ISO-8859-1 is unique among these sets in that its code values are identical to the first 256 code points of Unicode (U+0000 to U+00FF).
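Because ISO-8859-1 coincides with Unicode's first 256 code points, decoding it requires no lookup table: each byte value simply becomes the code point with the same number. The function below is a hypothetical illustration of that shortcut. It is checked against Python's built-in "latin-1" codec. The same shortcut would be wrong for any other ISO-8859-x set.

```python
def decode_latin1_by_identity(data: bytes) -> str:
    """Decode ISO-8859-1 bytes by treating each byte value as a Unicode code point.

    Valid only for ISO-8859-1, whose 256 code values equal U+0000..U+00FF.
    """
    return "".join(chr(b) for b in data)


all_bytes = bytes(range(256))
# The identity mapping agrees with the built-in codec on every byte value.
print(decode_latin1_by_identity(all_bytes) == all_bytes.decode("latin-1"))  # True
print(decode_latin1_by_identity(b"Gar\xe7on"))  # Garçon
```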

ISO-8859-1 is the standard encoding used by the X Window System on most Unix machines.

ISO-8859-1 vs Windows ANSI

The legacy components of Microsoft Windows use, by default, an encoding that differs from ISO-8859-1 by placing graphic characters rather than control characters in the 80-9F range. Windows generically calls such an encoding "ANSI", but the specific character set depends on the market for which the operating system was localized. For example, it is CP1252 in the US and Western European markets, with the IANA-registered name Windows-1252.

The following table shows Windows-1252. It matches ISO-8859-1 everywhere except the 80-9F range, where most of the C1 control characters are replaced by graphic characters:

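Windows-1252 (CP1252)
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8x   €         ‚    ƒ    „    …    †    ‡    ˆ    ‰    Š    ‹    Œ         Ž
9x        ‘    ’    “    ”    •    –    —    ˜    ™    š    ›    œ         ž    Ÿ
Ax   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
Cx   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
Dx   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
Ex   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
Fx   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ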

In the table above, 20 (SP), A0 (NBSP) and AD (SHY) have the same meanings as in ISO-8859-1. Code values 81, 8D, 8F, 90 and 9D are unused.
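The difference can be observed with Python's built-in "latin-1" and "cp1252" codecs, assumed here to stand in for ISO-8859-1 and Windows-1252. The sketch assumes Python 3.9 or later. The hypothetical helper `compare_latin1_cp1252` decodes each byte value both ways. It separates the positions that Windows-1252 remaps from those it leaves undefined. Python's strict cp1252 codec raises UnicodeDecodeError for the undefined positions.

```python
def compare_latin1_cp1252() -> tuple[list[int], list[int]]:
    """Return (remapped, unused) byte values for Windows-1252 versus ISO-8859-1.

    remapped: bytes that cp1252 decodes to a different character than latin-1.
    unused:   bytes that have no character in cp1252 (decoding raises).
    """
    remapped: list[int] = []
    unused: list[int] = []
    for value in range(256):
        byte = bytes([value])
        try:
            cp1252_char = byte.decode("cp1252")
        except UnicodeDecodeError:
            unused.append(value)
            continue
        if cp1252_char != byte.decode("latin-1"):
            remapped.append(value)
    return remapped, unused


remapped, unused = compare_latin1_cp1252()
print([f"{v:02X}" for v in unused])  # ['81', '8D', '8F', '90', '9D']
print(len(remapped), min(remapped) >= 0x80, max(remapped) <= 0x9F)  # 27 True True
print(bytes([0x80, 0x93, 0x94]).decode("cp1252"))  # €“”
```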

Older Apple Macintosh computers use yet another encoding, MacRoman. It differs from ISO-8859-1 not only in the 80-9F range but across the entire upper half, 80-FF.

The distinction between ISO/IEC 8859-1, ISO-8859-1, Windows-1252, and MacRoman is a common source of confusion among computer programmers.
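A short Python sketch makes the confusion concrete: the same two bytes read back as different text depending on which encoding the reader assumes. The codec names "latin-1", "cp1252" and "mac_roman" are Python's built-in names for ISO-8859-1, Windows-1252 and MacRoman. The byte values are a hypothetical sample chosen only for illustration.

```python
# Two bytes from a hypothetical legacy file whose encoding is unknown.
data = b"\x8e\xe9"

# Decode the same bytes under each assumption; ascii() escapes non-ASCII output.
readings = {codec: data.decode(codec) for codec in ("latin-1", "cp1252", "mac_roman")}

for codec, text in readings.items():
    print(f"{codec:<10} {ascii(text)}")
```

Under ISO-8859-1 the first byte is the C1 control SS2 and the second is é. Windows-1252 reads the pair as Žé, and MacRoman reads it as éÈ.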

All Wikipedia text is available under the terms of the GNU Free Documentation License