ASCII (American Standard Code for Information Interchange) is a character set and a character encoding based on the Roman alphabet as used in modern English. It is most commonly used by computers and other communication equipment to represent text and by control devices that work with text.
Like other codes, ASCII specifies a correspondence between integers that can be represented digitally and the symbols of a written language, thus allowing digital devices to communicate with each other and to process and store character-oriented information. The ASCII character encoding (or a compatible extension; see below) is used on nearly all common computers, especially personal computers and workstations. The preferred MIME name for this encoding is "US-ASCII".
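The correspondence between characters and integers can be seen directly in most programming languages. The following is a minimal Python sketch; the helper names `to_codes` and `from_codes` are hypothetical, and it relies on Python's built-in codec accepting "us-ascii" as a name for ASCII.

```python
# Map between characters and their ASCII code values using Python built-ins.
# "us-ascii" is one of the names Python's codec registry accepts for ASCII.

def to_codes(text: str) -> list[int]:
    """Encode text as US-ASCII and return the integer code of each character."""
    # Raises UnicodeEncodeError if text contains a non-ASCII character.
    return list(text.encode("us-ascii"))

def from_codes(codes: list[int]) -> str:
    """Decode a list of integers in the range 0-127 back into text."""
    # Raises UnicodeDecodeError for any value above 127.
    return bytes(codes).decode("us-ascii")

print(to_codes("Hi!"))                   # [72, 105, 33]
print(from_codes([65, 83, 67, 73, 73]))  # ASCII
```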
ASCII is a seven-bit code, meaning that it uses the integers representable with seven binary digits (0 to 127 decimal) to represent information. Because many computers and communication devices handled eight-bit bytes as the smallest unit of information, the eighth bit was commonly used for error checking (parity) on communication lines or for other device-specific functions.
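One common use of the spare eighth bit was a parity bit. The sketch below assumes even parity (odd parity was also used; the choice varied by device) and uses hypothetical helper names. It sets bit 7 so that each byte carries an even number of 1 bits, which lets a receiver detect any single flipped bit.

```python
def add_even_parity(code: int) -> int:
    """Place an even-parity bit in bit 7 (the eighth bit) of a 7-bit ASCII code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2   # 1 if the seven data bits hold an odd number of 1s
    return code | (parity << 7)         # the whole byte now has an even number of 1s

def check_even_parity(byte: int) -> bool:
    """Return True if the 8-bit byte has an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0

for ch in "AC":
    framed = add_even_parity(ord(ch))
    print(ch, format(framed, "08b"), check_even_parity(framed))
# A 01000001 True
# C 11000011 True
```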
ASCII does not specify any way to represent information about the structure or appearance of a piece of text. That requires other standards, such as markup languages.
ASCII was first published as a standard in 1963 by the American Standards Association (ASA), which later became ANSI. There are many variations of ASCII, but its present, most widely used form is ANSI X3.4-1986, also standardized as ECMA-6, ISO/IEC 646:1991 International Reference Version, and ITU-T Recommendation T.50 (09/92). ASCII makes up the first 128 code points of Unicode, which has largely replaced it. ASCII is often regarded as one of the most successful software standards ever promulgated.
Historically, ASCII developed from telegraphic codes. It started as a commercial 7-bit teleprinter code promoted by Bell data services. ASA reordered the code for sorting (alphabetization) of lists and added features for devices other than teleprinters. Bell's code added punctuation and lowercase letters to the earlier 5-bit Baudot teleprinter code. Baudot automated the sending and receiving of telegraphic messages and took many features from Morse code.
The first thirty-two codes (0 to 31 decimal) in ASCII are reserved for control characters: codes that do not themselves represent printable symbols but are used to control devices (such as printers) that make use of ASCII. For example, character 10 represents the "line feed" function (which causes a printer to advance its paper), and character 27 represents the "escape" key found at the top left of common keyboards.
Code 127 (all seven bits on) is another special character known as "delete" or "rubout". Though its function is similar to that of other control characters, it was placed at this position so that it could be used to erase a section of paper tape, a popular storage medium at one time, by punching out all its holes. Code 0 (all bits off) is ignored by many computer systems.
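The control range and the special position of DEL can be checked with a few lines of Python. This is an illustrative sketch with a hypothetical `is_control` helper; the last line shows why DEL works as an eraser: punching extra holes can only turn bits on, and turning on all seven bits of any code yields 127.

```python
NUL, LF, ESC, DEL = 0x00, 0x0A, 0x1B, 0x7F

def is_control(code: int) -> bool:
    """ASCII control codes are 0-31 plus 127 (DEL)."""
    return 0 <= code <= 31 or code == DEL

print(format(DEL, "07b"))                                    # 1111111
print(is_control(LF), is_control(ESC), is_control(ord("A")))  # True True False

# Punching every hole sets every bit, so any 7-bit code becomes DEL.
print(all((c | 0b1111111) == DEL for c in range(128)))       # True
```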
Many of the codes are used to mark data packets and to control a data transmission protocol (e.g. enquiry ("are any stations out there?"), acknowledge, negative acknowledge, start of heading, start of text, end of text). Escape and substitute characters let a protocol mark binary data so that, if the data contains codes with the same values as protocol characters, those codes are processed as data rather than as control signals.
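A common form of this idea is character stuffing with DLE (data link escape, code 16), in the style of the transparent mode of IBM's Binary Synchronous Communications. The sketch below assumes a simplified frame of DLE STX, payload, DLE ETX, in which any DLE inside the payload is doubled; it omits the block checks and other details of real protocols, and the function names are hypothetical.

```python
DLE, STX, ETX = 0x10, 0x02, 0x03

def frame(payload: bytes) -> bytes:
    """Wrap binary data as DLE STX ... DLE ETX, doubling any DLE inside the data."""
    stuffed = payload.replace(bytes([DLE]), bytes([DLE, DLE]))
    return bytes([DLE, STX]) + stuffed + bytes([DLE, ETX])

def unframe(frame_bytes: bytes) -> bytes:
    """Reverse frame(): strip the delimiters and collapse each DLE DLE pair to one DLE."""
    if frame_bytes[:2] != bytes([DLE, STX]) or frame_bytes[-2:] != bytes([DLE, ETX]):
        raise ValueError("missing frame delimiters")
    body = frame_bytes[2:-2]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == DLE:
            if i + 1 >= len(body) or body[i + 1] != DLE:
                raise ValueError("unpaired DLE in frame body")
            i += 1  # skip the stuffed duplicate; keep the second DLE as data
        out.append(body[i])
        i += 1
    return bytes(out)

# The payload contains DLE ETX, which would otherwise look like the end of the frame.
data = bytes([0x41, DLE, ETX, 0x42])
framed = frame(data)
print(framed.hex(" "))          # 10 02 41 10 10 03 42 10 03
print(unframe(framed) == data)  # True
```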
The separator characters (record separator, etc.) were designed for use with magnetic tape systems.
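The separators form a hierarchy (file, group, record, unit, from largest to smallest), and they are sometimes still used as delimiters because they rarely appear in ordinary text. The following sketch, with hypothetical `pack`/`unpack` helpers, assumes no field itself contains US or RS.

```python
US, RS = "\x1f", "\x1e"  # unit separator (between fields), record separator (between records)

def pack(records: list[list[str]]) -> str:
    """Join fields with US and records with RS; assumes no field contains US or RS."""
    return RS.join(US.join(fields) for fields in records)

def unpack(data: str) -> list[list[str]]:
    """Split packed text back into records and fields."""
    return [record.split(US) for record in data.split(RS)] if data else []

rows = [["bolt", "12"], ["nut", "40"]]
packed = pack(rows)
print(repr(packed))            # 'bolt\x1f12\x1enut\x1f40'
print(unpack(packed) == rows)  # True
```

Unlike comma-separated values, fields containing commas or quotation marks need no escaping under this scheme.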
XON (DC1, code 17) and XOFF (DC3, code 19) are often sent by a slow device, such as a printer, to start and stop the flow of data so that no data is lost.
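The sender's side of this software flow control can be sketched as a small state machine. The `Sender` class and its `tick` method are hypothetical, and the model is simplified to one byte per step: the sender pauses when it receives XOFF and resumes when it receives XON.

```python
XON, XOFF = 0x11, 0x13  # DC1 and DC3

class Sender:
    """Minimal software flow control: stop on XOFF, resume on XON."""

    def __init__(self, data: bytes):
        self.pending = bytearray(data)
        self.paused = False
        self.sent = bytearray()

    def receive_control(self, byte: int) -> None:
        """Handle a flow-control byte arriving from the receiver."""
        if byte == XOFF:
            self.paused = True
        elif byte == XON:
            self.paused = False

    def tick(self) -> None:
        """Send one byte per step unless the receiver has asked us to pause."""
        if not self.paused and self.pending:
            self.sent.append(self.pending.pop(0))

s = Sender(b"HELLO")
s.tick(); s.tick()
s.receive_control(XOFF)   # printer buffer is full
s.tick()                  # nothing is sent while paused
print(s.sent.decode())    # HE
s.receive_control(XON)    # printer has caught up
s.tick()
print(s.sent.decode())    # HEL
```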
Binary | Decimal | Hex | Abbreviation | Printable Representation | Name/Meaning |
---|---|---|---|---|---|
0000 0000 | 0 | 00 | NUL | ␀ | Null character |
0000 0001 | 1 | 01 | SOH | ␁ | Start of Heading |
0000 0010 | 2 | 02 | STX | ␂ | Start of Text |
0000 0011 | 3 | 03 | ETX | ␃ | End of Text |
0000 0100 | 4 | 04 | EOT | ␄ | End of Transmission |
0000 0101 | 5 | 05 | ENQ | ␅ | Enquiry |
0000 0110 | 6 | 06 | ACK | ␆ | Acknowledgment |
0000 0111 | 7 | 07 | BEL | ␇ | Bell |
0000 1000 | 8 | 08 | BS | ␈ | Backspace |
0000 1001 | 9 | 09 | HT | ␉ | Horizontal Tab |
0000 1010 | 10 | 0A | LF | ␊ | Line Feed |
0000 1011 | 11 | 0B | VT | ␋ | Vertical Tab |
0000 1100 | 12 | 0C | FF | ␌ | Form Feed |
0000 1101 | 13 | 0D | CR | ␍ | Carriage Return |
0000 1110 | 14 | 0E | SO | ␎ | Shift Out |
0000 1111 | 15 | 0F | SI | ␏ | Shift In |
0001 0000 | 16 | 10 | DLE | ␐ | Data Link Escape |
0001 0001 | 17 | 11 | DC1 | ␑ | Device Control 1 (XON) |
0001 0010 | 18 | 12 | DC2 | ␒ | Device Control 2 |
0001 0011 | 19 | 13 | DC3 | ␓ | Device Control 3 (XOFF) |
0001 0100 | 20 | 14 | DC4 | ␔ | Device Control 4 |
0001 0101 | 21 | 15 | NAK | ␕ | Negative Acknowledgment |
0001 0110 | 22 | 16 | SYN | ␖ | Synchronous Idle |
0001 0111 | 23 | 17 | ETB | ␗ | End of Transmission Block |
0001 1000 | 24 | 18 | CAN | ␘ | Cancel |
0001 1001 | 25 | 19 | EM | ␙ | End of Medium |
0001 1010 | 26 | 1A | SUB | ␚ | Substitute |
0001 1011 | 27 | 1B | ESC | ␛ | Escape |
0001 1100 | 28 | 1C | FS | ␜ | File Separator |
0001 1101 | 29 | 1D | GS | ␝ | Group Separator |
0001 1110 | 30 | 1E | RS | ␞ | Record Separator |
0001 1111 | 31 | 1F | US | ␟ | Unit Separator |
0111 1111 | 127 | 7F | DEL | ␡ | Delete |
In the table above, the fifth column shows graphic symbols from the Unicode Control Pictures block (U+2400 to U+2421), which are reserved for visibly representing control codes in a data stream; displaying them may require a font that includes this block.
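Because the Control Pictures for codes 0 to 31 sit at U+2400 plus the control code, with DEL's symbol at U+2421, a data stream can be made visible with simple arithmetic. The helpers below are a hypothetical sketch.

```python
def control_picture(code: int) -> str:
    """Return the Unicode Control Pictures symbol for an ASCII control code."""
    if 0 <= code <= 0x1F:
        return chr(0x2400 + code)  # U+2400 (symbol for NUL) through U+241F (symbol for US)
    if code == 0x7F:
        return "\u2421"            # symbol for DEL
    raise ValueError("not an ASCII control code")

def visible(text: str) -> str:
    """Replace each control character with its picture so it shows when printed."""
    return "".join(
        control_picture(ord(c)) if ord(c) < 0x20 or ord(c) == 0x7F else c
        for c in text
    )

print(visible("line one\r\nline two\t!"))  # line one␍␊line two␉!
```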
See also: newline.
Code 32 is the "space" character, denoting the space between words, which is produced by the space bar of a keyboard. Codes 33 to 126 are the visible printable (graphic) characters, which represent letters, digits, punctuation marks, and a few miscellaneous symbols.
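The 94 graphic characters from 33 to 126 break down into 52 letters, 10 digits, and 32 punctuation marks and symbols. The sketch below checks this with Python's standard `string` module constants.

```python
import string

graphic = "".join(chr(code) for code in range(33, 127))  # codes 33-126
print(len(graphic))  # 94

letters = sum(c in string.ascii_letters for c in graphic)
digits = sum(c in string.digits for c in graphic)
others = len(graphic) - letters - digits
print(letters, digits, others)  # 52 10 32

# The remaining 32 are exactly Python's string.punctuation.
print(set(graphic) - set(string.ascii_letters + string.digits) == set(string.punctuation))  # True
```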
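ASCII provides some internationalization for French and Spanish (both spoken in the U.S.): a letter can be printed, followed by a backspace, and then overstruck with the grave accent (`), the acute accent (the apostrophe, often called a "single quote"), the tilde (~), or the caret (^, an inverted V that can serve as a circumflex or breath mark).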
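On a printing terminal, the backspace moves the print head back one column, so the accent lands on the same spot as the letter. The sketch below simulates that behavior; `overstrike` and `columns` are hypothetical helpers, and each output column lists the characters struck at that position.

```python
BS = "\b"  # ASCII 8, backspace

def overstrike(base: str, accent: str) -> str:
    """Build the three-character sequence that prints accent over base."""
    return base + BS + accent

def columns(text: str) -> list[str]:
    """Simulate a printer: return, for each column, the characters struck there."""
    cols: list[str] = []
    pos = 0
    for ch in text:
        if ch == BS:
            pos = max(pos - 1, 0)  # move the print head back one column
        else:
            if pos == len(cols):
                cols.append("")
            cols[pos] += ch        # strike the character at the current column
            pos += 1
    return cols

print(columns("ma" + overstrike("n", "~") + "ana"))  # ['m', 'a', 'n~', 'a', 'n', 'a']
```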
Uppercase letters can be converted to lowercase by adding 32 to their ASCII value (for example, "A" is 65 and "a" is 97); in binary, this amounts to setting the sixth-least-significant bit (value 32, hex 20) to 1.
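The case bit can be set with a bitwise OR and cleared with a bitwise AND. In the following sketch (hypothetical helper names), the range checks matter: OR'ing 32 into "@" (64) would produce the backquote (96), so the trick applies only to letters.

```python
CASE_BIT = 0x20  # value 32: the sixth-least-significant bit

def to_lower(ch: str) -> str:
    """Lowercase an ASCII letter by setting bit 0x20; leave other characters alone."""
    return chr(ord(ch) | CASE_BIT) if "A" <= ch <= "Z" else ch

def to_upper(ch: str) -> str:
    """Uppercase an ASCII letter by clearing bit 0x20; leave other characters alone."""
    return chr(ord(ch) & ~CASE_BIT) if "a" <= ch <= "z" else ch

print(format(ord("A"), "07b"), format(ord("a"), "07b"))  # 1000001 1100001
print("".join(map(to_lower, "ASCII 1963!")))             # ascii 1963!
print("".join(map(to_upper, "ascii")))                   # ASCII
```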
The international spread of computer technology led to many variations and extensions of the ASCII character set, since ASCII does not include the accented letters and other symbols needed to write most languages other than English that use the Latin alphabet. International standard ISO 646 (1972) was the first attempt to remedy this problem, although it created compatibility problems of its own. ISO 646 was still a seven-bit character set, and since no additional codes were available, some were reassigned in language-specific variants. See ISO 646 for details.
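The compatibility problem is easy to demonstrate: the same bytes decode to different text depending on which national variant is assumed. The sketch below assumes the German variant (DIN 66003) assignments listed in the dictionary; `GERMAN_646` and `decode_646_de` are hypothetical, and the table should be checked against the standard before use.

```python
# Assumed German ISO 646 variant (DIN 66003): only these positions differ from US-ASCII.
GERMAN_646 = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}

def decode_646_de(data: bytes) -> str:
    """Decode 7-bit bytes, substituting the German national characters."""
    if any(b > 0x7F for b in data):
        raise ValueError("ISO 646 is a 7-bit code")
    return "".join(GERMAN_646.get(b, chr(b)) for b in data)

raw = b"Gr}~e"
print(raw.decode("ascii"))  # Gr}~e
print(decode_646_de(raw))   # Grüße
```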
Improved technology brought out-of-band means to represent the information formerly encoded in the eighth bit of each byte, freeing this bit to provide 128 additional character codes for new assignments. Eight-bit standards such as ISO 8859 enabled a broader range of languages to be represented, but were still plagued with incompatibilities and limitations. ISO 8859-1 and original 7-bit ASCII long remained among the most common character encodings, though Unicode, with a much larger code set, has since become the dominant standard, usually in its UTF-8 encoding. These newer codes are backward-compatible: the first 128 code points (0 to 127) of each are the same as ASCII, and the first 256 code points of Unicode are the same as ISO 8859-1.
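This backward compatibility can be checked with Python's built-in codecs. The sketch shows that pure ASCII text yields identical bytes in ASCII, ISO 8859-1 ("latin-1"), and UTF-8, that Latin-1 maps each byte value to the Unicode code point with the same number, and that the encodings diverge above 127.

```python
text = "ASCII"

# Pure ASCII text has identical bytes in all three encodings.
same_bytes = text.encode("ascii") == text.encode("latin-1") == text.encode("utf-8")
print(same_bytes)  # True

# Unicode code points 0-255 coincide with ISO 8859-1.
latin1_matches = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))
print(latin1_matches)  # True

# Above 127 the encodings differ: "é" is U+00E9.
print("é".encode("latin-1"))  # b'\xe9'      (one byte)
print("é".encode("utf-8"))    # b'\xc3\xa9'  (two bytes)
```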
The portmanteau word "ASCIIbetical" has evolved to describe the collation of data in ASCII code order rather than genuine alphabetical order (which requires some tricky computation, and varies with language). (See the Jargon File (http://www.catb.org/~esr/jargon/html/entry/ASCIIbetical-order).)
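Python compares strings by code point, which for ASCII text is exactly ASCII order, so its default sort is ASCIIbetical. The sketch below contrasts that with a case-folded sort, which is a common approximation but still not genuine alphabetical order ("10" still sorts before "9", and language-specific rules are ignored).

```python
words = ["banana", "Cherry", "apple", "Apple", "_tmp", "10", "9"]

# Default sort: digits (48-57) < uppercase (65-90) < "_" (95) < lowercase (97-122).
asciibetical = sorted(words)
print(asciibetical)
# ['10', '9', 'Apple', 'Cherry', '_tmp', 'apple', 'banana']

# Case-folded sort: closer to dictionary order, but still code-point based.
folded = sorted(words, key=str.casefold)
print(folded)
# ['10', '9', '_tmp', 'apple', 'Apple', 'banana', 'Cherry']
```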
See also: Extended ASCII, Unicode, ASCII art.