Integral data types are computer data types capable of storing integers. An integral data type generally consists of a fixed number of bits, called its size, which is usually a power of two; such an implementation is sometimes called a fixed-precision integer. Arbitrary-precision (or "infinite-precision") integers are provided by languages such as Lisp. An integral data type is treated as a single unit of storage and manipulation.
The table below lists data types recognized by common processors. Additional data types found in high-level programming languages, such as bit-fields and extended-precision integers, are not discussed here. Following the table are additional usage notes, then details on number representation. For information about the representation of real numbers, see real data type.
Common sizes of integral data types

bits | name | comments
---|---|---
1 | bit | status, Boolean flag; has only 2 possible states
4 | nibble, nybble | a humorous play on half a byte; can hold a single BCD digit
8 | byte, octet | integers, ASCII characters
16 | word | integers, pointers, UCS-2 characters
32 | doubleword/longword | usually shortened to long; integers, pointers
64 | quadword, long long | integers, pointers
128 | octword | integers, pointers
The terms in the table are typically used only when the content is to be interpreted numerically (and not as some other kind of data structure).
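For illustration, most of the sizes in the table map directly onto the fixed-width integer types of ISO C's <stdint.h>. A minimal sketch, assuming a C99 (or later) hosted environment; which of these types exist is ultimately platform-dependent:

```c
/* Print the widths of the fixed-width types that correspond to the
 * table's byte, word, doubleword, and quadword sizes. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("int8_t  : %zu bits\n", 8 * sizeof(int8_t));   /* byte       */
    printf("int16_t : %zu bits\n", 8 * sizeof(int16_t));  /* word       */
    printf("int32_t : %zu bits\n", 8 * sizeof(int32_t));  /* doubleword */
    printf("int64_t : %zu bits\n", 8 * sizeof(int64_t));  /* quadword   */
    return 0;
}
```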
To represent both positive and negative (signed) integers, the convention is that the most significant bit of the binary representation indicates the sign of the number rather than contributing to its magnitude. Three formats have been used to represent the magnitude: sign-and-magnitude, one's complement, and two's complement, the last being by far the most common today.
Both sign-and-magnitude and one's complement have two representations of zero (+0 and -0) and require special handling of the sign during arithmetic. To avoid these problems, and to make integer addition simpler, the two's-complement representation is the one generally used. The two's complement of a positive number is formed by complementing its bits and then adding 1. Thus 00101011 (+43) becomes 11010101 (-43).
In two's complement there is only one zero (00000000). Negating a negative number involves the same operation: complementing the bits, then adding 1. The pattern 11111111 represents -1 and 10000000 represents -128; that is, the range of 8-bit two's-complement integers is -128 to +127.
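As a concrete sketch in C (assuming a two's-complement target, which all mainstream processors use), the complement-and-add-1 step and the resulting 8-bit range can be shown as follows; the value 43 mirrors the example above:

```c
/* Negate an 8-bit value by complementing its bits and adding 1. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t x   = 0x2B;                /* 00101011 = +43 */
    uint8_t neg = (uint8_t)(~x + 1);   /* complement, then add 1: 11010101 */

    /* Reinterpreting the resulting bit pattern as signed gives -43 on a
       two's-complement machine. */
    printf("x  = 0x%02X (%d)\n", (unsigned)x,   (int8_t)x);
    printf("-x = 0x%02X (%d)\n", (unsigned)neg, (int8_t)neg);
    printf("int8_t range: %d to %d\n", INT8_MIN, INT8_MAX);
    return 0;
}
```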
To add two two's-complement integers, treat them as unsigned numbers, add them, and ignore any carry out of the most significant bit (this is essentially the great advantage that two's complement has over the other conventions). The result will be the correct two's-complement number, unless both summands were positive and the result is negative, or both summands were negative and the result is non-negative. These cases are referred to as "overflow" or "wrap around"; the addition cannot be carried out in 8-bit two's complement in these cases. For example:
      00101011 (+43)      11010101 (-43)      00101011 (+43)      10011010 (-102)
    + 11010101 (-43)    + 11100011 (-29)    + 11100011 (-29)    + 10110001 ( -79)
    ----------------    ----------------    ----------------    -----------------
      00000000 (  0)      10111000 (-72)      00001110 (+14)      01001011 (overflow)
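The sign-based overflow rule above can be expressed directly in code. The following C sketch (the helper add8 is purely illustrative) adds two 8-bit patterns as unsigned values, discards the carry out of bit 7, and flags overflow when both operands share a sign bit that differs from the result's:

```c
/* 8-bit two's-complement addition with overflow detection. */
#include <stdint.h>
#include <stdio.h>

static uint8_t add8(uint8_t a, uint8_t b, int *overflow)
{
    uint8_t sum = (uint8_t)(a + b);   /* carry out of bit 7 is discarded */
    /* Overflow iff a and b have the same sign bit and sum's sign differs. */
    *overflow = (~(a ^ b) & (a ^ sum) & 0x80) != 0;
    return sum;
}

int main(void)
{
    int ovf;
    uint8_t r = add8(0x9A, 0xB1, &ovf);   /* (-102) + (-79), as in the last column */
    printf("result = 0x%02X (%d), overflow = %d\n", (unsigned)r, (int8_t)r, ovf);
    return 0;
}
```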
Endianness

When an integer is represented with multiple bytes, the actual ordering of those bytes in memory, or the sequence in which they are transmitted over some medium, is subject to convention. This is similar to the situation in written languages, where some are written left-to-right, while others are written right-to-left.
Using a 4-byte integer written as "ABCD", where A is the most significant byte and D is the least significant byte, the big-endian convention stores the number in successive memory locations as A (at the lowest address), then B, then C, and finally D, while the little-endian convention stores the bytes in the order D, C, B, A.
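A quick way to observe a machine's byte ordering is to write a known 32-bit pattern into memory and inspect the individual bytes. A minimal C sketch, using the value 0x0A0B0C0D to stand in for "ABCD" above:

```c
/* Print the bytes of a 32-bit value in the order they appear in memory.
 * A big-endian host prints 0a 0b 0c 0d; a little-endian host prints
 * 0d 0c 0b 0a. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t value = 0x0A0B0C0D;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);   /* copy out the raw bytes */
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02x ", (unsigned)bytes[i]);
    printf("\n");
    return 0;
}
```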
Network byte order is, by convention, big-endian: the bytes are sent onto the medium in the order A, then B, and so on. It is the responsibility of the transmitting and receiving systems to convert, if necessary, between network byte order and their internal format.
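In C on POSIX systems, this conversion is commonly done with htonl() and ntohl(); a small sketch, assuming <arpa/inet.h> is available:

```c
/* Convert a 32-bit value to network (big-endian) byte order and back.
 * On a big-endian host these calls are no-ops; on a little-endian host
 * they reverse the bytes. */
#include <arpa/inet.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t host = 0x0A0B0C0D;
    uint32_t net  = htonl(host);   /* host order -> network order */
    uint32_t back = ntohl(net);    /* network order -> host order */

    printf("host: 0x%08" PRIX32 "  network: 0x%08" PRIX32 "  back: 0x%08" PRIX32 "\n",
           host, net, back);
    return 0;
}
```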
Big-endian numbers are easier to read in a memory dump when debugging a program, since the bytes appear in the same order as the number is written, but some consider the layout less intuitive because the high-order byte sits at the smaller address; little-endian layouts, where a byte's significance grows with its address, are arguably more intuitive but harder to read when debugging. The choice of big-endian vs. little-endian for a CPU design has sparked many flame wars. Emphasizing the futility of this argument, the very term big-endian was taken from the Big-Endians of Jonathan Swift's Gulliver's Travels. See the Endian FAQ (http://rdrop.com/~cary/html/endian_faq), including the essay "On Holy Wars and a Plea for Peace" (Danny Cohen, 1980).
See also: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte