Floating point

A floating-point number is a digital approximation of a real number on a computer. A floating-point calculation is an arithmetic calculation done with floating-point numbers.

Below are some examples of floating point numbers:

5.0
9/7
3.1419 (terminates)
8E17

In a floating-point number, the number of significant digits is constant, rather than the absolute precision.

In other words, we can represent a number a by two numbers m and e, such that a = m × b^e. In any such system we pick a base b (called the base of numeration, also radix) and a precision p (how many digits to store). m (which is called the mantissa, also significand) is a p-digit number of the form ±d.ddd...ddd (each digit being an integer between 0 and b-1 inclusive). If the leading digit of m is non-zero then the number is said to be normalised. Some descriptions use a separate sign (s, which is either -1 or +1) and require m to be positive. e is called the exponent. (For more on the concept of "mantissa", see common logarithm.)

This scheme allows a greater range of numbers to be represented within a limited precision field, which is not possible in a fixed point notation.

As an example, a floating-point number with four decimal digits (b=10, p=4) could be used to represent 4321 or 0.00004321, but would round 432.123 to 432.1 and 43212.3 to 43210. Of course, in practice the number of digits is often larger than four.
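The rounding in this example can be reproduced with a minimal sketch based on Python's decimal module. The helper name round_to_precision is hypothetical, and the sketch assumes that the default rounding mode of a fresh Context (round-half-even) is acceptable for illustration.

```python
from decimal import Context, Decimal

def round_to_precision(text, p=4):
    """Round the decimal number written in `text` to p significant digits (b = 10)."""
    # Assumption: a fresh Context uses round-half-even, which is fine for these inputs.
    return Context(prec=p).create_decimal(text)

print(round_to_precision("4321"))        # fits exactly in four digits
print(round_to_precision("0.00004321"))  # also fits: same digits, different exponent
print(round_to_precision("432.123"))     # rounded to 432.1
print(round_to_precision("43212.3"))     # rounded to 4.321E+4, i.e. 43210
```

Only the four significant digits and the exponent survive; the absolute size of the rounding error therefore grows with the magnitude of the number.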

Representation

Binary floating-point representation in a computer is analogous to scientific notation for decimal numbers. For example, the number 0.00001234 is written as 1.234 × 10⁻⁵ in decimal scientific notation, and the number 123400 is written as 1.234 × 10⁵. Notice that the significant digits are normalized to start with a non-zero digit.
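As a rough illustration of decimal normalization, the following sketch splits a number into a mantissa and an exponent using Python's decimal module. The function name to_scientific is an assumption made for this example, and the sketch only handles non-zero finite inputs.

```python
from decimal import Decimal

def to_scientific(text):
    """Split a nonzero decimal number into (m, e) with value = m * 10**e and 1 <= |m| < 10."""
    d = Decimal(text)
    e = d.adjusted()   # exponent of the most significant digit
    m = d.scaleb(-e)   # shift the decimal point so that one digit precedes it
    return m, e

print(to_scientific("0.00001234"))  # mantissa 1.234, exponent -5
print(to_scientific("123400"))      # mantissa 1.23400, exponent 5
```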

In floating-point representation, the number is divided into three parts: a sign bit to indicate negative numbers, a certain number of bits allocated to the mantissa, and the remaining bits allocated to the exponent. The exponent can be represented in many ways (see IntegerFormats) but is usually represented as a biased integer (a bias, say +127, is added to all exponents before storing them). The IEEE format uses the biased integer representation. The radix point and the base of 2 are understood and not included in the representation.
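The three parts can be inspected directly. The sketch below assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits); these field widths come from that standard rather than from the text above, and the helper name split_single is hypothetical.

```python
import struct

def split_single(x):
    """Return (sign, biased_exponent, fraction) of x stored as an IEEE 754 single."""
    # Reinterpret the 4 bytes of the single-precision value as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    biased_exp = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, biased_exp, fraction

sign, biased_exp, fraction = split_single(22.0)
# Subtracting the bias recovers the true exponent, here 4.
print(sign, biased_exp - 127, format(fraction, "023b"))
```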

Similar to scientific notation, floating-point numbers are often normalized so that the first digit is non-zero. For example, 22₁₀ = 10110₂. When this number is represented in floating point, it becomes 1.011₂ × 2⁴.
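The normalization of 22 can be checked with a short sketch built on math.frexp from the Python standard library. Note that frexp returns a mantissa between 0.5 and 1, so the hypothetical helper normalise_binary rescales it to the 1.xxx form used above.

```python
import math

def normalise_binary(x):
    """Return (m, e) with x = m * 2**e and 1 <= |m| < 2, for nonzero finite x."""
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1

m, e = normalise_binary(22.0)
print(m, e)   # 1.375 and 4; 1.375 is 1.011 in binary
```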

Hidden bit

When using binary (b=2), a saving can be made if one requires that all numbers be normalised. The leading digit of the mantissa of a normalised binary number is always non-zero; in particular, it is always 1. This means that it does not need to be stored explicitly: for a normalised number it can be understood to be 1. The IEEE standard exploits this fact. Requiring all numbers to be normalised means that 0 cannot be represented; typically some special representation of zero is chosen.
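A minimal sketch of how the hidden bit is restored when decoding, assuming the IEEE 754 single-precision layout (8 exponent bits with bias 127, 23 stored mantissa bits). The function name decode_single is hypothetical, and the sketch deliberately rejects the special encodings (zero, subnormals, infinities, NaN) that do not follow the normalised rule.

```python
def decode_single(bits):
    """Decode a normalised IEEE 754 single from its 32-bit pattern, restoring the hidden 1."""
    sign = -1.0 if bits >> 31 else 1.0
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity or NaN: not a normalised number")
    significand = 1 + fraction / 2**23   # the leading 1 is implied, never stored
    return sign * significand * 2.0 ** (biased_exp - 127)

print(decode_single(0x41B00000))  # 22.0
print(decode_single(0xBEC00000))  # -0.375
```

Because the leading 1 is implied, the 23 stored bits carry 24 bits of precision.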

Usage in computing

While the numbers in the examples above are represented in the decimal system (that is, the base of numeration is b = 10), computers usually use the binary system, which means that b = 2. In computers, floating-point numbers are sized by the number of bits used to store them. This size is usually 32 bits or 64 bits, often called "single-precision" and "double-precision". A few machines offer larger sizes; Intel FPUs such as the 8087 (and its descendants integrated into the x86 architecture) offer 80-bit floating-point numbers for intermediate results, and several systems offer 128-bit floating point, generally implemented in software.
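The difference between the two common sizes can be observed with a small sketch using Python's struct module. It assumes that Python floats are 64-bit doubles, which holds on typical platforms; the helper name through_single is hypothetical.

```python
import struct

def through_single(x):
    """Round a Python float (assumed to be a 64-bit double) to 32-bit single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(struct.calcsize("<f") * 8, struct.calcsize("<d") * 8)  # 32 64
print(through_single(0.1) == 0.1)   # False: single keeps fewer significant bits
print(through_single(0.5) == 0.5)   # True: 0.5 is exactly representable in both
```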

The IEEE has standardized the computer representation in IEEE 754. This standard is followed by almost all modern machines. The only exceptions are IBM mainframes, which recently acquired an IEEE mode, and Cray vector machines, where the T90 series had an IEEE version, but the SV1 still uses the Cray floating-point format.

Examples

The value of pi is π = 3.1415926...₁₀ in decimal, which is equivalent to 11.001001000011111...₂ in binary. When represented in a computer that allocates 17 bits for the mantissa, it becomes 0.11001001000011111₂ × 2². Hence the floating-point representation would start with the bits 011001001000011111 and end with the bits 10 (which represent the exponent 2 in the binary system). (Note: the first zero indicates a positive number, and the ending 10₂ = 2₁₀.)
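The mantissa bits in this example can be reproduced with a short sketch. It assumes a positive input and truncates rather than rounds, matching the bits quoted above; the helper name leading_bits is hypothetical.

```python
import math

def leading_bits(x, n):
    """First n mantissa bits of positive x, where x = m * 2**e and 0.5 <= m < 1 (truncated)."""
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= m < 1
    return format(int(m * 2**n), f"0{n}b"), e

bits, e = leading_bits(math.pi, 17)
print(bits, e)   # 11001001000011111 2
```

A format that rounds to nearest instead of truncating would produce a different final bit pattern here, since the discarded bits of π begin with a 1.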

The value -0.375₁₀ is -0.011₂, or -0.11₂ × 2⁻¹. In 2's complement notation, -1 is represented as 11111111 (assuming 8 bits are used for the exponent). In floating-point notation, the number will start with a 1 for the sign bit, followed by 110000... and then by 11111111 at the end, giving 1110...011111111 (where ... stands for zeros).
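The layout used in this example (sign bit, then mantissa, then a 2's-complement exponent) can be sketched as follows. This is the article's illustrative notation, not IEEE 754. The function name encode_toy and the choice of an 8-bit mantissa are assumptions made for the example, since the text leaves the mantissa width open.

```python
import math

def encode_toy(x, mantissa_bits, exponent_bits):
    """Encode nonzero x as sign bit + mantissa bits (0.1xxx form) + two's-complement exponent."""
    sign = "1" if x < 0 else "0"
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e with 0.5 <= m < 1
    # Keep the first mantissa_bits bits after the radix point (truncated, not rounded).
    mantissa = format(int(m * 2**mantissa_bits), f"0{mantissa_bits}b")
    limit = 1 << (exponent_bits - 1)
    if not -limit <= e < limit:
        raise OverflowError("exponent does not fit in the exponent field")
    # Masking a negative integer yields its two's-complement bit pattern.
    exponent = format(e & ((1 << exponent_bits) - 1), f"0{exponent_bits}b")
    return sign + mantissa + exponent

print(encode_toy(-0.375, 8, 8))   # sign 1, mantissa 11000000, exponent 11111111
```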

Note that although the examples in this article use a consistent system of floating-point notation, that notation is different from the IEEE standard. For example, in IEEE 754 the exponent lies between the sign bit and the mantissa, not at the end of the number. Also, the IEEE exponent uses a biased integer instead of a 2's-complement number. The examples illustrate how floating-point numbers could be represented, but the actual bits shown here differ from what an IEEE 754-compliant number would look like. The placement of the bits in the IEEE standard enables two floating-point numbers to be compared bitwise (ignoring the sign bit) to yield a result without interpreting the actual values; the arbitrary system used in this article cannot do the same. The examples are nevertheless adequate as textbook illustrations, since they highlight all the major components of a floating-point notation, and they show that a non-standard notation also works as long as it is consistent.
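The bitwise comparison property can be illustrated with a brief sketch restricted to non-negative values, in line with the sign-bit caveat above. It assumes the IEEE 754 single-precision encoding produced by Python's struct module; the helper name single_bits is hypothetical.

```python
import struct

def single_bits(x):
    """Bit pattern of x as an IEEE 754 single, read as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Non-negative values listed in increasing numeric order.
values = [0.0, 1e-10, 0.375, 1.0, 22.0, 3.0e38]
patterns = [single_bits(v) for v in values]
print(patterns == sorted(patterns))   # True: integer order matches numeric order
```

This works because the biased exponent sits in the more significant bits, above the mantissa, so a larger exponent always yields a larger integer.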