300 Float
300 Float
Grinshpan
The fixed point notation, although not without virtues, is usually inadequate for numerical
analysis as it does not allow enough numbers and accuracy.
The floating-point notation is by far more flexible. Any x ̸= 0 may be written in the form
±(1.b1 b2 b3 ...)2 × 2n ,
called the normalized representation of x. The normalized representation is achieved by
choosing the exponent n so that the binary point “floats” to the position after the first
nonzero digit. This is the binary version of scientific notation.
To store a normalized number in 32-bit format one reserves 1 bit for the sign, 8 bits for the
signed exponent, and 23 bits for the portion b1 b2 b3 ...b23 of the fractional part of the
number. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is
referred to as a “hidden bit”.
The 8-bit exponent field is used to store integer exponents −126 ≤ n ≤ 127.
We will discuss later how exactly this is done.
−0.2
−0.4
−0.6
−0.8
−1
0 2 4 6 8 10 12 14 16