0% found this document useful (0 votes)
47 views2 pages

300 Float

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
47 views2 pages

300 Float

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Numerical Analysis

Grinshpan

Fixed-point and floating-point representations of numbers


A fixed-point representation of a number may be thought to consist of 3 parts: the sign
field, integer field, and fractional field. One way to store a number using a 32-bit format is
to reserve 1 bit for the sign, 15 bits for the integer part and 16 bits for the fractional part.
A number whose representation exceeds 32 bits would have to be stored inexactly.

On a computer, 0 is used to represent + and 1 is used to represent −.

Example. The 32-bit string


1 | 000000000101011 | 1010000000000000
represents (−101011.101)2 = −43.625.

The fixed point notation, although not without virtues, is usually inadequate for numerical
analysis as it does not allow enough numbers and accuracy.

Example. In the format just discussed, the largest number is


0 | 111111111111111 | 1111111111111111
or (215 − 1) + (1 − 2−16 ) = 215 (1 − 2−31 ) ≈ 32768, and the smallest positive number is
0 | 000000000000000 | 0000000000000001
or 2−16 ≈ 0.000015 . Note that 2−16 is precisely the gap between two adjacent fixed-point
numbers.

The floating-point notation is by far more flexible. Any x ̸= 0 may be written in the form
±(1.b1 b2 b3 ...)2 × 2n ,
called the normalized representation of x. The normalized representation is achieved by
choosing the exponent n so that the binary point “floats” to the position after the first
nonzero digit. This is the binary version of scientific notation.

To store a normalized number in 32-bit format one reserves 1 bit for the sign, 8 bits for the
signed exponent, and 23 bits for the portion b1 b2 b3 ...b23 of the fractional part of the
number. The leading bit 1 is not stored (as it is always 1 for a normalized number) and is
referred to as a “hidden bit”.

The 8-bit exponent field is used to store integer exponents −126 ≤ n ≤ 127.
We will discuss later how exactly this is done.

Example. The 32-bit string


1 | 8 bits storing n=5 | 10101100000000000000000
represents (−1.101011)2 × 25 = (−110101.1)2 = −53.5.
Example. The 32-bit word
0 | 8 bits storing n=0 | 00000000000000000000000
represents (1.0)2 = 1.
Example. The largest normalized number that fits into 32 bits is
0 | 8 bits storing n=127 | 11111111111111111111111
or (1.11111111111111111111111)2 × 2127 = (224 − 1)2104 ≈ 3.40 × 1038 .
The smallest normalized positive number that fits into 32 bits is
0 | 8 bits storing n=-126 | 00000000000000000000000
or (1.00000000000000000000000)2 × 2−126 = 2−126 ≈ 1.18 × 10−38 .
The
1 precision of a floating-point format is the number of positions reserved for binary digits
plus one (for the hidden bit). In the examples considered here the precision is 23+1=24.
One says that x is a floating-point number if it can be represented exactly using a given
0.8
floating-point format. For instance, 1/3 = (0.010101 . . . )2 cannot be a floating-point
number as its binary representation is nonterminating.
0.6
The gap between 1 and the next normalized floating-point number is known as machine
epsilon. In our setting, this gap is (1 + 2−23 ) − 1 = 2−23 . Note that this is not the same as
0.4
the smallest positive floating-point number. Unlike in the fixed-point scenario, the spacing
between the floating-point numbers is not uniform, but varies from one dyadic interval
[2n , 2n+1 ) to another. As we move away from the origin, the spacing becomes less dense:
0.2

−0.2

−0.4

−0.6

−0.8

−1
0 2 4 6 8 10 12 14 16

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy