Lecture 6. Fixed and Floating Point Numbers: Prof. Taeweon Suh Computer Science Education Korea University
Korea Univ
Fixed-Point Numbers
• Fixed-point notation has an implied binary point between the integer and fraction bits
The binary point is not part of the representation but is implied
Example:
• Fixed-point representation of 6.75 using 4 integer bits and 4 fraction bits:
01101100
0110.1100
2² + 2¹ + 2⁻¹ + 2⁻² = 6.75
Signed Fixed-Point Numbers
• Example:
-2.375 using 8 bits (4 bits each to represent integer and fractional parts)
• 2.375 = 0010.0110
• Sign/magnitude notation: 1010 0110
• Two’s complement notation:
1. Flip all the bits: 1101 1001
2. Add 1: 1101 1001 + 1 = 1101 1010
Example
Fixed-Point Number Systems
Floating-Point Numbers
• Floating-point number systems circumvent the limitation of having a constant number of integer and fractional bits
They allow the representation of very large and very small numbers
Floating-Point Numbers
Floating-Point Representation #1
Floating-Point Representation #2
• You may have noticed that the first bit of the mantissa is always 1, since the binary point floats to the right of the most significant 1
Example: 228₁₀ = 11100100₂ = 1.11001 × 2⁷
• Thus, storing the most significant 1 (also called the implicit leading 1) is redundant information
Floating-Point Representation #3
Example
Floating-Point Numbers: Special Cases
• The IEEE 754 standard includes special cases for numbers that are difficult to represent, such as 0, because it lacks an implicit leading 1
NaN is used for numbers that don't exist, such as √-1 or log(-5)
Floating-Point Number Precision
• The IEEE 754 standard also defines a 64-bit double-precision format that provides greater precision and greater range
Single-precision (use the float declaration in the C language)
• 32-bit notation
• 1 sign bit, 8 exponent bits, 23 fraction bits
• bias = 127
• It spans a range from ±1.175494 × 10⁻³⁸ to ±3.402824 × 10³⁸
• Most general-purpose processors (including Intel and AMD processors) provide hardware support for double-precision floating-point numbers and operations
Double Precision Example
• Represent -58₁₀ using the IEEE 754 double precision
First, convert the decimal number to binary
• 58₁₀ = 111010₂ = 1.1101 × 2⁵
Represent 0.7
……
Binary Coded Decimal (BCD)
• Since floating-point number systems can't represent some numbers exactly, such as 0.1 and 0.7, some applications (e.g., calculators) use BCD (binary-coded decimal)
BCD numbers encode each decimal digit using 4 bits with a range of 0 to 9
Floating-Point Numbers: Rounding
https://www.youtube.com/watch?v=Iw77CYUT74c
• Arithmetic results that fall outside of the available precision must round to a neighboring number
• Rounding modes
Round down
Round up
Round toward zero
Round to nearest
• Example
Round 1.100101 (1.578125) so that it uses only 3 fraction bits
• Round down: 1.100
• Round up: 1.101
• Round toward zero: 1.100
• Round to nearest: 1.101
1.625 is closer to 1.578125 than 1.5 is
Floating-Point Addition with the Same Sign
Floating-Point Addition Example
1.5 + 3.25
1.5₁₀ = 1.1₂ × 2⁰
3.25₁₀ = 11.01₂ = 1.101₂ × 2¹
Floating-Point Addition Example
Floating-Point Addition Example
3. Compare exponents
127 – 128 = -1, so shift N1 right by 1 bit
5. Add mantissas
   0.11  × 2¹
+ 1.101 × 2¹
= 10.011 × 2¹
Floating-Point Addition Example
7. Round result
No rounding needed (the result fits in 23 fraction bits)