
Chapter 2, Section 2.5

The document discusses floating-point number representation in computers. It begins by explaining that signed integer formats are not suitable for scientific and business applications involving real numbers. It then introduces floating-point representation as a solution. The key points made include:
- Floating-point numbers are represented as three fields: a sign bit, an exponent, and a significand (mantissa).
- The IEEE 754 standard defines common floating-point formats, including 32-bit single precision and 64-bit double precision.
- Special exponent values are used to represent infinity and NaN (not a number) values.
- Zero can be represented with both positive and negative sign bits, so testing for equality to zero is problematic.


Chapter 2

Floating Point
Numbers
2.5 Floating-Point Representation

The signed magnitude, one's complement,
and two's complement representations that we
have just discussed deal with
signed integer values only.
Without modification, these formats are not
useful in scientific or business applications
that deal with real number values.
Floating-point representation solves this
problem.

2
2.5 Floating-Point Representation

If we are clever programmers, we can perform
floating-point calculations using any integer format.
This is called floating-point emulation, because
floating-point values aren't stored as such; we just
create programs that make it seem as if floating-
point values are being used.
Most of today's computers are equipped with
specialized hardware that performs floating-point
arithmetic with no special programming required,
other than using the instruction set provided by your
CPU architecture.
3
2.5 Floating-Point Representation

Floating-point numbers allow an arbitrary
number of decimal places to the right of the
decimal point.
For example: 0.5 × 0.25 = 0.125
They are often expressed in scientific notation.
For example:
0.125 = 1.25 × 10⁻¹
5,000,000 = 5.0 × 10⁶

4
2.5 Floating-Point Representation
Computers use a form of scientific notation for
floating-point representation.
Numbers written in scientific notation have three
components: a sign, an exponent, and a significand (mantissa).

5
2.5 Floating-Point Representation
Computer representation of a floating-point
number consists of three fixed-size fields: the
sign, the exponent, and the significand (or,
less correctly, the mantissa).
This is the standard arrangement of these fields:
the sign bit first, then the exponent, then the significand.

Note: Although "significand" and "mantissa" do not technically mean the same
thing, many people use these terms interchangeably. We use the term "significand"
to refer to the fractional part of a floating-point number.

6
2.5 Floating-Point Representation

The one-bit sign field is the sign of the stored value.


The size of the exponent field determines the range
of values that can be represented.
The size of the significand determines the
precision of the representation.

7
2.5 Floating-Point Representation

We introduce a hypothetical model to explain the
concepts, after which we will discuss the IEEE-754
standard.
In this model:
A floating-point number is 14 bits in length
The exponent field is 5 bits
The significand field is 8 bits

8
2.5 Floating-Point Representation

The significand is always preceded by an implied
binary point.
Thus, the significand always contains a fractional
binary value.
The exponent indicates the power of 2 by which the
significand is multiplied.
9
2.5 Floating-Point Representation
Example:
Express 32₁₀ in the simplified 14-bit floating-point
model (1-bit sign, 5-bit exponent, 8-bit significand).
We know that 32 is 2⁵. So in (binary) scientific
notation, 32 = 100000₂ = 0.1 × 2⁶.
Using this information, we put 00110 (= 6₁₀) in the
exponent field and 1 (followed by zeros) in the
significand field, giving 0 00110 10000000.
10
2.5 Floating-Point Representation
The illustrations shown at
the right are all equivalent
representations for 32
using our simplified
model.

Not only do these
synonymous
representations waste
space, but they can also
cause confusion.

For example, 0.1₂ × 2⁶ and 0.01₂ × 2⁷ represent the same value.
11
2.5 Floating-Point Representation
To resolve the problem of synonymous forms,
we establish a rule that the first digit of the
significand must be 1, with no ones to the left of
the radix point.
This process, called normalization, results in a
unique pattern for each floating-point number.
In our simple model, all significands must have the
form 0.1xxxxxxx
For example, 4.5 = 100.1₂ × 2⁰ = 1.001 × 2² = 0.1001 ×
2³. The last expression is correctly normalized.

In our simple instructional model, we use no implied bits.

12
2.5 Floating-Point Representation

Another problem with our system is that we have
made no allowances for negative exponents. We
have no way to express 0.25! (Notice that there is
no sign in the exponent field.)

All of these problems can be fixed with no
changes to our basic model.

13
2.5 Floating-Point Representation

To provide for negative exponents, we will use a
biased exponent.
In our case, we have a 5-bit exponent, so the bias is
2⁵⁻¹ − 1 = 2⁴ − 1 = 15.
Thus we will use 15 for our bias: our exponent will use
excess-15 representation.
In our model, exponent field values less than 15
correspond to negative exponents, representing
purely fractional numbers.

14
2.5 Floating-Point Representation
Example:
Express 32₁₀ in the revised 14-bit floating-point model.
We know that 32 = 1.0 × 2⁵ = 0.1 × 2⁶.
To use our excess-15 biased exponent, we add 15 to
6, giving 21₁₀ (= 10101₂).
So we have: 0 10101 10000000.
15
2.5 Floating-Point Representation

Example:
Express 0.0625₁₀ in the revised 14-bit floating-point
model.
We know that 0.0625 is 2⁻⁴. So in (binary) scientific
notation, 0.0625 = 0.0001₂ = 1.0 × 2⁻⁴ = 0.1 × 2⁻³.
To use our excess-15 biased exponent, we add 15 to
−3, giving 12₁₀ (= 01100₂).
So we have: 0 01100 10000000.
16
2.5 Floating-Point Representation
Example:
Express −26.625₁₀ in the revised 14-bit floating-point
model.
We find 26.625₁₀ = 11010.101₂. Normalizing, we
have: 26.625₁₀ = 0.11010101 × 2⁵.
To use our excess-15 biased exponent, we add 15 to
5, giving 20₁₀ (= 10100₂). We also need a 1 in the sign
bit, giving 1 10100 11010101.

17
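The three worked examples above (32, 0.0625, and −26.625) can be checked with a short Python sketch. This is our own illustration, not code from the slides: encode14 is a hypothetical helper, math.frexp is simply a convenient way to normalize to the model's 0.1xxxxxxx form, and extra significand bits are truncated.

```python
import math

def encode14(x):
    """Encode x in the hypothetical 14-bit model:
    1 sign bit, 5-bit excess-15 exponent, 8-bit significand (0.1xxxxxxx)."""
    sign = 0 if x >= 0 else 1
    m, e = math.frexp(abs(x))      # abs(x) = m * 2**e with 0.5 <= m < 1,
                                   # so m is already normalized as 0.1xxx...
    exponent = e + 15              # excess-15: add the bias of 15
    significand = int(m * 2 ** 8)  # keep the first 8 fraction bits (truncate)
    return f"{sign:01b} {exponent:05b} {significand:08b}"

print(encode14(32.0))     # 0 10101 10000000
print(encode14(0.0625))   # 0 01100 10000000
print(encode14(-26.625))  # 1 10100 11010101
```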
2.5 Floating-Point Representation

The IEEE has established a standard for
floating-point numbers.
The IEEE-754 single precision floating point
standard uses an 8-bit exponent (with a bias of
127, i.e., excess-127) and a 23-bit significand.
The IEEE-754 double precision standard uses
an 11-bit exponent (with a bias of 1023) and a
52-bit significand.
18
2.5 Floating-Point Representation

In both the IEEE single-precision and double-
precision floating-point standards, the significand has
an implied 1 to the LEFT of the radix point.
The format for a significand using the IEEE format is:
1.xxx…
For example, 4.5 = 0.1001 × 2³ in our simple model; in
IEEE format, 4.5 = 1.001 × 2². The 1 is implied, which
means it does not need to be stored in the significand
(the significand field would contain only 001, followed
by zeros).
19
2.5 Floating-Point Representation
Example: Express −3.75 as a floating point number
using IEEE single precision.
First, let's normalize according to IEEE rules:
−3.75 = −11.11₂ = −1.111 × 2¹
The bias is 127, so we add 127 + 1 = 128 (this is our
exponent).
The first 1 in the significand is implied, so we have:
1 10000000 11100000000000000000000

Since we have an implied 1 in the significand, this equates
to
−(1).111₂ × 2^(128 − 127) = −1.111₂ × 2¹ = −11.11₂ = −3.75.

20
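We can confirm this bit pattern with Python's standard library, which stores floats in IEEE-754 form. A quick check, not part of the slides:

```python
import struct

# Pack -3.75 as an IEEE-754 single-precision value, then read back the raw bits.
bits = struct.unpack(">I", struct.pack(">f", -3.75))[0]

sign        = bits >> 31            # 1: the number is negative
exponent    = (bits >> 23) & 0xFF   # 128 = 1 + the bias of 127
significand = bits & 0x7FFFFF       # 111 then twenty zeros (leading 1 implied)

print(f"{sign:01b} {exponent:08b} {significand:023b}")
# 1 10000000 11100000000000000000000
```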
2.5 Floating-Point Representation
Using the IEEE-754 single precision floating point
standard:
An exponent of 255 (all 1s, after adding the bias) indicates
a special value.
• If the significand is zero, the value is infinity.
• If the significand is nonzero, the value is NaN, "not a
number," often used to flag an error condition.
Using the double precision standard:
The special exponent value for a double precision number
is 2047, instead of the 255 used by the single precision
standard.

21
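Both special cases can be seen directly in the stored bits. A small illustration of ours (the f32_bits helper is hypothetical, built on Python's struct module):

```python
import math
import struct

def f32_bits(x):
    """Raw 32-bit pattern of x stored as IEEE-754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

inf_bits = f32_bits(math.inf)
nan_bits = f32_bits(math.nan)

# Infinity: exponent field all 1s (255), significand zero.
print((inf_bits >> 23) & 0xFF, inf_bits & 0x7FFFFF)   # 255 0

# NaN: exponent field all 1s (255), significand nonzero.
print((nan_bits >> 23) & 0xFF, nan_bits & 0x7FFFFF != 0)   # 255 True
```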
2.5 Floating-Point Representation
Both the 14-bit model that we have presented
and the IEEE-754 floating point standard allow
two representations for zero.
Zero is indicated by all zeros in the exponent and the
significand, but the sign bit can be either 0 or 1.
This is why programmers should avoid testing a
floating-point value for equality to zero:
the bit patterns for negative zero and positive zero
are not the same (even though IEEE-754 arithmetic
comparison treats −0 and +0 as equal).
22
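Python's standard library makes both points visible. A small check of ours; the observation that comparison still succeeds is IEEE-754 semantics, not from the slides:

```python
import struct

pos_zero = struct.pack(">f", 0.0).hex()    # '00000000': all bits zero
neg_zero = struct.pack(">f", -0.0).hex()   # '80000000': only the sign bit set

print(pos_zero != neg_zero)   # True: the stored bit patterns differ
print(0.0 == -0.0)            # True: IEEE comparison still treats them as equal
```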
2.5 Floating-Point Representation

IEEE Floating-point addition and subtraction


are done using methods analogous to how we
perform calculations using pencil and paper.
The first thing that we do is express both
operands in the same exponential power, then
add the numbers, preserving the exponent in the
sum.
If the exponent requires adjustment, we do so at
the end of the calculation.

23
2.5 Floating-Point Representation
Example:
Find the sum of 12₁₀ and 1.25₁₀ using the 14-bit simple
floating-point model.
We find 12₁₀ = 0.1100 × 2⁴, and 1.25₁₀ = 0.101 × 2¹ =
0.000101 × 2⁴.
Thus, our sum is
0.110101 × 2⁴ (= 13.25₁₀).
If the addition produces a carry out of the significand,
the carry bit is added to the exponent at the end.

24
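The two steps of this example (align the exponents, then add the significands) can be sketched in Python. Our own illustration: math.frexp conveniently yields the 0.1xxx-normalized significand and exponent used by the model.

```python
import math

# 12 = 0.1100 x 2^4 and 1.25 = 0.101 x 2^1 in the simple model.
m1, e1 = math.frexp(12.0)    # (0.75, 4):  significand 0.1100, exponent 4
m2, e2 = math.frexp(1.25)    # (0.625, 1): significand 0.101,  exponent 1

# Step 1: shift the smaller operand's significand right until exponents match.
m2_aligned = m2 / 2 ** (e1 - e2)   # 0.000101 in binary = 0.078125

# Step 2: add the significands, keeping the common exponent.
total = (m1 + m2_aligned) * 2 ** e1

print(total)   # 13.25, i.e. 0.110101 x 2^4
```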
2.5 Floating-Point Representation

Floating-point multiplication is also carried out in


a manner akin to how we perform multiplication
using pencil and paper.
We multiply the two operands and add their
exponents.
If the exponent requires adjustment, we do so at
the end of the calculation.

25
2.5 Floating-Point Representation

No matter how many bits we use in a floating-point


representation, our model must be finite.
The real number system is, of course, infinite, so our
models can give nothing more than an approximation
of a real value.
At some point, every model breaks down, introducing
errors into our calculations.
By using a greater number of bits in our model, we
can reduce these errors, but we can never totally
eliminate them.

27
2.5 Floating-Point Representation
• Consider 0.1 in decimal.
• It cannot be perfectly represented in binary:
• 0.1₁₀ = 0.000110011001100110011…₂ (the pattern 0011 repeats forever)
28
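This is easy to observe in Python, whose floats are IEEE-754 doubles (our own demonstration, not part of the slides):

```python
from decimal import Decimal

# Decimal(0.1) shows the exact value of the double nearest to 0.1.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The rounding error surfaces as soon as we do arithmetic:
print(0.1 + 0.2 == 0.3)   # False
```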
2.5 Floating-Point Representation
Our job becomes one of reducing error, or at least
being aware of the possible magnitude of error in
our calculations.
We must also be aware that errors can compound
through repetitive arithmetic operations.
For example, our 14-bit model cannot exactly
represent the decimal value 128.5. In binary, it is 9
bits wide:
10000000.1₂ = 128.5₁₀

29
2.5 Floating-Point Representation

When we try to express 128.5₁₀ in our 14-bit model,
we lose the low-order bit, giving a relative error of:

(128.5 − 128) / 128.5 ≈ 0.39%

If we had a procedure that repetitively added 0.5 to
128.5, we would have an error of nearly 2% after only
four iterations.

30
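The loss can be simulated with a hypothetical helper, trunc14, that truncates a value to the model's 8 significand bits. This is our own sketch, not code from the text:

```python
import math

def trunc14(x):
    """Truncate x to 8 significand bits, as our 14-bit model must."""
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= m < 1
    return math.floor(m * 2 ** 8) / 2 ** 8 * 2 ** e

stored = trunc14(128.5)                    # 128.0: the low-order bit is lost
rel_err = (128.5 - stored) / 128.5         # ~0.0039, i.e. about 0.39%

# Repeatedly add 0.5: each increment is truncated away again.
value, true = stored, 128.5
for _ in range(4):
    value, true = trunc14(value + 0.5), true + 0.5

final_err = (true - value) / true          # ~0.019: nearly 2% after 4 iterations
```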
2.5 Floating-Point Representation

Floating-point errors can be reduced when we use
operands that are similar in magnitude.
If we were repetitively adding 0.5 to 128.5, it
would have been better to iteratively add 0.5 to
itself and then add 128.5 to this sum.
In this example, the error was caused by loss of
the low-order bit.
Loss of the high-order bit is more problematic.

31
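The same effect appears with ordinary doubles once the magnitudes differ enough. A sketch of ours: 1e16 is chosen because the gap between adjacent doubles there is 2.0, so each added 1.0 is rounded away.

```python
big = 1e16   # at this magnitude, adjacent doubles are 2.0 apart

# Adding the small values one at a time: each 1.0 is rounded away.
naive = big
for _ in range(1000):
    naive += 1.0

# Adding the small values to each other first, then to the large value:
better = big + sum(1.0 for _ in range(1000))

print(naive == big)            # True: all 1000 additions were lost
print(better == big + 1000.0)  # True: the grouped sum survives
```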
2.5 Floating-Point Representation

Floating-point overflow and underflow can cause
programs to crash.
Overflow occurs when there is no room to store
the high-order bits resulting from a calculation.
Underflow occurs when a value is too small to
store, possibly resulting in division by zero.

Experienced programmers know that it's better for a
program to crash than to have it produce incorrect, but
plausible, results.

32
2.5 Floating-Point Representation

When discussing floating-point numbers, it is
important to understand the terms range,
precision, and accuracy.
The range of a numeric format is the
difference between the largest and smallest
values that can be expressed.
Accuracy refers to how closely a numeric
representation approximates a true value.
The precision of a number indicates how much
information we have about a value.

33
2.5 Floating-Point Representation

Most of the time, greater precision leads to better
accuracy, but this is not always true.
For example, 3.1333 is a value of π that is accurate to
two digits but has 5 digits of precision.
There are other problems with floating point
numbers.
Because of truncated bits, you cannot always
assume that a particular floating point operation is
associative or distributive.

34
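The loss of associativity shows up in a single line of Python (a demonstration of ours; the classic 0.1/0.2/0.3 triple makes it visible):

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)  # False: addition of doubles is not associative
```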
2.5 Floating-Point Representation
This means that we cannot assume:
(a + b) + c = a + (b + c) or
a*(b + c) = ab + ac
Moreover, to test a floating point value for equality to
some other number, it is best to declare a "nearness to x"
epsilon value. For example, instead of checking to see if
floating point x is equal to 2 as follows:
if x = 2 then …
it is better to use:
if (abs(x - 2) < epsilon) then ...
(assuming we have epsilon defined correctly!)

35
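The slide's pseudocode translates directly into Python. Our sketch below uses 3 instead of 2 as the target so the rounding error is visible, and also shows math.isclose, Python's built-in version of the epsilon test:

```python
import math

x = 0.1 * 3 * 10   # mathematically 3, but rounding gives 3.0000000000000004

print(x == 3)                  # False: the naive equality test fails
epsilon = 1e-9
print(abs(x - 3) < epsilon)    # True: the epsilon test from the slide
print(math.isclose(x, 3.0))    # True: the library's near-equality check
```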
