Quote of the day

“95% of the folks out there are completely clueless about floating-point.”

James Gosling
Sun Fellow
Java Inventor
1998-02-28
CS 314 Chapter 3.1 CSE, 2016
Goals for Floating Point
• Standard arithmetic for reals for all computers
  - Like two's complement
• Keep as much precision as possible in formats
• Help programmer with errors in real arithmetic
  - +∞, -∞, Not-a-Number (NaN), exponent overflow, exponent underflow
• Keep encoding that is somewhat compatible with two's complement
  - E.g., 0 in floating point is 0 in two's complement
  - Make it possible to sort without needing to do floating point comparison

Scientific Notation (e.g., Base 10)

• Normalized scientific notation (aka standard form or exponential notation):
  - $r \times E^{i}$, where E is the base (usually 10), i is the exponent (a positive or negative integer), and r is a real number with $1.0 \le r < 10$
  - Normalized => no leading 0s
  - 61 is $6.10 \times 10^{1}$; 0.000061 is $6.10 \times 10^{-5}$

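Scientific Notation (e.g., Base 10)
• Multiplication: $(r \times E^{i}) \times (s \times E^{j}) = (r \times s) \times E^{i+j}$
    $(1.999 \times 10^{2}) \times (5.5 \times 10^{3}) = (1.999 \times 5.5) \times 10^{5}$
    $= 10.9945 \times 10^{5}$
    $= 1.09945 \times 10^{6}$
• Division: $(r \times E^{i}) / (s \times E^{j}) = (r / s) \times E^{i-j}$
    $(1.999 \times 10^{2}) / (5.5 \times 10^{3}) = 0.3634545\ldots \times 10^{-1}$
    $= 3.634545\ldots \times 10^{-2}$
• For addition/subtraction, you must first align the exponents:
    $(1.999 \times 10^{2}) + (5.5 \times 10^{3})$
    $= (0.1999 \times 10^{3}) + (5.5 \times 10^{3}) = 5.6999 \times 10^{3}$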

Floating Point: Representing Very Small Numbers

• Zero: the bit pattern of all 0s is the encoding for 0.000
• But 0 in the exponent field should mean the most negative exponent (we want 0 to be next to the smallest real)
  - Can't use two's complement (there the most negative value is $1000\,0000_{\text{two}}$)
• Bias notation: subtract the bias from the stored exponent
  - Single precision uses a bias of 127; double precision uses 1023
• 0 uses $0000\,0000_{\text{two}}$ => 0 - 127 = -127; ∞ and NaN use $1111\,1111_{\text{two}}$ => 255 - 127 = +128
• Smallest normalized SP real that can be represented: $1.00\ldots00 \times 2^{-126}$
• Largest SP real that can be represented: $1.11\ldots11 \times 2^{+127}$
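
The bias arithmetic on this slide can be sketched in C. This is only an illustration and not part of the original slides: the helper names `encode_exponent` and `decode_exponent` are made up for the example, and the bias is passed in as a parameter (127 for single precision, 1023 for double precision).

```c
/* Sketch of bias notation; the helper names are illustrative assumptions. */

/* Stored exponent field = true exponent + bias. */
unsigned encode_exponent(int true_exponent, int bias) {
    return (unsigned)(true_exponent + bias);
}

/* True exponent = stored exponent field - bias. */
int decode_exponent(unsigned field, int bias) {
    return (int)field - bias;
}
```

For example, `decode_exponent(0, 127)` gives -127 and `encode_exponent(-126, 127)` gives 1, matching the single-precision values above.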

Bias Notation (+127)
[Figure: exponent values shown as "how it is interpreted" versus "how it is encoded"; labeled regions: ∞, NaN; getting closer to zero; zero]

What About Real Numbers in Base 2?
• $r \times E^{i}$, where the base E is 2, i is a positive or negative integer, and r is a real number with $1.0 \le r < 2$
• The computer's version of normalized scientific notation is called floating point notation

Floating Point Numbers
• A 32-bit word has only $2^{32}$ bit patterns, so floating point numbers must be approximations of the reals, with a normalized significand r where $1.0 \le r < 2$
• IEEE 754 Floating Point Standard:
  - 1 bit for the sign (s) of the floating point number
  - 8 bits for the exponent (E)
  - 23 bits for the fraction (F) (get 1 extra bit of precision if the leading 1 is implicit)
  $(-1)^{s} \times (1 + F) \times 2^{E}$
• Can represent from $2.0 \times 10^{-38}$ to $2.0 \times 10^{38}$
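
One way to look at the three fields on a real machine is to copy a `float`'s bits into an integer and mask them out. The sketch below assumes that `float` is the 32-bit IEEE 754 single-precision format (true on most current platforms, but not guaranteed by the C standard). The names `sp_fields` and `sp_split` are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision word (illustrative type name). */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, still biased */
    uint32_t fraction;  /* 23 bits, hidden leading 1 not included */
} sp_fields;

/* Split a float into s, E and F. Assumes float is IEEE 754 binary32. */
sp_fields sp_split(float x) {
    uint32_t bits;
    sp_fields f;
    memcpy(&bits, &x, sizeof bits);      /* well-defined way to view the bits */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

Under that assumption, `sp_split(-1.5f)` yields sign 1, exponent 127 and fraction 0x400000, since -1.5 is $-(1 + 0.5) \times 2^{0}$.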

Floating Point Numbers

• What about bigger or smaller numbers?
• IEEE 754 Floating Point Standard: Double Precision (64 bits)
  - 1 bit for the sign (s) of the floating point number
  - 11 bits for the exponent (E)
  - 52 bits for the fraction (F) (get 1 extra bit of precision if the leading 1 is implicit)
  $(-1)^{s} \times (1 + F) \times 2^{E}$
• Can represent from $2.0 \times 10^{-308}$ to $2.0 \times 10^{308}$
• The 32-bit format is called Single Precision

Representing Big (and Small) Numbers
• What if we want to encode the approximate age of the earth?
    4,600,000,000 or $4.6 \times 10^{9}$
  or the weight in kg of one a.m.u. (atomic mass unit)?
    0.0000000000000000000000000166 or $1.66 \times 10^{-27}$
  There is no way we can encode either of the above in a 32-bit integer.
• Floating point representation: $(-1)^{sign} \times F \times 2^{E}$
  - Still have to fit everything in 32 bits (single precision)

      s       E (exponent)   F (fraction)
      1 bit   8 bits         23 bits

  - The base (2, not 10) is hardwired in the design of the FPALU
  - More bits in the fraction (F) or in the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number)

Exception Events in Floating Point
• Overflow (floating point) happens when a positive exponent becomes too large to fit in the exponent field
• Underflow (floating point) happens when a negative exponent becomes too large to fit in the exponent field

[Figure: number line from -∞ to +∞ with four marked points labeled "- largestE -smallestF", "- largestE +smallestF", "+ largestE -largestF", and "+ largestE +largestF"]

• One way to reduce the chance of underflow or overflow is to offer another format that has a larger exponent field
  - Double precision: takes two MIPS words

      s       E (exponent)   F (fraction)
      1 bit   11 bits        20 bits
      F (fraction continued)
      32 bits

"Father" of the Floating Point Standard

IEEE Standard 754 for Binary Floating-Point Arithmetic.

Prof. Kahan: 1989 ACM Turing Award winner!

www.cs.berkeley.edu/~wkahan/
…/ieee754status/754story.html

IEEE 754 FP Standard
• Most (all?) computers these days conform to the IEEE 754 floating point standard: $(-1)^{sign} \times (1 + F) \times 2^{E - bias}$
  - Formats for both single and double precision
  - F is stored in normalized format, where the msb of the significand is 1 (so there is no need to store it!); this is called the hidden bit
  - To simplify sorting FP numbers, E comes before F in the word, and E is represented in excess (biased) notation where the bias is 127 (1023 for double precision), so the most negative exponent is $00000001_{\text{two}}$, giving $2^{1-127} = 2^{-126}$, and the most positive is $11111110_{\text{two}}$, giving $2^{254-127} = 2^{+127}$
• Examples (in normalized format)
  - Smallest+: 0 00000001 1.00000000000000000000000 = $1 \times 2^{1-127}$
  - Zero: 0 00000000 00000000000000000000000 = true 0
  - Largest+: 0 11111110 1.11111111111111111111111 = $(2 - 2^{-23}) \times 2^{254-127}$
  - $1.0_{\text{two}} \times 2^{-1}$ = 0 01111110 1.00000000000000000000000
  - $0.75_{\text{ten}} \times 2^{4}$ = 0 10000010 1.10000000000000000000000

Ex: Converting Binary FP to Decimal
BEE00000H is the hex representation of an IEEE 754 SP FP number:

1 0111 1101 110 0000 0000 0000 0000 0000

$(-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}$
• Sign: 1 => negative
• Exponent:
  - $0111\,1101_{\text{two}} = 125_{\text{ten}}$
  - Bias adjustment: 125 - 127 = -2
• Significand:
  $1 + 1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 0 \times 2^{-4} + 0 \times 2^{-5} + \ldots$
  $= 1 + 2^{-1} + 2^{-2} = 1 + 0.5 + 0.25 = 1.75$
• Represents: $-1.75_{\text{ten}} \times 2^{-2} = -0.4375$ ($= -4.375 \times 10^{-1}$)

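
The same decoding steps can be written out in C. This is a minimal sketch, not the slides' own code: `sp_decode_normalized` is a hypothetical helper, it works directly on the 32-bit pattern, and it only handles normalized numbers (exponent field 1 to 254), so zeros, denorms, infinities and NaNs are out of scope.

```c
#include <stdint.h>

/* Decode a normalized single-precision pattern by hand:
   (-1)^S x (1 + Significand) x 2^(Exponent - 127).
   Only valid when the exponent field is in 1..254. */
double sp_decode_normalized(uint32_t bits) {
    uint32_t sign = bits >> 31;
    int exponent = (int)((bits >> 23) & 0xFFu) - 127;   /* bias adjustment */
    uint32_t fraction = bits & 0x7FFFFFu;

    /* 1 + F, where F = fraction / 2^23 */
    double value = 1.0 + (double)fraction / 8388608.0;

    /* Scale by 2^exponent; doubling and halving are exact in binary here. */
    for (int i = 0; i < exponent; i++) value *= 2.0;
    for (int i = 0; i > exponent; i--) value /= 2.0;

    return sign ? -value : value;
}
```

Calling `sp_decode_normalized(0xBEE00000u)` follows the slide's steps (sign 1, exponent 125 - 127 = -2, significand 1.75) and returns -0.4375.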
Ex: Converting Decimal to FP
$-1.275 \times 10^{1}$
1. Denormalize: -12.75
2. Convert integer part:
   $12 = 8 + 4 = 1100_{\text{two}}$
3. Convert fractional part:
   $.75 = .5 + .25 = .11_{\text{two}}$
4. Put parts together and normalize:
   $1100.11 = 1.10011 \times 2^{3}$
5. Convert exponent: $127 + 3 = 128 + 2 = 1000\,0010_{\text{two}}$

1 1000 0010 100 1100 0000 0000 0000 0000

The hex representation is C14C0000H

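
The final packing step, and a check against what the machine itself stores, might look like the sketch below. Both helper names (`sp_pack`, `float_to_bits`) are made up for the example, and `float_to_bits` assumes that `float` is IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Assemble sign, unbiased exponent and 23-bit fraction into one word.
   Assumes the exponent is in the normalized range -126..127. */
uint32_t sp_pack(uint32_t sign, int exponent, uint32_t fraction) {
    uint32_t biased = (uint32_t)(exponent + 127);        /* step 5: add the bias */
    return (sign << 31) | (biased << 23) | (fraction & 0x7FFFFFu);
}

/* View the bits the compiler actually stores for a float. */
uint32_t float_to_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For -12.75 the fraction bits after the hidden 1 are 10011 followed by zeros, that is 0x4C0000, so `sp_pack(1, 3, 0x4C0000)` gives 0xC14C0000, the same word as `float_to_bits(-12.75f)` on an IEEE 754 machine.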
Representation for 0

How to represent 0?
• Exponent: all zeros
• Significand: all zeros
• What about the sign? Both cases are valid.
  +0: 0 00000000 00000000000000000000000
  -0: 1 00000000 00000000000000000000000
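
A short C sketch of the two zeros, assuming `float` is IEEE 754 single precision; the function names are hypothetical. It shows that +0 and -0 compare equal even though their sign bits differ.

```c
#include <stdint.h>
#include <string.h>

/* +0 and -0 compare equal under IEEE 754 rules. */
int zeros_compare_equal(void) {
    float pos_zero = 0.0f;
    float neg_zero = -0.0f;
    return pos_zero == neg_zero;
}

/* Return the raw 32-bit pattern, so the sign bit can be inspected. */
uint32_t zero_bits(float z) {
    uint32_t bits;
    memcpy(&bits, &z, sizeof bits);
    return bits;
}
```

On such a machine `zero_bits(0.0f)` is 0x00000000 and `zero_bits(-0.0f)` is 0x80000000, matching the two patterns above.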

Representation for +∞/-∞ (∞: infinity)

How to represent +∞/-∞?

• Exponent: all ones ($11111111_{\text{two}} = 255$)
• Significand: all zeros
  +∞: 0 11111111 00000000000000000000000
  -∞: 1 11111111 00000000000000000000000
Operations
  5 / 0 = +∞, -5 / 0 = -∞
  5 + (+∞) = +∞, (+∞) + (+∞) = +∞
  5 - (+∞) = -∞, (-∞) - (+∞) = -∞, etc.
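
These rules can be tried out in C. The sketch below assumes an implementation that follows IEEE 754 for floating point division by zero (the C standard itself leaves this undefined unless the implementation supports Annex F); `fp_div` and `inf_sign` are illustrative names.

```c
#include <math.h>   /* isinf, INFINITY */

/* Plain run-time division; with IEEE 754 arithmetic, x / 0 gives an infinity. */
float fp_div(float a, float b) {
    return a / b;
}

/* Returns +1 for +infinity, -1 for -infinity, 0 for anything else. */
int inf_sign(float x) {
    if (!isinf(x)) return 0;
    return x > 0.0f ? 1 : -1;
}
```

With those assumptions, `inf_sign(fp_div(5.0f, 0.0f))` is 1 and `inf_sign(fp_div(-5.0f, 0.0f))` is -1, and adding a finite number to `INFINITY` leaves it infinite.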

Representation for "Not a Number"
sqrt(-4.0) = ?   0/0 = ?
• Called Not a Number (NaN)
How to represent NaN
• Exponent = 255
• Significand: nonzero
NaNs can help with debugging
Operations
  sqrt(-4.0) = NaN        0/0 = NaN
  op(NaN, x) = NaN        +∞ + (-∞) = NaN
  +∞ - (+∞) = NaN         ∞/∞ = NaN
  etc.

Representation for Denorms (denormalized numbers)

What have we defined so far? (for SP)

  Exponent   Significand   Object
  0          0             +/- 0
  0          nonzero       Denorms (used to represent denormalized numbers)
  1-254      anything      Norms (implicit leading 1)
  255        0             +/- infinity
  255        nonzero       NaN
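
The table translates almost line for line into a classifier over the raw 32-bit pattern. This is an illustrative sketch; the enum and function names are not from the slides.

```c
#include <stdint.h>

typedef enum { SP_ZERO, SP_DENORM, SP_NORM, SP_INFINITY, SP_NAN } sp_class;

/* Classify a single-precision bit pattern by its exponent and significand fields. */
sp_class sp_classify_bits(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t significand = bits & 0x7FFFFFu;

    if (exponent == 0u) {
        return significand == 0u ? SP_ZERO : SP_DENORM;
    }
    if (exponent == 255u) {
        return significand == 0u ? SP_INFINITY : SP_NAN;
    }
    return SP_NORM;   /* exponent 1..254: implicit leading 1 */
}
```

The sign bit is ignored on purpose, since every row of the table covers both signs.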

Group Discussion 1: Questions about IEEE 754
Four students form a group and discuss the following question.
• Consider the following type conversions (with i an int and f a float): will each test print "true"?
    if ( i == (int) ((float) i) ) {
        printf("true");
    }
    if ( f == (float) ((int) f) ) {
        printf("true");
    }

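
As a hedged starting point for the discussion, the two tests can be wrapped in small functions and tried on specific values. The function names are hypothetical, and the sketch assumes a 32-bit `int` and an IEEE 754 single-precision `float`, which has only 24 significand bits.

```c
/* Does an int survive a round trip through float?
   Values whose float image falls outside the int range are not handled here. */
int int_survives_float(int i) {
    return i == (int)((float)i);
}

/* Does a float survive a round trip through int?
   f must lie within the range of int, otherwise the conversion is undefined. */
int float_survives_int(float f) {
    return f == (float)((int)f);
}
```

Under those assumptions, 16777216 ($2^{24}$) survives the first test but 16777217 does not, because it needs 25 significant bits; 2.0 survives the second test but 1.5 does not, because the conversion to int discards the fractional part.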
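Question II about IEEE 754

• Is FP addition associative? Does (x + y) + z = x + (y + z)?

  $x = -1.5 \times 10^{38}$, $y = 1.5 \times 10^{38}$, $z = 1.0$
  $(x + y) + z = (-1.5 \times 10^{38} + 1.5 \times 10^{38}) + 1.0 = 1.0$
  $x + (y + z) = -1.5 \times 10^{38} + (1.5 \times 10^{38} + 1.0) = 0.0$
• No: in single precision, 1.0 is too small to change $1.5 \times 10^{38}$, so it is lost in y + z and the two groupings give different results.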
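
The example can be reproduced in C with single-precision operands. This sketch assumes IEEE 754 arithmetic and a compiler that is not allowed to reorder floating point operations (the default without options such as `-ffast-math`); the function names are illustrative.

```c
/* Left-to-right grouping: (x + y) + z */
float sum_left(float x, float y, float z) {
    return (x + y) + z;
}

/* Right-to-left grouping: x + (y + z) */
float sum_right(float x, float y, float z) {
    return x + (y + z);
}
```

With x = -1.5e38f, y = 1.5e38f and z = 1.0f, `sum_left` returns 1.0 while `sum_right` returns 0.0.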

IEEE 754 FP Standard Encoding
• Special encodings are used to represent unusual events
  - ± infinity for division by zero
  - NaN (not a number) for the results of invalid operations such as 0/0
  - True zero is the bit string of all zeros

  Single Precision                     Double Precision                            Object Represented
  E (8)                     F (23)     E (11)                          F (52)
  0000 0000                 0          0000 … 0000                     0           true zero (0)
  0000 0000                 nonzero    0000 … 0000                     nonzero     ± denormalized number
  0000 0001 to 1111 1110    anything   0000 … 0001 to 1111 … 1110      anything    ± floating point number
  (-126 to +127)                       (-1022 to +1023)
  1111 1111                 0          1111 … 1111                     0           ± infinity
  1111 1111                 nonzero    1111 … 1111                     nonzero     not a number (NaN)

Support for Accurate Arithmetic
• IEEE 754 FP rounding modes
  - Always round up (toward +∞)
  - Always round down (toward -∞)
  - Truncate
  - Round to nearest even (when the Guard || Round || Sticky bits are 100): always creates a 0 in the least significant (kept) bit of F
• Rounding (except for truncation) requires the hardware to include extra F bits during calculations
  - Guard bit: used to provide one F bit when shifting left to normalize a result (e.g., when normalizing F after division or subtraction)
  - Round bit: used to improve rounding accuracy
  - Sticky bit: used to support round to nearest even; is set to 1 whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction)

  F = 1 . xxxxxxxxxxxxxxxxxxxxxxx G R S

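
Round to nearest even on the three extra bits can be sketched as follows. This is an assumption-laden illustration rather than a description of any particular hardware: the kept significand bits are held in an integer `sig`, the guard, round and sticky bits are passed separately, and the hypothetical helper leaves any renormalization after a carry to the caller.

```c
#include <stdint.h>

/* sig: kept significand bits as an integer.
   g, r, s: guard, round and sticky bits (each 0 or 1). */
uint32_t round_nearest_even(uint32_t sig, unsigned g, unsigned r, unsigned s) {
    unsigned extra = (g << 2) | (r << 1) | s;   /* GRS as a 3-bit value */

    if (extra > 4u) {
        sig += 1u;            /* more than half an LSB: round up */
    } else if (extra == 4u) {
        sig += sig & 1u;      /* exactly half (GRS = 100): go to the even neighbour */
    }
    return sig;               /* less than half: just drop the extra bits */
}
```

In the tie case GRS = 100, an even `sig` is left alone and an odd `sig` is bumped up, so the least significant kept bit always ends up 0.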
Floating Point Addition
• Addition (and subtraction)
  $(F1 \times 2^{E1}) + (F2 \times 2^{E2}) = F3 \times 2^{E3}$
  - Step 0: Restore the hidden bit in F1 and in F2
  - Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming $E1 \ge E2$), keeping track of (three of) the bits shifted out in G, R, and S
  - Step 2: Add the resulting F2 to F1 to form F3
  - Step 3: Normalize F3 (so it is in the form 1.XXXXX…)
      If F1 and F2 have the same sign => $F3 \in [1,4)$ => at most one 1-bit right shift of F3, incrementing E3 (check for overflow)
      If F1 and F2 have different signs => F3 may require many left shifts, each time decrementing E3 (check for underflow)
  - Step 4: Round F3 and possibly normalize F3 again
  - Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Addition Example
• Add
  $(0.5 = 1.0000 \times 2^{-1}) + (-0.4375 = -1.1100 \times 2^{-2})$
  - Step 0: Hidden bits restored in the representation above
  - Step 1: Shift the significand with the smaller exponent (1.1100) right until its exponent matches the larger exponent (so once)
  - Step 2: Add significands
      1.0000 + (-0.111) = 1.0000 - 0.111 = 0.001
  - Step 3: Normalize the sum, checking for exponent over/underflow
      $0.001 \times 2^{-1} = 0.010 \times 2^{-2} = \ldots = 1.000 \times 2^{-4}$
  - Step 4: The sum is already rounded, so we're done
  - Step 5: Rehide the hidden bit before storing
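
Steps 1 to 3 can be mimicked with a toy software adder. Everything here is an illustrative assumption: the type `toy_fp`, the function `toy_add`, and the choice of 4 fraction bits (to match the 1.0000 notation of the example). The sketch truncates instead of rounding and does not check for exponent overflow or underflow.

```c
#include <stdint.h>

#define FRAC_BITS 4   /* fraction bits after the binary point, as in 1.0000 */

typedef struct {
    int sign;       /* 0 = positive, 1 = negative */
    uint32_t sig;   /* significand with hidden bit restored, scaled by 2^FRAC_BITS */
    int exp;        /* unbiased exponent */
} toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b) {
    /* Step 1: let a be the operand with the larger exponent, then align b. */
    if (b.exp > a.exp) { toy_fp t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    b.sig = shift < 32 ? b.sig >> shift : 0;   /* shifted-out bits are dropped */

    /* Step 2: add the significands as signed integers. */
    int64_t sa = a.sign ? -(int64_t)a.sig : (int64_t)a.sig;
    int64_t sb = b.sign ? -(int64_t)b.sig : (int64_t)b.sig;
    int64_t sum = sa + sb;

    toy_fp r;
    r.sign = sum < 0;
    r.sig = (uint32_t)(sum < 0 ? -sum : sum);
    r.exp = a.exp;

    /* Step 3: normalize to the form 1.xxxx (skipped for an exact zero). */
    if (r.sig != 0) {
        while (r.sig >= (2u << FRAC_BITS)) { r.sig >>= 1; r.exp++; }
        while (r.sig <  (1u << FRAC_BITS)) { r.sig <<= 1; r.exp--; }
    }
    return r;
}
```

Feeding in the example operands, {0, 0x10, -1} for $1.0000 \times 2^{-1}$ and {1, 0x1C, -2} for $-1.1100 \times 2^{-2}$, yields sign 0, significand 0x10 and exponent -4, that is $1.0000 \times 2^{-4}$.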

Exercise
• Given $A = 2.6125 \times 10^{1}$ and $B = 4.150390625 \times 10^{-1}$, calculate the sum of A and B by hand, assuming A and B are stored in the following format. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps.

    Sign    Exponent   Fraction
    1 bit   5 bits     10 bits
    S       E          F

• Solution:
  a. $2.6125 \times 10^{1} + 4.150390625 \times 10^{-1}$
     $2.6125 \times 10^{1} = 26.125 = 11010.001_{\text{two}} = 1.1010001000 \times 2^{4}$
     $4.150390625 \times 10^{-1} = .4150390625 = .0110101001_{\text{two}} = 1.1010100100 \times 2^{-2}$
     Shift the binary point of B 6 places to the left to align the exponents:

                      GR
        1.1010001000  00
      + 0.0000011010  10 0100   (Guard = 1, Round = 0, Sticky = 1)
      -------------------------
        1.1010100010  10

     In this case the extra bits (G, R, S) = 101 are more than half of the least significant kept bit, so the value is rounded up:
     $1.1010100011 \times 2^{4} = 11010.100011 \times 2^{0} = 26.546875 = 2.6546875 \times 10^{1}$
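
The hand calculation can be cross-checked with integer arithmetic. The sketch below is a hypothetical helper, not the textbook's method: each significand 1.xxxxxxxxxx is held as an 11-bit integer, both operands are assumed positive with `exp_a >= exp_b`, the exponent difference is assumed to be below 32, and renormalization after a carry out is left to the caller.

```c
#include <stdint.h>

/* Align b to a, add, then round to nearest even using guard, round and sticky.
   Returns the rounded significand of the sum, scaled like sig_a (exponent exp_a). */
uint32_t add_round_grs(uint32_t sig_a, int exp_a, uint32_t sig_b, int exp_b) {
    unsigned shift = (unsigned)(exp_a - exp_b);        /* assumes 0 <= shift < 32 */

    uint32_t wide_b = sig_b << 3;                      /* room for G, R, S below the LSB */
    uint32_t lost = wide_b & ((1u << shift) - 1u);     /* bits that fall off the end */
    wide_b >>= shift;
    if (lost != 0u) wide_b |= 1u;                      /* sticky remembers any lost 1 */

    uint32_t sum = (sig_a << 3) + wide_b;
    uint32_t kept = sum >> 3;
    uint32_t extra = sum & 7u;                         /* GRS as a 3-bit value */

    if (extra > 4u || (extra == 4u && (kept & 1u))) {
        kept += 1u;                                    /* round up */
    }
    return kept;
}
```

For the exercise, A's significand 1.1010001000 is 0x688 and B's significand 1.1010100100 is 0x6A4, so `add_round_grs(0x688, 4, 0x6A4, -2)` returns 0x6A3, which is 1.1010100011 and, scaled by $2^{4}$, equals 26.546875.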

Floating Point Multiplication
• Multiplication
  $(F1 \times 2^{E1}) \times (F2 \times 2^{E2}) = F3 \times 2^{E3}$
  - Step 0: Restore the hidden bit in F1 and in F2
  - Step 1: Add the two (biased) exponents and subtract the bias from the sum, so E1 + E2 - 127 = E3; also determine the sign of the product (which depends on the signs of the operands, i.e., their most significant bits)
  - Step 2: Multiply F1 by F2 to form a double precision F3
  - Step 3: Normalize F3 (so it is in the form 1.XXXXX…)
      Since F1 and F2 come in normalized => $F3 \in [1,4)$ => at most one 1-bit right shift of F3, incrementing E3
      Check for overflow/underflow
  - Step 4: Round F3 and possibly normalize F3 again
  - Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Multiplication Example
• Multiply
  $(0.5 = 1.0000 \times 2^{-1}) \times (-0.4375 = -1.1100 \times 2^{-2})$
  - Step 0: Hidden bits restored in the representation above
  - Step 1: Add the exponents: unbiased, this is -1 + (-2) = -3; biased, it is (-1 + 127) + (-2 + 127) - 127 = (-1 - 2) + (127 + 127 - 127) = -3 + 127 = 124. The operands have different signs, so the product is negative
  - Step 2: Multiply the significands
      1.0000 x 1.110 = 1.110000
  - Step 3: Normalize the product, checking for exponent over/underflow
      $1.110000 \times 2^{-3}$ is already normalized
  - Step 4: The product is already rounded, so we're done
  - Step 5: Rehide the hidden bit before storing
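
A stripped-down software multiply that follows the same steps on single-precision bit patterns might look like this. It is a sketch under stated assumptions, not a full implementation: `sp_mul_bits` is a hypothetical name, both inputs must be normalized, the result is truncated rather than rounded, and overflow, underflow, zeros, infinities and NaNs are not handled.

```c
#include <stdint.h>

uint32_t sp_mul_bits(uint32_t a, uint32_t b) {
    /* Step 1: sign of the product, and E3 = E1 + E2 - 127 on the biased exponents. */
    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t exp = (int32_t)((a >> 23) & 0xFFu) + (int32_t)((b >> 23) & 0xFFu) - 127;

    /* Step 0: restore the hidden bits. */
    uint64_t fa = (a & 0x7FFFFFu) | 0x800000u;
    uint64_t fb = (b & 0x7FFFFFu) | 0x800000u;

    /* Step 2: 24-bit x 24-bit product, up to 48 bits wide. */
    uint64_t prod = fa * fb;

    /* Step 3: the product lies in [1,4); if it is >= 2, shift right once. */
    if (prod & (1ULL << 47)) {
        prod >>= 1;
        exp++;
    }

    /* Step 4 is simplified to truncation; Step 5: drop the hidden bit again. */
    uint32_t frac = (uint32_t)(prod >> 23) & 0x7FFFFFu;
    return sign | ((uint32_t)exp << 23) | frac;
}
```

For the example, `sp_mul_bits(0x3F000000u, 0xBEE00000u)` (0.5 times -0.4375) gives 0xBE600000: sign 1, biased exponent 124 and fraction 110…0, which is $-1.11 \times 2^{-3} = -0.21875$.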

MIPS Floating Point Instructions
• MIPS has a separate floating point register file ($f0, $f1, …, $f31) (whose registers are used in pairs for double precision values) with special instructions to load to and store from them
    lwc1 $f1,54($s2)     # $f1 = Memory[$s2+54]
    swc1 $f1,58($s4)     # Memory[$s4+58] = $f1
• And supports IEEE 754 single precision operations
    add.s $f2,$f4,$f6    # $f2 = $f4 + $f6
  and double precision operations
    add.d $f2,$f4,$f6    # $f2||$f3 = $f4||$f5 + $f6||$f7
  similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d

MIPS Floating Point Instructions, Cont'd
• And floating point single precision comparison operations
    c.x.s $f2,$f4        # if ($f2 < $f4) cond=1; else cond=0
  where x may be eq, neq, lt, le, gt, ge (the comment shows the case x = lt)
  and double precision comparison operations
    c.x.d $f2,$f4        # if ($f2||$f3 < $f4||$f5) cond=1; else cond=0
• And floating point branch operations
    bc1t 25              # if (cond==1) go to PC+4+100 (the offset 25 counts words)
    bc1f 25              # if (cond==0) go to PC+4+100 (the offset 25 counts words)

Frequency of Common MIPS Instructions
• Only included those with >3% and >1%

  Instr.   SPECint   SPECfp      Instr.   SPECint   SPECfp
  addu     5.2%      3.5%        add.d    0.0%      10.6%
  addiu    9.0%      7.2%        sub.d    0.0%      4.9%
  or       4.0%      1.2%        mul.d    0.0%      15.0%
  sll      4.4%      1.9%        add.s    0.0%      1.5%
  lui      3.3%      0.5%        sub.s    0.0%      1.8%
  lw       18.6%     5.8%        mul.s    0.0%      2.4%
  sw       7.6%      2.0%        l.d      0.0%      17.5%
  lbu      3.7%      0.1%        s.d      0.0%      4.9%
  beq      8.6%      2.2%        l.s      0.0%      4.2%
  bne      8.4%      1.4%        s.s      0.0%      1.1%
  slt      9.9%      2.3%        lhu      1.3%      0.0%
  slti     3.1%      0.3%
  sltu     3.4%      0.8%

Assignment III
• 3.6, 3.8, 3.11, 3.14
• Coding Assignment
  - Objective: Understand how IEEE 754 floating point is applied on a real-world machine
  - Task 1: On your machine, what is the accuracy of single precision and double precision (that is, the number of bits used for single/double precision floating point)? Use a simple program to demonstrate it.
  - Task 2: Run a program to obtain the results of "-8.0/0" and "sqrt(-4.0)" on your machine.
  - Reports:
      1. Submit your code and execution results (screenshots).
      2. Answer the following questions:
         1) What is the accuracy of float and double on your machine?
         2) How are infinity and NaN represented on your machine?
• Due: Nov. 17
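
One possible starting point for Task 1 is to measure the machine epsilon, the gap between 1.0 and the next representable number. The sketch below is only one approach among several, with made-up function names; the `volatile` temporaries are there to force each sum to be rounded to the named type, in case the compiler keeps intermediates in a wider register.

```c
#include <float.h>   /* FLT_EPSILON, DBL_EPSILON for comparison */

/* Halve eps until adding half of it to 1.0 no longer changes 1.0. */
float measure_float_epsilon(void) {
    float eps = 1.0f;
    volatile float sum = 1.0f + eps / 2.0f;
    while (sum != 1.0f) {
        eps /= 2.0f;
        sum = 1.0f + eps / 2.0f;
    }
    return eps;
}

double measure_double_epsilon(void) {
    double eps = 1.0;
    volatile double sum = 1.0 + eps / 2.0;
    while (sum != 1.0) {
        eps /= 2.0;
        sum = 1.0 + eps / 2.0;
    }
    return eps;
}
```

On an IEEE 754 machine with round to nearest even, the results are expected to equal `FLT_EPSILON` ($2^{-23}$) and `DBL_EPSILON` ($2^{-52}$), corresponding to 23 and 52 stored fraction bits.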
