
Floating Point 6up

1) Floating point numbers represent fractions in computers using scientific notation, with a mantissa and an exponent. The IEEE 754 standard defines common floating point representations.
2) Floating point numbers use a sign bit, an exponent field, and a mantissa field. The exponent is stored with a bias so that both positive and negative exponents can be represented.
3) The IEEE 754 standard defines single and double precision floating point formats. It allows consistent representation of floating point values across systems.

Uploaded by

edemkv
Copyright
© Attribution Non-Commercial (BY-NC)

Floating Point Representation
DCS111 Computer Architecture

Outline

  Fractional numbers
  Floating point scientific notation
  Floating point in binary
  IEEE Floating Point Standard
  Behaviour of Floating Point Numbers

Fractional Numbers
… not whole numbers

Recap: fractions

  Decimal 5.67_10 is
    5 x 10^0 plus
    6 x 10^-1 plus
    7 x 10^-2
  Binary 11.011_2 is
    1 x 2^1 plus
    1 x 2^0 plus
    0 x 2^-1 plus
    1 x 2^-2 plus
    1 x 2^-3
  Quiz: what is 11.011_2 in decimal?
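The positional expansion above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `binary_to_fraction` is made up for illustration.

```python
from fractions import Fraction

def binary_to_fraction(text: str) -> Fraction:
    """Evaluate a binary numeral such as '11.011' as an exact fraction."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # Digit i after the point (counting from 1) carries weight 2^-i.
    for i, digit in enumerate(frac, start=1):
        value += Fraction(int(digit), 2 ** i)
    return value

print(float(binary_to_fraction("11.011")))  # 3.375
```

The result is 2 + 1 + 0 + 1/4 + 1/8, which matches the term-by-term expansion.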

Recap: fractions

  Quiz: what is a third as a decimal: N.NNNNN?
  Third is 0.33333…
  Not all numbers can be represented exactly (with limited digits)
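To make the size of the error concrete, here is a small sketch using exact fractions. It assumes a third is cut off after five decimal digits, as in the quiz; the variable names are illustrative only.

```python
from fractions import Fraction

third = Fraction(1, 3)
five_digits = Fraction(33333, 100000)  # 0.33333, a third cut off after five digits

# The truncated decimal is close to a third but not equal to it.
error = third - five_digits
print(error)            # 1/300000
print(five_digits * 3)  # 99999/100000, not 1
```

Adding more digits shrinks the error but never removes it, because a third has no finite decimal expansion.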

Problem

  How to hold fractions in computers?

Solution 1 – Fixed Point

  Divide bits between whole and fractional parts

  0 0 1 1 1 1 0 1
  [integer bits | fractional bits]

  Point always in the same place
  Quiz: what is this in decimal?
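A fixed-point pattern can be decoded as an ordinary integer divided by a power of two. The sketch below is an illustration under one loud assumption: the position of the point in the slide's figure did not survive extraction, so the 4/4 split used here is a guess, not the slide's answer.

```python
def fixed_point_value(bits: str, fractional_bits: int) -> float:
    """Interpret an unsigned bit string as a fixed-point number."""
    # A fixed-point number is an integer scaled by 2^-fractional_bits.
    return int(bits, 2) / (2 ** fractional_bits)

# ASSUMPTION: 4 integer bits followed by 4 fractional bits.
print(fixed_point_value("00111101", 4))  # 3.8125

# The same bits with a different (also assumed) split give a different value.
print(fixed_point_value("00111101", 2))  # 15.25
```

The same eight bits mean different numbers depending on where the point is agreed to sit, which is why the point must always be in the same place.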

Solution 1 – Fixed Point

  Divide bits between whole and fractional parts
  [integer bits | fractional bits]
  Quiz:
  •  What is the maximum number? (range)
  •  What is the difference between successive numbers? (accuracy)

Evaluation of Fixed Point

  Range versus Accuracy
  High accuracy means low range
  High range means low accuracy
  Has uses
  Really just scaled integers
  Software library for fixed point numbers
  No need for special hardware
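The range-versus-accuracy trade-off can be tabulated for an 8-bit unsigned fixed-point format. This is a sketch only; `fixed_point_limits` is a hypothetical helper, and the 8-bit width is assumed from the bit pattern shown earlier in the slides.

```python
def fixed_point_limits(total_bits: int, fractional_bits: int) -> tuple[float, float]:
    """Return (maximum value, step between successive values), unsigned."""
    step = 2.0 ** -fractional_bits
    maximum = (2 ** total_bits - 1) * step
    return maximum, step

for f in (0, 4, 8):
    maximum, step = fixed_point_limits(8, f)
    print(f"{f} fractional bits: max = {maximum}, step = {step}")
```

Moving the point to the left makes the step smaller (more accuracy) and the maximum smaller (less range) by the same factor.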

Scientific (Exponent) Notation

  3.21 x 10^5    6.54 x 10^-5
  Mantissa: 3.21 and 6.54
  Exponent: 5 and -5
  Same accuracy
  Different magnitude
  Quiz: Write these numbers as decimal, without exponents

Scientific (Exponent) Notation

  3.21 x 10^5    6.54 x 10^-5
  321,000 and 0.0000654
  Mantissa is a fraction
  Exponent is an integer
  Both mantissa and exponent can be negative
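The conversion from exponent notation to plain decimal can be reproduced with Python's `decimal` module. This is a minimal sketch for checking the two values on the slide.

```python
from decimal import Decimal

for text in ("3.21e5", "6.54e-5"):
    number = Decimal(text)
    # The "f" format writes the value out in full, without an exponent.
    print(text, "=", format(number, "f"))
```

A positive exponent moves the point to the right and a negative exponent moves it to the left; the digits of the mantissa are unchanged.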

Advantage of Scientific Notation

  Large range
  Constant proportional accuracy (… with exceptions)

Normalisation

  0.002 x 10^0
  0.2 x 10^-2
  2.0 x 10^-3
  20 x 10^-4
  (all the same value)
  Normalised number has 1 non-zero digit before the point
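Normalisation can be written as a loop that shifts the point and adjusts the exponent to compensate. The following sketch uses exact fractions to avoid rounding; the function name `normalise` is an assumption for illustration.

```python
from fractions import Fraction

def normalise(mantissa: Fraction, exponent: int) -> tuple[Fraction, int]:
    """Shift the point until 1 <= |mantissa| < 10, adjusting the exponent."""
    if mantissa == 0:
        return mantissa, 0  # zero has no normalised form
    while abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    while abs(mantissa) < 1:
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

examples = [("0.002", 0), ("0.2", -2), ("2.0", -3), ("20", -4)]
for text, exp in examples:
    m, e = normalise(Fraction(text), exp)
    print(f"{text} x 10^{exp} -> {float(m)} x 10^{e}")
```

All four inputs come out as 2.0 x 10^-3, confirming that they are different spellings of the same value.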

Floating Point in Binary

Binary Floating Point

  1.01 x 2^2
  1.1 x 2^-2
  Exponent: positive or negative
  Mantissa: positive or negative

  Quiz:
  •  Effect of negative mantissa?
  •  Effect of negative exponent?
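The two binary examples can be evaluated directly. This is a sketch with a hypothetical helper `binary_float`; the sign is passed as a separate flag purely for illustration.

```python
def binary_float(mantissa_bits: str, exponent: int, negative: bool = False) -> float:
    """Evaluate mantissa x 2^exponent, with the mantissa written in binary."""
    whole, _, frac = mantissa_bits.partition(".")
    # Read all digits as one integer, then move the point back with a power of two.
    mantissa = int(whole + frac, 2) / (2 ** len(frac))
    value = mantissa * 2.0 ** exponent
    return -value if negative else value

print(binary_float("1.01", 2))                 # 5.0
print(binary_float("1.1", -2))                 # 0.375
print(binary_float("1.1", -2, negative=True))  # -0.375
```

A negative exponent makes the magnitude smaller than the mantissa, while a negative mantissa flips the sign of the value.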

Normalised Binary FP

  In normalised binary scientific notation
    1.mmmm…mmm x 2^E
    unless the number is 0
  1.mmm…mmm is the mantissa
  E is the exponent
  First digit always 1

Representation (32 bits)

  Sign bit S
  Exponent E
  Mantissa M

  [sign | exponent | fraction (mantissa)]

Representation (32 bits)

  Sign bit S – 1 bit
  Exponent E – 8 bits
  Mantissa M – 23 bits, BUT first digit always 1, so not included

  [sign | exponent | fraction (mantissa)]

  (-1)^S x 1.M x 2^E

Negative exponents - how?

  Aim: ALU (Arithmetic Logic Unit) can reuse integer machinery
  Eg, comparison with zero: x > 0
    Easy because of sign bit
    Floating point numbers can be easily classified as negative or positive
  Comparison of two floating point numbers x<y not so straightforward...
    choose exponent representation to help
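The 1/8/23-bit split can be inspected on a real machine. The sketch below assumes Python's `struct` module with the `">f"` format, which packs a value as a 32-bit IEEE single; the helper name `fields` is made up here, and the exponent is shown exactly as stored, without interpreting it.

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Split the 32-bit pattern of x into (sign, exponent, mantissa) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # top 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits, as stored
    mantissa = bits & 0x7FFFFF       # low 23 bits, without the leading 1
    return sign, exponent, mantissa

s, e, m = fields(-6.5)
print(s, format(e, "08b"), format(m, "023b"))
```

For -6.5, which is -1.101 x 2^2 in binary, the mantissa field starts with 101: the leading 1 before the point is not stored.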

Exponent in 2's Comp ??

  Consider: 1/2 < 1
  half: 0.1 = 1.0 x 2^-1 (normalised)
  one: 1.0 = 1.0 x 2^0 (normalised)

  half: 0 11111111 000 …
  one:  0 00000000 000 …

  Bad Design: read as unsigned bit patterns, half looks larger than one

Representation of Exponents

  We want:
    FP number order to follow (unsigned) bit order
    11111111 to represent the highest positive exponent
  Use biased representation
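The ordering problem can be demonstrated by building the two patterns by hand. In this sketch `pack_twos_complement` is a hypothetical layout invented to show the bad design; `ieee_bits` uses Python's `struct` module, assumed to produce the standard 32-bit IEEE pattern.

```python
import struct

def pack_twos_complement(sign: int, exponent: int, fraction: int) -> int:
    """Hypothetical 32-bit layout with the exponent stored in 8-bit 2's complement."""
    return (sign << 31) | ((exponent & 0xFF) << 23) | fraction

half = pack_twos_complement(0, -1, 0)  # 1.0 x 2^-1
one = pack_twos_complement(0, 0, 0)    # 1.0 x 2^0
print(half > one)  # True: the bit patterns are in the wrong order

def ieee_bits(x: float) -> int:
    """Bit pattern actually used for x in the 32-bit IEEE format."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(ieee_bits(0.5) < ieee_bits(1.0))  # True: biased exponents keep the order
```

With a biased exponent, positive floating point numbers can be compared with the same hardware that compares unsigned integers.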

Bias by N (Excess N)

  Representation of negative numbers used in floating point numbers
  Numbers in ‘correct’ order

  excess-N-rep(X) = unsigned-rep(X + N)

  Excess 7
    excess-7-rep(-3) = unsigned-rep(-3 + 7) = 0100
    excess-7-rep(-7) = 0000
    excess-7-rep(4) = unsigned-rep(4 + 7) = 1011

Bias by N (Excess N)

  Excess 7

    Bits  Value    Bits  Value
    0000  -7       1000  1
    0001  -6       1001  2
    0010  -5       1010  3
    0011  -4       1011  4
    0100  -3       1100  5
    0101  -2       1101  6
    0110  -1       1110  7
    0111  0        1111  8

  E.g –2 is represented as unsigned(7-2) = unsigned(5) = 0101
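The excess-N rule translates directly into code. The following sketch follows the slide's definition, excess-N-rep(X) = unsigned-rep(X + N); the function names and the explicit `width` parameter are assumptions made for illustration.

```python
def excess_n_rep(x: int, n: int, width: int) -> str:
    """Return the excess-n bit string for x, using `width` bits."""
    stored = x + n
    if not 0 <= stored < 2 ** width:
        raise ValueError(f"{x} is out of range for excess-{n} in {width} bits")
    return format(stored, f"0{width}b")

def excess_n_value(bits: str, n: int) -> int:
    """Inverse: read the bits as unsigned, then subtract the bias."""
    return int(bits, 2) - n

print(excess_n_rep(-3, 7, 4))     # 0100
print(excess_n_rep(4, 7, 4))      # 1011
print(excess_n_value("0101", 7))  # -2
```

Because the bias is added before storing, a larger value always has a larger unsigned bit pattern, which is the ‘correct’ order the slide asks for.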

IEEE Standard

IEEE 754-1985

  What is IEEE?
  Standard important for
    exchange of data
    portability of code
  Representation for FP numbers in
    32-bit (single precision)
    64-bit (double precision)
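The two formats can be compared from Python, whose own `float` is assumed here to be a 64-bit double (true of standard CPython builds). This is a small sketch; the variable names are illustrative.

```python
import struct

print(struct.calcsize(">f") * 8)  # 32
print(struct.calcsize(">d") * 8)  # 64

x = 0.1
as_single = struct.unpack(">f", struct.pack(">f", x))[0]
as_double = struct.unpack(">d", struct.pack(">d", x))[0]
print(as_double == x)  # True: nothing is lost in 64 bits
print(as_single == x)  # False: the 32-bit format keeps fewer mantissa bits
```

Storing a value in single precision and reading it back can change it slightly, which matters when data is exchanged between programs using different formats.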

IEEE 32-bit FP

  Sign bit S – 1 bit
  Exponent E – 8 bits
  Mantissa M – 23 bits

  [S | E | M] = [sign | exponent | fraction (mantissa)]

  (-1)^S x (1.M) x 2^(E-127)

IEEE 32-bit FP

  Sign bit S – 1 bit
  Mantissa M – 23 bits
  Exponent E – 8 bits
    Bias is 127
    Exponents –126 (00000001) to +127 (11111110)
    Exponents 00000000 and 11111111 special
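The formula (-1)^S x (1.M) x 2^(E-127) can be applied by hand to a bit pattern. The sketch below covers normalised numbers only and rejects the two special exponent fields; `decode_normalised` is a hypothetical name, and the cross-check relies on Python's `struct` module.

```python
import struct

BIAS = 127

def decode_normalised(bits: int) -> float:
    """Apply (-1)^S x (1.M) x 2^(E-127) to a 32-bit pattern."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e in (0, 255):
        raise ValueError("exponent fields 00000000 and 11111111 are special")
    fraction = m / 2 ** 23  # the 23 stored bits, read as 0.M
    return (-1) ** s * (1 + fraction) * 2.0 ** (e - BIAS)

# S = 0, E = 129 (so 2^2), M = 101000...0 (so 1.101 in binary)
bits = (0 << 31) | (129 << 23) | (0b101 << 20)
print(decode_normalised(bits))  # 6.5

# Cross-check against the machine's own reading of the same pattern.
print(decode_normalised(bits) == struct.unpack(">f", struct.pack(">I", bits))[0])  # True
```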

Example 1 – Convert to FP

  Represent 0.3125_10 = 5/16
  5/16 = 1/4 + 1/16 = 0.0101_2 = 1.01 x 2^-2
  S = 0
  E = -2 + bias = -2 + 127 = 125_10 = 01111101
  M = 010....000

  0 01111101 010000 ... 000

Example 2 – Convert from FP

  What number is represented by:

  0 01111101 010000 ... 000

  S = 0
  E = 0111 1101 = 125_10
  Real exponent = E-bias = 125-127 = -2
  M = 1/4
  (-1)^S x (1+M) x 2^(E-bias) = (1 + 1/4) x (1/4) = 5/16
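The worked example can be checked against the bit pattern a real machine produces for 0.3125. This is a sketch using Python's `struct` module, assumed to pack `">f"` as a 32-bit IEEE single; `bit_string` is a helper invented for display.

```python
import struct

def bit_string(x: float) -> str:
    """Format the 32-bit pattern of x as 'S EEEEEEEE MMM...M'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = format(bits, "032b")
    return f"{text[0]} {text[1:9]} {text[9:]}"

print(bit_string(0.3125))

# Sign 0, exponent 01111101, mantissa 01 followed by 21 zeros.
print(bit_string(0.3125) == "0 01111101 " + "01" + "0" * 21)  # True
```

The same helper can be used to check hand conversions of other values.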

Quiz

  What are

    0 10000001 111000 ... 000

    1 01111001 011000 ... 000

  Convert to 32 bit FP using IEEE
    4.125
    -7.625

IEEE FP Extras

  Zero
    Both E and M = zero
    Can be positive or negative
  +/- Infinity (exponent all 1's, M = zero)
  De-normalised numbers
    E=0, M not zero
    close to zero, exponent is -126

Behaviour of Floating Point Numbers

Overflow and Underflow

  Overflow
    Results too large (positive or negative) to be represented
  Underflow
    Result too close to zero (positive or negative) to be represented
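Both effects can be provoked by forcing a value into 32 bits. The sketch below relies on Python's `struct` module: packing with `">f"` raises `OverflowError` for a value that is too large and silently rounds a value that is too small to zero. The helper name `to_single` is an assumption.

```python
import struct

def to_single(x: float) -> float:
    """Round x to the nearest 32-bit single and return it."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Overflow: 1e39 is larger than any 32-bit floating point number.
try:
    to_single(1e39)
except OverflowError as err:
    print("overflow:", err)

# Underflow: 1e-50 is closer to zero than any non-zero 32-bit number.
print(to_single(1e-50))  # 0.0
```

Hardware arithmetic typically reports overflow as an infinity rather than an exception; the exception here is a choice made by the `struct` module.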

Range – 32 bit FP

  [Figure: number line running from largest negative, through smallest negative, zero and smallest positive (>0), to largest positive]

  Quiz: find the largest and smallest FP in IEEE 32-bit

Range – 32 bit FP

  Largest/smallest +/- (2 – 2^-23) x 2^127 ≈ 10^38
  Near zero (normalised numbers)
    +/- 1.0 x 2^-126
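The two limits can be compared with the extreme normalised bit patterns. This sketch assumes Python's `struct` module and uses the patterns with exponent field 11111110 (largest) and 00000001 (smallest normalised); `from_bits` is a hypothetical helper.

```python
import struct

def from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

largest = (2 - 2.0 ** -23) * 2.0 ** 127
smallest_normalised = 2.0 ** -126

print(largest)                                       # roughly 3.4e38
print(from_bits(0x7F7FFFFF) == largest)              # True: E = 11111110, M all 1's
print(from_bits(0x00800000) == smallest_normalised)  # True: E = 00000001, M = 0
```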

How do they behave?

  If x, y are positive is:
    x+y>x ?
  If x and y are different can:
    x–y=0?
  Do these rules hold:
    (x + y) + z = x + (y + z) ?
    (x * y) * z = x * (y * z) ?
    x * (y + z) = x*y + x*z ?

  Different evaluation orders have different rounding errors

Summary

  FP scientific notation
  Normalised representation in binary
  Bias to represent -ve to +ve range in exponent
  Notice how a 32-bit binary number can represent many different entities in memory
  Underflow as well as overflow
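Two of the questions under "How do they behave?" can be tried out directly. The sketch below uses Python's built-in `float`, which is assumed to be a 64-bit double rather than the 32-bit format of the slides; the same effects occur in 32 bits at smaller magnitudes.

```python
# Is x + y > x for positive x and y? Not when y is too small to register.
x, y = 2.0 ** 60, 1.0
print(x + y > x)  # False: the sum rounds back to x

# Does (x + y) + z equal x + (y + z)? Not always.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False
print((a + b) + c, a + (b + c))
```

Each addition rounds its result, so grouping the same three numbers differently rounds at different points and can give different answers.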
