Floating Point

This document explains how floating point numbers are represented and how to convert between decimal and binary floating point formats. Floating point numbers use a sign bit, exponent bits, and fraction bits. The exponent indicates the power of two to scale the fraction, which represents a binary number less than one. Normalized values have an implied leading one, while denormalized values do not. Special cases like infinity and NaN are also covered.

Uploaded by

Sir Bob

How To Floating Point

Basic structure

Floating-point numbers are represented in 3 parts: a sign bit, exponent bits, and the fractional bits
(denoted here as the mantissa, but they're slightly different as explained later). A 32-bit float looks like
this:

sign (1 bit) | exponent (8 bits) | fraction (23 bits)

However, floats need not follow this exact format. The following instructions work for exponent/fractional
parts of any size. This would include 64-bit floats (1 sign bit, 11 exponent bits, and 52 fractional bits), or
even imaginary formats (like a made-up format that has 1 sign bit, 4 exponent bits, and 5 fractional bits).
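To make the "any size" point concrete, here is a minimal Python sketch that cuts a bit string into its three fields for whatever format you pick. The function name `split_fields` and its parameters are made up for this illustration.

```python
def split_fields(bits: str, exp_bits: int, frac_bits: int) -> tuple[str, str, str]:
    """Split a bit string into (sign, exponent, fraction) fields."""
    bits = bits.replace(" ", "")  # allow spaces between fields for readability
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("bit string length does not match the format")
    sign = bits[0]
    exponent = bits[1:1 + exp_bits]
    fraction = bits[1 + exp_bits:]
    return sign, exponent, fraction


# A 32-bit float: 1 sign bit, 8 exponent bits, 23 fractional bits.
print(split_fields("0" + "10000001" + "01011" + "0" * 18, 8, 23))

# A made-up 10-bit format: 1 sign bit, 4 exponent bits, 5 fractional bits.
print(split_fields("1 0111 10100", 4, 5))   # ('1', '0111', '10100')
```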

Why like this??

One way you can think of this structure is as scientific notation, but for binary: the
fractional bits hold the significant digits of the number, and the exponent bits say how many places to
shift those digits, which scales the number by a power of two.
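As a rough illustration of the "scientific notation for binary" idea, the sketch below uses Python's standard `math.frexp` to pull a number apart into a significand and a power of two. The helper name `binary_scientific` is hypothetical, and it assumes a nonzero, finite input.

```python
import math


def binary_scientific(x: float) -> tuple[float, int]:
    """Return (M, E) with x == M * 2**E and 1 <= |M| < 2."""
    m, e = math.frexp(x)   # frexp gives x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # rescale so the digit in front of the point is 1


print(binary_scientific(6.0))    # (1.5, 2)   because 6 = 1.5 * 2**2
print(binary_scientific(1.75))   # (1.75, 0)  because 1.75 = 1.75 * 2**0
```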

How to turn one of these into a decimal number

There are 3 scenarios, determined by the contents of the exponent bits.

If exponent section has mix of zeroes and ones (normalized)

Example: 0 01111111 11000000000000000000000

1. Find what the exponent bits equal if interpreted as an unsigned int. We will call this e. In the above
example, e = 01111111 = 127.

2. Find the bias. bias = $2^{(\text{number of exponent bits}) - 1} - 1$. In the above example, bias = $2^{8-1} - 1 = 127$.

3. Find the actual (unbiased) exponent. We will call this E. E = e − bias. In the example, E = 127 − 127 = 0.

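4. Find the value of the fractional bits. We will call this f. Each bit has a place value, starting at $2^{-1}$ for the leftmost bit and halving each step down to $2^{-23}$:

bits:     1         1         0         0         ...  0
weights:  $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$  ...  $2^{-23}$

$f = 2^{-1} + 2^{-2} = 0.75$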

5. Find the mantissa/significand M. For normalized values, this is M = f + 1 (the added 1 is known as the implicit
1). In our example, M = 0.75 + 1 = 1.75.

6. Find the sign of the number, as denoted by the sign bit. If it is 0, then the number is positive. If it
is 1, then the number is negative. This is captured by the factor $(-1)^s$, where s is the sign bit.

7. Finally, we put all of this together with the equation $(-1)^s \times 2^E \times M$. In our example:
$(-1)^s \times 2^E \times M$
$= (-1)^0 \times 2^0 \times 1.75$
$= 1 \times 1 \times 1.75$
$= 1.75$

Tada! The binary floating-point number 00111111111000000000000000000000 equals 1.75.
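If you want to double-check the steps above, here is a small Python sketch that walks through them for the 1.75 example and then compares the result with how the standard `struct` module interprets the same 32 bits. The variable names are just for this illustration.

```python
import struct

# The example: sign, 8 exponent bits, 23 fractional bits.
sign_bit, exp_field, frac_field = "0", "01111111", "11" + "0" * 21

e = int(exp_field, 2)                    # step 1: 127
bias = 2 ** (len(exp_field) - 1) - 1     # step 2: 127
E = e - bias                             # step 3: 0
# step 4: bit i (counting from 0) has place value 2**-(i + 1)
f = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(frac_field))
M = 1 + f                                # step 5: implicit 1
s = int(sign_bit)                        # step 6
value = (-1) ** s * 2.0 ** E * M         # step 7

# Cross-check: pack the same bits into 4 bytes and read them as a float.
raw = int(sign_bit + exp_field + frac_field, 2).to_bytes(4, "big")
assert struct.unpack(">f", raw)[0] == value
print(value)   # 1.75
```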

If exponent section is all 0 (denormalized)

Example: 0 00000000 11000000000000000000000

1. The bias is found the same way as before, $2^{(\text{number of exponent bits}) - 1} - 1$. In this example, bias = $2^{8-1} - 1 = 127$.

2. The actual exponent E is now fixed at 1 − bias. In this example, E = 1 − 127 = −126.

3. Find the value of the fractional bits, same as with normalized. The fractional bits in this example
are the same as the normalized example, so we’ll skip calculation. f = 0.75.

4. However, the mantissa for denormalized numbers does not have an implicit 1. Therefore M = f =
0.75.

5. Finally, we put all of this together with the same equation as before: $(-1)^s \times 2^E \times M$. In this example:
$(-1)^s \times 2^E \times M$
$= (-1)^0 \times 2^{-126} \times 0.75$
$= 1 \times 2^{-126} \times 0.75$
$\approx 8.8 \times 10^{-39}$ (the exact decimal expansion is too long to put here).

Denormalized numbers are used to represent really really small numbers, as you can see here.
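Since the number is too long to write by hand, a quick Python sketch can compute it. This follows the denormalized steps above and cross-checks against the standard `struct` module; the variable names are only for illustration.

```python
import struct

exp_field, frac_field = "00000000", "11" + "0" * 21

bias = 2 ** (len(exp_field) - 1) - 1            # 127
E = 1 - bias                                    # -126 for every denormalized 32-bit float
M = int(frac_field, 2) / 2 ** len(frac_field)   # 0.75, no implicit 1 this time
value = 2.0 ** E * M

# Cross-check against the 32-bit pattern 0 00000000 1100...0.
raw = int("0" + exp_field + frac_field, 2).to_bytes(4, "big")
assert struct.unpack(">f", raw)[0] == value
print(value)   # roughly 8.8e-39
```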

If exponent section is all 1 (special)

If the fractional bits are all zeroes, then the number is infinity. Sign bit determines if positive or
negative infinity in the same way it does with numbers.
Example: 0 11111111 00000000000000000000000 = Infinity
Example: 1 11111111 00000000000000000000000 = -Infinity

If there is anything at all in the fractional part, then it’s NaN (not a number).
Example: 0 11111111 00000000000000000000001 = NaN
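The three scenarios can be told apart just by looking at the exponent field. Here is a minimal sketch of that check; the function name `classify` and its return strings are invented for this example.

```python
def classify(bits: str, exp_bits: int) -> str:
    """Name the scenario a bit pattern falls into, based on its exponent field."""
    bits = bits.replace(" ", "")
    exp_field = bits[1:1 + exp_bits]
    frac_field = bits[1 + exp_bits:]
    if "1" not in exp_field:
        # All zeroes. (If the fractional bits are also all zero, the value is 0.)
        return "denormalized"
    if "0" not in exp_field:
        # All ones: infinity or NaN depending on the fractional bits.
        return "infinity" if "1" not in frac_field else "NaN"
    return "normalized"


print(classify("0 01111111 " + "11" + "0" * 21, 8))   # normalized
print(classify("0 00000000 " + "11" + "0" * 21, 8))   # denormalized
print(classify("1 11111111 " + "0" * 23, 8))          # infinity
print(classify("0 11111111 " + "0" * 22 + "1", 8))    # NaN
```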

How to turn a decimal number into one of these

Normalized Example: Represent 5.375 in binary with the IEEE floating-point standard for a 32-bit float.

1. If the number is negative, remove the negative before proceeding and remember to set the sign bit
to 1. Our number isn’t negative, so the sign bit s will be 0.
2. Separate the number into two parts: the whole number part and the decimal part. In our example,
the whole number part is 5 and the decimal part is 0.375.
3. Represent the whole number part in binary. Do this however you normally do. In this example,
that’s 101.
4. Represent the decimal part in binary. One way to do this is to multiply the decimal part by 2
continuously. Whenever the result is 1 or greater, mark down a 1 and subtract 1 before continuing
to multiply. Otherwise, mark down a 0. End when the result is exactly 1 (nothing is left over after
subtracting). This is more easily illustrated with an example:
0.375 * 2 = 0.75 (mark 0)
0.75 * 2 = 1.5 (mark 1, continue with 0.5)
0.5 * 2 = 1 (mark 1, done)
Thus 0.375 in binary is 0.011, so the bits after the point are 011.
5. Put the two parts back together again, this time in binary point form: 101.011.
6. Move the binary point to just behind the leftmost 1. Record how many places you moved it -
that's the exponent of 2 you multiplied by.
In our example, after moving the point to the correct position we would have 1.01011. We
moved the point left by 2, so we multiply by $2^2$. Therefore we have that $101.011 = 1.01011 \times 2^2$.
This is basically scientific form, but for binary. Works in the other direction, too: $0.0000101 = 1.01 \times 2^{-5}$
(nothing to do with the example, just a demonstration).
7. The power of 2 you multiplied by is the exponent, E, from back when we were turning binary into
decimal. The number in front is the mantissa, M. We can now understand the original number as
$(-1)^0 \times 2^2 \times 1.01011 = (-1)^s \times 2^E \times M$. So E = 2 and M = 1.01011.
8. Now what remains is to do what we used to do to achieve E and M , but in reverse in order to get e
and f .
a. We know that E = e − bias.
We know that bias = $2^{(\#\text{ exp bits}) - 1} - 1 = 2^{8-1} - 1 = 127$.
Therefore 2 = e − 127
Therefore → e = 2 + 127
e = 129
In binary, that’s 10000001.
b. Now that we have found e, we can check to see if the number is normalized. If e > 0, then it’s
normalized. If not, then it’s denormalized. In our case it’s normalized.
We know that for normalized values, M = 1 + f .
Therefore, 1.01011 = 1 + f
Therefore, f = 1.01011 − 1
f = 0.01011
f is already in binary form, so no conversion is needed. The fractional part will begin with
01011.

9. Now that we have the sign bit, the biased exponent bits e, and the fractional bits, we can
put it all together:
5.375 = 0 10000001 01011000000000000000000

Tada! 5.375 represented in IEEE-754 floating-point standard is 01000000101011000000000000000000.
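The walkthrough above can be sketched in Python. This is only a sketch for the 5.375 example: it assumes the whole part is at least 1 (so the number is normalized) and that the bits fit without rounding. The function name `fraction_to_binary` is made up here; the final check uses the standard `struct` module.

```python
import struct


def fraction_to_binary(frac: float, max_bits: int = 32) -> str:
    """Repeated doubling (step 4): returns the bits after the binary point."""
    bits = ""
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:        # 1 or greater: mark a 1 and subtract 1
            bits += "1"
            frac -= 1
        else:                # otherwise mark a 0
            bits += "0"
    return bits


whole, frac = 5, 0.375                      # step 2
whole_bits = bin(whole)[2:]                 # step 3: "101"
digits = whole_bits + fraction_to_binary(frac)   # steps 4-5: "101" + "011"
E = len(whole_bits) - 1                     # step 6: point moves left by 2
e = E + 127                                 # step 8a: 129
f = digits[1:]                              # step 8b: drop the implicit 1 -> "01011"
pattern = "0" + format(e, "08b") + f.ljust(23, "0")   # step 9

# Cross-check against the machine's own 32-bit encoding of 5.375.
assert int(pattern, 2) == int.from_bytes(struct.pack(">f", 5.375), "big")
print(pattern)
```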

Non-Standard Example: Represent -1.625 in binary with an imaginary floating-point format based off the
IEEE Standard that has 1 sign bit, 4 exponent bits, and 3 fractional bits.

As we said at the start, all steps apply no matter the number of exponent/fractional bits. Remember
that the bias changes, though, based on the number of exponent bits!

1. Find sign bit: Our number is negative, so the sign bit will be 1.

2. Separate: Whole part is 1, and the decimal part is 0.625.

3. Represent whole part in binary: 1 in binary is just 1.

4. Decimal part in binary:

0.625 * 2 = 1.25
0.25 * 2 = 0.5
0.5 * 2 =1
Thus 0.625 in binary is 101.

5. Put the parts back together: 1.101.

6. Move decimal point and convert to scientific form: It's already in the correct form, so $1.101 \times 2^0$.

7. Find E and M: $(-1)^1 \times 2^0 \times 1.101 = (-1)^s \times 2^E \times M$. Thus E = 0 and M = 1.101.

8. Reverse E and M to get e and f:

a. E = e − bias
bias = $2^{(\#\text{ exp bits}) - 1} - 1 = 2^{4-1} - 1 = 2^3 - 1 = 7$
Thus 0 = e − 7
Thus e = 7
In binary, that’s 0111
b. e is greater than 0, so the number’s normalized.
For normalized floats: M = 1 + f
Thus 1.101 = 1 + f
Thus f = 0.101
The fractional part will begin with 101

9. Putting it all together:

-1.625 = 1 0111 101

Tada! -1.625 represented with our imaginary floating-point format is 10111101.
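Since the steps don't care about the field sizes, they can be written once with the sizes as parameters. Below is a minimal sketch for normalized values only; `encode_normalized` is a hypothetical name, exact arithmetic comes from the standard `fractions` module, and the sketch refuses inputs that would need rounding instead of rounding them.

```python
from fractions import Fraction


def encode_normalized(value: float, exp_bits: int, frac_bits: int) -> str:
    """Encode value as sign + exponent + fraction bits (normalized values only)."""
    x = Fraction(value)
    s = "1" if x < 0 else "0"
    x = abs(x)
    if x == 0:
        raise ValueError("zero is not a normalized value")
    # Move the point until the number looks like 1.xxx, counting the moves in E.
    E = 0
    while x >= 2:
        x /= 2
        E += 1
    while x < 1:
        x *= 2
        E -= 1
    bias = 2 ** (exp_bits - 1) - 1
    e = E + bias
    if not 0 < e < 2 ** exp_bits - 1:
        raise ValueError("not representable as a normalized value in this format")
    f = (x - 1) * 2 ** frac_bits       # M = 1 + f, scaled up to an integer
    if f.denominator != 1:
        raise ValueError("needs rounding, which this sketch does not do")
    return s + format(e, f"0{exp_bits}b") + format(int(f), f"0{frac_bits}b")


print(encode_normalized(-1.625, 4, 3))   # 10111101
```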

Denormalized Example (Non-Standard): Represent 0.0029296875 with an imaginary floating-point format
based off the IEEE standard that has 1 sign bit, 4 exponent bits, and 4 fractional bits.

Now we look at representing a denormalized number. We use a non-standard format because
32-bit denormalized numbers are REALLY small and hard to work with without clogging absolutely
everything up. We won't easily know the number's denormalized until we find e, so everything we do up to
that point will be the same as normal.

1. Find sign bit: Our number is positive, so the sign bit will be 0.

2. Separate: Whole part is 0, decimal part is 0.0029296875.

3. Whole part in binary: 0 in binary is 0.

4. Decimal part in binary:

0.0029296875 * 2 = 0.005859375
0.005859375 * 2 = 0.01171875
0.01171875 * 2 = 0.0234375
0.0234375 * 2 = 0.046875
0.046875 * 2 = 0.09375
0.09375 * 2 = 0.1875
0.1875 * 2 = 0.375
0.375 * 2 = 0.75
0.75 * 2 = 1.5
0.5 * 2 =1
Thus 0.0029296875 in binary is 0000000011.

5. Put the parts back together: 0.0000000011.

6. Move decimal point and convert to scientific: $0.0000000011 = 1.1 \times 2^{-9}$.

7. Find E and M: $2^{-9} \times 1.1 = 2^E \times M$ (ignoring sign bit for brevity). Thus E = −9 and M = 1.1.

8. Reverse E and M to get e and f:

a. E = e − bias
bias = $2^{4-1} - 1 = 7$
Thus −9 = e − 7
Thus e = −2

But now we've reached a problem: The exponent bits are unsigned and thus can't be negative.
What do we do? This is why denormalized exists.

When e ≤ 0 (as it is now), the number is denormalized. Under this condition, E now becomes 1 − bias.
Therefore, E = 1 − 7 = −6
The exponent bits will all be set to 0 (signifies denormalized).
b. Since E now equals −6 instead of the required −9, we must redo the fractional part to accommodate.
Think of it this way - before we had $0.0000000011 = 1.1 \times 2^{-9}$. Now we must get it in the
format $x \times 2^{-6}$.
Since $0.0000000011 = 1.1 \times 2^{-9}$:
We can shift the point in the mantissa left by 3 (think of it like "applying" −3 of the
powers of 2): $0.0000000011 = 0.0011 \times 2^{-6}$
Therefore M = 0.0011.
For a denormalized float: M = f (no implicit 1)
Thus f = 0.0011. The fractional bits will begin with 0011.

9. Finally, let’s put it all together:
0.0029296875 = 0 0000 0011
Tada! 0.0029296875 represented with our imaginary floating-point format is 000000011.
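The denormalized branch can be sketched the same way. Here E is pinned at 1 − bias and there is no implicit 1, so the fractional bits are just the value divided by $2^E$. The name `encode_denormalized` is hypothetical, and like any sketch of this kind it skips rounding and simply rejects values that don't fit exactly.

```python
from fractions import Fraction


def encode_denormalized(value: float, exp_bits: int, frac_bits: int) -> str:
    """Encode a value too small to be normalized (exponent bits all 0)."""
    x = Fraction(value)
    s = "1" if x < 0 else "0"
    x = abs(x)
    bias = 2 ** (exp_bits - 1) - 1
    E = 1 - bias                      # fixed exponent for denormalized values
    M = x / Fraction(2) ** E          # solve x = M * 2**E for M
    if M >= 1:
        raise ValueError("value is large enough to be normalized")
    f = M * 2 ** frac_bits            # M = f, scaled up to an integer
    if f.denominator != 1:
        raise ValueError("needs rounding, which this sketch does not do")
    return s + "0" * exp_bits + format(int(f), f"0{frac_bits}b")


print(encode_denormalized(0.0029296875, 4, 4))   # 000000011
```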

All Of The Above Put Into A Garbage Compactor

Converting Binary to Float:

Normalized (exponent bits not all 0 or 1)

1. e = exponent bits interpreted as unsigned int

2. bias = $2^{(\text{number of exponent bits}) - 1} - 1$
3. E = e − bias
4. f = fractional bits interpreted: $b_1 \times 2^{-1} + b_2 \times 2^{-2} + \dots$
5. M = f + 1
6. s = sign bit
7. Number = $(-1)^s \times 2^E \times M$

Denormalized (exponent bits all 0)

1. bias = $2^{(\text{number of exponent bits}) - 1} - 1$
2. E = 1 − bias
3. M = fractional bits interpreted: $b_1 \times 2^{-1} + b_2 \times 2^{-2} + \dots$
4. s = sign bit
5. Number = $(-1)^s \times 2^E \times M$

Special (exponent bits all 1)

Fractional bits all 0: Number = $(-1)^s \times \infty$

Fractional bits not all 0: Number = NaN
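Here is the whole binary-to-number summary squeezed into one Python function, as a sketch. The name `decode` and its parameters are made up for this illustration; it returns an ordinary Python float, so it is only meant for formats no bigger than a 64-bit float.

```python
import math


def decode(bits: str, exp_bits: int, frac_bits: int) -> float:
    """Turn sign + exponent + fraction bits into a number, covering all 3 scenarios."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("bit string length does not match the format")
    s = int(bits[0])
    e = int(bits[1:1 + exp_bits], 2)
    frac = int(bits[1 + exp_bits:], 2)
    f = frac / 2 ** frac_bits          # b1 * 2**-1 + b2 * 2**-2 + ...
    bias = 2 ** (exp_bits - 1) - 1
    sign = (-1) ** s
    if e == 2 ** exp_bits - 1:         # special: exponent bits all 1
        return sign * math.inf if frac == 0 else math.nan
    if e == 0:                         # denormalized: exponent bits all 0
        E, M = 1 - bias, f
    else:                              # normalized
        E, M = e - bias, 1 + f
    return sign * 2.0 ** E * M


print(decode("0 01111111 " + "11" + "0" * 21, 8, 23))   # 1.75
print(decode("1 0111 101", 4, 3))                       # -1.625
print(decode("0 0000 0011", 4, 4))                      # 0.0029296875
```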

Converting Float to Binary:

Normalized

Denormalized
