Floating-Point Numbers

On this page
About Floating-Point Numbers
Scientific Notation
The IEEE Format
Range and Precision
Exceptional Arithmetic

About Floating-Point Numbers
You can represent any binary floating-point number in scientific notation form as f × 2^e, where f is the fraction (or mantissa), 2 is the radix or base (binary in this case), and e is the exponent of the radix. The radix is always a positive number, while f and e can be positive or negative.
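As a concrete illustration of the f × 2^e form, the following Python sketch (Python is used here only for illustration; the surrounding text is language-neutral) uses the standard-library function math.frexp, which decomposes a float into exactly this fraction/exponent pair:

```python
import math

# math.frexp returns (f, e) such that x == f * 2**e,
# with 0.5 <= abs(f) < 1 for nonzero x (a normalized fraction).
f, e = math.frexp(6.5)
print(f, e)            # 0.8125 3  (6.5 == 0.8125 * 2**3)
print(f * 2**e)        # 6.5

# The fraction carries the sign; the radix (2) is always positive.
f_neg, e_neg = math.frexp(-6.5)
print(f_neg, e_neg)    # -0.8125 3
```

Note that frexp normalizes the fraction into [0.5, 1); the IEEE formats described below normalize differently (1.f), but the f × 2^e decomposition is the same idea.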
When performing arithmetic operations, floating-point hardware must take into account that the sign, exponent, and
fraction are all encoded within the same binary word. This results in complex logic circuits when compared with the circuits
for binary fixed-point operations.
The Simulink Fixed Point software supports single-precision and double-precision floating-point numbers as defined by the
IEEE Standard 754. Additionally, a nonstandard IEEE-style number is supported.
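The word sizes of the two standard formats can be checked directly. The following Python sketch (illustrative only; the Simulink software itself is not involved) packs a value into IEEE 754 single and double precision with the standard-library struct module:

```python
import struct

x = 1.0
single = struct.pack('>f', x)   # big-endian IEEE 754 single precision
double = struct.pack('>d', x)   # big-endian IEEE 754 double precision
print(len(single) * 8)          # 32 bits
print(len(double) * 8)          # 64 bits

# 1.0 in single precision: sign 0, biased exponent 127, fraction 0
print(format(int.from_bytes(single, 'big'), '032b'))
# 00111111100000000000000000000000
```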
Scientific Notation
A direct analogy exists between scientific notation and radix point notation. For example, scientific notation using five decimal digits for the fraction would take the form

±d.dddd × 10^p,

where d = 0, ..., 9 and p is an integer. If the exponent were greater than 0 or less than −3, then the representation would involve many zeros.
These extra zeros never change to ones, however, so they don't show up in the hardware. Furthermore, unlike
floating-point exponents, a fixed-point exponent never shows up in the hardware, so fixed-point exponents are not limited
by a finite number of bits.
Note Restricting the binary point to being contiguous with the fraction is unnecessary; the Simulink Fixed Point
software allows you to extend the binary point to any arbitrary location.
Back to Top
The IEEE Format

Single-Precision Format
The IEEE single-precision floating-point format is a 32-bit word divided into a 1-bit sign indicator s, an 8-bit biased
exponent e, and a 23-bit fraction f. For more information, see The Sign Bit, The Exponent Field, and The Fraction Field. A
representation of this format is given below.
The relationship between this format and the representation of real numbers is given by

v = (−1)^s (2^(e − 127)) (1.f)
Double-Precision Format
The IEEE double-precision floating-point format is a 64-bit word divided into a 1-bit sign indicator s, an 11-bit biased exponent e, and a 52-bit fraction f. For more information, see The Sign Bit, The Exponent Field, and The Fraction Field. A representation of this format is shown in the following figure.

The relationship between this format and the representation of real numbers is given by

v = (−1)^s (2^(e − 1023)) (1.f)
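The field split and the reconstruction formula can be verified with a short Python sketch (illustrative only) that pulls the sign, biased exponent, and fraction out of a 64-bit double:

```python
import struct

def decode_double(x):
    """Split a 64-bit IEEE 754 double into (s, e, f) and rebuild its value."""
    bits = int.from_bytes(struct.pack('>d', x), 'big')
    s = bits >> 63                    # 1-bit sign
    e = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    f = bits & ((1 << 52) - 1)        # 52-bit fraction
    # Normal numbers: v = (-1)^s * 2^(e - 1023) * (1.f)
    v = (-1) ** s * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)
    return s, e, f, v

s, e, f, v = decode_double(-6.5)
print(s, e - 1023)   # 1 2  (-6.5 == -1.625 * 2**2)
print(v)             # -6.5
```

The same decomposition works for single precision with an 8-bit exponent, bias 127, and a 23-bit fraction.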
Range

The range of representable numbers for an IEEE floating-point number with f bits allocated for the fraction, e bits allocated for the exponent, and the bias of e given by bias = 2^(e−1) − 1, is described below.

Normalized positive numbers are defined within the range 2^(1−bias) to (2 − 2^(−f)) × 2^bias, and normalized negative numbers within the range −(2 − 2^(−f)) × 2^bias to −2^(1−bias). Positive numbers greater than (2 − 2^(−f)) × 2^bias and negative numbers less than −(2 − 2^(−f)) × 2^bias are overflows; positive numbers less than 2^(1−bias) and negative numbers greater than −2^(1−bias) are underflows.

Overflows and underflows result from exceptional arithmetic conditions. Floating-point numbers outside the defined range are always mapped to ±Inf.
Note: You can use the MATLAB commands realmin and realmax to determine the dynamic range of floating-point numbers for your computer.
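The range formulas can be checked against a real machine. In Python (used here as an illustration; sys.float_info plays the role of MATLAB's realmin and realmax), plugging f = 52 and bias = 1023 into the formulas reproduces the double-precision limits exactly:

```python
import sys

bias = 1023          # double-precision exponent bias, 2**(11-1) - 1
frac_bits = 52       # double-precision fraction bits

# Smallest normalized positive number: 2^(1 - bias)
realmin = 2.0 ** (1 - bias)
# Largest finite number: (2 - 2^(-f)) * 2^bias
realmax = (2 - 2.0 ** -frac_bits) * 2.0 ** bias

print(realmin == sys.float_info.min)   # True
print(realmax == sys.float_info.max)   # True
```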
Precision
Because of a finite word size, a floating-point number is only an approximation of the "true" value. Therefore, it is important
to have an understanding of the precision (or accuracy) of a floating-point result. In general, a value v with an accuracy q
is specified by v ± q. For IEEE floating-point numbers,

v = (−1)^s (2^(e − bias)) (1.f)

and

q = 2^(−f) × 2^(e − bias)

Thus, the precision is associated with the number of bits in the fraction field.
Note: In the MATLAB software, floating-point relative accuracy is given by the command eps, which returns the distance from 1.0 to the next larger floating-point number. For a computer that supports the IEEE Standard 754, eps = 2^(−52), or about 2.22045 × 10^(−16).
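Python exposes the same quantity as sys.float_info.epsilon, and (on Python 3.9+) math.nextafter lets you confirm the "distance to the next larger float" definition directly; a minimal check:

```python
import math
import sys

# eps: distance from 1.0 to the next larger double-precision number
eps = sys.float_info.epsilon
print(eps == 2.0 ** -52)                      # True
print(math.nextafter(1.0, 2.0) - 1.0 == eps)  # True
```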
The following table summarizes these limits for the supported data types:

Data Type      Low Limit                   High Limit                  Exponent Bias    Precision
Single         2^(−126) ≈ 10^(−38)         2^(128) ≈ 3 × 10^(38)       127              2^(−23) ≈ 10^(−7)
Double         2^(−1022) ≈ 2 × 10^(−308)   2^(1024) ≈ 2 × 10^(308)     1023             2^(−52) ≈ 10^(−16)
Nonstandard    2^(1 − bias)                (2 − 2^(−f)) × 2^(bias)     2^(e − 1) − 1    2^(−f)
Because of the sign/magnitude representation of floating-point numbers, there are two representations of zero, one positive and one negative. For both representations, e = 0 and f = 0.
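The two zeros compare equal but carry different sign bits, which a short Python check (illustrative only) makes visible via math.copysign:

```python
import math

pos_zero = 0.0
neg_zero = -0.0

# The two zeros compare equal...
print(pos_zero == neg_zero)            # True
# ...but carry different sign bits, visible via copysign
print(math.copysign(1.0, pos_zero))    # 1.0
print(math.copysign(1.0, neg_zero))    # -1.0
```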
Exceptional Arithmetic
In addition to specifying a floating-point format, the IEEE Standard 754 specifies practices and procedures so that
predictable results are produced independently of the hardware platform. Specifically, denormalized numbers, Inf, and
NaN are defined to deal with exceptional arithmetic (underflow and overflow).
If an underflow or overflow is handled as Inf or NaN, then significant processor overhead is required to deal with this
exception. Although the IEEE Standard 754 specifies practices and procedures to deal with exceptional arithmetic
conditions in a consistent manner, microprocessor manufacturers might handle these conditions in ways that depart from
the standard. Some of the alternative approaches, such as saturation and wrapping, are discussed in Arithmetic
Operations.
Denormalized Numbers
Denormalized numbers are used to handle cases of exponent underflow. When the exponent of the result is too small
(i.e., a negative exponent with too large a magnitude), the result is denormalized by right-shifting the fraction and leaving
the exponent at its minimum value. The use of denormalized numbers is also referred to as gradual underflow. Without
denormalized numbers, the gap between the smallest representable nonzero number and zero is much wider than the gap
between the smallest representable nonzero number and the next larger number. Gradual underflow fills that gap and
reduces the impact of exponent underflow to a level comparable with roundoff among the normalized numbers. Thus,
denormalized numbers provide extended range for small numbers at the expense of precision.
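Gradual underflow is easy to observe in double precision. In this Python sketch (illustrative; behavior assumes an IEEE 754 platform with round-to-nearest), values below the smallest normalized number become subnormal rather than jumping to zero:

```python
import sys

realmin = sys.float_info.min          # smallest normalized double, 2^(-1022)

# Halving realmin does not jump straight to zero: the result is a
# denormalized (subnormal) number, i.e. gradual underflow.
sub = realmin / 2
print(sub > 0)                        # True
print(sub == 2.0 ** -1023)            # True

# The smallest subnormal is realmin * eps = 2^(-1074); halving it
# finally underflows to zero (round-to-nearest-even).
tiny = sys.float_info.min * sys.float_info.epsilon
print(tiny > 0)                       # True
print(tiny / 2 == 0.0)                # True
```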
Inf
Arithmetic involving Inf (infinity) is treated as the limiting case of real arithmetic, with infinite values defined as those outside the range of representable numbers, or −∞ < (representable numbers) < +∞. With the exception of the special cases discussed below (NaN), any arithmetic operation involving Inf yields Inf. Inf is represented by the largest biased exponent allowed by the format and a fraction of zero.
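Both properties of Inf, its behavior under arithmetic and its bit pattern, can be demonstrated in Python (illustrative only):

```python
import math
import struct

inf = math.inf
print(inf + 1.0)          # inf: arithmetic with Inf yields Inf
print(1e308 * 10)         # inf: overflow maps to Inf

# Bit pattern of a double Inf: largest biased exponent (all ones),
# fraction zero.
bits = int.from_bytes(struct.pack('>d', inf), 'big')
exponent = (bits >> 52) & 0x7FF
fraction = bits & ((1 << 52) - 1)
print(exponent == 0x7FF, fraction == 0)   # True True
```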
NaN
A NaN (not-a-number) is a symbolic entity encoded in floating-point format. There are two types of NaN: signaling and quiet. A signaling NaN signals an invalid operation exception. A quiet NaN propagates through almost every arithmetic operation without signaling an exception. The following operations result in a NaN: ∞ − ∞, −∞ + ∞, 0 × ∞, 0/0, and ∞/∞.

Both types of NaN are represented by the largest biased exponent allowed by the format and a fraction that is nonzero. The bit pattern for a quiet NaN is given by 0.f where the most significant bit of f must be one, while the bit pattern for a signaling NaN is given by 0.f where the most significant bit of f must be zero and at least one of the remaining bits must be nonzero.
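A Python sketch (illustrative only; note that Python raises an exception for 0/0 rather than producing a NaN, so ∞ − ∞ is used here) shows NaN generation, propagation, and the bit pattern:

```python
import math
import struct

nan = math.inf - math.inf     # Inf - Inf is one of the invalid operations
print(math.isnan(nan))        # True
print(nan == nan)             # False: NaN compares unequal even to itself

# Bit pattern of a double NaN: largest biased exponent (all ones),
# nonzero fraction.
bits = int.from_bytes(struct.pack('>d', nan), 'big')
exponent = (bits >> 52) & 0x7FF
fraction = bits & ((1 << 52) - 1)
print(exponent == 0x7FF, fraction != 0)   # True True
```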