
An IEEE 754 Double-Precision Floating-Point Multiplier for Denormalized and Normalized Floating-Point Numbers
Ross Thompson
Air Force Research Laboratory
525 Brooks Road
Rome, NY 13441 USA
{stephen.thompson.37}@us.af.mil

James E. Stine
Oklahoma State University
VLSI Computer Architecture Research Group
Department of Electrical and Computer Engineering
Stillwater, OK 74078 USA
{james.stine}@okstate.edu

U.S. Government work not protected by U.S. copyright 62 ASAP 2015

Abstract: This paper discusses an optimized double-precision floating-point multiplier that can handle both denormalized and normalized IEEE 754 floating-point numbers. The optimizations are discussed and compared with similar implementations; the main objective, however, is to remain compliant for denormalized IEEE 754 floating-point numbers while still maintaining high-performance operation for normalized numbers.

I. PAPER

Although multiplication is straightforward to implement, there is still a need to preserve precision for denormal numbers. One particular method to handle denormalized IEEE 754 numbers uses injection-based rounding within its architecture [1]. While the technique in [1] is helpful for denormalized numbers, it does not completely address denormal operands and outputs with correct rounding within the IEEE 754 standard. The major contribution of this paper is that it utilizes injection-based rounding for multiplication and applies it correctly to IEEE 754 normalized and denormalized numbers.

A. Multiplier Design I

The first design produces a fully compliant IEEE 754 double-precision binary floating-point multiplier that is primarily designed around an injection-based rounding algorithm [1]. To further improve the design, additional hardware is added to fully support denormal operands and results. After partial product reduction, the carry-save pair is added with the injection constant using a row of carry-save adders and a carry-propagate adder. Concurrently, rounding is computed using the Even and Seidel (ES) fixed-position injection-based algorithm [1].

The Even/Seidel rounding algorithm (ES), as shown in Figure 1, works by reducing the four rounding modes (RM), round to zero (RZ), round to nearest even (RNE), round up (RPI), and round down (RNI), to a single truncation operation [1]. The process has two steps. First, RPI and RNI are reduced into one of two modes, either RZ or a new mode, round to infinity (RI), using the product's sign bit and the direction of the rounding. When the sign matches the rounding direction (RPI and positive, or RNI and negative), the rounding mode is RI; otherwise it is RZ. Second, an injection constant is added to the intermediate product, such that all modes reduce to truncation at the rounding bit position. The injection constant (INJ) is defined as:

\[
INJ =
\begin{cases}
0 & : RZ \\
2^{-53} & : RNE \\
2^{-52} - 2^{-104} & : RI
\end{cases}
\]

This only works when the product is in the range [1, 2), but the mantissa can overflow into the range [2, 4), requiring an injection correction (INJCOR) to be added into the product. The ES algorithm provides little advice about computing the sticky, round, and carry[0] bits. Finding round and carry[0] requires computing the carry-propagate chain through all bits in the lower path, −1 to −52; however, only the sum bit needs to be generated at bit −1. Using a technique from [2], the sticky can be computed directly from the carry-save format.

To account for carry[0] carrying into the upper data path and the INJCOR, the technique in the ES paper uses a special increment decision, Tinc (INC in the ES paper). The equation definition in the ES paper is correct; however, we believe there is a small mistake in the implementation logic within [1]. Tinc should increment when the majority of round (R), carry[0] (C), and Lx are high, the mantissa has overflowed (Y0[52] = 1), and the rounding mode is RNE. In the case of Lx = 1, R = 1, and C = 0, the logic incorrectly sets Tinc = 0. The following equation correctly implements Tinc.

\[
T_{inc} =
\begin{cases}
(C \cdot R \cdot RNE) + (L \cdot R \cdot RNE) + ((L + C) \cdot RI) + (C \cdot L \cdot (RZ + RNE)) & : Y0[53] \\
L \cdot C & : \overline{Y0[53]}
\end{cases}
\]

Finally, to support a second stage of rounding and the inexact flag, Tinx is computed so that it indicates if the current rounding is not the infinitely precise answer.

\[
T_{inx} =
\begin{cases}
R + S & : RZ \cdot \overline{Y0[53]} \\
R + S + (L_x \oplus C[52]) & : RZ \cdot Y0[53] \\
R + S & : (RI + RNE) \cdot \overline{Y0[53]} \\
R + S + (L_x \oplus C[52]) & : (RI + RNE) \cdot Y0[53]
\end{cases}
\]

When the exponent is less than emin (0), the product is no longer a valid IEEE 754 number. To correct this problem, the mantissa is shifted right such that the exponent is increased to emin. However, this presents a problem for the rounding function, as the rounding position has now moved, resulting in an incorrectly rounded product. By applying multistep gradual rounding, a second stage of rounding ensures the proper position is rounded [3]. Additionally, the ES method requires that its inputs be normalized. Denormal operands are detected using a leading zero counter and then shifted to normalize. This adds delay to the critical path, which will be reduced in the next section.

Fig. 1: Details of the first step of the rounding hardware. This follows the design of Even and Seidel, but generates signal Tinx for the second stage of rounding. Also, the 106-bit sum is segmented into two carry/save parts, [53:0] and [−1:−52].

B. Multiplier Design II: Fast Denormal Rounding

Multistep gradual rounding is expensive in terms of delay and area, as the sticky bit must be computed twice and an extra +1 adder is needed. To work around the issue, the injection-based rounding has to be modified to support adding the injection at variable positions. A better solution uses a fixed injection constant, but shifts the intermediate product such that the round bit aligns to the injection constant before adding. This is achieved by moving the barrel shifter from the final packing circuit and inserting it between the partial product reduction and the final carry-propagate addition. Shifting the carry/save intermediate product effectively does two things: first, it denormalizes the mantissa in the event of an exponent below emin, and second, it aligns the mantissa to the correct rounding position.

In the previous multiplier, the packing (denormalizing) shifter needed to shift up to 54 bits to completely underflow a result. Because the shift now occurs prior to overflow detection (mantissa in [2, 4)), the shifter will need a maximum shift of 55.

An additional optimization can be made now that the shifting occurs before adding the carry-save intermediate product. In the original ES method, the mantissas need to be normalized to ensure the rounding position is known. The new method eliminates that requirement. The unpacking pre-normalization step can be removed if the shifter is modified to shift in both directions. Doing so dramatically reduces the unpacking delay by eliminating a shifter and removing the leading zero detection from the critical path.

Fig. 2: Block diagram of the mantissa path when shifting the carry-save redundant form before adding the injection constant. Note: the barrel shifters in the packing and unpacking modules have been removed.

C. Conclusion

Two multipliers have been implemented in RTL-based Verilog and verified for compliance with the IEEE 754 standard. Both versions, the multistep and fast architectures, have been fully verified against SoftFloat [4]. Both designs are based upon the fixed-position injection constant rounding method presented by Even and Seidel, but extend their work to provide support for rounding denormal numbers [1].

REFERENCES

[1] G. Even and P.-M. Seidel, “A comparison of three rounding algorithms for IEEE floating-point multiplication,” IEEE Transactions on Computers, vol. 49, no. 7, pp. 638–650, Jul 2000.
[2] R. Yu and G. Zyner, “167 MHz radix-4 floating point multiplier,” in Proceedings of the 12th Symposium on Computer Arithmetic, 1995, pp. 149–154.
[3] C. Lee, “Multistep gradual rounding,” IEEE Transactions on Computers, vol. 38, no. 4, pp. 595–600, Apr 1989.
[4] J. Hauser, “The SoftFloat and TestFloat Validation Suite for Binary Floating-Point Arithmetic,” University of California, Berkeley, Tech. Rep., 1999, available at http://www.jhauser.us/arithmetic/TestFloat.html.
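
The rounding flow described in Sections A and B can be illustrated with a small arithmetic model. This is a minimal sketch under stated assumptions, not the authors' RTL. It treats the exact product as a Python integer scaled by 2^104, widens it by 55 low-order bits (matching the maximum shift of 55 quoted in Section B) so that the pre-injection right shift loses nothing, adds a fixed injection constant, and truncates. The names `reduce_mode`, `injection`, `round_by_injection`, `round_reference`, `count_mismatches`, and the constant `GUARD` are hypothetical. Overflow into [2, 4) is decided from the shifted product itself rather than from the hardware signal Y0[53], and the Tinc/Tinx logic and the compound adder are not modeled. Because of the widening, the RI injection here is one unit in the last place minus the smallest widened bit, rather than 2^-52 − 2^-104.

```python
import random

FRAC = 52            # fraction bits kept in a binary64 significand
GUARD = 55           # assumed extra low-order bits, so shifts of up to 55 lose nothing
POS = FRAC + GUARD   # fixed index of the rounding position in the widened product


def reduce_mode(mode, negative):
    """Reduce RZ/RNE/RPI/RNI to RZ/RNE/RI using the sign of the product."""
    if mode == "RPI":
        return "RZ" if negative else "RI"
    if mode == "RNI":
        return "RI" if negative else "RZ"
    return mode


def injection(mode, pos):
    """Injection constant that turns rounding at bit index pos into truncation."""
    if mode == "RZ":
        return 0
    if mode == "RNE":
        return 1 << (pos - 1)    # half a unit in the last place
    return (1 << pos) - 1        # RI: one unit in the last place minus the smallest bit


def round_by_injection(p, mode, shift=0):
    """Round p (exact product scaled by 2**104) after an optional right shift."""
    assert 0 <= shift <= GUARD
    wide = (p << GUARD) >> shift          # align the round bit to the fixed position
    pos = POS
    total = wide + injection(mode, pos)   # fixed injection, added after the shift
    if wide >> (pos + FRAC + 1):          # significand in [2, 4)
        # INJCOR: difference between the injections for the two positions
        total += injection(mode, pos + 1) - injection(mode, pos)
        pos += 1
    q = total >> pos                      # every mode is now a plain truncation
    if mode == "RNE" and (total & ((1 << pos) - 1)) == 0:
        q &= ~1                           # exact tie: clear the LSB to round to even
    return q


def round_reference(p, mode, shift=0):
    """Quotient/remainder rounding of the same value, used only as a cross-check."""
    overflow = 1 if p >> (2 * FRAC + 1 + shift) else 0
    drop = FRAC + shift + overflow        # number of discarded low-order bits
    q, rem = divmod(p, 1 << drop)
    half = 1 << (drop - 1)
    if mode == "RI":
        q += int(rem != 0)
    elif mode == "RNE":
        q += int(rem > half or (rem == half and (q & 1) == 1))
    return q


def count_mismatches(trials, seed=0):
    """Compare both roundings on random normalized significand products."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a = rng.randrange(1 << FRAC, 1 << (FRAC + 1))
        b = rng.randrange(1 << FRAC, 1 << (FRAC + 1))
        shift = rng.choice([0, 0, rng.randrange(1, GUARD + 1)])
        for mode in ("RZ", "RNE", "RI"):
            got = round_by_injection(a * b, mode, shift)
            bad += int(got != round_reference(a * b, mode, shift))
    return bad


if __name__ == "__main__":
    print("mismatches:", count_mismatches(10000))
```

A nonzero `shift` models a result whose exponent falls below emin. Because the shift is applied before the injection, the same constant and the same truncation position serve both normalized and denormalized results, which is the point of Design II.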
