

690 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. SC-19, NO. 5, OCTOBER 1984

A High Performance Floating Point Coprocessor

GIL WOLRICH, EDWARD McLELLAN, LARRY HARADA, JAMES MONTANARO, AND ROBERT A. J. YODLOWSKI

Abstract: A 34000 transistor single-chip floating point coprocessor fabricated in 3 µm double metal NMOS technology is described. The fraction data path, including a shifter and 60 bit carry propagate ALU, is cycled in 100 ns for all operations requiring less than 19 bits of consecutive carry. A versatile carry length detection scheme, which requires minimal additional logic, is used to extend the microcycle for the small percentage of operations in which a long carry exists. Three bit per cycle multiplication and one and one half bit per cycle division algorithms were used to achieve excellent overall performance.

TABLE I
FLOATING POINT DATA TYPES

F = floating     1 bit sign, 8 bit exponent, 24 bit fraction
D = double       1 bit sign, 8 bit exponent, 56 bit fraction
G (VAX only)     1 bit sign, 11 bit exponent, 53 bit fraction
32 bit integers, 16 bit integers

INTRODUCTION

THIS paper describes a single-chip floating point accelerator (FPA) for the J-11, 16/32 bit microprocessor chip set. Fabricated in a 3 µm drawn double-metal NMOS process, the FPA implements a floating point instruction set of 46 instructions. The FPA supports four data types: single and double precision floating point and 16 and 32 bit integer (Table I).

The FPA measures 7.6 × 6.3 mm, contains 34000 transistors, and dissipates 2 W. It is packaged in a 40 pin DIP. A photomicrograph is shown in Fig. 1. The chip contains six externally addressable floating point data registers, a floating exception code register, and a floating point status and mode control register. The principal functional blocks in the FPA are the EU (execution unit), consisting of fraction, exponent, and sign datapaths, and the BIU (bus interface unit).

Manuscript received March 22, 1984; revised May 16, 1984.
The authors are with the Digital Equipment Corporation, Hudson, MA 01749.
0018-9200/84/1000-0690$01.00 ©1984 IEEE

Fig. 1. Photomicrograph of FPA.

MICROARCHITECTURE

The fraction processor is a 60 bit wide data path (Fig. 2) consisting of 7 single-ported RAM registers, 8 ROM constants, an ALU, an 8-position argument shifter (left 2 to right 5), 2 operand registers, a Q register (multiplier/quotient/output), a 4-position Q shifter (left 2, left 1, no shift, or right 3), and an input register. The fraction shifters provide sufficient range for 3 bit/cycle multiplication, 1.5 bit/cycle variable shift division, 5 bit/cycle alignment, and 3 bit/cycle normalization algorithms.

The EU sequencer contains a two-bank folded PLA with 13 inputs, 160 total AND terms, and 36 outputs. The exponent processor is a 10 bit data path used to process the exponent and sign of floating point operands in parallel with the fraction. The BIU controls all I/O for the FPA and contains a second-state sequencer allowing operation independent of the EU.

Fig. 2. Block diagram of fraction data path.

CARRY LENGTH DETECTION

Hardware floating point processing is characterized by very wide data paths. A great deal of hardware in the form of carry lookahead schemes or carry save adders are often used in computing systems to reduce the penalty associated with long carry propagation delays. The FPA, by comparison, achieves a fast (100 ns) microcycle time, including a 60 bit ALU operation using a new technique well suited to VLSI applications, as well as designs using standard parts.

A simple carry length detection scheme is used to produce a stutter signal that stretches the final phase of the EU clock if a long carry propagation path exists. The method takes advantage of the fact that most ALU operations have a largest maximum carry length which is much less than the width of the ALU. By detecting a long carry and providing additional time for the ALU to complete for a small percentage of operations, a data path can be run at a fast rate for most (>95 percent) ALU cycles.

Fig. 3 shows the stutter circuit for the 60 bit wide ALU used in the J-11 floating point accelerator chip. In Fig. 3 the propagates produced in bit positions 54 to 5 of the ALU's PG (propagate generate) logic are gated with a minimum of logic to produce a detection signal stutter for all operations which have a carry length of 19 or greater. Two levels of AND gating are used because the first level of gating is already present for the minimal 5 bit group carry lookahead logic. The ANDing of two group propagate signals indicates whether or not all of the propagates in that group of 10 bits are asserted.

Fig. 3. FPA 60 bit ALU "stutter circuit."

Single precision processing uses only the upper half of the fraction data path. In order to avoid unnecessary stutter cycles caused by data in the lower half of the data path, an additional enable signal is included in the detection gates covering bits 34 to 5. The stutter signal may set for as few as 10 consecutive propagates, but might not set for as many as 18. A propagate is a necessary but not sufficient condition to imply an actual carry. For this reason, an allow stutter signal is used to gate the stutter signal to the clock circuitry. The allow stutter signal is not set for ALU operations in which all bit positions will produce a generate. Unnecessary stutter cycles are therefore prevented for ALU operations in which carry propagation is not a factor (i.e., A + A generates a carry at all bit positions).
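The detection gates described above can be modeled in a few lines to check the quoted limits. The following is a minimal sketch, not the FPA logic itself: it assumes five 10 bit gates aligned on bits 5-14 through 45-54 (our reading of the description of Fig. 3), treats the propagate vector as a plain integer, and uses hypothetical names (`stutter`, `run`, `run_lengths`).

```python
WIDTH = 60        # fraction ALU width in bits
GATE_WIDTH = 10   # two 5 bit group propagates ANDed together

# Assumed gate placement: bits 5-14, 15-24, 25-34, 35-44, 45-54.
GATES = [range(lo, lo + GATE_WIDTH) for lo in range(5, 55, GATE_WIDTH)]


def stutter(propagates, double_precision=True, allow_stutter=True):
    """Return True if the clock phase would be stretched for this propagate vector."""
    # In single precision the gates covering bits 34 to 5 are disabled.
    gates = GATES if double_precision else GATES[3:]
    detected = any(all((propagates >> bit) & 1 for bit in gate) for gate in gates)
    return allow_stutter and detected


def run(start, length):
    """Propagate vector with `length` consecutive ones starting at bit `start`."""
    return ((1 << length) - 1) << start


def run_lengths(detected):
    """Run lengths for which at least one placement gives the requested outcome."""
    return {
        length
        for length in range(1, WIDTH + 1)
        for start in range(WIDTH - length + 1)
        if stutter(run(start, length)) == detected
    }


min_stutter = min(run_lengths(True))        # shortest run that can set stutter
max_not_stutter = max(run_lengths(False))   # longest run that can escape detection
print(min_stutter, max_not_stutter)         # prints: 10 18
```

Under these assumptions every run of 19 or more consecutive propagates covers at least one full gate, which matches the statement that all operations with a carry length of 19 or greater produce a stutter.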
P4 -
The optimal width of the AND function used to produce the group propagate signals will depend on the technology in which the ALU is to be implemented. Fig. 4 shows an example of the stutter circuit where the width of the group propagate is 4 bits. If the individual propagate bits are not available, as in standard part ALU slices, then the ANDing of the group propagates can be used directly. The stutter circuit is very inexpensive to implement, due to the fact that the group propagates are already required for any ALU with lookahead carry.

Fig. 4. Stutter detect for 32 bit ALU with 4 bit lookahead groupings. (Figure labels: MIN STALL = 8, MAX NOT STALL = 14.)

Propagate terms become valid after a fixed combinational delay allowing enough time to control the length of a system clock phase. The current design uses only one stutter signal to force a 100 ns clock phase extension sufficient for the longest possible carry. A second stutter signal detecting wider carry lengths could also be used to more finely tune the clock phase extension (Fig. 5).

Fig. 5. Stutter circuit with second tier.

The difference between the maximum carry length which always stutters and the minimum carry length which might stutter can be reduced by adding additional sets of AND functions which produce overlapped groupings of consecutive propagates (Fig. 6).

Fig. 6. Stutter circuit with overlapped detection gates. (Figure labels: MIN STALL = 10, MAX NOT STALL = 13.)

For the circuit described in Fig. 3 the probability of requiring a stutter on random data is

$$P = \frac{w - m}{n} \cdot \frac{2^{\,w - n}}{2^{\,w}}$$

where
w = width of the ALU
m = total bits not included in any detection gate
n = width of the detection gates
(this equation is an upper bound since data which sets more than 1 detection gate are counted more than once)

double precision: w = 60, m = 10, n = 10, P = 5/1024
single precision: w = 30, m = 10, n = 10, P = 2/1024.

The 60 bit ALU/shifter cycle of the J-11 FPA can be completed in 100 ns for all operations in which the maximum consecutive carry is less than 19 bits.

ALGORITHMS

The FPA executes four data path assisted microinstructions which serve as the basis for executing the multiplication, division, alignment, and normalization algorithms.

The FPA uses a fixed 3 bit shift algorithm to perform multiplication. The algorithm requires the generation of multiples 0, 2, 4, 6, and 8 times the multiplicand. The multiples are added or subtracted to the partial product and the result is shifted to account for the three multiplier bits retired. The required shift and ALU function for each cycle is determined by examining the multiplier bits from the LSBs of the Q register (bits 38:35 for F, bits 5:2 for D). Since the main data path has only one shifter, the total shift required for each cycle combines the shift necessary to align the binary point for the present multiple, and the post shift required to complete the 3 bit retirement from the previous cycle. The previous group of multiplier bits are held in a delay register which is initially cleared allowing the algorithm to begin without any shifts being owed from a previous cycle.

Prior to the start of the multiplication, 3/4 times the multiplicand is calculated and placed in the scratch register for use in generating the multiple of 6. If the multiples 2, 4, or 8 are required, the normal multiplicand register is accessed and the partial product is appropriately shifted. When a multiple of 6 times the multiplicand is required, the contents of the scratch register are used instead of the multiplicand and added to the partial product shifted right 3 times. A special microinstruction is used to establish an initial partial product of either zero if the LSB of the multiplier is zero, or minus one times the multiplicand if the multiplier LSB is a one. Table II details the single shifter 3 bit multiplication algorithm implemented in the FPA.

The FPA uses a normalizing nonrestoring division algorithm which produces a quotient at a rate of 1.5 bits/cycle. If the partial remainder will be normalized for division (-1 < R < -1/2, 1/2 < R < 1) by a left shift of one, then one new quotient bit is determined; when the partial remainder requires more than a single left shift to become normalized, two quotient bits are determined. Table III describes the next shift, ALU operation, and quotient bits derived as a function of the MSBs of the partial remainder. If the 4 MSBs of the partial remainder equal all ones or all zeros then a left shift of 2 will not normalize the present partial remainder and the next cycle ALU operation is A ← A. This insures that the next partial remainder will remain in the range -1/2 < R < 1/2. If the present partial remainder can be normalized by a left shift of one or two, the next ALU operation adds or subtracts the divisor depending on the sign of the remainder in order to drive it toward zero. The quotient bits are inserted at the guard bit and LSB positions of the Q register (bits 35:34 for F, bits 3:2 for D) and the Q register shifts either 1 or 2 bits left as required. When Q57 = 1, a Q shift of left one is forced and only one more quotient bit is accepted. When Q58 = 1, the division is completed and the normalized quotient is in the Q register. The normalized quotient can be in the range 1/2 < Q < 1 or 1 < Q < 2 depending on the ratio of the initial dividend and divisor. If the initial subtraction of the divisor from the dividend is positive, then 1 < quotient < 2 and the final exponent is incremented.

The important feature of the FPA alignment and normalization algorithms is that although the main shifter has limited range, the shift probability data for floating point addition and subtraction (Table IV) show this range to be all that is required for most operations. The FPA performs 78 percent (up to 5 bits of exponent difference) of the

TABLE II
FPA 3 BIT/CYCLE MULTIPLICATION ALGORITHM

ALU/REG               Present      Alignment     Previous     Shift Owed
                      Multiplier   Shift for     Multiplier   from
                      Group        Present       Group        Previous
A ← A                 0000         3             0000         0
A ← A + B; B = AC     0001         1             0001         2
A ← A + B; B = AC     0010         1             0010         2
A ← A + B; B = AC     0011         2             0011         1
A ← A + B; B = AC     0100         2             0100         1
A ← A + B; B = SCR    0101         3             0101         0
A ← A + B; B = SCR    0110         3             0110         0
A ← A + B; B = AC     0111         3             0111         0
A ← A - B; B = AC     1000         3             1000         0
A ← A - B; B = SCR    1001         3             1001         0
A ← A - B; B = SCR    1010         3             1010         0
A ← A - B; B = AC     1011         2             1011         1
A ← A - B; B = AC     1100         2             1100         1
A ← A - B; B = AC     1101         1             1101         2
A ← A - B; B = AC     1110         1             1110         2
A ← A                 1111         3             1111         0

AC ← Multiplicand
SCR ← 3/4 Multiplicand
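The table above can be exercised with a short model. This is a sketch under stated assumptions rather than the FPA microcode: it assumes each 4 bit group consists of three new multiplier bits plus the top bit of the previous group, it uses exact `Fraction` arithmetic in place of the fixed-width fraction data path, and the names `TABLE_II` and `multiply` are hypothetical. One combined shift is applied per cycle (shift owed from the previous group plus the alignment shift of the present group), as the text describes for the single shifter.

```python
from fractions import Fraction

# Indexed by the 4 bit multiplier group:
# (ALU sign, B source, alignment shift for present group, shift owed to next cycle)
TABLE_II = {
    0b0000: (0, None, 3, 0),
    0b0001: (+1, "AC", 1, 2),
    0b0010: (+1, "AC", 1, 2),
    0b0011: (+1, "AC", 2, 1),
    0b0100: (+1, "AC", 2, 1),
    0b0101: (+1, "SCR", 3, 0),
    0b0110: (+1, "SCR", 3, 0),
    0b0111: (+1, "AC", 3, 0),
    0b1000: (-1, "AC", 3, 0),
    0b1001: (-1, "SCR", 3, 0),
    0b1010: (-1, "SCR", 3, 0),
    0b1011: (-1, "AC", 2, 1),
    0b1100: (-1, "AC", 2, 1),
    0b1101: (-1, "AC", 1, 2),
    0b1110: (-1, "AC", 1, 2),
    0b1111: (0, None, 3, 0),
}


def multiply(multiplier, multiplicand, bits=24):
    """Multiply a non-negative `bits`-wide multiplier by an integer multiplicand."""
    regs = {"AC": Fraction(multiplicand), "SCR": Fraction(3 * multiplicand, 4)}
    # Initial partial product: 0, or -1 times the multiplicand if the LSB is one.
    partial = -regs["AC"] if multiplier & 1 else Fraction(0)
    cycles = -(-bits // 3)   # ceil(bits / 3): three multiplier bits retired per cycle
    owed = 0                 # the delay register starts cleared
    for k in range(cycles):
        group = (multiplier >> (3 * k)) & 0xF
        sign, source, align, next_owed = TABLE_II[group]
        partial /= 2 ** (owed + align)   # one combined right shift per cycle
        if sign:
            partial += sign * regs[source]
        owed = next_owed
    partial /= 2 ** owed                 # settle the shift still owed at the end
    return partial * 8 ** cycles         # undo the scaling to get an integer product


print(multiply(5, 7, bits=3))   # prints: 35
```

In this reading the multiples 2, 4, and 8 all use the unmodified multiplicand with a 1, 2, or 3 bit alignment shift of the partial product, and only the multiple of 6 needs the precomputed 3/4 value, which is why a single scratch register suffices.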

TABLE III
FPA 1.5 BIT/CYCLE DIVISION ALGORITHM

                                            Quotient Bit(s) Formed If Partial
                                            Remainder Is Derived by
                      Next      Next        ADD/SUB        Shift Left 2
F59  F58  F57  F56    Shift     ALU         Q3   Q2        Q3   Q2
0    0    0    0      SHL2      A ← A       1    0         0    0
0    0    0    1      SHL2      SUB         1    0         0    0
0    0    1    0      SHL1      SUB         1    0
0    0    1    1      SHL1      SUB         1    0
1    1    0    0      SHL1      ADD         0    1
1    1    0    1      SHL1      ADD         0    1
1    1    1    0      SHL2      ADD         0    1         1    1
1    1    1    1      SHL2      A ← A       0    1         1    1

F59 = Fraction MSB
Q3 = Quotient Register LSB
Q2 = Quotient Register Guard Bit

Note: Quotient bit(s) are inserted at bit positions Q35 and Q34 for single precision operation instead of Q3 and Q2.
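The Next Shift and Next ALU columns of the table above are enough to sketch the variable shift recurrence. The following is an illustration under assumptions, not the FPA implementation: it uses exact `Fraction` values for a dividend and divisor normalized to [1/2, 1), accumulates the quotient as a sum of signed digits instead of modeling the Q3/Q2 bit insertion, and replaces the Q57/Q58 termination test with a simple count of retired bit positions. The name `divide` and its return values are hypothetical.

```python
from fractions import Fraction
from math import floor


def divide(dividend, divisor, bits=24):
    """Return (quotient, cycles, bit positions retired) for operands in [1/2, 1)."""
    remainder = dividend - divisor   # initial subtraction, lies in (-1/2, 1/2)
    quotient = Fraction(1)
    retired = 0
    cycles = 0
    while retired < bits:
        # Signed value of the four MSBs F59..F56 of the remainder, from -4 to 3.
        top = floor(remainder * 8)
        if top in (2, 3):        # 001x: SHL1, SUB
            shift, digit = 1, +1
        elif top == 1:           # 0001: SHL2, SUB
            shift, digit = 2, +1
        elif top in (0, -1):     # 0000 or 1111: SHL2, A <- A
            shift, digit = 2, 0
        elif top == -2:          # 1110: SHL2, ADD
            shift, digit = 2, -1
        else:                    # 110x: SHL1, ADD
            shift, digit = 1, -1
        remainder = remainder * 2 ** shift - digit * divisor
        retired += shift
        quotient += Fraction(digit, 2 ** retired)
        cycles += 1
    return quotient, cycles, retired


q, cycles, retired = divide(Fraction(3, 4), Fraction(5, 8))
print(float(q), retired / cycles)   # approximate quotient and bit positions per cycle
```

Each cycle keeps the remainder in the range -1/2 <= R < 1/2 and preserves the relation dividend = quotient × divisor + remainder × 2^(-retired), so the quotient error is at most 2^(-retired). Because every cycle retires one or two bit positions, the rate for any operand pair falls between 1 and 2 bits per cycle.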

aligns and 90 percent of the normalizations with a single shift cycle. The average number of shift cycles is 1.5 for alignment and 1.2 for normalization. Alignment proceeds at a 5 bit/cycle rate if the exponent difference is between 6 and the length of the data type. Only one cycle is needed for normalizations requiring shifts in the range of right one to left two. If more than 2 bits of left shifting is required, a no shift cycle occurs to examine additional fraction bits. Normalization then proceeds at 3 bits/cycle. The left shift function of the fraction ALU is combined with the 2 bit left shift capability of the fraction shifter to accomplish the 3 bit shift.

TABLE IV
WEIGHTED DATA ON ALIGNMENT AND NORMALIZATION*

             Alignments    FPA Align    Normalizations    FPA Normalize
Shift        (percent)     Cycles       (percent)         Cycles
Result 0                                 0.82              1
Overflow                                17.43              1
0            26.28         1            60.25              1
1            13.29         1             8.01              1
2             8.77         1             3.17              1
3             6.53         1             1.54              3
4             7.77         1             1.33              3
5             8.07         1             1.03              3
6             5.13         2             0.63              4
7             3.78         2             1.02              4
8             1.10         2             0.93              4
9             1.23         2             0.56              5
10            1.84         2             0.66              5
11            1.35         3             0.16              5
12            1.54         3             0.22              6
13            0.81         3             0.31              6
14            0.48         3             0.16              6
15            0.58         3             0.03              7
16            0.29         4             0.06              7
17            0.31         4             0.09              7
18            0.50         4             0.17              8
19            0.32         4             0.15              8
20            0.26         4             0.08              8
21            0.40         5             0.07              9
22            0.30         5             0.07              9
23            0.24         5             0.19              9
24            0.25         5             0.49              10
25            0.26         5             0.09              10
26            0.16         6             0.28              10
27-53         0.86
54-255        7.30

*Reference D. W. Sweeney [1].

PERFORMANCE

The combination of a 100 ns internal cycle and optimized arithmetic algorithms provides excellent performance. Table V shows typical execution times for register-to-register operations.

TABLE V
J-11 FPA TYPICAL REGISTER-TO-REGISTER EXECUTION TIMES

ADDF/SUBF    1.1 µs
MULF         1.6 µs
DIVF         2.7 µs
ADDD/SUBD    1.1 µs
MULD         2.8 µs
DIVD         4.7 µs

F = floating: 1 bit sign, 8 bit exponent, 24 bit fraction
D = double: 1 bit sign, 8 bit exponent, 56 bit fraction

The J-11 FPA interfaces as a true coprocessor. The BIU inputs all instruction stream data and decodes instructions in parallel with the base processor. Support microcode in the CPU initiates all I/O cycles required by the FPA. As a coprocessor, floating point instruction execution can occur simultaneously with integer code. This overlap can effectively be used to reduce the execution time of mixed code by interleaving floating point and nonfloating point instructions.

A second and more frequent type of instruction overlap which provides a substantial performance gain in floating point intensive code is also achieved by the FPA. The BIU supports the overlap of operand data loading for the next floating point instruction, while the EU completes the processing of the current instruction. The FPA asserts a stall signal to prevent the CPU from initiating more than one new floating point instruction while the EU is still busy. Only the portion of FPA execution time, if any, that the CPU is stalled actually affects system performance.

Two additional floating point processor chips have been developed by extending the J-11 FPA design to the VAX architecture. The G floating point format and the extended multiply and integerize instructions are supported by increasing the fraction and exponent data paths to 67 and 13 bits, respectively. The three designs achieve similar performance. The carry length detection method and the 3 bit multiplication are especially beneficial in applications requiring wide data paths, as evidenced by the performance of these three floating point processor chips.

REFERENCES

[1] D. W. Sweeney, "An analysis of floating-point addition," IBM Syst. J., vol. 4, pp. 31-42, 1965.
[2] O. L. MacSorley, "High-speed arithmetic in binary computers," Proc. IRE, vol. 49, pp. 67-91, 1961.
[3] E. Swartzlander, Ed., Computer Arithmetic. New York: Dowden, Hutchinson, and Ross, 1980.

Gil Wolrich received the B.S. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1971, and the M.S. in electrical engineering from Northeastern University, Boston, MA, in 1978.
He joined the Digital Equipment Corporation, Hudson, MA, in April 1979. He is currently a Principal Engineer with the Semiconductor Engineering Group in Hudson, MA.

Edward McLellan received the B.S. degree in computer and systems engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1980.
He is currently a Senior Engineer with the Semiconductor Engineering Group of the Digital Equipment Corporation, Hudson, MA.

Larry Harada received the B.S. degree in electrical engineering in 1980, and the M.E. degree in 1981, both from Cornell University, Ithaca, NY.
He joined Digital Equipment Corporation, Hudson, MA in July 1981. He is currently working for the Digital LSI Manufacturing Group in Hudson.

James Montanaro received the B.S. and M.S. degrees in electrical engineering from Massachusetts Institute of Technology, Cambridge, MA, in 1980.
Prior to joining Digital Equipment Corporation, Hudson, MA, in 1982, he was with Integrated Circuit Systems, Incorporated, Westboro, MA.

Robert A. J. Yodlowski received the B.S. degree in engineering physics from Cornell University, Ithaca, NY, in 1968, and the M.S. degree in electrical engineering from Syracuse University, Syracuse, NY, in 1970.
He has been employed with the Digital Equipment Corporation, Hudson, MA, since 1977. He is currently a Principal Engineer with the Semiconductor Engineering Group in Hudson, MA. His interests include MOS circuit and logic design.
