High Speed Multiplication Using BCD Codes For DSP Applications
Abstract: In digital systems, the multiplier is a dominant factor in overall speed, area, and power consumption.
The aim of this project is to improve parallel decimal multiplication. The proposed decimal multiplier internally uses a
redundant BCD (Binary Coded Decimal) code. The overloaded BCD, or ODDS (Overloaded Decimal Digit Set), representation
was proposed to improve decimal multi-operand addition and both sequential and parallel decimal multiplication. The proposed
system has three main stages. First, the Partial Product Generation (PPG) algorithm uses radix-10 recoding, which produces a
reduced number of partial products. Second, the Partial Product Reduction (PPR) algorithm reduces the partial products to
two 2d-digit words (A, B). Third, a non-redundant BCD conversion produces the final BCD product (P = A + B). The parallel decimal
multiplier simplifies the implementation and increases the operation speed. The proposed decimal multiplier reduces the overall
multiplier area for similar target delays with respect to the fastest implementation. The high-speed, area-efficient decimal
multiplier using carry-save adders (CSA) reduces the delay by 12.23% compared to the existing system.
All Rights Reserved 2015 IJARTET
ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com
The rest of the paper is organized as follows. Section II reviews the literature on decimal multiplication and arithmetic. Section III describes the existing radix-10 parallel decimal multiplier. Section IV reports the experimental results. Finally, Section V concludes the paper.

II. RELATED WORK

Vazquez, Antelo et al. [2] explain a basic implementation of a decimal processor on an FPGA. Decimal operations can be accelerated by a processor on an FPGA board connected to the computer by a standard bus, without an advanced input/output interface. The largest group of these connected-by-bus accelerators is probably constituted by Graphics Processing Units (GPUs). Field-Programmable Gate Arrays (FPGAs) are chips in which the hardware can be programmed by the user at the logic-gate level to implement any processor. By connecting FPGAs to the CPU via a standard bus, custom-made accelerators can be implemented at low cost. FPGA-based accelerators can exploit data parallelism but may be limited by the communication bandwidth.

The main drive for decimal units is the need, in financial transactions and accounting, for correctly rounded decimals that binary arithmetic cannot guarantee. To overcome this loss of precision, financial applications implement decimal arithmetic in software, where operations run 100 to 1000 times slower than the corresponding binary operations. Alternatively, Decimal Floating-Point (DFP) can be implemented directly in hardware and run orders of magnitude faster than the software computation.

III. RADIX-10 PARALLEL DECIMAL MULTIPLIER

A. SD Radix-10 Architecture

B. Partial Product Generation

The d+1 partial products are generated by encoding the multiplier into d SD radix-10 digits. Each SD radix-10 digit controls a level of 5:1 muxes, which selects a positive multiplicand multiple (0, X, 2X, 3X, 4X, 5X) coded in (4221). To obtain each partial product, a level of XOR gates inverts the output bits of the 5:1 muxes when the sign of the corresponding SD radix-10 digit is negative. Before being reduced, the d+1 partial products, coded in (4221), are aligned according to their decimal weights. Each p-digit column of the partial product array is reduced to two decimal digits using one of the decimal digit p:2 CSA trees. The number of digits to be reduced in each column varies from p = d+1 to p = 2. Thus, the d+1 partial products are reduced to two 2d-digit operands S and H coded in (4221).

[Figure: CSA tree of 3:2 full adders (FA) with inputs ppi[0], ..., ppi[k], ..., ppi[h].]

The final product is a 2d-digit BCD word given by P = 2H + S. Before being added, S and H need to be processed. S is recoded from (4221) to BCD excess-6. The H×2 multiplication is performed in parallel with the recoding of S. This ×2 block uses a (4221) to (5421) digit recoder and a 1-bit wired left shift to obtain the operand 2H coded in BCD, as shown in Figure 3. The final BCD carry-propagate addition uses a quaternary tree (Q-T) adder based
on conditional speculative decimal addition. It has low latency and requires less hardware than other alternatives.

Coding the multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) in XS-3 has several advantages.
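The SD radix-10 recoding used for partial product generation can be illustrated in software. The following is a minimal sketch, assuming the common rule in which a BCD digit of 5 or more sends a transfer of 1 to the next position; the text does not spell out the recoding rule, so the rule and the names `sd_radix10_recode`, `sd_value`, and `multiply_via_sd` are illustrative assumptions rather than the authors' circuit.

```python
def sd_radix10_recode(digits):
    """Recode decimal digits (least significant first, each 0..9) into
    signed digits in [-5, 5] plus a final transfer bit.

    Assumed rule: a digit >= 5 is replaced by (digit - 10) and passes a
    transfer of 1 to the next more significant position.
    """
    sd = []
    transfer = 0
    for y in digits:
        if not 0 <= y <= 9:
            raise ValueError("each digit must be in 0..9")
        t_out = 1 if y >= 5 else 0
        sd.append(y - 10 * t_out + transfer)
        transfer = t_out
    return sd, transfer


def sd_value(sd, top):
    """Value represented by the signed digits and the final transfer."""
    return sum(d * 10**k for k, d in enumerate(sd)) + top * 10**len(sd)


def multiply_via_sd(x, y_digits):
    """Sum the d+1 partial products selected by the recoded multiplier.

    Each signed digit selects a multiple in {0, X, ..., 5X} and a sign;
    the final transfer contributes the extra (d+1)-th partial product.
    """
    sd, top = sd_radix10_recode(y_digits)
    partial_products = [d * x * 10**k for k, d in enumerate(sd)]
    partial_products.append(top * x * 10**len(sd))
    return sum(partial_products)
```

For example, the multiplier 2975 (digits `[5, 7, 9, 2]`, least significant first) recodes to `[-5, -2, 0, 3]` with a final transfer of 0, since 3000 - 20 - 5 = 2975. Only the multiples up to 5X are ever needed, which is why a 5:1 mux per digit suffices.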
[Figure: block diagram showing the selection of multiples, the d+1 partial products, and the reduction tree.]
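The ×2 step that produces 2H (a (5421) digit recoding followed by a 1-bit left shift) can be checked with a short model. This is a sketch under stated assumptions: it starts from plain digit values in 0..9 rather than from (4221)-coded bits, and the names `to_5421` and `double_bcd` are hypothetical.

```python
def to_5421(v):
    """Encode a decimal digit 0..9 as bits (b3, b2, b1, b0) with
    weights 5, 4, 2, 1."""
    if not 0 <= v <= 9:
        raise ValueError("digit must be in 0..9")
    hi = 1 if v >= 5 else 0
    low = v - 5 * hi  # remaining value, 0..4
    return (hi, (low >> 2) & 1, (low >> 1) & 1, low & 1)


def double_bcd(digits):
    """Double a decimal number given as digits (least significant first).

    Shifting the (5421) bit vector left by one bit doubles every weight
    to 10, 8, 4, 2: the weight-10 bit lands in the weight-1 position of
    the next digit, so the result reads directly as BCD (8421).
    """
    bits = []  # flat bit vector, least significant bit first
    for v in digits:
        b3, b2, b1, b0 = to_5421(v)
        bits.extend([b0, b1, b2, b3])
    shifted = [0] + bits                  # 1-bit left shift
    shifted += [0] * (-len(shifted) % 4)  # pad to whole digits
    out = []
    for i in range(0, len(shifted), 4):
        b0, b1, b2, b3 = shifted[i:i + 4]
        out.append(8 * b3 + 4 * b2 + 2 * b1 + b0)
    return out
```

For instance, `double_bcd([7])` returns `[4, 1]`, the digits of 14. No carry propagates between digits because each output digit is at most 2·4 + 1 = 9, which is why the shift can be hard-wired.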
B. Sign Digit Radix-10 Generation

The partial product generation stage comprises the recoding of the multiplier to an SD radix-10 representation, the calculation of the multiplicand multiples in XS-3 code, and the generation of the ODDS partial products. The SD radix-10 encoding produces d SD radix-10 digits $Yb_k \in [-5, 5]$, with $k = 0, \ldots, d-1$, $Yb_{d-1}$ being the MSD (most significant digit) of the multiplier, as shown in Figure 3.

[Figure: the multiples 1Xi, 2Xi, 3Xi, 4Xi, 5Xi (digits in XS-3, range [-3, 12]) are selected by a 5:1 mux (MUX-5) controlled by Y1k, ..., Y5k and the sign Ys, producing ppi[k].]

Fig. 4. SD radix-10 generation of a partial product digit

Each digit $Yb_k$ is represented with a 5-bit hot-one code $(Y1_k, Y2_k, Y3_k, Y4_k, Y5_k)$, which selects the appropriate multiple {1X, ..., 5X} with a 5:1 mux, and a sign bit $Ys_k$, which controls the negation of the selected multiple, as shown in Figure 4. The negative multiples are obtained by ten's complementing the positive ones. This is equivalent to taking the nine's complement of the positive multiple and then adding 1. The nine's complement can be obtained simply by bit inversion, which requires the positive multiplicand multiples to be coded in XS-3, with digits in [-3, 12]. The d least significant partial products PP[d-1], ..., PP[0] are generated from the digits $Yb_k$ by using a set of 5:1 muxes. The XOR gates at the output of the mux invert the multiplicand multiple, to obtain its nine's complement, if the SD radix-10 digit is negative ($Ys_k = 1$). On the other hand, if the signals $(Y1_k, Y2_k, Y3_k, Y4_k, Y5_k)$ are all zero, then PP[k] = 0, but it has to be coded in XS-3 (bit encoding 0011). To set the two least significant bits to 1, the input to the XOR gate is $Ys_k^{*} = Ys_k \lor (Yb_k \text{ is zero})$ ($\lor$ denotes the Boolean OR operator), where ($Yb_k$ is zero) equals 1 if all the signals $(Y1_k, Y2_k, Y3_k, Y4_k, Y5_k)$ are zero. In addition, the partial product signs are encoded into their MSDs. The most significant partial product PP[d] depends only on $Ys_{d-1}$.

C. Partial Product Reduction

The PPR tree consists of three parts: a regular binary CSA tree, which computes an estimation of the decimal partial product sum in binary carry-save form (S, C); a sum correction block, which counts the carries generated between the digit columns; and a decimal digit 3:2 compressor, which increments the carry-save sum according to the carry count to obtain the final double-word product (A, B), A being represented with excess-6 BCD digits and B with BCD digits. The PPR tree can be viewed as adjacent columns of h ODDS digits each, h being the column height (see Figure 4.3), with $h \le d + 1$.

Finally, the digits $G_i$, $Z_i$, and $Wz_i$ of each column are added, with $G_i + Z_i + Wz_i \in [0, 45]$. We have designed a decimal 3:2 digit compressor that reduces the digits $Wz_i$, $G_i$, and $Z_i$ to two digits $A_i$, $B_i$. The final BCD product is obtained by a single BCD carry-propagate addition P = A + B, which is the last step in the multiplication. It is required that $A_i + B_i \in [0, 18]$. To reduce the delay of the final BCD carry-propagate adder, operand A is obtained in excess-6, so that we compute $[A_i] = A_i + e$ with excess $e = 6$. The output digit sum then satisfies $[A_i] + B_i \in [6, 24]$.

D. Decimal 64 Implementation

The maximum number of carries transferred between adjacent columns of the binary 17:2 CSA tree is 15. These carries are labeled $C_{i+1}[0], \ldots, C_{i+1}[14]$ (output carries) and $C_i[0], \ldots, C_i[14]$ (input carries). The binary 17:2 CSA tree is built of a first level composed of a 9:2 compressor and an 8:2 compressor, and a second level composed of a 4:2 compressor. To balance the delay of the 17:2 CSA tree and the bit counter, m = 14 has been chosen. The 14-bit counter produces the 4-bit digit $Wm_i$. The computation of $Wm_i \times 6$ deserves a more detailed description. The 4-bit digit $Wm_i = (Wm_{i,3}, Wm_{i,2}, Wm_{i,1}, Wm_{i,0})$, with $Wm_{i,j}$ being the bits of the digit, is conveniently represented as

\[ Wm_i = Wg[0]_{i+1} \times 2 + Wm_{i,0} \tag{1} \]

\[ Wg[0]_{i+1} = \sum_{j=1}^{3} Wm_{i,j} \times 2^{j-1} \tag{2} \]

$Wm_i$ has been split into two parts: the 3 most significant bits, which form $Wg[0]_{i+1}$, and the least significant bit, $Wm_{i,0}$. This results in

\[ W_i = Wg[0]_{i+1} \times 2 + Wm_{i,0} + C_{i+1}[14] \tag{3} \]
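The split of the 4-bit carry count in equations (1) to (3) can be checked numerically. The sketch below is only an arithmetic illustration of those equations, not a model of the hardware; the function names `split_wm` and `w_total` are hypothetical, and the last carry is passed in as a plain 0/1 argument.

```python
def split_wm(wm):
    """Split a 4-bit count into (wg, w0) as in equations (1) and (2):
    wg is formed by the three most significant bits, w0 is the LSB."""
    if not 0 <= wm <= 15:
        raise ValueError("wm must fit in 4 bits")
    bits = [(wm >> j) & 1 for j in range(4)]
    wg = sum(bits[j] << (j - 1) for j in range(1, 4))  # equation (2)
    return wg, bits[0]


def w_total(wm, c14):
    """Equation (3): add the remaining carry bit to the recombined count."""
    if c14 not in (0, 1):
        raise ValueError("c14 must be 0 or 1")
    wg, w0 = split_wm(wm)
    return wg * 2 + w0 + c14
```

With m = 14 the counter output lies in 0..14, so `split_wm` yields a 3-bit value in 0..7 and a single bit, and `w_total` stays within 0..15, consistent with at most 15 carries per column.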