
ISSN 2394-3777 (Print)

ISSN 2394-3785 (Online)


Available online at www.ijartet.com

International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)


Vol. 2, Issue 1, January 2015

High Speed Multiplication Using BCD Codes for DSP Applications

Balasundaram¹, Dr. R. Vijayabhasker²
¹PG Scholar, Dept. of Electronics & Communication Engineering, Anna University Regional Centre, Coimbatore, Tamilnadu, India
²Assistant Professor, Dept. of Electronics & Communication Engineering, Anna University Regional Centre, Coimbatore, Tamilnadu, India

Abstract: In digital systems, the multiplier is a major factor in overall speed, area and power consumption.
This work aims to improve parallel decimal multiplication. The proposed decimal multiplier internally uses a
redundant BCD (Binary Coded Decimal) code. The overloaded BCD, or ODDS (Overloaded Decimal Digit Set), representation
was proposed to improve decimal multi-operand addition and sequential and parallel decimal multiplication. The proposed
system has three main stages. First, the Partial Product Generation (PPG) algorithm uses a radix-10 recoding that produces a
reduced number of partial products. Second, the Partial Product Reduction (PPR) algorithm reduces the partial products to
two 2d-digit words (A, B). Third, a conversion to non-redundant BCD produces the final BCD product (P = A + B). The parallel
decimal multiplier simplifies the implementation and increases the operating speed. The proposed decimal multiplier reduces
the overall multiplier area for similar target delays with respect to the fastest implementation. The high-speed,
area-efficient decimal multiplier using a carry-save adder (CSA) reduces the delay by 12.23% compared to the existing system.

Keywords: Carry Save Adder, redundant excess-3 code, parallel multiplication.


I. INTRODUCTION

Decimal fixed-point and floating-point formats are important in financial, commercial, and user-oriented
computing. Since area and power dissipation are critical design factors in state-of-the-art decimal
floating-point units (DFPUs), multiplication and division are performed iteratively by means of digit-by-digit
algorithms and therefore show low performance. Moreover, the aggressive cycle time of these processors puts
an additional constraint on the use of parallel techniques for reducing the latency of decimal floating-point
(DFP) multiplication in high-performance DFPUs.

This work improves parallel decimal multiplication by exploiting the redundancy of two decimal
representations: the ODDS and the redundant BCD excess-3 (XS-3) representation, a self-complementing code with
the digit set [-3, 12]. A general redundant BCD arithmetic (which includes the ODDS, XS-3 and BCD
representations) is used to accelerate parallel BCD multiplication in two stages: Partial Product Generation
and Partial Product Reduction.

The goal is an area-efficient, high-speed design with reasonable power consumption. Decimal multiplication
is one of the most important decimal arithmetic operations, with growing demand in commercial, financial, and
scientific computing. The reasons behind the use of binary data for arithmetic operations in almost all
computer systems are the speed and simplicity of binary arithmetic and the efficiency of storing binary data.
For digital signal processing and commercial applications, however, decimal arithmetic is still relevant, and
the speed of operation is a major concern for decimal software. Moreover, commercial databases contain more
decimal data than binary data. For processing, these decimal data are converted into binary data, and once
processing is completed they are converted back into decimal format, which causes some delay. Binary Coded
Decimal (BCD) arithmetic is therefore the subject of ongoing research aimed at reduced area and delay.

The main objective is to improve the performance of BCD multiplication. Other objectives are given below.

To avoid long carry-propagations in the generation of decimal positive multiplicand multiples.
To obtain the negative multiples from the corresponding positive ones easily.
To simplify conversion of the partial products generated in XS-3 to the ODDS representation for efficient partial product reduction.

All Rights Reserved 2015 IJARTET
The rest of the paper is organized as follows. Section II reviews the literature on decimal multiplication
and arithmetic. Section III describes the existing radix-10 parallel decimal multiplier. Section IV presents
the implementation of the proposed multiplier. Section V reports the simulation results. Finally, Section VI
concludes the paper.

II. RELATED WORK

Vazquez, Antelo et al. [2] explain a basic implementation of a decimal processor on an FPGA. Decimal
operations can be accelerated by a processor on an FPGA board connected to the computer by a standard bus,
without an advanced input/output interface. The largest group of such bus-connected accelerators is probably
constituted by Graphics Processing Units (GPUs). Field-Programmable Gate Arrays (FPGAs) are chips in which the
hardware can be programmed by the user at logic-gate level to implement any processor. By connecting FPGAs to
the CPU via a standard bus, custom-made accelerators can be implemented at low cost. FPGA-based accelerators
can exploit data parallelism but might be limited by the communication bandwidth.

The main drive for decimal units is the need in financial transactions and accounting for correctly rounded
decimals, which binary arithmetic cannot guarantee. To overcome this loss of precision, financial applications
implement decimal arithmetic in software, which runs 100 to 1000 times slower than the corresponding binary
operations. Alternatively, Decimal Floating-Point (DFP) can be implemented directly in hardware and run orders
of magnitude faster than the software computation.

III. RADIX-10 PARALLEL DECIMAL MULTIPLIER

A. SD Radix-10 Architecture

The radix-10 architecture for d-digit BCD fixed-point parallel multiplication is based on techniques for
partial product generation and reduction. Its main feature is the use of the codes (4221) and (5211), instead
of BCD, to represent the partial products. This is expected to improve the reduction of decimal partial
products with respect to other proposals in terms of latency and area. The d-digit SD radix-10 multiplier
consists of the following stages: generation of decimal partial products coded in (4221), reduction of the
partial products, and a final BCD carry-propagate addition.

B. Partial Product Generation

The d+1 partial products are generated by encoding the multiplier into d SD radix-10 digits. Each SD radix-10
digit controls a level of 5:1 muxes, which selects a positive multiplicand multiple (0, X, 2X, 3X, 4X, 5X)
coded in (4221). To obtain each partial product, a level of XOR gates inverts the output bits of the 5:1 muxes
when the sign of the corresponding SD radix-10 digit is negative. Before being reduced, the d+1 partial
products, coded in (4221), are aligned according to their decimal weights. Each p-digit column of the partial
product array is reduced to two decimal digits using a decimal digit p:2 CSA tree. The number of digits to be
reduced in each column varies from p = d+1 to p = 2. Thus, the d+1 partial products are reduced to two
2d-digit operands S and H coded in (4221).

[Fig. 1. Binary P:2 CSA Tree (inputs ppi[0] ... ppi[k] ... ppi[h]; levels of 3:2 FA; outputs H and S coded in (4221))]

The final product is a 2d-digit BCD word given by P = 2H + S. Before being added, S and H need to be
processed. S is recoded from (4221) to BCD excess-6. The H × 2 multiplication is performed in parallel with the
recoding of S. This ×2 block uses a (4221) to (5421) digit recoder and a 1-bit wired left shift to obtain the
operand 2H coded in BCD, as shown in Figure 3. The final BCD carry-propagate addition uses a quaternary tree
(Q-T) adder based

on conditional speculative decimal addition. It has low latency and requires less hardware than other
alternatives.

C. Partial Product Reduction

In the partial product array generated by the SD radix-10 encoding, each column of p digits is reduced to two
digits by means of a decimal digit p:2 CSA tree, shown in Figure 1. The decimal carries are passed between
adjacent digit columns, and a decimal coding is used for the decimal carry-save addition.

[Fig. 2. Scheme of x2 for BCD-4221 (signals ai,j, bi,j, ci,j and hi,3 hi,2 hi,1 hi,0; blocks: BCD-4221 to BCD-5211 Recoder, L1-shifter; outputs wi,2 wi,1 wi,0 wi-1,3)]

For an efficient implementation of decimal carry-save addition with binary CSAs or full adders, the codes
(4221) and (5211) are used instead of BCD. These codes avoid the need for decimal corrections, so only the
decimal ×2 multiplication shown in Figure 2 requires attention. However, the decimal p:2 CSA trees for operands
coded in (4221) involve long carry propagation, which increases the area and delay of this system.

IV. IMPLEMENTATION

The algorithm and architecture of the BCD parallel multiplier exploit properties of two different redundant
BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3) and the overloaded BCD
representation (ODDS). The proposed techniques are developed to significantly reduce the latency and area of
previous representative high-performance implementations.

A. Partial Product Generation

Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with
the digit set [-5, 5] and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This
encoding has several advantages. First, it is a self-complementing code, so a negative multiplicand multiple
can be obtained by just inverting the bits of the corresponding positive one. Second, the available redundancy
allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial
products can be recoded to the ODDS representation by just adding a constant factor into the partial product
reduction tree.

[Fig. 3. Combinational SD Radix-10 Architecture (inputs X and Y; blocks: Generation of Multiples 1X to 5X, SD Radix-10 Recoder, Selection of Multiples, partial products PP[0] ... PP[d], d+1 partial product Reduction Tree with outputs A and B, BCD Adder with output P)]

The ODDS uses a 4-bit binary encoding similar to non-redundant BCD, so techniques developed for binary
carry-save adders and compressor trees can be adapted efficiently to perform decimal operations. A variety of
redundant decimal formats and arithmetic have been proposed to improve the performance of BCD multiplication.
The BCD carry-save format represents a radix-10 operand using a BCD digit and a carry bit at each decimal
position. Area and power dissipation are critical design factors in DFPUs, where multiplication and division
are performed iteratively by means of digit-by-digit algorithms, which motivates reducing the latency of DFP
multiplication in high-performance DFPUs.
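The signed-digit radix-10 recoding of the multiplier can be illustrated with a small behavioural model. The Python sketch below is only an illustration under stated assumptions: the function names `sd_radix10_recode` and `value` are hypothetical, and the rule "emit a carry whenever a BCD digit is 5 or more" is one simple way to keep every recoded digit in [-5, 5]; it is not taken from the circuit described in the paper.

```python
def sd_radix10_recode(digits):
    """Recode BCD digits (least significant first) into signed digits in [-5, 5].

    Hypothetical helper. It returns d + 1 digits: the last one is the final
    carry (0 or 1), which is why d + 1 partial products are needed.
    """
    recoded = []
    carry = 0
    for y in digits:
        if not 0 <= y <= 9:
            raise ValueError("each BCD digit must be in 0..9")
        # The outgoing carry depends only on the digit itself, so no carry ripples.
        carry_out = 1 if y >= 5 else 0
        recoded.append(y + carry - 10 * carry_out)
        carry = carry_out
    recoded.append(carry)
    return recoded


def value(digits):
    """Weighted radix-10 value of a least-significant-first digit list."""
    return sum(d * 10 ** k for k, d in enumerate(digits))


multiplier = [7, 4, 9, 5]            # the number 5947, least significant digit first
sd = sd_radix10_recode(multiplier)   # signed digits plus the final carry digit
print(sd, value(sd))
```

Because each recoded digit has magnitude at most 5, only the multiples 1X to 5X (and their negations) ever need to be generated.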
B. Sign Digit Radix-10 Generation

The partial product generation stage comprises the recoding of the multiplier to an SD radix-10
representation, the calculation of the multiplicand multiples in XS-3 code and the generation of the ODDS
partial products. The SD radix-10 encoding produces d SD radix-10 digits Yb_k ∈ [-5, 5], with k = 0, ..., d-1,
Y_{d-1} being the MSD (most significant digit) of the multiplier shown in Figure 3.

[Fig. 4. SD radix-10 Generation of Partial Product Digit (MUX-5 selecting among 1Xi ... 5Xi, digits in XS-3 [-3, 12], under the signals Y1k ... Y5k and the sign Ys, with output ppi[k])]

Each digit Yb_k is represented with a 5-bit hot-one code (Y1_k, Y2_k, Y3_k, Y4_k, Y5_k) to select the
appropriate multiple {1X, ..., 5X} with a 5:1 mux, and a sign bit Ys_k that controls the negation of the
selected multiple, as shown in Figure 4. The negative multiples are obtained by tens complementing the positive
ones. This is equivalent to taking the nines complement of the positive multiple and then adding 1. As noted in
Section IV-A, the nines complement can be obtained simply by bit inversion. This requires the positive
multiplicand multiples to be coded in XS-3, with digits in [-3, 12]. The d least significant partial products
PP[d-1], ..., PP[0] are generated from the digits Yb_k by using a set of 5:1 muxes. The XOR gates at the output
of the mux invert the multiplicand multiple, to obtain its nines complement, if the SD radix-10 digit is
negative (Ys_k = 1). On the other hand, if the signals (Y1_k, Y2_k, Y3_k, Y4_k, Y5_k) are all zero then
PP[k] = 0, but it has to be coded in XS-3 (bit encoding 0011). To set the two least significant bits to 1, the
input to the XOR gate is Ys*_k = Ys_k ∨ (Yb_k is zero), where ∨ denotes the Boolean OR operator and
(Yb_k is zero) equals 1 if all the signals (Y1_k, Y2_k, Y3_k, Y4_k, Y5_k) are zero. In addition, the partial
product signs are encoded into their MSDs. The generation of the most significant partial product PP[d] is
also described and only depends on Ys_{d-1}.

C. Partial Product Reduction

The PPR tree consists of three parts: a regular binary CSA tree that computes an estimation of the decimal
partial product sum in a binary carry-save form (S, C); a sum correction block that counts the carries
generated between the digit columns; and a decimal digit 3:2 compressor that increments the carry-save sum
according to the carry count to obtain the final double-word product (A, B), A being represented with excess-6
BCD digits and B with BCD digits. The PPR tree can be viewed as adjacent columns of h ODDS digits each, h being
the column height (see figure 4.3), with h ≤ d + 1.

Finally, the digits G_i, Z_i and W^z_i of each column are added, with G_i + Z_i + W^z_i ∈ [0, 45]. A decimal
3:2 digit compressor reduces the digits W^z_i, G_i and Z_i to two digits A_i, B_i. The final BCD product is then
obtained with a single BCD carry-propagate addition P = A + B, which is the last step in the multiplication.
This requires A_i + B_i ∈ [0, 18]. To reduce the delay of the final BCD carry-propagate adder, operand A is
obtained in excess-6, that is, [A_i] = A_i + e with excess e = 6, so the output digit sum satisfies
[A_i] + B_i ∈ [6, 24].

D. Decimal 64 Implementation

The maximum number of carries transferred between adjacent columns of the binary 17:2 CSA tree is 15. These
carries are labeled C_{i+1}[0], ..., C_{i+1}[14] (output carries) and C_i[0], ..., C_i[14] (input carries). The
binary 17:2 CSA tree is built of a first level composed of a 9:2 compressor and an 8:2 compressor, and a second
level composed of a 4:2 compressor. To balance the delay of the 17:2 CSA tree and the bit counter, m = 14 has
been chosen. The 14-bit counter produces the 4-bit digit W^m_i. The computation of W^m_i × 6 deserves a more
detailed description. The 4-bit digit W^m_i = (W^m_{i,3}, W^m_{i,2}, W^m_{i,1}, W^m_{i,0}), with W^m_{i,j} being
the bits of the digit, is conveniently represented as

W^m_i = W^g[0]_{i+1} \times 2 + W^m_{i,0}    (1)

W^g[0]_{i+1} = \sum_{j=1}^{3} W^m_{i,j} \times 2^{j-1}    (2)

W^m_i has thus been split into two parts: its 3 most significant bits, W^g[0]_{i+1}, and its least
significant bit, W^m_{i,0}. This results in

W_i = W^g[0]_{i+1} \times 2 + W^m_{i,0} + C_{i+1}[14]    (3)
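Why the carry count is multiplied by 6 can be seen from a small numeric model. The Python sketch below is an illustration under stated assumptions, not the paper's circuit: it adds one column of 4-bit ODDS digits in binary, counts the carries of weight 16 that leave the column, and uses 16 = 10 + 6, so that each carry moves to the next decimal column with weight 10 while 6 is added back to the current column. The names `reduce_column` and `split_count` are hypothetical.

```python
def reduce_column(odds_digits):
    """Binary-add one column of ODDS digits (each in 0..15).

    Returns (binary_sum, carries, correction) such that
    sum(odds_digits) == 10 * carries + binary_sum + correction.
    """
    if any(not 0 <= d <= 15 for d in odds_digits):
        raise ValueError("ODDS digits must fit in 4 bits (0..15)")
    total = sum(odds_digits)
    binary_sum = total & 0xF    # what a 4-bit binary column keeps
    carries = total >> 4        # number of weight-16 carries leaving the column
    correction = 6 * carries    # the "carry count times 6" term
    return binary_sum, carries, correction


def split_count(w):
    """Split a carry count into its upper bits and its least significant bit, as in Eq. (1)."""
    return w >> 1, w & 1


column = [9, 15, 7, 12, 3]
binary_sum, carries, correction = reduce_column(column)
w_g, w_lsb = split_count(carries)
print(binary_sum, carries, correction, w_g, w_lsb)
```

In this model the value left in the column, `binary_sum + correction`, can still exceed 9; in the text that remaining sum is handled by the later 4:2 CSA and decimal 3:2 compressor stages.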
Digit W^t_i is obtained by concatenating the most significant bit of W^g[0]_{i+1} × 2 and the LSB of
W^g[0]_i. A row of decimal 3:2 digit compressors then reduces the 3-operand partial product sum (G, Z, W^z) to
two BCD operands (A, B), with A represented in excess-6.
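The benefit of keeping A in excess-6 shows up in the final addition P = A + B: once 6 has been added to each digit of A, a decimal carry coincides with the binary carry out of a 4-bit digit. The Python sketch below models this digit by digit. It is a plain ripple loop written for clarity, it assumes that A and B hold ordinary BCD digit values (0 to 9) before the excess is applied, and the function name `add_excess6` is hypothetical.

```python
def add_excess6(a_xs6, b_bcd):
    """Add A (digits stored as A_i + 6) and B (BCD digits), least significant first.

    Returns the BCD digits of P = A + B with one extra most significant digit.
    """
    if len(a_xs6) != len(b_bcd):
        raise ValueError("operands must have the same number of digits")
    product = []
    carry = 0
    for a6, b in zip(a_xs6, b_bcd):
        s = a6 + b + carry      # lies in [6, 25]
        carry = s >> 4          # binary carry-out equals the decimal carry
        low = s & 0xF
        # With a carry the low 4 bits are already the BCD digit;
        # without one the excess of 6 still has to be removed.
        product.append(low if carry else low - 6)
    product.append(carry)
    return product


a_digits = [9, 8, 7, 4]                 # A = 4789, least significant digit first
b_digits = [2, 3, 6, 5]                 # B = 5632
a_xs6 = [d + 6 for d in a_digits]       # A as stored in excess-6
p_digits = add_excess6(a_xs6, b_digits)
print(p_digits)
```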

E. Decimal 128 Implementation

The maximum height of the partial product array of the 34 × 34-digit BCD multiplier is h = 35. The optimal
value for parameter m is m = 31. Therefore, the addition of these carries has been split into two parts. First,
a 31-bit counter evaluates W^m_i, the 5-bit sum of the 31 fastest carries. Then, the two slowest carries,
C_{i+1}[31] and C_{i+1}[32], are added to W^m_i in a second 5-bit counter. The digits S_i, C_i, W^t[0]_i and
W^t[1]_i are reduced to two digits G_i, Z_i ∈ [0, 15] using a 4-bit binary 4:2 CSA. Finally, the three digits
G_i, Z_i, W^z_i are reduced to the two digits A_i and B_i, with A_i in excess-6, by using the decimal digit 3:2
compressor shown in Figure 3.

This reduces the overall critical-path latency and area and improves the speed of parallel decimal
multiplication by avoiding long carry propagations and reducing the number of partial products generated.

V. SIMULATION OUTPUTS

A. 16x16 (64) Digit Multiplication

[Fig. 5. 16x16 Digit Multiplication]

The decimal-64 multiplication is designed and implemented in Verilog HDL. Its simulation output is shown in
Figure 5. Let ai and bi be two 4-bit decimal numbers and p the carry counter output; the selected inputs are
added as two decimal numbers to produce the output sum.

B. 34x34 (128) Digit Multiplication

[Fig. 6. 34x34 Decimal Multiplication]

The decimal-128 multiplication is implemented in Verilog HDL. Its simulation output is shown in Figure 6. Let
p be a partial product, s and c two decimal numbers, and y the carry counter output; the selected inputs are
added as two decimal numbers to produce the output.

C. Performance Comparison

TABLE I
DECIMAL 64 DIGIT MULTIPLICATION

              Existing                            Proposed
Parameters    No. of LUT  Power (mW)  Delay (ns)  No. of LUT  Power (mW)  Delay (ns)
PPG           4           56.77       8.353       1           56.28       8.153
PPR           22          324.89      11.812      19          323.39      11.511
16*16         186         326.62      27.181      183         324.62      24.1

The existing parallel decimal multiplier and the proposed high-speed, area-efficient design are compared in
terms of area, delay and power, and their performance is tabulated in Table 1. From the obtained results, it is
clear that the proposed parallel decimal multiplier reduces area, delay and power relative to the existing
design.

TABLE II
128 DECIMAL MULTIPLICATIONS

              Existing                            Proposed
Parameters    No. of LUT  Power (mW)  Delay (ns)  No. of LUT  Power (mW)  Delay (ns)
PPG           31          325.39      14.511      31          323.39      11.511
PPR           44          326.62      12.864      39          324.41      12.812
34*34         193         358.93      20.283      189         325.93      12.223
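The relative changes implied by the two tables can be recomputed directly from the tabulated values. The short Python sketch below uses only the full-multiplier rows (16*16 and 34*34) as listed in Tables I and II; the helper name `reduction_percent` and the dictionary layout are illustrative assumptions.

```python
def reduction_percent(existing, proposed):
    """Relative reduction of the proposed value with respect to the existing one, in percent."""
    return 100.0 * (existing - proposed) / existing


# (existing, proposed) pairs copied from the 16*16 and 34*34 rows of the tables.
TABLE_I = {"luts": (186, 183), "power_mw": (326.62, 324.62), "delay_ns": (27.181, 24.1)}
TABLE_II = {"luts": (193, 189), "power_mw": (358.93, 325.93), "delay_ns": (20.283, 12.223)}

for name, table in (("16x16", TABLE_I), ("34x34", TABLE_II)):
    for metric, (existing, proposed) in table.items():
        print(f"{name} {metric}: {reduction_percent(existing, proposed):.2f}% lower")
```

Working from the raw table entries keeps the comparison reproducible if the tables are later updated with new synthesis results.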

The high-speed, area-efficient decimal multiplier using a carry-save adder (CSA) reduces the delay by 12.23%
compared to the existing design, as shown in Table 2. In the existing design, the CSA tree applies a
multiply-by-2 module to each carry, which requires additional hardware and increases the power. The proposed
method speeds up the operation by reducing the number of partial products.

VI. CONCLUSION

A high-speed and area-efficient decimal multiplier using a CSA has been presented. The existing and proposed
designs were implemented and their results compared. From the obtained results, it is clear that the proposed
decimal multiplier achieves reduced area and delay because of the carry-save adder. Compared to the
conventional method, the proposed method reduces the delay by 12.23%. Using this high-speed, area-efficient BCD
multiplier in a Finite Impulse Response (FIR) filter, which is used extensively in high-speed DSP applications
and can be implemented on an FPGA, can be considered as future work.

REFERENCES

[1]. Alvaro Vazquez, Elisardo Antelo and Javier Bruguera (2014), Fast Radix-10 Multiplication Using Redundant BCD Codes, IEEE Vol. 63, No. 8, pp. 325-338.

[2]. Carlough S and Schwarz E (2007), Power Six Decimal Divide, Proc. 18th IEEE Symp. on Application-Specific Systems, Architectures, and Processors, Vol. 89, No. 8, pp. 28133.

[3]. Dadda L (2007), Multioperand and Parallel Decimal Adder: A Mixed Binary and BCD Approach, IEEE Transactions on Computers, Vol. 56, No. 10, pp. 1320-1328.

[4]. Dadda L and Nannarelli A (2008), A Variant of a Radix-10 Combinational Multiplier, IEEE Int. Symposium in Circuits and Systems, ISCAS 2008, Vol. 37, No. 2, pp. 3370-3373.

[5]. Erle M.A, Schwarz E.M and M. J. Schulte (2005), Decimal Multiplication With Efficient Partial Product Generation, Proc. 17th IEEE Symposium on Computer Arithmetic, Vol. 73, No. 4, pp. 21-28.

[6]. Erle M.A and M. J. Schulte (2003), Decimal Multiplication Via Carry-Save Addition, Proc. IEEE Int. Conf. on Application-Specific Systems, Architectures, and Processors, Vol. 51, No. 7, pp. 348-358.

[7]. Gorgin S and Jaberipur G (2013), High Speed Parallel Decimal Multiplication with Redundant Internal Encodings, IEEE Transactions on Computers, Vol. 45, No 160, pp. 232-249.

[8]. Han L and Ko S (2013), High Speed Parallel Decimal Multiplication with Redundant Internal Encodings, IEEE Transactions on Computers, Vol. 62, No. 5, pp. 956-968.

[9]. G. Jaberipur and A. Kaivani (2009), Improving the Speed of Parallel Decimal Multiplication, IEEE Transactions on Computers, Vol. 58, No. 11, pp. 3952.

[10]. Vazquez A, Antelo E and Montuschi P (2010), Improved Design of High-Performance Parallel Decimal Multipliers, IEEE Transactions on Computers, Vol. 59, No. 5, pp. 679-693.

BIOGRAPHY

Mr. S. Balasundaram received the BE degree in ECE from Sri Nandhanam College of Engineering and Technology,
Tirupattur, in 2011. He is currently pursuing the ME (VLSI Design) at Anna University Regional Centre,
Coimbatore. His general areas of interest are VLSI design and VLSI testing and verification.

Dr. R. Vijayabhasker completed his U.G. in EEE, P.G. in Power Electronics and Drives, and Ph.D. in Electrical
Engineering. He is working as Assistant Professor in the ECE Department, Anna University Regional Centre. His
fields of interest are VLSI design, VLSI signal processing and DSP. He has published his work in various
Annexure I and II journals.
