Power Efficient Approximate Booth Multiplier

The document proposes a novel approximation technique for radix-4 Booth multiplication to reduce power consumption. The approximation is introduced in the partial product generation and accumulation circuits. Specifically, it approximates the generation of radix-4 recoded partial products and combines the exact and approximate recoded partial products with OR gates. The approximate partial products are accumulated column-wise using OR gates whose number of inputs depends on the column size. Evaluation shows that the approximate Booth multiplier achieves a 41% area reduction and a 49% power reduction compared to an exact Booth multiplier, with better metrics than other approximate multipliers. It performs similarly to an exact multiplier when tested in a discrete cosine transform application.

Suganthi Venkatachalam
Department of Electrical and Computer Engineering, University of Saskatchewan, Canada.
suganthi.venkat@usask.ca

Hyuk Jae Lee
Department of Electrical and Computer Engineering, Seoul National University, Korea.
hjlee@capp.snu.ac.kr

Seok-Bum Ko
Department of Electrical and Computer Engineering, University of Saskatchewan, Canada.
seokbum.ko@usask.ca

Abstract—Power consumption is an important constraint in multimedia and deep learning applications. Approximate computing offers an efficient approach to reduce power consumption. In this paper, a novel approximation is proposed for radix-4 Booth multiplication. Approximation is introduced in the partial product generation and partial product accumulation circuits. The proposed radix-4 partial product generation and accumulation approximation remarkably enhances the performance. The proposed approximate Booth multiplier achieves 41% area reduction and 49% power reduction compared to an exact Booth multiplier. It also has better area, power and error metrics than existing works on approximate multipliers. The proposed multiplier is evaluated with an image processing application, the Discrete Cosine Transform (DCT) encoding part of JPEG compression, and is found to perform almost the same as an exact multiplication unit.

Keywords—approximate computing, booth multiplication, error analysis, low power.

I. INTRODUCTION
Due to the demand for high computational performance in signal processing and machine learning applications, approximate computing [1] has been considered a potential solution in recent times. The idea of approximate computing is to replace exact computational units with approximate ones.
Approximate adders and multipliers are discussed widely in the literature. An approximate full adder, also known as the Lower part OR Adder (LOA), is discussed in [2]. Approximation in a carry select adder is proposed in [3]. A segmentation based error tolerant adder is analyzed in [4]. Multiplication is an integral and more resource-consuming operation in applications such as digital signal processing. Extensive research has been performed on the approximation of multipliers [5-7]. In [5], approximate 4-2 compressors are used instead of exact 4-2 compressors in a Dadda multiplier structure. The effects of eliminating one or more rows in the partial product accumulation are extensively analyzed in [6], where unsigned AND array multipliers and Booth multipliers are approximated. Approximation driven by bit significance in [7] is performed by clustering two or more rows in the partial product array.
Moreover, approximation of multipliers using truncation to produce fixed width multipliers and using Voltage Over Scaling (VOS) [8] is proposed in the literature. In [8], timing induced approximation by over scaling the voltage is discussed. Fixed width multipliers produce the n most significant output bits for n × n inputs. Truncation and rounding are performed to produce a fixed word size output, which introduces quantization error. Various techniques to reduce the quantization error after truncation in fixed point multipliers can be found in the literature [9-11]. In [9], a simple truncation and rounding technique for fixed point multipliers is introduced. In [10], a self-compensation fixed width multiplier with a Fast Fourier Transform application is discussed. A probabilistic estimation bias, derived from probabilistic analysis of the partial product array to reduce the truncation error, is introduced in [11].
Booth multipliers reduce the number of partial products almost by half and are widely used. Although truncation of fixed point Booth multipliers has received its due attention, very few works have focused on approximation of Booth multipliers [6, 12]. In [12], approximation is introduced in the generation of radix-8 partial products of Booth multipliers. For radix-4 Booth multipliers, the technique used in [6] relies on eliminating the generation of a few partial product rows, thereby reducing the accumulation hardware.
In this paper, a novel approximation technique is introduced in the generation of radix-4 recoded partial products. After the generation of partial products and corresponding correction terms, OR gates are used for efficient approximation of the generated exact and approximate recoded partial products. Although the use of OR gates is a popular method of approximation [2, 7], in this work OR gates with varying numbers of inputs are used in well-fitting positions to obtain better performance in terms of design and accuracy.
In [13], approximate adders are evaluated and a metric called normalized mean error distance (NMED) is proposed. The accuracy of approximate multipliers is discussed in this paper using NMED and mean relative error distance (MRED). MRED is the mean of the relative error distances over all possible input combinations. The area, power and delay performance of the proposed multiplier is analyzed and compared with existing approximation works. The approximate multiplier is tested in the Discrete Cosine Transform (DCT) of JPEG compression of a standard image, and it is found to achieve PSNR values similar to those of an exact multiplier.
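The two accuracy metrics named above can be illustrated with a minimal Python sketch that sweeps all input pairs of an n-bit signed multiplier. Several details here are assumptions rather than statements from the paper: input pairs whose exact product is zero are skipped when averaging relative error, NMED is normalized by the largest product magnitude, and `truncated_mul` is a hypothetical toy approximation used only to exercise the function, not the proposed design.

```python
def error_metrics(approx_mul, n=8):
    """Return (MRED, NMED) of approx_mul over all n-bit signed input pairs."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    max_out = lo * lo              # assumed normalizer: largest |exact product|
    total_ed = 0                   # sum of error distances |approx - exact|
    total_red = 0.0                # sum of relative error distances
    red_count = 0                  # pairs with a non-zero exact product
    count = 0
    for a in range(lo, hi):
        for b in range(lo, hi):
            exact = a * b
            ed = abs(approx_mul(a, b) - exact)
            total_ed += ed
            count += 1
            if exact != 0:         # assumption: skip undefined relative errors
                total_red += ed / abs(exact)
                red_count += 1
    mred = total_red / red_count
    nmed = (total_ed / count) / max_out
    return mred, nmed


def truncated_mul(a, b):
    """Hypothetical toy approximation: clear the 4 least significant product bits."""
    return (a * b) & ~0xF


if __name__ == "__main__":
    mred, nmed = error_metrics(truncated_mul)
    print(f"MRED = {mred:.3e}, NMED = {nmed:.3e}")
```

For n = 8 the sweep covers 256 × 256 = 65,536 input pairs, which makes an exhaustive evaluation cheap.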

978-1-5386-4881-0/18/$31.00 ©2018 IEEE


The rest of the paper is organized as follows. Section II discusses the radix-4 Booth multiplier and the approximation methods applied to it. Section III covers results and discussion, where the proposed multiplier is compared with the exact multiplier and existing approximate multipliers in terms of area, power, NMED and MRED. In Section IV, the approximate Booth multipliers are applied to the DCT of an image compression application and tested. Conclusions and future work are discussed in Section V.

II. APPROXIMATION IN RADIX-4 BOOTH MULTIPLIER

A. Radix-4 Booth encoding and partial product generation
Consider two n-bit signed inputs A and B, which produce a signed 2n-bit output Pout. The 2's complement values of A, B and Pout can be given as

$$A = -a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$$
$$B = -b_{n-1}2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$
$$P_{out} = -p_{2n-1}2^{2n-1} + \sum_{i=0}^{2n-2} p_i 2^i \qquad (1)$$

The input B is grouped into bits {b_{2i+1}, b_{2i}, b_{2i-1}}, each group corresponding to one of the encoded values {-2, -1, 0, 1, 2}. The encoded values are recoded to signals and, based on these signals, partial products are generated. The three-signal recoding scheme proposed in [13] is used in this paper. Table I shows the three recoded signals neg_i, two_i and zero_i corresponding to {b_{2i+1}, b_{2i}, b_{2i-1}} and the partial products P_ij generated for all possible values of {neg_i, two_i, zero_i} and inputs a_j a_{j-1}. The corresponding partial product generation circuit and related K-map are shown in Figure 1. The generated partial product matrix is shown in Figure 2, where the partial products generated after partial product encoding are accumulated with their corresponding correction terms and sign values.

B. Approximation in Booth multipliers
In this paper, an 8-bit approximate Booth multiplier is designed and analyzed. The proposed approximation technique can also be applied to larger multipliers. The lower part of the partial product matrix is further divided into lower part major (LPmajor) and lower part minor (LPminor), and different approximations are applied to each, as shown in Figure 3. The approximation techniques applied to LPmajor and LPminor are explained in detail in the following sections.

C. LPminor approximation
Approximation is applied at both the partial product generation stage and the partial product accumulation stage. A novel method to approximate partial product generation is introduced in this paper. The K-map in Figure 1(a) is modified as shown in Figure 4(a). As can be seen, 4 out of 32 cases are altered and highlighted, which greatly reduces the amount of logic. The probability of error is 1/8. By intentionally introducing a few errors in partial product generation, the logic complexity is reduced from equation (2) to equation (3). Consistent with Table I, the exact partial product generation logic can be written as

$$p_{ij} = \overline{zero_i}\,\overline{neg_i}\,\overline{two_i}\,a_j + \overline{neg_i}\,two_i\,a_{j-1} + neg_i\,two_i\,\overline{a_{j-1}} + neg_i\,\overline{two_i}\,\overline{a_j} \qquad (2)$$

After approximation, the logic equation is reduced as in (3), which matches the circuit of Figure 4(b).

$$p_{ij} = \overline{a_j}\,neg_i + a_j\,\overline{neg_i}\,\overline{zero_i} \qquad (3)$$

Because (3) no longer depends on two_i or a_{j-1}, its output differs from the exact value only when two_i = 1 and a_j ≠ a_{j-1}.
After this approximation in generation, the generated values are accumulated using OR gates as in Figure 3. Row-wise clustering using OR gates is discussed in [7]. A similar technique is applied here to accumulate the partial products. After approximated partial product generation in LPminor, the approximate partial products are accumulated column-wise using OR gates whose number of inputs depends on the number of elements in each column.

D. LPmajor approximation
An approximation is proposed in which the correction term is ORed with the least significant partial product of its associated row, as in Figure 3. The main purpose of this approximation is to maintain the alignment, which considerably enhances the delay performance of the LPmajor. Exact half-adders, full-adders and 4-2 compressors are used to accumulate the partial products and produce sum and carry rows. The sum and carry rows in LPmajor are added using the LOA of [2].

TABLE I. BOOTH RECODING AND PARTIAL PRODUCT GENERATION

                                               Partial product P_ij for a_j a_{j-1}
b_{2i+1}  b_{2i}  b_{2i-1} | neg_i  two_i  zero_i |   00    01    10    11
   0        0        0     |   0      0      1    |    0     0     0     0
   0        0        1     |   0      0      0    |    0     0     1     1
   0        1        0     |   0      0      0    |    0     0     1     1
   0        1        1     |   0      1      0    |    0     1     0     1
   1        0        0     |   1      1      0    |    1     0     1     0
   1        0        1     |   1      0      0    |    1     1     0     0
   1        1        0     |   1      0      0    |    1     1     0     0
   1        1        1     |   0      0      1    |    0     0     0     0

Fig. 1. Radix-4 Booth partial product generation: (a) K-map and (b) corresponding logic circuit.
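The recoding of Table I and the effect of the approximate generation logic of Fig. 4(b) can be checked with a small Python sketch. It is an illustration under stated assumptions, not the authors' Verilog: the function names are hypothetical, `pp_exact` is written directly from the Table I behaviour (select a_j or a_{j-1}, invert when neg_i is set, force 0 when zero_i is set), and only the recoded-signal combinations that actually occur in Table I are compared.

```python
from itertools import product

# Table I: (b_{2i+1}, b_{2i}, b_{2i-1}) -> (neg_i, two_i, zero_i)
RECODE = {
    (0, 0, 0): (0, 0, 1), (0, 0, 1): (0, 0, 0),
    (0, 1, 0): (0, 0, 0), (0, 1, 1): (0, 1, 0),
    (1, 0, 0): (1, 1, 0), (1, 0, 1): (1, 0, 0),
    (1, 1, 0): (1, 0, 0), (1, 1, 1): (0, 0, 1),
}


def booth_digits(b, n=8):
    """Radix-4 Booth digits of an n-bit signed integer, least significant first."""
    bits = [0] + [(b >> k) & 1 for k in range(n)]  # bits[0] is b_{-1} = 0
    digits = []
    for i in range(n // 2):
        b_lo, b_mid, b_hi = bits[2 * i], bits[2 * i + 1], bits[2 * i + 2]
        neg, two, zero = RECODE[(b_hi, b_mid, b_lo)]
        magnitude = 0 if zero else (2 if two else 1)
        digits.append(-magnitude if neg else magnitude)
    return digits


def pp_exact(neg, two, zero, aj, aj1):
    """Exact partial product bit as tabulated in Table I."""
    if zero:
        return 0
    return (aj1 if two else aj) ^ neg


def pp_approx(neg, two, zero, aj, aj1):
    """Approximate bit per Fig. 4(b): ~a_j.neg + a_j.~neg.~zero (two, a_{j-1} unused)."""
    return ((1 - aj) & neg) | (aj & (1 - neg) & (1 - zero))


def altered_cases():
    """Reachable (neg, two, zero, a_j, a_{j-1}) cells where the two versions differ."""
    return [
        (neg, two, zero, aj, aj1)
        for (neg, two, zero) in sorted(set(RECODE.values()))
        for aj, aj1 in product((0, 1), repeat=2)
        if pp_exact(neg, two, zero, aj, aj1) != pp_approx(neg, two, zero, aj, aj1)
    ]


if __name__ == "__main__":
    cases = altered_cases()
    # The paper counts altered cells against all 32 K-map cells.
    print(len(cases), "altered cells, error probability", len(cases) / 32)
```

Under these assumptions the sketch finds the altered cells only where two_i = 1 and a_j differs from a_{j-1}, which agrees with the 4-out-of-32 figure quoted in Section II-C.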

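Section II-D adds the sum and carry rows of LPmajor with the Lower part OR Adder (LOA) of [2]. As a rough behavioural illustration of such an adder, the Python sketch below ORs the lower bits and adds the upper bits exactly, with a carry-in formed by ANDing the most significant bits of the two lower parts. The parameters `width` and `lower` are hypothetical; the paper does not state the split point used in its design.

```python
def loa_add(x, y, width=16, lower=4):
    """Behavioural model of a Lower part OR Adder on unsigned width-bit operands.

    The `lower` least significant bits are approximated with a bitwise OR;
    the remaining upper bits are added exactly. Requires 1 <= lower < width.
    """
    mask_w = (1 << width) - 1
    mask_l = (1 << lower) - 1
    x &= mask_w
    y &= mask_w
    low = (x | y) & mask_l                                   # approximate lower part
    carry_in = (x >> (lower - 1)) & (y >> (lower - 1)) & 1   # AND of lower-part MSBs
    high = (x >> lower) + (y >> lower) + carry_in            # exact upper part
    return ((high << lower) | low) & mask_w


if __name__ == "__main__":
    for x, y in [(3, 4), (1, 1), (8, 8), (0x10, 0x20)]:
        print(x, y, "exact:", x + y, "LOA:", loa_add(x, y, width=8, lower=4))
```

The result is exact whenever the lower parts of the operands have no overlapping set bits, and it deviates otherwise, which is the trade-off that removes the carry chain from the lower part.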


s0 s0 s0 p07 p06 p05 p04 p03 p02 p01 p00
1 s1 p17 p16 p15 p14 p13 p12 p11 p10 cor0
1 s2 p27 p26 p25 p24 p23 p22 p21 p20 cor1
s2 p37 p36 p35 p34 p33 p32 p31 p30 cor2
cor3
fs15 fs14 fs13 fs12 fs11 fs10 fs9 fs8 fs7 fs6 fs5 fs4 fs3 fs2 fs1 fs0
fc14 fc13 fc12 fc11 fc10 fc9 fc8 fc7 fc6 fc5 fc4 fc3

Fig. 2. Generated partial product matrix.

Regions: LPmajor, LPminor. Legend: N-input OR, Approximate HA.
s0 s0 s0 p07 p06 p05 p04 p03 p02 p01 p00
1 s1 p17 p16 p15 p14 p13 p12 p11 p10 cor0
1 s2 p27 p26 p25 p24 p23 p22 p21 p20 cor1
1 s2 p37 p36 p35 p34 p33 p32 p31 p30|cor3 cor2
fs15 fs14 fs13 fs12 fs11 fs10 fs9 fs8 fs7 P6 P5 P4 P3 P2 P1 P0
fc14 fc13 fc12 fc11 fc10 fc9 fc8 fc7
P15 P14 P13 P12 P11 P10 P9 P8 P7

Fig. 3. Approximated partial product matrix.

III. RESULTS AND DISCUSSION
The exact Booth multiplier, the proposed approximate multiplier and previous works [5, 6] are designed in Verilog for n = 8. They are implemented in a TSMC 65 nm library in the typical process environment using Synopsys Design Compiler. The inexact 4-2 compressors of design 2 in [5] are applied to the 8 least significant columns of the exact Booth multiplier to form an Approximate Compressor Booth Multiplier (ACBM). Using the partial product perforation technique proposed in [6], approximation is applied to the exact Booth multiplier by eliminating the partial product generation and accumulation of the first row of the partial product matrix. However, the correction term 'cor0' is retained in the second row of the partial product matrix; this forms a Partial Perforated Booth Multiplier (PPBM). The performance data of the exact, proposed and existing inexact multipliers in terms of design and accuracy are given in Tables II and III.

Fig. 4. Approximated radix-4 Booth partial product generation: (a) K-map and (b) corresponding logic circuit, with inputs ~a_j, neg_i, a_j, ~neg_i and ~zero_i and output p_ij.

Table II compares the approximate multipliers with an exact multiplier in terms of area and power, with a uniform delay set at 0.5 ns. The error performance of the approximate multipliers is given in Table III. MRED and NMED of the approximate units are found by exhaustive analysis of all 65,536 possible input combinations using MATLAB.
Table II shows the area and power gain of the approximate multipliers over the exact multiplier. The proposed multiplier has a 41% reduction in area and a 49% reduction in power compared to the exact multiplier, which is a higher area and power gain than the other existing approximate multipliers. Compared to ACBM [5], the proposed multiplier has 23% and 26% reductions in area and power respectively. Compared to PPBM [6], it improves area and power by 8% and 22% respectively.
Table III shows the MRED and NMED figures together with the Area Power Product (APP) values. The proposed multiplier is more accurate than ACBM [5], with roughly half the MRED (6.75 × 10^-2 versus 1.34 × 10^-1) and a slightly lower NMED. Relative to PPBM [6], both its MRED and its NMED are lower by more than an order of magnitude. Compared to the exact multiplier, the proposed multiplier has a 70.22% improvement in APP, while ACBM and PPBM have APP improvements of 47.44% and 58.71% respectively. Hence the proposed multiplier has the best MRED and NMED of the approximate multipliers compared, together with the lowest APP of all the multipliers compared.

TABLE II. COMPARISON OF APPROXIMATE MULTIPLIER WITH EXACT MULTIPLIER AND OTHER WORKS IN TERMS OF AREA AND POWER

Type of multiplier    Area (µm²)   Power (mW)   Area gain (%)   Power gain (%)
Exact multiplier      814.7        0.223        -               -
Proposed multiplier   478.8        0.113        41              49
ACBM [5]              625.3        0.153        23              32
PPBM [6]              517.7        0.145        36              35

TABLE III. MRED AND NMED FIGURES OF APPROXIMATE MULTIPLIERS

Type of multiplier    MRED           NMED           APP (µm² · mW)
Exact multiplier      -              -              181.68
Proposed multiplier   6.75 × 10^-2   3.53 × 10^-3   54.10
ACBM [5]              1.34 × 10^-1   3.72 × 10^-3   95.49
PPBM [6]              1.86           6.25 × 10^-2   75.01

IV. IMPACT OF APPROXIMATION IN JPEG
JPEG (Joint Photographic Experts Group) is a popular lossy standard for image compression. The DCT [15] is a mathematical transformation involving matrix multiplication, which is an integral process in signal processing applications like JPEG. A single matrix multiplication of two 8 × 8 input blocks requires 512 multiplications. Because matrix multiplication is the resource-intensive operation, using



approximate circuits to perform such operations would drastically reduce resource consumption, while taking advantage of the tolerance to imprecise data.
To analyze the impact of the approximation in a real-life application, matrix multiplication blocks using approximate multipliers are designed and used in the forward DCT of the JPEG encoding application. The standard Lena image is taken as a benchmark. Peak Signal to Noise Ratio (PSNR) is used to evaluate the quality of the images. PSNR is more consistent with the human visual system and is based on the mean square error (MSE) between the original image and the compressed image. PSNR can be given as

$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \qquad (4)$$

where 255 is the maximum value a pixel can hold in the grayscale image.
Figure 5 shows the original image and the images after JPEG compression using exact and approximate multipliers. The PSNR values obtained in the JPEG application with the exact multiplier, the proposed multiplier, ACBM and PPBM are 31.81, 31.77, 31.69 and 18.51 dB respectively. The proposed multiplier and ACBM show PSNR values almost identical to that of the exact multiplier. This suggests that approximate multipliers can be used in applications such as image processing to significantly reduce power with minimal loss in quality.

Fig. 5. JPEG image compression. (a) Original input image. JPEG images using (b) exact multiplier with a PSNR of 31.81 dB, (c) proposed multiplier with a PSNR of 31.77 dB, (d) ACBM with a PSNR of 31.69 dB and (e) PPBM with a PSNR of 18.51 dB.

V. CONCLUSIONS AND FUTURE WORK
A minimal-error approximate Booth multiplier with significant area and power improvement over the equivalent exact multiplier was investigated. A novel radix-4 approximation is introduced, in which the partial product generation and accumulation circuits are approximated. Partial product generation in a Booth multiplier is an important and power-consuming process. The exact partial product generation circuit is modified by introducing an error probability of 0.125 and reducing the logic complexity, resulting in a high performance circuit. In partial product accumulation, OR gates are used to reduce area and power while maintaining reasonable accuracy.
The proposed design has significant area and power improvement compared with an exact multiplier. It requires less area and power and has better error metrics than the equivalent state-of-the-art works. A real-life application is chosen to evaluate the effect of approximation. In future, the approximate multipliers are to be tested with more real-life application scenarios, and an error compensation block is to be designed to further improve the accuracy of the proposed multiplier.

ACKNOWLEDGMENT
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Department of Electrical Engineering, University of Saskatchewan.
This work was also supported by the Korean Federation of Science and Technology Societies (KOFST) grant funded by the Korean government (MSIP: Ministry of Science, ICT and Future Planning).

REFERENCES
[1] J. Han and M. Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," 2013 18th IEEE European Test Symposium (ETS), Avignon, 2013, pp. 1-6.
[2] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie and C. Lucas, "Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, April 2010.
[3] K. Du, P. Varman and K. Mohanram, "High performance reliable variable latency carry select addition," 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2012, pp. 1257-1262.
[4] Ning Zhu, W. L. Goh and K. S. Yeo, "An enhanced low-power high-speed Adder For Error-Tolerant application," Proceedings of the 2009 12th International Symposium on IC, Singapore, 2009, pp. 69-72.
[5] A. Momeni, J. Han, P. Montuschi and F. Lombardi, "Design and Analysis of Approximate Compressors for Multiplication," in IEEE Transactions on Computers, vol. 64, no. 4, pp. 984-994, April 2015.
[6] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris and K. Pekmestzi, "Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 10, pp. 3105-3117, Oct. 2016.
[7] I. Qiqieh, R. Shafik, G. Tarawneh, D. Sokolov and A. Yakovlev, "Energy-efficient approximate multiplier design using bit significance-driven logic compression," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, Lausanne, 2017, pp. 7-12.
[8] R. Venkatesan, A. Agarwal, K. Roy and A. Raghunathan, "MACACO: Modeling and analysis of circuits for approximate computing," 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, 2011, pp. 667-673.
[9] M. J. Schulte and E. E. Swartzlander, "Truncated multiplication with correction constant [for DSP]," Proceedings of IEEE Workshop on VLSI Signal Processing, Veldhoven, 1993, pp. 388-396.
[10] Hong-An Huang, Yen-Chin Liao and Hsie-Chia Chang, "A self-compensation fixed-width booth multiplier and its 128-point FFT applications," 2006 IEEE International Symposium on Circuits and Systems, Island of Kos, 2006, pp. 4 pp.-3541.
[11] C. Y. Li, Y. H. Chen, T. Y. Chang and J. N. Chen, "A Probabilistic Estimation Bias Circuit for Fixed-Width Booth Multiplier and Its DCT Applications," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 4, pp. 215-219, April 2011.
[12] H. Jiang, J. Han, F. Qiao and F. Lombardi, "Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation," in IEEE Transactions on Computers, vol. 65, no. 8, pp. 2638-2644, Aug. 1 2016.
[13] J. Liang, J. Han and F. Lombardi, "New Metrics for the Reliability of Approximate and Probabilistic Adders," in IEEE Transactions on Computers, vol. 62, no. 9, pp. 1760-1771, Sept. 2013.
[14] E. de Angel and E. E. Swartzlander, "Low power parallel multipliers," VLSI Signal Processing, IX, San Francisco, CA, 1996, pp. 199-208.
[15] N. Ahmed, T. Natarajan and K. R. Rao, "Discrete Cosine Transform," in IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, Jan. 1974.
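The PSNR measure of equation (4) in Section IV can be sketched in a few lines of Python. This is a minimal illustration assuming 8-bit grayscale images stored as equally sized lists of rows; the function names are hypothetical and the sketch is not the evaluation code used for the reported results.

```python
import math


def mse(original, compressed):
    """Mean square error between two equally sized grayscale images."""
    total = 0
    count = 0
    for row_o, row_c in zip(original, compressed):
        for pixel_o, pixel_c in zip(row_o, row_c):
            total += (pixel_o - pixel_c) ** 2
            count += 1
    return total / count


def psnr(original, compressed, peak=255):
    """PSNR in dB, 10 * log10(peak^2 / MSE); infinite for identical images."""
    err = mse(original, compressed)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    reference = [[10, 20], [30, 40]]
    degraded = [[11, 19], [31, 41]]   # every pixel off by one, so MSE = 1
    print(f"PSNR = {psnr(reference, degraded):.2f} dB")
```

Because the measure is logarithmic in the MSE, a drop of about 13 dB, as between the exact multiplier and PPBM in Section IV, corresponds to roughly a twentyfold increase in mean square error.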

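The percentage gains and Area Power Product (APP) figures quoted in Section III can be re-derived from the area and power columns of Table II. The Python sketch below does this arithmetic; the dictionary and function names are hypothetical, and the numbers are copied from Table II as printed.

```python
# name: (area in um^2, power in mW), copied from Table II
TABLE_II = {
    "Exact": (814.7, 0.223),
    "Proposed": (478.8, 0.113),
    "ACBM": (625.3, 0.153),
    "PPBM": (517.7, 0.145),
}


def reduction(new, ref):
    """Percentage reduction of `new` relative to the reference value `ref`."""
    return 100.0 * (ref - new) / ref


def app(name):
    """Area Power Product in um^2 * mW for one Table II entry."""
    area, power = TABLE_II[name]
    return area * power


if __name__ == "__main__":
    exact_area, exact_power = TABLE_II["Exact"]
    for name, (area, power) in TABLE_II.items():
        print(
            f"{name:9s} area gain {reduction(area, exact_area):5.1f}%  "
            f"power gain {reduction(power, exact_power):5.1f}%  "
            f"APP {app(name):7.2f}  "
            f"APP gain {reduction(app(name), app('Exact')):5.1f}%"
        )
```

Small differences from the tabulated APP values for ACBM and PPBM are to be expected, presumably because the published area and power figures are rounded.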
