Power Efficient Approximate Booth Multiplier
Abstract—Power consumption is an important constraint in multimedia and deep learning applications. Approximate computing offers an efficient approach to reducing power consumption. In this paper, a novel approximation is proposed for radix-4 Booth multiplication. Approximation is introduced in both the partial product generation and the partial product accumulation circuits, which remarkably enhances performance. The proposed approximate Booth multiplier achieves 41% area reduction and 49% power reduction compared to an exact Booth multiplier. It also has better area, power and error metrics than existing approximate multipliers. The proposed multiplier is evaluated in an image processing application, the Discrete Cosine Transform (DCT) encoding stage of JPEG compression, and is found to perform almost the same as an exact multiplication unit.
Keywords—approximate computing, Booth multiplication, error analysis, low power.
I. INTRODUCTION
Due to the demand for high computational performance in signal processing and machine learning applications, approximate computing [1] has recently been considered as a potential solution. The idea of approximate computing is to replace exact computational units with approximate ones.
Approximate adders and multipliers are widely discussed in the literature. An approximate full adder, also known as the Lower-part OR Adder (LOA), is discussed in [2]. Approximation in a carry select adder is proposed in [3]. A segmentation-based error tolerant adder is analyzed in [4]. Multiplication is an integral and more resource-consuming operation in applications such as digital signal processing, and extensive research has been performed on the approximation of multipliers [5-7]. In [5], approximate 4-2 compressors replace exact 4-2 compressors in a Dadda multiplier structure. The effects of eliminating one or more rows in the partial product accumulation are extensively analyzed in [6], where unsigned AND array multipliers and Booth multipliers are approximated. Approximation driven by bit significance is performed in [7] by clustering two or more rows in the partial product array.
Moreover, approximation of multipliers using truncation to produce fixed-width multipliers and using Voltage Over Scaling (VOS) [8] is proposed in the literature. In [8], timing-induced approximation by over-scaling the voltage is discussed. Fixed-width multipliers output only the n most significant bits of the product of two n-bit inputs. Truncation and rounding are performed to produce the fixed word size output, which introduces quantization error. Various techniques for reducing the quantization error after truncation in fixed-point multipliers can be found in the literature [9-11]. In [9], a simple truncation and rounding technique for fixed-point multipliers is introduced. In [10], a self-compensating fixed-width multiplier with a Fast Fourier Transform application is discussed. A probabilistic estimation bias, derived from probabilistic analysis of the partial product array to reduce the truncation error, is introduced in [11].
Booth multipliers reduce the number of partial products almost by half and are widely used. Although truncation of fixed-point Booth multipliers has received its due attention, very few works have focused on the approximation of Booth multipliers [6, 12]. In [12], approximation is introduced in the generation of radix-8 partial products of Booth multipliers. For radix-4 Booth multipliers, the technique used in [6] approximates by eliminating the generation of a few partial product rows, thereby reducing the accumulation hardware.
In this paper, a novel approximation technique is introduced in the generation of radix-4 recoded partial products. After the generation of partial products and the corresponding correction terms, OR gates are used for efficient approximation of the generated exact and approximate recoded partial products. Although the use of OR gates is a popular method of approximation [2, 7], in this work OR gates with varying numbers of inputs are placed in well-fitting positions to obtain better performance in terms of design and accuracy.
In [13], approximate adders are evaluated and a metric called normalized mean error distance (NMED) is proposed. The accuracy of approximate multipliers is discussed in this paper using NMED and mean relative error distance (MRED). MRED is the mean of the relative error distances over all possible input combinations. The area, power and delay performance of the proposed multiplier is analyzed and compared with existing approximation works. The approximate multiplier is tested in the Discrete Cosine Transform (DCT) of JPEG compression of a standard image, and is found to achieve PSNR values similar to those of an exact multiplier.
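The two accuracy metrics named above can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the paper: operands are treated as unsigned, NMED is normalized by the maximum exact product, input pairs with a zero exact product are skipped when averaging relative errors, and `truncated_multiply` is a hypothetical stand-in for an approximate multiplier rather than the proposed design.

```python
def error_metrics(approx_mul, n_bits=8):
    """Return (NMED, MRED) over all unsigned n-bit input pairs.

    Assumptions: unsigned operands, NMED normalized by the largest
    exact product, zero exact products skipped for MRED.
    """
    max_product = (2**n_bits - 1) ** 2
    total_ed = 0        # sum of error distances |exact - approx|
    total_red = 0.0     # sum of relative error distances
    pairs = 0
    nonzero_pairs = 0
    for a in range(2**n_bits):
        for b in range(2**n_bits):
            exact = a * b
            ed = abs(exact - approx_mul(a, b))
            total_ed += ed
            pairs += 1
            if exact != 0:
                total_red += ed / exact
                nonzero_pairs += 1
    nmed = total_ed / pairs / max_product
    mred = total_red / nonzero_pairs
    return nmed, mred


def truncated_multiply(a, b, dropped_bits=4):
    """Hypothetical approximate multiplier: clears the low product bits."""
    return ((a * b) >> dropped_bits) << dropped_bits


if __name__ == "__main__":
    nmed, mred = error_metrics(truncated_multiply)
    print(f"NMED = {nmed:.6f}, MRED = {mred:.6f}")
```

Passing an exact multiplier such as `lambda a, b: a * b` returns zero for both metrics, which is a convenient sanity check before evaluating an approximate design.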
The input B is grouped into overlapping bit triplets {b2i+1, b2i, b2i-1}, each of which maps to one of the encoded values {-2, -1, 0, 1, 2}. The encoded values are recoded into signals, and the partial products are generated from these signals. The three-signal recoding scheme proposed in [13] is used in this paper. Table I shows the three recoded signals negi, twoi and zeroi corresponding to {b2i+1, b2i, b2i-1}, and the partial products Pij generated for all possible values of {negi, twoi, zeroi} and inputs aj aj-1. The corresponding partial product generation circuit and its K-map are shown in Figure 1. The generated partial product matrix is shown in Figure 2, where the partial products generated after encoding are accumulated with their corresponding correction terms and sign values.

TABLE I. BOOTH RECODING AND PARTIAL PRODUCT GENERATION

                                          Partial product Pij for aj aj-1
b2i+1  b2i  b2i-1 | negi  twoi  zeroi |   00    01    10    11
  0     0     0   |  0     0     1    |    0     0     0     0
  0     0     1   |  0     0     0    |    0     0     1     1
  0     1     0   |  0     0     0    |    0     0     1     1
  0     1     1   |  0     1     0    |    0     1     0     1
  1     0     0   |  1     1     0    |    1     0     1     0
  1     0     1   |  1     0     0    |    1     1     0     0
  1     1     0   |  1     0     0    |    1     1     0     0
  1     1     1   |  0     0     1    |    0     0     0     0

B. Approximation in Booth multipliers
In this paper, an 8-bit approximate Booth multiplier is
designed and analyzed. The proposed approximation
technique can also be applied to larger multipliers. The lower
part of the partial product matrix is further divided into a lower
part major (LPmajor) and a lower part minor (LPminor), and
different approximations are applied to each, as shown in Figure 3. The
approximation techniques applied to LPmajor and LPminor are
explained in detail in the following sections.
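As a reference point for the approximations that follow, the exact radix-4 recoding of Table I can be sketched in Python. This is a behavioral sketch only, not the paper's circuit: the function names are hypothetical, operands are assumed to be signed two's complement values, and b-1 is taken as 0.

```python
def recode_signals(b_hi, b_mid, b_lo):
    """Return (neg, two, zero) for the triplet (b2i+1, b2i, b2i-1)."""
    digit = -2 * b_hi + b_mid + b_lo        # encoded value in {-2,...,2}
    return int(digit < 0), int(abs(digit) == 2), int(digit == 0)


def booth_radix4_digits(b, n_bits=8):
    """Radix-4 Booth digits of a signed n-bit integer, least significant first."""
    pattern = b & ((1 << n_bits) - 1)       # two's complement bit pattern
    # bits[0] holds b-1 = 0; bits[k + 1] holds bit b_k.
    bits = [0] + [(pattern >> k) & 1 for k in range(n_bits)]
    digits = []
    for i in range(n_bits // 2):
        b_lo, b_mid, b_hi = bits[2 * i], bits[2 * i + 1], bits[2 * i + 2]
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def booth_multiply(a, b, n_bits=8):
    """Exact product as a sum of n/2 recoded partial products."""
    digits = booth_radix4_digits(b, n_bits)
    return sum(d * a * 4**i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(-77))
    print(booth_multiply(-77, 45), -77 * 45)
```

An 8-bit operand yields four digits, so only four partial product rows are accumulated instead of eight, which is the halving of partial products mentioned in the introduction.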
C. LPminor approximation
Approximation is applied at both the partial product
generation stage and the partial product accumulation stage.
A novel method to approximate partial product generation is
introduced in this paper. The K-map in Figure 1(a) is modified
as shown in Figure 4(a). As can be seen, 4 out of 32 cases are
altered and highlighted, which greatly reduces the amount of
logic circuitry. The probability of error is therefore 4/32 = 1/8. By intentionally
introducing a few errors in partial product generation, the logic
complexity is reduced from equation (2) to equation (3). The exact
partial product generation logic equation can be given as
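$$P_{ij} = \overline{zero_i}\,\left(\overline{neg_i}\;\overline{two_i}\;a_j + \overline{neg_i}\;two_i\;a_{j-1} + neg_i\;\overline{two_i}\;\overline{a_j} + neg_i\;two_i\;\overline{a_{j-1}}\right) \tag{2}$$

Fig. 1. Radix-4 Booth partial product generation: (a) K-map and (b) corresponding logic circuit.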
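The exact partial product logic can be checked exhaustively against Table I with a few lines of Python. The select-then-invert form used in `partial_product_bit` is an assumption: it is one expression that reproduces every entry of Table I, and it is not claimed to be the gate-level circuit of Figure 1(b). The function and table names are hypothetical.

```python
# Rows of Table I: (neg, two, zero) -> Pij for aj aj-1 = 00, 01, 10, 11.
TABLE_I = {
    (0, 0, 1): (0, 0, 0, 0),   # encoded value  0
    (0, 0, 0): (0, 0, 1, 1),   # encoded value +1
    (0, 1, 0): (0, 1, 0, 1),   # encoded value +2
    (1, 1, 0): (1, 0, 1, 0),   # encoded value -2
    (1, 0, 0): (1, 1, 0, 0),   # encoded value -1
}


def partial_product_bit(neg, two, zero, a_j, a_jm1):
    """Exact Pij: select aj or aj-1, invert if neg, force 0 if zero."""
    selected = a_jm1 if two else a_j
    return (1 - zero) & (selected ^ neg)


def matches_table():
    """True if the expression reproduces every entry of Table I."""
    for (neg, two, zero), row in TABLE_I.items():
        for index, expected in enumerate(row):
            a_j, a_jm1 = index >> 1, index & 1   # column order 00, 01, 10, 11
            if partial_product_bit(neg, two, zero, a_j, a_jm1) != expected:
                return False
    return True


if __name__ == "__main__":
    print(matches_table())
```

The same harness can be pointed at an approximate version of the logic to count how many input cases differ from the exact table.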