Low-Voltage Low-Power Adders: Unit-Iv
UNIT-IV
Standard adder cells are used as basic building blocks in designing and fabricating different
kinds of adder architectures.
Half Adders:
➢ The half adder is the simplest and most fundamental kind of adder.
➢ It takes two single-bit binary operands (A and B) as inputs and produces a two-bit binary
number as its result: the sum bit S and the carry bit C.
➢ A full adder is constructed using two half adders and an OR gate. It has a total of three
inputs: two for the input bits A and B, and one for the carry-in Cin.
➢ The outputs are the sum and the carry-out.
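The construction of a full adder from two half adders and an OR gate can be sketched in a few lines of Python. This is only a behavioural illustration; the function names `half_adder` and `full_adder` are chosen here and do not come from the notes.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two single-bit inputs."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) using two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)      # first half adder adds A and B
    s, c2 = half_adder(s1, cin)    # second half adder adds the carry-in
    return s, c1 | c2              # OR gate merges the two partial carries


# Exhaustive check against ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```

The two partial carries can never both be 1, which is why a plain OR gate is enough to combine them.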
Sri. L. GuruKumar, Asst. Prof., UNIT-IV Low Power VLSI Design[E.C.E.]
➢ The transistor-level implementation of a conventional CMOS full-adder cell design, using a
total of 32 transistors, is shown in the figure below.
➢ Its modified version, based on CMOS transmission gates and inverters, uses only 20 transistors.
➢ Although the modified conventional CMOS full adder configuration has been widely accepted and
utilized in numerous applications, it often exhibits a critical delay that limits the
system's total performance.
➢ Specifically, wherever two or more of these full adders are cascaded to perform
multiple-bit addition, the system's speed suffers.
➢ Therefore, a better design is the one shown in the figure below.
Fig: Full adder without XOR gates; (a) logic diagram (b) transistor diagram
➢ The figure above shows its logic circuit and its transistor diagram.
➢ As shown in fig. (a), this full adder is implemented by reusing the Cout term in
the Sum term as a common subexpression.
➢ The logic functions for this implementation are as follows.
$C_{out} = A \cdot B + C_{in} \cdot (A + B)$
$Sum = A \cdot B \cdot C_{in} + (A + B + C_{in}) \cdot \overline{C_{out}}$
➢ Further, based on transmission function theory, the full adder is simplified to the
transmission function adder, which is the fundamental unit of the CMOS arithmetic unit.
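The reuse of Cout inside the Sum expression can be checked exhaustively. The sketch below reads the Sum function as A.B.Cin + (A + B + Cin).NOT(Cout); the function names are illustrative assumptions, not part of the notes.

```python
from itertools import product


def full_adder_shared_cout(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder without XOR gates: Sum is built from the inverted Cout."""
    cout = (a & b) | (cin & (a | b))
    not_cout = 1 - cout                          # inverted carry-out
    s = (a & b & cin) | ((a | b | cin) & not_cout)
    return s, cout


def matches_integer_addition() -> bool:
    """Compare all 8 input combinations with ordinary addition."""
    return all(
        full_adder_shared_cout(a, b, cin) == ((a + b + cin) % 2, (a + b + cin) // 2)
        for a, b, cin in product((0, 1), repeat=3)
    )
```

Calling `matches_integer_addition()` confirms that the two expressions behave as a full adder for every input combination.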
➢ Using a power supply voltage of 3.3 V, the critical path delay of the 10-transistor full adder
measures 0.086 ns, while that of the T.F.A measures 0.12 ns. Also, with the same supply
voltage and a clock frequency of 1 GHz, the 10-transistor full adder has an average
power dissipation of 81 µW, whereas the T.F.A dissipates about 170 µW.
➢ Figure shows that the carry bit ripples through the chain of the cascaded full adders from a
lower bit to the next higher order.
➢ Of all the adder architectures, the RCA occupies the smallest area and offers good
performance for random input data, but it is an unfavorable choice for circuits with non-random
inputs because its delay depends heavily on the length of the carry propagation
path.
➢ Since all the full adders are connected together by the carry chain a worst-case addition will
require the carry to ripple from the position of the least significant bit to that of the most
significant bit.
➢ The worst-case delay increases linearly with the length of the carry propagation path, which
depends on the number of bits n in the operands.
➢ Carry propagation can, however, be sped up by exploiting faster logic circuit technologies
and faster full adder designs. The RCA is also subject to a glitching problem.
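The dependence of the RCA delay on the carry propagation path can be illustrated with a small behavioural model. This is a sketch under stated assumptions: operands are LSB-first bit lists, and the "longest ripple" count (the number of consecutive stages that forward an incoming carry of 1) is used here as a stand-in for delay. The helper names are hypothetical.

```python
def to_bits(value: int, n: int) -> list[int]:
    """LSB-first list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Return (sum_bits, carry_out, longest_ripple) for LSB-first operands."""
    sum_bits, carry, run, longest = [], cin, 0, 0
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b
        # A stage extends the ripple only if it forwards an incoming carry of 1.
        run = run + 1 if (propagate and carry) else 0
        longest = max(longest, run)
        sum_bits.append(propagate ^ carry)
        carry = (a & b) | (carry & propagate)
    return sum_bits, carry, longest
```

For example, adding 0xFF and 0 with Cin = 1 on 8 bits makes the carry ripple through all 8 stages, whereas adding 5 and 3 on 4 bits ripples through only 2.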
Example:
Consider a 4-bit RCA whose static simulation is shown in the figure. Here we assume
that the inputs Ai are set to zero, whereas Bi and Cin rise from 0 to 1.
➢ Ideally, the outputs Si should remain at zero. Because of the delay of the
carry signal along the chain of cascaded full adders, however, the outputs display spurious
transitions, as shown in the figure below.
Fig: Power dissipation versus power supply voltage at various clock frequencies
➢ This is known as the glitching phenomenon. These spurious dynamic transitions cause extra
power dissipation.
➢ As mentioned earlier, the carry propagation time for an RCA can be minimized by
utilizing various implementations of enhanced full adder architectures.
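The glitching example (Ai = 0, Bi and Cin rising from 0 to 1) can be reproduced with a very simple timing model. The sketch below assumes every full adder has a delay of exactly one time step for both its sum and carry outputs; this unit-delay model and the function names are assumptions made for illustration only.

```python
def simulate_rca_glitches(n: int = 4, steps: int = 5) -> list[list[int]]:
    """Return the sum outputs at each time step after the inputs switch."""
    a = [0] * n
    b = [1] * n                  # all Bi have risen from 0 to 1
    cin = 1                      # Cin has risen from 0 to 1
    carries = [0] * (n + 1)      # internal carries still hold their old value 0
    sums = [0] * n               # all sum outputs start at 0
    history = [list(sums)]
    for _ in range(steps):
        # Carry currently visible at the input of each stage.
        carries_in = [cin] + carries[1:n]
        new_sums = [a[i] ^ b[i] ^ carries_in[i] for i in range(n)]
        new_carries = [0] + [
            (a[i] & b[i]) | (carries_in[i] & (a[i] | b[i])) for i in range(n)
        ]
        sums, carries = new_sums, new_carries
        history.append(list(sums))
    return history


def count_transitions(history: list[list[int]]) -> list[int]:
    """Number of output changes seen on each sum bit."""
    n = len(history[0])
    return [
        sum(1 for t in range(1, len(history)) if history[t][i] != history[t - 1][i])
        for i in range(n)
    ]
```

In this model S0 never changes, while S1, S2 and S3 each rise and fall once before settling back to 0; the higher the bit, the longer it stays at the wrong value because the carry reaches it later.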
Example:
Consider two prototypes of a 32-bit RCA. One prototype uses a
transmission gate full adder (T.F.A), whereas the other uses a 10-transistor full
adder constructed with a two-transistor inverter driver (10-T.F.A).
➢ At a power supply voltage of 2.8 V, the critical path delay of the 32-bit RCA that uses the
T.F.A is 7.2 ns, while it is 4.1 ns for the 10-transistor full adder
prototype, a speed improvement of 44 percent over the former.
➢ As for power consumption, the 10-transistor prototype
dissipates 2.1 mW, which is 81 percent less than the 11 mW dissipated by the T.F.A prototype.
➢ Both 32-bit RCAs were simulated at a supply voltage of 2.8 V and a clock frequency of
125 MHz. The power consumption at different supply voltages and clock frequencies
is shown in the figure.
➢ It is clear that the 10-transistor prototype dissipates less power than the T.F.A prototype
over the operating range of 2.8 V to 5 V.
➢ It can operate satisfactorily at frequencies up to 350 MHz at a supply voltage of 5 V.
➢ This means that large architectures can be built to operate at very high frequencies without
compromising the small area and low power characteristics that are the main criteria for
today's evolving technology.
➢ Carry ripple delays grow linearly with the size of the input operand for the RCA, but these
delays can be shortened by generating the carries of each stage in parallel.
➢ The carry look-ahead adder (CLA) has a propagation time of O(log n) and an area requirement
of O(n log n).
➢ The delay time of the CLA architecture therefore exhibits logarithmic dependency on the
size of the adder, which allows the propagation delay of the carry signal to be minimized.
➢ In the CLA, a carry does not depend explicitly on the preceding one. Instead, it
can be expressed as a function of the relevant propagate and generate signals, Pi and Gi,
as well as the initial carry-in Cin. The CLA therefore achieves better delay
reduction.
➢ On the other hand, the CLA consumes more area and power because of its large number of logic
gates.
➢ For particular combinations of the inputs Ai and Bi, the propagate signal Pi determines whether
the carry-in to the ith block propagates to the output, whereas the generate signal Gi
determines whether a carry-out is produced inside the block independently of the incoming carry.
Gi = Ai .Bi
Pi = Ai xor Bi
➢ Carry generation occurs when Ai = Bi = 1: a carry of 1 is produced at the ith position. When
Ai = Bi = 0, a carry of 0 is produced.
➢ Carry propagation, on the other hand, occurs when Ai ≠ Bi. If this holds for every
i = 0, 1, 2, 3, 4, 5, then Cin is said to propagate through the fifth bit position.
➢ Besides the Pi and Gi signals, the Boolean expressions for the CLA are
Si = Pi xor Ci
Ci+1 = Gi +Pi .Ci
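As a rough illustration, the recurrence Ci+1 = Gi + Pi.Ci can be unrolled for a 4-bit block so that every carry depends only on the Gi, Pi and the initial carry C0. The function name `cla4` and the LSB-first bit-list convention are assumptions made for this sketch.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; returns (sum_bits, c4), LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gi = Ai.Bi
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Pi = Ai xor Bi
    # Each carry is a two-level AND-OR function of G, P and C0 only.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sum_bits = [p[i] ^ carries[i] for i in range(4)]   # Si = Pi xor Ci
    return sum_bits, c4
```

The number of product terms and the gate fan-in grow with each bit position, which is the area cost that the unrolled form pays for its short carry delay.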
➢ The logic schematic of a 4-bit carry generator and the block diagram of a 4-bit CLA are
depicted in figure b and c respectively.
➢ As seen in fig. (b), the carry generation requires only two gate delays. This makes the addition
of two n-bit operands extremely fast compared to the RCA.
➢ However, this logic circuit costs more gates to implement, because for large values of n a
huge number of gates and gates with very large fan-in are required.
where k = 3, 7, 11, 15.
Gk* denotes the group generate carry and Pk* denotes the group propagate carry.
➢ Gk* and Pk* are used to generate the group carry-ins.
C4 = G3* + C0 .P3*
C8 = G7* + G3*.P7*+C0.P3*.P7*
C12 = G11*+ G7*.P11*+G3*.P7*.P11*+C0.P3*.P7*.P11*
The outputs of the look-ahead carry generator, C4, C8 and C12, serve as inputs to the subsequent
groups.
The operation of the 16-bit CLA has four steps.
✓ First, all the groups produce the bit generate carries Gi and bit propagate carries Pi.
✓ Second, each group produces its group generate carry Gk* and group propagate carry Pk*, which
are generated in parallel.
✓ Next, the carry look-ahead generator produces the group carries C4, C8 and C12, which are fed
directly to group 1, group 2 and group 3, respectively.
✓ Lastly, all four groups generate their individual internal carries and then the sum bits.
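The four steps of the 16-bit CLA can be sketched behaviourally as follows. The group carries C4, C8 and C12 follow the equations given above. The expressions used for the group signals (Gk* = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 and Pk* = P3.P2.P1.P0 within each 4-bit group) and for C16 are the standard forms and are assumed here, since they are not spelled out in these notes; the name `cla16` is likewise illustrative.

```python
def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit group carry look-ahead addition; returns (sum, c16)."""
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    # Step 1: bit generate and bit propagate signals.
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    # Step 2: group generate and group propagate (assumed standard forms).
    group_g, group_p = [], []
    for base in range(0, 16, 4):
        g0, g1, g2, g3 = g[base:base + 4]
        p0, p1, p2, p3 = p[base:base + 4]
        group_g.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))
        group_p.append(p3 & p2 & p1 & p0)
    g3s, g7s, g11s, g15s = group_g
    p3s, p7s, p11s, p15s = group_p
    # Step 3: look-ahead carry generator.
    c4 = g3s | (c0 & p3s)
    c8 = g7s | (g3s & p7s) | (c0 & p3s & p7s)
    c12 = g11s | (g7s & p11s) | (g3s & p7s & p11s) | (c0 & p3s & p7s & p11s)
    c16 = g15s | (p15s & c12)          # assumed: same pattern, one level up
    # Step 4: internal carries (Ci+1 = Gi + Pi.Ci) and sum bits per group.
    total = 0
    for group, carry in enumerate((c0, c4, c8, c12)):
        for i in range(4 * group, 4 * group + 4):
            total |= (p[i] ^ carry) << i
            carry = g[i] | (p[i] & carry)
    return total, c16
```

For instance, `cla16(0xFFFF, 1)` yields a sum of 0 with a carry-out of 1, matching ordinary 16-bit addition.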
A variation of the basic CLA addition algorithm, namely the ELM adder, is analyzed next. The
ELM addition algorithm incorporates a binary tree of simple processors running in O(log n) time,
and it is also based on the concept of carry propagate and carry generate. The figure shows the
block diagram of an 8-bit ELM adder.
In terms of worst case delay the ELM adder exhibits the best delay performance followed by the
CLA and the RCA. This is because the ELM adder has the least number of worst-case gate
delays, whereas the RCA suffers the greatest delay due to the long carry chain. Meanwhile, the
CLA dissipates the most power when compared to its ELM and RCA counterparts.
Fig: (a) Conceptual representation. (b) CMOS Realization of the one-stage MCC
The Manchester adder uses the Manchester carry chain (MCC) as its carry network. The conceptual
representation and CMOS realization of a one-stage MCC are depicted in the figure. Referring to
fig. (a), a one-stage MCC can be conceptually analyzed as having three switches, each controlled
by one of the signals Gi, Pi and ANi from the above equations. At any time, only one of the three
signals Gi, Pi and ANi is at logic 1. The carry-out signal Ci-1 is connected to 0 if ANi is high,
to 1 if Gi is high, and to the incoming carry Cin if Pi is high.
The CMOS implementation of the MCC is illustrated in fig. (b). Once a carry is generated, it
quickly propagates along the carry chain composed of transmission gates until it is finally
absorbed.
Buffers are usually inserted between stages to partition the n bits into separate groups in order
to reduce the delay and strengthen the carry signal.
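The three-switch view of one MCC stage can be modelled in a few lines. The sketch assumes the usual definitions Gi = Ai.Bi, Pi = Ai xor Bi and ANi = NOT(Ai).NOT(Bi) (the annihilate, or kill, signal); the definition of ANi in particular is an assumption, because the equations it comes from are not reproduced in these notes. Function names are illustrative.

```python
def mcc_stage(a: int, b: int, carry_in: int) -> int:
    """One Manchester carry chain stage seen as three switches."""
    g = a & b                  # generate switch: ties carry-out to 1
    p = a ^ b                  # propagate switch: passes the incoming carry
    an = (1 - a) & (1 - b)     # annihilate switch (assumed): ties carry-out to 0
    assert g + p + an == 1     # exactly one switch is closed at any time
    if an:
        return 0
    if g:
        return 1
    return carry_in


def manchester_add(a_bits, b_bits, cin=0):
    """Chain the stages; returns (sum_bits, carry_out), LSB first."""
    carry, sum_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append((a ^ b) ^ carry)
        carry = mcc_stage(a, b, carry)
    return sum_bits, carry
```

Because exactly one switch is closed per stage, the carry-out node is always driven and never shorted between two conflicting levels.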
It can be seen that the classical CSL averages a 56.9 percent delay reduction at the cost
of a much larger area compared to the RCA.
Hybrid adders, which combine two or more pure design
methods, aim to reduce power dissipation, improve cost effectiveness and achieve other
performance enhancements as well.
Its 16-bit implementation is illustrated in the figure. Pi, Gi, Si and Ci+1 denote the propagate
signal, generate signal, sum signal and carry-out signal for each bit i respectively, where
i = 0, 1, 2, ..., 15.
The pair of CSL adder blocks may be based on the MCC adder, which supplies the required
generate and propagate signals, gi and pi, to the look-ahead carry generator. The multiplexers
then select the final carry C16, and also the sum bits, when the block carry-in signals are known.
The RCA uses a row of cascaded binary FAs to compute the sum of two
operands. With a slight modification, this row of FAs can also be viewed as a mechanism
that reduces three binary numbers to two binary numbers in multi-operand addition. This method
is used in the carry save adder (CSA), which is in effect an RCA with its carries saved rather
than propagated. The CSA operator is therefore often called a 3:2 counter. The block diagrams of
the RCA and the CSA are depicted in the figure below.
Fig: Sample block diagram of (a) the RCA; (b) the CSA
A CSA tree consists of CSA operators and one adder at the root of the tree. The CSA operators
are used to transform an arbitrary number of operands in the addition process to produce two
adding operands, after which the adder at the root of the CSA tree computes the final sum.
Fig: the addition process (a) without CSA operation (b) with CSA operation
The figure shows the addition of three 1-bit binary numbers A, B and C, implemented without and
with the CSA operator, respectively.
The 1-bit multi-operand addition can be extended to n-bit multi-operand addition by
cascading the CSA operators. An n-bit CSA consists of n disjoint FAs operating in parallel.
Each FA has three ith-bit inputs and generates two outputs, namely an ith-bit partial sum, S, and
an ith-bit carry, C.
To add more than three operands, a second or further levels of CSA operators are used.
They receive S and C from the previous CSA operator level, together with
another input operand, and produce a new set of S and C values. The levels with CSA operators
contain no carry propagation. The carries propagate only in the last step.
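A single CSA level can be sketched as n independent full adders working on LSB-first bit lists. The function names are hypothetical; the point of the sketch is that no carry moves between bit positions inside the level, and that the carry vector has to be weighted by 2 when the two results are finally added.

```python
def to_bits(value: int, n: int) -> list[int]:
    """LSB-first list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_save_level(a_bits, b_bits, c_bits):
    """n disjoint full adders: returns (partial_sum_bits, carry_bits)."""
    partial_sum = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, c_bits)]
    carry = [(a & b) | (c & (a | b)) for a, b, c in zip(a_bits, b_bits, c_bits)]
    return partial_sum, carry
```

For the operands 5, 6 and 7 on 4 bits, the level produces S = 4 and C = 7, and S + 2C = 18 equals 5 + 6 + 7.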
The figure below shows a CSA for the addition of four 4-bit binary numbers A, B, C and D
with an initial carry-in C0.
The implementation of the CSA can be further expanded to add k operands. Here, (k-2) CSA
levels and one carry propagate adder (CPA) are required to realize the addition operation. The
time to obtain the sum is
T = (k-2).TCSA + TCPA
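The k-operand case can be sketched on plain non-negative Python integers, with the final `+` standing in for the CPA. The function names and the delay values passed to `total_delay` are illustrative assumptions; only the formula T = (k-2).TCSA + TCPA comes from the text.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 reduction on whole words: returns (partial_sum, shifted_carry)."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (z & (x | y))) << 1   # carries have weight 2
    return partial_sum, carry


def multi_operand_add(operands: list[int]) -> tuple[int, int]:
    """Add k >= 3 operands; returns (total, number_of_csa_levels)."""
    if len(operands) < 3:
        raise ValueError("need at least three operands")
    s, c = csa(*operands[:3])
    levels = 1
    for operand in operands[3:]:
        s, c = csa(s, c, operand)            # one more CSA level per extra operand
        levels += 1
    return s + c, levels                     # the only carry-propagating step


def total_delay(k: int, t_csa: float, t_cpa: float) -> float:
    """T = (k - 2) * T_CSA + T_CPA."""
    return (k - 2) * t_csa + t_cpa
```

With four operands, for example `multi_operand_add([3, 5, 7, 9])`, two CSA levels are used before the single carry-propagating addition, in line with the (k-2) term.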
Performance Evaluation:
The timing and comparison for the two operation trees, without and with CSA implementation, is
illustrated in the table below.
Number of bits n    8      24     40     56     64     8      24     40     56     64
Without CSA         3.12   8.46   13.16  17.17  19.33  802    1337   2245   3412   3873
With CSA            2.72   8.06   12.77  15.57  18.12  364    1186   1993   2934   3341
Reduction (%)       13     5      3      9      6      9      11     11     14     14
The figure below depicts the technology trends of the microprocessor (MPU) printed gate length
and power supply voltage, beginning with the year 2001 and projecting up to the year 2006.
Most of the process technology studies for low-voltage and low-power applications converge on
the conclusion that scaled BiCMOS/CMOS technology will remain the dominant solution in the
future. The technology was at 95 nm in 2001 and was reduced to 65 nm in 2003. It is conceivable
that, once the problem of manufacturing yield is overcome, the gate length will shrink
to 13 nm by 2016. As for the power supply voltage, it was 1.2 V in 2001 and is expected to undergo
a ladder-like reduction to 0.9 V by 2007. In the long term, it is predicted to continue to
fall to 0.6 V by 2016 due to probability and reliability issues.
A high-speed adder with low power consumption has become a crucial component of
the processor, because it is heavily used in the Arithmetic Logic Unit, the Floating Point Unit,
and for address generation during cache or memory access.
The relentless drive for adders with low power dissipation can be addressed at various design
levels, namely
a. Architecture level
b. Circuit level
c. Layout level
d. Device level and
e. Process Technology Level
At the circuit level, to achieve considerable power savings, the designer can choose among the
many different adder types available. Another potential approach is to
choose a suitable logic style for a given adder type.
CMOS logic styles can be categorized into static and dynamic logic styles. Static logic families
evaluate the output whenever there is a variation in the input, while dynamic logic gates evaluate
the output only once in each clock cycle.
In contrast to static gates, dynamic gates are clocked and work in precharge and
evaluation phases. Static logic eliminates the precharge phase and thus avoids the extra
power dissipation caused by clocking.
Static logic
Static logic circuits allow versatile implementation of logic functions based on the static, or
steady-state, behavior of simple CMOS structures. A typical static logic gate holds its output
levels as long as the power supply is provided. This approach, however, may require a large number
of transistors to implement a function, and may cause considerable time delay. The basic
operation of static CMOS logic can be explained with the example of a 2-input NAND gate. There is
a conducting path between the output node and ground only if the input voltages VA and VB are
both at the logic-high value. If either input is at the logic-low value, a path is created between
the voltage supply and the output node. That is, except during switching, the output is connected
to either VDD or GND via a low-resistance path.
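The NAND example can be captured as a switch-level sketch: two series nMOS devices form the pull-down path and two parallel pMOS devices form the pull-up path. This is a purely logical model with illustrative names; it ignores voltages, delays and the brief overlap during switching.

```python
def static_nand(va: int, vb: int) -> int:
    """Steady-state output of a 2-input static CMOS NAND gate."""
    pull_down = bool(va and vb)              # series nMOS: both inputs must be high
    pull_up = bool((not va) or (not vb))     # parallel pMOS: either input low
    # In steady state exactly one network conducts, so the output is always
    # tied to VDD or GND and never floats or shorts the supply.
    assert pull_down != pull_up
    return 1 if pull_up else 0
```

The assertion inside the function expresses the complementary nature of the two networks, which is what gives static CMOS its always-driven output.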
Dynamic logic
In high-density, high-performance digital implementations where reduction of circuit delay and
silicon area is a major objective, dynamic logic circuits offer several significant advantages over
static logic circuits. Fig. 2 shows a generalized CMOS dynamic logic circuit. The operation of
all dynamic logic gates depends on the temporary storage of charge in parasitic node capacitances.
This operational property necessitates periodic updating of internal node voltage levels, since
the charge stored in a capacitor cannot be retained indefinitely. Consequently, dynamic logic
circuits require periodic clock signals in order to control charge refreshing. In the following, a
dynamic CMOS circuit technique is introduced that significantly reduces the number of transistors
used to implement any logic function. The circuit is based on first precharging the output
node capacitance and subsequently evaluating the output level according to the applied inputs.
The precharge phase sets the circuit to a predefined initial state, while the actual logic
response is determined during the evaluation phase. Static CMOS offers good performance but
cannot keep up with dynamic logic styles in terms of propagation delay. The shorter delays,
however, mostly have to be traded off for increased power dissipation.
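The precharge and evaluation phases can be illustrated with a small stateful model of a dynamic 2-input NAND gate. The sketch assumes an n-type gate whose output node is precharged high while the clock is low and conditionally discharged while the clock is high, with ideal charge storage (no leakage). The class name `DynamicNand` is hypothetical.

```python
class DynamicNand:
    """Behavioural model of a precharge/evaluate dynamic NAND gate."""

    def __init__(self) -> None:
        self.out = 1              # charge state of the output node

    def step(self, clk: int, a: int, b: int) -> int:
        if clk == 0:
            self.out = 1          # precharge: output capacitance charged to VDD
        elif a and b:
            self.out = 0          # evaluate: pull-down network discharges the node
        # Otherwise the node simply keeps its stored charge.
        return self.out
```

A short usage example: after `step(0, 1, 1)` the output is 1 (precharged), `step(1, 1, 1)` discharges it to 0, and a following `step(1, 0, 0)` still returns 0 because the node cannot be recharged until the next precharge phase. This is why the inputs of a dynamic gate must be stable during evaluation.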
The series connection of pMOS or nMOS devices requires increased width in order to supply a
reasonable conducting current to drive capacitive loads. This is because pMOS or
nMOS devices connected in series can be visualized as a number of cascaded transistors. The delay
time imposed by these devices is defined by
$\tau = C \cdot R$
$\frac{1}{R} \propto \frac{W}{L}$
Here the channel width W is inversely proportional to R; therefore, in order to minimize the delay
time, W must be increased.
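The sizing argument can be made concrete with a first-order calculation. The sketch below assumes each transistor behaves as a resistor R = k.L/W, that series devices add their resistances, and that the load capacitance stays fixed when W changes (in a real circuit the wider devices also add capacitance). All parameter names and the normalized values are hypothetical.

```python
def stack_delay(n_series: int, width: float, length: float = 1.0,
                cap: float = 1.0, k: float = 1.0) -> float:
    """First-order delay tau = C * R for n_series transistors in series."""
    r_single = k * length / width       # 1/R is proportional to W/L
    return cap * n_series * r_single    # series resistances add up
```

With normalized units, a single device of width 1 gives a delay of 1, a stack of three such devices gives 3, and widening each device in the stack to 3 brings the delay back to about 1.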
This logic style eliminates the problem of carefully sizing the series transistors, and
requires only half as many transistors as the static CMOS XOR gate. When the
output of the nMOS pass transistor network at node X is logically high, at (VDD - Vth), where
Vth is the threshold voltage, it causes a major setback by inducing an incomplete turn-off of
the pMOS in the inverter, thus resulting in a high short-circuit current. To restrain this
current, a pMOS device is coupled across the output of the inverter gate in order to pull
node X up to full VDD.
Another logic design that uses pass transistors is DPL, which is a variation of CPL. The
XOR/XNOR gate using DPL is shown below.
By using both pMOS and nMOS devices, DPL avoids the nMOS
threshold voltage drop problem of the CPL design.
The following figure shows the XOR/XNOR gate in dual-rail domino dynamic logic.
Contrary to static techniques, dynamic techniques require a precharge and an evaluation phase.
The precharge stage occurs when the CLK signal is low, while the evaluation stage
takes place when the clock signal is high. Because of the precharge and evaluation
phases, dynamic design eliminates all the spurious transitions and their corresponding power
consumption, which are intrinsically present in any static logic design.