EC8095 NOTES STANN - by WWW - Easyengineering.net 1a
MOS Transistor, CMOS logic, Inverter, Pass Transistor, Transmission gate, Layout Design Rules,
Gate Layouts, Stick Diagrams, Long-Channel I-V Characteristics, C-V Characteristics, Non ideal I-V
Effects, DC Transfer characteristics, RC Delay Model, Elmore Delay, Linear Delay Model, Logical
effort, Parasitic Delay, Delay in Logic Gate, Scaling.
INTRODUCTION: (VLSI)
In 1958, Jack Kilby built the first integrated circuit flip-flop at Texas Instruments.
Bell Labs developed the bipolar junction transistor. Bipolar transistors were more reliable,
less noisy and more power-efficient than the vacuum tubes they replaced.
In the 1960s, Metal Oxide Semiconductor Field Effect Transistors (MOSFETs) began to enter
production.
MOSFETs offer the compelling advantage that they draw almost zero control current
while idle.
They come in two flavors: nMOS and pMOS, using n-type and p-type silicon respectively.
In 1963, Frank Wanlass at Fairchild described the first logic gates using MOSFETs.
Fairchild’s gates used both nMOS and pMOS transistors, earning the name Complementary
Metal Oxide Semiconductor (CMOS).
Power consumption became a major issue in the 1980s as hundreds of thousands of
transistors were integrated onto a single die.
CMOS processes were widely adopted and replaced nMOS and bipolar processes for nearly all
digital logic applications.
In 1965, Gordon Moore observed that plotting the number of transistors that can be most
economically manufactured on a chip gives a straight line on a semilogarithmic scale.
Very large scale Integration:
Very Large Scale Integration (VLSI) places up to lakhs (hundreds of thousands) of gates on a
chip. Example: 16-bit microprocessor (8086).
The feature size of a CMOS manufacturing process refers to the minimum dimension of a
transistor that can be reliably built.
Ultra large scale Integration:
Ultra Large-Scale Integration (ULSI) is the process of integrating millions of transistors on a
single silicon semiconductor microchip.
**********************************************************************************
MOS transistor
Explain the basic concept of nMOS and pMOS transistor with relevant symbol.
A Metal-Oxide-Semiconductor (MOS) structure is created by superimposing layers of
conducting and insulating materials.
CMOS technology provides two types of transistors: the n-type transistor (nMOS) and the
p-type transistor (pMOS).
The transistor consists of a stack of the conducting gate, an insulating layer of silicon dioxide
(SiO2) and the silicon wafer, also called the substrate, body or bulk.
An nMOS transistor consists of n-type source and drain regions with a p-type body.
A pMOS transistor consists of p-type source and drain regions with an n-type body.
The MOS transistor is a majority-carrier device, in which the current in a conducting channel
is controlled by the gate voltage.
In an nMOS transistor, the majority carriers are electrons.
In a pMOS transistor, the majority carriers are holes.
Figure 2 shows a simple MOS structure. The top layer of the structure is a good conductor
called the gate.
Transistor gate is polysilicon, i.e., silicon formed from many small crystals. The middle layer
is a very thin insulating film of SiO2, called the gate oxide. The bottom layer is the doped
silicon body.
Figure 2 shows a p-type body, in which the carriers are holes. The body is grounded and a
voltage is applied to the gate.
The gate oxide is a good insulator, so almost zero current flows from the gate to the body.
Inversion layer:
In Figure 2(c), a positive potential greater than the threshold voltage (Vt) is applied,
attracting more positive charge to the gate.
The holes are repelled and some free electrons in the body are attracted to the region under the
gate. This conductive layer of electrons in the p-type body is called the inversion layer.
The threshold voltage depends on the number of dopants in the body and the thickness tox of
the oxide.
Figure 2: MOS structure demonstrating (a) accumulation, (b) depletion, and (c) inversion layer
Draw the small signal model of device during cut-off, linear and saturation. (April 2018)
Discuss the cutoff, linear and saturation region operation of MOS transistor. (Nov 2009)
The MOS transistor operates in cutoff region, linear region and saturation region.
Downloaded From: www.EasyEngineering.net
Cutoff region:
In Figure 3(a), the gate-to-source voltage (Vgs) is less than the threshold voltage (Vt) and
source is grounded.
Junctions between the body and the source or drain are reverse biased, so no current flows.
Thus, the transistor is said to be OFF and this mode of operation is called cutoff.
If Vgs < Vt , the transistor is cutoff (OFF).
Linear Region:
In Figure 3(b), the gate voltage is greater than the threshold voltage.
An inversion region of electrons, called the channel connects the source and drain, creating a
conductive path and making the transistor ON.
If Vgs > Vt , the transistor turns ON. If Vds is small, the transistor acts as a linear resistor, in
which the current flow is proportional to Vds.
The number of carriers and the conductivity increase with the gate voltage.
Figure 3: nMOS transistor demonstrating cutoff, linear, and saturation regions of operation
The voltage between drain and source is Vds = Vgs - Vgd. If Vds = 0 (i.e., Vgs = Vgd), there is
no electric field to push current from drain to source.
When a small positive voltage Vds is applied to the drain (Figure 3(c)), current Ids flows
through the channel from drain to source.
This mode of operation is termed linear, resistive, triode, nonsaturated, or
unsaturated.
Saturation region:
The current increases with both the drain voltage and the gate voltage.
If Vds becomes sufficiently large that Vgd < Vt, the channel is no longer inverted near the
drain and becomes pinched off (Figure 3(d)).
As electrons reach the end of the channel, they are injected into the depletion region near the
drain and accelerated toward the drain.
Above this drain voltage, the current Ids is controlled only by the gate voltage. This mode is
called saturation.
If Vgs > Vt and Vds is large, the transistor acts as a current source, in which the current
flow becomes independent of Vds.
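The region conditions described above (Vgs compared with Vt, and Vgd = Vgs - Vds compared with Vt) can be sketched as a small classifier. This is an illustrative sketch only: the function name and the example voltages (Vt = 0.4 V) are assumptions, not values from these notes.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Classify the operating region of an nMOS transistor (idealized view)."""
    if vgs < vt:
        return "cutoff"        # no inversion channel forms
    vgd = vgs - vds            # from Vds = Vgs - Vgd
    if vgd < vt:
        return "saturation"    # channel is pinched off near the drain
    return "linear"            # channel extends from source to drain


# Hypothetical example voltages, with an assumed threshold of 0.4 V
for vgs, vds in [(0.2, 1.0), (1.0, 0.1), (1.0, 1.0)]:
    print(f"Vgs={vgs} V, Vds={vds} V -> {nmos_region(vgs, vds, vt=0.4)}")
```

The saturation test uses Vgd < Vt, which is the same condition as Vds > Vgs - Vt.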
The pMOS transistor in Figure 4 operates in just the opposite fashion. The n-type body is tied
to a high potential, so the junctions of the p-type source and drain are normally reverse-biased.
When the gate is also at a high potential, no current flows between drain and source.
When the gate voltage is lowered by a threshold Vt, holes are attracted to form a p-type
channel beneath the gate, allowing current to flow between drain and source.
Figure 4: pMOS transistor
*****************************************************************************
IDEAL I-V CHARACTERISTICS OF MOS TRANSISTOR
Derive an expression to show the drain current of MOS for various operating region.
Explain one non-ideality for each operating region that changes the drain current. (NOV
2018)
Explain the dynamic behavior of MOSFET transistor with neat diagram. (April 2018)
Explain the electrical properties CMOS. (Nov 2017)
Explain in detail about the ideal I-V characteristics of a NMOS and PMOS device. (MAY
2013)
Discuss in detail with necessary equations the operation of MOSFET and its current-voltage
characteristics. (April/May 2011, May 2016).
Derive drain current of MOS device in different operating regions. (Nov/Dec
2014)(May/June 2013) (Nov 2012, Nov 2016)
Explain in detail about the ideal I-V characteristics and non-ideal I-V characteristics of a
NMOS and PMOS device. (May/June 2013)
Derive expressions for the drain-to-source current in the nonsaturated and saturated regions
of operation of an nMOS transistor. (Nov 2007, Nov 2008)
MOS transistor has three regions of operation:
Cutoff or sub threshold region
Linear region
Saturation region
The current through an OFF transistor is zero. When a transistor turns ON (Vgs > Vt), the gate
attracts electrons to form a channel.
The current can be computed from the amount of charge in the channel.
The charge on each plate of a capacitor is Q = CV. Thus, the charge in the channel Qchannel is
Qchannel = Cg (Vgc - Vt)
where Cg : Capacitance of the gate to the channel
Vgc - Vt : Amount of voltage attracting charge to the channel.
If the source is at Vs and the drain is at Vd,
the average channel voltage is Vc = (Vs + Vd)/2 = Vs + Vds/2.
The gate-to-channel voltage Vgc is Vg – Vc = Vgs – Vds/2.
Figure 5: Average gate to channel voltage
If the gate has length L and width W and the oxide thickness is tox, as shown in Figure 6,
then the capacitance Cg is
$$C_g = k_{ox}\varepsilon_0\frac{WL}{t_{ox}} = \varepsilon_{ox}\frac{WL}{t_{ox}} = C_{ox}WL \qquad (1)$$
The εox/tox term is called Cox, the capacitance per unit area of the gate oxide.
Average velocity (v) of carrier is proportional to the lateral electric field (field between source and
drain). The constant of proportionality µ is called the mobility.
$$v = \mu E \qquad (2)$$
The electric field E is the voltage difference between drain and source (Vds) divided by the
channel length (L).
$$E = \frac{V_{ds}}{L} \qquad (3)$$
The time required for carriers to cross the channel is L divided by v.
The current between source and drain is the total amount of charge in the channel divided by the
time required to cross.
$$I_{ds} = \frac{Q_{channel}}{L/v} = \mu C_{ox}\frac{W}{L}\left(V_{gs} - V_t - \frac{V_{ds}}{2}\right)V_{ds} = \beta\left(V_{GT} - \frac{V_{ds}}{2}\right)V_{ds} \qquad (4)$$
where β = µ Cox (W/L) and VGT = Vgs - Vt.
Equation (4) is called linear or resistive, because when Vds << VGT, Ids increases linearly with
Vds, like an ideal resistor.
k’ denotes k prime, k’ = µ Cox, so that β = k’ (W/L).
If Vds > Vdsat = VGT, the channel is no longer inverted in the drain region. The channel is pinched
off.
Beyond this point (called the drain saturation voltage), increasing the drain voltage has no
further effect on current.
Substituting Vds = Vdsat in Eq (4), we can find an expression for the saturation current (Ids)
that is independent of Vds.
$$I_{ds} = \frac{\beta}{2}V_{GT}^{2} \qquad (5)$$
This expression is valid for Vgs > Vt and Vds > Vdsat.
The current in the three regions is summarized as:
$$I_{ds} = \begin{cases} 0, & V_{gs} < V_t \ \text{(cutoff)} \\ \beta\left(V_{GT} - \frac{V_{ds}}{2}\right)V_{ds}, & V_{ds} < V_{dsat} \ \text{(linear)} \\ \frac{\beta}{2}V_{GT}^{2}, & V_{ds} > V_{dsat} \ \text{(saturation)} \end{cases}$$
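As a rough numerical illustration of the three-region long-channel model, the sketch below evaluates Ids with β = µ Cox (W/L) and VGT = Vgs - Vt. The function name and the numeric values (β = 100 µA/V², Vt = 0.4 V) are assumptions chosen only for the example.

```python
def ids_long_channel(vgs: float, vds: float, vt: float, beta: float) -> float:
    """Ideal long-channel drain current of an nMOS transistor, in amperes.

    beta stands for mu * Cox * (W / L); vt is the threshold voltage.
    """
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0                            # cutoff: no channel
    if vds < vgt:
        return beta * (vgt - vds / 2) * vds   # linear region, Eq. (4)
    return 0.5 * beta * vgt ** 2              # saturation region, Eq. (5)


# Hypothetical device: beta = 100 uA/V^2, Vt = 0.4 V
BETA, VT = 100e-6, 0.4
for vds in (0.1, 0.3, 0.6, 1.0):
    i = ids_long_channel(1.0, vds, VT, BETA)
    print(f"Vgs=1.0 V, Vds={vds} V -> Ids = {i * 1e6:.2f} uA")
```

The two ON-state expressions meet at Vds = VGT, so the modeled current is continuous at the boundary between the linear and saturation regions.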
Explain the dynamic behavior of MOSFET transistor with neat diagram. (April 2018)
Discuss the CV characteristics of the CMOS. (Nov 2012, May 2014, Nov
2015, Nov 2016)
Explain the electrical properties CMOS. (Nov 2017)
Each terminal of an MOS transistor has capacitance to the other terminals.
Capacitances are nonlinear and voltage dependent (C-V).
SIMPLE MOS CAPACITANCES MODEL:
The gate of an MOS transistor is a good capacitor. Its capacitance is necessary to attract
charge to invert the channel, so high gate capacitance is required to obtain high Ids.
The gate capacitor can be viewed as a parallel plate capacitor with the gate on top, the channel
on the bottom and the thin oxide dielectric between.
The capacitance is Cg = Cox WL. ----------------(1)
In addition to the gate, the source and drain also have capacitances. These capacitances are
called parasitic capacitors.
The source and drain capacitances arise from the p–n junctions between the source or drain
diffusion and the body. These capacitances are called diffusion capacitance Csb and Cdb.
The depletion region acts as an insulator between the conducting p- and n-type regions,
creating capacitance across the junction.
The capacitance of junctions depends on the area and perimeter of the source and drain
diffusion, the depth of the diffusion, the doping levels and the voltage.
As diffusion has both high capacitance and high resistance, it is generally made as small as
possible in the layout.
DETAILED MOS GATE CAPACITANCE MODEL:
The MOS gate sits above the channel and may partially overlap the source and drain diffusion
areas.
The gate capacitance has two components: (i) the intrinsic capacitance Cgc (over the channel)
and (ii) the overlap capacitances Cgol (to the source and drain).
The intrinsic capacitance was approximated as a simple parallel plate with capacitance
C0 = WLCox.
The intrinsic capacitance has three components representing the different terminals connected
to the bottom plate: Cgb (gate-to-body), Cgs (gate-to-source), and Cgd (gate-to-drain).
The behavior in the three regions (cutoff, linear and saturation) can be approximated as shown
in Table 1.
Table 1: Approximation for intrinsic MOS gate capacitance
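The source diffusion capacitance Csb depends on both the area AS and the sidewall perimeter PS
of the source diffusion region. The area is AS = WD. The total source parasitic capacitance is
$$C_{sb} = AS \cdot C_{jbs} + PS \cdot C_{jbssw}$$
where Cjbs is the capacitance (per unit area) of the junction between the body and the bottom
of the source and Cjbssw is the capacitance (per unit perimeter) of the junction between the
body and the side walls of the source.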
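A small numerical sketch of the area-plus-perimeter calculation may help. It assumes the total source capacitance is AS·Cjbs + PS·Cjbssw, and it assumes the perimeter is that of a full W × D rectangle (PS = 2W + 2D). That perimeter choice and all numeric values are illustrative assumptions, not data from these notes.

```python
def source_diffusion_capacitance(w_um: float, d_um: float,
                                 cjbs: float, cjbssw: float) -> float:
    """Estimate Csb in fF for a rectangular source diffusion.

    w_um, d_um : diffusion width and length in micrometres
    cjbs       : bottom junction capacitance, fF per um^2
    cjbssw     : sidewall junction capacitance, fF per um
    """
    area = w_um * d_um                 # AS = W * D
    perimeter = 2 * w_um + 2 * d_um    # PS, assumed full rectangle perimeter
    return area * cjbs + perimeter * cjbssw


# Hypothetical values: 1.0 um x 0.5 um diffusion, 1.0 fF/um^2, 0.2 fF/um
csb = source_diffusion_capacitance(1.0, 0.5, cjbs=1.0, cjbssw=0.2)
print(f"Csb = {csb:.2f} fF")
```

The same function can be reused for the drain capacitance Cdb by passing the drain geometry.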
In summary, an MOS transistor can be viewed as a four-terminal device with capacitances
between each pair of terminals, as shown in Figure 10.
Figure 10: Capacitance of a MOS Transistor
The gate capacitance includes an intrinsic component and overlap terms with the source
and drain. The source and drain have parasitic diffusion capacitance to the body.
Explain the DC transfer characteristic of CMOS inverter. [APRIL-2015, Nov 2015]
Draw and explain the DC and transfer characteristics of a CMOS inverter with
necessary conditions for the different regions of operation. (Nov/Dec 2011)
(Nov/Dec 2012) (May/June 2013) (April/May 2012) (May/June 2014) (Nov/Dec
2013) (May 2016, May 2017, Nov 2008)
Explain the CMOS inverter DC characteristics. (Nov 2007, Nov 2009)
The DC transfer characteristics of a circuit relate the output voltage to the input voltage.
(i) Static CMOS inverter DC Characteristics:
The DC transfer function (Vout vs. Vin) for the static CMOS inverter is shown in Figure 11.
Table 2: Relationships between voltages for the three regions of operation of a CMOS
inverter
Figure 12(a) shows Idsn and Idsp in terms of Vdsn and Vdsp for various values of Vgsn and Vgsp.
Figure 12(b) shows the same plot of Idsn and |Idsp| in terms of Vout for various values of Vin.
Operating points are plotted on Vout vs. Vin axes in Figure 12(c) to show the inverter DC
transfer characteristics.
The supply current IDD = Idsn = |Idsp| is plotted against Vin in Figure 12(d), showing that both
transistors are momentarily ON as Vin passes through intermediate values between GND and VDD.
The operation of the CMOS inverter can be divided into five regions, as indicated in Figure
12(c).
Table 3: Summary of CMOS inverter operation.
In region A, the nMOS transistor is OFF and the pMOS transistor pulls the output to VDD.
In region B, the nMOS transistor starts to turn ON, pulling the output down.
In region C, both transistors are in saturation.
In region D, the pMOS transistor is partially ON.
In region E, the pMOS transistor is completely OFF, leaving the nMOS transistor to pull the
output down to GND.
(ii) Beta ratio Effects:
For βp = βn, the inverter threshold voltage Vinv is VDD/2.
This allows a capacitive load to charge and discharge in equal times by providing equal
current-source and current-sink capabilities.
An inverter with a beta ratio r = βp/βn other than 1 is called a skewed inverter.
If r > 1, the inverter is HI-skewed. If r < 1, the inverter is LO-skewed. If r = 1, the inverter has
normal skew or is unskewed.
Figure 13 shows the impact of skewing the beta ratio on the DC transfer characteristics.
As the beta ratio is changed, the switching threshold varies.
Derive the noise margins for a CMOS inverter. (May 2010, Nov2016)
(iii) Noise Margins:
Noise margin (noise immunity) is related to the DC voltage characteristics.
The noise margin specifies the allowable noise voltage on the input of a gate such that the
output will not be corrupted.
The two noise margin parameters are the LOW noise margin (NML) and the HIGH
noise margin (NMH).
Figure 14: Noise Margin Definitions
NML is defined as the difference between the maximum LOW input voltage VIL and the maximum
LOW output voltage VOL, i.e., NML = VIL - VOL
NMH is the difference between the minimum HIGH output voltage VOH and the
minimum HIGH input voltage VIH, i.e., NMH = VOH - VIH
Inputs between VIL and VIH are said to be in the indeterminate region or forbidden zone.
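The two definitions NML = VIL - VOL and NMH = VOH - VIH can be checked with a few lines of code. The function name and the voltage levels below are hypothetical and are chosen only to show the arithmetic.

```python
def noise_margins(vol: float, voh: float, vil: float, vih: float):
    """Return (NML, NMH) from the four DC voltage levels of a gate."""
    nml = vil - vol   # LOW noise margin:  NML = VIL - VOL
    nmh = voh - vih   # HIGH noise margin: NMH = VOH - VIH
    return nml, nmh


# Hypothetical levels for a gate with a 1.2 V supply
nml, nmh = noise_margins(vol=0.1, voh=1.1, vil=0.4, vih=0.7)
print(f"NML = {nml:.2f} V, NMH = {nmh:.2f} V")
```

A negative result for either margin would indicate that the gate's output levels are not guaranteed to be read correctly by the next stage.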
(iv) Pass Transistor DC Characteristics:
The nMOS transistors pass 0’s well but 1’s poorly. Figure 15(a) shows an nMOS transistor
with the gate and drain tied to VDD.
Initially, Vs = 0 and Vgs > Vtn, so the transistor is ON and current flows.
As the source voltage rises to Vs = VDD – Vtn, Vgs falls to Vtn and the transistor cuts itself OFF.
Therefore, nMOS transistors attempting to pass a 1 never pull the source above VDD – Vtn.
This loss is called a threshold drop.
The pMOS transistors pass 1’s well but 0’s poorly.
If the pMOS source drops below |Vtp|, the transistor cuts off.
Hence, pMOS transistors only pull down to a threshold above GND, as shown
in Figure 15(b).
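The threshold-drop behaviour can be sketched as a pair of idealized helper functions. This is a simplified model under stated assumptions: body effect is ignored, the helper names are hypothetical, and VDD = 1.0 V with Vtn = |Vtp| = 0.3 V are example values only.

```python
def nmos_pass(v_in: float, v_gate: float, vtn: float) -> float:
    """Output level of an nMOS pass transistor: it cuts off once Vgs falls to Vtn."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in: float, v_gate: float, vtp: float) -> float:
    """Output level of a pMOS pass transistor: it cuts off below gate + |Vtp|."""
    return max(v_in, v_gate + abs(vtp))


VDD, VTN, VTP = 1.0, 0.3, -0.3   # assumed example values
print("nMOS passing 0:", nmos_pass(0.0, VDD, VTN))   # strong 0
print("nMOS passing 1:", nmos_pass(VDD, VDD, VTN))   # degraded 1 (VDD - Vtn)
print("pMOS passing 1:", pmos_pass(VDD, 0.0, VTP))   # strong 1
print("pMOS passing 0:", pmos_pass(0.0, 0.0, VTP))   # degraded 0 (|Vtp|)
```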
Explain in detail about the non ideal I-V characteristics of a CMOS device. (MAY
2013)
Explain channel length modulation and body effect. (Nov 2009, May 2013)
MOS characteristics degrade with temperature. It is useful to have a qualitative
understanding of non ideal effects to predict their impact on circuit behavior.
(i) Mobility Degradation and Velocity Saturation:
Current is proportional to the lateral electric field Elat = Vds /L between source and drain.
A high voltage at the gate of the transistor attracts the carriers to the edge of the channel,
causing collisions with the oxide interface that slow the carriers. This is called
mobility degradation.
Carriers approach a maximum velocity (vsat) when high fields are applied. This
phenomenon is called velocity saturation.
(ii) Channel Length Modulation:
Ideally, the current Ids is independent of Vds for a transistor in saturation.
The p–n junction between the drain and body forms a depletion region with a width Ld that
increases with Vdb, as shown in Figure 16.
The depletion region effectively shortens the channel length to Leff = L - Ld
Hence, increasing Vds decreases the effective channel length.
A shorter channel length results in higher current. Thus, Ids increases with Vds in saturation,
as shown in Figure 16.
Figure 16: Depletion region shortens effective channel length
In the saturation region, Ids is
$$I_{ds} = \frac{\beta}{2}V_{GT}^{2}\left(1 + \frac{V_{ds}}{V_A}\right)$$
The parameter VA is the Early voltage, which is proportional to channel length. This channel
length modulation model is a gross oversimplification of nonlinear behavior.
(iii) Threshold Voltage Effects:
Threshold voltage Vt increases with the source voltage, decreases with the body voltage,
decreases with the drain voltage and increases with channel length.
Body Effect:
When a voltage Vsb is applied between the source and body, it increases the amount of
charge required to invert the channel. Hence, it increases the threshold voltage.
The threshold voltage can be modeled as
$$V_t = V_{t0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$
where Vt0 is the threshold voltage when the source is at the body
potential, φs is the surface potential at threshold and γ is the body
effect coefficient.
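A short numerical sketch of the body effect, assuming the commonly used model Vt = Vt0 + γ(√(φs + Vsb) − √φs). The function name and the parameter values (Vt0 = 0.4 V, γ = 0.4 V^0.5, φs = 0.64 V) are illustrative assumptions, not values taken from these notes.

```python
import math


def threshold_voltage(vt0: float, gamma: float, phi_s: float, vsb: float) -> float:
    """Threshold voltage including body effect (assumed standard model)."""
    return vt0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


# Hypothetical parameters: Vt0 = 0.4 V, gamma = 0.4 V^0.5, phi_s = 0.64 V
for vsb in (0.0, 0.36, 1.0):
    vt = threshold_voltage(0.4, 0.4, 0.64, vsb)
    print(f"Vsb = {vsb:.2f} V -> Vt = {vt:.3f} V")
```

With Vsb = 0 the expression reduces to Vt0, and Vt rises as Vsb increases, matching the qualitative statement in the text.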
(iv) Leakage:
Even when transistors are OFF, they leak small amounts of current.
Leakage mechanisms include subthreshold conduction between source and drain, gate
leakage from the gate to body and junction leakage from source to body and drain to body.
Subthreshold conduction is caused by thermal emission of carriers over the potential
barrier set by the threshold.
Gate leakage is a quantum-mechanical effect caused by tunneling through the extremely
thin gate dielectric.
Junction leakage is caused by current through the p-n junction between the source/drain
diffusions and the body.
SCALING
Discuss the scaling principles and its limits. (MAY 2013, Nov 2017, Nov 2018)
Discuss the principle of constant field and lateral scaling. Write the effects of the above
scaling methods on the device characteristics. (Nov 2012, Dec 2011, Nov 2015, May 2016)
Explain need of scaling, scaling principles and fundamental units of CMOS inverter. (May
2017)
In VLSI design, the transistor feature size has reduced by 30% every two to three years.
Scaling is the reduction of the feature size of transistors.
As transistors become smaller, they switch faster, dissipate less power and become cheaper.
Designers need to predict the effect of feature size scaling on chip performance to plan
future products and to ensure that existing products can be scaled for cost reduction.
Transistor scaling:
Dennard’s Scaling Law predicts that the basic operational characteristics of a MOS
transistor can be preserved and the performance can be improved.
Parameters of a device are scaled by a dimensionless factor S.
These parameters include the following:
All dimensions (in the x, y, and z directions)
Device voltages
Doping concentration densities
Constant field scaling (Full Scaling):
In constant field scaling, electric fields remain the same as both voltage and distance
shrink.
1/S scaling is applied to all dimensions, device voltages and concentration densities.
Ids per transistor is scaled by 1/S.
The number of transistors per unit area is scaled by $S^2$.
Current density is scaled by S and power density remains constant,
e.g., $\frac{1}{S}\cdot\frac{1}{S}\cdot S^{2} = 1$.
Lateral scaling (gate-shrink):
Another approach is lateral scaling, in which only the gate length is scaled.
This is commonly called a gate shrink, because it can be done easily to an existing mask
database for a design.
Ids per transistor is scaled by S.
The number of transistors per unit area is scaled by S.
Current density is scaled by $S^2$ and power density is scaled by $S^2$.
The industry generally scales process generations with a 30% shrink.
This reduces the cost (area) of a transistor by a factor of two.
A 5% gate shrink (S = 1.05) is commonly applied as a process becomes mature, to boost
the speed of components in that process.
Constant voltage scaling (Fixed scaling):
Constant voltage scaling offers quadratic delay improvement as well as cost reduction.
It also maintains continuity in I/O voltage standards. Constant voltage scaling
increases the electric fields in devices.
Ids per transistor is scaled by S.
The number of transistors per unit area is scaled by $S^2$.
Current density is scaled by $S^3$ and power density is scaled by $S^3$.
A 30% shrink with Dennard scaling improves clock frequency by 40% and cuts power
consumption per gate by a factor of 2.
Maintaining a constant field has the further benefit that many nonlinear factors and
wear-out mechanisms are unaffected.
From the 90 nm generation onward, voltage scaling has slowed dramatically due to
leakage. This may ultimately limit CMOS scaling.
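The scaling factors quoted for the three schemes can be collected in a small lookup table and evaluated for a given S. The sketch below only transcribes the exponents of S stated in these notes. The dictionary layout, the function name and the choice S = 1/0.7 for a 30% shrink are assumptions made for illustration.

```python
# Exponent of S for each quantity, as stated in the text for each scaling scheme.
SCALING_EXPONENTS = {
    "constant_field":   {"ids": -1, "density": 2, "current_density": 1, "power_density": 0},
    "lateral":          {"ids": 1,  "density": 1, "current_density": 2, "power_density": 2},
    "constant_voltage": {"ids": 1,  "density": 2, "current_density": 3, "power_density": 3},
}


def scaled(mode: str, quantity: str, s: float) -> float:
    """Factor by which `quantity` changes when the process is scaled by S."""
    return s ** SCALING_EXPONENTS[mode][quantity]


s = 1 / 0.7   # assumed: a 30% shrink multiplies dimensions by 0.7
print(f"S = {s:.2f}, area per transistor scales by {1 / s**2:.2f}")
for mode, table in SCALING_EXPONENTS.items():
    row = {q: round(scaled(mode, q, s), 2) for q in table}
    print(mode, row)
```

In every row the current density exponent equals the sum of the Ids and transistor density exponents, which is a quick consistency check on the table. The area factor of about 0.49 corresponds to the factor-of-two cost reduction quoted for a 30% shrink.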
Propagation delay time (tpd):
Propagation delay time is defined as the maximum time from the input crossing 50% to the
output crossing 50%.
Contamination delay time (tcd):
Contamination delay time is defined as the minimum time from the input crossing 50% to the
output crossing 50%.
Rise time (tr):
Rise time is defined as the time for a waveform to rise from 20% to 80% of its steady-state
value.
Fall time (tf):
Fall time is defined as the time for a waveform to fall from 80% to 20% of its steady-state value.
Edge rate (trf) is the average of the rise and fall times, trf = (tr + tf)/2
Delay estimation response curve:
When an input changes, the output will retain its old value for at least the contamination
delay and take on its new value in at most the propagation delay.
The delays for the output rising are tpdr/tcdr and for the output falling are tpdf/tcdf.
Rise/fall times are also called slopes or edge rates.
Propagation and contamination delay times are also called max-time and min-time
respectively.
The gate that charges or discharges a node is called the driver. The gates and wires being
driven are called the load. Propagation delay is usually simply called delay.
Arrival times and propagation delays are defined separately for rising and falling transitions.
The delay of a gate may be different from different inputs. Earliest arrival times can also
be computed based on contamination delays.
Expression of delay for rising output is tPLH = 0.69 RP.CL
where RP – effective resistance of pMOS transistor
CL – load capacitance of CMOS inverter.
Expression of delay for falling output is tPHL = 0.69 RN.CL
where RN – effective resistance of nMOS transistor
Propagation delay of CMOS inverter is tP = (tPLH + tPHL) / 2
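The three delay expressions above can be evaluated directly. In the sketch below, the function name and the component values (RP = 20 kΩ, RN = 10 kΩ, CL = 10 fF) are hypothetical and serve only to show the arithmetic.

```python
def inverter_delays(r_p: float, r_n: float, c_load: float):
    """Return (tPLH, tPHL, tP) in seconds for a CMOS inverter.

    r_p, r_n : effective pMOS / nMOS resistances in ohms
    c_load   : load capacitance in farads
    """
    t_plh = 0.69 * r_p * c_load    # rising output, charged through the pMOS
    t_phl = 0.69 * r_n * c_load    # falling output, discharged through the nMOS
    t_p = (t_plh + t_phl) / 2      # average propagation delay
    return t_plh, t_phl, t_p


# Hypothetical values: RP = 20 kOhm, RN = 10 kOhm, CL = 10 fF
t_plh, t_phl, t_p = inverter_delays(20e3, 10e3, 10e-15)
print(f"tPLH = {t_plh * 1e12:.1f} ps, tPHL = {t_phl * 1e12:.1f} ps, tP = {t_p * 1e12:.2f} ps")
```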
RC Delay Model:
Discuss in detail about the resistive and capacitive delay estimation of a CMOS inverter
circuit. (MAY 2013)
(or)
Briefly explain about the RC delay model.
The RC delay model approximates the nonlinear transistor I-V and C-V characteristics with an
average resistance and capacitance over the switching range of the gate.
Effective Resistance:
The RC delay model treats a transistor as a switch in series with a resistor.
The effective resistance is the ratio of Vds to Ids.
A unit nMOS transistor is defined to have effective resistance R.
An nMOS transistor of k times unit width has resistance R/k, because it delivers k times as
much current.
A unit pMOS transistor has greater resistance, generally in the range of 2R–3R, because
of its lower mobility.
According to the long-channel model, current decreases linearly with channel length (L)
and hence resistance is proportional to L.
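The width and length dependence of the effective resistance can be sketched as follows. The function name, the default pMOS factor of 2 (taken from the lower end of the 2R–3R range quoted above) and the unit resistance of 10 kΩ are assumptions for illustration only.

```python
def effective_resistance(r_unit: float, width_k: float, length_ratio: float = 1.0,
                         is_pmos: bool = False, pmos_factor: float = 2.0) -> float:
    """Effective resistance of a transistor in the RC delay model.

    r_unit       : resistance R of a unit nMOS transistor, in ohms
    width_k      : width as a multiple k of the unit width
    length_ratio : channel length as a multiple of the unit length
    pmos_factor  : assumed pMOS/nMOS resistance ratio (text quotes 2 to 3)
    """
    if width_k <= 0 or length_ratio <= 0:
        raise ValueError("width_k and length_ratio must be positive")
    r = r_unit * length_ratio / width_k   # R is proportional to L and to 1/k
    return r * pmos_factor if is_pmos else r


R = 10e3  # hypothetical unit nMOS resistance
print(effective_resistance(R, width_k=4))                  # nMOS of 4x unit width
print(effective_resistance(R, width_k=4, is_pmos=True))    # pMOS of 4x unit width
print(effective_resistance(R, width_k=1, length_ratio=2))  # double-length nMOS
```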
Gate and Diffusion Capacitance:
Equivalent RC Circuits:
The figure below shows equivalent RC circuit models for nMOS and pMOS transistors of
width k with contacted diffusion on both source and drain.
The pMOS transistor has approximately twice the resistance of the nMOS
transistor, because holes have lower mobility than electrons.
Figure: RC model of nMOS and pMOS transistors
Stick diagram
Explain about stick diagram in VLSI design. (April 2008)
A stick diagram is a cartoon of a chip layout. It is a paper-and-pencil tool
used to plan the layout of a cell.
The stick diagram resembles the actual layout, but uses "sticks" or lines to represent the
devices and conductors. Figure 17 shows a stick diagram for an inverter.
The stick diagram represents the rectangles with lines, which represent wires and
component symbols.
The stick diagram does not represent all the details of a layout, but it makes some
relationships much clearer and it is simple to draw.
Layouts are constructed from rectangles, but stick diagrams are built from cartoon symbols
for components and wires.
Stick diagram Rules:
Rule 1: When two or more ‘sticks’ of the same type cross or touch each other, that
represents electrical contact.
Rule 2: When two or more ‘sticks’ of different types cross or touch each other, there
is no electrical contact. If electrical contact is needed, the connection must be shown
explicitly.
Rule 3: When poly crosses diffusion, it represents a transistor. If a contact is shown,
then it is not a transistor. A transistor exists where a polysilicon (red) stick crosses
either an n-diffusion (green) stick or a p-diffusion (yellow) stick.
Rule 4: In CMOS, a demarcation line is drawn to avoid touching of p-diff with n-diff.
All pMOS must lie on one side of the line and all nMOS must lie on the other
side.
Figure : Symbols for wires used on various layers
Drawing stick diagrams in color: Red for poly, green for n-diffusion, yellow for
p-diffusion, and shades of blue for metal are typical colors.
A few simple rules for constructing wires from straight-line segments ensure that the stick
diagram corresponds to a feasible layout.
Wires cannot be drawn at arbitrary angles. Only horizontal and vertical wire segments are
allowed.
Two wire segments on the same layer that cross are electrically connected.
Vias to connect wires, which do not normally interact, are drawn as black dots.
Figure 19 shows the stick figures for transistors.
Each type of transistor is represented as poly and diffusion crossings, much as in the layout.
Layout Design Rules
Draw and explain briefly the n-well CMOS design rules. (NOV 2007, April 2008, MAY 2014)
Discuss in detail with a neat layout, the design rules for a CMOS inverter.
Write the layout design rules and draw diagram for four input NAND and NOR. (Nov
2016) (April 2018)
Layout rules are also referred to as design rules.
They can be considered a prescription for preparing the photomasks that are used in the
fabrication of integrated circuits.
The rules are defined in terms of feature sizes (widths), separations and overlaps.
The main objective of the layout rules is to build reliable functional circuits in as small
an area as possible.
Layout design rules describe how small features can be and how closely they can be
reliably packed in a particular manufacturing process.
Design rules are a set of geometrical specifications that dictate the design of the layout
masks.
A design rule set provides numerical values for minimum dimensions and line spacing.
Scalable design rules are based on a single parameter (λ), which characterizes the
resolution of the process. λ is generally half of the minimum drawn transistor channel
length.
This length is the distance between the source and drain of a transistor and is set by the
minimum width of a polysilicon wire.
Lambda based rule (Scalable design rule):
Lambda-based rules round up the dimensions to an integer multiple of λ.
Lambda rules make scaling a layout simple. The same layout can be moved to a new process
simply by specifying a new value of λ.
The minimum feature size of a technology is characterized as 2λ.
Micron Design Rules (Absolute dimensions):
The MOSIS rules are expressed in terms of lambda.
These rules allow some degree of scaling between processes.
In principle, one only needs to reduce the value of lambda and the designs will be valid in
the next process down in size.
In practice, processes rarely shrink uniformly.
Thus, industry usually uses the actual micron design rules for layouts.
Consider a set of micron design rules for a hypothetical 65 nm process.
We can observe that these rules differ slightly but not immensely from lambda-based rules
with lambda = 0.035 µm.
Upper-level metal rules are highly variable depending on the metal thickness. Thicker
wires require greater widths, spacing and bigger vias.
The λ-based rules for layouts with two metal layers in an n-well process are as follows:
Metal and diffusion have minimum width and spacing of 4 λ.
Contacts are 2 λ × 2 λ and must be surrounded by 1 λ on the layers above and below.
Polysilicon uses a width of 2 λ.
Polysilicon overlaps diffusion by 2 λ where a transistor is desired and has a
spacing of 1 λ away where no transistor is desired.
Polysilicon and contacts have spacing of 3 λ from other polysilicon or contacts.
N-well surrounds pMOS transistors by 6 λ and avoids nMOS transistors by 6 λ.
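Because every rule in the list above is a multiple of λ, converting the rule set to absolute dimensions for a given process is a single multiplication. The sketch below transcribes the multiples from the list. The dictionary keys and the function name are hypothetical, and λ = 0.035 µm is the value these notes quote for a hypothetical 65 nm process.

```python
# Rule values in units of lambda, transcribed from the list above.
LAMBDA_RULES = {
    "metal_width": 4,
    "metal_spacing": 4,
    "diffusion_width": 4,
    "diffusion_spacing": 4,
    "contact_size": 2,
    "contact_surround": 1,
    "poly_width": 2,
    "poly_overlap_of_diffusion": 2,
    "poly_to_diffusion_spacing": 1,
    "poly_or_contact_spacing": 3,
    "nwell_surround_of_pmos": 6,
    "nwell_to_nmos_spacing": 6,
}


def rules_in_microns(lambda_um: float) -> dict:
    """Convert the lambda-based rules to micrometres for a given lambda."""
    return {name: multiple * lambda_um for name, multiple in LAMBDA_RULES.items()}


for name, value in rules_in_microns(0.035).items():
    print(f"{name:28s} {value:.3f} um")
```

Moving the same layout to another process only requires calling `rules_in_microns` with a different λ, which is the portability argument made for scalable rules.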
Figure: Simplified λ -based design rules with CMOS inverter layout diagram
Design Rule:
Well Rules:
The n-well is usually a deeper implant than the transistor source/drain implants.
Therefore, it is necessary to provide sufficient clearance between the n-well edges and the
adjacent n+ diffusions.
Transistor Rules:
CMOS transistors are generally defined by at least four physical masks.
These are active (also called diffusion, diff, thinox, OD, or RX), n-select (also called
n-implant, n-imp, or nplus), p-select (also called p-implant, pimp, or pplus) and polysilicon
(also called poly, polyg, PO, or PC).
The active mask defines all areas where n- or p-type diffusion is to be placed or where the
gates of transistors are to be placed.
Contact Rules:
There are several generally available contacts:
Metal to p-active (p-diffusion)
Metal to n-active (n-diffusion)
Metal to polysilicon
Metal to well or substrate
Metal Rules:
Metal spacing may vary with the width of the metal line.
Processes may allow vias to be placed over polysilicon and diffusion regions.
Some processes allow vias to be placed within these areas, but do not allow the vias to
straddle the boundary of polysilicon or diffusion.
Example: NAND3
Draw the layout diagram of NAND. (May 2017)
Horizontal N-diffusion and p-diffusion strips
Vertical polysilicon gates
Metal1 VDD rail at top
Metal1 GND rail at bottom
Draw diagram for four input NAND and NOR gate. (Nov 2017)
Figure: 4-input NOR gate and 4-input NAND gate
The latchup problem arises when parasitic bipolar transistors are formed by the substrate, well and diffusion.
The cause of the latchup effect can be understood by examining the process cross-section of a CMOS inverter, as shown in Figure (a).
The schematic shows a circuit composed of an npn transistor, a pnp transistor, and two resistors connected between the power and ground rails (Figure (b)).
The npn transistor is formed between the grounded n-diffusion source of the nMOS transistor, the
p-type substrate and the n-well.
The resistors are due to the resistance through the substrate or well to the nearest
substrate and well taps.
The cross-coupled transistors form a bistable silicon-controlled rectifier (SCR). Ordinarily, both parasitic bipolar transistors are OFF.
Latchup can be triggered when transient currents flow through the substrate during normal chip power-up.
Latchup prevention is easily accomplished by:
Minimizing Rsub and Rwell.
Using guard rings.
An SOI process avoids latchup entirely, because it has no parasitic bipolar structures.
Process parameters for MOS and CMOS:
CMOS TECHNOLOGIES:
The four main CMOS technologies are
n-Well process
p-Well process
Twin-tub process
Silicon on Insulator
Explain the different steps involved in CMOS fabrication / manufacturing process with neat
diagrams.
(Nov 2007, Nov 2009, Nov 2016, NOV 2018)
Describe with neat diagram the n-well and channel formation in CMOS process. (Nov/Dec
2014)(Nov/Dec 2011) (April/May 2011) (Nov/Dec 2012)
n-WELL PROCESS:
Step 1: Start with blank wafer
The first step is to form the n-well:
– Cover the wafer with a protective layer of SiO2 (oxide).
– Remove the layer where the n-well should be built.
(Cross-section: p substrate)
Step 2: Oxidation
Grow SiO2 on top of the Si wafer at 900 – 1200 °C with H2O or O2 in an oxidation furnace.
(Cross-section, top to bottom: SiO2, p substrate)
Step 3: Photoresist
• Spin on photoresist.
– Photoresist is a light-sensitive organic polymer.
– It softens where exposed to light.
(Cross-section, top to bottom: photoresist, SiO2, p substrate)
Step 4: Lithography
• Expose the photoresist through the n-well mask.
• Strip off the exposed photoresist.
(Cross-section, top to bottom: photoresist, SiO2, p substrate)
Step 5: Etch
• Etch the oxide with hydrofluoric acid (HF).
• HF only attacks the oxide where the resist has been exposed.
(Cross-section, top to bottom: photoresist, SiO2, p substrate)
Step 6: Strip Photoresist
• Strip off the remaining photoresist using a mixture of acids.
(Cross-section, top to bottom: SiO2, p substrate)
Step 7: n-well
• The n-well is formed with diffusion or ion implantation.
(Cross-section, top to bottom: SiO2, n well, p substrate)
Step 9: Polysilicon
• Deposit a thin layer of gate oxide. Use chemical vapor deposition (CVD) to form polysilicon and dope it heavily to increase conductivity.
(Cross-section, top to bottom: polysilicon, thin gate oxide, n well, p substrate)
********
P-WELL PROCESS:
A common approach to p-well CMOS fabrication is to start with moderately doped n-type
substrate (wafer), create the p-type well for the n-channel devices and build the p-channel
transistor in the native n-substrate.
Explain the twin tub process with a neat diagram. (Nov 2007, April 2008)
Twin-tub process:
Step 1:
An n-type substrate is taken initially, as shown in the figure.
Step 2:
The next step is epitaxial layer deposition. A lightly doped epitaxial layer is deposited above the n-substrate.
Step 3:
The next step is tub formation. Two wells are formed, namely the n-well and the p-well.
Step 4:
A polysilicon layer is formed over the entire substrate.
Step 5:
Polysilicon gates are formed for the n-well and p-well by using a photo-etching process.
n+ diffusion is formed in the n-well and p+ diffusion is formed in the p-well. These are used for the VDD contact and the VSS contact. This is known as substrate formation.
Step 6:
Then, contact cuts are defined as in the n-well process, followed by metallization.
Elmore’s Delay
What is meant by Elmore’s delay and give expression for Elmore’s delay?
The Elmore delay model estimates the delay from a source switching to one of the leaf nodes.
The delay is the sum, over each node i, of the capacitance C_i on the node multiplied by the effective resistance R_{i-to-source} between that node and the source.
Propagation delay time:
$$ t_{pd} = \sum_{\text{nodes } i} R_{i\text{-to-source}} \, C_i $$
Figure: RC delay equivalent for series of transistors
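For an RC ladder such as the series-transistor equivalent, the Elmore sum can be evaluated by accumulating resistance while walking from the source toward the last node. The following is a minimal sketch; the function name and the unit values in the example are assumptions chosen for illustration.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder, measured at the last node.

    resistances[k] is the resistor between node k-1 (or the source, for
    k = 0) and node k; capacitances[k] is the capacitance on node k.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance and one capacitance per node")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r            # total resistance from the source to this node
        delay += r_to_source * c    # this node's contribution R_i-to-source * C_i
    return delay


# Illustrative three-node ladder with R = 1 and C = 1 everywhere:
# 1*1 + 2*1 + 3*1 = 6 (in units of RC).
example_delay = elmore_delay([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(example_delay)
```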
Linear delay model
The RC delay model shows that delay is a linear function of the fanout of a gate.
The normalized delay of a gate can be expressed in units of τ as d = f + p,
where p is the parasitic delay inherent to the gate when no load is attached and
f is the effort delay or stage effort, which depends on the complexity and the fanout of the gate.
Effort delay of the gate is f = gh.
Where g is the logical effort (An inverter has a logical effort of 1).
Logical effort is defined as the ratio of the input capacitance of a gate to the input
capacitance of an inverter delivering the same output current.
h is the fanout or electrical effort. Electrical effort is defined as ratio of the output
capacitance to input capacitance.
More complex gates have greater logical efforts, indicating that they take longer time to
drive a given fanout.
For example, the logical effort of the 3-input NAND gate is 5/3.
The electrical effort can be computed as
$$ h = \frac{C_{out}}{C_{in}} $$
where Cout is the capacitance of the external load being driven and Cin is the input capacitance of the gate.
The diagram shows normalized delay versus electrical effort for an idealized inverter and a 3-input NAND gate.
The y-intercepts indicate the parasitic delay. The slope of each line is the logical effort.
The inverter has a slope of 1. The NAND gate has a slope of 5/3.
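The straight lines described above follow directly from d = gh + p. The sketch below tabulates the normalized delay for both gates; the logical efforts 1 and 5/3 come from the notes, while the parasitic delays p = 1 for the inverter and p = 3 for the 3-input NAND are assumed values that are not stated in these notes.

```python
from fractions import Fraction


def gate_delay(g, h, p):
    """Normalized delay d = g*h + p of a single gate (in units of tau)."""
    return g * h + p


G_INV, P_INV = Fraction(1), 1          # p = 1 is an assumed value
G_NAND3, P_NAND3 = Fraction(5, 3), 3   # p = 3 is an assumed value

for h in range(0, 6):
    d_inv = gate_delay(G_INV, h, P_INV)
    d_nand3 = gate_delay(G_NAND3, h, P_NAND3)
    print(f"h = {h}: inverter d = {float(d_inv):.2f}, NAND3 d = {float(d_nand3):.2f}")
```

At h = 0 each delay equals the parasitic delay (the y-intercept), and each unit increase in h adds exactly g to the delay (the slope).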
Design a four input NAND gate and obtain its delay during the transition from high to
low. (April 2018)
Figure shows a model of an n-input NAND gate in which the upper inputs are all 1 and the bottom input rises. The gate must discharge the diffusion capacitances of all of the internal nodes as well as the output.
The Elmore delay is
$$ t_{pd} = \left( \frac{n^2}{2} + \frac{5}{2} n \right) RC $$
Delay for a 4-input NAND gate:
$$ \left( \frac{4^2}{2} + \frac{5 \cdot 4}{2} \right) RC = \left( \frac{16}{2} + \frac{20}{2} \right) RC = 18RC $$
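The closed form (n²/2 + 5n/2)RC can be checked by summing the Elmore contributions node by node. The sketch below assumes the usual sizing model, which is not spelled out in these notes: each series nMOS has width n (resistance R/n), each internal node carries nC of diffusion capacitance, and the output node carries 3nC.

```python
from fractions import Fraction


def nand_falling_elmore(n):
    """Falling Elmore delay of an n-input NAND in units of RC (assumed sizing model)."""
    r_per_transistor = Fraction(1, n)   # each series nMOS of width n has resistance R/n
    total = Fraction(0)
    # Internal nodes, counted upward from ground: node i sees i transistors
    # to ground and is assumed to hold n*C of diffusion capacitance.
    for i in range(1, n):
        total += i * r_per_transistor * n
    # Output node: discharges through all n transistors, assumed to hold 3n*C.
    total += n * r_per_transistor * 3 * n
    return total


def nand_falling_closed_form(n):
    """Closed form (n^2/2 + 5n/2) in units of RC."""
    return Fraction(n * n, 2) + Fraction(5 * n, 2)


for n in (2, 3, 4):
    print(n, nand_falling_elmore(n), nand_falling_closed_form(n))
```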
Obtain the logical effort and path efforts of the given circuit. (April 2018)
Delay in Multistage Logic Networks:
The figure shows the logical and electrical efforts of each stage in a multistage path
as a function of the sizes of each stage.
The path of interest (the only path in this case) is marked with the dashed blue line. Observe
that logical effort is independent of size, while electrical effort depends on sizes.
The path logical effort G can be expressed as the product of the logical efforts of each stage along the path.
$$ G = \prod g_i $$
The path electrical effort H can be given as the ratio of the output capacitance the path must drive to the input capacitance presented by the path.
$$ H = \frac{C_{out(path)}}{C_{in(path)}} $$
The path effort F is the product of the stage efforts of each stage.
$$ F = \prod f_i = \prod g_i h_i $$
Introduce an effort to account for branching between stages of a path. This branching effort b is the ratio of the total capacitance seen by a stage to the capacitance on the path.
$$ b = \frac{C_{onpath} + C_{offpath}}{C_{onpath}} $$
The path branching effort B is the product of the branching efforts between stages.
$$ B = \prod b_i $$
The path effort F is defined as the product of the logical, electrical, and branching efforts of the path. The product of the electrical efforts of the stages is actually BH, not just H.
$$ F = GBH $$
Compute the delay of a multistage network. The path delay D is the sum of the delays of each stage. It can also be written as the sum of the path effort delay D_F and the path parasitic delay P.
$$ D = \sum d_i = D_F + P $$
$$ D_F = \sum f_i $$
$$ P = \sum p_i $$
The product of the stage efforts is F, independent of gate sizes. The path effort
delay is the sum of the stage efforts. The sum of a set of numbers whose product is constant
is minimized by choosing all the numbers to be equal.
The path delay is minimized when each stage bears the same effort. If a path has N stages and each bears the same effort, that effort must be
$$ \hat{f} = g_i h_i = F^{1/N} $$
Thus, the minimum possible delay of an N-stage path with path effort F and path parasitic delay P is
$$ D = N F^{1/N} + P $$
It shows that the minimum delay of the path can be estimated knowing only
the number of stages, path effort, and parasitic delays without the need to assign transistor
sizes.
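The path-effort estimate can be carried out in a few lines. The sketch below uses a hypothetical three-stage path (NAND2, NAND3, NOR2 with branching efforts 3, 2 and 1 and a path electrical effort of 45/8); these numbers and the function name are assumptions chosen only so that the arithmetic works out cleanly.

```python
from math import prod


def path_effort_summary(g, b, p, path_electrical_effort):
    """Return G, B, H, F, best stage effort and minimum delay for a path.

    g, b and p are per-stage lists of logical effort, branching effort
    and parasitic delay; delays are in units of tau.
    """
    n_stages = len(g)
    G = prod(g)                       # path logical effort
    B = prod(b)                       # path branching effort
    H = path_electrical_effort        # Cout(path) / Cin(path)
    F = G * B * H                     # path effort
    f_hat = F ** (1.0 / n_stages)     # equal effort per stage
    P = sum(p)                        # path parasitic delay
    D_min = n_stages * f_hat + P      # minimum path delay
    return {"G": G, "B": B, "H": H, "F": F, "f_hat": f_hat, "P": P, "D_min": D_min}


# Hypothetical example path (values are assumptions for illustration).
summary = path_effort_summary(
    g=[4 / 3, 5 / 3, 5 / 3],
    b=[3, 2, 1],
    p=[2, 3, 2],
    path_electrical_effort=45 / 8,
)
for key, value in summary.items():
    print(f"{key} = {value:.3f}")
```

For this example F is 125, so the best stage effort is 5 and the minimum delay is 3 × 5 + 7 = 22 in units of τ, obtained without choosing any transistor sizes.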
Bubble pushing
CMOS stages are inherently inverting, so AND and OR functions must be built from NAND
and NOR gates.
DeMorgan’s law helps with this conversion:
$$ \overline{A \cdot B} = \overline{A} + \overline{B} $$
$$ \overline{A + B} = \overline{A} \cdot \overline{B} $$
Figure: Bubble pushing with DeMorgan’s law
A NAND gate is equivalent to an OR of inverted inputs.
A NOR gate is equivalent to an AND of inverted inputs.
The same relationship applies to gates with more inputs.
Switching between these representations is easy and is often called bubble pushing.
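The equivalences used in bubble pushing can be confirmed exhaustively with a truth-table check. This is a small illustrative sketch; the helper names are not from the notes.

```python
from itertools import product


def nand(*inputs):
    return not all(inputs)


def nor(*inputs):
    return not any(inputs)


def or_of_inverted(*inputs):
    return any(not x for x in inputs)


def and_of_inverted(*inputs):
    return all(not x for x in inputs)


# Check every input combination for gates with 2, 3 and 4 inputs.
demorgan_holds = all(
    nand(*bits) == or_of_inverted(*bits) and nor(*bits) == and_of_inverted(*bits)
    for n in (2, 3, 4)
    for bits in product([False, True], repeat=n)
)
print(demorgan_holds)
```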
Compound Gates
Static CMOS also efficiently handles compound gates computing various inverting combinations of AND/OR functions in a single stage.
The function F = AB + CD can be computed with an AND-OR-INVERT-22 (AOI22) gate and an inverter, as shown in Figure.
The logical effort of compound gates can be different for different inputs.
Figure shows how logical efforts can be estimated for the AOI21, AOI22 and a more complex compound AOI gate.
Figure: Logical efforts and parasitic delays of AOI gates
No charge must be delivered to node x, so the Elmore delay is simply R(6C) =6RC =2τ.
Figure: 2-input NAND gate schematic, $Y = \overline{A \cdot B}$
We define the outer input to be the input closer to the supply rail (e.g., B) and the
inner input to be the input closer to the output (e.g., A).
Therefore, if one signal is known to arrive later than the others, the gate is faster when
that signal is connected to the inner input.
When one input is far less critical than another, even symmetric gates can be made
asymmetric to favor the late input at the expense of the early one.
In a series network, this involves connecting the early input to the outer transistor and
making the transistor wider, so that, it offers less series resistance when the critical
input arrives.
In a parallel network, the early input is connected to a narrower transistor to reduce
the parasitic capacitance.
Consider the path in Figure (a). Under ordinary conditions, the path acts as a buffer between A and Y.
When reset is asserted, the path forces the output low.
If reset only occurs under exceptional circumstances and takes place slowly, the circuit should be optimized for input-to-output delay at the expense of reset.
This can be done with the asymmetric NAND gate in Figure (b).
Figure: Resettable buffer optimized for data input
Skewed gates
What is meant by skewed gate and give functions of skewed gate with schematic diagrams?
In a skewed gate, one input transition is more important than the other.
HI-skew gates favor the rising output transition.
LO-skew gates favor the falling output transition.
This favoring is done by decreasing the size of the noncritical transistor.
The logical efforts for the rising (up) and falling (down) transitions are called gu and gd, respectively.
Figure (a) shows how a HI-skew inverter is constructed by downsizing the nMOS transistor.
This maintains the same effective resistance for the critical transition while reducing the input capacitance relative to the unskewed inverter of Figure (b), thus reducing the logical effort on that critical transition to gu = 2.5/3 = 5/6.
Skewed gates are sometimes denoted with an H or an L on their symbol in a schematic.
Figure: List of skewed gates
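The value gu = 2.5/3 = 5/6 quoted for the HI-skew inverter can be reproduced by comparing input capacitances. The sketch below assumes transistor widths that are not given in the text (pMOS width 2 and nMOS width 1/2 for the HI-skew inverter) and a pMOS-to-nMOS resistance ratio of 2; the falling effort gd is computed the same way and is an inference from that assumed sizing.

```python
from fractions import Fraction


def skewed_inverter_efforts(wp, wn, mobility_ratio=2):
    """Return (gu, gd) for an inverter with pMOS width wp and nMOS width wn.

    Each effort is the gate's input capacitance divided by the input
    capacitance of an unskewed inverter with the same drive for that
    transition. Widths are in unit-transistor multiples (assumed model).
    """
    c_in = wp + wn
    # Unskewed inverter with the same pull-up strength: pMOS wp, nMOS wp / ratio.
    c_same_rise = wp + wp / mobility_ratio
    # Unskewed inverter with the same pull-down strength: nMOS wn, pMOS ratio * wn.
    c_same_fall = wn + mobility_ratio * wn
    return c_in / c_same_rise, c_in / c_same_fall


# Assumed HI-skew inverter sizing: pMOS = 2, nMOS = 1/2.
gu, gd = skewed_inverter_efforts(Fraction(2), Fraction(1, 2))
print(gu, gd)
```

The critical rising transition gets a logical effort below 1, paid for by a larger effort on the noncritical falling transition.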
voltage swing V.
Alternative CMOS logic configurations (ratioed circuits, dynamic circuits and pass transistor circuits) are called circuit families.
nMOS transistors provide more current than pMOS for the same size and capacitance, so nMOS networks are preferred.
Examples of combinational circuits
(i) CMOS inverter:
Figure: 3-input NAND gate schematic, $Y = \overline{A \cdot B \cdot C}$
(iv) Two input NOR gate:
Figure: 2-input NOR gate (a) schematic (b) symbol, $Y = \overline{A + B}$
Example:
Sketch a static CMOS gate computing Y = (A + B + C) · D.
*************************************************************************************************
Briefly discuss about the classification of circuit families and comparison of the circuit
families. (May 2014, APRIL-2015)
Draw the CMOS logic circuit for the Boolean expression Z= A( B C ) DE and explain.
(April 2018)
Advantages of static CMOS:
Figure: Static CMOS inverter
Disadvantages of static CMOS:
It has a relatively large logical effort.
It requires both nMOS and pMOS transistors for each input.
Gate delay is increased.
a. Bubble pushing
CMOS stages are inherently inverting, so AND and OR functions must be built from NAND and NOR gates.
DeMorgan’s law helps with this conversion:
$$ \overline{A \cdot B} = \overline{A} + \overline{B} $$
$$ \overline{A + B} = \overline{A} \cdot \overline{B} $$
Figure: Logic using AOI22 gate
The logical effort and parasitic delay of different gate inputs are different.
Consider the falling output transition that occurs when one input holds a stable 1 value and the other rises from 0 to 1.
If input B rises last, node x will initially be at VDD – Vt ≈ VDD, because it was pulled up through the nMOS transistor on input A.
The Elmore delay is (R/2)(2C) + R(6C) = 7RC = 2.33τ.
If input A rises last, node x will initially be at 0 V, because it was discharged through the nMOS transistor on input B.
No charge must be delivered to node x, so the Elmore delay is simply R(6C) = 6RC = 2τ.
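The two delays can be reproduced from the node values in the text. This is a minimal sketch working in units of R and C; the conversion τ = 3RC is inferred from the statement that 6RC equals 2τ.

```python
from fractions import Fraction

TAU_IN_RC = 3  # tau = 3RC, inferred from 6RC = 2 tau in the text


def to_tau(delay_in_rc):
    """Convert a delay expressed in units of RC to units of tau."""
    return Fraction(delay_in_rc, TAU_IN_RC)


# Input B (outer) rises last: node x (2C, seen through R/2) and the
# output (6C, seen through R) must both be discharged.
delay_b_last = Fraction(1, 2) * 2 + 1 * 6

# Input A (inner) rises last: node x is already at 0 V, so only the
# output node (6C through R) contributes.
delay_a_last = Fraction(1 * 6)

print(delay_b_last, float(to_tau(delay_b_last)))
print(delay_a_last, float(to_tau(delay_a_last)))
```

The gate is therefore faster when the late-arriving signal drives the inner input.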
Consider the path in Figure (a). Under ordinary conditions, the path acts as a buffer between A and Y.
When reset is asserted, the path forces the output low.
If reset only occurs under exceptional circumstances and takes place slowly, the circuit should be optimized for input-to-output delay at the expense of reset.
This can be done with the asymmetric NAND gate in Figure (b).
Hence, pMOS transistors only pull down to a threshold above GND, as shown in Figure (b).
Figure : Pass Transistor threshold drops
Figures show an implementation of the AND function and 2x1 multiplexer using only NMOS
transistors.
Figure: AND logic and 2x1 multiplexer
In AND gate, if the B input is high, the top transistor is turned ON and copies the input A to
the output F.
When B is low, the bottom pass transistor is turned ON and passes a 0.
In 2x1 multiplexer, if the S selection input is high, the top transistor is turned ON and allows
input A to the output Y.
When S is low, the bottom pass transistor is turned ON and passes the B input.
An NMOS device is effective at passing a 0 but is poor at pulling a node to VDD. When the
pass transistor pulls a node high, the output only charges up to VDD -Vtn.
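The behavior of the nMOS-only AND gate and multiplexer, including the threshold drop on a passed 1, can be modeled at the voltage level. The sketch below assumes VDD = 1.0 V and Vtn = 0.3 V purely for illustration; the function names are hypothetical.

```python
def nmos_pass(gate_on, v_in, vdd, vtn):
    """Voltage passed by an nMOS pass transistor, or None when it is OFF."""
    if not gate_on:
        return None
    # An nMOS passes a 0 well but cannot pull its output above VDD - Vtn.
    return min(v_in, vdd - vtn)


def pass_and(a, b, vdd=1.0, vtn=0.3):
    """AND gate: B high passes A through the top transistor, B low passes 0."""
    v_a = vdd if a else 0.0
    top = nmos_pass(b, v_a, vdd, vtn)
    bottom = nmos_pass(not b, 0.0, vdd, vtn)
    return top if top is not None else bottom


def pass_mux(s, a, b, vdd=1.0, vtn=0.3):
    """2x1 multiplexer: S high passes A, S low passes B."""
    v_a = vdd if a else 0.0
    v_b = vdd if b else 0.0
    top = nmos_pass(s, v_a, vdd, vtn)
    bottom = nmos_pass(not s, v_b, vdd, vtn)
    return top if top is not None else bottom


for a in (False, True):
    for b in (False, True):
        print(f"A={int(a)} B={int(b)} -> F = {pass_and(a, b):.2f} V")
```

A logic 1 appears at the output as VDD – Vtn rather than VDD, which is the degraded level that transmission gates are used to avoid.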
Application:
Pass transistors are essential to the design of efficient 6-transistor static RAM cells used
in modern systems.
Figure: Complementary pass-transistor logic (CPL).
2.4.2: CMOS with transmission gates
Discuss in detail the characteristics of CMOS Transmission gates.(May 2016, May 2017, Nov 2017)
Explain Transmission gates with neat sketches. (April 2008, April 2018)
List out limitations of pass transistor logic. Explain any two techniques used to overcome
limitations. (NOV 2018)
A transmission gate in conjunction with simple static CMOS logic is called CMOS with
transmission gate.
A transmission gate is a parallel pair of nMOS and pMOS transistors.
A single nMOS or pMOS pass transistor suffers from a threshold drop.
Transmission gates solve the threshold drop but require two transistors in parallel.
The resistance of a unit-sized transmission gate can be estimated as R for the purpose of
delay estimation.
Current flows through the parallel combination of the nMOS and pMOS transistors. One of the transistors is passing the value well and the other is passing it poorly.
A logic-1 is passed well through the pMOS but poorly through the nMOS.
Estimate the effective resistance of a unit transistor passing a value in its poor direction as
twice the usual value: 2R for nMOS and 4R for pMOS.
significantly increasing the capacitance.
Figure: Effective resistance of a unit transmission gate
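The estimate that a unit transmission gate has a resistance of about R in either direction follows from combining the two transistors in parallel. The sketch below uses the poor-direction values from the text (2R for nMOS, 4R for pMOS) and assumes good-direction values of R for the nMOS and 2R for the pMOS, half of the poor-direction values.

```python
from fractions import Fraction


def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


R = Fraction(1)  # unit nMOS resistance; all results are multiples of R

# Passing a 0: nMOS conducts well (R), pMOS conducts poorly (4R).
r_pass_0 = parallel(R, 4 * R)

# Passing a 1: nMOS conducts poorly (2R), pMOS conducts well (2R, assumed).
r_pass_1 = parallel(2 * R, 2 * R)

print(r_pass_0, r_pass_1)
```

Both results are close to R, which supports using R for delay estimation.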
Figure (a) redraws the multiplexer to include the inverters that drive the diffusion inputs but to exclude the output inverter. Figure (b) shows this multiplexer drawn at the transistor level.
Figure: CMOSTG in a 2-input inverting multiplexer
2.5.1:Ratioed Circuits:
The ratioed gate consists of an nMOS pulldown network and pullup device called the static
load.
When the pulldown network is OFF, the static load pulls the output to 1.
When the pulldown network turns ON, it fights the static load.
The static load must be weak enough that, the output pulls down to an acceptable 0. Hence,
there is a ratio constraint between the static load and pulldown network.
Advantage: Stronger static loads produce faster rising outputs.
Disadvantages:
o Stronger static loads degrade the noise margin and burn more static power when the output is 0.
o A resistor is a simple static load, but large resistors consume a large layout area in typical MOS processes.
Another technique is to use an nMOS transistor with the gate tied to VGG (shown in Figure (b)).
If VGG = VDD, the nMOS transistor will only pull up to VDD – Vt.
Figure (c) shows a depletion load ratioed circuit.
Figure: nMOS ratioed gates
2.5.2: pseudo nMOS
Explain the detail about pseudo-nMOS gates with neat circuit diagram. (April/May 2011)
(Nov/Dec 2013)
Implement NAND gate using pseudo- nMOS logic. (Nov 2013)
Figure: Pseudo-nMOS logic gates
2.5.3: Ganged CMOS:
Figure shows pairs of CMOS inverters ganged together.
The truth table is given in Table, showing that the pair computes the NOR function. Such a circuit is sometimes called a symmetric NOR, or ganged CMOS.
Figure: symmetric NOR gate.
When one input is 0 and the other 1, the gate can be viewed as a pseudo-nMOS circuit
with appropriate ratio constraints.
When both inputs are 0, both pMOS transistors turn on in parallel, pulling the output
high faster than they would, in an ordinary pseudo nMOS gate.
When both inputs are 1, both pMOS transistors turn OFF, saving static power dissipation.
Cascode Voltage Switch Logic (CVSL) seeks the benefits of ratioed circuits without the static
power consumption.
It uses both true and complementary input signals and computes both true and complementary
outputs using a pair of nMOS pulldown networks, as shown in Figure (a).
The pulldown network f implements the logic function as in a static CMOS gate, while the complementary network f̄ uses inverted inputs feeding transistors arranged in the conduction complement.
For any given input pattern, one of the pulldown networks will be ON and the other OFF.
The pulldown network that is ON will pull that output low.
This low output turns ON the pMOS transistor to pull the opposite output high.
When the opposite output rises, the other pMOS transistor turns OFF, so no static power dissipation occurs.
Figure (b) shows a CVSL AND/NAND gate.
Advantage:
CVSL has a potential speed advantage because all of the logic is performed with nMOS transistors, thus reducing the input capacitance.
Describe the basic principle of operation of dynamic CMOS, domino and NP domino logic
with neat diagrams. (NOV 2011)
Dynamic Circuits:
Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to
the inputs with a single resistive pullup.
The drawbacks of ratioed circuits include
o Slow rising transitions,
o Contention on the falling transitions,
o Static power dissipation and a nonzero VOL.
Dynamic circuits avoid these drawbacks by using a clocked pullup transistor rather than a pMOS that is always ON.
Figure compares (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters.
Figure: Comparison of (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters
Dynamic circuit operation is divided into two modes, as shown in Figure.
(i) During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the output Y high.
(ii) During evaluation, the clock is 1 and the clocked pMOS turns OFF. The output may remain high or may be discharged low through the pulldown network.
Figure: Precharge and evaluation of dynamic gates
Advantages:
Dynamic circuits are the fastest commonly used circuit family because they have lower input capacitance and no contention during switching.
They also have zero static power dissipation.
Disadvantages:
They require careful clocking, consume significant dynamic power and are sensitive to noise during evaluation mode.
Prepared by: B.ARUNKUMAR,AP/ECE
Foot transistor:
In Figure (c), if the input A is 1 during precharge, contention will take place because both the
pMOS and nMOS transistors will be ON.
When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation transistor can be added to the bottom of the nMOS stack to avoid contention, as shown in the figure below.
The extra transistor is sometimes called a foot.
Figure: Generalized footed and unfooted dynamic gates
The figure below estimates the falling logical effort of both footed and unfooted dynamic gates.
Footed gates have higher logical effort than their unfooted counterparts but are still an improvement over static logic.
The parasitic delay increases with the number of inputs, because there is more diffusion capacitance on the output node.
A fundamental difficulty with dynamic circuits is the monotonicity requirement. While a
dynamic gate is in evaluation, the inputs must be monotonically rising.
That is, the input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and
remain HIGH, but not start HIGH and fall LOW.
Figure shows waveforms for a footed dynamic inverter in which the input violates
monotonicity.
Figure: Monotonicity problem
During precharge, the output is pulled HIGH.
When the clock rises, the input is HIGH, so the output is discharged LOW through the pulldown network.
The input later falls LOW, turning off the pulldown network. However, the precharge transistor is also OFF, so the output floats, staying LOW rather than rising.
The output will remain low until the next precharge step.
The inputs must be monotonically rising for the dynamic gate to compute the correct function.
Unfortunately, the output of a dynamic gate begins HIGH and monotonically falls LOW during evaluation.
This monotonically falling output X is not a suitable input to a second dynamic gate expecting monotonically rising signals, as shown in the figure below.
Dynamic gates sharing the same clock cannot be directly connected.
This problem is often overcome with domino logic.
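The monotonicity problem can be seen in a simple cycle-by-cycle model of a footed dynamic inverter. This is an idealized behavioral sketch that ignores leakage and timing; the waveforms are hypothetical.

```python
def simulate_footed_dynamic_inverter(clock, data):
    """Return the output waveform of an idealized footed dynamic inverter.

    clock and data are equal-length lists of 0/1 samples. The output is
    precharged high when the clock is 0, discharged when the clock is 1
    and the input is 1, and otherwise floats at its previous value.
    """
    y = 1
    outputs = []
    for phi, a in zip(clock, data):
        if phi == 0:
            y = 1          # precharge: clocked pMOS is ON
        elif a == 1:
            y = 0          # evaluation: pulldown network and foot conduct
        # else: evaluation with input low, output floats and holds its value
        outputs.append(y)
    return outputs


clock = [0, 0, 1, 1, 1, 1, 0, 0]
falling_input = [1, 1, 1, 1, 0, 0, 0, 0]   # violates monotonicity during evaluation
rising_input = [0, 0, 0, 1, 1, 1, 0, 0]    # monotonically rising during evaluation

print(simulate_footed_dynamic_inverter(clock, falling_input))
print(simulate_footed_dynamic_inverter(clock, rising_input))
```

With the falling input, the output stays low for the rest of the evaluation phase even after the input returns to 0, whereas a static inverter would have driven the output back high.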
Explain the domino logic families with neat diagrams. (NOV 2012, APRIL-2015, Nov 2017)
that precharge occurs in parallel, but evaluation occurs sequentially.
Figure: Domino gates
2.6.2: Dual Rail Domino Logic:
Dual-rail domino gates encode each signal with a pair of wires. The input and output signal
pairs are denoted with _h and _l, respectively.
Table summarizes the encoding. The _h wire is asserted to indicate that the output of the gate
is “high” or 1. The _l wire is asserted to indicate that the output of the gate is “low” or 0.
When the gate is precharged, neither _h nor _l is asserted. The pair of lines should never be
both asserted simultaneously during correct operation.
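The dual-rail encoding described above can be written out as a small decoder, together with a behavioral model of a dual-rail AND gate. This is a logic-level sketch only; the function names are illustrative.

```python
def decode_dual_rail(sig_h, sig_l):
    """Interpret a dual-rail signal pair (sig_h, sig_l)."""
    if sig_h and sig_l:
        return "invalid"       # both rails asserted: never occurs in correct operation
    if sig_h:
        return "1"
    if sig_l:
        return "0"
    return "precharged"        # neither rail asserted


def dual_rail_and(a_h, a_l, b_h, b_l):
    """Behavioral dual-rail AND: returns (y_h, y_l)."""
    y_h = a_h & b_h            # output is 1 only when both inputs are 1
    y_l = a_l | b_l            # output is 0 when either input is 0
    return y_h, y_l


print(decode_dual_rail(*dual_rail_and(1, 0, 1, 0)))  # A = 1, B = 1
print(decode_dual_rail(*dual_rail_and(1, 0, 0, 1)))  # A = 1, B = 0
print(decode_dual_rail(*dual_rail_and(0, 0, 0, 0)))  # inputs still precharged
```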
Dual-rail domino gates accept both true and complementary inputs and compute both true and
complementary outputs, as shown in Figure (a).
This is identical to static CVSL circuits except that the cross-coupled pMOS transistors are
instead connected to the precharge clock.
Therefore, dual-rail domino can be viewed as a dynamic form of CVSL, sometimes called
DCVS.
Figure (b) shows a dual-rail AND/NAND gate and Figure (c) shows a dual-rail XOR/XNOR
gate. The gates are shown with clocked evaluation transistors, but can also be unfooted.
Figure: Dual-rail domino gates
Disadvantages:
It requires more area, wiring and power.
Dual-rail structures lose the efficiency of wide dynamic NOR gates.
Application:
It is useful for asynchronous circuits.
2.6.3: Keepers
Dynamic circuits also suffer from charge leakage on the dynamic node.
If a dynamic node is precharged high and then left floating, the voltage on the
dynamic node will drift over time due to subthreshold, gate and junction leakage.
Dynamic circuits have poor input noise margins.
If the input rises above Vt while the gate is in evaluation, the input transistors will turn ON weakly and can incorrectly discharge the output.
Both leakage and noise margin problems can be addressed by adding a keeper circuit.
Figure shows a conventional keeper on a domino buffer. The keeper is a weak
transistor that holds, or staticizes, the output at the correct level when it would
otherwise float.
When the dynamic node X is high, the output Y is low and the keeper is ON to prevent
X from floating.
When X falls, the keeper initially opposes the transition, so it must be much weaker
than the pulldown network.
Eventually Y rises, turning the keeper OFF and avoiding static power dissipation.
the keeper can supply enough current to hold the output high.
Figure: Differential keeper
2.6.4: Secondary precharge devices
Dynamic gates are subject to problems with charge sharing.
For example, consider the 2-input dynamic NAND gate in Figure (a). Suppose the output Y is
precharged to VDD and inputs A and B are low.
Also suppose that the intermediate node x had a low value from a previous cycle.
During evaluation, input A rises, but input B remains low, so the output Y should remain high.
However, charge is shared between Cx and CY, as shown in Figure (b). This behaves as a capacitive voltage divider and the voltages equalize at
$$ V_x = V_Y = \frac{C_Y}{C_x + C_Y} V_{DD} $$
Charge sharing is serious when the output is lightly loaded (small CY ) and the internal
capacitance is large.
If the charge-sharing noise is small, the keeper will eventually restore the dynamic output to
VDD.
If the charge-sharing noise is large, the output may flip and turn off the keeper, leading to
incorrect results.
Charge sharing can be overcome by precharging some or all of the internal nodes with secondary precharge transistors.
These transistors should be small, because they only charge the small internal capacitances and their diffusion capacitance slows the evaluation.
It is sufficient to precharge every other node in a tall stack.
Figure: Charge-sharing noise
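The size of the charge-sharing droop follows from the capacitive divider: with the output precharged to VDD and the internal node at 0 V, both nodes settle at VDD · CY / (Cx + CY). The sketch below evaluates this for hypothetical capacitance values chosen for illustration.

```python
def charge_sharing_voltage(c_x, c_y, vdd):
    """Final voltage after a precharged output (c_y at vdd) shares charge
    with a discharged internal node (c_x at 0 V)."""
    return c_y / (c_x + c_y) * vdd


VDD = 1.0  # assumed supply voltage in volts

# Heavily loaded output: small droop.
v_heavy = charge_sharing_voltage(c_x=1.0, c_y=9.0, vdd=VDD)
# Lightly loaded output: large droop, which may flip the output.
v_light = charge_sharing_voltage(c_x=1.0, c_y=1.0, vdd=VDD)

print(f"heavy load: {v_heavy:.2f} V, light load: {v_light:.2f} V")
```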
2.6.6: NP and Zipper Domino
The HI-skew inverting static gates are replaced with predischarged dynamic gates using
pMOS logic.
A footed dynamic p-logic NAND gate is shown in Figure (b). When ф is 0, the first and third
stages precharge high while the second stage predischarges low.
When ф rises, all the stages evaluate. Domino connections are possible, as shown in Figure
(c).
In an ordinary dynamic gate, the input has a low noise margin (about Vt ), but is strongly
driven by a static CMOS gate.
The floating dynamic output is more prone to noise from coupling and charge sharing, but
drives another static CMOS gate with a larger noise margin.
In NORA, however, the sensitive dynamic inputs are driven by noise prone dynamic outputs.
Given this drawback and the extra clock phase requirement, there is little reason to use NORA.
Zipper domino is a closely related technique that leaves the precharge transistors slightly ON during evaluation by using precharge clocks that swing between 0 and VDD – |Vtp| for the pMOS precharge and between Vtn and VDD for the nMOS precharge.
Figure : NP Domino
**********************************************************************************
Explain the static and dynamic power dissipation in CMOS circuits with necessary
diagrams and expressions. (DEC 2011, Nov 2015, NOV 2016, May 2017, May 2010)
What are the sources of power dissipation in CMOS and discuss various design
techniques to reduce power dissipation in CMOS? (Nov 2012, May 2013, Nov
2014, May 2016)
The instantaneous power P(t) consumed by a circuit element is the product of the current through the element and the voltage across it:
$$ P(t) = I(t) V(t) $$
The energy consumed over a time interval T is the integral of the instantaneous power:
$$ E = \int_0^T P(t) \, dt $$
The average power is
$$ P_{avg} = \frac{E}{T} = \frac{1}{T} \int_0^T P(t) \, dt $$
Power is expressed in units of Watts (W). Energy is usually expressed in Joules (J).
By Ohm’s Law, V = IR, so the instantaneous power dissipated in the resistor is
$$ P_R(t) = \frac{V_R^2(t)}{R} = I_R^2(t) R $$
This power is converted from electricity to heat. VDD supplies power proportional to its current:
$$ P_{VDD}(t) = I_{DD}(t) V_{DD} $$
When the capacitor is charged from 0 to VC, it stores energy EC:
$$ E_C = \frac{1}{2} C V_C^2 $$
Figure shows a CMOS inverter driving a load capacitance.
When the input switches from 1 to 0, the pMOS transistor turns ON and charges the load to VDD.
According to the EC equation, the energy stored in the capacitor is
$$ E_C = \frac{1}{2} C V_{DD}^2 $$
The energy delivered from the power supply is
$$ E_{VDD} = C V_{DD}^2 $$
The gate switches at some average frequency fsw.
Over some interval T, the load will be charged and discharged T fsw times.
Then, the average power dissipation is
$$ P_{switching} = \frac{E}{T} = \frac{T f_{sw} C V_{DD}^2}{T} = C V_{DD}^2 f_{sw} $$
This is called the dynamic power because it arises from the switching of the load.
Because most gates do not switch every clock cycle, it is often more convenient to express the switching frequency fsw as an activity factor α times the clock frequency f.
The dynamic power dissipation may be rewritten as
$$ P_{switching} = \alpha C V_{DD}^2 f $$
The activity factor is the probability that the circuit node transitions from 0 to 1, because that is the only time the circuit consumes power.
A clock has an activity factor of α = 1 because it rises and falls every cycle.
The total power of a circuit is calculated as
Ptotal = Pdynamic + Pstatic
Pdynamic = Pswitching + Pshort circuit
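The switching power can be evaluated numerically using the standard expression P = α C VDD² f. The sketch below uses hypothetical values for the activity factor, switched capacitance, supply voltage and clock frequency; only the formula itself is taken as given.

```python
def switching_power(alpha, capacitance, vdd, frequency):
    """Switching power in watts: alpha * C * VDD^2 * f."""
    return alpha * capacitance * vdd ** 2 * frequency


# Hypothetical block: alpha = 0.1, C = 1 nF total, VDD = 1.0 V, f = 1 GHz.
p_nominal = switching_power(alpha=0.1, capacitance=1e-9, vdd=1.0, frequency=1e9)

# Halving the supply voltage alone cuts switching power to one quarter.
p_half_vdd = switching_power(alpha=0.1, capacitance=1e-9, vdd=0.5, frequency=1e9)

print(f"nominal: {p_nominal:.3f} W, half VDD: {p_half_vdd:.3f} W")
```

The quadratic dependence on VDD is why supply voltage is usually the first term considered in low power design.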
2.7.1: Dynamic power:
Dynamic power consists mostly of the switching power.
The supply voltage VDD and frequency f are known by the designer.
To estimate dynamic power, one can consider each node of the circuit.
The capacitance of the node is the sum of the gate, diffusion, and wire capacitances on the node.
The activity factor can be estimated using switching probability or measured from logic simulations.
The effective capacitance of the node is its true capacitance multiplied by the activity factor.
The switching power depends on the sum of the effective capacitances of all the nodes.
2.7.1.1: Sources of dynamic power dissipation:
Dynamic dissipation is due to:
Charging and discharging load capacitances as gates switch.
“Short-circuit” current while both pMOS and nMOS stacks are partially ON.
Explain various ways to minimize the static and dynamic power dissipation. (Nov 2013, May 2015)
Discuss the low power design principles in detail. (Nov 2017)
Low power design involves considering and reducing each of the terms in switching power.
i. As VDD is a quadratic term, it is good to select the minimum VDD.
ii. Choose the lowest frequency.
iii. The activity factor is reduced by putting unused blocks to sleep.
iv. Finally, the circuit may be optimized to reduce the overall load capacitance.
Switching power is consumed by delivering energy to charge a load capacitance, then
dumping this energy to GND.
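The node-by-node estimate described above can be sketched as follows. This is a minimal illustration, assuming the relation P = (sum of αi·Ci)·VDD²·f; the node list and all numeric values are hypothetical and only show how the quadratic VDD term dominates.

```python
def switching_power(nodes, vdd, f):
    """Return switching power in watts: P = (sum of alpha_i * C_i) * VDD^2 * f.

    nodes: iterable of (capacitance_in_farads, activity_factor) pairs.
    """
    # Effective capacitance = true capacitance weighted by activity factor.
    effective_c = sum(c * alpha for c, alpha in nodes)
    return effective_c * vdd ** 2 * f


# Hypothetical block: one clock net (alpha = 1) and two logic nets.
nodes = [(50e-15, 1.0), (200e-15, 0.1), (100e-15, 0.2)]
p_nominal = switching_power(nodes, vdd=1.0, f=1e9)
p_low_vdd = switching_power(nodes, vdd=0.8, f=1e9)
print(f"{p_nominal * 1e6:.1f} uW at 1.0 V, {p_low_vdd * 1e6:.1f} uW at 0.8 V")
```

With these assumed numbers, lowering VDD from 1.0 V to 0.8 V scales the switching power by 0.8² = 0.64, while frequency and capacitance only scale it linearly.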
Activity factor:
If a circuit can be turned OFF entirely, the activity factor and dynamic power go to zero.
Blocks are typically turned OFF by stopping the clock; this is called clock gating.
The activity factor of a logic gate can be estimated by calculating the switching probability.
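A minimal sketch of the switching-probability estimate, assuming independent inputs and uncorrelated consecutive cycles, so that the activity factor of a node is α = (1 - P)·P, where P is the probability that the node is 1. The function names are illustrative only.

```python
from itertools import product


def output_probability(gate, input_probs):
    """Probability that gate(...) evaluates to 1 for independent inputs."""
    total = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else (1.0 - p)
        if gate(*bits):
            total += weight
    return total


def activity_factor(p_one):
    """Probability of a 0 -> 1 transition: node is 0 now and 1 in the next cycle."""
    return (1.0 - p_one) * p_one


def and2(a, b):
    return a & b


p_and = output_probability(and2, [0.5, 0.5])
alpha_and = activity_factor(p_and)
print(p_and, alpha_and)
```

For a 2-input AND gate with random inputs (each 1 with probability 0.5), the output is 1 with probability 0.25, which gives an activity factor of 0.1875 under the stated assumptions.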
(a) Clock gating:
Clock gating ANDs a clock signal with an enable to turn OFF the clock to idle blocks.
The clock enable must be stable while the clock is active.
Figure shows how an enable latch can be used to ensure the enable does not change before the clock falls.
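The role of the enable latch can be illustrated with a small behavioral sketch. It assumes a latch that is transparent while the clock is low, followed by an AND gate; the waveforms are hypothetical samples, not taken from the figure.

```python
def gated_clock(clk, enable):
    """Return the gated clock for sampled clk/enable waveforms (lists of 0/1).

    The enable passes through a latch that is transparent while clk is low,
    so changes of enable during the high phase cannot reach the AND gate.
    """
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:            # latch transparent during the low phase
            latched = en
        out.append(c & latched)
    return out


clk    = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
enable = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
gclk = gated_clock(clk, enable)
naive = [c & en for c, en in zip(clk, enable)]
print(gclk)
print(naive)
```

In this sketch the plain AND (`naive`) produces a shortened pulse and a spurious partial pulse when the enable changes during the high phase, while the latched version passes only complete clock pulses.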
Capacitance:
Switching capacitance comes from the wires and transistors in a circuit.
Wire capacitance is minimized through good floor planning and placement.
Device-switching capacitance is reduced by choosing smaller transistors.
Voltage:
Voltage has a quadratic effect on dynamic power.
Therefore, choosing a lower power supply significantly reduces power
consumption.
The chip may be divided into multiple voltage domains, where each domain is
optimized for the needs of certain circuits.
a. Voltage domains:
Using voltage domains requires selecting which circuits belong in which domain and routing power supplies to multiple domains.
Figure (Voltage domain crossing) shows direct connection of inverters in two
domains using high and low supplies, VDDH and VDDL, respectively.
Figure shows a block diagram for a basic dynamic voltage scaling (DVS) system.
It determines the supply voltage and clock frequency sufficient to complete the
workload on schedule or to maximize performance without overheating.
Frequency:
Dynamic power is directly proportional to frequency, so a chip should not run faster
than necessary.
Reducing the frequency allows downsizing transistors or using a lower supply voltage.
where Ioff is the subthreshold current at Vgs = 0 and Vds = VDD, and S is the subthreshold
slope.
2. Gate leakage:
Gate leakage occurs when carriers tunnel through a thin gate dielectric, when a voltage is
applied across the gate (e.g., when the gate is ON).
Gate leakage is a strong function of the dielectric thickness.
3. Junction leakage:
Junction leakage occurs when a source or drain diffusion region is at a different potential
from the substrate.
Leakage of reverse-biased diodes is usually negligible.
4. Contention current:
Static CMOS circuits have no contention current. However, certain alternative circuits
inherently draw current even while quiescent.
2.7.2.2: Methods of reducing static power:
Power gating:
One way to reduce static current during sleep mode is to turn OFF the power supply to the sleeping block.
The logic block receives its power from a virtual VDD rail, VDDV.
When the block is active, the header switch transistors are ON, connecting VDDV to VDD.
Selective application of multiple threshold voltages can maintain performance on critical paths with low-Vt transistors, while reducing leakage on other paths with high-Vt transistors.
Variable threshold voltage:
One method to achieve high Ion in active mode and low Ioff in sleep mode is to adjust the threshold voltage of the transistor by applying a body bias.
This technique is sometimes called variable threshold CMOS (VTCMOS).
Figure shows a schematic of an inverter using body bias.
**********************************************************************
Prepared by: B.ARUNKUMAR,AP/ECE
The growing market of portables such as cellular phones, gaming consoles and battery-powered
electronic systems demands microelectronic circuits design with ultra low power dissipation.
As the integration, size, and complexity of the chips continue to increase, the difficulty in providing
adequate cooling might either add significant cost or limit the functionality of the computing systems
which make use of those integrated circuits
As the technology node scales down to 65nm, there is not much increase in dynamic power
dissipation. However the static or leakage power reaches or exceeds the dynamic power levels
beyond 65nm technology node.
Hence the techniques to reduce power dissipation are not limited to dynamic power. Here we discuss circuit and logic design approaches to minimize dynamic, leakage and short-circuit power dissipation.
The total power dissipated in a CMOS circuit is the sum of dynamic power, short-circuit power and static or leakage power.
Design for low power implies the ability to reduce all three components of power consumption in CMOS circuits during the development of a low power electronic product.
In the sections to follow we summarize the most widely used circuit techniques to reduce each of these components of power in a standard CMOS design.
Figure: Components of Power in CMOS circuit
$P_{total} = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leakage}$
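As a rough numerical illustration of the three components, the sketch below assumes that the switching and short-circuit terms are energies per cycle multiplied by the clock frequency f, and that the leakage term is a constant current drawn from VDD. All parameter values are hypothetical.

```python
def total_power(c_load, vdd, f, t_sc, i_peak, i_leak):
    """Return (switching, short_circuit, leakage, total) power in watts."""
    p_switching = c_load * vdd ** 2 * f       # charging/discharging the load
    p_short = t_sc * vdd * i_peak * f         # both networks conducting for t_sc
    p_leak = vdd * i_leak                     # static leakage
    return p_switching, p_short, p_leak, p_switching + p_short + p_leak


# Hypothetical values: 1 pF switched load, 1.0 V, 500 MHz,
# 20 ps short-circuit window at 1 mA peak, 50 uA leakage.
p_sw, p_sc, p_lk, p_tot = total_power(
    c_load=1e-12, vdd=1.0, f=500e6, t_sc=20e-12, i_peak=1e-3, i_leak=50e-6)
print(f"switching {p_sw*1e6:.0f} uW, short-circuit {p_sc*1e6:.0f} uW, "
      f"leakage {p_lk*1e6:.0f} uW, total {p_tot*1e6:.0f} uW")
```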
Dynamic/Switching power is due to charging and discharging of load capacitors driven by the
circuit. Supply voltage scaling has been the most adopted approach to power optimization, since it
normally yields considerable power savings due to the quadratic dependence of switching/dynamic
power Pswitching on supply voltage VDD.
However lowering the supply voltage affects circuit speed which is the major short-coming of this
approach. So both design and technological solutions must be applied to compensate the decrease in
circuit performance introduced by reduced voltage. Some of the techniques often used to reduce
dynamic power are described below.
Choices between static versus dynamic topologies, conventional CMOS versus pass-transistor logic styles
and synchronous versus asynchronous timing styles have to be made during the design of a circuit.
In static CMOS circuits, the component of power due to short circuit current is about 10% of
the total power consumption.
However, dynamic circuits do not have this problem, since there is no direct DC path from the supply voltage to ground.
Only in domino-logic circuits is there such a path, added in order to reduce charge sharing; hence there is a small amount of short-circuit power dissipation.
Figure: Static NOR
Figure: Dynamic NOR circuits
P = CL* Vdd* (Vdd-Vt)
Logic Level Power Optimization:
During logic optimization for low power, technology parameters such as supply voltage are fixed,
and the degrees of freedom are in selecting the functionality and sizing the gates.
Path equalization with buffer insertion is one of the techniques which ensure that signal propagation
from inputs to outputs of a logic network follows paths of similar length to overcome glitches.
When paths are equalized, most gates have aligned transitions at their inputs, thereby minimizing
spurious switching activity/glitches (which is created by misaligned input transitions).
Static/leakage power originates from substrate currents and subthreshold leakage. For technologies of 1 µm and above, PSwitching was predominant.
However, for deep-submicron processes below 180 nm, PLeakage becomes the dominant factor. Leakage power is a major concern in recent technologies, as it impacts battery lifetime.
CMOS technology has been extremely power-efficient when transistors are not switching or in
stand-by mode, and system designers expect low leakage from CMOS chips.
To meet leakage power constraints, multiple-threshold and variable-threshold circuit techniques are often used.
In multiple-threshold CMOS, the process provides transistors with two different thresholds. Low-threshold transistors are employed on speed-critical sub-circuits; they are fast but leaky.
High-threshold transistors are slower but exhibit low sub-threshold leakage, and they are employed in noncritical/slow paths of the chip.
As more transistors become timing-critical multiple-threshold techniques tend to lose effectiveness.
Variable Body Biasing:
Variable-threshold circuits dynamically control the threshold voltage of transistors through substrate
biasing and hence overcome shortcoming associated with multi-threshold design.
When a variable-threshold circuit is in standby, the substrate of NMOS transistors is negatively
biased, and their threshold increases because of the body-bias effect.
Similarly the substrate of PMOS transistors is biased by positive body bias to increase their Vt in
stand-by. Variable-threshold circuits can, in principle, solve the quiescent/static leakage problem,
but they require control circuits that modulate substrate voltage in stand-by.
Fast and accurate body-bias control with control circuit is quite challenging, and requires carefully
designed closed-loop control.
When the circuit is in standby mode the bulk/body of both PMOS and NMOS are biased by third
supply voltage to increase the Vt of the MOSFET as shown in the Figure.
However during normal operation they are switched back to reduce the Vt.
Sleep transistors are high-Vt transistors connected in series with low-Vt logic, as shown below.
When the main circuit consisting of low-Vt devices is ON, the sleep transistors are also ON, resulting in normal operation of the circuit.
When the circuit is in standby mode, the high-Vt transistors are OFF.
Since the high-Vt devices appear in series with the low-Vt circuit, the leakage current is determined by the high-Vt devices and is very low.
So the net static power dissipation is reduced.
Figure10: Circuit Design with Sleep Transistors
In dynamic threshold CMOS (DTMOS), the threshold voltage is altered dynamically to suit the
operating state of the circuit.
A high threshold voltage in the standby mode gives low leakage current, while a low threshold
voltage allows for higher current drives in the active mode of operation.
Dynamic threshold CMOS can be achieved by tying the gate and body together.
The supply voltage of DTMOS is limited by the diode built-in potential in bulk silicon technology.
The PN diode between source and body should be reverse biased.
Short-circuit power is caused by the short-circuit currents that arise when pairs of PMOS/NMOS transistors are conducting simultaneously.
In static CMOS circuits, a short-circuit path exists for direct current flow from VDD to ground when
$V_{Tn} < V_{in} < V_{DD} - |V_{Tp}|$
Figure12: Short Circuit Power in CMOS Circuits
One way to reduce short circuit power is to keep the input and output rise/fall times the same. If Vdd
< Vtn + |Vtp| then short-circuit power can be eliminated.
If the load capacitance is very large, the output fall time is much larger than the input rise time. The drain-source voltage of the PMOS transistor is then approximately 0 during the input transition.
Hence the short-circuit power is close to 0. If the load capacitance is very small, the output fall time is much smaller than the input rise time.
The drain-source voltage of the PMOS transistor is then close to VDD during most of the transition period. Hence the short-circuit power is large.
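The conduction condition VTn < Vin < VDD - |VTp| can be checked with a short sketch. The helper name and the threshold values are hypothetical; the point is only that the window closes once VDD drops below VTn + |VTp|.

```python
def short_circuit_window(vdd, vtn, vtp_abs):
    """Return the (low, high) input range where both devices conduct, or None.

    Both transistors are ON only for vtn < vin < vdd - |vtp|.
    """
    low, high = vtn, vdd - vtp_abs
    return (low, high) if low < high else None


window_high_vdd = short_circuit_window(vdd=1.8, vtn=0.5, vtp_abs=0.5)
window_low_vdd = short_circuit_window(vdd=0.9, vtn=0.5, vtp_abs=0.5)
print(window_high_vdd, window_low_vdd)
```

With the assumed thresholds of 0.5 V, a 1.8 V supply leaves a window of input voltages in which both devices conduct, while a 0.9 V supply (below VTn + |VTp| = 1.0 V) leaves none.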
Static latches and Registers Dynamic latches and Registers, Pulse Registers, Sense Amplifier Based
Register, Pipelining, Schmitt Trigger, Monostable Sequential Circuits, Astable Sequential Circuits.
Timing Issues: Timing Classification of Digital System, Synchronous Design.
3.1.1 The Bi-stability Principle
Static memories use positive feedback to create a bistable circuit. A bistable circuit has two stable states that represent 0 and 1.
The basic idea is shown in Figure 3.1a, which shows two inverters connected in cascade along with a voltage-transfer characteristic (VTC).
The output of the second inverter Vo2 is connected to the input of the first Vi1, as shown by the dotted lines in Figure 3.1a.
The resulting circuit has only three possible operation points (A, B, and C).
A and B are stable operation points, and C is a metastable operation point.
Figure 3.1: a. Two inverters connected in cascade b. VTCs
Suppose the cross-coupled inverter pair is biased at point C. A small deviation from this bias point is amplified and regenerated around the circuit loop.
The bias point moves away from C until one of the operation points A or B is reached.
C is an unstable operation point. Every deviation causes the operation point to run away
from its original bias. Operation points with this property are termed as metastable.
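The regenerative behavior around point C can be sketched numerically. The inverter model below is an idealized tanh-shaped VTC with an assumed gain, not a transistor-level model; it only illustrates that any small deviation from the midpoint grows until the loop reaches one of the two stable points.

```python
import math


def inverter(v_in, vdd=1.0, gain=8.0):
    """Idealized inverter VTC: a steep, smooth curve through (VDD/2, VDD/2)."""
    return vdd / 2 * (1.0 - math.tanh(gain * (v_in - vdd / 2)))


def settle(v_start, loops=50):
    """Send a voltage around the two-inverter loop repeatedly and return the result."""
    v = v_start
    for _ in range(loops):
        v = inverter(inverter(v))
    return v


print(settle(0.51))  # slightly above C: regenerates towards VDD
print(settle(0.49))  # slightly below C: regenerates towards 0
print(settle(0.50))  # exactly at C: stays there (metastable)
```

Because the loop gain at the midpoint is larger than 1 in this model, only an input exactly at VDD/2 stays at C; this is why C is not a usable storage state.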
A bistable circuit has two stable states. In the absence of any triggering, the circuit remains in a single state.
A trigger pulse must be applied to change the state of the circuit.
Common name for a bistable circuit is flip-flop.
3.1.2 SR Flip-Flops
The SR or set-reset flip-flop implementation is shown in Figure (a) below.
This circuit is similar to the cross-coupled inverter pair with NOR gates replacing the inverters.
The second input of each NOR gate is connected to a trigger input (S or R), which makes it possible to force the outputs Q and Q bar to a given state.
These outputs are complementary (except for the SR = 11 state).
When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value.
If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state (with Q bar going to 0).
Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
Figure 3.3
When both S and R are high, both Q and Q bar are forced to zero. This input mode is
considered to be forbidden.
An SR flip-flop can also be implemented using a cross-coupled NAND structure, as shown in Figure 3.4.
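The behavior of the NOR-based SR flip-flop can be reproduced with a small gate-level sketch. It assumes zero-delay gates that are re-evaluated until the outputs settle; the function names are illustrative.

```python
def nor(a, b):
    return int(not (a or b))


def sr_latch(s, r, q, q_bar):
    """Re-evaluate the two cross-coupled NOR gates until the outputs stop changing."""
    for _ in range(4):
        new_q, new_q_bar = nor(r, q_bar), nor(s, q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (0, 1)                       # start in the reset state
after_set = sr_latch(1, 0, *state)   # S pulse forces Q = 1
after_hold = sr_latch(0, 0, *after_set)
after_reset = sr_latch(0, 1, *after_hold)
forbidden = sr_latch(1, 1, *after_reset)
print(after_set, after_hold, after_reset, forbidden)
```

The last case shows the forbidden input: with S = R = 1 both outputs are driven to 0, so Q and Q bar are no longer complementary.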
Clocked SR flip-flop:
Clocked SR flip-flop (a level-sensitive positive latch) is shown in Figure 3.5.
It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive the flip-flop from one state to another and to provide clocked operation.
Consider the case where Q is high and an R pulse is applied.
The combination of transistors M4, M7, and M8 forms a ratioed inverter.
In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2.
Once this is achieved, the positive feedback causes the flip-flop to invert states. This requirement forces an increase in the sizes of transistors M5, M6, M7, and M8.
Figure 3.5 CMOS clocked SR flip-flop
The clocked SR flip-flop does not consume any static power.
3.1.3 Multiplexer Based Latches:
Multiplexer based latches can provide similar functionality to the SR latch.
But sizing of devices only affects performance and is not critical to the functionality.
Figure 3.6 shows an implementation of static positive and negative latches based on
multiplexers.
For a negative latch, when the clock signal is low, the input 0 of the multiplexer is selected,
and the D input is passed to the output.
When the clock signal is high, input 1 of the multiplexer, which is connected to the output of the latch, is selected.
The feedback holds the output stable while the clock signal is high.
Similarly in the positive latch, the D input is selected when clock is high and the output is
held (using feedback) when clock is low.
When CLK is high, the latch is transparent and the D input is copied to the Q output.
During this phase, the feedback loop is open because the top transmission gate is OFF.
Figure 3.7 Transistor level implementation of a positive latch built using transmission gates.
To reduce the clock load, a multiplexer based NMOS latch can be implemented using two pass transistors, as shown in Figure 3.8.
The advantage of this approach is the reduced clock load of only two NMOS devices.
When CLK is high, the latch samples the D input, while a low clock signal enables the feedback loop and puts the latch in the hold mode.
Figure 3.8 Multiplexer based NMOS latch using NMOS only pass transistors for multiplexers.
Explain the operation of master-slave based edge triggered register. (May 2016)
Draw and explain the operation of conventional CMOS, pulsed and resettable latches.
(Nov 2012)
Discuss about CMOS register concept and design master slave triggered register, explain its operation with overlapping periods. (April 2018, NOV 2018)
A common approach to constructing an edge-triggered register is to use a master-slave configuration, as shown in Figure 3.9.
The register consists of cascading a negative latch (master stage) with a positive latch (slave stage).
A multiplexer based latch is used to realize the master and slave stages.
On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM.
During this period, the slave stage is in the hold mode, keeping its previous value.
On the rising edge of the clock, the master stage stops sampling the input and the slave stage starts sampling.
During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode.
A negative edge-triggered register can be constructed using the same principle by simply switching the order of the positive and negative latch (i.e., placing the positive latch first).
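The master-slave operation described above can be sketched behaviorally. This is an idealized model with zero delays and no clock overlap; the class name and waveforms are hypothetical.

```python
class MasterSlaveRegister:
    """Behavioral model: negative latch (master) feeding a positive latch (slave)."""

    def __init__(self):
        self.qm = 0   # master stage output QM
        self.q = 0    # slave stage output Q

    def step(self, clk, d):
        if clk == 0:
            self.qm = d        # master transparent, slave holds
        else:
            self.q = self.qm   # slave transparent, master holds
        return self.q


def run(clk_wave, d_wave):
    reg = MasterSlaveRegister()
    return [reg.step(c, d) for c, d in zip(clk_wave, d_wave)]


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 1, 0, 0, 0, 0, 1]
q = run(clk, d)
print(q)
```

In this trace Q changes only at the samples where the clock has just gone high, and changes of D during the high phase do not reach the output, which is the positive edge-triggered behavior.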
Figure 3.9: Positive edge-triggered register based on a master-slave configuration.
A complete transistor level implementation of the master-slave positive edge-triggered register is shown in Figure 3.10.
When clock is low (CLK bar = 1), T1 is ON and T2 is OFF and the D input is sampled onto node QM.
During this period, T3 is OFF and T4 is ON and the cross-coupled inverters (I5 & I6) hold the state of the slave latch.
Figure 3.11 Master-slave register based on NMOS-only pass transistors.
When the clock goes high, the slave stage should stop sampling the master stage output and
go into a hold mode.
However, CLK and CLK bar are both high for a short period of time (the overlap period).
Both sampling pass transistors conduct and there is a direct path from the D input to the Q output.
As a result, data at the output can change on the rising edge of the clock, which is undesired for a negative edge triggered register.
This is known as a race condition, in which the value of the output Q is a function of whether the input D arrives at node X before or after the falling edge of CLK.
If node X is sampled in the metastable state, the output will switch to a value determined
by noise in the system.
These problems can be avoided by using two non-overlapping clocks, PHI1 and PHI2.
The non-overlap time tnon_overlap between the clocks is kept large enough that no overlap occurs even in the presence of clock-routing delays.
During the non-overlap time, the FF is in the high-impedance state.
3.1.6 Low-Voltage Static Latches:
The scaling of supply voltages is critical for low power operation.
At very low power supply voltages, the input to the inverter cannot be raised above the switching threshold, resulting in incorrect evaluation.
Scaling to low supply voltages hence requires the use of reduced threshold devices.
One solution uses multiple-threshold devices, as shown in Figure 3.13.
The shaded inverters and transmission gates are implemented in low-threshold devices.
The low-threshold inverters are gated using high-threshold devices to eliminate leakage.
During normal mode of operation, the sleep devices are turned ON.
During idle mode, the high-threshold devices in series with the low-threshold inverter are turned OFF (the SLEEP signal is high), eliminating leakage.
Figure 3.13: One solution for the leakage problem in low-voltage operation using MTCMOS.
************************** ************************************** **************
Discuss about the design of sequential dynamic circuits. (Nov 2012, Nov 2017)
Explain the methodology of sequential circuit design of flip-flop. (May 2014)
A stored value remains valid as long as the supply voltage is applied to the circuit, hence the name static.
The major disadvantage of the static gate is its complexity.
Registers are used in computational structures that are constantly clocked, such as pipelined data paths.
In such structures, the requirement that the memory should hold state for extended periods of time can be relaxed.
This results in circuits based on temporary storage of charge on parasitic capacitors.
The principle is identical to that of dynamic logic, where a logic signal is a charge stored on a capacitor.
The absence of charge denotes logic 0 and the presence of charge denotes logic 1.
A stored value can be kept only for a limited amount of time (in the range of milliseconds), so a periodic refresh of its value is necessary.
3.2.1 Dynamic Transmission-Gate Based Edge-triggered Registers:
Design a D flip-flop using transmission gate. (Nov 2016)
A dynamic positive edge-triggered register based on the master-slave concept is shown in Figure 3.14.
When CLK = 0, the input data is sampled on storage node 1. It has an equivalent capacitance of C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.
During this period, the slave stage is in a hold mode with node 2 in a high-impedance state.
On the rising edge of the clock, the transmission gate T2 turns on, and the value sampled on node 1 just before the rising edge propagates to the output Q.
Node 2 stores the inverted version of node 1.
The reduced transistor count makes this register attractive for high-performance and low-power systems.
Figure 3.15 Impact of non-overlapping clocks
3.2.2 C²MOS Dynamic Register:
The C²MOS Register
Figure 3.16 shows a positive edge-triggered register based on the master-slave concept, which is insensitive to clock overlap. This circuit is called the C²MOS (Clocked CMOS) register.
1. CLK = 0 (CLK bar = 1):
o The first tri-state driver is turned ON. The master stage acts as an inverter, sampling the inverted version of D on the internal node X.
o The master stage is in evaluation mode. The slave section is in hold mode.
o Both transistors M7 and M8 are OFF, decoupling the output from the input. The output Q retains its previous value stored on the output capacitor CL2.
Figure 3.16 C²MOS master-slave positive edge-triggered register.
It is similar to the transmission-gate based register presented earlier. However, there is an important difference.
A C²MOS register with CLK-CLK bar clocking is insensitive to overlap, as long as the rise and fall times of the clock edges are sufficiently small.
Figure 3.17 C²MOS D FF during overlap periods.
3.2.3 True Single-Phase Clocked Register (TSPCR):
Explain the operation of True Single Phase Clocked Register. (Nov 2016, April 2017)
In the two-phase clocking schemes, care must be taken in routing the two clock signals to
ensure that overlap is minimized.
While the C²MOS register provides a skew-tolerant solution, it is possible to design registers that only use a single phase clock.
A register can be constructed by cascading positive and negative latches.
The main advantage is the use of a single clock phase.
The disadvantage is the increase in the number of transistors (12 transistors are required).
Figure 3.19 TSPC approach.
The TSPC latch circuits can be reduced, as in Figure 3.20, where only the first inverter is
controlled by the clock.
Number of transistors is reduced and clock load is reduced by half.
Explain in detail about timing issues needed for a logic operation. (April 2017)
Explain the timing basics in synchronous design in detail. (Nov 2017)
(A)Sequencing methods:-
Three methods of sequencing block of combinational logic are possible, as shown in
figure below.
In a flip-flop based system, one flip-flop is used on each cycle boundary.
A token (data) advances from one cycle to the next on the rising edge. If a token arrives too early, it waits at the flip-flop until the next cycle.
In a 2-phase system, the phases may be separated by tnonoverlap (tnonoverlap > 0).
In a pulsed system, the pulse width is tpw.
In a 2-phase system, the full cycle of combinational logic is divided into two phases, sometimes called "half-cycles". The two latch clocks are called φ1 and φ2.
A flip-flop can be viewed as a pair of back-to-back latches using clk and its complement.
Table shows delay and timing notations of combinational and sequencing elements.
These delays may differ for rising (with suffix 'r') and falling (with suffix 'f') transitions.
The delays with timing diagrams for all three sequencing elements are shown in the figure below.
In combinational logic, when input A changes to another value, output Y cannot change instantaneously. After the contamination delay {tcd}, Y may begin to change (or glitch).
Output Y settles to its final value within the propagation delay {tpd}.
Input D in flip flop must have settled by some setup time {tsetup} before the rising edge
of clock and should not change again until, a hold time {thold} after the clock edge.
The output begins to change after a clock-to-Q contamination delay {tccq} and completely settles after a clock-to-Q propagation delay {tpcq}.
The clock period must satisfy
$T_c \ge t_{pcq} + t_{pd} + t_{setup}$
(or)
$t_{pd} \le T_c - (t_{setup} + t_{pcq})$
where tsetup + tpcq is the sequencing overhead.
If the hold time is large and the contamination delay is small, data can incorrectly propagate through successive elements on one clock edge.
This corrupts the state of the system and is called a race condition (or) hold time failure (or) min-delay failure.
It can be fixed only by redesigning the logic, not by slowing the clock.
$t_{cd} \ge t_{hold} - t_{ccq}$
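The two flip-flop constraints can be checked with a short sketch. It assumes the max-delay form Tc ≥ tpcq + tpd + tsetup and the min-delay form tcd ≥ thold - tccq, with no clock skew; all delay values (in picoseconds) are hypothetical.

```python
def min_clock_period(t_pcq, t_pd, t_setup):
    """Smallest clock period that satisfies the max-delay (setup) constraint."""
    return t_pcq + t_pd + t_setup


def max_delay_ok(t_c, t_pcq, t_pd, t_setup):
    return t_c >= min_clock_period(t_pcq, t_pd, t_setup)


def min_delay_ok(t_cd, t_hold, t_ccq):
    """Hold check: the clock period does not appear in this constraint."""
    return t_cd >= t_hold - t_ccq


# Hypothetical flip-flop and logic delays in ps.
t_min = min_clock_period(t_pcq=50, t_pd=385, t_setup=65)
setup_at_450 = max_delay_ok(450, t_pcq=50, t_pd=385, t_setup=65)
hold_safe = min_delay_ok(t_cd=0, t_hold=30, t_ccq=35)
hold_fail = min_delay_ok(t_cd=10, t_hold=60, t_ccq=35)
print(t_min, setup_at_450, hold_safe, hold_fail)
```

Since the clock period is not an argument of `min_delay_ok`, a failing hold check cannot be repaired by slowing the clock, which matches the statement above that the logic must be redesigned.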
(E) Clock Skew
Analyze the impact of spatial variations of clock signal on edge-triggered sequential logic circuits. (NOV 2018)
The spatial variation in arrival time of a clock transition in an integrated circuit is referred to as clock skew.
The clock skew between two points i and j on an IC is given by δ(i,j) = ti - tj, where ti and tj are the positions of the rising edge of the clock with respect to a reference.
The clock skew can be positive or negative, depending upon the routing direction and position of the clock source.
The timing diagram for the case with positive skew is shown in figure.
In the figure, the rising clock edge is delayed by a positive δ at the second register.
Figure: Timing diagram to study the impact of clock skew on performance and functionality. In this sample timing diagram, δ > 0.
(F) Clock Jitter:
Clock jitter is the temporal variation of the clock period at a given point. The clock period can shrink or expand on a cycle-by-cycle basis. It is a measure of temporal uncertainty.
Cycle-to-cycle jitter refers to the time-varying deviation of a single clock period.
For a given spatial location i, it is given as Tjitter,i(n) = Ti,n+1 - Ti,n - TCLK,
where Ti,n and Ti,n+1 are the arrival times of clock edges n and n+1 at location i, and TCLK is the nominal clock period.
Jitter directly impacts the performance of a sequential system.
Figure shows the nominal clock period as well as variation in period.
Figure: Circuit for studying the impact of jitter on performance.
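A minimal sketch of the cycle-to-cycle jitter calculation, assuming that Ti,n denotes the arrival time of the n-th clock edge at one location, so that Ti,n+1 - Ti,n is the actual length of cycle n. The edge times (in ps) are hypothetical.

```python
def cycle_to_cycle_jitter(edge_times, t_clk):
    """Deviation of each actual period from the nominal period t_clk."""
    return [t1 - t0 - t_clk for t0, t1 in zip(edge_times, edge_times[1:])]


edges = [0, 1000, 2010, 3000, 4005]   # measured rising-edge times in ps
jitter = cycle_to_cycle_jitter(edges, t_clk=1000)
shortest_period = 1000 + min(jitter)
print(jitter, shortest_period)
```

The shortest cycle (here 990 ps for a nominal 1000 ps) is the one the logic must fit into, which is why jitter directly reduces the time available for computation.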
************************** ************************************** *********
3.4 Pipelining:
Explain in detail about pipelining structure needed for a logic operation. (April 2017, Nov 2017)
Discuss in detail various pipelining approaches to optimize sequential circuits. (May 2013, 2016)
Pipelining is a design technique used to accelerate the operation of the datapaths in digital processors.
The idea is explained with Figure 3.22a.
The goal of the circuit is to compute log(|a - b|), where both a and b represent streams of numbers.
The minimal clock period Tmin necessary to ensure correct evaluation is given as:
$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$
where tc-q and tsu are the propagation delay and the set-up time of the register, respectively.
Registers are edge-triggered D registers.
The term tpd,logic stands for the worst-case delay path through the combinatorial network, which consists of the adder, absolute value and logarithm functions.
In conventional systems, this delay is larger than the delays associated with the registers and dominates the circuit performance.
Assume that each logic module has an equal propagation delay.
Each logic module is then, active for only 1/3 of the clock period.
Pipelining is a technique to improve the resource utilization and increase the functional
throughput.
Introduce registers between the logic blocks, as shown in Figure 3.22b.
This causes the computation for one set of input data to spread over a number of clock
periods, as shown in Table 1.
The result for the data set (a1, b1) only appears at the output after three clock periods.
Figure 3.22 Data path for the computation of log(|a + b|).
At that time, the circuit has already performed parts of the computations for the next data sets, (a2, b2) and (a3, b3).
The computation is performed in an assembly-line fashion, hence the name pipeline.
The combinational circuit block has been partitioned into three sections, each of which has a smaller propagation delay than the original function.
This reduces the value of the minimum allowable clock period:
$T_{min,pipe} = t_{c-q} + \max(t_{pd,add}, t_{pd,abs}, t_{pd,log}) + t_{su}$
Suppose all logic blocks have the same propagation delay and that the register overhead is small with respect to the logic delays. The pipelined network then runs at roughly three times the clock rate of the original circuit.
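The effect of pipelining on the minimum clock period can be sketched numerically, assuming the unpipelined period is tc-q + (sum of stage delays) + tsu and the pipelined period is tc-q + max(stage delays) + tsu. The delay values (in ps) are hypothetical.

```python
def t_min(t_cq, t_su, stage_delays):
    """Unpipelined: all logic sits between one pair of registers."""
    return t_cq + sum(stage_delays) + t_su


def t_min_pipe(t_cq, t_su, stage_delays):
    """Pipelined: a register after every stage, so the slowest stage sets the period."""
    return t_cq + max(stage_delays) + t_su


stages = [300, 300, 300]   # hypothetical adder, absolute value, log delays in ps
t_plain = t_min(t_cq=20, t_su=10, stage_delays=stages)
t_pipe = t_min_pipe(t_cq=20, t_su=10, stage_delays=stages)
speedup = t_plain / t_pipe
print(t_plain, t_pipe, round(speedup, 2))
```

With these numbers the speed-up is slightly below 3 because the register overhead (tc-q + tsu) is paid in every stage; it approaches 3 only when that overhead is negligible compared with the logic delay.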
3.4.1 NORA-CMOS—A Logic Style for Pipelined Structures:
Discuss about the NORA-CMOS structure. (Nov 2016)
The latch-based pipeline circuit can also be implemented using C²MOS latches, as shown in Figure 3.24.
This topology has one additional property:
A C²MOS-based pipelined circuit is race-free as long as all the logic functions F (implemented using static logic) between the latches are noninverting.
Figure 3.24 Pipelined datapath using C²MOS latches
The only way a signal can race from stage to stage under this condition is when the logic function F is inverting, as in Figure 3.25.
Here F is replaced by a single, static CMOS inverter. Similar considerations are valid for the (1-1) overlap.
NORA-CMOS combines C²MOS pipeline registers and NORA dynamic logic function blocks.
Each module consists of a block of combinational logic, which can be a mixture of static and dynamic logic, followed by a C²MOS latch.
Figure 3.25 Potential race condition during (0-0) overlap in C²MOS-based design.
Logic and latch are clocked in such a way that both are simultaneously in either evaluation or hold (precharge) mode.
A block that is in evaluation during CLK = 1 is called a CLK-module, while the inverse is called a CLK bar-module.
The operation modes of the modules are summarized in Table 2.
Table 2: Operation modes for NORA logic modules.
3.4.2 Latch- vs. Register-Based Pipelines:
Consider the pipelined circuit of Figure 3.23.
The pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge triggered registers.
Here logic is introduced between the master and slave latches of a master-slave system.
When the clocks CLK and CLK bar are non-overlapping, correct pipeline operation is obtained.
Input data is sampled on C1 at the negative edge of CLK and the computation of logic block F starts.
The result of the logic block F is stored on C2 on the falling edge of CLK bar and the computation of logic block G starts.
The non-overlapping of the clocks ensures correct operation.
The value stored on C2 at the end of the CLK low phase is the result of passing the previous input through the logic function F.
3.5 Choosing a Clocking Strategy:
Choosing the right clocking scheme affects the functionality, speed and power of a circuit.
The simplest clocking scheme is the two-phase master-slave design.
The predominant approach is to use the multiplexer-based register and to generate the two clock phases locally, by simply inverting the clock.
The general trend in high-performance CMOS VLSI design is to use simple clocking schemes, even at the expense of performance.
Asynchronous design avoids the problem of clock uncertainty by eliminating the need for globally-distributed clocks.
Synchronization is an important issue; it is required when interfacing different clock domains or when sampling an asynchronous signal.
Classification of Digital Systems:
In digital systems, signals can be classified according to how they are related to a local clock.
Signals that transition only at predetermined periods in time can be classified as synchronous, mesochronous or plesiochronous with respect to a system clock.
A synchronous signal has exactly the same frequency as the local clock and a known, fixed phase offset with respect to it.
In such a timing methodology, the signal is "synchronized" with the clock and the data can be sampled directly without any uncertainty.
In digital logic design, synchronous systems are the most straightforward type of interconnect, where the flow of data in a circuit proceeds in step with the system clock as shown below.
A mesochronous signal has the same frequency as the local clock but an unknown phase offset with respect to it.
A (mesochronous) synchronizer can be used to synchronize the data signal with the receiving clock as shown below.
The synchronizer serves to adjust the phase of the received signal to ensure proper sampling.
3.6.3 Plesiochronous Interconnect
A plesiochronous signal has nominally the same frequency as the local clock, but in practice a slightly different one ("plesio" is Greek for "near").
This scenario can easily arise when two interacting modules have independent clocks generated from separate crystal oscillators.
Since the transmitted signal can arrive at the receiving module at a different rate than the local clock, one needs to use a buffering scheme to ensure that all data is received.
A possible framework for plesiochronous interconnect is shown in the figure below.
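To give a feel for why buffering is needed, the sketch below estimates how often two nearly equal clocks drift apart by one full cycle. The frequencies (a 100 MHz receiver and a transmitter that is 100 ppm fast) and the function name are hypothetical values chosen only for illustration.

```python
def cycles_between_slips(f_tx_hz, f_rx_hz):
    """Receiver clock cycles that elapse before the clocks drift apart by one full cycle."""
    delta = abs(f_tx_hz - f_rx_hz)
    if delta == 0:
        raise ValueError("equal frequencies never slip")
    return f_rx_hz / delta

# Assumed example values, not taken from the notes.
f_rx = 100e6       # local (receiving) clock
f_tx = 100.01e6    # transmitting clock, 100 ppm faster
slip_cycles = cycles_between_slips(f_tx, f_rx)
slip_time_s = slip_cycles / f_rx
print(slip_cycles, slip_time_s)
```

With these assumed numbers one extra data word accumulates roughly every ten thousand receiver cycles, which the buffer has to absorb.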
************************** ************************************** *
3.7 Clock-Distribution Techniques
Explain the clock distribution techniques in synchronous design in detail. (Nov 2017)
Design a clock distribution network based on H tree model for 16 nodes. (April 2018)
Clock skew and jitter are major issues in digital circuits and they limit the performance of a digital system.
It is necessary to design a clock network that minimizes skew and jitter.
Another important consideration in clock distribution is the power dissipation.
In most high-speed digital processors, a majority of the power is dissipated in the clock network.
To reduce power dissipation, clock networks must support clock conditioning (clock gating), the ability to shut down parts of the clock network.
Unfortunately, clock gating results in additional clock uncertainty.
Fabrics for clocking:
Clock networks include a network that is used to distribute a global reference to various parts of the chip.
A final stage is responsible for local distribution of the clock, while considering the local load variations.
Most clock distribution schemes exploit the fact that the absolute delay from a central clock source to the clocking elements is irrelevant; only the relative phase between two clocking points is important.
Therefore, one common approach to distributing a clock is to use balanced paths (also called trees).
The most common type of clock primitive is the H-tree network (named for the physical structure of the network) shown in the figure for a 4x4 array.
In this scheme, the clock is routed to a central point on the chip, and balanced paths are used to distribute it to the leaf nodes.
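The balanced-path idea of the H-tree for 16 nodes can be sketched as a small recursion. This is a geometric illustration only: the coordinates, the initial half-width of 2.0 units and the function name are assumptions, and buffers and wire RC are ignored.

```python
def h_tree_leaves(x, y, half, levels, length=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree."""
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            # One horizontal segment plus one vertical segment per level.
            leaves += h_tree_leaves(x + dx, y + dy, half / 2,
                                    levels - 1, length + 2 * half)
    return leaves

# Two levels of "H" give a 4x4 array of 16 clocked nodes.
leaves = h_tree_leaves(0.0, 0.0, 2.0, 2)
positions = {(lx, ly) for lx, ly, _ in leaves}
lengths = {wire for _, _, wire in leaves}
print(len(positions), lengths)
```

Every leaf ends up at the same wire length from the central point, which is the property that keeps the skew of the ideal network at zero.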
Latch-Based Clocking:
The use of a latch-based methodology (see figure) enables more flexible timing, allowing one stage to pass slack to or steal time from following stages.
This flexibility allows an overall performance increase.
In this configuration, a stable input is available to the combinational logic block A (CLB_A) on the falling edge of CLK1 (at edge2).
On the falling edge of CLK2 (at edge3), the output of CLB_A is latched and the computation of CLB_B is launched.
CLB_B computes on the low phase of CLK2 and the output is available on the falling edge of CLK1 (at edge4).
This timing appears equivalent to having an edge-triggered system where CLB_A and CLB_B are cascaded between two edge-triggered registers.
In both cases, it appears that the time available to perform the combination of CLB_A and CLB_B is TCLK.
Figure: Latch-based design in which transparent latches are separated by combinational logic.
************************** ************************************** **************
A more reliable and robust technique is the self-timed approach, which presents a local solution to the timing problem.
The figure uses a pipelined datapath to illustrate how this can be accomplished.
The computation of a logic block is initiated by asserting a Start signal.
The combinational logic block computes on the input data and raises a Done signal once the computation is finished.
Signaling between the blocks ensures the logical ordering of the events and can be achieved with the aid of Req(uest) and Ack(nowledge) signals.
Figure: Self-timed, pipelined datapath.
In the case of the pipelined datapath, the scenario could proceed as follows.
1. An input word arrives, and a Req(uest) to the block F1 is raised. If F1 is inactive at that time, it transfers the data and acknowledges this fact to the input buffer.
2. F1 is enabled by raising the Start signal. After a certain amount of time, dependent upon the data values, the Done signal goes high, indicating the completion of the computation.
3. A Req(uest) is issued to the F2 module. If this function is free, an Ack(nowledge) is raised, the output value is transferred, and F1 can go ahead with its next computation.
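The three steps above can be mimicked with a small timing model. This is only a behavioral sketch: the two-stage structure, the delay functions and the example words are assumptions, and the Req/Ack wires are represented implicitly by the times at which each block becomes free.

```python
def run_self_timed(words, delay_f1, delay_f2):
    """Return the time at which each word leaves F2 in a two-stage self-timed pipeline."""
    f1_free = 0   # time at which F1 can accept a new word
    f2_free = 0   # time at which F2 can accept a new word
    finished = []
    for w in words:
        start1 = f1_free                 # step 1: Req accepted once F1 is inactive
        done1 = start1 + delay_f1(w)     # step 2: Done after a data-dependent delay
        handoff = max(done1, f2_free)    # step 3: Ack only when F2 is free
        f1_free = handoff                # F1 may now start its next computation
        f2_free = handoff + delay_f2(w)
        finished.append(f2_free)
    return finished

# Assumed delays: F1 takes w time units, F2 always takes 2.
times = run_self_timed([3, 1, 1], lambda w: w, lambda w: 2)
print(times)
```

The model shows that a slow word in F1 delays only its local neighbors, and that words always leave in order, without any global clock.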
3.8.2 A simple synchronizer
How can the metastability problem in sequential circuits be eliminated? Explain.
A simple synchronizer is built from a pair of flip-flops, F1 and F2. F1 samples the asynchronous input.
The output X may be metastable for some time, but will settle to a good level with high probability if we wait long enough.
F2 samples X and produces an output Q that should be a valid logic level and be aligned with the clock.
The synchronizer has a latency of one clock cycle Tc.
3.8.3 Communicating between asynchronous clock domains
A common application of synchronizers is in communication between asynchronous clock domains, i.e., blocks of circuits that do not share a common clock.
Suppose System A, controlled by clkA, needs to transmit N-bit data words to System B, which is controlled by clkB, as shown in the figure.
The systems can represent separate chips or separate units within a chip using unrelated clocks.
System A must guarantee that the data is stable while the flip-flops in System B sample the word.
It indicates when new data is valid by using a request signal (Req), so System B receives the word exactly once rather than zero or multiple times.
System B replies with an acknowledge signal (Ack) when it has sampled the data, so System A knows when the data can safely be changed.
If the relationship between clkA and clkB is completely unknown, a synchronizer is required at the interface.
Figure: Arbiter
Figure (b) shows an arbiter built from an SR latch and a four-transistor metastability filter.
If one of the request inputs arrives well before the other, the latch will respond appropriately.
If they arrive at nearly the same time, the latch may be driven into metastability, as shown in Figure (c).
The filter keeps both acknowledge signals low until the voltage difference between the internal nodes n1 and n2 exceeds Vt, indicating that a decision has been made.
Such an asynchronous arbiter will never produce metastable outputs.
The self-timed approach offers a potential solution to the growing clock-distribution problem.
It translates the global clock signal into a number of local synchronization problems.
Handshaking logic is needed to ensure the logical ordering of the circuit events and to
avoid race conditions.
Sense amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings.
There are many techniques to construct these amplifiers, a common one being the use of feedback (e.g., cross-coupled inverters).
Positive edge-triggered register based on sense amplifier
The circuit uses a precharged front-end amplifier that samples the differential input signal on the rising edge of the clock signal.
The outputs of the front end are fed into a NAND cross-coupled SR flip-flop that holds the data and guarantees that the differential outputs switch only once per clock cycle.
The differential inputs in this implementation do not need a rail-to-rail swing, and hence this register can be used as a receiver for a reduced-swing differential bus.
This in turn activates MN, pulling X and eventually CLKG low.
The length of the pulse is controlled by the delay of the AND gate and the two inverters.
In the preceding sections, we have focused on one single type of sequential element, that is, the latch (and its sibling, the register). The most important property of such a circuit is that it has two stable states, and it is hence called bistable. The bistable element is not the only sequential circuit of interest. Other regenerative circuits can be catalogued as astable and monostable. The former act as oscillators and can, for instance, be used for on-chip clock generation. The latter serve as pulse generators, also called one-shot circuits. Another interesting regenerative circuit is the Schmitt trigger. This component has the useful property of showing hysteresis in its dc characteristics: its switching threshold is variable and depends upon the direction of the transition (low-to-high or high-to-low). This peculiar feature can come in handy in noisy environments.
The Schmitt Trigger
Definition
A Schmitt trigger [Schmitt38] is a device with two important properties:
It responds to a slowly changing input waveform with a fast transition time at the output.
The voltage-transfer characteristic of the device displays different switching thresholds for positive- and negative-going input signals. This is demonstrated in Figure 7.46, where a typical voltage-transfer characteristic of the Schmitt trigger is shown (and its schematics symbol). The switching thresholds for the low-to-high and high-to-low transitions are called VM+ and VM−, respectively. The hysteresis voltage is defined as the difference between the two.
One of the main uses of the Schmitt trigger is to turn a noisy or slowly varying input signal into a clean digital output signal. This is illustrated in Figure 7.47. Notice how the hysteresis suppresses the ringing on the signal. At the same time, the fast low-to-high (and high-to-low) transitions of the output signal should be observed. For instance, steep signal slopes are beneficial in reducing power consumption by suppressing direct-path currents. The "secret" behind the Schmitt trigger concept is the use of positive feedback.
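The effect of hysteresis can be sketched with a purely behavioral model of a non-inverting Schmitt trigger. The threshold values (VM+ = 1.5 V, VM− = 0.9 V) and the sample waveform are assumptions chosen for illustration; they are not taken from the figures.

```python
def schmitt_trigger(samples, vm_plus, vm_minus, state=0):
    """Non-inverting Schmitt trigger: switch high above VM+, switch low below VM-."""
    out = []
    for v in samples:
        if state == 0 and v > vm_plus:
            state = 1
        elif state == 1 and v < vm_minus:
            state = 0
        out.append(state)
    return out

def comparator(samples, threshold):
    """Single-threshold reference circuit without hysteresis."""
    return [1 if v > threshold else 0 for v in samples]

# A slow rise and fall with ringing around the upper threshold (assumed data).
noisy = [0.0, 1.0, 1.6, 1.4, 1.6, 1.2, 0.8, 1.0, 0.8, 0.0]
clean = schmitt_trigger(noisy, vm_plus=1.5, vm_minus=0.9)
chatter = comparator(noisy, 1.5)
print(clean)
print(chatter)
```

The Schmitt output makes one clean pulse, whereas the single-threshold comparator toggles on every ringing excursion.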
CMOS Implementation
One possible CMOS implementation of the Schmitt trigger is shown in Figure 7.48. The idea behind this circuit is that the switching threshold of a CMOS inverter is determined by the (kn/kp) ratio between the NMOS and PMOS transistors. Increasing the ratio results in a reduction of the threshold, while decreasing it results in an increase in VM.
Adapting the ratio depending upon the direction of the transition results in a shift in the switching threshold and a hysteresis effect. This adaptation is achieved with the aid of feedback.
Suppose that Vin is initially equal to 0, so that Vout = 0 as well. The feedback loop biases the PMOS transistor M4 in the conductive mode while M3 is off. The input signal effectively connects to an inverter consisting of two PMOS transistors in parallel (M2 and M4) as a pull-up network, and a single NMOS transistor (M1) in the pull-down chain.
This modifies the effective transistor ratio of the inverter to kM1/(kM2+kM4), which moves the switching threshold upwards.
Once the inverter switches, the feedback loop turns off M4, and the NMOS device M3 is activated.
This extra pull-down device speeds up the transition and produces a clean output signal with steep slopes.
A similar behavior can be observed for the high-to-low transition. In this case, the pull-down network originally consists of M1 and M3 in parallel, while the pull-up network is formed by M2.
This reduces the value of the switching threshold to VM−.
It is possible to shift the switching point by changing the sizes of M3 and M4. For example, to modify the low-to-high transition, we need to vary the PMOS device. The high-to-low threshold is kept constant by keeping the device width of M3 at 0.5 µm. The device width of M4 is varied as k * 0.5 µm. Figure 7.49b demonstrates how the switching threshold increases with increasing values of k.
Monostable Sequential Circuits
A monostable element is a circuit that generates a pulse of a predetermined width every time the quiescent circuit is triggered by a pulse or transition event. It is called monostable because it has only one stable state (the quiescent one). A trigger event, which is either a signal transition or a pulse, causes the circuit to go temporarily into another quasi-stable state.
This means that it eventually returns to its original state after a time period determined by the circuit parameters. This circuit, also called a one-shot, is useful in generating pulses of a known length. This functionality is required in a wide range of applications. We have already seen the use of a one-shot in the construction of glitch registers.
Another example is the address-transition-detection (ATD) circuit. This circuit detects a change in a signal, or group of signals, such as the address or data bus, and produces a pulse to initialize the subsequent circuitry.
The most common approach to the implementation of one-shots is the use of a simple delay element to control the duration of the pulse.
The concept is illustrated in Figure 7.51. In the quiescent state, both inputs to the XOR are identical, and the output is low.
A transition on the input causes the XOR inputs to differ temporarily and the output to go high. After a delay td (of the delay element), this disruption is removed, and the output goes low again.
A pulse of length td is created. The delay circuit can be realized in many different ways, such as an RC-network or a chain of basic gates.
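The delay-plus-XOR one-shot can be sketched in discrete time. The sample sequence and the delay of three time steps are assumed values; the delay element is modelled as an ideal shift of the input.

```python
def one_shot(inp, delay):
    """XOR of the input with a copy of itself delayed by `delay` samples."""
    out = []
    for t, value in enumerate(inp):
        # Before the first `delay` samples, the delay line still holds the initial value.
        delayed = inp[t - delay] if t >= delay else inp[0]
        out.append(value ^ delayed)
    return out

# A single rising transition at t = 3 (assumed stimulus).
stimulus = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
pulse = one_shot(stimulus, delay=3)
print(pulse)
```

The output is high for exactly `delay` samples after the transition, matching the statement that a pulse of length td is created.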
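Figure 7.51 One-shot built from a delay element and an XOR gate (labels: In, DELAY, Out; output pulse width td).
The use of astable circuits for on-chip clock generation is discussed in detail in a later chapter (on timing).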
The ring oscillator is a simple example of an astable circuit. It consists of an odd number of inverters connected in a circular chain. Due to the odd number of inversions, no stable operation point exists, and the circuit oscillates with a period equal to 2 × tp × N, with N the number of inverters in the chain and tp the propagation delay of each inverter.
The simulated response of a ring oscillator with five stages is shown in Figure 7.52 (all gates use minimum-size devices). The observed oscillation period approximately equals 0.5 nsec, which corresponds to a gate propagation delay of 50 psec. By tapping the chain at various points, different phases of the oscillating waveform are obtained (phases 1, 3, and 5 are shown).
Figure 7.52 Simulated waveforms of five-stage ring oscillator. The outputs of stages 1, 3, and 5 (v1, v3, v5) are shown, plotted against time (nsec).
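The period formula 2 × tp × N can be checked against the numbers quoted for the five-stage oscillator. The function name is an assumption; the values (N = 5, tp = 50 psec) are the ones given in the text.

```python
def ring_oscillator_period(n_stages, tp_s):
    """Oscillation period of an N-stage ring oscillator: T = 2 * tp * N."""
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 2 * tp_s * n_stages

period_s = ring_oscillator_period(5, 50e-12)
frequency_hz = 1 / period_s
print(period_s, frequency_hz)
```

For these values the period comes out at about 0.5 nsec, i.e. an oscillation frequency of about 2 GHz.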
The ring oscillator composed of cascaded inverters produces a waveform with a fixed oscillating frequency
determined by the delay of an inverter in the CMOS process. In many applications, it is necessary to
control the frequency of the oscillator.
An example of such a circuit is the voltage-controlled oscillator (VCO), whose oscillation frequency is a
function (typically non-linear) of a control voltage.
The standard ring oscillator can be modified into a VCO by replacing the standard inverter with a current-
starved inverter as shown in Figure 7.53 [Jeong87]. The mechanism for controlling the delay of each
inverter is to limit the current available to discharge the load capacitance of the gate.
Figure 7.53 Current-starved inverter (labels: VDD, M2, In, Out, M1, Iref).
In this modified inverter circuit, the maximal discharge current of the inverter is limited by adding an extra series device. Note that the low-to-high transition on the inverter can also be controlled by adding a PMOS device in series with M2.
The added NMOS transistor M3 is controlled by an analog control voltage Vcntl, which determines the available discharge current. Lowering Vcntl reduces the discharge current and, hence, increases tpHL.
The ability to alter the propagation delay per stage allows us to control the frequency of the ring structure. The control voltage is generally set using feedback techniques. Under low operating current levels, the current-starved inverter suffers from slow fall times at its output. This can result in significant short-circuit current.
This is resolved by feeding its output into a CMOS inverter or, better yet, a Schmitt trigger. An extra inverter is needed at the end to ensure that the structure oscillates.
Arithmetic Building Blocks: Data Paths, Adders, Multipliers, Shifters, ALUs, power and speed tradeoffs,
Case Study: Design as a tradeoff.
Designing Memory and Array structures: Memory Architectures and Building Blocks, Memory Core,
Memory Peripheral Circuitry.
Data path circuits pass data from one segment to another for processing or storing.
The data path is the core of a processor, where all computations are performed.
It is generally defined with reference to a general digital processor, as shown in the figure.
Figure: General digital processor
The data path and its communication alone can be shown as follows.
In this, data is applied at one port and the data output is obtained at a second port.
The data path block consists of arithmetic operations, logical operations, shift operations and temporary storage of operands.
Downloaded From: www.EasyEngineering.net
UNIT-IV –EC6601 VLSI DESIGN
Data paths are arranged in a bit sliced organization.
Instead of operating on single bit digital signals, the data in a processor are arranged in a
word based fashion.
Bit slices are either identical or resemble a similar structure for all bits.
The data path consists of the number of bit slices (equal to the word length), each
operating on a single bit. Hence the term is bit-sliced.
Figure: Bit-sliced datapath organization
******************************************************************************
The CMOS implementation can produce a different delay for each stage.
tdi is the worst-case delay through the i-th stage. We can calculate the total delay of a 4-bit adder using the following equation:
t4b = td3 + td2 + td1 + td0, where td0 = td(a0, b0 → c1)
This is the time for the inputs to produce the carry-out bit.
td1 = td2 = td(cin → cout)
td3 = td(cin → S3)
t4b = td(cin → S3) + 2 td(cin → cout) + td(a0, b0 → c1)
If it is extended to n bits, then the worst-case delay is
tnb = td(cin → Sn-1) + (n-2) td(cin → cout) + td(a0, b0 → c1)
The worst-case delay is linear in the number of bits:
td = O(N)
tadder = (N-1)tcarry + tsum
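A bit-level sketch of the ripple carry adder makes the linear carry chain visible. The function names and the example delay values (tcarry = 2, tsum = 3 in arbitrary units) are assumptions for illustration; bit lists are written LSB first.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (LSB first); the carry ripples stage by stage."""
    carry = c0
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

def adder_delay(n_bits, t_carry, t_sum):
    """Worst-case delay: tadder = (N-1) * tcarry + tsum."""
    return (n_bits - 1) * t_carry + t_sum

# 11 + 6 = 17 with 4-bit operands, LSB first.
sums, cout = ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0])
print(sums, cout, adder_delay(4, 2, 3))
```

Doubling the word length roughly doubles the delay in this model, which is the O(N) behavior stated above.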
The figure below shows a 4-bit adder/subtractor circuit.
In this, if add/sub = 0, then the output is a+b. If add/sub = 1, then the output is a-b.
Figure: 4-bit adder/subtractor circuit
Sum and carry expressions are designed using static CMOS.
It requires 28 transistors, which leads to a large area, and the circuit is slow.
Carry, Co = AB + BCi + ACi and Sum, S = ABCi + Co'(A + B + Ci), where Co' is the complement of Co.
Drawbacks:
The circuit is slow.
Figure: Static CMOS full adder (inputs A, B, Ci; internal node X; outputs S, Co).
******************************************************************************
III. Carry Look Ahead Adder (CLA):
Explain the operation and design of Carry lookahead adder (CLA). (May 2017, Nov 2016)
Explain the concept of carry lookahead adder and discuss its types. (April 2018)
A carry-lookahead adder (CLA) is a type of adder used in digital circuits.
A carry-lookahead adder improves speed by reducing the amount of time required to determine carry bits.
In a ripple carry adder, the carry bit is calculated along with the sum bit.
Each bit must wait until the previous carry is calculated to begin calculating its own result and carry bits.
The carry-lookahead adder calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the larger value bits.
A ripple-carry adder works starting at the rightmost (LSB) digit position: the two corresponding digits are added and a result is obtained. There may be a carry out of this digit position.
Accordingly, all digit positions other than the LSB need to take into account the possibility of having to add an extra 1, from a carry that has come in from the next position to the right.
Carry lookahead depends on two things:
Calculating, for each digit position, whether that position is going to propagate a carry if one comes in from the right.
Combining these calculated values to be able to realize quickly whether, for each group of digits, that group is going to propagate a carry.
The propagate function is P(A,B) = A ⊕ B.
These adders are used to overcome the latency which is introduced by the rippling effect of carry bits.
Write carry look-ahead expressions in terms of the generate gi and propagate pi signals.
The general form of the carry signal thus becomes
c(i+1) = ai.bi + ci.(ai ⊕ bi) = gi + ci.pi
where gi = ai.bi and pi = ai ⊕ bi
Sum and carry expressions are written as,
Si = ai ⊕ bi ⊕ ci = pi ⊕ ci
c1=g0+p0.c0
c2=g1+p1.c1= g1+p1.( g0+p0.c0)
c3=g2+p2.c2
c4=g3+p3.c3 = g3+p3.g2+ p3.p2.g1+ p3.p2.p1.g0 + p3.p2.p1.p0.c0
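The expanded carry expressions can be checked with a short 4-bit model in which every carry is computed directly from the gi, pi and c0 signals, without waiting for the previous stage. The function name is an assumption; the equations follow the forms c(i+1) = gi + pi.ci given in these notes.

```python
def cla_add4(a, b, c0=0):
    """4-bit carry-lookahead addition; returns (4-bit sum, carry-out c4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i   # Si = pi XOR ci
    return total, c4

print(cla_add4(0b1011, 0b0110))
```

Each carry is a two-level AND-OR function of the inputs, which is why the lookahead network trades extra gates for a shorter carry path.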
Figure – Sum calculation using the CLA network
The array is mirror-symmetric, which allows a more structured layout at the physical design level.
The Manchester carry chain is a variation of the carry-lookahead adder that uses shared logic to lower the transistor count.
A Manchester carry chain generates the intermediate carries by tapping off nodes in the gate that calculates the most significant carry value.
Dynamic logic can support shared logic, as can transmission gate logic.
One of the major drawbacks of the Manchester carry chain is that the propagation delay increases quickly with the length of the chain.
A Manchester-carry-chain section therefore generally does not exceed 4 bits.
In this adder, the basic equation is c(i+1) = gi + ci.pi
where pi = ai ⊕ bi and gi = ai.bi
The carry kill bit is ki = (ai + bi)'. If ki = 1, then pi = 0 and gi = 0. Hence, ki is known as the carry kill bit.
Table
If ϕ = 1, then evaluation occurs.
Figure dynamic circuit
The dynamic Manchester carry chain for carry bits up to C4 is shown below. C1, C2, C3 and C4 are taken from the chain through inverters. The carry input is C0.
******************************************************************************
Design a carry bypass adder and discuss its features. (May 2016)
It is a high-speed adder. It consists of an adder, an AND gate and an OR gate.
Suppose all the propagate signals P0 to P3 are high. An incoming carry Ci,0 = 1 then propagates through the complete adder chain and causes an outgoing carry Co,3 = 1.
In other words, if (P0P1P2P3 = 1) then Co,3 = Ci,0, else either DELETE or GENERATE occurred.
This can be used to speed up the operation of the adder, as shown in fig (b) below.
Figure: Carry Skip Adder.
When BP = P0P1P2P3 = 1, the incoming carry is forwarded immediately to the next block.
Hence the name carry bypass adder or carry skip adder.
Idea: if (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else "kill" or "generate".
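The bypass idea can be sketched for one 4-bit block. This is a functional model only: it shows that forwarding the incoming carry when BP = 1 gives the same result as letting the carry ripple, which is why the bypass is purely a speed optimization. The function name and bit ordering (LSB first) are assumptions.

```python
def bypass_block(a_bits, b_bits, cin):
    """4-bit adder block with carry bypass; returns (sum_bits, carry_out, bp)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    carry = cin
    sums = []
    for pi, gi in zip(p, g):
        sums.append(pi ^ carry)
        carry = gi | (pi & carry)                 # rippled carry
    bp = int(all(p))                              # BP = P0.P1.P2.P3
    cout = cin if bp else carry                   # multiplexer at the block output
    return sums, cout, bp

print(bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

In the example all four propagate signals are high, so the incoming carry appears at the block output without travelling through the four stages.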
Figure: Manchester carry-chain implementation of bypass adder
The carry-select adder is simple but rather fast, having a gate level depth of O(√n).
The carry-select adder generally consists of two ripple carry adders and a multiplexer.
Adding two n-bit numbers with a carry-select adder is done with two adders in order to perform the calculation twice.
One time with the assumption of the carry-in being zero and the other assuming it will be one.
After the two results are calculated, the correct sum, as well as the correct carry-out, is then selected with the multiplexer once the correct carry-in is known.
The number of bits in each carry select block can be uniform, or variable.
In the uniform case, the optimal delay occurs for a block size of √n.
The O(√n) delay is derived from uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added.
Propagation delay, P, is proportional to √(2N), where N is the number of bits of the adder.
Below is the basic building block of a carry-select adder, where the block size is 4.
Two 4-bit ripple carry adders are multiplexed together, where the resulting carry and sum bits are selected by the carry-in.
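The select mechanism can be sketched with uniform 4-bit blocks. The ripple adders inside each block are replaced here by ordinary integer addition, which is an assumption made to keep the sketch short; only the "compute twice, then multiplex" structure is modelled.

```python
def ripple_block(a, b, cin, width):
    """Stand-in for a width-bit ripple carry adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, width=8, block=4, cin=0):
    """Carry-select addition; width is assumed to be a multiple of block."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        sum0, cout0 = ripple_block(a_blk, b_blk, 0, block)   # assumes carry-in = 0
        sum1, cout1 = ripple_block(a_blk, b_blk, 1, block)   # assumes carry-in = 1
        # Multiplexer: the real carry-in picks one of the two precomputed results.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
    return result, carry

print(carry_select_add(0b10111100, 0b01100111))
```

In hardware both candidate results of every block are ready at the same time, so only the multiplexer delay is paid per block once the carry arrives.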
Uniform-sized adder:
Figure: Building blocks of a carry-select adder
(iii) Carry Save Adder:
Carry save adder is similar to the full adder. It is used when adding multiple numbers.
All the bits of a carry save adder work in parallel.
In a carry save adder, the carry does not propagate. So, it is faster than a carry propagate adder.
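A carry save adder reduces three operands to a sum word and a carry word, with each bit position handled independently by one full adder. The sketch below uses bitwise integer operations to model all positions at once; the function name is an assumption.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to (sum_word, carry_word) with x + y + z = sum + carry."""
    sum_word = x ^ y ^ z                              # per-bit sum, no carry propagation
    carry_word = ((x & y) | (y & z) | (x & z)) << 1   # per-bit carry, weighted by 2
    return sum_word, carry_word

s, c = carry_save_add(0b1011, 0b0110, 0b0111)
print(s, c, s + c)
```

Only the final addition of the sum and carry words needs a carry propagate adder, which is why the structure is attractive when many numbers are added, as in multipliers.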
Figure: Half adder and truth table
Full adder circuit has three inputs and two outputs
Figure : Full adder and truth table
CPL --- Complementary Pass Logic
Explain the design and operation of 4 x 4 multiplier circuit. (Apr. 2016, 2017, Nov 2016, 2018)
Design a multiplier for 5 bit by 3 bit. Explain its operation and summarize the numbers
of adders. Discuss it over Wallace multiplier. (Nov 2017, April 2018)
A study of computer arithmetic processes will reveal that the most common requirements
are for addition and subtraction.
There is also a significant need for a multiplication capability.
Basic operations in multiplication are given below.
0 x 0 = 0, 0 x 1 = 0, 1 x 0 = 0, 1x1=1
              1 0 1 0 1 0     Multiplicand
        x         1 0 1 1     Multiplier
              1 0 1 0 1 0
            1 0 1 0 1 0       Partial products
          0 0 0 0 0 0
        1 0 1 0 1 0
        1 1 1 0 0 1 1 1 0     Result
If two different 4-bit numbers (x0, x1, x2, x3 & y0, y1, y2, y3) are multiplied then
Multiplication by shifting:
If x = (0010)2 = (2)10
If it is to be multiplied by 2, then we can shift x to the left: x = (0100)2 = (4)10
If it is to be divided by 2, then we can shift x to the right: x = (0001)2 = (1)10.
So, a shift register can be used for multiplication or division by 2.
A practical implementation is based on the sequence. The product is obtained
by successive addition and shift right operations
(i) Array multiplier:
Figure: General block diagram of multiplier
Array multiplier uses an array of cells for calculation.
The multiplier circuit is based on a repeated addition and shifting procedure. Each partial product is generated by the multiplication of the multiplicand with one multiplier digit.
The partial products are shifted according to their bit sequences and then added.
N-1 adders are required, where N is the number of multiplier bits.
The method is simple, but because the array multiplier uses ripple carry adders, the delay is high and it consumes a large area. The product expression is given below.
Figure: 4 x 4 array multiplier
This multiplier can accept all the inputs at the same time. An array multiplier for n-bit words needs n(n-2) full adders, n half adders and n² AND gates.
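The array multiplier can be sketched as AND-gate partial products that are shifted and added, together with the cell counts quoted above. The function names are assumptions, and the adder array itself is replaced by integer addition for brevity.

```python
def array_multiply(x, y, n=4):
    """Multiply two n-bit unsigned numbers from AND-gate partial products."""
    product = 0
    for j in range(n):                      # one partial product per multiplier bit
        yj = (y >> j) & 1
        partial = 0
        for i in range(n):
            xi = (x >> i) & 1
            partial |= (xi & yj) << i       # one AND gate per bit pair
        product += partial << j             # shift by bit position, then add
    return product

def array_multiplier_cells(n):
    """Cell counts for an n x n array multiplier, as stated in the notes."""
    return {"full_adders": n * (n - 2), "half_adders": n, "and_gates": n * n}

print(array_multiply(0b1010, 0b1011), array_multiplier_cells(4))
```

For n = 4 the counts are 8 full adders, 4 half adders and 16 AND gates.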
Figure: 4 x 4 array multiplier using Fulladder, Halfadder and AND gate.
Figure: Block diagram of Booth multiplier
In Booth multiplication, partial product generation is based on a recoding scheme, e.g. radix-2 encoding.
Bits of the multiplier are grouped, and the corresponding operation on the multiplicand is performed in order to generate each partial product.
In radix-2 Booth multiplication, partial product generation is based on the encoding given in the Table.
Table: Booth encoding table with RADIX-2
RADIX-2 PROCEDURE:
1) Append a 0 to the right of the LSB of the multiplier and pair the bits in groups of 2 from right to left, as shown in the figure.
Grouping starts from the LSB; the first block contains only two bits of the multiplier, and zero is assumed for the third bit.
Each group of binary digits is encoded according to the Modified Booth Encoding Table as one of the numbers from the set (-2, 2, 0, 1, -1).
RADIX-4 PROCEDURE:
Table: Booth encoding table with RADIX-4
1) Append a 0 to the right of the LSB of the multiplier.
2) Extend the sign bit by 1 position if necessary, so that n is even.
3) According to the value of each vector, each partial product is the multiplicand times one of the numbers from the set (-2, 2, 0, 1, -1).
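The radix-4 procedure can be sketched as follows. The digit rule used here, digit = b(i-1) + b(i) - 2.b(i+1) for each overlapping group of three bits, is the standard modified Booth encoding and is assumed to match the table in these notes; the function names are hypothetical.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit two's complement multiplier (n even) into digits in {-2..2}."""
    bits = [(x >> i) & 1 for i in range(n)]   # two's complement bits, LSB first
    prev = 0                                  # the 0 appended to the right of the LSB
    digits = []
    for i in range(0, n, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits

def booth_multiply(x, y, n):
    """Sum the partial products digit * y, each weighted by 4**k."""
    return sum(d * y * 4 ** k for k, d in enumerate(booth_radix4_digits(x, n)))

print(booth_radix4_digits(7, 4), booth_multiply(7, -5, 4))
```

Only n/2 partial products are produced, each being 0, ±y or ±2y, which halves the number of rows to add compared with the plain array multiplier.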
******************************************************************************
VII. DIVIDERS
Basic building blocks of the serial divider are given below.
1. 4 bit adder
2. 4 bit binary up counter
3. 2:1 MUX (4 MUXs are used)
4. D flipflop
Y0 Y1 Y2 Y3 are complemented and given to the 4 bit adder block (figure shown below).
X0 X1 X2 X3 are given to the MUXs and the MUX outputs are given to the D flipflops. The select signal of the MUX is high. It is connected to the clear input of the counter.
The carry output of the adder is connected to the clock enable pin of the counter. The same is given to an OR gate. The output of this OR gate is given to the clock enable signal of the flipflops.
The other input of the OR gate is tied to the select signal of the MUX.
If X > Y, C0 of adder is high.
After first subtraction, the counter output is incremented by 1.
For each subtraction, the counter output is incremented.
If C0 of adder is low, then clock of counter and FF is disabled. Counting is stopped.
Q3 Q2 Q1 Q0 is the counter output (Quotient)
R3 R2 R1 R0 is the flipflop output (remainder)
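The divider's behavior, repeated subtraction with a counter that accumulates the quotient, can be sketched at a purely behavioral level. The comparison `remainder >= y` stands in for the adder's carry output, which is an assumption of this sketch, and the function name is hypothetical.

```python
def divide_by_subtraction(x, y):
    """Divide x by y through repeated subtraction; returns (quotient, remainder)."""
    if y == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0          # models the up counter Q3..Q0
    remainder = x         # models the flip-flop contents R3..R0
    while remainder >= y:         # stand-in for "carry output of the adder is high"
        remainder -= y            # one subtraction per clock
        quotient += 1             # counter increments after each subtraction
    return quotient, remainder    # counting stops when the carry goes low

print(divide_by_subtraction(13, 4))
```

The number of clock cycles equals the quotient, so this serial scheme is simple but slow for large quotients.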
******************************************************************************
VIII. SHIFT REGISTERS:
Design 4 input and 4 output barrel shifter using NMOS logic. (NOV 2018).
An n-bit rotation is specified by using the control word R0-n, and the L/R bit defines a left or right shift.
For example, y3 y2 y1 y0 = a3 a2 a1 a0
If it is rotated 1 bit to the left, we get y3 y2 y1 y0 = a2 a1 a0 a3
If it is rotated 1 bit to the right, we get y3 y2 y1 y0 = a0 a3 a2 a1
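The two rotations in the example can be reproduced with list slicing. The word is written MSB first, as in the text, and the function names are assumptions.

```python
def rotate_left(word, amount):
    """Rotate a list of bits (MSB first) to the left by `amount` positions."""
    amount %= len(word)
    return word[amount:] + word[:amount]

def rotate_right(word, amount):
    """Rotate a list of bits (MSB first) to the right by `amount` positions."""
    amount %= len(word)
    return word[len(word) - amount:] + word[:len(word) - amount]

a = ["a3", "a2", "a1", "a0"]
print(rotate_left(a, 1))
print(rotate_right(a, 1))
```

A barrel shifter performs any of these rotations in a single pass through the switch array rather than one position per clock.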
Figure: 8 X 4 barrel shifter
S0, S1, S2, S3 are known as shift lines.
The general symbol for a barrel shifter is shown in the figure. The outputs are given as y3 y2 y1 y0.
A barrel shifter is often implemented as a cascade of parallel 2×1 multiplexers.
For an 8-bit barrel shifter, two intermediate signals are used, which shift by four and two bits, or pass the same data, based on the values of S[2] and S[1].
This signal is then shifted by another multiplexer, which is controlled by S[0].
A common usage of a barrel shifter is in the hardware implementation of floating-point arithmetic.
Logarithmic Shifter:
A shifter with a maximum shift width of M consists of log2M stages, where the i-th stage either shifts over 2^i or passes the data unchanged.
A shifter with a maximum shift value of seven bits is shown in the figure. To shift over five bits, the first stage is set to shift mode, the second to pass mode and the last again to shift.
The speed of the logarithmic shifter depends on the shift width in a logarithmic way; an M-bit shifter requires log2M stages.
The series connection of pass transistors slows the shifter down for larger shift values.
The advantage of the logarithmic shifter is that it is more effective for larger shift values in terms of both area and speed.
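The stage-by-stage behavior of the logarithmic shifter can be sketched for a right shift with three stages (shifts of 1, 2 and 4). The function name, the right-shift direction and the example data word are assumptions; the shift amount of five is the case described in the text.

```python
def log_shift_right(value, amount, stages=3):
    """Shift right through `stages` stages; stage i shifts by 2**i or passes."""
    if not 0 <= amount < (1 << stages):
        raise ValueError("shift amount exceeds the maximum shift width")
    modes = []
    for i in range(stages):
        if (amount >> i) & 1:       # bit i of the shift amount controls stage i
            value >>= 1 << i
            modes.append("shift")
        else:
            modes.append("pass")
    return value, modes

result, modes = log_shift_right(0b10110000, 5)
print(bin(result), modes)
```

For a shift of five the stages are set to shift, pass, shift, in agreement with the description above.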
******************************************************************************
IX. SPEED AND AREA TRADE OFF:
Discuss the details about speed and area trade off. (May 2017)
Adder:
The tradeoff in terms of power and performance is shown below.
The performance is represented in terms of the delay(speed).
The area estimations for each of the delays are given based on the fact that area is in
relation to the power consumption.
The area of a carry lookahead adder is larger than the area of a ripple carry for a
particular delay.
Figure: Area Vs Delay for 8 bit adder
Figure: Area Vs Delay for 16 bit adder
Figure: Area Vs Delay for 32 bit adder
Figure: Area Vs Delay for 64 bit adder
Figure: Delay Vs Area for all adders
Figure: Area Vs Delay for all multipliers
The figures above show that the delay of the ripple carry adder increases much faster than that of the carry lookahead adder as the number of bits is increased.
In the carry lookahead adder, the cost is paid in area because the computations are done in parallel, and therefore more power is consumed for a specific delay.
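The trend in the figures can be illustrated with a toy delay model. Everything in this sketch is an assumption made for illustration and is not taken from the figures: the ripple carry delay is taken as one unit per bit, and the lookahead adder is modelled as a tree of 4-bit lookahead groups that costs two units per tree level plus two units for generate/propagate and sum logic.

```python
def ripple_delay(n_bits, t_carry=1.0):
    """Toy model: the carry ripples through every bit position."""
    return n_bits * t_carry


def lookahead_levels(n_bits, group=4):
    """Number of lookahead tree levels needed to cover n_bits (assumed group size)."""
    levels, span = 0, 1
    while span < n_bits:
        span *= group
        levels += 1
    return levels


def lookahead_delay(n_bits, t_level=1.0, group=4):
    """Toy model: up and down the lookahead tree, plus P/G and sum logic."""
    return (2 * lookahead_levels(n_bits, group) + 2) * t_level


for n in (8, 16, 32, 64):
    print(n, ripple_delay(n), lookahead_delay(n))
```

Under these assumptions the ripple carry delay grows linearly with the word length, while the lookahead delay grows only with the number of tree levels.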
Explain the memory architecture and its control circuits in detail. (April 2018)
When an n x m memory is implemented, the n memory words are arranged in a linear fashion.
One word is selected at a time by using a select line.
To implement an 8 x 8 memory, n = 8 (number of words) and m = 8 (number of bits per word).
Then we need 8 select signals (one for each word).
By using a decoder we can reduce the number of select signals.
With a 3-to-8 decoder, 3 address inputs given to the decoder produce the 8 select signals.
If n = 2^20, then only 20 address inputs need to be given to the decoder.
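The saving in select signals can be sketched as follows. The function names address_bits and decode are hypothetical and serve only as an illustration of a one-hot decoder.

```python
def address_bits(n_words):
    """Number of decoder inputs needed to select one of n_words words."""
    return max(1, (n_words - 1).bit_length())


def decode(address, n_bits):
    """One-hot decoder: exactly one of the 2**n_bits select lines is 1."""
    return [1 if i == address else 0 for i in range(2 ** n_bits)]


print(address_bits(8))        # 3 inputs for 8 words
print(address_bits(2 ** 20))  # 20 inputs for 2^20 words
print(decode(5, 3))           # select line 5 of a 3-to-8 decoder is high
```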
Figure: Array structured memory organization
If the basic storage cell is approximately square, then a linear arrangement is extremely slow, because the vertical wires that connect the storage cells to the I/O become excessively long.
So, memory arrays are organized in such a way that the vertical and horizontal dimensions are about the same.
Multiple words are stored in one row, and these words are selected simultaneously.
The column decoder is used to route the correct word to the I/O terminals.
The row address is used to select one row of memory, and the column address is used to select a particular word from the selected row.
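A minimal sketch of the row/column address split is given below. It assumes, for illustration only, that the low-order address bits form the column address and the remaining high-order bits form the row address; the name split_address is hypothetical.

```python
def split_address(address, column_bits):
    """Split a word address into (row, column).

    Assumption: the lowest column_bits bits select the word within a row,
    and the remaining bits select the row.
    """
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    return row, column


# 6-bit address 101101 with 2 column bits: row 1011 (11), column 01 (1)
print(split_address(0b101101, 2))
```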
Figure: Hierarchical memory architecture
Figure: 4 x 4 OR ROM cell array
Figure: 4 x 4 MOS NOR ROM
Programming ROM
The transistor in the intersection of row and column is OFF when the associated word
line is LOW. In this condition, we get logic 1 output.
Figure: 4 x 4 MOS NAND ROM
Advantage: The basic cell consists of only a transistor. There is no need for a connection to any of the supply voltages.
Disadvantage: As it uses a pseudo-nMOS structure, it is ratioed logic and consumes static power.
then a high electric field is generated, so avalanche injection occurs.
After acquiring energy, the electrons become hot and traverse the first oxide insulator.
They get trapped on the floating gate.
The floating gate transistor is known as floating gate avalanche injection MOS or FAMOS.
Disadvantage: A high programming voltage is needed.
Figure (a) floating gate transistor (b) symbol
EPROM – Erasable Programmable Read Only Memory:
Erasing is done by shining UV light on the cell through a transparent window.
This process takes from a few seconds to several minutes, depending on the intensity of the UV source.
Programming takes 5-10 microseconds/word.
During programming, the chip is removed from the board and placed in an EPROM programmer.
Advantages: It is simple, and large families are fabricated at low cost.
Disadvantages:
The number of erase/program cycles is limited to about 1000.
Reliability is not good.
The threshold voltage of the device may vary with repeated programming.
EEPROM – E²PROM:
Electrically Erasable Programmable ROM. Here a floating gate tunneling oxide (FLOTOX) transistor is used.
It is similar to the floating gate device, except that a portion of the floating gate is separated from the channel by an oxide with a thickness of 10 nm or less.
threshold voltage.
Flash Memory – Flash Electrically Erasable Programmable ROM
It combines the density of EPROM with the versatility of EEPROM.
The avalanche hot electron injection mechanism is used.
Erasing is done by Fowler-Nordheim tunneling. Here erasing is done in bulk.
It is similar to the FAMOS gate.
Figure: ETOX device
A very thin tunneling oxide layer (10 nm thickness) is present.
Erasing operation: Erasing is performed when the gate is connected to ground and the source is connected to 12 V.
Write operation: A high voltage pulse is applied to the gate of the selected device. Logic 1 is applied to the drain, and hot electrons are injected into the floating gate.
Read operation: To select a cell, its word line is connected to 5 V. This causes a conditional discharge of the bit line.
Figure: (a) Erase (b) Write (c) Read operation of NOR flash memory
After the small initial word line delay, the values stored at Q and inverse Q are transferred to the bit lines: BL is left at 2.5 V, and the inverse bit line is discharged through M1 and M5.
Figure: CMOS SRAM cell
Write operation:
Assume that Q = 1 and that a logic 0 is to be written into the cell.
Then inverse BL is set to 1 and BL is set to 0.
The gate of M1 is at VDD and the gate of M4 is at ground as long as the switching has not commenced.
Inverse Q is not pulled high enough to ensure the writing of logic 1.
The cell voltage is kept below 0.4 V. The new value of the cell is written through M6.
If logic 0 is stored, then the BL2 line is high.
To refresh the cell, first the stored data is read, then its inverse is placed on BL1 and the WWL line is asserted.
One transistor DRAM:
In this cell, to write a logic 1, it is placed on the bit line and the word line is asserted high.
The capacitor is charged or discharged depending upon the data. Before performing a read operation, the bit line is precharged.
Figure: One transistor DRAM
4.9.3.3.3 CAM – Content Addressable or Associative Memory
Explain about CAM.
UNIT-IV
In the compare mode, the stored data are compared with the search data applied on the bit lines. The match line is connected to all CAM blocks in a row and is initially precharged to VDD.
If any mismatch occurs, the match line of that row is discharged: even if only one bit in a row is mismatched, the match line goes low.
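The compare mode can be modelled with a short behavioural sketch. The function name match_lines and the string representation of the stored words are assumptions for illustration; only the precharge and discharge behaviour follows the description above.

```python
def match_lines(stored_rows, search_word):
    """Behavioural model of the CAM compare mode.

    Each match line starts precharged (1) and is discharged (0)
    as soon as one bit of the row differs from the search word.
    """
    lines = []
    for row in stored_rows:
        line = 1  # precharged to VDD
        for stored_bit, search_bit in zip(row, search_word):
            if stored_bit != search_bit:
                line = 0  # a single mismatch discharges the match line
        lines.append(line)
    return lines


print(match_lines(["1010", "1100", "1010"], "1010"))
```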
*****************************************************************************
4.10 Memory peripheral (control) Circuits:
Explain the memory architecture and its control circuits in detail. (April 2018)
(i) Address & Block Decoders:
Row Decoder:
Row and column address decoders are used to select a particular memory location in an array.
The row decoder is used to drive the NOR ROM array. It selects one of 2^n word lines.
A dynamic 2-to-4 decoder reduces the number of transistors and the propagation delay.
Symbol and Truth table Dynamic 2-to-4 NOR decoder
Column Decoder
It should match the bit line pitch of the memory array.
In the column decoder, the decoder outputs are connected to nMOS pass transistors.
By using this circuit, we can selectively drive one out of m pass transistors.
Only one nMOS pass transistor is ON at a time.
reliability of memory circuits.
The basic differential sense amplifier circuit is shown in the figure below.
It performs the following functions:
Amplification:
In memory structures such as the 1T DRAM, amplification is required for proper functionality.
Delay Reduction:
The amplifier compensates for the limited fan-out driving capability of the memory cell by detecting and amplifying small transitions on the bit line to large signal output swings.
Power reduction:
Reducing the signal swing on the bit lines can eliminate a large part of the power dissipation related to charging and discharging the bit lines.
(iii) Drivers/ Buffers
The length of the word and bit lines increases with increasing memory size.
A large portion of the read and write access time can be attributed to the wire delays.
A major part of the memory-periphery area is allocated to the drivers (address buffers and I/O drivers).
******************************************************************************
in stored charge.
• Increased capacitor area or value:
Keeping the "ground" plate of the storage capacitor at VDD/2 reduces the maximum voltage over Cs, making it possible to use thinner oxides.
• Increasing the cell size:
Ultra-low-voltage DRAM memory operation might require a sacrifice in area efficiency.
Retention current Reduction:
An SRAM array should ideally not have any static power dissipation. However, the leakage current of the transistors is a major problem, and it is the main source of the retention current.
This retention current can be reduced in the following ways.
1. Turn OFF unused memory blocks.
2. Apply a negative bias voltage to the cells that are not active, which reduces the leakage current.
3. If a low threshold voltage transistor is inserted between VDD and the SRAM array, leakage reduces.
4. Leakage is a function of VDD; thus if the supply rail is lowered, the leakage current is reduced.
Figure: (a) Insertion of low threshold device (b) Reducing supply Voltage
******************************************************************************
FPGAs provide the next generation of programmable logic devices.
The term field programmable refers to the ability of the gate arrays to be programmed for a specific function by the user.
The word array indicates a series of columns and rows of gates that can be programmed by the end user.
As compared to standard gate arrays, field programmable gate arrays are larger devices.
The basic cell structure of an FPGA is more complicated than the basic cell structure of a standard gate array.
The programmable logic blocks of an FPGA are called Configurable Logic Blocks (CLBs).
The FPGA architecture consists of three types of configurable elements:
(i) IOBs – Input/output blocks
(ii) CLBs – Configurable logic blocks
(iii) Resources for interconnection
The IOBs provide a programmable interface between the internal array of logic blocks (CLBs) and the device's external package pins.
CLBs perform user-specified logic functions.
The interconnect resources carry signals among the blocks.
A configuration program is stored in internal static memory cells.
The configuration program determines the logic functions and the interconnections.
The configuration data is loaded into the device during power-up or reprogramming.
FPGA devices are customized by loading configuration data into internal memory cells.
1. Logic blocks
Based on memories (Flip-flop & LUT – Lookup Table) - Xilinx
Based on multiplexers - Actel
Based on PAL/PLA - Altera
Transistor pairs
2. Interconnection resources
Symmetrical FPGAs
Row-based FPGAs
Sea-of-gates type FPGAs
Hierarchical FPGAs (CPLD)
3. Input-output cells (I/O cells)
Possibilities for programming:
a. Input
b. Output
c. Bidirectional
RE-PROGRAMMABLE DEVICE ARCHITECTURE:
Figure: FPGA building blocks structure
The figure shows the general structure of an FPGA chip.
It consists of a large number of programmable logic blocks surrounded by programmable I/O blocks.
5.7.1: Configurable Logic Block:
The programmable logic blocks of an FPGA are smaller and less capable than a PLD, but an FPGA chip contains many more logic blocks, which makes it more capable overall.
As shown in the figure, the logic blocks are distributed across the entire chip.
These logic blocks can be interconnected with programmable interconnections.
The programmable logic blocks of FPGAs are called Configurable Logic Blocks (CLBs).
CLBs contain LUTs, flip-flops, logic gates and multiplexers to perform logic functions.
The CLB contains RAM memory cells and can be programmed to realize any function of five variables or any two functions of four variables.
The functions are stored in truth table form, so the number of gates required to realize the functions is not important.
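The truth-table storage can be illustrated with a small lookup-table model. The names program_lut and read_lut are hypothetical; the sketch only shows that any function of the inputs reduces to one stored bit per input combination, regardless of how many gates it would otherwise need.

```python
from itertools import product


def program_lut(func, n_inputs):
    """Fill the LUT memory: one stored bit per input combination."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def read_lut(table, *inputs):
    """Evaluate the function by using the inputs as the memory address."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit  # first input is the most significant address bit
    return table[index]


# Example: 3-input majority function stored as 8 bits
majority = program_lut(lambda a, b, c: (a & b) | (b & c) | (a & c), 3)
print(majority)
print(read_lut(majority, 1, 0, 1))
```

A function of five variables needs 2^5 = 32 stored bits in this model, and two functions of four variables need 2 x 16 = 32 bits, which is consistent with the CLB capacity stated above.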
5.7.2: Interconnection resources:
Figure: Types of interconnection resources
(a) Symmetrical Arrays
It consists of logic elements (CLBs) arranged in rows and columns of a matrix, with interconnect laid out between them.
This symmetrical matrix is surrounded by I/O blocks, which connect it to the outside world.
(b) Row based architecture:
It consists of alternating rows of logic modules and programmable interconnect tracks.
Input/output blocks are located on the periphery of the rows.
One row may be connected to adjacent rows via vertical interconnect.
(c) Hierarchical CPLD:
This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects.
1. Connections within macrocells
2. Local connection resource within the logic block
3. Global connection resource (Switch Matrix)
(d) Sea of gates structure:
It consists of logic elements (CLBs) arranged in rows and columns of a matrix in the channel-less gate array module.
Figure: A three-state bidirectional output buffer
We can limit the number of I/O drivers that can be attached to any one VDD and GND pad.
It allows us to employ the same pad for input and output (bidirectional I/O).
When we want to use the pad as an input, set OE low and take the data from DATAin.
We can also build output-only or input-only pads.
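The role of OE can be sketched with a simple behavioural model. OE and DATAin are the signal names used above; the function name and the data_out and external arguments are assumptions introduced only for this illustration.

```python
def bidirectional_pad(oe, data_out, external):
    """Behavioural model of a three-state bidirectional pad.

    oe = 1: the output driver is enabled and drives data_out onto the pad.
    oe = 0: the driver is in high impedance, so the pad follows the
            externally applied value.
    Returns (pad value, DATAin).
    """
    pad = data_out if oe else external
    data_in = pad  # the input buffer always reads the pad
    return pad, data_in


print(bidirectional_pad(oe=1, data_out=1, external=0))  # pad used as an output
print(bidirectional_pad(oe=0, data_out=1, external=0))  # pad used as an input
```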
*************************************************************************************
5.8: FPGA (PROGRAMMABLE ASIC) interconnect routing procedures (Architectures):
Give short notes on FPGA interconnect routing procedures. (May 2016)
and another logic block.
The type of routing architecture decides area consumed by routing as well as density of
logic blocks.
Routing techniques decide the amount of area used by wire segments and programmable
switches as compared to area consumed by logic blocks.
Connections between the logic blocks in distant groups require the traversal of one or more levels
of routing segments.
As shown in Figure, only one level of routing directly connects to the logic blocks.
Programmable connections are represented with the crosses and circles.
Figure: Example of Hierarchical FPGA
(b) Xilinx Routing Architecture:
In Xilinx routing, connections are made from the logic block into the channel through a connection block.
As SRAM technology is used to implement lookup tables, connection sites are large.
A logic block is surrounded by connection blocks on all four sides.
The logic block pins connecting to the connection blocks can then be connected to any number of wire segments through switching blocks.
The figure shows the Xilinx routing architecture.
There are four types of wire segments available:
General purpose segments that pass through switches in the switch block.
Direct interconnect connects logic block pins to four surrounding connecting blocks
Long line: high fan out uniform delay connections
Clock lines: clock signal provider which runs all over the chip.
Figure: Altera Max 5000 Routing Architecture
All four types of tracks connect to every logic block in the array block.
Any track can connect into any input, which makes this routing simple.
Advantage: It allows tight and efficient packing.
Disadvantage: A large number of switches is required, which adds to the capacitive load.
It employs wire segments of different lengths in each channel to provide the most appropriate length for each given connection.
Figure: Island-Style Routing Architecture
(e) Actel Routing Architecture:
Actel's design has more wire segments in the horizontal direction than in the vertical direction.
The input pins connect to all tracks of the channel that is on the same side as the pin.
The output pins extend across two channels above the logic block and two channels below it.
An output pin can be connected to all 4 channels that it crosses.
The switch blocks are distributed throughout the horizontal channels.
All vertical tracks can make a connection with every incident horizontal track.
This gives the flexibility that a horizontal track can switch into a vertical track, thus allowing horizontal and vertical routing of the same wire.
The drawback is that more switches are required, which adds up to more capacitive load.
UNIT-V
VLSI designers have a wide variety of CAD tools to choose from, each with its own strengths and weaknesses. The leading Electronic Design Automation (EDA) companies include Cadence, Synopsys, Magma, and Mentor Graphics.
Tanner also offers commercial VLSI design tools. The leading free tools include Electric, Magic, and LASI.
This set of laboratories uses the Cadence and Synopsys tools because they have the largest market share in industry and are capable of handling everything from simple class projects to state-of-the-art integrated circuits.
The full set of tools is extremely expensive, but the companies offer academic programs to make the tools available to universities at a much lower cost.
The tools run on Linux and other flavors of UNIX. Setting up and maintaining the tools involves a substantial effort. Once they are set up correctly, the basic tools are easy to use, as this tutorial demonstrates.
Some companies use the Tanner tools because their list price is much lower and they are easy to use. However, their academic pricing is comparable with Cadence and Synopsys, giving little incentive for universities to adopt Tanner.
The Electric VLSI Design System is an open-source chip design program developed by
Electric presently does not read the design rules for state-of-the-art nanometer processes and integrates poorly with synthesis and place & route.
Magic is a free Linux-based layout editor with a powerful but awkward interface that was once widely used in universities.
The Layout System for Individuals, LASI, developed by David Boyce, is freely available and runs on Windows. It was last updated in 1999.
There are two general strategies for chip design.
Custom design involves specifying how every transistor is connected and physically
arranged on the chip.
The majority of commercial designs are synthesized today because synthesis takes less
engineering time.
However, custom design gives more insight into how chips are built and into what to do
when things go wrong.
Custom design also offers higher performance, lower power, and smaller chip size. The first two labs emphasize the fundamentals of custom design, while the next two use logic synthesis and automatic placement to save time.
Tool Setup
These labs assume that you have the Cadence and Synopsys tools installed. The tools generate a bunch of random files. It's best to keep them in one place. In your home directory, run:
mkdir IC_CAD
mkdir IC_CAD/cadence
Getting Started
Before you start the Cadence tools, change into the cadence directory:
cd ~/IC_CAD/cadence
Each of our tools has a startup script that sets the appropriate paths to the tools and invokes them.
Start Cadence with the NCSU extensions by running
cad-ncsu &
A window labeled icfb will open up.
This is the Integrated Circuit Front and Back End (e.g. schematic and layout) software, part of Cadence's Design Framework interface.
A “What's New” and a Library Manager window may open up too. Scroll through the icfb window and look at the messages displayed as the tool loads up.
Get in the habit of watching for the messages and recognizing any that are out of the ordinary. This is very helpful when you encounter problems.
All of your designs are stored in a library. If the Library Browser doesn't open, choose Tools • Library Manager. You'll use the Library Manager to manipulate your libraries. Don't try to move libraries around or rename them directly in Linux; there is some funny behavior and you are likely to break them.
Familiarize yourself with the Library Manager. Your cds.lib file includes many libraries from the NCSU CDK supporting the different MOSIS processes. It also includes libraries from the University of Utah.
The File menu allows you to create new libraries and cells within a library, while the Edit menu allows you to copy, rename, delete, and change the access permissions.
Choose the “Attach to existing tech library” and accept the default, UofU AMI 0.60u C5N (3M, 2P, high-res).
This is a technology file for the American Microsystems (now Orbit Semiconductor) 0.6 μm process, containing design rules for layout.
Schematic Entry
Our first step is to create a schematic for a 2-input NAND gate. Each gate or larger component is called a cell. Cells have multiple views. The schematic view for a cell built with CMOS transistors will be called cmos_sch.
Later, you will build a view called layout specifying how the cell is physically manufactured. In the Library Manager, choose File • New • Cell View… In your lab1_xx library, enter a cell name of nand2 and a view name of cmos_sch. The tool should be Composer - Schematic.
You may get a window asking you to confirm that cmos_sch should be associated with this tool. The schematic editor window will open. Your goal is to draw a gate like the one shown in Figure 1. We are working in a 0.6 μm process with λ = 0.3 μm.
Unfortunately, the University of Utah technology file is configured on a half-lambda grid, so grid units are 0.15 μm. Take care that everything you do is an integer multiple of λ so you don't come to grief later on. Our NAND gate will use 12 λ (3.6 μm) nMOS and pMOS transistors.
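The λ arithmetic can be checked with a short sketch. The values λ = 0.3 μm and the 0.15 μm grid come from the text above; the helper names are hypothetical and not part of the Cadence tools.

```python
LAMBDA_UM = 0.3   # lambda for the 0.6 um process
GRID_UM = 0.15    # half-lambda grid of the technology file


def lambdas_to_um(n_lambda):
    """Convert a dimension in lambda to microns."""
    return round(n_lambda * LAMBDA_UM, 2)


def is_lambda_multiple(width_um):
    """True if width_um lies on the grid and is an integer multiple of lambda."""
    grid_units = round(width_um / GRID_UM)
    on_grid = abs(grid_units * GRID_UM - width_um) < 1e-9
    return on_grid and grid_units % 2 == 0  # two grid units per lambda


print(lambdas_to_um(12))         # width of the 12-lambda transistors in microns
print(is_lambda_multiple(3.6))   # on the lambda grid
print(is_lambda_multiple(3.75))  # on the half-lambda grid only
```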
Choose Add • Instance to open a Component Browser window. Choose UofU_Analog_Parts for the library, then select nmos. The Add Instance dialog will open. Set the Width to 3.6u (u indicates microns).
Click in the schematic editor window to drop the transistor. You can click a second time to place another transistor. Return to the Component Browser window and choose pmos. Drop two pMOS transistors.
Then return to the browser and get a gnd and a vdd symbol. When you are in a mode in the editor, you can press ctrl-c or Esc to get out of it.
Other extremely useful commands include Edit • Move, Edit • Copy, Edit • Undo, and Edit • Delete. Edit • Properties • Object… is also useful to change things like transistor sizes or wire names.
Move the elements around until they are in attractive locations. I like to keep series transistors one grid unit apart and place pMOS transistors two grid units above the nMOS. Look at the bottom of the schematic editor window to see what mode you are in.
Next, use Add • Pin… to create some pins. In the Add Pin dialog, enter a and b. Make sure the direction is “input.”
The tools are case-sensitive, so use lower case everywhere. Place the pins, being sure that a is the bottom one.
Although pin order doesn't matter logically, it does matter physically and electrically, so you will get errors if you reverse the order. Then place an output pin y. Now, wire the elements together.
Choose Add • Wire (narrow). Click on each component and draw a wire to where it should connect. It is a good idea to make sure every net (wire) in a design has a name.
Otherwise, you'll have a tough time tracking down a problem later on one of the unnamed nets.
Every net in your schematic is connected to a named pin or to power or ground except the net between the two series nMOS transistors. Choose Add • Wire name… Enter mid or something like that as the name, and click on the wire to name it. Choose Design • Check and Save to save your schematic.
You'll probably get one warning about a “solder dot on crossover” at the 4-way junction on the output node.
This is annoying because such 4-way junctions are normal and common. Choose Check • Rules Setup… and click on the Physical tab in the dialog. Change Solder On CrossOver from “warning” to “ignored” and close the dialog.
Then Check and Save again and the warning should be gone.
If you have any other warnings, fix them. A common mistake is wires that look like they might touch but don't actually connect. Delete the wire and redraw it. Poke around the menus and familiarize yourself with the other capabilities of the schematic editor.
LOGIC VERIFICATION
Cells are commonly described at three levels of abstraction. The register-transfer level (RTL) description is a Verilog or VHDL file specifying the behavior of the cell in terms of registers and combinational logic.
It often serves as the specification of what the chip should do. The schematic illustrates how the cell is composed from transistors or other cells. The layout shows how the transistors or cells are physically arranged.
Logic verification involves proving that the cells perform the correct function. One way to do this is to simulate the cell and apply a set of 1's and 0's called test vectors to the inputs, then check that the outputs match expectation.
Typically, logic verification is done first on the RTL to check that the specification is correct. A testbench written in Verilog or VHDL automates the process of applying and checking all of the vectors.
The same test vectors are then applied to the schematic to check that the schematic matches the RTL.
Later, we will use a layout-versus-schematic (LVS) tool to check that the layout matches the schematic (and, by inference, the RTL).
We will begin by simulating an RTL description of the NAND gate to become familiar with reading RTL and understanding a testbench. In this tutorial, the RTL and testbench are written in SystemVerilog, which is a 2005 update to the popular Verilog hardware description language.
There are many Verilog simulators on the market, including NC-Verilog from Cadence, VCS from Synopsys, and ModelSim from Mentor Graphics.
This tutorial describes how to use NC-Verilog because it integrates gracefully with the other Cadence tools.
NC-Verilog compiles your Verilog into an executable program and runs it directly, making it much faster than the older interpreted simulators. Make a new directory for simulation (e.g. nand2sim).
Copy nand2.sv, nand2.tv, and testfixture.verilog from the course directory into your new directory.
cp /courses/e158/10/nand2.sv .
cp /courses/e158/10/nand2.tv .
cp /courses/e158/10/nand2.testfixture testfixture.verilog
nand2.sv is the SystemVerilog RTL file, which includes a behavioral description of a nand2 module and a simple self-checking testbench that includes testfixture.verilog. testfixture.verilog reads in test vectors from nand2.tv and applies them to the pins of the nand2 module.
After each cycle it compares the output of the nand2 module to the expected output, and prints an error if they do not match.
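The idea of a self-checking testbench can be sketched in Python. This is not the course testfixture: the vector format (two input bits, an underscore, then the expected output, e.g. 00_1) is an assumption about nand2.tv, and the function names are hypothetical.

```python
def nand2(a, b):
    """Behavioural model of the 2-input NAND gate."""
    return 0 if (a and b) else 1


def run_vectors(lines):
    """Apply each test vector and compare the output to the expected value.

    Assumed vector format: '<a><b>_<expected y>', for example '00_1'.
    Returns (number of tests, number of errors).
    """
    tests = errors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        inputs, expected = line.split("_")
        a, b = int(inputs[0]), int(inputs[1])
        y = nand2(a, b)
        if y != int(expected):
            print(f"Error: inputs = {inputs}, output = {y} ({expected} expected)")
            errors += 1
        tests += 1
    print(f"Completed {tests} tests with {errors} errors.")
    return tests, errors


run_vectors(["00_1", "01_1", "10_1", "11_0"])
```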
Look over each of these files and understand how they work. First, you will simulate the nand2 RTL to practice the process and ensure that the testbench works.
Later, you will replace the behavioral nand2 module with one generated from your Electric schematic and will resimulate to check that your schematic performs the correct function.
At the command line, type sim-nc nand2.sv to invoke the simulator. You should see some messages ending with
ncsim> run
Completed 4 tests with 0 errors.
Simulation stopped via $stop(1) at time 81 NS + 0
You'll be left at the ncsim command prompt. Type quit to finish the simulation. If the simulation hadn't run correctly, it would be helpful to be able to view the results.
NC-Verilog has a graphical user interface called SimVision. The GUI takes a few seconds to load, so you may prefer to run it only when you need to debug.
To rerun the simulation with the GUI, type sim-ncg nand2.sv. A Console and Design Browser window will pop up.
In the browser, click on the + symbol beside the testbench to expand, then click on dut. The three signals, a, b, and y, will appear in the pane to the right. Select all three, then right-click and choose Send to Waveform Window.
In the Waveform Window, choose Simulation • Run. You'll see the waveforms of your simulation; inspect them to ensure they are correct. The 0 errors message should also appear in the console.
If you need to change something in your code, testbench or test vectors, or want to add other signals, do so and then choose Simulation • Reinvoke Simulator to recompile everything and bring you back to the start.
Then choose Run again. Make a habit of looking at the messages in the console window and learning what is normal.
Warnings and errors should be taken seriously; they usually indicate real problems that will catch you later if you don't fix them.
Schematic Simulation
Next, you will verify your schematic by generating a Verilog deck and pasting it into the RTL Verilog file.
While viewing your schematic, click on Tools • Simulation • NCVerilog to open a window for the Verilog environment. Note the run directory (e.g. nand2_run1), and press the button in the upper left to initialize the design.
Then press the next button to generate a netlist. Look in the icfb window for errors and correct them if necessary.
You should see that the pmos, nmos, and nand2 cells were all netlisted. In your Linux terminal window, cd into the directory that was created. You'll find quite a few files.
The most important are verilog.inpfiles, testfixture.template, and testfixture.verilog. Each cell is netlisted into a different directory under ihnl. verilog.inpfiles states where they are.
Take a look at the netlist and other files. testfixture.template is the top level module that instantiates the device under test and invokes the testfixture.verilog.
Copy your testfixture.verilog and nand2.tv from your nand2sim directory to your nand2_run1 directory using commands such as
cp ../nand2sim/testfixture.verilog .
cp ../nand2sim/nand2.tv .
Back in the Virtuoso Verilog Environment window, you may wish to choose Setup • Record Signals.
Click on the “All” button to record signals at all levels of the hierarchy. (This isn't important for the nand with only one level of hierarchy, but will be helpful later.)
Then choose Setup • Simulation. Change the Simulation Log File to indicate simout.tmp -sv. This will print the results in simout.tmp.
The -sv flag indicates that the simulator should accept the SystemVerilog syntax used in testfixture.verilog. Set the Simulator mode to “Batch” and click on the Simulate button.
You should get a message that the batch simulation succeeded. This doesn't mean that it is correct, merely that it ran.
In the terminal window, view the simout.tmp file. It will give some statistics about the compilation, then should indicate that the 4 tests were completed with 0 errors.
If the simulation fails, the simout.tmp file will have clues about the problems. Change the simulator mode to Interactive to rerun with the GUI.
Be patient; the GUI takes several seconds to start and gives no sign of life until then. Add the waveforms again and run the simulation.
You may need to zoom to fit all the waves. For some reason, SimVision doesn't print the $display message about the simulation succeeding with no errors.
You will have to read the simout.tmp file at the command line to verify that the test vectors passed. If you find any logic errors, correct the schematic and resimulate.
The rapid pace of innovation has created powerful SOC solutions at consumer prices.
This has created a highly competitive marketplace where billions of dollars can be won by the right design delivered at the right time.
These new designs are produced on processes that challenge the fundamental laws of physics and are highly sensitive to equipment variation.
The industry now produces new designs in a complex world where process and design interactions have created new complex failures that stand in the way of billion-dollar opportunities.
These interactions lead to new types of defects such as blocked chains, which create noise in the debug/diagnosis process.
They also lead to new types of design issues such as delay defects in combinational and sequential logic.
The challenge is made even greater by the growing complexity in device structure and design techniques.
Multiple design organizations use multiple IP blocks and multiple libraries that need to work together throughout the process window, often across multiple fabs.
These new challenges come at a time when product lifetimes are shrinking, leading to pressure to reduce the time for debug and characterization activities. These problems are seen for the first time at first silicon.
Test the first chips back from fabrication. If we are lucky, they work the first time. If not,
the cause may be a logic bug or an electrical failure:
Most chip failures are logic bugs from inadequate simulation or verification.
Some are electrical failures: crosstalk, dynamic nodes (leakage, charge sharing) and ratio failures.
A few are tool or methodology failures (e.g. DRC).
Fix the bugs and fabricate a corrected chip.
Silicon debug (or "bringup") is primarily a Non-Recurring Engineering (NRE) cost (like
design). Contrast this with manufacturing test, which has to be applied to every part shipped.
MANUFACTURING TEST
A speck of dust on a wafer is sufficient to kill a chip, so the yield of any chip is < 100%.
Chips must be tested after manufacturing, before delivery to customers, so that only good parts are shipped.
Manufacturing testers are very expensive, so time on the tester must be minimized by careful
selection of test vectors.
A test for a defect will produce an output response which is different from the output when
there is no defect.
Test quality is high if the set of tests will detect a very high fraction of possible defects.
Defect level is the percentage of bad parts shipped to customers.
Yield is the percentage of defect-free chips manufactured.
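The link between yield, test quality and defect level can be sketched numerically. The model used below, DL = 1 - Y^(1 - T), is usually attributed to Williams and Brown; it is not given in these notes and is assumed here only for illustration. The function name and the yield and coverage values are likewise made up.

```python
def defect_level(yield_fraction, fault_coverage):
    """Assumed Williams-Brown model: DL = 1 - Y ** (1 - T).

    yield_fraction  Y: fraction of manufactured chips that are defect-free
    fault_coverage  T: fraction of modelled faults detected by the test set
    Returns the fraction of shipped parts that are bad.
    """
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield must be in (0, 1]")
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault coverage must be in [0, 1]")
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


if __name__ == "__main__":
    # Illustrative numbers only: 50% yield at three coverage levels.
    for coverage in (0.90, 0.99, 1.00):
        dl = defect_level(0.5, coverage)
        print(f"coverage {coverage:.2f} -> defect level {dl * 1e6:.0f} ppm")
```

Under this assumed model, the defect level falls towards zero as the fault coverage approaches 100%, which is why a high-quality test set matters even when yield is poor.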
Fault models:
There are numerous possible physical failures (what we are testing for). We can reduce the number of
failure types by considering the effects of physical failures on the logic functional blocks: this is called a
fault model.
Assume that defects will cause the circuit to behave as if lines were "stuck" at logic 0 or 1. Most
commercial tools for test are based on the "stuck-at" model.
Other fault models:
The "stuck-open" model for charge retained on a CMOS node.
The "transition" fault model, used more recently in an attempt to deal with delays.
The "path delay" fault model would be better for small delay defects, but
the large number of possible paths is an impediment to the use of this fault model.
DESIGN FOR TESTABILITY
The approach to generating tests for defects is to map defects to (higher-level) faults: develop a
fault model, then generate tests for the faults. A typical choice is the gate-level "stuck-at" fault model. As
technology shrinks, other faults matter: bridging faults, delay faults, crosstalk faults, etc.
An interesting point: what is important is how well the tests generated (based on the fault
model) will detect realistic defects; the accuracy of the fault model is secondary.
Observability and controllability:
Observability: ease of observing a value on a node by monitoring external output pins of
the chip.
Controllability: ease of forcing a node to 0 or 1 by driving input pins of the chip.
Combinational logic is usually easier to observe and control. Still, test generation is an NP-complete problem.
Finite state machines can be very difficult, requiring many cycles to enter the desired state,
especially if the state transition diagram is not known to the test engineer or is too large.
Fault simulation:
Fault simulation identifies the faults detected by a sequence of tests and provides a numerical value of coverage (ratio
of detected faults to total faults). There is a correlation between high fault coverage and low defect level.
The faults considered are generally gate-level "stuck-at" faults. Coverage of switch-level
faults can also be evaluated, and timing and dynamic effects of failures can be included.
Although fault simulation takes polynomial time in the number of gates, it can still be
prohibitive for large designs. Static timing analysis (PrimeTime, for example) only finds structural
long paths.
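A minimal sketch of stuck-at fault simulation may make the coverage figure concrete. The circuit y = (a AND b) OR c, the node names and the test vectors are assumptions chosen for illustration; a real fault simulator works on a full netlist and is far more efficient than this brute-force loop.

```python
NODES = ("a", "b", "c", "n1", "y")   # fault sites of the assumed example circuit


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c.

    fault is None for the good circuit, or (node, stuck_value) to force
    one node to 0 or 1 (single stuck-at fault).
    """
    def fix(name, value):
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = fix("a", a), fix("b", b), fix("c", c)
    n1 = fix("n1", a & b)
    return fix("y", n1 | c)


def fault_coverage(tests):
    """Return (coverage, detected faults) for a list of (a, b, c) vectors."""
    faults = [(node, value) for node in NODES for value in (0, 1)]
    detected = set()
    for fault in faults:
        for vector in tests:
            # A fault is detected when the faulty output differs from the good one.
            if simulate(*vector, fault=fault) != simulate(*vector):
                detected.add(fault)
                break
    return len(detected) / len(faults), detected


if __name__ == "__main__":
    coverage, _ = fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)])
    print(f"stuck-at coverage: {coverage:.0%}")
```

The coverage returned is the ratio of detected faults to total faults described above, here over the ten single stuck-at faults of the example.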
BOUNDARY SCAN
Boundary scan is a method for testing interconnects (wire lines) on printed circuit boards or
sub-blocks inside an integrated circuit. Boundary scan is also widely used as a debugging method
to watch integrated circuit pin states, measure voltage, or analyze sub-blocks inside an integrated
circuit.
Testing
The boundary scan architecture provides a means to test interconnects and clusters of logic,
memories etc. without using physical test probes. It adds one or more so-called 'test cells'
connected to each pin of the device that can selectively override the functionality of that pin.
These cells can be programmed via the JTAG scan chain to drive a signal onto a pin and
across an individual trace on the board. The cell at the destination of the board trace can then be
programmed to read the value at the pin, verifying the board trace properly connects the two pins.
If the trace is shorted to another signal or if the trace has been cut, the correct signal value
will not show up at the destination pin, and the board will be observed to have a fault.
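The drive-and-read check described above can be sketched as a small behavioural model. The electrical assumptions are not from these notes: an open trace is assumed to float high at the receiving cell, and shorted traces are assumed to behave as a wired-AND. The walking-one/walking-zero pattern set and all function names are also illustrative choices.

```python
def receive(driven, opens=(), shorts=()):
    """Values captured by the destination cells for one driven pattern.

    Assumptions: an open net reads 1 at the receiver, and each group of
    shorted nets settles to the AND of the values driven onto it.
    """
    seen = list(driven)
    for group in shorts:
        value = min(driven[i] for i in group)   # wired-AND of the shorted nets
        for i in group:
            seen[i] = value
    for i in opens:
        seen[i] = 1                             # floating input assumed high
    return seen


def interconnect_test(n_nets, opens=(), shorts=()):
    """Apply walking-one and walking-zero patterns; return failing net indices."""
    failing = set()
    for k in range(n_nets):
        walking_one = [int(i == k) for i in range(n_nets)]
        walking_zero = [int(i != k) for i in range(n_nets)]
        for pattern in (walking_one, walking_zero):
            seen = receive(pattern, opens, shorts)
            failing.update(i for i in range(n_nets) if seen[i] != pattern[i])
    return sorted(failing)


if __name__ == "__main__":
    print(interconnect_test(4))                     # fault-free board
    print(interconnect_test(4, opens=(2,)))         # net 2 cut
    print(interconnect_test(4, shorts=((0, 1),)))   # nets 0 and 1 shorted
```

In a real board test the patterns are shifted into the driving cells through the JTAG chain and the captured values are shifted back out; the model above skips the shifting and compares driven and received values directly.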
On-Chip Infrastructure
To provide the boundary scan capability, IC vendors add additional logic to each of their
devices, including scan cells for each of the external traces.
These cells are then connected together to form the external boundary scan shift register
(BSR), and combined with JTAG TAP (Test Access Port) controller support comprising four (or
sometimes more) additional pins plus control circuitry.
Some TAP controllers support scan chains between on-chip logical design blocks, with
JTAG instructions which operate on those internal scan chains instead of the BSR.
This can allow those integrated components to be tested as if they were separate chips on a
board. On-chip debugging solutions are heavy users of such internal scan chains.
These designs are part of most Verilog or VHDL libraries. Overhead for this additional
logic is minimal, and generally is well worth the price to enable efficient testing at the board level.
For normal operation, the added boundary scan latch cells are set so that they have no
effect on the circuit, and are therefore effectively invisible.
However, when the circuit is set into a test mode, the latches enable a data stream to be
shifted from one latch into the next.
Once a complete data word has been shifted into the circuit under test, it can be latched into
place so it drives external signals.
Shifting the word also generally returns the input values from the signals configured as
inputs.
As the cells can be used to force data into the board, they can set up test conditions. The
relevant states can then be fed back into the test system by clocking the data word back so that it
can be analyzed.
By adopting this technique, it is possible for a test system to gain test access to a board. As
most of today's boards are very densely populated with components and tracks, it is very difficult
for test systems to physically access the relevant areas of the board to enable them to test the
board. Boundary scan makes access possible without always needing physical probes.
In modern chip and board design, Design For Test is a significant issue, and one common
design artifact is a set of boundary scan test vectors, possibly delivered in Serial Vector Format
(SVF) or a similar interchange format.
Devices communicate to the world via a set of input and output pins. By themselves, these
pins provide limited visibility into the workings of the device.
However, devices that support boundary scan contain a shift-register cell for each signal
pin of the device. These registers are connected in a dedicated path around the device's boundary
(hence the name).
The path creates a virtual access capability that circumvents the normal inputs and provides
direct control of the device and detailed visibility at its outputs. The contents of the boundary scan
are usually described by the manufacturer using a part-specific BSDL file.
The boundary-scan cells can be configured to support external testing for interconnection
between chips (EXTEST instruction) or internal testing for logic within the chip (INTEST
instruction).
Scan-path 'infrastructure' or integrity
Boundary-scan device pin to boundary-scan device pin 'interconnect'
Boundary-scan pin to memory device or device cluster (SRAM, DRAM, DDR etc)
Arbitrary logic cluster testing
When used during manufacturing, such systems also support non-test but affiliated
applications such as in-system programming of various types of flash memory: NOR, NAND, and
serial (I2C or SPI).
Such commercial systems are used by board test professionals and will often cost several
thousand dollars for a fully-fledged system.
They can include diagnostic options to accurately pinpoint faults such as open circuits and
shorts and may also offer schematic or layout viewers to depict the fault in a graphical manner.
Tests developed with such tools are frequently combined with other test systems such as in-
circuit testers (ICTs) or functional board test systems.
Design-for-testability techniques improve the controllability and observability of internal nodes, so that
embedded functions can be tested.
Two basic properties determine the testability of a node: 1) controllability, which is a measure of the
difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and 2)
observability, which is a measure of the difficulty of propagating a node's value to a primary output (PO).
A node is said to be testable if it is easily controlled and observed. For sequential circuits, some have
added predictability, which represents the ability to obtain known output values in response to given
input stimuli. The factors affecting predictability include initializability, races, hazards, oscillations, etc.
DFT techniques include analog test busses and scan methods. Testability can also be improved with BIST
circuitry, where signal generators and analysis circuitry are implemented on chip. Without testability,
design flaws may escape detection until a product is in the hands of users; equally, operational failures
may prove difficult to detect and diagnose.
Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a
design and on locating and repairing field failures. They have developed several highly structured and
effective solutions to this problem, including scan design and self test. Design verification has been a less
formal task, based on the designer's skills. However, designers have found that structured design-for-test
features aiding manufacture and repair can significantly simplify design verification. These features
reduce verification cycles from weeks to days in some cases.
In contrast, software designers and test engineers have targeted design validation and verification. Unlike
hardware, software does not break during field use. Design errors, rather than incorrect replication or
wear out, cause operational bugs. Efforts have focused on improving specifications and programming
styles rather than on adding explicit test facilities. For example, modular design, structured programming,
formal specification, and object orientation have all proven effective in simplifying test.
Although these different approaches are effective when we can cleanly separate a design's hardware and
software parts, problems arise when boundaries blur. For example, in the early design stages of a complex
system, we must define system level test strategies. Yet, we may not have decided which parts to
implement in hardware and which in software. In other cases, software running on general-purpose
hardware may initially deliver certain functions that we subsequently move to firmware or hardware to
improve performance.
Designers must ensure a testable, finished design regardless of implementation decisions. Supporting
hardware-software codesign requires "cotesting" techniques, which draw hardware and software test
techniques together into a cohesive whole.
Things to be followed
Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of the most
important steps in designing a testable chip is to first partition the chip in an appropriate way such that for
each functional module there is an effective (DFT) technique to test it.
Partitioning must be done at every level of the design process, from architecture to circuit, whether testing
is considered or not. Partitioning can be functional (according to functional module boundaries) or
physical (based on circuit topology). Partitioning can be done by using multiplexers and/or scan chains.
Test access points must be inserted to enhance controllability & observability of the circuit. Test
points include control points (CPs) and observation points (OPs). The CPs are active test points,
while the OPs are passive ones. There are also test points which are both CPs and OPs. Before
exercising tests through test points that are not PIs and POs, one should investigate the additional
requirements on the test points raised by the use of test equipment.
Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on reset
mechanism controllable from primary inputs is the most effective and widely used approach.
Test control must be provided for difficult-to-control signals.
Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing
resolution, speed, memory depth, driving capability, analog/mixed-signal support,
internal/boundary scan support, etc., should be considered during the design process to avoid
delay of the project and unnecessary investment in equipment.
Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester
synchronization, internal oscillator and clock generator circuitry should be isolated during the test
of the functional circuitry. The internal oscillators and clocks should also be tested separately.
Analog and digital circuits should be kept physically separate. Analog circuit testing is very much
different from digital circuit testing. Testing for analog circuits refers to real measurement, since
analog signals are continuous (as opposed to discrete or logic signals in digital circuits). They
require different test equipment and different test methodologies. Therefore they should be
tested separately.
Things to be avoided
Asynchronous (unclocked) logic feedback in the circuit must be avoided. Feedback in
the combinational logic can give rise to oscillation for certain inputs. Since no clocking is
employed, timing is continuous instead of discrete, which makes tester synchronization virtually
impossible, and therefore only functional test by application board can be used.
The above guidelines are from experienced practitioners. They are neither complete nor universal. In
fact, these methods have drawbacks:
A There is a lack of experts and tools.
B Test generation is often manual.
C These methods cannot guarantee high fault coverage.
D They may increase design iterations.
Objectives of Scan Design
Scan design is implemented to provide controllability and observability of internal state
variables for testing a circuit.
It is also effective for circuit partitioning.
Circuit is designed using pre-specified design rules.
Test structure (hardware) is added to the verified design.
One (or more) test control (TC) pin at the primary input is required.
Flip-flops are replaced by scan flip-flops (SFF) and are connected so that they behave as a shift
register in the test mode. The output of one SFF is connected to the input of the next SFF.
The input of the first flip-flop in the chain is directly connected to an input pin (denoted as
SCANIN), and the output of the last flip-flop is directly connected to an output pin (denoted as
SCANOUT).
In this way, all the flip-flops can be loaded with a known value, and their value can be easily
accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the scan insertion
operation.
Fig. 39.1 Scan structure to a design
Fig. 39.1 shows a scan structure connected to a design. The scan flip-flops (FFs) must be
interconnected in a particular way. This approach effectively turns the sequential testing problem
into a combinational one, so the circuit can be fully tested by compact ATPG patterns. Unfortunately, there
are two types of overheads associated with this technique that the designers care about very
much. These are the hardware overhead (including three extra pins, multiplexers for all FFs, and
extra routing area) and performance overhead (including multiplexer delay and FF delay due to
extra load).
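The way a scan chain turns a sequential test into a combinational one can be sketched with a small behavioural model: shift a chosen state in, apply one normal-mode clock to capture the response of the combinational logic, then shift the result out. The class and function names, the 3-bit counter used as the circuit, and the boolean mode flag (instead of a specific TC polarity) are assumptions made for illustration.

```python
class ScanChain:
    """Flip-flops that act as a shift register in test mode."""

    def __init__(self, n_ff, next_state):
        self.ff = [0] * n_ff            # ff[0] is fed by SCANIN, ff[-1] drives SCANOUT
        self.next_state = next_state    # combinational logic: state list -> state list

    def clock(self, scan_mode, scan_in=0):
        """Apply one clock edge; return the SCANOUT value seen before the edge."""
        scan_out = self.ff[-1]
        if scan_mode:                   # test mode: shift by one position
            self.ff = [scan_in] + self.ff[:-1]
        else:                           # normal mode: capture combinational outputs
            self.ff = list(self.next_state(self.ff))
        return scan_out


def scan_test(chain, state):
    """Load `state`, capture one functional cycle, and unload the response."""
    n = len(chain.ff)
    for bit in reversed(state):         # last bit first, so state[i] ends up in ff[i]
        chain.clock(True, bit)
    chain.clock(False)                  # one normal-mode clock captures the response
    shifted_out = [chain.clock(True, 0) for _ in range(n)]
    return shifted_out[::-1]            # reorder so result[i] is the value of ff[i]


def increment(state):
    """Example combinational logic: a counter with state[0] as the LSB."""
    value = sum(bit << i for i, bit in enumerate(state))
    value = (value + 1) % (1 << len(state))
    return [(value >> i) & 1 for i in range(len(state))]


if __name__ == "__main__":
    chain = ScanChain(3, increment)
    print(scan_test(chain, [1, 1, 0]))  # load 3, capture, unload
```

Because any state can be loaded directly, the counter does not have to be clocked through its earlier states to test the transition from 3 to 4; this is the controllability and observability gain that scan provides.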
Scan Overheads
The use of scan design produces two types of overheads. These are area overhead and
performance overhead. The scan hardware requires extra area and slows down the signals.
IO pin overhead: At least one primary pin is necessary for test.
Area overhead: Gate overhead = [4 n_sff / (n_g + 10 n_ff)] x 100%, where n_g = number of
combinational gates; n_ff = number of flip-flops; n_sff = number of scan flip-flops. For full
scan, the number of scan flip-flops is equal to the number of original circuit flip-flops.
Example: n_g = 100k gates, n_ff = 2k flip-flops, overhead = 6.7%. For more accurate
estimation, scan wiring and layout area must be taken into consideration.
Performance overhead: The multiplexer of the scan flip-flop adds two gate delays in the
combinational path. Fanouts of the flip-flops are also increased by 1, which can increase the
clock period.
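The gate overhead expression above is easy to check numerically. The sketch below simply evaluates it; the function name is an assumption, and the partial-scan call (half of the flip-flops scanned) is an illustrative case rather than a figure from the notes.

```python
def scan_gate_overhead(n_gates, n_ff, n_sff):
    """Gate overhead in percent: 4 * n_sff / (n_g + 10 * n_ff) * 100.

    n_gates: number of combinational gates (n_g)
    n_ff:    number of flip-flops in the original circuit
    n_sff:   number of flip-flops replaced by scan flip-flops
    """
    if n_sff > n_ff:
        raise ValueError("cannot scan more flip-flops than the circuit has")
    return 4 * n_sff / (n_gates + 10 * n_ff) * 100.0


if __name__ == "__main__":
    # Full scan, using the example from the text: 100k gates, 2k flip-flops.
    print(f"full scan:    {scan_gate_overhead(100_000, 2_000, 2_000):.1f}%")
    # Hypothetical partial scan with half of the flip-flops scanned.
    print(f"partial scan: {scan_gate_overhead(100_000, 2_000, 1_000):.1f}%")
```

The formula counts only gates, so the result is a lower bound on the real area cost once scan wiring is included.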
Scan Variations
There have been many variations of scan as listed below; a few of these are discussed here.
MUXed Scan
Scan path
Scan-Hold Flip-Flop
Serial scan
Level-Sensitive Scan Design (LSSD)
Scan set
Random access scan
MUX Scan
It was invented at Stanford in 1973 by M. Williams & Angell.
In this approach a MUX is inserted in front of each FF to be placed in the scan chain.
The scan flip-flops (FFs) must be interconnected in a particular way. This approach effectively
turns the sequential testing problem into a combinational one, so the circuit can be fully tested by compact
ATPG patterns.
There are two types of overheads associated with this method. The hardware overhead is due to
three extra pins, multiplexers for all FFs, and extra routing area. The performance overhead
includes multiplexer delay and FF delay due to extra load.
Scan Path
This approach is also called the Clock Scan Approach.
It was invented by Kobayashi et al. in 1968, and reported by Funatsu et al. in 1975, and
adopted by NEC.
In this approach multiplexing is done by two different clocks instead of a MUX.
It uses two-port raceless D-FFs as shown in Figure 39.3. Each FF consists of two latches
operating in a master-slave fashion, and has two clocks (C1 and C2) to control the scan
input (SI) and the normal data input (DI) separately.
The two-port raceless D-FF is controlled in the following way:
Fig. 39.3 Two-port raceless D-FF (data input DI, scan input SI, clocks C1 and C2, latches L1 and L2, outputs DO and SO)
This approach gives a lower hardware overhead (due to dense layout) and less
performance penalty (due to the removal of the MUX in front of the FF) compared to the
MUX Scan Approach. The real figures however depend on the circuit style and
technology selected, and on the physical implementation.
Fig. 39.4 The polarity-hold latch (inputs D and C, output +L), with truth table:
C D +L
0 0 L
0 1 L
1 0 0
1 1 1
Fig. 39.5 The polarity-hold shift-register latch (SRL)
LSSD requires that the circuit be level-sensitive (LS), so we need LS memory elements as defined above. Figure
39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) is not dependent
on the rise/fall time of C, but only on C being '1' for a period of time greater than or equal to the data
propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL)
used in LSSD as the scan cell.
The scan cell is controlled in the following way:
Normal mode: A=B=0, C=0 → 1.
SR (test) mode: C=0, AB=10 → 01 to shift SI through L1 and L2.
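The two modes listed above can be sketched as a behavioural model of the SRL. The model assumes that latch L1 has two ports (D clocked by C, and SI clocked by A), that L2 copies L1 when B is high, and that the L2 output of one cell feeds the SI input of the next; the class and function names are illustrative.

```python
class SRL:
    """Behavioural model of a polarity-hold shift-register latch."""

    def __init__(self):
        self.l1 = 0
        self.l2 = 0

    def apply(self, d=0, si=0, c=0, a=0, b=0):
        """Apply one set of clock levels; a latch is transparent while its clock is 1."""
        if c and a:
            raise ValueError("C and A must not be active at the same time")
        if c:
            self.l1 = d         # normal mode: L1 takes the system data D
        if a:
            self.l1 = si        # test mode, phase A: L1 takes the scan input SI
        if b:
            self.l2 = self.l1   # test mode, phase B: L2 takes the value of L1


def shift(chain, bit):
    """One scan shift with C=0: pulse A (AB=10), then pulse B (AB=01)."""
    scan_inputs = [bit] + [cell.l2 for cell in chain[:-1]]
    for cell, si in zip(chain, scan_inputs):
        cell.apply(si=si, a=1)
    for cell in chain:
        cell.apply(b=1)
    return chain[-1].l2         # value now visible at the scan output


if __name__ == "__main__":
    chain = [SRL() for _ in range(3)]
    for bit in (1, 1, 0):
        shift(chain, bit)
    print([cell.l2 for cell in chain])
```

Because A and B are never high together, a bit moves exactly one cell per A/B pulse pair regardless of clock edge timing, which is the race-free property LSSD relies on.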
Advantages of LSSD
Correct operation independent of AC characteristics is guaranteed.
FSM is reduced to combinational logic as far as testing is concerned.
Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
Complex design rules are imposed on designers. There is no freedom to vary from the
overall schemes. It increases the design complexity and hardware costs (4-20% more
hardware and 4 extra pins).
Asynchronous designs are not allowed in this approach.
Sequential routing of latches can introduce irregular structures.
Faults changing combinational function to sequential one may cause trouble, e.g., bridging
and CMOS stuck-open faults.
Test application becomes a slow process, and normal-speed testing of the entire test
sequence is impossible.
Random Access Scan
1. This approach was developed by Fujitsu and was used by Fujitsu, Amdahl, and TI.
2. It uses an address decoder. By using the address decoder we can select a particular FF and
either set it to any desired value or read out its value. Figure 39.6 shows a random access
structure and Figure 39.7 shows the RAM cell [1,6-7].
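Fig. 39.6 Random access scan structure (combinational logic with PI and PO, RAM of nff bits, address decoder with log2 nff address bits; signals CK, TC, SCANIN, SCANOUT and select)
Fig. 39.7 RAM cell: scan flip-flop (SFF) with inputs D (from combinational logic), SD (SCANIN), CK, TC and select, and outputs Q (to combinational logic) and SCANOUT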
The difference between this approach and the previous ones is that the state vector can
now be accessed in a random sequence. Since neighboring patterns can be arranged so
that they differ in only a few bits, and only a few response bits need to be observed, the
test application time can be reduced.
In this approach test length is reduced.
This is suitable for delay and embedded memory testing.
The major disadvantage of the approach is high hardware overhead due to the address
decoder, gates added to the SFF, address register, extra pins and routing.
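The saving from addressing only the bits that change can be sketched by counting operations. This is a simplified count, not a model of the hardware: it assumes the flip-flop contents stay as written between patterns (the effect of capturing responses is ignored), and the function names and the example patterns are made up.

```python
def address_bits(n_ff):
    """Address width needed to select one of n_ff flip-flops: ceil(log2(n_ff))."""
    if n_ff < 1:
        raise ValueError("need at least one flip-flop")
    return max(1, (n_ff - 1).bit_length())


def serial_scan_shifts(patterns):
    """Serial scan shifts every flip-flop position for every pattern."""
    return sum(len(pattern) for pattern in patterns)


def random_access_writes(patterns):
    """Writes needed when only flip-flops whose value changes are addressed.

    Assumption: contents are undisturbed between patterns, so only the
    bits that differ from the previous pattern have to be written.
    """
    writes = len(patterns[0])                   # first pattern: write every bit
    for previous, current in zip(patterns, patterns[1:]):
        writes += sum(p != c for p, c in zip(previous, current))
    return writes


if __name__ == "__main__":
    # Patterns ordered so that neighbours differ in a single bit.
    patterns = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    print("address bits for 1000 FFs:", address_bits(1000))
    print("serial scan shifts:       ", serial_scan_shifts(patterns))
    print("random access writes:     ", random_access_writes(patterns))
```

Each random access write also needs an address of log2 nff bits to be supplied, which is part of the hardware and pin overhead listed above.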
Scan-Hold Flip-Flop
A special type of scan flip-flop with an additional latch designed for low-power testing
applications.
It was proposed by DasGupta. Figure 39.8 shows a hold latch cascaded with the SFF.
The control input HOLD keeps the output steady at the previous state of the flip-flop.
For HOLD = 0, the latch holds its state and for HOLD = 1, the hold latch becomes
transparent.
For normal mode operation, TC = HOLD = 1 and for scan mode, TC = 1 and HOLD = 0.
Hardware overhead increases by about 30% due to the extra hardware of the hold latch.
This approach reduces power dissipation and isolates the asynchronous part during scan.
It is suitable for delay test.
Fig. 39.8 Scan-hold flip-flop (SHFF)
Partial Scan
In this approach only a subset of flip-flops is scanned. The main objectives of this
approach are to minimize the area overhead and scan sequence length while still making it
possible to achieve the required fault coverage.
In this approach sequential ATPG is used to generate test patterns. Sequential ATPG has a
number of difficulties such as poor initializability and poor controllability and observability
of the state variables. The number of gates, number of FFs and sequential depth give little
idea regarding testability, and the presence of cycles makes testing difficult. Therefore the
sequential circuit must be simplified in such a way that test generation becomes easier.
Removal of selected flip-flops from scan improves performance and allows limited scan
design rule violations.
It also allows automation in scan flip-flop selection and test generation.
Figure 39.9 shows a design using partial scan architecture [1].
Sequential depth is calculated as the maximum number of FFs encountered from PI line
to PO line.
Fig. 39.9 Design using partial scan structure
Conclusions
Accessibility to internal nodes in complex circuitry is becoming a greater problem, and thus it is
essential that a designer consider how the IC will be tested and what extra structures will be
incorporated in the design.
Scan design has been the backbone of design for testability in the industry for a long time.
Design automation tools are available for scan insertion into a circuit, which then generate test
patterns.
Overhead increases due to the scan insertion in a circuit. In ASIC design 10 to 15% scan overhead
is generally accepted.
IDDQ Testing:
Iddq testing is a method for testing CMOS integrated circuits for the presence of manufacturing faults. It relies on
measuring the supply current (Idd) in the quiescent state (when the circuit is not switching and inputs are held at
static values). The current consumed in the state is commonly called Iddq for Idd (quiescent) and hence the name.
Iddq testing uses the principle that in a correctly operating quiescent CMOS digital circuit, there is no static current
path between the power supply and ground, except for a small amount of leakage. Many common semiconductor
manufacturing faults will cause the current to increase by orders of magnitude, which can be easily detected. This
has the advantage of checking the chip for many possible faults with one measurement. Another advantage is that it
may catch faults that are not found by conventional stuck-at fault test vectors.
Iddq testing is somewhat more complex than just measuring the supply current. If a line is shorted to Vdd, for
example, it will still draw no extra current if the gate driving the signal is attempting to set it to '1'. However, a
different input that attempts to set the signal to 0 will show a large increase in quiescent current, signalling a bad
part. Typical Iddq tests may use 20 or so inputs. Note that Iddq test inputs require only controllability, and
not observability. This is because the observability is through the shared power supply connection.
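The short-to-Vdd example above can be sketched as a pass/fail check on the measured quiescent current. Every number here (supply voltage, resistance of the short path, leakage and the pass/fail limit) is a hypothetical value chosen for illustration, as are the function names; a real tester measures the current rather than computing it.

```python
def quiescent_current(driven_value, shorted_to_vdd,
                      vdd=1.8, r_path=1_000.0, leakage=1e-6):
    """Static supply current in amperes for one node (illustrative values).

    If the node is shorted to Vdd and its driver tries to pull it to 0,
    a static path from Vdd to ground exists and a large current flows.
    Otherwise only the leakage current is drawn.
    """
    if shorted_to_vdd and driven_value == 0:
        return leakage + vdd / r_path
    return leakage


def iddq_fails(driven_values, shorted_to_vdd, limit=1e-4):
    """Return True if any test vector draws more than the assumed Iddq limit."""
    return any(quiescent_current(value, shorted_to_vdd) > limit
               for value in driven_values)


if __name__ == "__main__":
    print(iddq_fails([1], shorted_to_vdd=True))      # only drives '1': short is hidden
    print(iddq_fails([1, 0], shorted_to_vdd=True))   # driving '0' exposes the short
    print(iddq_fails([0, 1], shorted_to_vdd=False))  # good part
```

The vectors only have to set the node to the opposing value; nothing has to be propagated to an output pin, which is why controllability alone is sufficient.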
Iddq testing has many advantages:
It is a simple and direct test that can identify physical defects.
The area and design time overhead are very low.
Test generation is fast.
Test application time is fast since the vector sets are small.
It catches some defects that other tests, particularly stuck-at logic tests, do not.
Drawback: Compared to scan chain testing, Iddq testing is time consuming, and thus more expensive, as it is achieved
by current measurements that take much more time than reading digital pins in mass production.
As device geometry shrinks, i.e. transistors and gates become smaller, resulting in larger and more complex
processors and SOCs (see Moore's law), the leakage current becomes much higher and less predictable.
This makes it difficult to tell a low leakage part with a defect from a naturally high leakage part. Also, increasing
circuit size means a single fault will have a lower percentage effect, making it harder for the test to detect. However,
Iddq is so useful that designers are taking steps to keep it working.
One particular technique that helps is power gating, where the entire power supply to each block can be switched
off using a low leakage switch. This allows each block to be tested individually or in combination, which makes the
tests much easier when compared to testing the whole chip.
Iddq testing is one of the many ways to test CMOS integrated circuits in production. These circuits are usually
tested as a way to find different types of manufacturing faults. Electric faults can be a major hazard and it can even
lead to fatalities. This method relies on measuring the supply current (Idd) in its quiescent state (static value of a
non-switching circuit).
The current that is then measured at this state is called Iddq or Idd (quiescent).
This testing method is based on the principle that there is no static current path between the power supply and the
ground in a correctly operating quiescent CMOS digital circuit, except for a small amount of leakage.
Many semiconductor manufacturing faults increase this current by orders of magnitude, which is easily detected.
The method therefore has the advantage of checking the chip for many possible
faults with only one measurement. It may also catch faults that conventional stuck-at fault test vectors
leave undetected.
Even though this method is popular and simple in principle, it goes beyond just
measuring the supply current. For example, if a line is shorted to Vdd, it will still draw no extra
current if the gate driving the signal is attempting to set it to '1'. But a different input attempting to set the signal to '0' will show an
increase in quiescent current, which indicates a bad part. A typical Iddq test will use about
20 inputs. These test inputs need only controllability and not necessarily observability. The reason for this is that
observability takes place through the shared power connection.
Iddq testing has several advantages. Firstly, it is a simple and direct test
that can identify physical defects. Secondly, the design time and area overhead are relatively
low. The test generation is fast, the test application time is fast due to the small vector sets, and it catches
defects that other tests do not pick up.
One disadvantage of Iddq testing is that it can be time consuming compared to methods like scan testing. It is also
a more expensive option, because it is achieved by current
measurements that take much more time than reading digital pins in mass production.
Design for Manufacturability - An Overview
Introduction:
Aggressive ground rule changes continue to increase the complexity of semiconductor technology. The
requirements for designs, processes, equipment, and facilities all grow in sophistication from generation to
generation. These trends have made it increasingly difficult to produce a technology in the development laboratory
and transfer it to volume manufacturing in a timely and cost effective manner. The traditional laboratory role of
design and process development has expanded to include a parallel responsibility for manufacturability. For many
companies, design for manufacture (DFM) has become a critical strategy for survival in an increasingly competitive
global marketplace.
DFM is a systems approach to improving the competitiveness of a manufacturing enterprise by developing products
that are easier, faster, and less expensive to make, while maintaining required standards of functionality, quality,
and marketability. Design for manufacturability (DFM) and early manufacturing involvement (EMI) concepts are
now major components of the development effort designed to maintain and enhance the rate of technology
advancement and significantly improve the development-to-manufacturing transition. Design-for-manufacturability
philosophy and practices are used in many companies because it is recognized that 70% to 90% of overall product
cost is determined before a design is ever released into manufacturing. The semiconductor industry continues to
grow in both complexity and competitiveness.
Problem Statement
Layout development is among the most critical steps in integrated circuit (IC) design. It is costly in itself, since
it involves expensive tools and a large amount of human intervention, and it also has consequences for production
cost. As device sizes shrink, the landscape of technology development has become very different from the
past. Problems that were once considered secondary can now cause yield dropout in submicron technologies.
Variability has become a critical issue not only for performance but also for yield.
Yield dropout is caused by the following types of defects:
1. Random Defects: These are caused by impurities in the silicon itself or by dust particles that land on
the wafer during processing. These defects can cause metal opens or shorts. As feature sizes continue to shrink,
random defects have not decreased accordingly, making advanced ICs even more susceptible to this type of defect.
2. Systematic Defects: Systematic defects are a more prominent contributor to yield loss in deep submicron
process technologies. They are related to the process technology itself and arise from limitations of the lithography
process, which increase the variation between the desired and the printed patterns. Another process-related problem is
planarity: layer density requirements are necessary because areas with a low density of a particular layer can cause
upper layers to sag, resulting in uneven planarity across the chip.
3. Parametric Defects: In deep submicron technology, parametric defects are the most critical. They arise from
improper modeling of interconnect parasitics. As a result, the manufactured device does not match the results
expected from design simulation and does not meet the design specification.
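The layer density requirement described under systematic defects can be illustrated with a small density check. The
following is a minimal sketch in Python, not a production rule deck: the function names, the window size, and the
minimum-density threshold are hypothetical values chosen for illustration, and real limits come from the foundry's
design rules. It assumes the shapes on a layer are non-overlapping, axis-aligned rectangles.

```python
from typing import List, Tuple

# A rectangle is (x_min, y_min, x_max, y_max); units are arbitrary but consistent.
Rect = Tuple[float, float, float, float]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def low_density_windows(
    shapes: List[Rect],
    chip_w: float,
    chip_h: float,
    window: float,       # hypothetical check-window size
    min_density: float,  # hypothetical minimum layer density (0..1)
) -> List[Tuple[float, float, float]]:
    """Return (x, y, density) for every window whose density is below min_density.

    Assumes the shapes do not overlap one another, so their areas inside
    a window can simply be summed.
    """
    flagged = []
    for i in range(int(chip_w // window)):
        for j in range(int(chip_h // window)):
            win = (i * window, j * window, (i + 1) * window, (j + 1) * window)
            covered = sum(overlap_area(s, win) for s in shapes)
            density = covered / (window * window)
            if density < min_density:
                flagged.append((win[0], win[1], density))
    return flagged


# Example: a 20 x 20 chip checked with 10 x 10 windows and a 20% minimum density.
metal = [(0, 0, 10, 10), (10, 10, 15, 12)]
for x, y, d in low_density_windows(metal, 20, 20, 10, 0.2):
    print(f"window at ({x}, {y}) has density {d:.2f}")
```

In this example only the window at the origin passes; the other three are flagged as too sparse. This sketch uses
non-overlapping windows for brevity, whereas practical checks often step overlapping windows across the chip.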
Design for manufacturability (DFM) is the process of overcoming these causes of yield dropout. DFM cannot be
carried out without collaboration among the various technology parties, such as process, design, mask, and EDA.
DFM presents both a big challenge and an opportunity in the nanometer era.
Design for manufacturability is a proactive process that ensures quality, reliability, cost effectiveness, and time
to market.
DFM consists of a set of methodologies that enforce soft (recommended/mandatory) design rules
on the shapes and polygons of the physical layout in order to improve yield.
Given a fixed amount of available space in a given layout area, there are potentially multiple yield-enhancing
changes that can be made.
Some DFM guidelines that can be applied at the SoC level are:
1. Filler cell (consisting of regular diffusion and polysilicon structures) insertion and shielding
Issue Addressed: PO/OD (poly/diffusion) non-uniformity
Benefit: Higher parametric yield.
2. Via optimization
Issue Addressed: Open vias, systematic via opening issues
Benefit: Higher yield after manufacturing and qualification.
3. Wire Spreading
Issue Addressed: Wire shorts and opens due to defectivity
Benefit: Higher yield, reduced crosstalk
4. Power/ground-connected fill
Issue Addressed: Density gradients, large IR drop, layout irregularity
Benefit: Robustness to IR drop
5. Litho hotspot detection and repair
Issue Addressed: Lithography hotspots
Benefit: Higher yield
6. Dummy Metal/Via/FEOL
Issue Addressed: Large density gradients
Benefit: Higher yield
7. CMP hotspot detection
Issue Addressed: CMP hotspots
Benefit: Higher yield
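The benefit of via optimization (guideline 2) can be illustrated with simple arithmetic. The sketch below assumes that
every via fails independently with the same probability, which is a simplification: it models random via opens only,
not the systematic via issues also listed in the guideline. The function name, the via count, and the failure
probability are hypothetical and chosen only to show the trend when each single via is replaced by a redundant pair.

```python
def via_yield(n_connections: int, p_fail: float, redundant: bool = False) -> float:
    """Probability that all via connections on a chip are good.

    Assumes independent via failures with identical probability p_fail.
    With redundancy, each connection uses two vias and is open only
    if both of its vias fail.
    """
    p_connection_fail = p_fail ** 2 if redundant else p_fail
    return (1.0 - p_connection_fail) ** n_connections


# Hypothetical numbers: one million via connections, 1e-7 failure probability per via.
N_CONNECTIONS = 1_000_000
P_FAIL = 1e-7

single = via_yield(N_CONNECTIONS, P_FAIL)
double = via_yield(N_CONNECTIONS, P_FAIL, redundant=True)
print(f"single-via yield:    {single:.6f}")
print(f"redundant-via yield: {double:.6f}")
```

With these assumed numbers the single-via yield is about 0.905, while doubling every via brings the via-limited yield
above 0.999999. In practice only some vias can be doubled, because redundant vias compete for the same routing space
as the other yield-enhancing changes.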