ARRAY SUBSYSTEMS:
SRAM, DRAM, ROM, Serial
Access Memories, Content
Addressable Memory
VLSI DESIGN
VIDYA SAGAR P
CMOS system design consists of partitioning the system into subsystems of the types listed
above. Many options exist that make trade-offs between speed, density, programmability, ease
of design, and other variables. This chapter addresses design options for common data path
operators and arrays, especially those used for memory. Control structures are most commonly
coded in a hardware description language and synthesized.
Data path operators benefit from the structured design principles of hierarchy, regularity,
modularity, and locality. They may use N identical circuits to process N-bit data. Related data
operators are placed physically adjacent to each other to reduce wire length and delay.
Generally, data is arranged to flow in one direction, while control signals are introduced in a
direction orthogonal to the data flow.
Common data path operators considered in this chapter include adders, one/zero
detectors, comparators, counters, shifters, ALUs, and multipliers.
4.2 Shifters
Consider a direct MOS switch implementation of a 4×4 crossbar switch as shown in Fig. 4.1.
The arrangement is quite general and may be readily expanded to accommodate n-bit
inputs/outputs. In fact, this arrangement is an overkill in that any input line can be connected
to any or all output lines; if all switches are closed, then all inputs are connected to all outputs
in one glorious short circuit.
Furthermore, 16 control signals (sw00–sw15), one for each transistor switch, must be
provided to drive the crossbar switch, and such complexity is highly undesirable.
4.3 Adders
Addition is one of the basic operations performed in processing tasks such as counting,
multiplication and filtering. Adders can be implemented in various forms to suit different
speed and density requirements.
The truth table of a binary full adder is shown in Figure 4.3, along with some functions that
will be of use during the discussion of adders. Adder inputs: A, B, carry input. Outputs: SUM and
the carry output CARRY.
Generate signal: G = A·B; when it is 1, CARRY is generated internally within the adder.
Propagate signal: P = A + B; when it is 1, the carry input C is passed to CARRY.
In some adders A ⊕ B is used as the P term because it may be reused to generate the sum term.
The direct implementation of the above equations is shown in Fig. 4.4 using a gate
schematic; the transistor-level schematic is shown in Fig. 4.5.
The full adder of Fig. 4.5 employs 32 transistors (6 for the inverters, 10 for the carry
circuit, and 16 for the 3-input XOR). A more compact design is based on the observation that S
can be factored to reuse the CARRY term as follows:
SUM: S = (A ⊕ B) ⊕ Cin
CARRY-OUT: Cout = A·B + Cin·(A ⊕ B)
Such a design is shown at the transistor level in Figure 4.6 and uses only 28 transistors. Note
that the pMOS network is the complement of the nMOS network.
Here Cin=C
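As a sanity check on these equations, the following Python sketch (illustrative only, not part of the original notes) enumerates the full-adder truth table and confirms that SUM = (A ⊕ B) ⊕ Cin and CARRY = A·B + Cin·(A ⊕ B) agree with ordinary binary addition:

```python
# Illustrative check of the factored full-adder equations against binary addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s = (a ^ b) ^ cin                 # SUM = (A xor B) xor Cin
            cout = (a & b) | (cin & (a ^ b))  # CARRY = A.B + Cin.(A xor B)
            assert 2 * cout + s == a + b + cin
            print(a, b, cin, "->", s, cout)
```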
The worst-case delay of an n-bit ripple carry adder (RCA) is approximately t = (n − 1)tc + ts,
where tc is the delay through the carry stage of a full adder and ts is the delay to compute the
sum of the last stage. The delay of the ripple carry adder is linearly proportional to n, the number
of bits, so the performance of the RCA is limited as n grows. The advantages
of the RCA are lower power consumption as well as a compact layout giving smaller chip area.
In a carry-lookahead adder (CLA), the carries are computed directly from the generate and
propagate signals. Starting from c1 = G0 + P0·c0 and using the recurrence ci+1 = Gi + Pi·ci, we get
c1 = G0 + P0·c0
c2 = G1 + P1·G0 + P1·P0·c0
c3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·c0
c4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·c0
Notice that the carry-out bit of the last stage, c4, will be available after four gate delays: two gate
delays to calculate the propagate signals and two gate delays as a result of the gates required to
implement the equation for c4.
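A small sketch (my own illustration, with hypothetical helper names) computing the carries directly from the generate and propagate signals, which is what the equations for c1 to c4 express when the loop is unrolled:

```python
# Carry-lookahead sketch: carries computed from Gi and Pi rather than rippled.
def cla_carries(a_bits, b_bits, c0):
    """a_bits, b_bits: lists of 4 bits (LSB first). Returns [c1, c2, c3, c4]."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate  Gi = ai.bi
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate Pi = ai xor bi
    carries, c = [], c0
    for i in range(4):
        c = g[i] | (p[i] & c)     # unrolled, this is exactly c1..c4 above
        carries.append(c)
    return carries

print(cla_carries([1, 1, 1, 1], [1, 0, 0, 0], 0))  # 15 + 1: all carries are 1
```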
Figure 4.9 shows that a 4-bit CLA is built using gates to generate the propagate and generate
signals, Pi = ai ⊕ bi and Gi = ai·bi, and a logic block to generate the carry-out signals according to
the equations for c1 to c4.
(a) Logic network for 4-bit CLA carry bits (b) Sum calculation using CLA network
For a carry-skip (bypass) group of four bits, the group propagate and carry are
P(i,i+3) = P(i+3)·P(i+2)·P(i+1)·P(i) and Carry = C(i+4) + P(i,i+3)·Ci. (1)
The architecture of the carry-skip adder (CSkA) is shown in the figure.
In a carry bypass adder (CBA), an RCA is used to add 4 bits at a time and the carry generated is
propagated to the next stage with the help of a multiplexer whose select input is the bypass
logic. The bypass logic is formed from the product of the propagate values, as calculated in the
CLA. Depending on the carry value and the bypass logic, the carry is propagated to the next stage.
If the carry path is precharged to VDD, the transmission gate reduces to a simple
nMOS transistor. In the same way, the pMOS transistors of the carry generation stage are removed.
The result is a Manchester carry cell.
The multiplication process may be viewed to consist of the following two steps:
• Evaluation of partial products.
• Accumulation of the shifted partial products.
It should be noted that binary multiplication is equivalent to a logical AND operation. Thus
evaluation of partial products consists of the logical ANDing of the multiplicand and the
relevant multiplier bit. Each column of partial products must then be added and, if necessary,
any carry values passed to the next column.
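To make the two steps concrete, the short sketch below (an illustration of my own, unsigned numbers only) forms each partial product as the AND of the multiplicand with one multiplier bit, shifts it, and accumulates the results:

```python
# Shift-and-add multiplication: AND forms each partial product,
# shifted accumulation adds them column by column.
def multiply_unsigned(multiplicand, multiplier, n=4):
    product = 0
    for j in range(n):
        bit = (multiplier >> j) & 1          # j-th multiplier bit
        partial = multiplicand if bit else 0 # logical AND with that bit
        product += partial << j              # accumulate the shifted partial product
    return product

print(multiply_unsigned(0b1011, 0b0110))     # 11 * 6 = 66
```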
There are a number of techniques that may be used to perform multiplication. In general,
the choice is based on factors such as speed, throughput, numerical accuracy, and area. As a
rule, multipliers may be classified by the format in which data words are accessed, namely:
Serial form
Serial/parallel form
Parallel form
Thus Pk are the partial product terms called summands. There are mn summands, which are
produced in parallel by a set of mn AND gates.
For 4-bit numbers, the expression above may be expanded as in the table below.
The worst-case delay associated with such a multiplier is (2n + 1)tg, where tg is the worst-case
adder delay.
The cell shown in Figure 4.16 may be used to construct a parallel multiplier.
The xi term is propagated diagonally from top right to bottom left, while the yj term is
propagated horizontally. Incoming partial products enter at the top. Incoming CARRY IN
values enter at the top right of the cell. The bit-wise AND is performed in the cell, and the SUM
is passed to the next cell below. The CARRY OUT is passed to the bottom left of the cell.
Figure 4.17 depicts the multiplier array with the partial products enumerated. The multiplier
can be drawn as a square array, as shown here; the version in Figure 4.18 is the most convenient
for implementation.
In this version the degeneration of the first two rows of the multiplier is shown. The first
row of the multiplier adders has been replaced with AND gates, while the second row employs
half-adders rather than full adders.
This optimization might not be done if a completely regular multiplier were required (i.e.,
one using a single array cell). In this case the appropriate inputs to the first and second rows
would be connected to ground, as shown previously. An adder with equal carry and sum
propagation times is advantageous, because the worst-case multiply time depends on both
paths.
A 1-bit adder provides a 3:2 (3 inputs, 2 outputs) compression in the number of bits. The
addition of partial products in a column of an array multiplier may be thought of as totaling up
the number of 1's in that column, with any carry being passed to the next column to the left.
Figure 4.19
Considering the product P3, it may be seen that it requires the summation of four partial
products and a possible column carry from the summation of P2.
Example: implementation of a 6×6 multiplier using the Wallace tree multiplication
method.
Consider the 6 × 6 multiplication table shown below. Considering the product P5, it may be
seen that it requires the summation of six partial products and a possible column carry from
the summation of P4. Here we can see the adders required in a multiplier based on this style
of addition.
The adders have been arranged vertically into ranks that indicate the time at which the
adder output becomes available. While this small example shows the general Wallace addition
technique, it does not show the real speed advantage of a Wallace tree. There is an identifiable
"array part", and a CPA (carry propagate adder) part, which is at the top right. While this has
been shown as a ripple-carry adder, any fast CPA can be used here.
The delay through the array addition (not including the CPA) is proportional to log1.5(n),
where n is the width of the Wallace tree.
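The logarithmic growth can be illustrated numerically. The sketch below (an approximation of my own, ignoring half-adder details) counts how many 3:2 compression stages are needed to reduce n rows of partial products to the two rows handled by the final CPA, and compares this with log1.5(n/2):

```python
import math

# Count ideal 3:2 compression stages needed to reduce n rows to 2 rows.
def wallace_stages(n):
    stages = 0
    while n > 2:
        n = 2 * (n // 3) + (n % 3)   # each group of 3 rows compresses to 2 rows
        stages += 1
    return stages

for n in (4, 6, 8, 16, 32):
    print(n, wallace_stages(n), round(math.log(n / 2, 1.5), 1))
```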
4.4.3 Baugh-Wooley multiplier:
In signed multiplication the length of the partial products and the number of partial products
will be very high, so an algorithm called the Baugh-Wooley algorithm was introduced for signed
multiplication. Baugh-Wooley multiplication is one of the cost-effective ways to
handle the sign bits. This method was developed in order to design regular multipliers, suited
to 2's complement numbers.
Here ai and bi are the bits of A and B, respectively, and an-1 and bn-1 are the
sign bits. The full precision product, P = A × B, is provided by the equation:
The first two terms of the above equation are positive and the last two terms are negative. In order
to calculate the product, instead of subtracting the last two terms, it is possible to add their
negated (two's complemented) values. The above equation signifies the Baugh-Wooley algorithm
for multiplication in two's complement form.
The Baugh-Wooley multiplier provides a high speed, signed multiplication algorithm. It uses
the partial products of two's complement multiplication and adjusts them to maximize
the regularity of the multiplication array. When a number is represented in two's complement form,
the sign of the number is embedded in the Baugh-Wooley multiplier. This algorithm has the advantage
that the signs of the partial product bits are always kept positive so that array addition
techniques can be directly employed. In two's complement multiplication, each partial
product bit is the AND of a multiplier bit and a multiplicand bit, and the sign of each partial
product bit is kept positive.
BOOTH ENCODER
The Booth multiplier reduces the number of iteration steps needed to perform
multiplication compared with the conventional method. The Booth algorithm scans the multiplier
operand and skips over chains of 1's; this reduces the number of
additions required to produce the result compared with conventional multiplication. The
modified Booth algorithm further reduces the number of partial products generated in the
multiplication process. Based on the multiplier bits, the encoding of the multiplicand is
performed by a radix-4 Booth encoder. This recoding algorithm is
used to generate efficient partial products.
RADIX-4 BOOTH MULTIPLIER
The radix-4 modified Booth algorithm overcomes the limitations of the radix-2 algorithm.
For operands equal to or greater than 16 bits, the modified
radix-4 Booth algorithm has been widely used. It is based on encoding the two's complement
multiplier in order to reduce the number of partial products to be added to n/2.
In the radix-4 modified Booth algorithm, the number of partial products is reduced by half. For
multiplication of 2's complement numbers, the two-bit encoding used by this algorithm scans a
triplet of bits. To Booth-recode the multiplier term, consider the bits in blocks of three, such
that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the
first block uses only two bits of the multiplier (with an implicit 0 to its right).
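As an illustration of the overlapping-triplet scan (a sketch of my own, not tied to any particular hardware encoder), the following Python function recodes a two's complement multiplier into radix-4 Booth digits in {−2, −1, 0, +1, +2}:

```python
# Radix-4 Booth recoding: scan overlapping triplets (bit i+1, bit i, previous bit),
# starting from the LSB with an implicit 0 to its right. n_bits is assumed even.
def booth_radix4_digits(multiplier, n_bits):
    digits = []
    prev = 0
    for i in range(0, n_bits, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev      # triplet value: one of -2, -1, 0, +1, +2
        digits.append(digit)
        prev = b1
    return digits                        # n_bits / 2 digits, least significant first

# -5 as a 6-bit two's complement number is 111011; the digits reconstruct -5.
d = booth_radix4_digits(0b111011, 6)
print(d, sum(di * 4**k for k, di in enumerate(d)))   # [-1, -1, 0]  -5
```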
Worked example (Booth recoding, one multiplier bit per step, 5-bit operands): multiplicand = 01110 (+14),
multiplier = 11011 (−5). Since −14 in binary is 10010, we can add 10010 whenever we need to subtract the
multiplicand. The product register holds the upper 5 bits (initially 00000), the lower 5 bits (initially the
multiplier), and a "Booth bit" (initially 0). In each step the lowest bit and the Booth bit are examined,
the multiplicand is added or subtracted into the upper half as required, and the whole register is shifted
right arithmetically.

Step 0 (initialization): 00000 11011 0
Step 1: bits 10 → subtract multiplicand (00000 + 10010 = 10010) → 10010 11011 0; shift right arithmetic → 11001 01101 1
Step 2: bits 11 → no operation; shift right arithmetic → 11100 10110 1
Step 3: bits 01 → add multiplicand (11100 + 01110 = 01010; the carry is ignored because adding a positive and a negative number cannot overflow) → 01010 10110 1; shift right arithmetic → 00101 01011 0
Step 4: bits 10 → subtract multiplicand (00101 + 10010 = 10111) → 10111 01011 0; shift right arithmetic → 11011 10101 1
Step 5: bits 11 → no operation; shift right arithmetic → 11101 11010 1

The final product register 11101 11010 is −70 = 14 × (−5).
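The table above steps one multiplier bit at a time (radix-2 Booth recoding). The sketch below (my own, using 5-bit registers to match the table) reproduces the register contents and the final product:

```python
# Radix-2 Booth multiplication with 5-bit operands, as in the worked example above.
N = 5
MASK = (1 << N) - 1

def booth_radix2(multiplicand, multiplier):
    upper, lower, booth_bit = 0, multiplier & MASK, 0
    for _ in range(N):
        pair = (lower & 1, booth_bit)
        if pair == (1, 0):                        # "10": subtract multiplicand (add -M)
            upper = (upper + ((-multiplicand) & MASK)) & MASK
        elif pair == (0, 1):                      # "01": add multiplicand
            upper = (upper + multiplicand) & MASK
        # arithmetic shift right of the (upper, lower, Booth bit) register
        booth_bit = lower & 1
        lower = ((lower >> 1) | ((upper & 1) << (N - 1))) & MASK
        upper = ((upper >> 1) | (upper & (1 << (N - 1)))) & MASK   # replicate sign bit
    product = (upper << N) | lower
    return product - (1 << 2 * N) if product >> (2 * N - 1) else product

print(booth_radix2(0b01110, 0b11011))             # -70  (= 14 * -5)
```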
2. External noise and loss of signal strength cause loss of data bit information while
transporting data from one device to another, located inside the computer or
externally.
3. To indicate any occurrence of an error, an extra bit is included with the message according
to the total number of 1s in a set of data; this bit is called parity.
4. If the extra bit is 0 when the total number of 1s is even and 1 when the number of 1s is
odd, it is called even parity.
5. On the other hand, if the extra bit is 1 for an even number of 1s and 0 for an odd number
of 1s, it is called odd parity.
A parity generator is a combinational logic circuit that generates the parity bit at the
transmitting side.
If the message bit combination is designated as D3 D2 D1 D0, and Pe, Po are the even and odd
parity bits respectively, then it is obvious from the table that the Boolean expressions for even
parity and odd parity are
Pe = D3 ⊕ D2 ⊕ D1 ⊕ D0
Po = (D3 ⊕ D2 ⊕ D1 ⊕ D0)′
The above illustration is given for a message with four bits of information. However, the logic
diagrams can be expanded with more XOR gates for any number of bits.
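The same XOR folding extends to any word width; a minimal sketch (illustrative only):

```python
from functools import reduce

# Even parity: XOR of all data bits; odd parity is its complement.
def even_parity(bits):
    return reduce(lambda x, y: x ^ y, bits)

def odd_parity(bits):
    return even_parity(bits) ^ 1

data = [1, 0, 1, 1]                           # D3 D2 D1 D0
print(even_parity(data), odd_parity(data))    # 1 and 0 (three 1's in the data)
```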
Figure 4.26: One/zero detectors (a) All one detector (b) All zero detector (c) All zero detector
transistor level representation
4.7 Comparators:
Another common and very useful combinational logic circuit is the digital comparator.
Digital or binary comparators are made up from standard AND, NOR and NOT gates that
compare the digital signals present at their input terminals and produce an output depending
upon the condition of those inputs.
For example, along with being able to add and subtract binary numbers we need to be able
to compare them and determine whether the value of input A is greater than, smaller than or
equal to the value at input B. The digital comparator accomplishes this using several logic
gates that operate on the principles of Boolean algebra. There are two main types of digital
comparator:
1. Identity Comparator – an identity comparator is a digital comparator that has only one output
terminal, for when A = B, which is either HIGH (A = B = 1) or LOW (A = B = 0).
2. Magnitude Comparator – a magnitude comparator indicates whether one number is greater than,
equal to, or less than the other.
The purpose of a digital comparator is to compare a set of variables or unknown numbers, for
example A (A1, A2, A3, ..., An) against a constant or another value such as B (B1, B2,
B3, ..., Bn), and produce an output condition or flag depending upon the result of the
comparison. For example, a magnitude comparator of two 1-bit inputs (A and B) would
produce the following three output conditions when they are compared with each other:
A > B, A = B, A < B
Then the operation of a 1-bit digital comparator is given in the following Truth Table.
Inputs Outputs
B A A > B A=B A < B
0 0 0 1 0
0 1 1 0 0
1 0 0 0 1
1 1 0 1 0
From the above table, the expressions obtained for the magnitude comparator using K-maps are
as follows:
For A < B: C = A′·B
For A = B: D = A′·B′ + A·B
For A > B: E = A·B′
The logic diagram of the 1-bit comparator using basic gates is shown below in Figure 4.24.
*** Draw separate diagrams for the greater-than, equality and less-than expressions.
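For reference, the three expressions can be checked exhaustively with a short sketch (A′ is written here as (1 − A)):

```python
# 1-bit magnitude comparator: C = A'.B (A < B), D = A'.B' + A.B (A = B), E = A.B' (A > B)
for A in (0, 1):
    for B in (0, 1):
        less    = (1 - A) & B
        equal   = ((1 - A) & (1 - B)) | (A & B)
        greater = A & (1 - B)
        print(f"A={A} B={B}  A>B={greater} A=B={equal} A<B={less}")
```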
Equality Comparator:
Check whether each pair of bits is equal (using XNOR, also known as the equality gate),
then detect all 1's on the bitwise equality outputs.
[Figure: 4-bit equality comparator – inputs A[3:0] and B[3:0] are compared bit by bit and combined to produce the A = B output]
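Behaviorally, the 4-bit equality comparator is the AND of four XNOR outputs; a minimal sketch (illustrative only):

```python
# 4-bit equality comparator: XNOR each bit pair, then AND the results together.
def equals_4bit(A, B):
    result = 1
    for i in range(4):
        a_i = (A >> i) & 1
        b_i = (B >> i) & 1
        xnor = 1 - (a_i ^ b_i)      # 1 when the two bits match
        result &= xnor
    return result

print(equals_4bit(0b1010, 0b1010), equals_4bit(0b1010, 0b1011))   # 1, 0
```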
4.8 Counters:
Counters can be implemented using the adder/subtractor circuits and registers (or
equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops because the toggle feature is
naturally suited for the implementation of the counting operation. Counters are available in
two categories:
1. Asynchronous (ripple) counters The flip-flop output transition serves as a source for triggering
other flip-flops, i.e., the C input (clock input) of some or all flip-flops is triggered NOT by the
common clock pulses.
E.g.: binary ripple counters, BCD ripple counters
2. Synchronous counters A synchronous counter, however, has an internal clock, and the
external event is used to produce a pulse which is synchronized with this internal clock.
The C input (clock input) of all flip-flops receives the common clock pulses.
E.g.: binary counter, up-down binary counter, BCD binary counter, ring counter,
Johnson counter
4.8.1 Asynchronous Up-Counter with T Flip-Flops
Figure 4.28 shows a 3-bit counter capable of counting from 0 to 7. The clock inputs of the
three flip-flops are connected in cascade. The T input of each flip-flop is connected to a constant 1,
which means that the state of the flip-flop is toggled at each active edge (here, the positive
edge) of its clock. We assume that the purpose of this circuit is to count the number of pulses
that occur on the primary input called Clock. Thus the clock input of the first flip-flop is connected
to the Clock line. The other two flip-flops have their clock inputs driven by the Q¯ output of the
preceding flip-flop. Therefore, they toggle their states whenever the preceding flip-flop changes its
state from Q = 1 to Q = 0, which results in a positive edge of the Q¯ signal.
Note that the value of the count is indicated by the 3-bit binary number Q2Q1Q0. Since
the second flip-flop is clocked by Q0¯, the value of Q1 changes
shortly after the change of the Q0 signal. Similarly, the value of Q2 changes shortly
after the change of the Q1 signal. This circuit is a modulo-8 counter. Because it counts in the
upward direction, we call it an up-counter. This behavior is similar to the rippling of carries in
a ripple-carry adder. The circuit is therefore called an asynchronous counter, or a ripple
counter.
4.8.2 Asynchronous Down-Counter with T Flip-Flops
Some modifications of the circuit in Figure 4.29 lead to a down-counter which counts in the
sequence 0, 7, 6, 5, 4, 3, 2, 1, 0, 7, and so on. The modified circuit is shown in Figure 3. Here the
clock inputs of the second and third flip-flops are driven by the Q outputs of the preceding
stages, rather than by the Q¯ outputs.
First of all, the asynchronous counter is slow. In a synchronous counter, all the flip-flops
change states simultaneously, while for an asynchronous counter the propagation delays of
the flip-flops add together to produce the overall delay. Hence, the more bits or number of
flip-flops in an asynchronous counter, the slower it will be.
Looking at the pattern of bits in each row of the table, it is apparent that bit Q0 changes on each
clock cycle. Bit Q1 changes only when Q0 = 1. Bit Q2 changes only when both Q1 and Q0 are equal
to 1. Bit Q3 changes only when Q2 = Q1 = Q0 = 1. This suggests the toggle inputs
T0 = 1
T1 = Q0
T2 = Q0Q1
T3 = Q0Q1Q2
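These toggle equations can be checked behaviorally; the sketch below (my own illustration) steps a 4-bit synchronous counter built from T flip-flops and shows that it counts modulo 16:

```python
# 4-bit synchronous counter from T flip-flops:
# T0 = 1, T1 = Q0, T2 = Q0.Q1, T3 = Q0.Q1.Q2
def step(q):
    q0, q1, q2, q3 = q
    t = [1, q0, q0 & q1, q0 & q1 & q2]
    return [qi ^ ti for qi, ti in zip(q, t)]   # a T flip-flop toggles when T = 1

state = [0, 0, 0, 0]                            # Q0..Q3
for _ in range(18):                             # 18 clocks show the wrap-around at 16
    print(state[3], state[2], state[1], state[0], sep="")
    state = step(state)
```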
In Figure 5, instead of using AND gates of increased size for each stage, we use a factored
arrangement. This arrangement does not slow down the response of the
counter, because all flip-flops change their states after a propagation delay from the positive edge
of the clock. Note that a change in the value of Q0 may have to propagate through several AND
gates to reach the flip-flops in the higher stages of the counter, which requires a certain amount of
time. This time must not exceed the clock period. Actually, it must be less than the clock period
minus the setup time of the flip-flops. It shows that the circuit behaves as a modulo-16 up-
counter. Because all changes take place with the same delay after the active edge of the Clock
signal, the circuit is called a synchronous counter.
4.9 Shifters
4.9.1 Shifters:
Logical shift:
Shifts the number left or right and fills with 0's.
o 1011 LSR 1 = 0101    1011 LSL 1 = 0110
Arithmetic shift:
Shifts the number left or right; the right shift sign-extends.
o 1011 ASR 1 = 1101    1011 ASL 1 = 0110
Rotate:
Shifts the number left or right and fills with the bits shifted out of the other end.
o 1011 ROR 1 = 1101    1011 ROL 1 = 0111
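The 4-bit examples above can be reproduced with a small sketch (my own, using 4-bit words written MSB first):

```python
# 4-bit logical shift, arithmetic shift and rotate, matching the examples above.
W = 4
MASK = (1 << W) - 1

def lsr(x, n): return (x >> n) & MASK
def lsl(x, n): return (x << n) & MASK
def asr(x, n):                                  # replicate the sign bit on right shifts
    sign = x & (1 << (W - 1))
    for _ in range(n):
        x = (x >> 1) | sign
    return x & MASK
def ror(x, n): return ((x >> n) | (x << (W - n))) & MASK
def rol(x, n): return ((x << n) | (x >> (W - n))) & MASK

x = 0b1011
print(format(lsr(x, 1), "04b"), format(lsl(x, 1), "04b"))   # 0101 0110
print(format(asr(x, 1), "04b"))                              # 1101
print(format(ror(x, 1), "04b"), format(rol(x, 1), "04b"))   # 1101 0111
```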
4.10 ALU:
An ALU (Arithmetic Logic Unit) performs arithmetic operations and Boolean operations.
The basic arithmetic operations are addition and subtraction. One may either multiplex between
an adder and a Boolean unit, or merge the Boolean unit into the adder as in the classic
transistor-transistor logic ALU.
The heart of the ALU is a 4-bit adder circuit. A 4-bit adder must take the sum of two 4-bit numbers,
and there is an assumption that all 4-bit quantities are presented in parallel form and that the
shifter circuit is designed to accept and shift a 4-bit parallel sum from the ALU. The sum is to be
stored in parallel at the output of the adder, from where it is fed through the shifter and back to
the register array. Therefore, a single 4-bit data bus is needed from the adder to the shifter and
another 4-bit bus is required from the shifted output back to the register array.
The memory array is classified into three types: random access memory (RAM), serial access
memory and content addressable memory (CAM). We will discuss each type in detail.
A memory that can only be read and never altered is called a read-only
memory (ROM). There is a vast variety of potential applications for this kind of memory.
Programs for processors with fixed applications such as washing machines, calculators and
game machines, once developed and debugged, need only be read. Fixing the contents at
manufacturing time leads to small and fast implementations.
There are different ways to implement the logic of ROM cells; the fact that the contents of a
ROM cell are permanently fixed considerably simplifies its design. The cell should be
designed so that a '0' or '1' is presented to the bitline upon activation of its wordline. The
different approaches for implementing ROM cells are the diode ROM, MOS ROM 1 and
MOS ROM 2. These are the main approaches for designing larger density ROMs.
NOR-based ROM
The building block of this ROM is a pseudo-nMOS NOR gate, as in Figure 4.33.
A NOR-based ROM consists of m n-input pseudo-nMOS NOR gates, one n-input NOR
per column, as shown in Figure 4.34.
Each memory cell is represented by one nMOS transistor, and binary information is
stored by connecting or not connecting the drain terminal of that transistor to the bit line.
For every row address only one word line is activated, by applying a high signal to the
gates of the nMOS transistors in that row.
If a selected transistor in the i-th column is connected to a bit line, then a logic '0' is
stored in this memory cell; if the transistor is not connected, then a logic '1' is stored.
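Functionally, reading a NOR ROM amounts to selecting one word line and observing which columns are pulled low by a connected transistor. A behavioral sketch follows (the connection matrix here is made up purely for illustration):

```python
# Behavioral model of a NOR-based ROM: a 1 in the connection matrix means an
# nMOS transistor connects that cell to the bit line, so the column reads as 0.
connections = [            # rows = word lines, columns = bit lines (example data)
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

def read_word(row):
    return [0 if connected else 1 for connected in connections[row]]

print(read_word(0))        # [0, 1, 1, 0]
print(read_word(2))        # [0, 0, 1, 1]
```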
NAND-based ROM
A NAND-based ROM consists of m n-input pseudo-nMOS NAND gates, one n-input
NAND per column as shown in Figure 4.35. In this case, we have up to n serially
connected nMOS transistors in each column.
For every row address only one word line is activated, by applying a low signal to the
gates of the nMOS transistors in that row. When no word line is activated, all nMOS
transistors are on and the line signals Ci are all low.
When a word line is activated, all transistors in that row are switched off and the
respective Ci signals are high. If a transistor in the selected row is short-circuited, then
the respective Ci signal is low.
In other words, a logic '0' is stored when a transistor is replaced with a wire, whereas
a logic '1' is stored by an nMOS transistor being present.
4.12.2 Programmable ROM (PROM):
A technology that allows its users to program the memory only once is called
programmable ROM. It is also called a WRITE-ONCE device. This is most often
accomplished by introducing fuses (implemented in nichrome, polysilicon, or other
conductors) in the memory cell. During the programming phase, some of these fuses
are blown by applying a high current, which disables the connected transistor.
While PROMs have the advantage of being "customer programmable," the single write
phase makes them unattractive. For instance, a single error in the programming process
or application makes the device unusable. This explains the current preference for
devices that can be programmed several times.
The floating-gate transistor is the device at the heart of the majority of reprogrammable
memories. Various attempts have been made to create a device with electrically alterable
characteristics and enough reliability to support a multitude of write cycles. The
floating-gate structure is similar to a traditional MOS device, except that an extra
polysilicon strip is inserted between the gate and the channel.
This strip is not connected to anything and is called a floating gate. The most obvious
impact of inserting this extra gate is to double the gate oxide thickness tox, which results
in a reduced device transconductance as well as an increased threshold voltage. Though
these properties are not desirable, from another point of view the device still acts as a
normal transistor.
The most important property of this device is that its threshold voltage is
programmable. Applying a high voltage (above 10 V) between the source and the
gate-drain terminals creates a high electric field and causes avalanche injection to occur.
Electrons acquire sufficient energy to become "hot" and traverse the first oxide
insulator, so that they get trapped on the floating gate. In reference to this programming
mechanism, the floating-gate transistor is often called a floating-gate avalanche-
injection MOS (FAMOS).
The trapping of electrons on the floating gate effectively drops the voltage on the gate.
This process is self-limiting – the negative charge accumulated on the floating gate
reduces the electrical field over the oxide so that ultimately it becomes incapable of
accelerating any more hot electrons. Virtually all nonvolatile memories are currently
based on the floating-gate mechanism. Different classes can be identified, based on the
erasure mechanism.
The main advantage of this programming approach is that it is reversible; erasing is
simply achieved by reversing the voltage applied during the writing process.
The injection of electrons onto the floating gate raises the threshold, while the reverse operation
lowers VT. When a voltage of approximately 10 V (equivalent to about 10⁹ V/m across the thin
oxide) is applied over the thin insulator, electrons travel to and from the floating gate through a
mechanism called Fowler–Nordheim tunneling.
The monitoring control hardware on the memory chip regularly checks the value of the
threshold during erasure, and adjusts the erasure time dynamically. This approach is
only practical when erasing large chunks of memory at a time; hence the flash concept.
One of the many existing alternatives for Flash EEPROM memories is the ETOX device.
It resembles a FAMOS gate, except that a very thin tunneling gate oxide (10 nm) is
utilized. Different areas of the gate oxide are used for programming and erasure.
Programming is performed by applying a high voltage (12 V) on the gate and drain
terminals with the source grounded, while erasure occurs with the gate grounded and the
source at 12 V.
1. The array cells are programmed before applying the erase pulse so that all the cell
thresholds start at approximately the same value.
2. An erase pulse of controlled width is applied. Subsequently the whole array is read
to ensure that all the cells are erased. If not, another erase pulse is applied, followed by
the read cycle.
For the write (programming) operation, a high voltage is applied to the gate of the selected
device. If a '1' is applied to the drain at that time, hot electrons are generated and injected
onto the floating gate, raising the threshold. For the read operation, the wordline is raised
to 5 V, which causes a conditional discharge of the bitline.
The main advantage of RAM over types of storage that require physical movement
is that retrieval times are short and consistent: short because no physical movement is
necessary, and consistent because the time taken to retrieve the data does not depend on the
current distance from a physical head. The access time for retrieving any piece of data
in a RAM chip is the same. The disadvantages are its cost compared to physically moving
media and the loss of data when power is turned off.
RAM is used as 'main memory' or primary storage because of its speed and consistency.
It is the working area used for loading, displaying and manipulating applications and data.
In most personal computers, the RAM is not an integral part of the motherboard or
CPU. It comes in the easily upgraded form of modules called memory sticks. These can
quickly be removed and replaced when they are damaged or when the system needs a
memory upgrade for current purposes. A smaller amount of random-
access memory is also integrated with the CPU, but this is usually referred to as "cache"
memory rather than RAM. Modern RAM generally stores a bit of data either as a
charge in a capacitor, as in dynamic RAM, or as the state of a flip-flop, as in static RAM.
It thus typically takes six MOSFETs to store one memory bit. Access to the cell is
enabled by the word line WL, which controls the two access transistors N1 and N2;
these, in turn, control whether the cell should be connected to the bit lines BL and /BL.
The bit lines are used to transfer data for both read and write operations. The bit lines are
complementary, as this improves the noise margin. Chapter 2 explains more about SRAMs
and their read/write operations.
Figure 4.38: A CMOS static memory cell with column pull-up transistors and
parasitic column capacitances
For the read or write operations we select the cell by asserting the word line signal
S = '1'. For the write operation we apply a low voltage to one of the bit lines while holding the
other one high. To write '0' in the cell, the column voltage VC is forced low (C = 0).
This low voltage acts through the related pass transistor (n3) on the gates of the
corresponding inverter (n2, p2) so that its input goes high. This sets the signal at the
other inverter, Q = 0.
Similarly, to write '1' in the cell, the opposite column voltage VC¯ is forced low (C¯ =
0), which sets the signal Q = 1. During the read '1' operation, when the stored bit is Q = 1,
transistors n3, p1 and n4, n2 are turned on. This maintains the column voltage VC at its
steady-state high level (say 3.5 V) while the opposite column voltage VC¯ is pulled
down, discharging the column capacitance CC¯ through transistors n4, n2, so that VC >
VC¯. Similarly, during the read '0' operation we have VC < VC¯. The difference between
the column voltages is small, say 0.5 V, and must be detected by the sense amplifiers
in the data-read circuitry.
The structure of the write circuitry associated with one column of memory cells is
shown in Figure 4.39.
Figure 4.39: The structure of the write circuitry associated with one column of the
memory cells.
The transistor M3 is driven by the signal from the column decoder selecting the
specified column. The transistor M1 is on only in the presence of the write enable
signal (W¯ = 0) when the data bit to be written is '0'. The transistor M2 is on only in the
presence of the write enable signal (W¯ = 0) when the data bit to be written is '1'.
Figure 4.40: The structure of the write circuitry associated with one column of the
memory cells.
During the read operation the voltage level on one of the bit lines drops slightly after
the pass transistors in the memory cell are turned on.
TO READ:
Bit lines BL and /BL are precharged high.
Enable line WL is pulled high, switching access transistors M5 and M6 on.
If the value stored at /Q is 0, bit line /BL is discharged through access transistor M5.
If the value stored at Q is 1, bit line BL stays charged up to VDD.
The value is 'sensed' on BL and /BL.
TO WRITE:
Apply the value to be stored to bit lines BL and /BL.
Enable line WL is triggered and the input value is latched into the storage cell.
The bit line drivers must be stronger than the SRAM cell transistors in order to override the
previously stored value.
While the enable line is held low, the inverters retain the previous value. A tri-state
write-enable driver could be used on the bit lines to drive them into a specific state. The
transistor count per bit is only 6, plus the shared line drivers and sense logic. A simple
behavioral sketch follows.
Hence every memory cell must be refreshed approximately every half millisecond.
Despite the need for this additional refreshing circuitry, the DRAM has two fundamental
features which have determined its enormous popularity:
• The DRAM cell occupies a much smaller silicon area than the SRAM cell. The size of a
DRAM cell is on the order of 8F², where F is the smallest feature size in a given
technology. For F = 0.2 μm the size is 0.32 μm².
• No static power is dissipated for storing charge in a capacitance. The storage
capacitance CS, which is connected between the drain of the access transistor (the
storage node) and ground, is formed as a trench or stacked capacitor.
The stacked capacitor is created between a second polysilicon layer and a metal plate
covering the whole array area. The plate is effectively connected to the ground
terminal. To consider read/write operations we have to take into account a significant
parasitic capacitance CC associated with each column, as shown in Figure 4.43.
[Figure: DRAM read/write timing – RAS, CAS, ADDR and WE signal waveforms]
Figure 1.4 Serial-in-parallel-out Shift Register
4.17.3 Parallel-In-Serial-Out: The figure shown below is an example of a parallel-in-
serial-out shift register. P0, P1, P2 and P3 are the parallel inputs to the shift
register. When Shift = '0' the shift register loads all the inputs. When Shift = '1' the
inputs are shifted to the right. This shift register shifts one bit per cycle.
Figure 1.5 Parallel-In-Serial-Out Shift Register
4.17.4 Queues: A queue is a pile in which items are added at one end and removed from
the other. In this respect, a queue is like the line of customers waiting to be served by
a bank teller. As customers arrive, they join the end of the queue while the teller
serves the customer at the head of the queue. The major advantage of queues is that
they allow data to be written and read at different rates: the read and write ports each
use their own clock and data. The queue indicates when it is full or empty. These kinds of
queues are usually built with SRAM and counters. There are two types of queues:
first-in first-out (FIFO) and last-in first-out (LIFO). A FIFO sketch is shown below.
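A minimal FIFO along the lines described here (my own illustration), using a small storage array with wrap-around read and write counters plus full and empty indications:

```python
# Minimal FIFO queue: SRAM-like storage array with read and write pointers.
class Fifo:
    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.rd = self.wr = self.count = 0

    def empty(self): return self.count == 0
    def full(self):  return self.count == self.depth

    def write(self, data):
        if self.full():
            raise RuntimeError("FIFO full")
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % self.depth    # wrap-around write counter
        self.count += 1

    def read(self):
        if self.empty():
            raise RuntimeError("FIFO empty")
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth    # wrap-around read counter
        self.count -= 1
        return data

q = Fifo(4)
q.write("a"); q.write("b")
print(q.read(), q.read(), q.empty())            # a b True
```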
4.17.6 Last-In-First-Out: It is also called a stack; objects stored in a stack are
kept in a pile. The last item put into the stack is at the top. When an item is pushed
onto a stack, it is placed at the top of the pile. When an item is popped, it is always the
top item that is removed. Since the last item to be put into the stack is always the
first item to be removed, it is last-in, first-out.
Figure 1.7 shows an example of a 512-word CAM architecture. It supports three modes of
operation: read, write and match. The read and write modes access and manipulate the
data in the same way as an ordinary memory. The match mode is a special function of
associative memory. The data patterns that need to be matched are stored in the
comparand block, and the mask word indicates which bits are significant. Every row that
matches the pattern is passed to the validity block.
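Behaviorally, a match operation compares the comparand, under the mask, against every stored word in parallel and produces one match flag per row; a minimal sketch (illustrative only):

```python
# Behavioral CAM match: only the bit positions selected by the mask are compared.
def cam_match(stored_words, comparand, mask):
    # A row matches when its masked bits equal the masked comparand bits.
    return [int((word & mask) == (comparand & mask)) for word in stored_words]

words = [0b1010, 0b1001, 0b0110, 0b1011]
print(cam_match(words, 0b1010, 0b1111))   # [1, 0, 0, 0]  exact match only
print(cam_match(words, 0b1000, 0b1100))   # [1, 1, 0, 1]  top two bits equal '10'
```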