
ch05 1


Digital Design

Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, with RTL Design, VHDL,
and Verilog, 2nd Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2010.
http://www.ddvahid.com

Copyright © 2010 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see http://www.ddvahid.com for information.
Digital Design 2e
Copyright © 2010 Frank Vahid
5.1

Introduction
• Chpt 2
– Capture comb. behavior: Equations, truth tables
– Convert to circuit: AND + OR + NOT → Comb. logic
• Chpt 3
– Capture sequential behavior: FSMs
– Convert to circuit: Register + Comb. logic → Controller
• Chpt 4
– Datapath components, simple datapaths
• Chpt 5
– Capture behavior: High-level state machine
– Convert to circuit: Controller + Datapath → Processor
– Known as “RTL” (register-transfer level) design
[Figure: Levels of digital design abstraction, from top to bottom: higher levels, register-transfer level (RTL), logic level, transistor level]
Processors:
• Programmable (microprocessor)
• Custom
Note: Slides with animation are denoted with a small red "a" near the animated items
5.2

High-Level State Machines (HLSMs)


• Some behaviors too complex for equations, truth tables, or FSMs
• Ex: Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
• FSM can’t represent…
– 8-bit input/output
– Storage of current total
– Addition (e.g., 25 + 10)
[Figure: soda dispenser processor block (inputs c, a, s; output d), with an example showing values 25 and 50 and the running total tot]

HLSMs
• High-level state machine (HLSM) extends FSM with:
– Multi-bit input/output
– Local storage
– Arithmetic operations
• Conventions
– Numbers:
• Single-bit: '0' (single quotes)
• Integer: 0 (no quotes)
• Multi-bit: “0000” (double quotes)
– == for equal, := for assignment
– Multi-bit outputs must be registered via local storage
– // precedes a comment

SodaDispenser
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit) // '1' dispenses soda
Local storage: tot (8 bits)
State Init: d:='0', tot:=0; then go to Wait
State Wait: c → Add; c'*(tot<s) → Wait; c'*(tot<s)' → Disp
State Add: tot:=tot+a; then go to Wait
State Disp: d:='1'; then go to Init
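The SodaDispenser HLSM can be sketched as a small cycle-by-cycle simulation. This is only an illustration, not part of the slides: the function name `soda_dispenser`, the input format (one `(c, a)` pair per clock cycle), and the power-up value of `tot` are assumptions. Storage updates are applied at the end of each cycle, mirroring updates on the clock edge.

```python
def soda_dispenser(s, cycles):
    """Simulate the SodaDispenser HLSM (hypothetical helper).

    s      -- soda cost (8-bit value)
    cycles -- list of (c, a) input pairs, one per clock cycle
    Returns the value of output d in each cycle.
    """
    state, tot = "Init", 0              # power-up value of tot is an assumption
    d_trace = []
    for c, a in cycles:
        d, next_tot = 0, tot
        if state == "Init":             # d:='0', tot:=0
            next_tot, next_state = 0, "Wait"
        elif state == "Wait":
            if c:
                next_state = "Add"
            elif tot < s:
                next_state = "Wait"
            else:
                next_state = "Disp"
        elif state == "Add":            # tot:=tot+a, wrapping at 8 bits
            next_tot, next_state = (tot + a) & 0xFF, "Wait"
        else:                           # Disp: d:='1'
            d, next_state = 1, "Init"
        d_trace.append(d)
        state, tot = next_state, next_tot   # storage updates on the clock edge
    return d_trace


# Two coins of value 25 deposited against a soda cost of 50.
trace = soda_dispenser(50, [(0, 0), (1, 25), (0, 25), (1, 25),
                            (0, 25), (0, 0), (0, 0), (0, 0)])
print(trace)
```

With these inputs, d goes to 1 for one cycle (state Disp) once tot reaches 50.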


Ex: Cycles-High Counter
• P = total number (in binary) of cycles that m is 1
• Capture behavior as HLSM
– Preg required (multibit outputs must be registered)
• Use to hold count

CountHigh
Inputs: m (bit)
Outputs: P (32 bits)
Local storage: Preg
(a) State S_Clr: Preg := 0 // Clear Preg to 0s
(b) Add state S_Wt // Wait for m == '1': m' → S_Wt; m → next state
(c) Add state S_Inc: Preg := Preg + 1 // Increment Preg; m → S_Inc; m' → S_Wt

Note: Could have designed directly using an up-counter. But that methodology is ad hoc, and won't work for more complex examples, like the next one.
Example: Laser-Based Distance Measurer
[Figure: laser pulse travels distance D to the object of interest and reflects back to the sensor in T seconds; 2D = T sec * 3*10^8 m/sec]
• Laser-based distance measurement – pulse laser, measure time T to sense reflection
– Laser light travels at speed of light, 3*10^8 m/sec
– Distance is thus D = (T sec * 3*10^8 m/sec) / 2

Example: Laser-Based Distance Measurer
[Figure: laser-based distance measurer block; B from button, L to laser, S from sensor, D (16 bits) to display]

• Inputs/outputs
– B: bit input, from button, to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, to display computed distance

Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg(16) (required)
State S0 (first state usually initializes the system):
L := '0' // laser off
Dreg := 0 // distance is 0

• Declare inputs, outputs, and local storage
– Dreg required for multi-bit output
• Create initial state, name it S0
– Initialize laser to off (L:='0')
– Initialize displayed distance to 0 (Dreg:=0)
Recall: '0' means single bit, 0 means integer

Example: Laser-Based Distance Measurer
DistanceMeasurer (inputs, outputs, and local storage as before)
State S0: L := '0', Dreg := 0; then go to S1
State S1: B' (button not pressed) → S1; B (button pressed) → next state
• Add another state, S1, that waits for a button press


– B' – stay in S1, keep waiting
– B – go to a new state S2

Q: What should S2 do? A: Turn on the laser


Example: Laser-Based Distance Measurer
DistanceMeasurer (inputs, outputs, and local storage as before)
State S0: L := '0', Dreg := 0; then go to S1
State S1: B' → S1; B → S2
State S2: L := '1' // laser on; then go to S3
State S3: L := '0' // laser off

• Add a state S2 that turns on the laser (L:='1')


• Then turn off laser (L:='0') in a state S3

Q: What do next? A: Start timer, wait to sense reflection


Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
State S0: L := '0', Dreg := 0; then go to S1
State S1: Dctr := 0 // reset cycle count; B' → S1; B → S2
State S2: L := '1'; then go to S3
State S3: L := '0', Dctr := Dctr + 1 // count cycles; S' (no reflection) → S3; S (reflection) → next state

• Stay in S3 until sense reflection (S)


• To measure time, count cycles while in S3
– To count, declare local storage Dctr
– Initialize Dctr to 0 in S1. In S2 would have been O.K. too.
• Don't forget to initialize local storage—common mistake
– Increment Dctr each cycle in S3
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
State S0: L := '0', Dreg := 0; then go to S1
State S1: Dctr := 0; B' → S1; B → S2
State S2: L := '1'; then go to S3
State S3: L := '0', Dctr := Dctr+1; S' → S3; S → S4
State S4: Dreg := Dctr/2 // calculate D; then go to S1

• Once reflection detected (S), go to new state S4
– Calculate distance
– Assuming clock frequency is 3x10^8 Hz, light travels 1 meter per clock cycle, so Dctr holds the round-trip distance in meters and Dreg:=Dctr/2
• After S4, go back to S1 to wait for button again
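The completed DistanceMeasurer HLSM can be exercised with a short simulation sketch. The function name `distance_measurer`, the per-cycle `(B, S)` input format, and the power-up values of Dreg and Dctr are assumptions made for illustration only. Integer division by 2 is modeled as a right shift.

```python
def distance_measurer(inputs):
    """Simulate the DistanceMeasurer HLSM (hypothetical helper).

    inputs -- list of (B, S) pairs, one per clock cycle
    Returns a list of (L, D) output pairs, one per clock cycle.
    """
    state, dreg, dctr = "S0", 0, 0      # power-up values are an assumption
    trace = []
    for b, s in inputs:
        l, next_dreg, next_dctr = 0, dreg, dctr
        if state == "S0":               # L:='0', Dreg:=0
            next_dreg, next_state = 0, "S1"
        elif state == "S1":             # Dctr:=0, wait for button
            next_dctr = 0
            next_state = "S2" if b else "S1"
        elif state == "S2":             # L:='1' (laser on)
            l, next_state = 1, "S3"
        elif state == "S3":             # count cycles until reflection
            next_dctr = (dctr + 1) & 0xFFFF
            next_state = "S4" if s else "S3"
        else:                           # S4: Dreg:=Dctr/2
            next_dreg, next_state = dctr >> 1, "S1"
        trace.append((l, dreg))         # D shows the registered value
        state, dreg, dctr = next_state, next_dreg, next_dctr
    return trace


# Button pressed in S1; reflection sensed during the 10th cycle spent in S3.
stimulus = [(0, 0), (1, 0), (0, 0)] + [(0, 0)] * 9 + [(0, 1)] + [(0, 0)] * 2
trace = distance_measurer(stimulus)
print(trace[-1])
```

Ten cycles are counted in S3, so with the assumed 1 meter per cycle the displayed distance settles at 5.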
HLSM Actions: Updates Occur Next Clock Cycle
• Local storage updated on clock edges only
– Enter state on clock edge
– Storage writes in that state occur on next clock edge
– Can think of as occurring on outgoing transitions
• Thus, transition conditions use the OLD value, not the newly-written value
[Figure: state S3 with action Dctr := Dctr+1 and transitions S' and S, redrawn with the action on the outgoing transitions: S' / Dctr := Dctr+1 and S / Dctr := Dctr+1]
– Example:
Inputs: B (bit)
Outputs: P (bit) // if B, 2 cycles high
Local storage: Jreg (8 bits)
State S0: P := '0', Jreg := 1; B' → S0; B → S1
State S1: P := '1', Jreg := Jreg + 1; Jreg<2 → S1; !(Jreg<2) → S0
[Figure: (a) the HLSM above; (b) timing diagram with states S0, S1, S1, S0 on successive clock cycles and Jreg = ?, 1, 2, 3, so P is high for 2 cycles]
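The "old value" rule in the Jreg example can be checked with a small simulation sketch. The name `two_cycles_high` and the use of `None` for the unknown power-up value of Jreg are illustrative assumptions. The transition condition is evaluated with the current (old) Jreg, and the new value is only committed at the end of the cycle.

```python
def two_cycles_high(b_inputs):
    """Simulate the Jreg example HLSM (hypothetical helper).

    Returns one (state, Jreg, P) tuple per clock cycle.
    """
    state, jreg = "S0", None            # None models the unknown power-up value
    trace = []
    for b in b_inputs:
        if state == "S0":               # P:='0', Jreg:=1
            p, next_jreg = 0, 1
            next_state = "S1" if b else "S0"
        else:                           # S1: P:='1', Jreg:=Jreg+1
            p, next_jreg = 1, (jreg + 1) & 0xFF
            # The condition reads the OLD Jreg, not next_jreg.
            next_state = "S1" if jreg < 2 else "S0"
        trace.append((state, jreg, p))
        state, jreg = next_state, next_jreg   # writes land on the clock edge
    return trace


trace = two_cycles_high([1, 0, 0, 0])
print(trace)
```

The trace shows Jreg as unknown, 1, 2, 3 across the four cycles, with P high in exactly the two S1 cycles.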
5.3

RTL Design Process


• Capture behavior
• Convert to circuit
– Need target architecture
– Datapath capable of HLSM's data operations
– Controller to control datapath
[Figure: target architecture; a controller (external control inputs and outputs) and a datapath (external data inputs and outputs), connected by DP control inputs and DP control outputs]
Ctrl/DP Example for Earlier Cycles-High Counter
(a) Desired behavior: First clear Preg to 0s. Then increment Preg for each clock cycle that m is 1.
(b) HLSM (we created this HLSM earlier)
CountHigh
Inputs: m (bit)
Outputs: P (32 bits)
LocStr: Preg (32 bits)
S_Clr: Preg := 0 // Clear Preg to 0s
S_Wt: // Wait for m=='1'; m' → S_Wt; m → S_Inc
S_Inc: Preg := Preg + 1 // Increment Preg; m → S_Inc; m' → S_Wt
(c) Create DP: 32-bit adder add1 adds the constant 000...00001 to the output of register Preg; the sum S drives Preg's input I; Preg has control inputs Preg_clr and Preg_ld; Preg's output Q drives P
(d) Connect with controller, then derive controller
S_Clr: Preg_clr = 1, Preg_ld = 0 // Preg := 0
S_Wt: Preg_clr = 0, Preg_ld = 0 // Wait for m=1; m' → S_Wt; m → S_Inc
S_Inc: Preg_clr = 0, Preg_ld = 1 // Preg := Preg+1; m → S_Inc; m' → S_Wt
RTL Design Process

Example: Soda Dispenser from Earlier
• Quick overview example. More details of each step to come.
Step 1: Capture the HLSM
SodaDispenser
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit) // '1' dispenses soda
Local storage: tot (8 bits)
Init: d:='0', tot:=0; then go to Wait
Wait: c → Add; c'*(tot<s) → Wait; c'*(tot<s)' → Disp
Add: tot:=tot+a; then go to Wait
Disp: d:='1'; then go to Init
Step 2A: Create a datapath
– Register tot (8 bits) with control inputs tot_ld and tot_clr
– 8-bit adder computes tot + a; the sum feeds tot
– 8-bit comparator (<) compares tot with s and outputs tot_lt_s
Step 2B: Connect the datapath to a controller
– Controller inputs: c, tot_lt_s; controller outputs: d, tot_ld, tot_clr
Example: Soda Dispenser
• Quick overview example. More details of each step to come.
Step 1 (HLSM) and Step 2B (controller/datapath connection) as on the previous slide.
Step 2C: Derive the controller's FSM
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Init: d=0, tot_clr=1; then go to Wait
Wait: c → Add; c'*tot_lt_s → Wait; c'*tot_lt_s’ → Disp
Add: tot_ld=1; then go to Wait
Disp: d=1; then go to Init
Example: Soda Dispenser
• Quick overview example. More details of each step to come.
Step 2C: Controller FSM (as on the previous slide)
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Use controller design process (Ch3) to complete the design

State  s1 s0  c  tot_lt_s | n1 n0  d  tot_ld  tot_clr
Init    0  0  0     0     |  0  1  0    0       1
Init    0  0  0     1     |  0  1  0    0       1
Init    0  0  1     0     |  0  1  0    0       1
Init    0  0  1     1     |  0  1  0    0       1
Wait    0  1  0     0     |  1  1  0    0       0
Wait    0  1  0     1     |  0  1  0    0       0
Wait    0  1  1     0     |  1  0  0    0       0
Wait    0  1  1     1     |  1  0  0    0       0
Add     1  0  0     0     |  0  1  0    1       0
Disp    1  1  0     0     |  0  0  1    0       0
(Only one input combination is shown for Add and for Disp.)
RTL Design Process—Step 2A: Create a datapath
• Sub-steps
– HLSM data inputs/outputs → Datapath inputs/outputs.
– HLSM local storage item → Instantiated register
• "Instantiate": Add new component ("instance") to design
– Each HLSM state action and transition condition data computation → Datapath components and connections
• Also instantiate multiplexors as needed
• Need component library from which to choose
– reg (inputs clr, ld, I; output Q): clk^ and clr=1: Q=0; clk^ and ld=1: Q=I; else Q stays same
– add (inputs A, B; output S): S = A+B (unsigned)
– cmp (inputs A, B; outputs lt, eq, gt): A<B: lt=1; A=B: eq=1; A>B: gt=1
– shift<L/R> (input I; output Q): shiftL1: <<1; shiftL2: <<2; shiftR1: >>1; ...
– mux2x1 (inputs I1, I0, s0; output Q): s0=0: Q=I0; s0=1: Q=I1
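The behavior of the library components can be sketched in software. This is a behavioral illustration only, not the slides' implementation: the Python names (`Reg`, `add`, `cmp`, `shift_l`, `shift_r`, `mux2x1`) are hypothetical, the register's power-up value of 0 is an assumption, and so is the choice that clr takes priority when clr and ld are both 1.

```python
class Reg:
    """Register with synchronous clear and load (inputs clr, ld, I; output Q)."""

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.q = 0                      # power-up value is an assumption

    def clock(self, i=0, ld=0, clr=0):
        """Model one rising clock edge and return the new Q."""
        if clr:                         # assumption: clr wins over ld
            self.q = 0
        elif ld:
            self.q = i & self.mask
        return self.q                   # else Q stays the same


def add(a, b, width):
    """S = A+B (unsigned), truncated to the component width."""
    return (a + b) & ((1 << width) - 1)


def cmp(a, b):
    """Return the (lt, eq, gt) outputs of the comparator."""
    return (int(a < b), int(a == b), int(a > b))


def shift_l(i, amount, width):
    """shiftL<amount>: shift left, dropping bits beyond the width."""
    return (i << amount) & ((1 << width) - 1)


def shift_r(i, amount):
    """shiftR<amount>: shift right."""
    return i >> amount


def mux2x1(i1, i0, s0):
    """s0=0: Q=I0; s0=1: Q=I1."""
    return i1 if s0 else i0


preg = Reg(8)
preg.clock(i=add(200, 100, 8), ld=1)    # 300 wraps to 44 in 8 bits
print(preg.q, cmp(3, 5), mux2x1(7, 9, 0))
```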
Step 2A: Create a Datapath—Simple Examples
(a) Preg = X + Y + Z
– DP: add1 computes X+Y; add2 adds Z to give X+Y+Z; Preg (clr=0, ld=1) loads the result; Q drives P
(b) Preg = Preg + X
– DP: add1 adds X and Preg's output; Preg (clr=0, ld=1) loads the sum; Q drives P
(c) Preg=X+Y; regQ=Y+Z
– DP: add1 computes X+Y for Preg; add2 computes Y+Z for regQ; both registers have clr=0, ld=1; outputs P and Q
(d) k=0: Preg = Y + Z; k=1: Preg = X + Y
– DP: add1 computes X+Y; add2 computes Y+Z; a mux2x1 with s0 = k selects which sum Preg (clr=0, ld=1) loads; Q drives P
Laser-Based Distance Measurer—Step 2A: Create a Datapath
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0; then go to S1
S1: Dctr := 0; B' → S1; B → S2
S2: L := '1'; then go to S3
S3: L := '0', Dctr := Dctr+1; S' → S3; S → S4
S4: Dreg := Dctr/2 // calculate D; then go to S1

• HLSM data I/O → DP I/O
• HLSM local storage → reg
• HLSM state action and transition condition data computation → Datapath components and connections

Datapath:
– Dctr: reg(16), control inputs Dctr_clr, Dctr_ld
– Add1: add(16), adds the constant 1 to Dctr; the sum is loaded back into Dctr
– Shr1: shiftR1(16), shifts Dctr right by 1 (Dctr/2); the result is loaded into Dreg
– Dreg: reg(16), control inputs Dreg_clr, Dreg_ld; Q drives D (16 bits)
Laser-Based Distance Measurer—Step 2B: Connecting the Datapath to a
Controller

[Figure: controller (inputs B from button and S from sensor; output L to laser) connected to the datapath through Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_ld; datapath output D (16 bits) to display; 300 MHz clock]
Laser-Based Distance Measurer—Step 2C: Derive the Controller FSM
HLSM and datapath as on the previous slides.

• FSM has same states, transitions, and control I/O
• Achieve each HLSM data operation using datapath control signals in FSM

Controller
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_ld
Transitions: S0 → S1; S1: B' → S1, B → S2; S2 → S3; S3: S' → S3, S → S4; S4 → S1

State  L  Dreg_clr  Dreg_ld  Dctr_clr  Dctr_ld  Comment
S0     0     1         0        0         0     (laser off) (clear Dreg)
S1     0     0         0        1         0     (clear count)
S2     1     0         0        0         0     (laser on)
S3     0     0         0        0         1     (laser off) (count up)
S4     0     0         1        0         0     (load Dreg with Dctr/2) (stop counting)
Laser-Based Distance Measurer—Step 2C: Derive the Controller FSM

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_ld
Transitions: S0 → S1; S1: B' → S1, B → S2; S2 → S3; S3: S' → S3, S → S4; S4 → S1
S0: L=0, Dreg_clr = 1 (laser off) (clear Dreg)
S1: Dctr_clr = 1 (clear count)
S2: L=1 (laser on)
S3: L=0, Dctr_ld = 1 (laser off) (count up)
S4: Dreg_ld = 1, Dctr_ld = 0 (load Dreg with Dctr/2) (stop counting)

• Same FSM, using convention of unassigned outputs implicitly assigned 0
Some assignments to 0 still shown, due to their importance in understanding desired controller behavior
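The controller/datapath split and the "unassigned outputs are 0" convention can be sketched together. In this illustration the table `CONTROL` lists only the outputs a state sets to 1, and `outputs()` fills in 0 for the rest. All Python names, the input format, and the power-up register values are assumptions; the datapath model follows the Add1/Shr1/Dctr/Dreg structure described for Step 2A.

```python
SIGNALS = ("L", "Dreg_clr", "Dreg_ld", "Dctr_clr", "Dctr_ld")

# Only outputs that are 1 are listed; everything else is implicitly 0.
CONTROL = {
    "S0": {"Dreg_clr": 1},
    "S1": {"Dctr_clr": 1},
    "S2": {"L": 1},
    "S3": {"Dctr_ld": 1},
    "S4": {"Dreg_ld": 1},
}


def outputs(state):
    """Controller outputs for a state, with unassigned signals set to 0."""
    return {sig: CONTROL[state].get(sig, 0) for sig in SIGNALS}


def next_state(state, b, s):
    """Controller transitions (same as the HLSM's)."""
    if state == "S1":
        return "S2" if b else "S1"
    if state == "S3":
        return "S4" if s else "S3"
    return {"S0": "S1", "S2": "S3", "S4": "S1"}[state]


def run(inputs):
    """Clock controller and datapath together; return the final Dreg value."""
    state, dreg, dctr = "S0", 0, 0          # power-up values are an assumption
    for b, s in inputs:
        ctl = outputs(state)
        add1 = (dctr + 1) & 0xFFFF          # Add1: Dctr + 1
        shr1 = dctr >> 1                    # Shr1: Dctr / 2
        new_dctr = 0 if ctl["Dctr_clr"] else (add1 if ctl["Dctr_ld"] else dctr)
        new_dreg = 0 if ctl["Dreg_clr"] else (shr1 if ctl["Dreg_ld"] else dreg)
        state, dctr, dreg = next_state(state, b, s), new_dctr, new_dreg
    return dreg


# Button press in S1, reflection during the 10th cycle in S3, then S4.
stimulus = [(0, 0), (1, 0), (0, 0)] + [(0, 0)] * 9 + [(0, 1), (0, 0)]
print(outputs("S3"), run(stimulus))
```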
5.4

More RTL Design


• Additional datapath components

– sub (inputs A, B; output S): S = A-B (signed)
– mul (inputs A, B; output P): P = A*B (unsigned)
– abs (input A; output Q): Q = |A|
– upcnt (inputs clr, inc; output Q): clk^ and clr=1: Q=0; clk^ and inc=1: Q=Q+1; else Q stays same
– RF (inputs W_d, W_a, W_e, R_a, R_e; output R_d): clk^ and W_e=1: RF[W_a] = W_d; R_e=1: R_d = RF[R_a]
RTL Design Involving Register File or Memory
• HLSM array: Ordered list of items
– Ex: Local storage: A[4](8-bit) – 4 8-bit items
– Accessed using notation "A[i]", i is index
– A[0] := 9; A[1] := 8; A[2] := 7; A[3] := 22
• Array contents now: <9, 8, 7, 22>
• X := A[1] will set X to 8
• Note: First element's index is 0
• Array can be mapped to instantiated register file or memory

ArrayEx Inputs: (none)

Simple Array Example


Outputs: P (11 bits)
Local storage: A[4](11 bits)
Preg (11 bits)
Init1 Preg := 0
A[0] := 9

(A[0] == 8)' a

Init2 A[1] := 12
12 9
11 11
A[0] == 8
I1 I0
A_s Amux
Out1 Preg := A[1] s0 Q

8
(a) A_Wa0 W_d
A_Wa1 W_a
ArrayEx Inputs: A_eq_8 A_We W_e A B
A_Ra0 A
Outputs: A_s, A_Wa0, ... Acmp
A_Ra1 R_a RF[4](11)
lt eq gt
Preg_clr = 1 A_Re R_e
A_s = 0 R_d
Init1 A_Wa1=0, A_Wa0=0
A_We = 1 A_eq_8
(A_eq_8)'
Preg_clr
A_s = 1 clr I
Init2 ld Preg
A_Wa1=0, A_Wa0=1 Preg_ld Q
A_We = 1 DP
A_Ra1=0, A_Ra0=0
A_eq_8 A_Re = 1 (b) 11
P

Out1 Preg_ld = 1
(c) Controller
RTL Example: Video Compression – Sum of Absolute Differences
[Figure (a): Frame 1 and Frame 2, digitized as 1 Mbyte each; only difference: ball moving]
[Figure (b): digitized frame 1 (1 Mbyte) and difference of 2 from 1 (0.01 Mbyte); just send difference]
• Video is a series of frames (e.g., 30 per second)
• Most frames similar to previous frame
– Compression idea: just send difference from previous frame
RTL Example: Video Compression – Sum of Absolute Differences
[Figure: compare Frame 1 with Frame 2. Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)]
• Need to quickly determine whether two frames are similar
enough to just send difference for second frame
– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array
– Compute the absolute value of the difference of each array item
– Sum those differences – if above a threshold, send complete frame
for second frame; if below, can use difference method (using
another technique, not described)
Array Example: Video Compression—Sum-of-Absolute Differences

SAD
Inputs: A, B [256](8 bits); go (bit)
Outputs: sad (32 bits)
Local storage: sum, sadreg (32 bits); i (9 bits)
[Figure: SAD block reading arrays A and B, each RF[256](8), and producing output sad]
• S0: wait for go
– !go → S0; go → S1
• S1: initialize sum and index
– sum := 0; i := 0
• S2: check if done ( (i<256)’ )
– i<256 → S3; (i<256)’ → S4
• S3: add difference to sum, increment index
– sum:=sum+abs(A[i]-B[i]); i := i + 1
• S4: done, write to output sad_reg
– sadreg := sum
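The S1 to S4 loop can be sketched in software, counting one clock cycle for each visit to S2 and S3. The function name `sad_hlsm` and the returned cycle count are illustrative assumptions; the final S2 exit check and states S0, S1, and S4 are deliberately not counted, to match a two-states-per-item estimate.

```python
def sad_hlsm(a, b):
    """Sum of absolute differences over two 256-item blocks (hypothetical helper).

    Returns (sad, loop_cycles), where loop_cycles counts the S2/S3 visits
    made while i < 256.
    """
    if len(a) != 256 or len(b) != 256:
        raise ValueError("both blocks must hold 256 items")
    total, i = 0, 0                     # S1: sum := 0, i := 0
    loop_cycles = 0
    while i < 256:                      # S2: check i<256
        loop_cycles += 1
        total = (total + abs(a[i] - b[i])) & 0xFFFFFFFF   # S3: 32-bit sum
        i += 1                          # S3: i := i + 1
        loop_cycles += 1
    return total, loop_cycles           # S4: sadreg := sum


sad, cycles = sad_hlsm([10] * 256, [7] * 256)
print(sad, cycles)
```

Each item differs by 3, so the sum is 256 * 3 = 768, reached after 512 loop cycles.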
Array Example: Video Compression, Sum-of-Absolute Differences
HLSM as on the previous slide.
Inputs: A, B [256](8 bits); go (bit)
Outputs: sad (32 bits)
Local storage: sum, sadreg (32 bits); i (9 bits)

Controller (HLSM action → control signals)
S0: go’ → S0; go → S1
S1: sum=0 → sum_clr=1; i=0 → i_clr=1
S2: i<256 → i_lt_256 (go to S3); (i<256)’ → (i_lt_256)’ (go to S4)
S3: sum=sum+abs(A[i]-B[i]) → sum_ld=1; AB_rd=1; i=i+1 → i_inc=1
S4: sad_reg = sum → sadreg_ld=1

Datapath
– External signals: go, AB_rd, AB_addr, A_data (8 bits), B_data (8 bits), sad (32 bits)
– Counter i (9 bits; control inputs i_inc, i_clr)
– Comparator (cmp) compares i with 256 and outputs i_lt_256
– Subtractor (–) computes A_data – B_data (8 bits), followed by abs
– Adder (+, 32 bits) adds the absolute difference to sum
– Register sum (32 bits; control inputs sum_ld, sum_clr)
– Register sadreg (32 bits; control inputs sadreg_ld, sadreg_clr) drives sad
Circuit vs. Microprocessor
• Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
• Microprocessor: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say 6 cycles per array item → 256*6 = 1536 cycles
• Circuit is about 3 times faster (1536/512 = 3, assuming equal cycle lengths)
• Later, we’ll see how to build SAD circuit that is much faster
[Figure: HLSM loop; S2: i<256 → S3, (i<256)’ → exit; S3: sum:=sum+abs(A[i]-B[i]), i:=i+1]
Common RTL Design Pitfall Involving Storage Updates
Local storage: R, Q (8 bits)
State A: R:=99, Q:=R; then go to B
State B: R:=R+1; R<100 → C; (R<100)' → D
• Questions
– Value of Q after state A?
– Final state is C or D?
• Answers
– Q is NOT 99 after state A
– R is 99 in state B, so final state is C
– Storage update actions in state occur simultaneously on next clock edge
• Thus, order actions are written is irrelevant
• A's actions same if:
– Q:=R R:=99 or
– R:=99 Q:=R
[Figure: timing diagram; states A, B, C on successive clock cycles; R = ?, 99, 100; Q = ?, ?, ?; R<100 is true in state B]
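The pitfall can be reproduced with a small sketch in which every action of a state reads the old register values and all writes are committed together. The function name `step` and the use of `None` for unknown power-up values are assumptions made for illustration.

```python
def step(state, r, q):
    """One clock cycle of the pitfall HLSM (hypothetical helper).

    Returns (next_state, next_R, next_Q); every right-hand side and every
    transition condition reads the OLD values r and q.
    """
    if state == "A":                    # R:=99, Q:=R (order is irrelevant)
        return "B", 99, r               # Q receives the OLD R, not 99
    if state == "B":                    # R:=R+1
        nxt = "C" if r < 100 else "D"   # condition sees R = 99
        return nxt, (r + 1) & 0xFF, q
    return state, r, q                  # C and D: no actions modeled


history = [("A", None, None)]           # None models unknown power-up values
for _ in range(2):
    history.append(step(*history[-1]))
print(history)
```

The run ends in state C with R = 100 while Q is still unknown, matching the timing diagram.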
Common RTL Design Pitfall Involving Storage Updates

• New HLSM using extra state so read of R occurs after write of R
Local storage: R, Q (8 bits)
[Figure: HLSM with states A, B, B2, C, D; actions R:=99, R:=R+1, Q:=R, Q:=R; transitions from B2: R<100 → C, (R<100)' → D]
[Figure: timing diagram; states A, B, B2, D on successive clock cycles; R = ?, 99, 100, 100; Q = ?, ?, 99, 99; (R<100)' is true in state B2]
RTL Design Involving a Timer
• Commonly need explicit time intervals
– Ex: Repeatedly blink LED on 1 second, off 1 second
• Pre-instantiate timer that HLSM can then use
(a) Pre-instantiated timer: 32-bit 1-microsec timer T with inputs T_M (32 bits, to M), T_ld (load), T_en (enable) and output T_Q (from Q)
(b) HLSM making use of timer
BlinkLed
Timer: T
Outputs: L (bit)
Init: L:='0', T:=1000000, T_en:='0'; then go to Off
Off: L:='0', T_en:='1'; T_Q' → Off; T_Q → On
On: L:='1', T_en:='1'; T_Q' → On; T_Q → Off
Button Debouncing
• Press button
– Ideally, output changes to 1
– Actually, output bounces
• Due to mechanical reasons
• Like ball bouncing when dropped to floor
• Digital circuit can convert actual bounce signal closer to ideal signal
[Figure: button output B going from 0 to 1; ideal B is a clean step, actual B bounces]
ButtonDebouncer
Inputs: Bin (bit)
Outputs: Bout (bit)
Timer: T
Init: Bout:='0', T:=20000, T_en:='0'
WaitBin: Bout:='0', T_en:='0'
Wait20: Bout:='1', T_en:='1'
WhileBin: Bout:='1', T_en:='0'
[Figure: state diagram with transition labels Bin', T_Q', Bin, Bin, Bin']
Data Dominated RTL Design Example
• Data dominated design: Extensive DP,
simple controller
• Control dominated design: Complex
controller, simple DP
• Example: Filter
– Converts digital input stream to new digital output stream
– Ex: Remove noise
• 180, 180, 181, 180, 240, 180, 181
• 240 is probably noise, filter might replace by 181
– Simple filter: Output average of last N values
• Small N: less filtering
• Large N: more filtering, but less sharp output
[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clk]
Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response”
– Simply a configurable weighted sum of past input values
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
• Tens of taps more common
• Very general filter – User sets the constants (c0, c1, c2) to define specific filter
• RTL design
– Step 1: Create HLSM
• Very simple states/transitions
[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clk]
FIR filter
Inputs: X (12 bits)
Outputs: Y (12 bits)
Local storage: xt0, xt1, xt2, c0, c1, c2 (12 bits); Yreg (12 bits)
Init: Yreg := 0; xt0 := 0; xt1 := 0; xt2 := 0; c0 := 3; c1 := 2; c2 := 2
FC: Yreg := c0*xt0 + c1*xt1 + c2*xt2; xt0 := X; xt1 := xt0; xt2 := xt1
Assume constants set to 3, 2, and 2
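The FC state of the FIR filter HLSM can be sketched as follows. The function name `fir3` and the choice to report the registered Y value visible in each cycle are assumptions for illustration. All four storage updates in FC use the old register values, which a single tuple assignment models directly; Yreg is truncated to 12 bits.

```python
def fir3(xs, c=(3, 2, 2)):
    """Simulate the 3-tap FIR HLSM after Init (hypothetical helper).

    xs -- input samples X, one per clock cycle spent in state FC
    Returns the value of output Y visible in each of those cycles.
    """
    xt0 = xt1 = xt2 = 0                 # Init: xt0..xt2 := 0
    yreg = 0                            # Init: Yreg := 0
    ys = []
    for x in xs:
        ys.append(yreg)                 # Y shows the registered value
        # FC: all right-hand sides read the OLD register values.
        yreg, xt0, xt1, xt2 = (
            (c[0] * xt0 + c[1] * xt1 + c[2] * xt2) & 0xFFF,
            x,
            xt0,
            xt1,
        )
    return ys


# A single sample of 1 followed by zeros exposes the constants 3, 2, 2.
impulse = fir3([1, 0, 0, 0, 0, 0])
print(impulse)
```

The constants appear at the output two cycles after the sample arrives: one cycle to register X in xt0 and one to register the sum in Yreg.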
FIR Filter
HLSM as on the previous slide.
• Step 2A: Create datapath
• Step 2B: Connect Ctrlr/DP (as earlier examples)
• Step 2C: Derive FSM
– Set clr and ld lines appropriately
Datapath for 3-tap FIR filter
– Registers c0, c1, c2 load the constants 3, 2, 2 (load inputs c0_ld, c1_ld, c2_ld)
– Registers xt0, xt1, xt2 form a chain holding x(t), x(t-1), x(t-2); X (12 bits) feeds xt0 (control inputs xt0_clr, xt0_ld, ...)
– Three multipliers (*) form the products of each constant with its xt register; two adders (+) sum the products
– Yreg (control inputs Yreg_clr, Yreg_ld) registers the sum and drives Y
Circuit vs. Microprocessor y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

• Comparing the FIR circuit to microprocessor instructions


– Microprocessor
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions
per multiplication, 2 per addition. Say 10 ns per instruction.
• (100*2 + 100*2)*10 = 4000 ns
– Circuit
• Assume adder has 2 ns delay, multiplier has 20 ns delay
• Longest path goes through one multiplier and two adders
– 20 + 2 + 2 = 24 ns delay
• 100-tap filter, following design on previous slide, would have about a
34 ns delay: 1 multiplier and 7 adders on longest path
– Circuit is more than 100 times faster (4000/34). Wow.
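The delay estimates above can be reproduced with a short calculation. The function names are hypothetical, and the circuit model assumes the adders are arranged as a balanced tree, which is consistent with the stated 2 adders on the longest path for 3 taps and 7 adders for 100 taps.

```python
def microprocessor_ns(taps, instr_per_mul=2, instr_per_add=2, ns_per_instr=10):
    """Time for one filter output on the microprocessor, in ns."""
    return (taps * instr_per_mul + taps * instr_per_add) * ns_per_instr


def circuit_ns(taps, mul_ns=20, add_ns=2):
    """Longest-path delay of the circuit, in ns.

    Assumes one multiplier followed by a balanced adder tree, whose depth
    is ceil(log2(taps)), computed here with integer arithmetic.
    """
    adder_levels = (taps - 1).bit_length()
    return mul_ns + adder_levels * add_ns


print(circuit_ns(3), microprocessor_ns(100), circuit_ns(100))
print(round(microprocessor_ns(100) / circuit_ns(100), 1))
```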

5.5

Determining Clock Frequency


• Designers of digital circuits often want fastest performance
– Means want high clock frequency
• Frequency limited by longest register-to-register delay
– Known as critical path
– If clock is any faster, incorrect data may be stored into register
– Longest path on right is 2 ns
• Ignoring wire delays, and register setup and hold times, for simplicity
[Figure: registers a and b feed a component with 2 ns delay whose output is stored in register c]
Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns
• Fastest frequency
– 1 / 7 ns = 142 MHz
[Figure: registers a and b feed an adder (+, 2 ns delay) and a multiplier (*, 5 ns delay), with results stored in registers c and d; Max(2,7,7,5) = 7 ns]
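The critical-path calculation is a maximum over path delays followed by a reciprocal. A minimal sketch, using the four paths listed above (the dictionary keys and variable names are illustrative):

```python
# Register-to-register path delays from the example, in ns.
paths_ns = {
    "a to c through +": 2,
    "a to d through + and *": 7,
    "b to d through + and *": 7,
    "b to d through *": 5,
}

critical_path_ns = max(paths_ns.values())

# A period of T ns corresponds to 1000/T MHz.
max_frequency_mhz = 1000 / critical_path_ns

print(critical_path_ns, int(max_frequency_mhz))
```

Truncating rather than rounding the frequency keeps the clock period at or above the critical path delay.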
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns
• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times, also add to path
• Then add some time to the computed path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead
[Figure: registers a and b connect through 0.5 ns wires to a 2 ns component, whose output reaches register c through another 0.5 ns wire; each path is 3 ns]
A Circuit May Have Numerous Paths
• Paths can exist
– In the datapath
– In the controller
– Between the controller and datapath
– May be hundreds or thousands of paths
• Timing analysis tools that evaluate all possible paths automatically very helpful
[Figure: soda dispenser controller (combinational logic with inputs c and tot_lt_s, outputs d, tot_ld, tot_clr, n1, n0; state register s1, s0) and datapath (register tot, 8-bit comparator, 8-bit adder), with example paths labeled (a), (b), and (c)]
5.7

Memory Components
• RTL design instantiates datapath components to create datapath, controlled by a controller
– Some components are used outside the controller and DP
• MxN memory
– M words, N bits wide each
• Several varieties of memory, which we now introduce
[Figure: M×N memory; M words, N bits wide each]
Random Access Memory (RAM)
• RAM – Readable and writable memory
– “Random access memory”
• Strange name: created several decades ago to contrast with sequentially-accessed storage like tape drives
– Logically same as register file: memory with address inputs, data inputs/outputs, and control
• RAM usually one port; RF usually two or more
– RAM vs. RF
• RAM typically larger than about 512 or 1024 words
• RAM typically stores bits using a bit storage approach that is more efficient than a flip-flop
• RAM typically implemented on a chip in a square rather than rectangular shape, which keeps longest wires (hence delay) short
[Figure: 16×32 register file from Chpt. 4, with W_data (32), W_addr (4), W_en, R_data (32), R_addr (4), R_en]
[Figure: RAM block symbol; 1024×32 RAM with data (32), addr (10), rw, en]
RAM Internal Structure
• Similar internal structure as register file
– Decoder enables appropriate word based on address inputs
– rw controls whether cell is written or read
– rd and wr data lines typically combined
– Let’s see what’s inside each RAM cell
[Figure: internal structure of the 1024x32 RAM. Let A = log2 M. An A×M decoder (inputs addr0 ... addr(A-1) and enable e; outputs d0 ... d(M-1)) drives the word enable of each word of bit storage blocks (aka “cells”); rw goes to all cells; the wdata and rdata lines are combined into data(N-1) ... data0]
Static RAM (SRAM)
• “Static” RAM cell
– 6 transistors (recall inverter is 2 transistors)
– Writing this cell
• word enable input comes from decoder
• When 0, value d loops around inverters
– That loop is where a bit stays stored
• When 1, the data bit value enters the loop
– data is the bit to be stored in this cell
– data’ enters on other side
– Example shows a “1” being written into cell
[Figure: SRAM cell within the RAM structure of the previous slide; lines data and data’, stored values d and d’, and word enable; example with data = 1 and data’ = 0]
Frank Vahid
Static RAM (SRAM)
• “Static” RAM cell
– Reading this cell
• Somewhat trickier
• When rw set to read, the RAM logic sets both data and data’ to 1
• The stored bit d will pull either the left line or the right line down slightly below 1
• “Sense amplifiers” detect which side is slightly pulled down
– The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM...
[Figure: SRAM cell during a read; data and data’ both set to 1, word enable = 1, one side pulled slightly below 1 (<1); lines go to sense amplifiers]
Dynamic RAM (DRAM)
• “Dynamic” RAM cell
– 1 transistor (rather than 6)
– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: Just look at value of d
• Problem: Capacitor discharges over time
– Must “refresh” regularly, by reading d and then writing it right back
[Figure (a): DRAM cell; data line, word enable, and stored bit d on a slowly discharging capacitor]
[Figure (b): waveforms of data, enable, and d, showing d discharging over time]
Comparing Memory Types
• Register file
– Fastest
– But biggest size
• SRAM
– Fast
– More compact than register file
• DRAM
– Slowest
• And refreshing takes time
– But very compact
• Use register file for small items, SRAM for large items, and DRAM for huge items
– Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison for same number of bits (not to scale) of an MxN memory implemented as a register file, SRAM, or DRAM]
Reading and Writing a RAM
[Figure (a): three clock cycles. Cycle 1: addr = 9, data = 500, rw = 1 (1 means write), so RAM[9] now equals 500. Cycle 2: addr = 13, data = 999, write, so RAM[13] now equals 999. Cycle 3: addr = 9, read, data goes from Z to 500]
[Figure (b): timing detail; setup time and hold time for addr, data, and rw around the clock edge, and access time for a read]
• Writing
– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading
– Set addr and en lines, but put nothing (Z) on data lines, set rw=0
– Data will appear on data lines
• Don’t forget to obey setup and hold times
– In short – keep inputs stable before and after a clock edge
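The read and write protocol can be sketched with a behavioral model of a single-port RAM. The class name `Ram`, its `clock` method, and the use of `None` for a high-impedance (Z) data bus are assumptions made for illustration; setup, hold, and access times are not modeled.

```python
class Ram:
    """Behavioral single-port RAM with addr, data, rw, and en (hypothetical)."""

    def __init__(self, words, width):
        self.mem = [0] * words          # initial contents are an assumption
        self.mask = (1 << width) - 1

    def clock(self, addr, rw, en, data=None):
        """One clock cycle. Returns read data, or None (Z) otherwise."""
        if not en:
            return None                 # disabled: nothing happens
        if rw == 1:                     # rw = 1 means write
            self.mem[addr] = data & self.mask
            return None
        return self.mem[addr]           # rw = 0 means read


ram = Ram(1024, 32)
ram.clock(addr=9, rw=1, en=1, data=500)     # RAM[9] now equals 500
ram.clock(addr=13, rw=1, en=1, data=999)    # RAM[13] now equals 999
value = ram.clock(addr=9, rw=0, en=1)       # read back address 9
print(value)
```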
RAM Example: Digital Sound Recorder
[Figure: Microphone, analog-to-digital converter (ad_ld, ad_buf), processor (Ra, Rrw, Ren), 4096x16 RAM (addr, data, rw, en), digital-to-analog converter (da_ld), and speaker, joined by a shared data bus]
• Behavior
– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machines, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words
• To play, processor should read successive RAM words and enable d-to-a

RAM Example: Digital Sound Recorder
• RTL design of processor
– Create HLSM
– Begin with the record behavior
– Create local storage a
• Stores current address, ranges from 0 to 4095 (thus need 12 bits)
– Create state machine that counts from 0 to 4095 using a
• For each a
– Read analog-to-digital conv.
» ad_ld:=‘1’, ad_buf:=‘1’
– Write to RAM at address a
» Rareg:=a, Rrw:=‘1’, Ren:=‘1’
[Figure: Record behavior HLSM. Local register: a, Rareg (12 bits). State S: a:=0. State T: ad_ld:=‘1’, ad_buf:=‘1’, Rareg:=a, Rrw:=‘1’, Ren:=‘1’. State U: a:=a+1. Transition conditions: a<4095 and (a<4095)’.]
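The record behavior can be stepped through in software. This is a minimal sketch under stated assumptions: the function `record`, the callback `read_adc`, and the `"DONE"` exit state are hypothetical names, the RAM is a plain Python list, and the a<4095 test is assumed to be taken when leaving state T.

```python
def record(read_adc, ram, words=4096):
    """Step through states S, T, U of the record behavior."""
    state, a = "S", 0
    while True:
        if state == "S":
            a = 0                      # S: a := 0
            state = "T"
        elif state == "T":
            sample = read_adc()        # ad_ld := 1, ad_buf := 1
            ram[a] = sample            # Rareg := a, Rrw := 1, Ren := 1
            # Assumed transition: keep counting while a < words - 1.
            state = "U" if a < words - 1 else "DONE"
        elif state == "U":
            a += 1                     # U: a := a + 1
            state = "T"
        else:
            return                     # all words written


# Usage: a stand-in converter that returns 0, 1, 2, ... as samples.
samples = iter(range(4096))
ram = [0] * 4096
record(lambda: next(samples), ram)
```

Each pass through T corresponds to one converter read and one RAM write at address a.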
RAM Example: Digital Sound Recorder
– Now create play behavior
– Use local register a again, create state machine that counts from 0 to 4095 again
• For each a
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
[Figure: Play behavior HLSM. Local register: a, Rareg (12 bits). State V: a:=0. State W: ad_buf:=‘0’, Rareg:=a, Rrw:=‘0’, Ren:=‘1’. State X: da_ld:=‘1’, a:=a+1. Transition conditions: a<4095 and (a<4095)’.]

Read-Only Memory – ROM
• Memory that can only be read from, not written to
– Data lines are output only
– No need for rw input
• Advantages over RAM
– Compact: May be smaller
– Nonvolatile: Saves bits even if power supply is turned off
– Speed: May be faster (especially than DRAM)
– Low power: Doesn’t need power supply to save bits, so can extend battery life
• Choose ROM over RAM if stored data won’t change (or won’t change often)
– For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: RAM block symbol (1024x32 RAM with 32-bit data, 10-bit addr, rw, en) and ROM block symbol (1024x32 ROM with 32-bit data, 10-bit addr, en)]
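The thermometer example can be sketched as a lookup table that is filled once and only read afterwards. This is an illustrative sketch: the names `build_c_to_f_rom` and `rom_read`, the 128-word size, and the rounding rule are assumptions, and a Python tuple stands in for the read-only storage.

```python
def build_c_to_f_rom(words=128):
    """Return a read-only table; the address is the temperature in Celsius."""
    # Integer arithmetic, rounding to the nearest degree Fahrenheit.
    return tuple((c * 9 + 2) // 5 + 32 for c in range(words))


ROM = build_c_to_f_rom()


def rom_read(addr, en=1):
    """Read one word; returns None (standing in for Z) when disabled."""
    return ROM[addr] if en else None


freezing = rom_read(0)
boiling = rom_read(100)
```

Because the table never changes after it is built, nothing in the model needs an rw input.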

Read-Only Memory – ROM
[Figure: ROM block symbol (1024x32 ROM with 32-bit data, 10-bit addr, en) and ROM internal structure. Let A = log2 M. An AxM address decoder (inputs addr0 to addr(A-1), enable e from en) drives word enable lines d0 to d(M-1); each word is a row of bit storage blocks (aka “cells”), and the cells’ data lines form rdata(N-1) to rdata0.]
• Internal logical structure similar to RAM, without the data input lines

ROM Types
• If a ROM can only be read, how are the stored bits stored in the first place?
– Storing bits in a ROM known as programming
– Several methods
• Mask-programmed ROM
– Bits are hardwired as 0s or 1s during chip manufacturing
• 2-bit word on right stores “10”
• word enable (from decoder) simply passes the hardwired value through transistor
– Notice how compact, and fast, this memory would be
[Figure: ROM internal structure (address decoder, word enable lines, cells, data lines) and two mask-programmed cells sharing a word enable line, one hardwired to 1 and one hardwired to 0, each driving its own data line]
ROM Types
• Fuse-Based Programmable ROM
– Each cell has a fuse
– A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
• Those cells will be read as 0s (involving some special electronics)
• Cells with unblown fuses will be read as 1s
• 2-bit word on right stores “10”
– Also known as One-Time Programmable (OTP) ROM
[Figure: ROM internal structure and two fuse-based cells sharing a word enable line, one with an intact fuse and one with a blown fuse]

ROM Types
• Erasable Programmable ROM (EPROM)
– Uses “floating-gate transistor” in each cell
– Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
• Electrons become trapped in the gate
• Only done for cells that should store 0
• Other cells (without electrons trapped in gate) will be 1
– 2-bit word on right stores “10”
• Details beyond our scope; just general idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
[Figure: ROM internal structure and two floating-gate transistor cells sharing a word enable line, one of them holding trapped electrons (e-)]

ROM Types
• Electrically Erasable Programmable ROM (EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
– Became very common starting in late 1990s
• Both types are in-system programmable
– Can be programmed with new stored bits while in the system in which the ROM operates
• Requires bi-directional data lines, and write control input
• Also need busy output to indicate that erasing is in progress, since erasing takes some time
[Figure: 1024x32 EEPROM block symbol with 32-bit data, 10-bit addr, en, write, and busy]
ROM Example: Talking Doll
[Figure: “Hello there!” audio divided into 4096 samples, stored in a 4096x16 ROM. The processor (Ra, Ren, da_ld) reads the ROM and drives a digital-to-analog converter connected to a speaker; a vibration sensor supplies input v.]
• Doll plays prerecorded message, triggered by vibration
– Message must be stored without power supply → Use a ROM, not a RAM, because ROM is nonvolatile
• And because message will never change, may use a mask-programmed ROM or OTP ROM
– Processor should wait for vibration (v=1), then read words 0 to 4095 from the ROM, writing each to the d-to-a

ROM Example: Talking Doll
[Figure: Talking doll HLSM with 4096x16 ROM, digital-to-analog converter, and processor. Local register: a, Rareg (12 bits). State S: a:=0, waits while v’ and leaves on v. State T: Rareg:=a, Ren:=‘1’. State U: da_ld:=‘1’, a:=a+1. Transition conditions: a<4095 and (a<4095)’.]
• HLSM
– Create state machine that waits for v=1, and then counts from 0 to 4095 using a local storage a
– For each a, read ROM, write to digital-to-analog converter

ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• Want to record the outgoing announcement
– When rec=1, record digitized sound in locations 0 to 4095
– When play=1, play those stored sounds to digital-to-analog converter
• What type of memory?
– Should store without power supply: ROM, not RAM
– Should be in-system programmable: EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
– Will always erase entire memory when reprogramming: Flash better than EEPROM
[Figure: 4096x16 Flash (addr, data, rw, en, erase, busy) holding “We’re not home.”; processor (inputs rec and play from the record and play buttons; Ra, Rrw, Ren, er, bu, ad_ld, ad_buf, da_ld); microphone with analog-to-digital converter; digital-to-analog converter with speaker]

ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• HLSM
– Once rec=1, begin erasing flash by setting er=1
– Wait for flash to finish erasing by waiting for bu=0
– Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: Record HLSM for the flash-based answering machine. Local register: a, Rareg (13 bits). States S, T, U, V. Actions shown: a:=0, er:=‘1’, er:=‘0’, ad_ld:=‘1’, ad_buf:=‘1’, Rareg:=a, Rrw:=‘1’, Ren:=‘1’, a:=a+1. Transition conditions: rec, bu, bu’, a<4096, (a<4096)’.]
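The erase, wait, then write sequence can be sketched with a toy flash model. Everything here is an assumption made for illustration: the `Flash` class, its `clock` method, the fixed number of busy cycles, and the choice that erased 16-bit words read as all 1s. Real flash parts differ in timing and interface.

```python
class Flash:
    ERASED = 0xFFFF   # assumption: an erased 16-bit word reads as all 1s

    def __init__(self, words=4096, erase_cycles=5):
        self.mem = [0] * words
        self.erase_cycles = erase_cycles
        self.busy_left = 0

    @property
    def bu(self):
        """Busy output: 1 while an erase is in progress."""
        return 1 if self.busy_left > 0 else 0

    def clock(self, er=0, addr=0, data=None, rw=0, en=0):
        if self.busy_left > 0:            # erase in progress: ignore inputs
            self.busy_left -= 1
            if self.busy_left == 0:
                self.mem = [self.ERASED] * len(self.mem)
            return
        if er:
            self.busy_left = self.erase_cycles
        elif en and rw:
            self.mem[addr] = data


def record_announcement(flash, read_adc):
    """Erase the flash, wait for bu=0, then write one sample per address."""
    flash.clock(er=1)                     # er := 1 starts the erase
    waited = 0
    while flash.bu:                       # wait for bu = 0
        flash.clock()
        waited += 1
    for a in range(len(flash.mem)):       # a counts through every address
        flash.clock(addr=a, data=read_adc(), rw=1, en=1)
    return waited


flash = Flash(words=8, erase_cycles=3)
samples = iter([10, 11, 12, 13, 14, 15, 16, 17])
waited = record_announcement(flash, lambda: next(samples))
```

The wait loop is what the busy output exists for: writes issued before bu returns to 0 would be lost in this model.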

Blurring of Distinction Between ROM and RAM
[Figure: Spectrum between ROM and RAM, with EEPROM, Flash, and NVRAM in between]
• We said that
– RAM is readable and writable
– ROM is read-only
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to
ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lot of choices available to designer, must find best fit with design goals
5.8

Queues (FIFOs)
• A queue is another component sometimes used during RTL design
• Queue: A list written to at the back, and read from the front
– Like a list of waiting restaurant customers
• Writing called a push, reading called a pop
• Because first item written into a queue will be the first item read out, also called a FIFO (first-in-first-out)
[Figure: Queue with back and front labeled; write items to the back of the queue, read (and remove) items from the front of the queue]

Queues
• Queue has addresses, and two pointers: rear and front
– Initially both point to 0
• Push (write)
– Item written to address pointed to by rear
– rear incremented
• Pop (read)
– Item read from address pointed to by front
– front incremented
• If front or rear reaches 7, next (incremented) value should be 0 (for a queue with addresses 0 to 7)
[Figure: 8-entry queue (addresses 7 down to 0) with rear (r) and front (f) pointers, shown initially empty, after pushing A, after pushing B, and after a pop]
Queues
• Treat memory as a circle
– If front or rear reaches 7, next (incremented) value should be 0 rather than 8 (for a queue with addresses 0 to 7)
• Two conditions of interest
– Full queue: no room for more items
• In 8-entry queue, means 8 items present
• No further pushes allowed until a pop occurs
• Causes front=rear
– Empty queue: no items
• No pops allowed until a push occurs
• Causes front=rear
– Both conditions have front=rear
• To detect whether front=rear means full or empty, need state machine that detects if previous operation was push or pop, sets full or empty output signal (respectively)
[Figure: Queue addresses 0 to 7 drawn as a circle, with rear (r) and front (f) pointers]
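The wraparound pointers and the full/empty rule can be sketched in software. This is an illustrative model, not the hardware controller: the class and attribute names are assumptions, and raising an exception on a bad push or pop is a modeling choice (a hardware queue would simply be in an unknown state).

```python
class Queue:
    def __init__(self, size=8):
        self.size = size
        self.mem = [None] * size
        self.rear = 0                    # next address to write
        self.front = 0                   # next address to read
        self.last_op_was_push = False    # queue starts out empty

    @property
    def full(self):
        # front == rear right after a push means every entry is occupied.
        return self.front == self.rear and self.last_op_was_push

    @property
    def empty(self):
        # front == rear right after a pop (or at reset) means no items.
        return self.front == self.rear and not self.last_op_was_push

    def push(self, item):
        if self.full:
            raise OverflowError("push on a full queue")
        self.mem[self.rear] = item
        self.rear = (self.rear + 1) % self.size    # 7 wraps around to 0
        self.last_op_was_push = True

    def pop(self):
        if self.empty:
            raise IndexError("pop on an empty queue")
        item = self.mem[self.front]
        self.front = (self.front + 1) % self.size  # 7 wraps around to 0
        self.last_op_was_push = False
        return item


q = Queue()
for item in (9, 5, 8, 5, 7, 2, 3):
    q.push(item)
first = q.pop()      # the first item pushed is the first one popped
q.push(6)            # written at address 7; rear wraps around to 0
q.push(3)            # written at address 0; rear now equals front
```

After the last push, rear and front are equal and the previous operation was a push, so the model reports a full queue.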

Queue Implementation
• Can use register file for item storage
• Implement rear and front using up counters
– rear used as register file’s write address, front as read address
• Simple controller would set control lines for pushes and pops, and also detect full and empty situations
– FSM for controller not shown
[Figure: 8-word 16-bit queue. An 8x16 register file (16-bit wdata and rdata, 3-bit waddr and raddr, wr, rd); two 3-bit up counters, rear and front (each with clr and inc), supply waddr and raddr; a comparator (=) produces eq for the controller, which has inputs wr, rd, and reset and outputs full and empty.]
Common Uses of a Queue
• Computer keyboard
– Pushes pressed keys onto queue, meanwhile pops and sends to
computer
• Digital video recorder
– Pushes captured frames, meanwhile pops frames, compresses
them, and stores them
• Computer network routers
– Pushes incoming packets onto queue, meanwhile pops packets,
processes destination information, and forwards each packet out
over appropriate port

Queue Usage Example
• Example series of pushes and pops
– Note how rear and front pointers move
– Note that popping doesn’t really remove the data from the queue, but that data is no longer accessible
– Note how rear (and front) wraps around from address 7 to 0
• Note: pushing a full queue is an error
– So is popping an empty queue
[Figure: 8-entry queue (addresses 7 down to 0) after each step]
Initially empty queue: rear and front both at address 0
1. After pushing 9, 5, 8, 5, 7, 2, 3: addresses 0 to 6 hold 9, 5, 8, 5, 7, 2, 3; front at 0, rear at 7
2. After popping: data read is 9; front at 1, rear at 7
3. After pushing 6: 6 stored at address 7; rear wraps around to 0
4. After pushing 3: 3 stored at address 0; rear equals front, queue is full
5. After pushing 4: ERROR! Pushing a full queue results in unknown state.
5.9

Multiple Processors
• Using multiple processors can ease design
– Keeps distinct behaviors separate
– Ex: Laser-based distance measurer with button debounce
• Use two processors
– Ex: Code detector with button press synchronizers (BPS)
• BPS processor for each input, plus CodeDetector processor
[Figure: ButtonDebouncer (Bin from button, Bout) feeding input B of the laser-based distance measurer, which has output L to laser, input S from sensor, and 16-bit output D to display]
[Figure: Code detector with a BPS on each input (Start: si to s; Red: ri to r; Green: gi to g; Blue: bi to b; ai to a) and output u to the door lock]

Interfacing Multiple Processors
• Use signal, register, or other component outside processors
– Known as global
• Common methods use global...
– control signal, data signal, register, register file, queue
• Typically all multiple processors and clocked globals use
same clock
– Synchronized

Ex: Temperature Statistics with Multiple Processors
• 16-bit unsigned input T from temperature sensor, 16-bit output A. Sample T every 1 second. Compute output A every minute, should equal average of most recent 64 samples.
• Single HLSM: Complicated
• Instead, two HLSMs (and hence two processors) and shared register file
– Tsample HLSM: Store T into successive RF address, once per sec.
– Avg HLSM: Compute and output average of all 64 RF words, once per min.
– Note that each uses distinct timer
• Keeping the sampling and averaging behaviors separate leads to simple design
[Figure: TempStats system. Tsample (input T) drives W_d, W_a, and W_e of register file TRF, RF[64](16); Avg drives R_a and R_e, reads R_d, and outputs A.]
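The split into a sampling behavior and an averaging behavior around a shared register file can be sketched as two methods over one list. This is a software sketch under assumptions: the class and method names are hypothetical, the write address is assumed to wrap from 63 back to 0, and the two timers are replaced by explicit calls.

```python
class TempStats:
    def __init__(self):
        self.trf = [0] * 64     # shared register file, 64 words of 16 bits
        self.w_a = 0            # Tsample's write address

    def tsample_tick(self, t):
        """Tsample behavior: store T at the next address (once per second)."""
        self.trf[self.w_a] = t & 0xFFFF          # keep 16 unsigned bits
        self.w_a = (self.w_a + 1) % 64           # assumed wraparound at 63

    def avg_tick(self):
        """Avg behavior: average all 64 words (once per minute)."""
        total = 0
        for r_a in range(64):                    # read each RF word in turn
            total += self.trf[r_a]
        return total >> 6                        # divide by 64 with a shift


stats = TempStats()
for _ in range(64):
    stats.tsample_tick(100)
average = stats.avg_tick()
```

Neither method needs to know how the other is timed; they only share the register file, which is the point of keeping the behaviors separate.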

Ex: Digital Camera with Mult. Processors and Queue
• Read and Compress processors (Ch 1)
– Compress may take longer, depends on picture
– Use queue, read can push additional pics (up to 8)
– Likewise, use queue between Compress and Store

[Figure: Image sensor → Read circuit → Queue [8](8) → Compress circuit → Queue [8](8) → Store circuit → Memory. Each queue has 8-bit wdata, wr, and full on its write side and 8-bit rdata, rd, and empty on its read side.]

5.10

Hierarchy – A Key Design Concept
• Hierarchy
– Organization with few items at the top, with each item decomposed into other items
– Common example: Country
• 1 item at top (the country)
• Country item decomposed into state/province items
• Each state/province item decomposed into city items
• Hierarchy helps us manage complexity
– To go from transistors to gates, muxes, decoders, registers, ALUs, controllers, datapaths, memories, queues, etc.
– Imagine trying to comprehend a controller and datapath at the level of gates
[Figure: Map of Country A divided into Provinces 1 to 3 containing Cities A to G; second map showing just top two levels of hierarchy (Country A and Provinces 1 to 3)]

Hierarchy and Abstraction
• Abstraction
– Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
• Ex: 8-bit adder has understandable high-level behavior: adds two 8-bit binary numbers
– Frees designer from having to remember, or even understand, the lower-level details
[Figure: 8-bit adder block with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]

Hierarchy and Composing Larger Components from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simply compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input
• Implements 8x1 mux: 8 data inputs, 3 selects, one output
[Figure: Two 4x1 muxes (top with inputs i0 to i3, bottom with inputs i4 to i7, both selected by s1s0) feeding a 2x1 mux selected by s2]
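Both compositions can be checked with a few small functions. This is a sketch for illustration only: the function names are assumptions, and bits are modeled as the Python integers 0 and 1.

```python
def and3(a, b, c):
    """3-input AND gate."""
    return a & b & c


def and9(bits):
    """9-input AND built from four 3-input AND gates."""
    return and3(and3(*bits[0:3]), and3(*bits[3:6]), and3(*bits[6:9]))


def mux2(i0, i1, s0):
    """2x1 mux."""
    return i1 if s0 else i0


def mux4(i0, i1, i2, i3, s1, s0):
    """4x1 mux; s1 is the high select bit."""
    return (i0, i1, i2, i3)[(s1 << 1) | s0]


def mux8(inputs, s2, s1, s0):
    """8x1 mux built from two 4x1 muxes and one 2x1 mux."""
    top = mux4(*inputs[0:4], s1, s0)       # s1s0 pick an input in each 4x1
    bottom = mux4(*inputs[4:8], s1, s0)
    return mux2(top, bottom, s2)           # s2 picks top or bottom 4x1


data = [10, 11, 12, 13, 14, 15, 16, 17]
picked = mux8(data, 1, 0, 1)               # select lines 101 choose input i5
```

The 9-input AND uses three gates for the groups of three plus one gate to combine them.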

Hierarchy and Composing Larger Components from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy: just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: Four 1024x8 ROMs sharing the 10-bit addr and en inputs; their four 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]
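Widening can be sketched by reading the same address from every ROM and concatenating the bytes. This is a minimal sketch: the function name `make_wide_rom`, the use of lists as ROM contents, and the choice that the first ROM supplies the most significant byte are all assumptions.

```python
def make_wide_rom(roms):
    """Combine 8-bit-wide ROMs into one wider ROM.

    roms[0] is assumed to supply the most significant byte.
    """
    def read(addr, en=1):
        if not en:
            return None                     # shared enable: all ROMs disabled
        word = 0
        for rom in roms:                    # every ROM sees the same address
            word = (word << 8) | rom[addr]  # concatenate the data lines
        return word
    return read


# Four 1024x8 ROMs with made-up contents for illustration.
roms = [[(addr + k) & 0xFF for addr in range(1024)] for k in range(4)]
read32 = make_wide_rom(roms)
word0 = read32(0)
```

The address and enable are passed unchanged to each ROM; only the data outputs are combined.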
Hierarchy and Composing Larger Components from Smaller Versions
• Creating memory with more words
– Put memories on top of one another until the number of desired words is achieved
– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory
• a10 just chooses which memory to access
• To create memory with more words and wider words, can first compose to enough words, then widen.
[Figure: 2048x8 ROM built from two 1024x8 ROMs. Of the 11 address bits, a9..a0 go to the addr inputs of both ROMs, and a10 drives a 1x2 decoder (enabled by en) whose outputs d0 and d1 drive the en inputs of the two ROMs; the 8-bit data outputs are joined.]
Address ranges (a10 a9 ... a0):
00000000000 to 01111111111: first 1024x8 ROM (a10 = 0)
10000000000 to 11111111111: second 1024x8 ROM (a10 = 1)
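Adding words can be sketched with a 1x2 decoder on the top address bit. This is an illustrative model: the function names are assumptions, and lists stand in for the two 1024x8 ROMs.

```python
def decoder_1x2(i0, e):
    """1x2 decoder with enable; returns (d0, d1)."""
    if not e:
        return (0, 0)
    return (0, 1) if i0 else (1, 0)


def read_2048x8(rom_low, rom_high, addr, en=1):
    """Read an 11-bit address from two stacked 1024x8 ROMs."""
    a10 = (addr >> 10) & 1            # highest-order address bit
    low_addr = addr & 0x3FF           # a9..a0 go to both ROMs
    d0, d1 = decoder_1x2(a10, en)     # decoder outputs act as ROM enables
    if d0:
        return rom_low[low_addr]
    if d1:
        return rom_high[low_addr]
    return None                       # neither ROM enabled


# Made-up contents for illustration.
rom_low = [0x11] * 1024
rom_high = [0x22] * 1024
rom_low[5] = 0xAA
rom_high[5] = 0xBB
from_low = read_2048x8(rom_low, rom_high, 5)
from_high = read_2048x8(rom_low, rom_high, 1024 + 5)
```

Address 5 and address 1029 share the same low ten bits, so only a10 decides which ROM answers.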
Chapter Summary
– Modern digital design involves creating processor-level components
– High-level state machines
– RTL design process
• 1. Capture behavior: Use HLSM
• 2. Convert to circuit
– A. Create datapath B. Connect DP to controller C. Derive controller FSM
– More RTL design
• More components, arrays, timers, control vs. data dominated
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Memory components (RAM, ROM)
– Queues
– Multiple processors
– Hierarchy: A key concept used throughout Chapters 2-5

