Bluespec-1: Design Affects Everything
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Based on material prepared by Bluespec Inc, January 2005

February 22, 2005 http://csg.csail.mit.edu/6.884/ L07-1

Chip costs are exploding because of design complexity

SoC failures costing time/spins
Issues Found on First Spin ICs/ASICs:
  Functional Logic Error    43%
  Analog Tuning Issue       20%
  Signal Integrity Issue    17%
  Clock Scheme Error        14%
  Reliability Issue         12%
  Mixed Signal Problem      11%
  Too Much Power            11%
  Has Path(s) Too Slow      10%
  Has Path(s) Too Fast      10%
  IR Drop Issues             7%
  Firmware Error             4%
  Other                      3%
Source: Aart de Geus, CEO of Synopsys. Based on a survey of 2000 users by Synopsys.

Design and verification dominate escalating project costs
[Figure: IC Design Costs. Cost ($M, 0 to 30) versus Silicon Feature Dimension (0.18µm, 0.13µm, 90nm), broken down by Architecture, Verification, Physical, Validation, Prototype. Source: IBM/IBS, Inc.]

Common quotes
“Design is not a problem;
design is easy”
“Verification is a problem”
“Timing closure is a problem”
“Physical design is a problem”

Mindset: almost complete reliance on post-design verification for quality

Through the early 1980s: The U.S. auto industry
Sought quality solely through post-build inspection
Planned for defects and rework
[Figure: Make, Inspect, Rework flow, with each stage labeled "Defect"]
and U.S. quality was…

… less than world class

Image of Chevrolet Chevette removed due to copyright restrictions.

Adding quality inspectors (“verification engineers”) and giving them better tools was not the solution
The Japanese auto industry showed the way
  - “Zero defect” manufacturing

New mind set: Design affects everything!
A good design methodology
  - Can keep up with changing specs
  - Permits architectural exploration
  - Facilitates verification and debugging
  - Eases changes for timing closure
  - Eases changes for physical design
  - Promotes reuse
⇒ It is essential to Design for Correctness

Why is traditional RTL too low-level?
Examples with dynamic and static constraints

Design must follow many rules (“micro-protocols”)
Consider a FIFO (a queue)
  enq: put an item into the queue
  deq: remove an item from the queue
  first: examine item at head of queue
In the hardware, there are a number of requirements for correct use
[Figure: FIFO ports. enq: DATA_IN (n bits), ENAB, RDY (not full). deq: ENAB, RDY (not empty). first: DATA_OUT (n bits), RDY (not empty).]

Requirements for correct use
Requirement 1: deq ENAB only when RDY (not empty)
Requirement 2: first DATA_OUT only when RDY (not empty)
Requirement 3: enq ENAB simultaneously with DATA_IN
Requirement 4: enq ENAB only when RDY (not full)
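
The four requirements can be modeled in software as guards on each operation. The following is a minimal C sketch, not the hardware from the slide: the `Fifo` type, `FIFO_DEPTH`, and every function name are hypothetical, and each `assert` stands in for the obligation that the client only asserts ENAB while the matching RDY is true.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_DEPTH 4  /* hypothetical depth, chosen only for illustration */

typedef struct {
    int data[FIFO_DEPTH];
    int head;   /* index of the oldest item */
    int count;  /* number of items currently stored */
} Fifo;

void fifo_init(Fifo *f) { f->head = 0; f->count = 0; }

/* RDY signals: what a client must check before asserting ENAB. */
bool enq_rdy(const Fifo *f)   { return f->count < FIFO_DEPTH; } /* not full  */
bool deq_rdy(const Fifo *f)   { return f->count > 0; }          /* not empty */
bool first_rdy(const Fifo *f) { return f->count > 0; }          /* not empty */

/* Requirements 3 and 4: the data arrives together with the enq request,
   and the request is legal only when the FIFO is not full. */
void fifo_enq(Fifo *f, int data_in)
{
    assert(enq_rdy(f));
    f->data[(f->head + f->count) % FIFO_DEPTH] = data_in;
    f->count++;
}

/* Requirement 2: DATA_OUT is meaningful only when the FIFO is not empty. */
int fifo_first(const Fifo *f)
{
    assert(first_rdy(f));
    return f->data[f->head];
}

/* Requirement 1: deq is legal only when the FIFO is not empty. */
void fifo_deq(Fifo *f)
{
    assert(deq_rdy(f));
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
}
```

In this model a violated requirement aborts the program. In RTL nothing stops the client, which is why every use of the FIFO has to be verified separately.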

[Figure: clients connected to the FIFO ports enq (DATA_IN, ENAB, RDY = not full), deq (ENAB, RDY = not empty), and first (DATA_OUT, RDY = not empty)]

Correct use of a shared FIFO
• Needs a multiplexer in front of each input
• Needs proper control logic for the multiplexer
[Figure: client 1 and client 2 sharing one FIFO (enq, deq, first ports), with control logic in front of its inputs]

Concurrent uses of a FIFO
enq ENAB ok if deq ENAB, even if not RDY ??

[Figure: client 1 driving enq and client 2 driving deq and first on the same FIFO]

Example from a commercially available FIFO IP component
[Figure: FIFO block with ports data_in, data_out, push_req_n, pop_req_n, clk, rstn, full, empty]
These constraints are taken from several paragraphs of documentation, spread over many pages, interspersed with other text

A High-Bandwidth Credit-based Communication Interface
Credit based interface:
[Figure: Module A (I/F Control, Credit = C1) connected to Module B (I/F Control, Credit = C2). B to A: "You can have X credits". A: "I can send up to X items".]

Static correctness constraints:
  - Data types agree on both ends?
  - Credit values agree (C1 == C2)?
  - Credit values automatically sized to comm latency?
  - B’s buffer properly sized (C2)?
  - B’s buffer pointers properly sized (log(C2))?
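
Two of these static constraints can be made concrete in a few lines of C. This is a sketch under stated assumptions, not the interface from the slide: the values of `C1` and `C2`, the `Sender` and `Receiver` types, and all function names are hypothetical, the pointer width is taken to be the ceiling of log2(C2), and communication latency is not modeled. The `_Static_assert` (C11) shows what it looks like when the compiler, rather than a testbench, checks that the credit values agree.

```c
#include <assert.h>
#include <stdbool.h>

#define C1 8  /* hypothetical credit count held by sender A */
#define C2 8  /* hypothetical buffer depth in receiver B */

/* Static constraint: both ends must agree on the credit value. */
_Static_assert(C1 == C2, "credit values must agree on both ends");

/* Bits needed for a pointer that indexes `depth` slots: ceil(log2(depth)). */
unsigned pointer_bits(unsigned depth)
{
    unsigned bits = 0;
    while ((1u << bits) < depth)
        bits++;
    return bits;
}

typedef struct { unsigned credits; } Sender;     /* module A */
typedef struct { unsigned occupied; } Receiver;  /* module B */

/* A may send only while it holds a credit; each item consumes one. */
bool send_item(Sender *a, Receiver *b)
{
    if (a->credits == 0)
        return false;
    a->credits--;
    b->occupied++;
    assert(b->occupied <= C2);  /* B's buffer can never overflow */
    return true;
}

/* B returns a credit to A when it frees a buffer slot. */
void drain_item(Sender *a, Receiver *b)
{
    assert(b->occupied > 0);
    b->occupied--;
    a->credits++;
}
```

Changing `C2` without changing `C1` makes this file fail to compile, which is the kind of check the slide asks for.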


Why is Traditional RTL low-level?
Hardware for dynamic constraints must be designed explicitly
Design assumptions must be explicitly verified
Design assumptions must be explicitly maintained for future changes
If static constraints are not checked by the compiler then they must also be explicitly verified

In Bluespec SystemVerilog (BSV) …
Power to express complex static structures and constraints
  - Checked by the compiler
“Micro-protocols” are managed by the compiler
  - The compiler generates the necessary hardware (muxing and control)
  - Micro-protocols need less or no verification
Easier to make changes while preserving correctness
⇒ Smaller, simpler, clearer, more correct code

Bluespec SystemVerilog (BSV)
Bluespec SystemVerilog:
  - High-level description of FSMs: Rules, Interface Methods
  - Static elaboration, verification: Types, Procedures
SystemVerilog:
  - Structure: Modules, interfaces, types
  - HW semantics: Cooperating FSMs + Assertions
  - Low-level description of FSMs: Processes, cycle counting, explicit management of shared resources

Bluespec Tool flow
[Figure: tool flow. Bluespec SystemVerilog source feeds the Bluespec Compiler, which emits C (for the cycle-accurate Bluespec C sim) and Verilog 95 RTL (for Verilog sim and RTL synthesis to gates). Also shown: Blueview, VCD output, Debussy visualization. Legend: files, Bluespec tools, 3rd party tools.]

Bluespec: State and Rules organized into modules
[Figure: modules connected through interfaces]
All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of atomic actions on the state:
  Rule: condition ⇒ action
Rules can manipulate state in other modules only via their interfaces.

Programming with rules: A simple example
Euclid’s algorithm for computing the Greatest Common Divisor (GCD):
  15   6
   9   6   subtract
   3   6   subtract
   6   3   swap
   3   3   subtract
   0   3   subtract    answer: 3
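
The trace can be reproduced with a short C function. This is a sketch that follows the trace as printed (subtract while the first value is at least the second, otherwise swap); the function name is hypothetical and both inputs are assumed to be positive.

```c
#include <assert.h>

/* Subtractive form of Euclid's algorithm, matching the trace:
   subtract while x >= y, swap when x < y, stop when x reaches 0. */
int gcd_subtractive(int x, int y)
{
    /* assumes x > 0 and y > 0; with y == 0 the loop would not terminate */
    while (x != 0) {
        if (x >= y) {
            x = x - y;      /* subtract */
        } else {
            int t = x;      /* swap */
            x = y;
            y = t;
        }
    }
    return y;
}
```

Starting from (15, 6) the loop visits (9, 6), (3, 6), (6, 3), (3, 3), and (0, 3), and returns 3.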


GCD in BSV
module mkGCD (ArithIO#(int));
   // State
   Reg#(int) x <- mkRegU;
   Reg#(int) y <- mkReg(0);

   // Internal behavior
   rule swap ((x > y) && (y != 0));
      x <= y; y <= x;
   endrule
   rule subtract ((x <= y) && (y != 0));
      y <= y - x;
   endrule

   // External interface
   method Action start(int a, int b) if (y==0);
      x <= a; y <= b;
   endmethod
   method int result() if (y==0);
      return x;
   endmethod
endmodule

GCD Hardware Module
[Figure: GCD module. start: two inputs of type t, enab, rdy (y == 0). result: output of type t, rdy (y == 0). The rdy signals are the implicit conditions.]
interface ArithIO #(type t);
   method Action start (t a, t b);
   method t result();
endinterface
Many different implementations can provide the same interface:
module mkGCD (ArithIO#(int));

Generated Verilog RTL: GCD


module mkGCD(CLK, RST_N,start__1, start__2, E_start_, ...)
input CLK; ...
output start__rdy; ...
wire [31 : 0] x$get; ...
assign result_ = x$get;
assign _d5 = y$get == 32'd0;
...
assign _d3 = (x$get ^ 32'h80000000) <= (y$get ^ 32'h80000000);
assign C___2 = _d3 && !_d5;
...
assign x$set = E_start_ || P___1;
assign x$set_1 = P___1 ? y$get : start__1;
assign P___2 = _d3 && !_d5;
...
assign y$set_1 =
{32{P___2}} & y$get - x$get | {32{_dt1}} & x$get |
{32{_dt2}} & start__2;
RegUN #(32) i_x(.CLK(CLK), .RST_N(RST_N), .val(x$set_1), ...)
RegN #(32) i_y(.CLK(CLK), .RST_N(RST_N), .init(32'd0), ...)
endmodule
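
The `_d3` assignment XORs both operands with `32'h80000000` before comparing them. Presumably this is how the signed `x <= y` of the BSV source is expressed with an unsigned comparator: flipping the sign bit maps the signed ordering onto the unsigned one. The C sketch below illustrates that identity; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed x <= y computed with an unsigned comparison.
   Flipping bit 31 moves negative values below non-negative ones
   in unsigned order, so the two orderings coincide. */
bool signed_le_via_xor(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x ^ 0x80000000u;
    uint32_t uy = (uint32_t)y ^ 0x80000000u;
    return ux <= uy;
}
```

For example, -1 becomes `0x7FFFFFFF` and 0 becomes `0x80000000`, so the unsigned comparison agrees that -1 <= 0.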

Exploring microarchitectures
IP Lookup Module


IP Lookup block in a router
[Figure: router block diagram with labels Line Card (LC), Arbitration, Control Processor, Switch, Packet Processor, SRAM (lookup table), IP Lookup, Queue Manager, Exit functions]
A packet is routed based on the “Longest Prefix Match” (LPM) of its IP address with entries in a routing table
Line rate and the order of arrival must be maintained
line rate ⇒ 15Mpps for 10GE
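
The 15Mpps figure presumably corresponds to back-to-back minimum-size Ethernet frames on a 10 Gb/s link. The C sketch below works through that arithmetic using the standard Ethernet overheads (64-byte minimum frame, 8 bytes of preamble and start delimiter, 12-byte inter-frame gap); the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Standard Ethernet per-packet sizes, in bytes. */
enum {
    MIN_FRAME_BYTES = 64,  /* minimum frame, including FCS */
    PREAMBLE_BYTES  = 8,   /* preamble plus start-of-frame delimiter */
    IFG_BYTES       = 12   /* minimum inter-frame gap */
};

/* Worst-case packets per second on a link running at bits_per_sec. */
uint64_t max_packets_per_sec(uint64_t bits_per_sec)
{
    /* 8 * (64 + 8 + 12) = 672 bits on the wire per minimum-size packet */
    uint64_t bits_per_packet = 8u * (MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES);
    return bits_per_sec / bits_per_packet;
}
```

For 10 Gb/s this gives 14,880,952 packets per second, or about 67 ns per lookup.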

Sparse tree representation
Routing table:
  Prefix          Result
  7.14.*.*        A
  7.14.7.3        B
  10.18.200.*     C
  10.18.200.5     D
  5.*.*.*         E
  *               F
[Figure: sparse tree with one level per address byte, each node indexed 0 to 255; entries not covered by a longer prefix hold the default result F]
Lookup examples:
  IP address      Result   M Ref (memory references)
  7.13.7.3        F        2
  10.18.201.5     F        3
  7.14.7.2        A        4
  5.13.7.2        E        1
  10.18.200.7     C        4
Real-world lookup algorithms are more complex but all make a sequence of dependent memory references.

SW (“C”) version of LPM


int
lpm (IPA ipa) /* 3 memory lookups */
{
    int p;

    p = RAM[(ipa >> 16) & 0xFFFF];     /* Level 1: bits 31:16 (16 bits) */
    if (isLeaf(p)) return p;

    p = RAM[p + ((ipa >> 8) & 0xFF)];  /* Level 2: bits 15:8 (8 bits) */
    if (isLeaf(p)) return p;

    p = RAM[p + (ipa & 0xFF)];         /* Level 3: bits 7:0 (8 bits) */
    return p;                          /* must be a leaf */
}

How to implement LPM in HW?


Not obvious from C code!
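
To make the software version concrete, here is a self-contained C sketch of the same 16/8/8 lookup over a very small table. Everything beyond the three-level structure is an assumption made for illustration: the leaf encoding (`LEAF_FLAG`), the memory layout, the helpers `leaf`, `build_table`, and `ip`, and the choice of only two prefixes (7.14.*.* gives A, 7.14.7.3 gives B, default F).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LEAF_FLAG 0x80000000u  /* hypothetical encoding: top bit marks a leaf */
#define L1_SIZE   65536u       /* level 1 is indexed by the top 16 bits */
#define BLOCK     256u         /* levels 2 and 3 use 256-entry blocks */

static uint32_t RAM[L1_SIZE + 2 * BLOCK];

bool     isLeaf(uint32_t p) { return (p & LEAF_FLAG) != 0; }
uint32_t leaf(char result)  { return LEAF_FLAG | (uint32_t)result; }

/* Fill the table for 7.14.*.* -> A and 7.14.7.3 -> B, with default F. */
void build_table(void)
{
    uint32_t l2 = L1_SIZE;          /* block covering 7.14.x.* */
    uint32_t l3 = L1_SIZE + BLOCK;  /* block covering 7.14.7.x */
    uint32_t i;

    for (i = 0; i < L1_SIZE; i++) RAM[i] = leaf('F');
    for (i = 0; i < BLOCK; i++)   RAM[l2 + i] = leaf('A');
    for (i = 0; i < BLOCK; i++)   RAM[l3 + i] = leaf('A');

    RAM[(7u << 8) | 14u] = l2;      /* level 1 entry for 7.14 points to level 2 */
    RAM[l2 + 7u] = l3;              /* level 2 entry 7 points to level 3 */
    RAM[l3 + 3u] = leaf('B');       /* exact match 7.14.7.3 */
}

/* Up to three dependent memory reads, as in the slide's version. */
char lpm(uint32_t ipa)
{
    uint32_t p = RAM[ipa >> 16];
    if (isLeaf(p)) return (char)(p & 0xFFu);
    p = RAM[p + ((ipa >> 8) & 0xFFu)];
    if (isLeaf(p)) return (char)(p & 0xFFu);
    p = RAM[p + (ipa & 0xFFu)];
    return (char)(p & 0xFFu);       /* must be a leaf */
}

/* Pack four address bytes into a 32-bit IP address. */
uint32_t ip(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (a << 24) | (b << 16) | (c << 8) | d;
}
```

After `build_table()`, `lpm(ip(7, 14, 7, 3))` returns `'B'` after three reads, while `lpm(ip(7, 13, 7, 3))` returns `'F'` after one. Each read depends on the previous one, and that dependency is what a hardware pipeline has to schedule.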

Longest Prefix Match for IP lookup: 3 possible implementation architectures
  Rigid pipeline: inefficient memory usage but simple design (designer’s ranking: 1)
  Linear pipeline: efficient memory usage through memory port replicator (designer’s ranking: 2)
  Circular pipeline: efficient memory with most complex control (designer’s ranking: 3)
Which is “best”?
Arvind, Nikhil, Rosenband & Dave, ICCAD 2004

Synthesis results
LPM versions    Code size (lines)   Best Area (gates)    Best Speed (ns)     Mem. util. (random workload)
Static V        220                 2271                 3.56                63.5%
Static BSV      179                 2391 (5% larger)     3.32 (7% faster)    63.5%
Linear V        410                 14759                4.7                 99.9%
Linear BSV      168                 15910 (8% larger)    4.7 (same)          99.9%
Circular V      364                 8103                 3.62                99.9%
Circular BSV    257                 8170 (1% larger)     3.67 (2% slower)    99.9%

V = Verilog; BSV = Bluespec System Verilog
Synthesis: TSMC 0.18 µm lib
- Bluespec results can match carefully coded Verilog
- Micro-architecture has a dramatic impact on performance
- Architecture differences are much more important than language differences in determining QoR

Implementations of the same arch - Static pipeline: Two designers, two results
LPM versions             Best Area (gates)   Best Speed (ns)
Static V (Replicated)    8898                3.60
Static V (BEST)          2271                3.56
[Figure: block diagrams of the two designs, "Replicated" and "BEST", built from IP addr and result ports, MUX and MUX / De-MUX blocks, FSMs, a Counter, and RAM; annotations read "Each packet is processed by one FSM" and "Shared"]
Reorder Buffer
Verification-centric design


Example from CPU design
Speculative, out-of-order
Many, many concurrent activities
[Figure: processor pipeline connected by FIFOs: Fetch, Decode, Re-Order Buffer (ROB), Register File, ALU Unit, MEM Unit, Branch, Instruction Memory, Data Memory]
Nirav Dave, MEMOCODE, 2004

ROB actions
State legend: Empty (E), Waiting (W), Dispatched (Di), Killed (K), Done (Do)
Re-Order Buffer:
        State   Instruction   Operand 1   Operand 2   Result
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
  Head  W       Instr A       V 0         V 0         -
        W       Instr B       V 0         V 0         -
        W       Instr C       V 0         V 0         -
        W       Instr D       V 0         V 0         -
  Tail  E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
        E       Instr -       V -         V -         -
Actions on the ROB:
  - Decode Unit: insert an instr into ROB
  - Register File: get operands for instr; writeback results
  - ALU Unit(s): get a ready ALU instr; put ALU instr results in ROB
  - MEM Unit(s): get a ready MEM instr; put MEM instr results in ROB
  - Resolve branches

But, what about all the potential race conditions?
Reading from the register file at the same time a separate instruction is writing back to the same location
  - Which value to read?
An instruction is being inserted into the ROB simultaneously with a dependent upstream instruction’s result coming back from an ALU
  - Put a tag or the value in the operand slot?
An instruction is being inserted into the ROB simultaneously with a branch mis-prediction
  - Must kill the mis-predicted instructions and restore a “consistent state” across many modules

Rule Atomicity
Lets you code each operation in isolation
Eliminates the nightmare of race conditions (“inconsistent state”) under such complex concurrency conditions
All behaviors are explainable as a sequence of atomic actions on the state
Insert Instr in ROB
  • Put instruction in first available slot
  • Increment tail pointer
  • Get source operands - RF <or> prev instr
Dispatch Instr
  • Mark instruction dispatched
  • Forward to appropriate unit
Write Back Results to ROB
  • Write back results to instr result
  • Write back to all waiting tags
  • Set to done
Commit Instr
  • Write results to register file (or allow memory write for store)
  • Set to Empty
  • Increment head pointer
Branch Resolution
  • …
  • …
  • …
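
The idea that every behavior is a sequence of atomic actions can be imitated in software by writing each operation as a function with a guard that either runs to completion or does nothing. The C sketch below is a heavily simplified model under stated assumptions: the `Rob` layout, `ROB_SIZE`, and the function names are hypothetical, and operand forwarding to waiting tags, the Killed state, and branch resolution are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 16  /* hypothetical number of entries */

typedef enum { EMPTY, WAITING, DISPATCHED, DONE } State;

typedef struct { State state; int instr; int result; } Entry;

typedef struct {
    Entry e[ROB_SIZE];
    int head;  /* oldest instruction */
    int tail;  /* first available slot */
} Rob;

void rob_init(Rob *r)
{
    for (int i = 0; i < ROB_SIZE; i++) r->e[i].state = EMPTY;
    r->head = 0;
    r->tail = 0;
}

/* Insert Instr in ROB. Guard: the tail slot is empty. */
bool rob_insert(Rob *r, int instr)
{
    if (r->e[r->tail].state != EMPTY) return false;
    r->e[r->tail].state = WAITING;
    r->e[r->tail].instr = instr;
    r->tail = (r->tail + 1) % ROB_SIZE;
    return true;
}

/* Dispatch Instr. Guard: the slot is waiting. */
bool rob_dispatch(Rob *r, int slot)
{
    if (r->e[slot].state != WAITING) return false;
    r->e[slot].state = DISPATCHED;
    return true;
}

/* Write Back Results to ROB. Guard: the slot was dispatched. */
bool rob_writeback(Rob *r, int slot, int result)
{
    if (r->e[slot].state != DISPATCHED) return false;
    r->e[slot].result = result;
    r->e[slot].state = DONE;
    return true;
}

/* Commit Instr. Guard: the head entry is done, so commits stay in order. */
bool rob_commit(Rob *r, int *result)
{
    if (r->e[r->head].state != DONE) return false;
    *result = r->e[r->head].result;
    r->e[r->head].state = EMPTY;
    r->head = (r->head + 1) % ROB_SIZE;
    return true;
}
```

Because each function checks its guard and then updates the state in one step, any interleaving of calls leaves the buffer consistent. For example, a younger instruction may finish first, but `rob_commit` refuses to retire it until the head entry is done.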


Synthesizable model of IA64
CMU-Intel collaboration
Develop an Itanium µarch model that is
  - concise and malleable
  - executable and synthesizable
FPGA Prototyping
  - XC2V6000 FPGA interfaced to P6 memory bus
  - Executes binaries natively against a real PC environment (i.e., memory & I/O devices)
An evaluation vehicle for:
  - Functionality and performance: a fast µarchitecture emulator to run real software
  - Implementation: a synthesizable description to assess feasibility, design complexity and implementation cost
Roland Wunderlich & James Hoe @ CMU
Steve Hynal (SCL) & Shih-Lien Liu (MRL)

IA64 in Bluespec (Wunderlich & Hoe)
IPF Microarchitecture Model
[Figure: Fetch, Decode, Disperse feeding a Branch pipe, three Integer pipes, and a Memory pipe (Stack Read, Execute, Memory, Write stages), with Pipe. Control, Bypass, Branch Pred., Stack Register Set, Instr. Cache, Data Cache, Unified L2, FSB Control]
Platform Capabilities
  - High speed execution of the Bluespec model, runs at 100 MHz, 4 orders of magnitude faster than ModelSim
  - Full access to the FSB, allowing 800 MB/s cache line reads and writes, plus a control channel to the Pentium III processor via mapped I/O
  - Large FPGA resources, the current design occupies less than 30% of the FPGA resources
The model was developed in a few months by one student!
