
Chapter 1

The document outlines a syllabus for a course on Hardware-Software Co-Design, covering topics such as the nature of hardware and software, data flow modeling, and system-on-chip concepts. It emphasizes the concurrent development of hardware and software to achieve performance goals and includes practical examples using C and assembly programming. Learning resources include textbooks and references on hardware/software co-design principles and practices.

Uploaded by

Sai Ranga

Hardware-Software Co-Design

Detailed Syllabus:
The Nature of Hardware and Software: Introducing Hardware/Software Co-design, The Quest
for Energy Efficiency, The Driving Factors in Hardware/Software Co-design, The Dualism of
Hardware Design and Software Design.

Data Flow Modeling and Transformation: Introducing Data Flow Graphs, Analyzing
Synchronous Data Flow Graphs, Control Flow Modeling and the Limitations of Data Flow,
Transformations.

Data Flow Implementation in Software and Hardware: Software Implementation of Data
Flow, Hardware Implementation of Data Flow, Hardware/Software Implementation of Data
Flow.

Analysis of Control Flow and Data Flow: Data and Control Edges of a C Program, Implementing
Data and Control Edges, Construction of the Control Flow Graph and Data Flow Graph.

Finite State Machine with Datapath: Cycle-Based Bit-Parallel Hardware, Hardware Modules,
Finite State Machines with Datapath, FSMD Design Example: A Median Processor.

System on Chip: The System-on-Chip Concept, Four Design Principles in SoC Architecture, SoC
Modeling in GEZEL. Applications: Trivium Crypto-Coprocessor, CORDIC Co-Processor.


Learning Resources:

Text Books:
1. Patrick Schaumont, A Practical Introduction to Hardware/Software Co-design,
Springer, 2010.
2. Ralf Niemann, Hardware/Software Co-Design for Data Flow Dominated
Embedded Systems, Springer, 1998.

Reference Books:

1. Soonhoi Ha, Jürgen Teich, Handbook of Hardware/Software Codesign (Springer Reference),
2017, ISBN: 978-94-017-7267-9.
2. Jørgen Staunstrup, Wayne Wolf, Hardware/Software Co-Design: Principles and Practice,
1997.

The Nature of Hardware and Software
• What is HW/SW Codesign? (Prof. Schaumont’s definition):

Hardware/Software Codesign is the partitioning and design of an application in terms
of fixed and flexible components.

Other definitions
• HW/SW Codesign is a design methodology supporting the concurrent
development of hardware and software (co-specification, co-development and
co-verification) in order to achieve shared functionality and performance goals for a
combined system.

• HW/SW Codesign means meeting system-level objectives by exploiting the
cooperation of hardware and software through their concurrent design.

Giovanni De Micheli and Rajesh Gupta, "Hardware/Software Co-design", Proceedings of
the IEEE, vol. 85, no. 3, March 1997, pp. 349-365.

• Codesign is the concurrent development of hardware and software.


Hardware
• We will model hardware by means of single-clock synchronous digital circuits
created using word-level combinational logic and flip-flops.
• Hardware is realized by word-level combinational and sequential components,
such as registers, MUXes, adders and multipliers.

[Figure: word-level circuit with a register (d, q, clk) and an adder, connected by 3-bit signals.]

• Cycle-based hardware modeling is often called register-transfer-level (RTL)
modeling because the behavior of a circuit can be thought of as a sequence
of transfers between registers, with logical and arithmetic operations
performed on the signals during the transfers.

• Bear in mind that this is a very simplistic treatment of actual hardware. We ignore
advanced circuit styles including asynchronous hardware, dynamic logic,
multi-phase clocked hardware, etc.

• The cycle-based model is limited because it does not model glitches, race
conditions or events that occur within clock cycles.

• However, it provides a convenient abstraction for a designer who is mapping a
behavior, e.g., an algorithm, into a set of discrete steps.
Software:
 Hardware/software codesign deals with hardware/software interfaces.
 The low-level construction details of software are important, because they can
directly affect the performance and implementation cost of the interface.
 Hence, the course will consider important software implementation aspects such as
the various sections of memory (global, stack, heap), the different kinds of memory
(registers, caches, RAM, and ROM).
 We will model software as single-thread sequential programs, written in C or
assembly.
 Most of the discussions in this topic will be processor-independent, and they will
assume 32-bit architectures (ARM, Microblaze) as well as 8-bit architectures (8051,
Picoblaze).
 The choice for single-thread sequential C is simply because it matches so well to the
actual execution model of a typical microprocessor.
 For example, the sequential execution of C programs corresponds to the sequential
instruction fetch-and-execute cycle of microprocessors.
Software: We assume software refers to a single-thread sequential program written in C
or assembly.

Programs will be shown as follows (C example):

int max;                      /* global: holds the largest element found */

int findmax(int a[10])
{
    unsigned i;
    max = a[0];               /* start with the first element */
    for (i = 1; i < 10; i++)
        if (a[i] > max) max = a[i];
    return max;               /* also return the result to the caller */
}


ARM assembly example

start
                LDR  R0, =array         ; Load the address of the array into R0
                LDR  R1, =array_length  ; Load the length of the array into R1
                LDR  R2, [R0], #4       ; Load the first element into R2 (current max)
                SUBS R1, R1, #1         ; One element consumed: decrement the counter
loop:           CMP  R1, #0             ; Check if we have processed all elements
                BEQ  done               ; If R1 is 0, we're done
                LDR  R3, [R0], #4       ; Load the next element of the array into R3
                CMP  R3, R2             ; Compare the next with the current max (R2)
                BLE  continue_loop      ; If R3 <= R2, skip the update of max
                MOV  R2, R3             ; If R3 > R2, update max value in R2
continue_loop:  SUBS R1, R1, #1         ; Decrement the counter
                BNE  loop               ; If there are more elements, continue the loop
done:           END

• The variables of C are stored in a single, shared-memory space, corresponding to
the memory attached to the microprocessor.
• There is a close connection between the storage concepts of a microprocessor
(registers, stack) and the storage types supported in C (register int, local variables).
• Furthermore, common datatypes in C (char, int) directly map onto microprocessor
storage units (byte, word).
• A detailed understanding of C execution is closely related to a detailed
understanding of the microprocessor activity at a lower abstraction level.
• Of course, there are many forms of software that do not fit the model of a
single-thread sequential C program.
• Multi-threaded software, for example, creates the illusion of concurrency and lets
users execute multiple programs at once.
• Other forms of software, such as object-oriented software and functional
programming, substitute the simple machine model of the microprocessor with a
more sophisticated one.

• Let’s compare a C program and an assembly program.
• An ideal starting point when matching a C program to an assembly program is to
look for similar structures:
  • loops in C will be reflected as branches in assembly;
  • if-then-else statements in C will be reflected as conditional branches in assembly.
• Even if you’re unfamiliar with the assembly of a particular microprocessor, you can
often derive such structures easily.

Hardware and Software

• The objective of this course is to discuss the combination of hardware design and
software design in all its forms.
• Hardware as well as software can be modeled using RTL programs and C programs,
respectively.
• The term model merely indicates they are not the actual implementation, but only a
representation of it.
• An RTL program is a model of a network of logic gates; a C program is a model of a
binary image of microprocessor instructions.
• Models are an essential part of the design process.
• They are a formal representation of a designer’s intent, and they are used as input
for simulation tools and implementation tools.
• In hardware/software codesign, we are working with models that are partly written
as C programs, and partly as RTL programs.


 Figure shows an 8051 microcontroller and an attached coprocessor.


 The coprocessor is attached to the 8051 microcontroller through two 8-bit
ports P0 and P1.
 A C program executes on the 8051 microcontroller, and this program contains
instructions to write data to these two ports.
 When a given, predefined value appears on port P0, the coprocessor will
make a copy of the value present on port P1 into an internal register.

[Listing: C Program]

[Listing: HDL program]

• This very simple design can be addressed using hardware/software codesign;
it includes the design of a hardware model and the design of a C program.

• The hardware model contains the 8051 processor, the coprocessor, and the connections
between them.

• During execution, the 8051 processor will execute a software program written in C.

• The listings give the C program and the RTL hardware model for this design; the
hardware model is written in the GEZEL language.

• The C driver sends three values to port P1.

• Each time, it also cycles the value on port P0 between ins_hello and ins_idle, which are
encoded as value 1 and 0, respectively.

• The hardware model includes both the microcontroller and the coprocessor.

• The coprocessor is on lines 1–18.

• This particular hardware model is a combination of a finite state machine (lines 10–18) and a
datapath (lines 1–8).

• This FSMD is quite easy to understand.

• The datapath contains two instructions: decode and hello.

• The FSM controller selects, each clock cycle, which of those instructions to execute.

• For example, lines 14–15 hold the control statement for state s1.

• This means: when the value of insreg is 1 and the FSM controller current state is s1, the
datapath will execute instructions hello and decode, and the FSM controller next-state is s2.

• When the value of insreg is 0, the datapath will execute only instruction decode and
the FSM controller next-state is s1.

• The overall coprocessor behavior is like this: when the ins input changes from 0 to 1, then the
din input will be printed in the next clock cycle.

• The 8051 microcontroller is captured with three ipblocks (GEZEL library
modules), on lines 20–37.
• The first ipblock is an i8051system.
• It represents the 8051 microcontroller core, and it indicates the name
of the compiled C program that will execute on this core (driver.ihx on
line 22).
• The other two ipblocks are two 8051 output ports (i8051systemsource),
one to model port P0, and the other to model port P1.
• Finally, the coprocessor and the 8051 ports are wired together in a
top-level module, shown in lines 39–49.
• We can now simulate the entire model, including hardware and
software, as follows.
• First, the 8051 C program is compiled to a binary executable.
• Next, the GEZEL simulator will combine the hardware model and the
8051 binary executable in a co-simulation.

• The output of the simulation model is shown below.

> sdcc driver.c
> /opt/gezel/bin/gplatform hello.fdl
i8051system: loading executable [driver.ihx]
9662 Hello! You gave me 3/3
9806 Hello! You gave me 2/2
9950 Hello! You gave me 1/1
Total Cycles: 10044

• Notice that the model produces output on cycles 9662, 9806,
and 9950, while the complete C program executes in 10044 cycles.

Defining Hardware/Software Codesign

• Hardware/Software co-design is the design of cooperating hardware components and
software components in a single design effort.
• For example, if you design the architecture of a processor and
at the same time develop a program that could run on that processor,
then you are using hardware/software codesign.
• However, this definition does not tell precisely what software and
hardware mean.
• In the previous example, the software was a C program, and the
hardware was an 8051 microcontroller with a coprocessor.
• In reality, there are many forms of hardware and software, and the
distinction between them can easily become blurred.

• A Field-Programmable Gate Array (FPGA) is a hardware circuit that can
be reconfigured to a user-specified netlist of digital gates.
• The program for an FPGA is a ‘bitstream’, and it is used to configure
the netlist topology.
• Writing ‘software’ for an FPGA really looks like hardware development,
even though it is software.
• A soft-core is a processor implemented in the bitstream of an FPGA.
• However, the soft-core itself can execute a C program as well. Thus,
software can execute on top of other ‘software’.

• A Digital Signal Processor (DSP) is a processor with a specialized
instruction set, optimized for signal-processing applications.
• Writing efficient programs for a DSP requires detailed knowledge of
these specialized instructions.
• Very often, this means writing assembly code, or making use of a
specialized software library.
• Hence, there is a strong connection between the efficiency of the
software and the capabilities of the hardware.

• An Application-Specific Instruction-set Processor (ASIP) is a processor
with a customizable instruction set.
• The hardware of such a processor can be extended, and these
hardware extensions can be encapsulated as new instructions for the
processor.
• Thus, an ASIP designer will develop a hardware implementation for
these custom instructions and subsequently write software that uses
those instructions.

• These examples illustrate a few of the many forms of hardware and
software that designers use today.
• A common characteristic of all these examples is that creating the
‘software’ requires intimate familiarity with the ‘hardware’.
• In addition, hardware includes much more than RTL models: it also
includes specialized processor instructions, the FPGA fabric, multicore
architectures, and more.
• Let us define the application as the overall function of a design,
including hardware as well as software.
• This allows us to define hardware/software codesign as follows:

Hardware/Software codesign is the partitioning and design of an
application in terms of fixed and flexible components.

The Quest for Energy Efficiency
• Choosing between implementing a design in hardware or
implementing it in software is not straightforward.
• From a designer’s point of view, the easiest approach is to
write software, for example in C.
• Software is easy and flexible, software compilers are fast, there are
large amounts of source code available, and all you need to start
development is a nimble personal computer.
• Furthermore, why go through the effort of designing a hardware
architecture when there is already one available (namely, the RISC
processor)?

Relative Performance

• Proponents of hardware implementation will argue that performance
is a big plus of hardware over software.
 Specialized hardware architectures have a larger amount of
parallelism than software architectures.
 We can measure this as follows: Relative performance means the
amount of useful work done per clock cycle.
 Under this metric, highly parallel implementations are at an advantage
because they do many things at the same time.

 Figure illustrates various cryptographic implementations in software
and hardware that have been proposed over the past few years.
 These are all designs proposed for embedded applications, where the
trade-off between hardware and software is crucial.
 As demonstrated by the graph, hardware crypto architectures have,
on the average, a higher relative performance compared to embedded
processors.

 However, relative performance may not be a sufficient argument to
motivate the use of a dedicated hardware implementation.
 Consider for example a specialized Application-Specific Integrated
Circuit (ASIC) versus a high-end (workstation) processor.
 The hardware inside of the ASIC can execute many operations in
parallel, but the processor runs at a much higher clock frequency.
 Furthermore, modern processors are very effective in completing
multiple operations per clock cycle.
 As a result, an optimized software program on top of a high-end
processor may outperform a quick-and-dirty hardware design job on
an ASIC.
 Thus, the absolute performance of software may very well be higher
than the absolute performance of hardware.
 In contrast to relative performance, the absolute performance needs
to take clock frequency into account.

Energy Efficiency

 There is another metric which is independent from clock frequency, and which can
be applied to all architectures.

 That metric is energy-efficiency: the amount of useful work done per unit of energy.

• Take the example of a particular encryption application (AES) on
different target platforms.
• The flexibility of these platforms varies from very high on the left to
very low on the right.
• The platforms include:
  • Java on top of a Java Virtual Machine on top of an embedded processor;
  • C on top of an embedded processor;
  • optimized assembly code on top of a Pentium-III processor;
  • Verilog code on top of a Virtex-II FPGA; and
  • an ASIC implementation using 0.18 micron CMOS standard cells.
• The Y-axis shows the number of gigabits that can be encrypted on each of
these platforms using a single Joule of energy.
• This shows that battery-operated devices would greatly benefit from using less
flexible, dedicated hardware engines.
The Driving Factors in Hardware/Software Codesign

• Energy-efficiency and relative performance are important factors to prefer a
(fixed, parallel) hardware implementation over a (flexible, sequential)
software implementation.
• In the design of modern electronic systems, many tradeoffs have to be
made, often between conflicting objectives.
• Some factors argue for more software while other factors argue for more
hardware.

• There is a large overhead associated with executing software instructions in the
microprocessor implementation:
  • Instruction and operand fetch from memory
  • Complex state machine for control of the datapath, etc.

• Also, specialized hardware architectures are usually more efficient than software
from a relative performance perspective, i.e., amount of useful work done per clock
cycle.

• Flexibility comes with a significant energy cost, one which energy-optimized
applications cannot tolerate.
• Therefore, you will never find a Pentium processor in a cell phone!


Arguments in favor of increasing the amount of hardware (HW):


 Performance: The classic argument in favor of dedicated hardware design
has been increased performance: more work done per clock cycle.
 Increased performance is obtained by reducing the flexibility of an application
 Energy Efficiency: Almost every electronic consumer product today carries
a battery (iPod, PDA, mobile phone, Bluetooth device, ..).
 This makes these products energy-constrained.
 At the same time, such consumer appliances are used for similar
applications as traditional high-performance personal computers.
 In order to become sufficiently energy-efficient, consumer devices are
implemented using a combination of embedded software and dedicated
hardware components.
 Thus, a well-known use of hardware–software co-design is to trade
function specialization and energy-efficiency by moving (part of) the
flexible software of a design into fixed hardware.

Power Densities:
• Further increasing clock speed in modern high-end processors as a performance
enhancer has run out of gas because of thermal limits.
• This has driven a broad and fundamental shift to increase parallelism within
processor architectures.
• However, at this moment, there is no dominant parallel computer architecture
that has been shown to cover all applications. Commercially available systems include:
  • Symmetric multiprocessors with shared memory
  • Traditional processors tightly coupled with FPGAs as accelerator engines
  • Multi-core and many-core architectures such as GPUs

• Nor is there yet any universally adopted parallel programming language, i.e.,
code must be crafted differently depending on the target parallel platform.
• This forces programmers to be architecturally aware of the target platform.

Design Complexity:
• Today, it is common to integrate multiple microprocessors together with all related
peripherals and hardware components on a single chip.
• This approach is called system-on-chip (SoC). Modern SoCs are extremely complex.
• The conception of such a component is impossible without a detailed planning and design
phase.
• Extensive simulations are required to test the design upfront, before committing to a costly
implementation phase.
• Since software bugs are easier to address than hardware bugs, there is a tendency to increase
the amount of software.
Design Cost:

• New chips are very expensive to design. As a result, hardware designers make chips
programmable so that these chips can be reused over multiple products or product
generations.
• The SoC is a good example of this trend.
• However, ‘programmability’ can be found in many different forms other than embedded
processors: reconfigurable systems are based on the same idea of
reuse-through-reprogramming.


Shrinking Design Schedules:


 Each new generation of technology tends to replace the older one more
quickly.
 In addition, each of these new technologies is exponentially more complex
than the previous generation.
 For a design engineer, this means that each new product generation brings
more work that needs to be completed in a shorter period of time.
 Shrinking design schedules require engineering teams to work on multiple
tasks at once: hardware and software are developed concurrently.
 A software development team will start software development as soon as
the characteristics of the hardware platform are established, even before an
actual hardware prototype is available.

Deep-Submicron Effects:
• Designing new hardware from scratch in high-end silicon processes is
difficult due to second-order effects in the implementation.
• For example, each new generation of silicon technology has an increased
variability and a decreased reliability.
• Programmable, flexible technologies make the hardware design process
simpler, more straightforward, and easier to control.
• In addition, programmable technologies can be created to take the effects
of variations into account.

• Finding the correct balance, while weighing in all these factors, is a complex
problem. Instead, we will focus on optimizing metrics related to design cost
and performance.
• In particular, we will consider how adding hardware to a software
implementation increases performance while weighing in the increase in
design cost.

The Hardware–Software Codesign Space
• The preceding discussion makes it apparent that there is a multitude of
alternatives available for mapping an application to an architecture.
• For a given application, there are many different possible solutions.
• The collection of all these implementations is called the hardware–software
codesign space.
• The following figure gives a symbolic representation of this design space and
indicates the main design activities in this design space.

• On top is the application, and a designer will map this application
onto a platform.
• A platform is a collection of programmable components.
• Mapping an application onto a platform means writing software for
that platform, and if needed, customizing the hardware of the
platform.
• The format of the software varies according to the components of the
platform.
• For example, a RISC processor may be programmed in C, while an
FPGA could be programmed starting from an HDL program.

[Figure: example micrographs of target platforms: microprocessor, FPGA, SoC, DSP, microcontroller.]

[Figure: SoC examples: example system-on-chip (SoC) with IP cores.]

[Figure: codesign example: an H.261 video codec, with blocks marked as software processors (uP + code) or hardware processors.]

• A specification is a description of the desired application.
• A new application could be, for example, a novel way of encoding audio
in a more economical format than current encoding methods.
• Often, applications start out as informal ideas, uncoupled from the
actual implementation.
• Designers then write C programs to render their ideas in detail.
• The objective of the design process is to implement the application on
a target platform.
• In hardware–software codesign, we are interested in using
programmable components.
• These include a RISC microprocessor, an FPGA, a DSP, an ASIP, and finally an ASIC.


Each of the above platforms presents a trade-off between flexibility and efficiency.

The wedge shape of the diagram expresses this idea:

[Figure: wedge diagram; flexibility increases in one direction and energy efficiency in the other.]

• Flexibility refers to the versatility of the platform for implementing different
application requirements, and how easy it is to update and fix bugs.

• Efficiency refers to performance (i.e., time efficiency) or to energy efficiency.

Codesign involves the following three activities:
• Platform selection
• Application mapping
• Platform programming

We start with a specification:


 For example, a new application can be a novel way of encoding audio in a
more economical format than current encoding methods

 Designers can optionally write C programs to implement a prototype

 Very often, a specification is just a piece of English text, that leaves many
details of the application undefined

Step 1: Select a target platform

This involves choosing one or more programmable components as discussed
previously, e.g., a RISC micro, an FPGA, etc.

Step 2: Application mapping

The process of mapping an application onto a target platform involves writing C
code and/or VHDL/Verilog.

Examples include:
• RISC: Software is written in C while the hardware is a processor
• FPGAs: Software is written in a hardware description language (HDL)
FPGAs can be configured to implement a soft processor, in which case software
also needs to be written in C
• DSP: A digital signal processor is programmed using a combination of C and
assembly, which is run on a specialized processor architecture
• ASIP: Programming an ASIP is a combination of C and an HDL description
• ASIC: The application is written in an HDL which is then synthesized to a hardwired
netlist and implementation
Note: ASICs are typically non-programmable, i.e., the application and platform
are one and the same


Step 3: Platform programming is the task of mapping SW onto HW

This can be done automatically, e.g., using a C compiler or an HDL synthesis
tool.

• However, many platforms are not just composed of simple components, but
rather require multiple pieces of software, possibly in different programming
languages.

• For example, the platform may consist of a RISC processor and a specialized
hardware coprocessor.
• Here, the software consists of C (for the RISC) as well as dedicated
coprocessor instruction-sequences (for the coprocessor).

• Therefore, the reality of platform programming is more complicated, and
automated tools and compilers are NOT always available.

• Another concept reflected in the wedge figure is the domain-specific platform.

• General-purpose platforms, such as RISC and FPGA, are able to support a
broad range of applications.

• Application-specific platforms, such as the ASIC, are optimized to execute a
single application.

• In the middle is a class called domain-specific platforms that are optimized
to execute a range of applications in a particular application domain.
  • Signal processing, cryptography, and networking are examples of domains.

• Domains can have sub-domains, e.g., voice-signal processing vs. video-signal processing.
  • Optimized platforms can be designed for each of these cases.

• DSPs and ASIPs are two examples of domain-specific platforms.


Difficult questions:
• How does one select a platform for a given specification?
• How can one map an application onto a selected platform?

The first question is harder: seasoned designers choose based on their previous
experience with similar applications.

The second issue is also challenging, but can be addressed in a more systematic
fashion using a design methodology.
A design method is a systematic sequence of steps to convert a specification
into an implementation.

Design methods cover many aspects of application mapping:
• Optimization of memory usage
• Design performance
• Resource usage
• Precision and resolution of data types, etc.
A design method is a canned sequence of design steps. You can learn it in the context of
one design, and next apply this design knowledge in the context of another design.
The Dualism of Hardware Design and Software Design

 A key challenge in hardware–software codesign is that a designer


needs to combine two radically different design paradigms.
 In fact, hardware and software are each other’s dual in many respects.

49
The Dualism of Hardware Design and Software Design

• Design Paradigm: Parallel vs. sequential operation
• Hardware supports parallel execution of operations, while software supports
sequential execution of operations
• The natural parallelism available in hardware enables more work to be
accomplished by adding more elements.
• In contrast, adding more operations in software increases its execution time.
• Designing requires the decomposition of a specification into low-level primitives such
as gates (HW) and instructions (SW)
• Hardware designers develop solutions using spatial decomposition, while
software designers use temporal decomposition

The Dualism of Hardware Design and Software Design
• Resource Cost: Temporal vs. spatial decomposition
• The dualism in decomposition methods leads to a similar dualism in resource cost.
Decomposition in space, as used by a hardware designer, means that more
gates are required when a more complex design needs to be implemented.
• Decomposition in time, as used by a software designer, implies that a more
complex design will take more instructions to complete.
• Therefore, resource cost for hardware is circuit area while resource cost for
software is execution time
• Design Constraints:
• A hardware designer is constrained by the clock cycle period of a design.
• A software designer, on the other hand, is limited by the capabilities of the
processor instruction set and the memory space available with the processor.
• Thus, the design constraints for hardware are in terms of a time budget, while the
design constraints for software are fixed by the CPU.
• So, a hardware designer invests circuit area to maintain control over execution
time, and a software designer invests execution time for an almost constant circuit
area.
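
The area-versus-time trade-off can be made concrete with a toy cost model. The sketch below is an illustration only: the function names and the unit costs (one operator counts as one unit of area, one operation takes one time step) are assumptions made for this example, not figures from the course.

```python
def hardware_cost(num_ops, units):
    """Spatial decomposition: spread num_ops over `units` parallel operators.

    Returns (area, time). Area grows with the number of operators;
    time shrinks because the operators work side by side.
    """
    if units < 1:
        raise ValueError("need at least one operator")
    time_steps = -(-num_ops // units)  # ceiling division
    return units, time_steps


def software_cost(num_ops):
    """Temporal decomposition: one processor executes the operations in turn.

    Area stays constant (one operator); time grows with the operation count.
    """
    return 1, num_ops


for n in (8, 64):
    print(n, "ops  HW (area, time):", hardware_cost(n, n),
          " SW (area, time):", software_cost(n))
```

In this simplified model, a more complex design costs the hardware designer more area at constant time, and costs the software designer more time at constant area.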

The Dualism of Hardware Design and Software Design
Flexibility:
• Software excels over hardware in the support of application flexibility.
• Flexibility is the ease with which the application can be modified or
adapted after the target architecture for that application is
manufactured.
• In software, flexibility is essentially free.
• In hardware, on the other hand, flexibility is not trivial.
• Hardware flexibility requires that circuit elements can be easily reused
for different activities or functions in a design.

The Dualism of Hardware Design and Software Design
Parallelism:
• A dual of flexibility can be found in the ease with which parallel
implementations can be created.
• Parallelism is the most obvious approach to improving performance.
• For hardware, parallelism comes for free as part of the design
paradigm.
• For software, on the other hand, parallelism is a major challenge.
• If only a single processor is available, software can only implement
concurrency, which requires the use of special programming
constructs such as threads.
• When multiple processors are available, a truly parallel software
implementation can be made, but inter-processor communication and
synchronization become a challenge.

The Dualism of Hardware Design and Software Design
Modeling:
• In software, modeling and implementation are very close.
• Indeed, when a designer writes a C program, the compilation of that
program for the appropriate target processor will also result in the
implementation of the program.
• In hardware, the model and the implementation of a design are
distinct concepts.
• Initially, a hardware design is modeled using an HDL.
• Such a hardware description can be simulated, but it is not an
implementation of the actual circuit.
• Hardware designers use a hardware description language, and their
programs are models which are later transformed into an implementation.
• Software designers use a software programming language, and their
programs are themselves an implementation.
The Dualism of Hardware Design and Software Design
Reuse:
• Finally, hardware and software are also quite different when it comes to
Intellectual Property reuse, or IP-reuse.
• The idea of IP-reuse is that a component of a larger circuit or a program can
be packaged, and later reused in the context of a different design.
• In software, IP-reuse has seen dramatic changes in recent years due to
open source software and the proliferation of open platforms.
• When designing a complex program these days, designers will start from a
set of standard libraries that are well-documented and available on a wide
range of platforms.
• For hardware design, IP-reuse is still in its infancy.
• Hardware designers are only starting to define standard exchange
mechanisms.
• IP-reuse of hardware has a long way to go compared to the state of reuse in
software.
Abstraction Levels

• Abstraction refers to the level of detail that is available in a model
• Lower levels have more detail, but are often much more complex and
difficult to manage
• Abstraction is heavily used to design hardware systems, and the
representations at different levels are very different
• The concept of abstraction is well exemplified by time granularity in
simulations.
• There are five abstraction levels commonly used by computer
engineers for the design of electronic hardware–software systems.
I. Continuous time (lowest level)
II. Discrete-event
III. Cycle-accurate
IV. Instruction-accurate
V. Transaction-accurate
Continuous Time

• At the lowest abstraction level, we describe operations as continuous
actions.
• For example, electric networks can be described as systems of interacting
differential equations.
• Solving these equations leads to an estimate for voltages and currents in
these electric networks.
• This is a very detailed level, useful to analyze analog effects.
• However, this level of abstraction is not used to describe typical
hardware–software systems.
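
As a small illustration of continuous-time modeling, the sketch below integrates the differential equation of a single RC low-pass stage, dV/dt = (V_in - V) / (RC), with the forward Euler method. The circuit, the component values, and the function name `simulate_rc` are assumptions chosen for this example; circuit simulators use more robust solvers.

```python
import math


def simulate_rc(v_in, r_ohm, c_farad, t_end, dt):
    """Integrate dV/dt = (v_in - V) / (R*C) with forward Euler, starting at V = 0."""
    tau = r_ohm * c_farad
    v = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        v += dt * (v_in - v) / tau
    return v


R, C = 1e3, 1e-6                  # assumed values: 1 kOhm and 1 uF, so tau = 1 ms
v_num = simulate_rc(1.0, R, C, t_end=1e-3, dt=1e-6)
v_exact = 1.0 - math.exp(-1.0)    # closed-form value after one time constant
print(f"numeric {v_num:.4f} V, exact {v_exact:.4f} V")
```

Even this one-node circuit needs a thousand solver steps to cover one millisecond, which hints at why continuous-time models are too detailed for complete hardware–software systems.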

Discrete-event
• Here, simulators abstract node behavior into discrete, possibly
irregularly spaced, time steps called events
• Events represent the changes that occur to circuit nodes when the
inputs are changed in the test bench
• The simulator is capable of modeling the actual propagation delay of the
gates, similar to what would happen in a hardware instance of the
circuit
• Discrete-event simulation is very popular for modeling hardware at
the lowest layer of abstraction in codesign
• This level of abstraction is much less compute-intensive than
continuous time but accurate enough to capture details of circuit
behavior including glitches
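
The event mechanism can be sketched with a priority queue of time-stamped node changes. The example below is a minimal sketch, not a real HDL simulator: the circuit (y = a AND (NOT a)), the unit gate delays, and the function name `simulate` are assumptions made for illustration. Logically y is always 0, but the inverter delay lets a short pulse, a glitch, appear on y when a rises.

```python
import heapq


def simulate(a_changes, inv_delay=1, and_delay=1):
    """Discrete-event simulation of y = a AND (NOT a).

    a_changes: list of (time, value) events on input a.
    Returns the list of (time, value) changes observed on output y.
    """
    state = {"a": 0, "na": 1, "y": 0}           # consistent initial values
    queue = [(t, i, "a", v) for i, (t, v) in enumerate(a_changes)]
    heapq.heapify(queue)
    seq = len(queue)                             # unique tie-breaker for equal times
    trace = []
    while queue:
        t, _, node, value = heapq.heappop(queue)
        if state[node] == value:
            continue                             # value unchanged: nothing happens
        state[node] = value
        if node == "y":
            trace.append((t, value))
            continue
        # Re-evaluate the gates driven by the node that changed.
        if node == "a":
            heapq.heappush(queue, (t + inv_delay, seq, "na", 1 - state["a"]))
            seq += 1
        heapq.heappush(queue, (t + and_delay, seq, "y", state["a"] & state["na"]))
        seq += 1
    return trace


print(simulate([(0, 1)]))
```

With the default delays, the trace shows y rising at time 1 and falling again at time 2. A model that only samples y at widely spaced clock edges would not see this pulse.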

Cycle-accurate
• Single-clock synchronous hardware circuits have the important property that
all interesting things happen at regularly spaced intervals, namely at the clock
edge.
• This abstraction is important enough to merit its own abstraction level, and it
is called cycle-accurate modeling.
• A cycle-accurate model does not capture propagation delays or glitches.
• All activities that fall ‘in between’ clock edges are concentrated at the clock
edge itself.
• This level of abstraction is considered the golden reference in HW/SW
codesign
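
A cycle-accurate model can be sketched as an object whose state advances once per call, where each call stands for one clock edge. The accumulator below, including the class name `Accumulator` and its 8-bit default width, is a hypothetical example rather than a design from the course.

```python
class Accumulator:
    """Cycle-accurate model of a register that adds its input on every clock edge."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.acc = 0            # register contents, visible after each clock edge

    def clock(self, data_in):
        """Advance the model by exactly one clock cycle."""
        # Combinational logic: evaluated "between" edges, with no delay modeled.
        next_acc = (self.acc + data_in) & self.mask
        # Clock edge: the register captures the new value.
        self.acc = next_acc
        return self.acc


dut = Accumulator(width=8)
outputs = [dut.clock(x) for x in (100, 100, 100)]
print(outputs)
```

The third value wraps around because the register is only 8 bits wide. The model reproduces that behavior cycle by cycle without representing any gate delay.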

Instruction-accurate
• Cycle-accurate (RTL) models are great but may be too slow for complex systems.
• For example, your laptop has a processor that probably clocks over 1 GHz (one billion cycles per second).
• Assuming that you could write a C function that expresses a single clock cycle of processing,
you would have to call that function one billion times to simulate just a single second of
processing.
• Clearly, further abstraction can be useful to build leaner and faster models.
• Instruction-accurate modeling expresses activities in steps of one microprocessor instruction
(not cycle count)
• Each instruction lumps together several cycles of processing.
• Instruction-accurate simulators are used to verify complex software systems, such as
complete operating systems.
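
The saving can be estimated by counting simulator steps. In the sketch below, the instruction mix and the cycles-per-instruction values in `PROGRAM` are invented for illustration and do not describe a real processor.

```python
# Hypothetical loop body: (mnemonic, cycles per instruction).
PROGRAM = [("load", 3), ("add", 1), ("mul", 4), ("store", 3), ("branch", 2)]


def cycle_accurate_steps(program, iterations):
    """Number of simulator steps when every clock cycle is simulated."""
    return iterations * sum(cycles for _, cycles in program)


def instruction_accurate_steps(program, iterations):
    """Number of simulator steps when every instruction is a single step."""
    return iterations * len(program)


n = 1_000_000
ca = cycle_accurate_steps(PROGRAM, n)
ia = instruction_accurate_steps(PROGRAM, n)
print(f"cycle-accurate: {ca} steps, instruction-accurate: {ia} steps, "
      f"ratio {ca / ia:.1f}")
```

The step count understates the real difference, since a cycle-accurate step typically also has to model more internal detail than an instruction-accurate step.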

Transaction-accurate
• For very complex systems, even instruction-accurate models may be too slow
or require too much modeling effort
• In transaction-accurate modeling, only the interactions (transactions) that
occur between components of a system are of interest
• For example, suppose you want to model a system in which a user process is
performing hard disk operations, e.g., writing a file
• The simulator simulates commands exchanged between the disk drive and
the user application
• The sequence of instruction-level operations between two transactions can
number in the millions, but the simulator instead simulates a single function
call
• Transaction-accurate models are important in the exploratory phases of a
design, before effort is spent on developing detailed models
For this course, we are interested in the instruction-accurate and cycle-accurate levels
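
The disk example can be sketched as a transaction-level model in which one method call stands for one complete disk write. The class `DiskModel`, the helper `write_file`, and the 512-byte block size are assumptions made for this illustration.

```python
class DiskModel:
    """Transaction-level model: one method call stands for a whole disk write."""

    def __init__(self, block_size=512):
        self.block_size = block_size
        self.blocks = {}
        self.transactions = 0

    def write_block(self, index, data):
        # The instructions a real driver would execute are abstracted away;
        # only the observable effect of the transaction is modeled.
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        self.blocks[index] = bytes(data)
        self.transactions += 1


def write_file(disk, payload):
    """User-process side: split a file into blocks and issue one transaction each."""
    size = disk.block_size
    for i in range(0, len(payload), size):
        disk.write_block(i // size, payload[i:i + size])


disk = DiskModel()
write_file(disk, b"x" * 1300)
print(disk.transactions, "transactions")
```

Such a model says nothing about timing inside a transaction, which is why it suits early exploration rather than detailed verification.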

Concurrency and Parallelism

• Concurrency and parallelism are terms that often occur in the context of
hardware–software codesign.
• They mean very different things.
• Concurrency is the ability to execute simultaneous operations because these
operations are completely independent.
• Parallelism is the ability to execute simultaneous operations because the operations
can run on different processors or circuit elements.
• Thus, concurrency relates to an application model, while parallelism relates to the
implementation of that model.
• Hardware is always parallel.
• Software, on the other hand, can be sequential, concurrent, or parallel.
• Sequential and concurrent software require only a single processor
• Parallel software requires multiple processors.
• Software running on your laptop, e.g., Word, email, etc., is concurrent
• Software running on a 65536-processor IBM Blue Gene/L is parallel
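
Concurrency on a single processor can be illustrated with a round-robin scheduler that interleaves independent tasks one step at a time. This is a minimal sketch using Python generators. The names `task` and `run_concurrently` are hypothetical, and a real operating system schedules threads in a far more elaborate way.

```python
def task(name, steps):
    """An independent task; each yield marks a point where it can be paused."""
    for i in range(steps):
        yield f"{name}{i}"


def run_concurrently(tasks):
    """Round-robin scheduler: one 'processor' executes one step at a time."""
    trace = []
    ready = list(tasks)
    while ready:
        current = ready.pop(0)
        try:
            trace.append(next(current))
            ready.append(current)       # not finished: back of the queue
        except StopIteration:
            pass                        # task finished: drop it
    return trace


print(run_concurrently([task("A", 3), task("B", 2)]))
```

The two tasks make progress in an interleaved fashion, yet at any moment only one step executes. A parallel implementation would need a second processor to run A and B at the same instant.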
Concurrency and Parallelism Cont.
• A key objective of HW/SW codesign is to allow designers to leverage the
benefits of true parallelism in cases where concurrency exists in the
application
• There is a well-known computer architecture principle called Amdahl’s law
• The maximum speedup of any application that contains q% sequential
code is 1 / (q/100).
• For example, if your application spends 33% of its time running
sequentially, the maximum speedup is about 3
• This means that no matter how fast you make the parallel component
run, the maximum speedup you will ever be able to achieve is about 3
• Thus, you see that we don’t only need to have parallel platforms, we
also need a way to write parallel programs to run on those platforms.
• Surprisingly, even algorithms that seem sequential at first can be executed (and
specified) in a parallel fashion.
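
The bound can be checked numerically. The sketch below evaluates the limit 1 / (q/100) from the slide, together with the finite-processor form of Amdahl's law, 1 / (s + (1 - s)/n) with s = q/100. That second formula is the standard textbook statement and is not taken from the slides, and the function names are hypothetical.

```python
def amdahl_speedup(q_percent, n_processors):
    """Speedup with q_percent sequential code and the rest spread over n processors."""
    s = q_percent / 100.0
    return 1.0 / (s + (1.0 - s) / n_processors)


def amdahl_limit(q_percent):
    """Upper bound on the speedup as the number of processors grows without limit."""
    return 1.0 / (q_percent / 100.0)


for n in (2, 8, 64, 65536):
    print(n, round(amdahl_speedup(33, n), 2))
print("limit:", round(amdahl_limit(33), 2))
```

The speedup approaches the limit as processors are added but never reaches it, because the sequential 33% always runs at its original speed.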

Concurrency and Parallelism Cont.

• Consider an application that performs addition, and assume it is implemented on a
Connection Machine (from the ’80s)
• The Connection Machine (CM) is a massively parallel processor, with a network of
processors, each with its own local memory
• Connection Machine: the original machine contained 65,536 processors, each with 4 Kbits of
local memory
• How hard is it to write programs for this machine?
• It’s possible to write individual C programs for each node, but this is really not practical
with 64K nodes!

Concurrency and Parallelism Cont.
• Hillis and Steele, writing about the CM, show that it is possible to express
algorithms in a concurrent fashion so that they map neatly onto a CM
• Consider the problem of summing an array of numbers
• To take the sum, distribute the array over the CM processors so that each
processor holds one number.
• In each step, pairs of partial sums are added in parallel, which halves the
number of partial sums that remain.
• We can now take the sum over the entire array in log2(n) steps (n being the
number of processors)
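
The logarithmic sum can be simulated sequentially to count steps and additions. The sketch below assumes that n is a power of two and that all additions within one step happen simultaneously. The function name `parallel_sum` and the particular pairing of processors are choices made for this illustration, not necessarily the mapping used on the CM.

```python
def parallel_sum(values):
    """Simulate the log2(n)-step sum on n processors (n must be a power of two).

    Returns (total, steps, additions). In each step, every active processor i
    adds the value held by processor i + stride; all such additions are
    assumed to happen at the same time.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    cells = list(values)
    steps = additions = 0
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):      # processors active in this step
            cells[i] += cells[i + stride]
            additions += 1
        stride *= 2
        steps += 1
    return cells[0], steps, additions


print(parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

For eight numbers the simulation needs three steps and performs 4 + 2 + 1 = 7 additions in total.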

Concurrency and Parallelism Cont.
• Even though the parallel sum speeds up the computation significantly, there
remains a lot of wasted compute power
• The compute power of a smaller 8-node CM over 3 time steps is 3*8 = 24
computation time-steps, of which only 7 (4 + 2 + 1 additions) are being used
• On the other hand, if the application requires all partial sums, i.e., the sum of the
first two, three, four, etc. numbers, then the full power of the parallel machine is
used
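
Computing all partial sums in the same number of steps can be sketched with the Hillis-Steele scan, simulated sequentially below. The function name `parallel_prefix_sums` is hypothetical, and the copy of the array stands in for the assumption that all processors read their neighbor's old value before any of them writes.

```python
def parallel_prefix_sums(values):
    """Simulate the Hillis-Steele scan: all partial sums in ceil(log2(n)) steps.

    Returns (prefix_sums, steps, additions).
    """
    cells = list(values)
    n = len(cells)
    steps = additions = 0
    stride = 1
    while stride < n:
        previous = list(cells)                 # every processor reads old values
        for i in range(stride, n):             # processors active in this step
            cells[i] = previous[i] + previous[i - stride]
            additions += 1
        stride *= 2
        steps += 1
    return cells, steps, additions


sums, steps, additions = parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, steps, additions)
```

In this simulation, many more of the 24 available processor time-steps do useful work than in the plain sum, and the machine delivers eight results instead of one.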
