
CHIP BASICS

TIME, AREA, POWER, RELIABILITY & CONFIGURABILITY

Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
PVPIT, Budhgaon, Sangli
shindesir.pvp@gmail.com
Contents…
2

• Introduction
• Cycle Time
• Die Area and Cost
• Ideal and Practical Scaling
• Power
• Area–Time–Power Trade-Offs in Processor Design
• Reliability
• Configurability
Introduction
3
• The trade-off (a balance achieved between two desirable but incompatible features) between cost and performance is fundamental to any system design.

• The Semiconductor Industry Association (SIA) regularly makes projections, called the SIA road map, of technology advances.

• Advances in lithography make transistors smaller.

• The minimum width of the transistor gates is defined by the process technology.

• The table refers to process technology generations in terms of nanometers; older generations are referred to in terms of microns (μm).
Design Trade - Offs
4
• In making basic design trade-offs, we have five different considerations:

1. First is Time: this includes partitioning instructions into events or cycles, and the basic pipelining mechanisms used in speeding up instruction execution.
2. Second is Area: the cost or area occupied by a particular feature is another important aspect of the architectural trade-off. Instruction sets that require more implementation area are less valuable than instruction sets that use less area.
3. Third is Power Consumption: it affects both performance and implementation.
4. Fourth is Reliability: it comes into play to cope (deal) with deep submicron effects.
5. Fifth is Configurability: it provides an additional opportunity for designers to trade off recurring and nonrecurring design costs.
Design Trade - Offs
5
• In terms of complexity, various trade-offs are possible.
• For instance, area can be traded off for performance.
• Very large scale integration (VLSI) complexity theory has shown that bounds exist for processor designs.
• It is also possible to trade off time T for power P.
• The figure shows the possible trade-offs involving area, time, and power in a processor design.

Figure: Processor design trade-offs.


Requirements and Specifications
6
• The five basic SOC trade-offs provide a framework for analyzing SOC requirements so that these can be translated into specifications.

• Cost requirements coupled with market size can be translated into die cost and process technology.

• Requirements for wearability and weight place bounds on power or energy consumption.

• Limitations on clock frequency can affect heat dissipation.

• Any one of the trade-off criteria for a particular design may have the highest priority.
Requirements and Specifications
7

• Consider some examples:

• High-performance systems will optimize time at the expense of cost and power.
• Low-cost systems will optimize die cost, reconfigurability, and design reuse.
• Wearable systems stress low power, since the power supply determines the system weight (e.g., cell phones).
• Embedded systems in planes and other safety-critical applications stress reliability, with performance and design lifetime being important secondary considerations.
• Gaming systems stress cost (especially production cost), with performance secondary.
Cycle Time
8
• Time receives considerable attention from processor designers.

• It is the basic measure of performance; breaking actions into cycles and reducing both cycle count and cycle time are important but not straightforward.

• The way in which actions are partitioned into cycles is important.

• A common problem is having unanticipated "extra" cycles required by a basic action such as a cache miss.
Cycle Time
9
• Defining a Cycle:
• A cycle (of the clock) is the basic time unit for processing information.
• In a synchronous system, the clock rate is a fixed value, and the cycle time is determined by finding the maximum time to accomplish a frequent operation in the machine, such as an add or a register data transfer.
• The cycle time must be sufficient for data to be stored into a specified destination register.

Figure: Possible sequence of actions within a cycle.


Cycle Time
10
• A cycle begins when the instruction decoder specifies the values for the registers in the system.

• These control values connect the output of a specified register to another register, an adder, or a similar object.

• This allows data from source registers to propagate through designated combinatorial logic into the destination register.

• Finally, after a suitable setup time, all registers are sampled by an edge or pulse produced by the clocking system.
Cycle Time
11
• In a synchronous system:

• The cycle time is determined by the sum of the worst-case time for each step or action within the cycle.

• However, the clock itself may not arrive at the anticipated time (due to propagation or loading effects).

• We call the maximum deviation from the expected time of clock arrival the (uncontrolled) clock skew.
Cycle Time
12
• In an asynchronous system:

• The cycle time is simply determined by the completion of an event or operation.

• A completion signal is generated, which then allows the next operation to begin.

• Asynchronous design is generally not used within pipelined processors because of the pipeline timing constraints.
Cycle Time
13
• Optimum Pipeline:

• At one time, the concept of pipelining in a processor was treated as an advanced processor design technique.

• For several decades now, pipelining has been an integral part of any processor or controller design.

• The trade-off between cycle time and number of pipeline stages is treated in this section on the optimum pipeline.
Cycle Time
14
• Optimum Pipeline:

• A basic optimization for the pipeline processor designer is the partitioning of the pipeline into concurrently operating segments.

• A larger number of segments allows a greater speedup. However, each new segment carries clocking overhead with it, which can adversely affect performance.

• If we ignore the problem of fitting actions into an integer number of cycles, we can derive an optimal cycle time, Δt, and hence the level of segmentation for a simple pipelined processor.
Cycle Time
15

Figure: Optimal pipelining.
• (a) Unclocked instruction execution time, T.
• (b) T is partitioned into S segments; each segment requires clocking overhead C.
• (c) Clocking overhead and its effect on cycle time T/S.
• (d) Effect of a pipeline disruption (a stall in the pipeline).
Cycle Time
16
• Optimum Pipeline:

• The total time required to execute an instruction without pipeline segments is T nanoseconds.
• We need to find the optimum number of segments S to allow clocking and pipelining.
• The ideal delay through a segment is Tseg = T/S.
• A partitioning overhead is associated with each segment: the clock overhead time C (in ns) includes the clock skew and the setup and hold times of the registers.
• The actual cycle time (Figure c) of the pipelined processor is the ideal cycle time plus the overhead:

Δt = T/S + C
Cycle Time
17
• Optimum Pipeline:

• In an ideal pipelined processor there would be no delays, but certain delays do occur, for example, due to unexpected branches.

• Suppose such delays (interruptions) occur with frequency b and have the effect of invalidating the (S − 1) instructions prepared to enter, or already in, the pipeline (Figure d).

• Each disruption therefore costs S − 1 extra cycles, so the processor completes one instruction every 1 + (S − 1)·b cycles on average.
Cycle Time
18
• Optimum Pipeline:
• The throughput G (instructions completed per ns) can be calculated as

G = 1 / [(1 + (S − 1)·b)·Δt] = 1 / [(1 + (S − 1)·b)·(T/S + C)]

• If we find the S for which dG/dS = 0, we can find Sopt, the optimum number of pipeline segments:

Sopt = sqrt[((1 − b)·T) / (b·C)]


Cycle Time
19
• Optimum Pipeline:

• The total instruction execution latency (Tinstr) is

Tinstr = S·Δt = T + S·C

• From G, we can compute the throughput performance in MIPS.

• Suppose T = 12.0 ns, b = 0.2, and C = 0.5 ns. Then Sopt = sqrt[(0.8 × 12.0)/(0.2 × 0.5)] ≈ 10 stages.

• Determining Sopt can serve as:
• a design starting point, or
• an important check on an optimized design.
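As a check, here is a minimal Python sketch of these formulas (assuming the throughput and Sopt expressions reconstructed above; the function and variable names are ours):

    import math

    def s_opt(T, b, C):
        # Optimum number of segments, from setting dG/dS = 0.
        return math.sqrt((1 - b) * T / (b * C))

    def throughput_mips(T, b, C, S):
        dt = T / S + C                        # actual cycle time Δt, in ns
        g = 1.0 / ((1 + (S - 1) * b) * dt)    # instructions per ns
        return 1000.0 * g                     # 1 instruction/ns = 1000 MIPS

    T, b, C = 12.0, 0.2, 0.5
    print(round(s_opt(T, b, C)))                # -> 10 stages, as on the slide
    print(round(throughput_mips(T, b, C, 10)))  # -> ~210 MIPS at S = 10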
Die Area and Cost
20
• Cycle time, machine organization, and memory configuration determine machine performance.

• Determining performance is relatively straightforward when compared to the determination of overall cost.

• A good design achieves an optimum cost–performance trade-off at a particular target performance. This determines the quality of a processor design.
Die Area and Cost
21
• Processor Area:

• SOCs usually have die sizes of about 10–15 mm.

• This die is produced in bulk from a larger wafer, 30 cm in diameter.

• Unfortunately, neither the silicon wafers nor the processing technologies are perfect.

• Defects randomly occur over the wafer surface.
Die Area and Cost
22
• Processor Area:

• Large chip areas require an absence of defects over that area.

• If chips are too large for a particular processing technology, there will be little or no yield (yield: the fraction of good chips produced in a manufacturing process).

• The figure illustrates yield versus chip area.
Die Area and Cost
23
• Processor Area:

• Example:
Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm² and α = 4.
• Answer:
The total die areas are 2.25 cm² and 1.00 cm². For the larger die, the yield is

Y = (1 + ρD·A/α)^(−α) = (1 + 0.4 × 2.25/4)^(−4) ≈ 0.44

while for the smaller die, Y = (1 + 0.4 × 1.00/4)^(−4) ≈ 0.68. That is, less than half of the large dies are good, but more than two-thirds of the small dies are good.
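A one-function sketch of this computation (assuming the (1 + ρD·A/α)^(−α) yield model implied by the example's numbers):

    def die_yield(rho_d, area, alpha=4.0):
        # Negative-binomial yield model: Y = (1 + rho_D * A / alpha)^(-alpha)
        return (1.0 + rho_d * area / alpha) ** (-alpha)

    print(die_yield(0.4, 1.5 * 1.5))  # 2.25 cm^2 die -> ~0.44
    print(die_yield(0.4, 1.0 * 1.0))  # 1.00 cm^2 die -> ~0.68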
Die Area and Cost
24
• Processor Area:

Figure: Number of dies (of area A) on a wafer of diameter d.
Die Area and Cost
25
• Processor Area:
• Suppose a die with a square aspect ratio has area A. About N of these dies can be realized on a wafer of diameter d; a common approximation is

N ≈ π·(d/2)²/A − π·d/√(2A)

• Now suppose there are NG good chips and ND point defects on the wafer.

• Even if ND > N, we can expect several good chips, since the defects are randomly distributed and several defects will cluster on defective chips, sparing a few good ones.
Die Area and Cost
26
• Processor Area:
• Suppose we add a random defect to the wafer; NG/N is the probability that the defect destroys a good die.
• If the defect hits an already bad die, it causes no change in the number of good dies.
• In other words, the change in the number of good dies (NG) with respect to the change in the number of defects (ND) is

dNG/dND = −NG/N

• Integrating and solving gives

ln(NG) = −ND/N + C


Die Area and Cost
27
• Processor Area:
• To evaluate C, note that when NG = N, then ND = 0; so C must be ln(N).
• Then the yield is

Y = NG/N = e^(−ND/N)

• This describes a Poisson distribution of defects. If ρD is the defect density per unit area, then ND = ρD × (wafer area).
• For large wafers, d >> √A: the diameter of the wafer is significantly larger than the die side, and

N ≈ π·(d/2)²/A and ND ≈ ρD·π·(d/2)²

so that

Y = e^(−ρD·A)
Die Area and Cost
28
• Processor Area:

• The figure shows the projected number of good dies as a function of die area for several defect densities.

• A modern fab facility would have ρD between 0.15 and 0.5 defects per cm².

• Doubling the die area has a significant effect on yield; a numeric illustration follows.
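The following sketch reproduces this kind of curve numerically (the dies-per-wafer approximation is a common textbook form with an edge-loss correction, not taken from these slides):

    import math

    def dies_per_wafer(d, A):
        # Wafer area over die area, minus an edge-loss correction term.
        return math.pi * (d / 2) ** 2 / A - math.pi * d / math.sqrt(2 * A)

    def good_dies(d, A, rho_d):
        # Poisson yield from the derivation above: N_G = N * exp(-rho_D * A)
        return dies_per_wafer(d, A) * math.exp(-rho_d * A)

    # 30 cm wafer, rho_D = 0.5: doubling the die area sharply cuts good dies.
    for A in (0.25, 0.5, 1.0):                 # die areas in cm^2
        print(A, round(good_dies(30.0, A, 0.5)))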
Ideal and Practical Scaling
29
• As feature sizes shrink and transistors get smaller, transistor density improves.

• Similarly, transistor delay (or gate delay) should decrease linearly with feature size.

• Practical scaling is different: wire delay and wire density do not scale at the same rate as transistors.

• Wire delay remains almost constant as feature sizes shrink.


Ideal and Practical Scaling
30

Figure: The dominance of wire delay over gate delay.

• The figure illustrates the increasing dominance of wire delay over gate delay.
Ideal and Practical Scaling
31
• A scaling factor of 1.5 is commonly considered more accurate than ideal scaling.

• Major technology changes can affect scaling in a discontinuous manner.

• A simple scaling of an existing design might only scale by 1.5, but a new implementation taking advantage of all technology features could scale by 2.
Ideal and Practical Scaling
32
• Baseline SOC Area Model:

• The key factor in designing an efficient system is chip floor planning.

• Each functional area of the processor must be allocated sufficient space for its implementation.

• Functional units that frequently communicate must be placed close together, and sufficient room must be allocated for connection paths.

• A baseline system can be used to illustrate possible trade-offs in optimizing the chip floor plan.

• This model is based upon observations made of existing chips and on design experience.
Ideal and Practical Scaling
33
• Baseline SOC Area Model:

• Starting Point: The design process begins with an understanding of the parameters of the semiconductor process.

• Suppose we expect to be able to use a manufacturing process that has a defect density of 0.2 defects per square centimeter; for economic reasons, we target an initial yield of about 95%:

Y = e^(−ρD·A)

where ρD = 0.2 defects per square centimeter and Y = 0.95. Solving for the die area gives A = −ln(0.95)/0.2 ≈ 0.26, or approximately 0.25 cm².


Ideal and Practical Scaling
34
• Baseline SOC Area Model:

• So the chip area available to us is 25 mm².

• This is the total die area of the chip, but such things as pads for the wire bonds that connect the chip to the external world, drivers for these connections, and power supply lines all act to decrease the amount of chip area available to the designer.

• Suppose we allow 12% of the chip area to accommodate these functions (usually around the periphery of the chip); the net area will then be about 22 mm².
Ideal and Practical Scaling
35
• Baseline SOC Area Model:

• Feature Size: The smaller the feature size, the more logic can be accommodated within a fixed area.
• At feature size f = 65 nm, we have about 5200 A (area units) in 22 mm², as the sketch below illustrates.
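These numbers are consistent with taking one area unit A to be 10⁶ f² (roughly 1 mm² at f = 1 μm); that definition is our reading of the unit, not stated on the slide:

    f_mm = 65e-6              # 65 nm minimum feature size, expressed in mm
    A_unit = 1e6 * f_mm ** 2  # one area unit A = 10^6 f^2 (assumed definition)
    print(22.0 / A_unit)      # -> ~5200 A units in the 22 mm^2 net area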
• The Architecture: Each system has different objectives. For example, assume that we need the following:
– a small 32-bit core processor with an 8 KB I-cache and a 16 KB D-cache;
– two 32-bit vector processors;
– memory: an 8 KB I-cache and a 16 KB D-cache for scalar data;
– a bus control unit;
– directly addressed application memory of 128 KB; and
– a shared L2 cache.
Ideal and Practical Scaling
36
• Baseline SOC Area Model:

• An Area Model: The following is a breakdown of the area required for the various units used in the system.

• Latches, Buses, and Interunit Control: For each of the functional units, there is a certain amount of overhead to accommodate nonspecific storage (latches), interunit communications (buses), and interunit control.
• This is allocated as 10% overhead for latches and 40% overhead for buses, routing, clocking, and overall control.
Ideal and Practical Scaling
37
• Baseline SOC Area Model:

• Total System Area: The designated processor elements and storage occupy 2462 A. This leaves a net of 5200 − 2462 = 2738 A available for cache.

• Cache Area: The net area available for cache is 2738 A.
• However, bits and pieces that may be unoccupied on the chip are not always useful to the cache designer.
• These pieces must be collected into a reasonably compact area that accommodates efficient cache designs.
Ideal and Practical Scaling
38
• Baseline SOC Area Model:
• An example baseline floor plan is shown in the figure.
• A summary of the area design rules follows; a code sketch of the procedure appears after this list:
1. Compute the target chip size from the target yield and defect density.
2. Compute the die cost and determine whether it is satisfactory.
3. Compute the net available area. Allow 10–20% for pins, guard ring, power supplies, and so on.
4. Determine the rbe (register bit equivalent) size from the minimum feature size.
5. Allocate the area based on a trial system architecture until the basic system size is determined.
6. Subtract the basic system size (step 5) from the net available area (step 3). This is the die area available for cache and storage.
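A sketch of these six rules as one routine, using the example numbers from the preceding slides (the A-unit conversion and the omitted cost check are assumptions for illustration):

    import math

    def cache_budget_A(yield_target, rho_d, periphery_frac, f_mm, system_A):
        # 1. Target chip size from yield and defect density (Poisson model).
        die_mm2 = 100.0 * (-math.log(yield_target) / rho_d)  # cm^2 -> mm^2
        # 2. A die-cost check against the market target would go here.
        # 3. Net available area after pins, guard ring, and power supplies.
        net_mm2 = die_mm2 * (1.0 - periphery_frac)
        # 4. Convert to area units from the minimum feature size.
        net_A = net_mm2 / (1e6 * f_mm ** 2)
        # 5-6. Subtract the trial system size; the rest is cache/storage.
        return net_A - system_A

    # Y = 0.95, rho_D = 0.2 /cm^2, 12% periphery, f = 65 nm, system = 2462 A.
    # Prints ~2880 A; the slides round each intermediate step
    # (25 mm^2, 22 mm^2, 5200 A) and arrive at 2738 A instead.
    print(round(cache_budget_A(0.95, 0.2, 0.12, 65e-6, 2462)))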
Power
39
• Growing demands for wireless and portable electronic appliances have focused much attention on power consumption.

• The SIA road map points to increasingly higher power for microprocessor chips because of their higher operating frequencies, higher overall capacitance, and larger size.

• Power scales only indirectly with feature size (45 nm, 32 nm, 22 nm, etc.).
Power
40
• At the device level, the total power dissipation (Ptotal) has two major sources, dynamic (switching) power and static power caused by leakage current:

Ptotal = C·V²·freq + V·Ileakage

where C is the device capacitance, V is the supply voltage, freq is the device switching frequency, and Ileakage is the leakage current.

• Gate delays are roughly proportional to CV/(V − Vth)², where Vth is the threshold voltage of the transistors. A numeric sketch of both relationships follows.
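A small sketch of these two relationships (all parameter values are illustrative assumptions in arbitrary units, not process data):

    def total_power(C, V, freq, i_leak):
        # P_total = C*V^2*freq (dynamic) + V*I_leakage (static)
        return C * V ** 2 * freq + V * i_leak

    def relative_gate_delay(C, V, V_th):
        # Gate delay is roughly proportional to CV / (V - Vth)^2.
        return C * V / (V - V_th) ** 2

    # Lowering V cuts dynamic power quadratically but slows the gates;
    # lowering Vth restores speed at the price of higher leakage power
    # (the assumed leakage currents rise steeply as Vth falls).
    for V, V_th, i_leak in [(1.2, 0.4, 0.01), (1.0, 0.4, 0.01), (1.0, 0.25, 0.1)]:
        print(V, V_th,
              round(total_power(1.0, V, 1.0, i_leak), 2),
              round(relative_gate_delay(1.0, V, V_th), 2))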
Power
41
• As feature sizes decrease, so do device sizes.

• Smaller device sizes result in reduced capacitance.

• Decreasing the capacitance decreases both the dynamic power consumption and the gate delays.

• As device sizes decrease, however, the electric field applied to them becomes destructively large.

• To increase device reliability, we need to reduce the supply voltage V.
Power
42
• Reducing V effectively reduces the dynamic power consumption but results in an increase in the gate delays.

• We can avoid this loss by also reducing Vth.

• However, reducing Vth increases the leakage current, and hence the static power consumption also increases.

• This has an important effect on design and production; there are two device designs that must be accommodated in production:
1. the high-speed device with low Vth and high static power; and
2. the slower device maintaining Vth and low static power, with an increase in circuit density.
Reliability
43
• An important design dimension is reliability (also called dependability or fault tolerance).

• Reliability is related to:
– die area,
– clock frequency, and
– power.

• A larger die area increases the amount of circuitry and hence the probability of a fault.

• Higher clock frequencies increase electrical noise and noise sensitivity.

Reliability
44
• Faults, if detected, can be masked by:
– error-correcting codes (ECCs),
– instruction retry, or
– functional reconfiguration.

• Some definitions:
1. A failure is a deviation from a design specification.
2. An error is a failure that results in an incorrect signal value.
3. A fault is an error that manifests itself as an incorrect logical result.
4. A physical fault is a failure caused by the environment, such as aging, radiation, temperature, or temperature cycling. The probability of physical faults increases with time.
5. A design fault is a failure caused by a design implementation that is inconsistent with the design specification.
Reliability
45
• Dealing with Manufacturing Faults:

• The traditional way of dealing with manufacturing faults is through testing.

• As transistor density increases, the problem of testing grows even faster.

• The testable combinations increase exponentially with the transistor count.


Reliability
46
• Dealing with Manufacturing Faults:

• A technique that gives testing access to interior storage cells (those not accessible from the instruction set) is called scan.

• A scan chain, in its simplest form, consists of a separate entry and exit point for each storage cell.

• Scan allows predetermined data configurations to be entered into storage, and the output of particular configurations can be compared with known correct output configurations. A toy model is sketched below.
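A toy Python model of a scan chain, assuming the simplest serial shift arrangement described above (the function names are ours):

    def scan_shift(chain, serial_in):
        # Shift one bit into the chain; the last cell's bit falls out.
        serial_out = chain[-1]
        shifted = [serial_in] + chain[:-1]
        return shifted, serial_out

    # Load a predetermined test pattern into three storage cells; the
    # captured state could later be shifted back out and compared with
    # a known-good response.
    chain = [0, 0, 0]
    for bit in [1, 0, 1]:
        chain, _ = scan_shift(chain, bit)
    print(chain)  # -> [1, 0, 1]: the pattern now sits in the interior cells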
Configurability
47

Figure: Configurable computing sits between a single processor and an ASIC. A processor computes temporally (slow but flexible); an ASIC computes spatially (fast but inflexible).

Configurable computing is sometimes also called reconfigurable computing.


Configurability
48

Figure: Mapping an application onto typical FPGA fabric. The FPGA provides look-up tables (LUTs), flip-flops (D), multiplexers, and switches; coarse-grain units (adders, multipliers, etc.) implement structures such as an FFT butterfly or a 2-stage filter.
Configurability
49
• Reconfigurable design is used to:

• reduce time (execution time);
• reduce area (by reusing the same area); and
• increase reliability (quality should not degrade over time).
50

Thank You…

This presentation is published for educational purposes only.
