Superscalar and VLIW Architectures
Parallel processing [2]
Processing instructions in parallel requires
three major tasks:
1. checking dependencies between instructions to determine which instructions can be grouped together for parallel execution (see the sketch after this list);
2. assigning instructions to the functional units on the hardware;
3. determining when instructions can be initiated.
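As a rough illustration of task 1, here is a minimal Python sketch (the (dest, sources) tuple format and the register names are invented for this example) that tests whether two instructions are free of RAW, WAR and WAW hazards and could therefore be grouped for parallel execution:

# Minimal sketch: an "instruction" is (dest_reg, [source_regs]).
# Two instructions may be grouped only if neither depends on the other.
def independent(i1, i2):
    d1, s1 = i1
    d2, s2 = i2
    raw = d1 in s2            # second reads what the first writes (true dependence)
    war = d2 in s1            # second overwrites a register the first still reads
    waw = d1 == d2            # both write the same register
    return not (raw or war or waw)

# 'add r3, r1, r2' and 'sub r6, r4, r5' are independent,
# but 'add r3, r1, r2' and 'mul r4, r3, r3' are not (RAW on r3).
print(independent(('r3', ['r1', 'r2']), ('r6', ['r4', 'r5'])))   # True
print(independent(('r3', ['r1', 'r2']), ('r4', ['r3', 'r3'])))   # False

Real issue logic performs this comparison in hardware across every instruction in the issue window, not one pair at a time.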
Major categories [2]
VLIW: Very Long Instruction Word
EPIC: Explicitly Parallel Instruction Computing
Superscalar Processors [1]
Superscalar processors are designed to exploit
more instruction-level parallelism in user
programs.
Only independent instructions can be executed
in parallel without causing a wait state.
The amount of instruction-level parallelism
varies widely depending on the type of code
being executed.
Pipelining in Superscalar
Processors [1]
In order to fully utilise a superscalar processor of degree m, m instructions must be executable in parallel. This is not true in every clock cycle; in cycles where fewer independent instructions are available, some of the pipelines stall in a wait state. For example, if a degree-4 processor finds only two independent instructions in a cycle, two of its issue slots sit idle for that cycle.
In a superscalar processor, the simple operation latency should require only one cycle, as in the base scalar processor.
Superscalar Execution
Superscalar
Implementation
Simultaneously fetch multiple instructions
Logic to determine true dependencies involving register values
Mechanisms to communicate these values
Mechanisms to initiate multiple instructions in parallel
Resources for parallel execution of multiple instructions
Mechanisms for committing process state in correct order (see the sketch below)
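As an illustration of the last requirement, the toy model below (Python; the class and method names are invented for this sketch, and a real reorder buffer also holds result values and has a fixed size) lets instructions complete out of order but commits them strictly in program order:

from collections import deque

class ReorderBuffer:
    # Toy model: entries stay in program order; an entry commits only when
    # it has completed and every older entry has already been committed.
    def __init__(self):
        self.entries = deque()                  # each entry: {'op': ..., 'done': ...}

    def issue(self, op):
        self.entries.append({'op': op, 'done': False})

    def complete(self, op):
        # Mark an instruction as finished executing (possibly out of order).
        for e in self.entries:
            if e['op'] == op and not e['done']:
                e['done'] = True
                break

    def commit(self):
        # Retire from the head only, so architectural state updates in order.
        committed = []
        while self.entries and self.entries[0]['done']:
            committed.append(self.entries.popleft()['op'])
        return committed

rob = ReorderBuffer()
for op in ['load r1', 'add r2', 'store r2']:
    rob.issue(op)
rob.complete('add r2')
print(rob.commit())                             # [] - 'load r1' is still pending
rob.complete('load r1')
print(rob.commit())                             # ['load r1', 'add r2']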
Some Architectures
PowerPC 604
six independent execution units:
Branch execution unit
Load/Store unit
3 Integer units
Floating-point unit
in-order issue
register renaming
PowerPC 620
provides, in addition to the 604's features, out-of-order issue
Pentium
three independent execution units:
2 Integer units
Floating-point unit
in-order issue
VLIW
Very Long Instruction Word (VLIW) architectures are used for executing more
than one basic instruction at a time.
These processors contain multiple functional units, which fetch from the
instruction cache a Very-Long Instruction Word containing several basic
instructions, and dispatch the entire VLIW for parallel execution. These
capabilities are exploited by compilers which generate code that has grouped
together independent primitive instructions executable in parallel.
VLIW has been described as a natural successor to RISC (Reduced Instruction
Set Computing), because it moves complexity from the hardware to the compiler,
allowing simpler, faster processors.
VLIW eliminates the complicated instruction scheduling and parallel dispatch
that occurs in most modern microprocessors.
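As a concrete, if very simplified, picture of what such a compiler does, the sketch below (Python; the 3-slot word width and the (dest, sources) instruction format are assumptions made for this example) greedily packs mutually independent instructions into fixed-width long words without reordering them; production VLIW compilers schedule far more aggressively:

# Toy VLIW packing: fill each long word with up to WIDTH mutually
# independent instructions, taken in program order.
WIDTH = 3                                   # issue slots per long word (assumed)

def independent(i1, i2):
    (d1, s1), (d2, s2) = i1, i2
    return d1 not in s2 and d2 not in s1 and d1 != d2

def pack(instrs):
    words, current = [], []
    for ins in instrs:
        if len(current) < WIDTH and all(independent(prev, ins) for prev in current):
            current.append(ins)
        else:
            words.append(current)
            current = [ins]
    if current:
        words.append(current)
    return words

prog = [('r1', ['b']), ('r2', ['c']), ('r3', ['r1', 'r2']),     # a = b + c
        ('r4', ['e']), ('r5', ['f']), ('r6', ['r4', 'r5'])]     # d = e - f
for word in pack(prog):
    print(word)

For this block the greedy pass produces three long words, the last holding a single instruction; a real compiler would also reorder instructions to fill the slots more evenly.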
WHY VLIW ?
The key to higher performance in microprocessors for a broad range of
applications is the ability to exploit fine-grain, instruction-level
parallelism.
Some methods for exploiting fine-grain parallelism include:
Pipelining
Multiple processors
Superscalar implementation
Specifying multiple independent operations per instruction
Architecture Comparison:
CISC, RISC & VLIW

INSTRUCTION SIZE
  CISC: Varies
  RISC: One size, usually 32 bits
  VLIW: One size

INSTRUCTION FORMAT
  CISC: Field placement varies
  RISC: Regular, consistent placement of fields
  VLIW: Regular, consistent placement of fields

INSTRUCTION SEMANTICS
  CISC: Varies from simple to complex; possibly many dependent operations per instruction
  RISC: Almost always one simple operation
  VLIW: Many simple, independent operations

REGISTERS
  CISC: Few, sometimes special
  RISC: Many, general-purpose
  VLIW: Many, general-purpose
Architecture Comparison:
CISC, RISC & VLIW

MEMORY REFERENCES
  CISC: Bundled with operations in many different types of instructions
  RISC: Not bundled with operations, i.e., load/store architecture
  VLIW: Not bundled with operations, i.e., load/store architecture

HARDWARE DESIGN FOCUS
  CISC: Exploit microcoded implementations
  RISC: Exploit implementations with one pipeline and no microcode
  VLIW: Exploit implementations with multiple pipelines, no microcode, and no complex dispatch logic

PICTURES OF FIVE TYPICAL INSTRUCTIONS
  (figures not reproduced)
Advantages of VLIW
VLIW processors rely on the compiler that generates the VLIW code to
explicitly specify parallelism. Relying on the compiler has advantages.
VLIW architecture reduces hardware complexity. VLIW simply moves
complexity from hardware into software.
What is ILP ?
Instruction-level parallelism (ILP) is a measure of how many of the
operations in a computer program can be performed simultaneously.
A system is said to embody ILP when multiple instructions can run on it at the same time.
ILP can have a significant effect on performance, which is critical to embedded systems.
ILP also provides a form of power saving by allowing the clock to be slowed.
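One common way to put a number on this (a textbook convention, not something defined on the slide) is to divide the operation count of a basic block by the length of its longest dependence chain. A small Python sketch, using the a = b + c / d = e - f block that appears later in these notes:

from functools import lru_cache

# deps[i] lists the operations that operation i depends on.
deps = {1: [], 2: [], 3: [1, 2], 4: [3],        # a = b + c  (ops 1-4)
        5: [], 6: [], 7: [5, 6], 8: [7]}        # d = e - f  (ops 5-8)

@lru_cache(maxsize=None)
def depth(op):
    # Length of the longest dependence chain ending at this operation.
    return 1 + max((depth(p) for p in deps[op]), default=0)

critical_path = max(depth(op) for op in deps)
print(len(deps) / critical_path)                # 8 ops / chain of 3 -> ILP ~ 2.7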
What we intend to do with ILP ?
We use micro-architectural techniques to exploit ILP. The various techniques include:
Instruction pipelining, which depends on CPU caches.
Register renaming, a technique used to avoid unnecessary serialization of program operations imposed by the reuse of registers by those operations (sketched below).
Speculative execution, which reduces pipeline stalls due to control dependencies.
Branch prediction, which is used to keep the pipeline full.
Superscalar execution, in which multiple execution units are used to execute multiple instructions in parallel.
Out-of-order execution, which reduces pipeline stalls due to operand dependencies.
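Of these techniques, register renaming is the easiest to show in a few lines. The sketch below (Python; the physical register names p0, p1, ... and the shape of the renaming table are invented for illustration) gives every write a fresh physical register, so reuse of an architectural register no longer creates WAR or WAW dependences:

import itertools

def rename(instrs):
    # instruction = (dest, [sources]); every destination gets a fresh
    # physical register, and reads use the most recent mapping.
    fresh = (f'p{i}' for i in itertools.count())
    latest = {}                                 # architectural -> physical register
    renamed = []
    for dest, srcs in instrs:
        new_srcs = [latest.get(s, s) for s in srcs]
        latest[dest] = next(fresh)
        renamed.append((latest[dest], new_srcs))
    return renamed

# r1 is reused for two unrelated computations; after renaming they can overlap.
prog = [('r1', ['a', 'b']), ('r5', ['r1']),
        ('r1', ['c', 'd']), ('r6', ['r1'])]
for ins in rename(prog):
    print(ins)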
Algorithms for scheduling
A few of the instruction scheduling algorithms used are:
List scheduling
Trace scheduling
Software pipelining (modulo scheduling)
List Scheduling
List scheduling, step by step:
1. Construct a dependence graph of the basic block. (The edges are weighted with the latency of the instruction.)
2. Use the dependence graph to determine which instructions can execute; insert them on a list, called the Ready list.
3. Use the dependence graph and the Ready list to schedule an instruction that causes the smallest possible stall; update the Ready list. Repeat until all instructions are scheduled. (A minimal sketch follows after this list.)
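A minimal Python sketch of these three steps follows; the node numbers, edge latencies and the single-instruction-per-cycle assumption are modeled on the a = b + c / d = e - f example on the next slide and are not taken from any particular machine:

# preds[i] = list of (predecessor, latency) edges into instruction i.
preds = {
    1: [], 2: [], 5: [], 6: [],                 # the four loads
    3: [(1, 2), (2, 2)],                        # add waits on both loads (latency 2)
    7: [(5, 2), (6, 2)],                        # sub waits on both loads
    4: [(3, 1)],                                # store a after the add
    8: [(7, 1)],                                # store d after the sub
}

def list_schedule(preds):
    done = {}                                   # instruction -> cycle it was issued
    schedule, cycle = [], 0
    while len(done) < len(preds):
        # Ready list: instructions whose predecessors have all been issued.
        ready = [n for n in preds
                 if n not in done and all(p in done for p, _ in preds[n])]
        # Earliest cycle each ready instruction could start without stalling.
        start = {n: max((done[p] + lat for p, lat in preds[n]), default=cycle)
                 for n in ready}
        # Schedule the ready instruction that causes the smallest stall.
        n = min(ready, key=lambda r: max(start[r], cycle))
        cycle = max(start[n], cycle)
        done[n] = cycle
        schedule.append((cycle, n))
        cycle += 1                              # single-issue model: one per cycle
    return schedule

for cyc, n in list_schedule(preds):
    print(f'cycle {cyc}: instruction {n}')

On this dependence graph the sketch issues one instruction every cycle with no idle cycles, like the stall-free schedule shown on the slides (the exact order may differ when several ready instructions tie).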
Code Representation for
List Scheduling
a = b + c
d = e - f
1. load R1, b
2. load R2, c
3. add R2,R1
4. store a, R2
5. load R3, e
6. load R4,f
7. sub R3,R4
8. store d,R3
(Dependence graph: loads 1 and 2 feed add 3, which feeds store 4; loads 5 and 6 feed sub 7, which feeds store 8.)
Code Representation for
List Scheduling
1. load R1, b
5. load R3, e
2. load R2, c
6. load R4, f
3. add R2,R1
7. sub R3,R4
4. store a, R2
8. store d, R3
Now we have a schedule that requires no stalls and no NOPs.
Problem and Solution
Register allocation conflict: use of the same register creates anti-dependencies that restrict scheduling.
Register allocation before scheduling: prevents good scheduling.
Scheduling before register allocation: spills destroy the schedule.
Solution: schedule abstract assembly, allocate registers, then schedule again.
Trace scheduling
Steps involved in Trace Scheduling :
Trace Selection
Find the most common trace of basic blocks (see the sketch after this list).
Trace Compaction
Combine the basic blocks in the trace and schedule them as one block
Create clean-up code if the execution goes off-trace
Parallelism across IF branches vs. LOOP branches
Can provide a speedup if static prediction is accurate
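Trace selection, the first step above, is simple to sketch: starting from the entry block, repeatedly follow the most frequently taken successor edge. The Python below uses invented block names and profile counts purely for illustration:

# succs[block] = {successor_block: profile_count}
succs = {
    'B1': {'B2': 90, 'B3': 10},     # the if-branch goes to B2 90% of the time
    'B2': {'B4': 90},
    'B3': {'B4': 10},
    'B4': {},
}

def select_trace(entry):
    trace, block = [entry], entry
    while succs[block]:
        block = max(succs[block], key=succs[block].get)   # most frequent successor
        if block in trace:                                # stop at a back edge
            break
        trace.append(block)
    return trace

print(select_trace('B1'))           # ['B1', 'B2', 'B4'] - the dominant path

The blocks on the selected trace are then compacted and scheduled as one unit, and clean-up (compensation) code is generated for executions that leave the trace through B3.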
How Trace Scheduling works
Look for the highest-priority path and trace the blocks along it.
How Trace Scheduling works
After tracing the priority blocks, schedule them first and schedule the rest in parallel with them.
How Trace Scheduling works
We can see the blocks being traced according to their priority.
How Trace Scheduling works
Creating large extended basic blocks by duplication
Schedule the larger blocks
Figure above shows how the extended basic blocks can be
created.
How Trace Scheduling works
In its final stage, the block diagram shows the parallelism across the branches.
Limitations of Trace Scheduling
Optimization depends on the traces being the dominant paths in the program's control flow.
Therefore, the following two things should be true:
Programs should demonstrate the behavior of being skewed in
the branches taken at run-time, for typical mixes of input data.
We should have access to this information at compile time.
Not so easy.
Software Pipelining
In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete, thus taking advantage of the parallelism in the data path.
It is also described as scheduling the operations within an iteration such that the iterations can be pipelined to yield optimal throughput.
The sequence of instructions before the steady state is called the PROLOG, and the sequence after the steady state is called the EPILOG.
Software Pipelining Example
Source code:
for (i = 0; i < n; i++) sum += a[i];
Loop body in assembly:
r1 = L r0
---;stall
r2 = Add r2,r1
r0 = add r0,4
Unroll loop & allocate registers
r1 = L r0
---;stall
r2 = Add r2,r1
r0 = Add r0,12
r4 = L r3
---;stall
r2 = Add r2,r4
r3 = add r3,12
r7 = L r6
---;stall
r2 = Add r2,r7
r6 = add r6,12
r10 = L r9
---;stall
r2 = Add r2,r10
r9 = add r9,12
Software Pipelining Example
Schedule the unrolled instructions, exploiting VLIW (or not).
(Figure: the repeating pattern in the schedule is the kernel; the instructions before it form the PROLOG and the instructions after it form the EPILOG.)
Constraints in Software pipelining
Recurrence constraints: determined by loop-carried data dependencies.
Resource constraints: determined by the total resource requirements.
(A small worked example follows below.)
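In modulo scheduling these two constraints become a lower bound on the initiation interval, MII = max(ResMII, RecMII); that terminology is standard in the software-pipelining literature, though not spelled out on the slide. A small worked sketch in Python, with made-up resource counts and latencies:

import math

def min_initiation_interval(uses, units, rec_latency, rec_distance):
    # Resource bound: each unit class must fit its per-iteration uses.
    res_mii = max(math.ceil(uses[u] / units[u]) for u in uses)
    # Recurrence bound: a loop-carried chain of total latency rec_latency
    # spanning rec_distance iterations limits how fast iterations can start.
    rec_mii = math.ceil(rec_latency / rec_distance)
    return max(res_mii, rec_mii)

# Example: 4 memory ops per iteration on 2 load/store units, 3 ALU ops on
# 2 ALUs, and a loop-carried dependence of latency 4 spanning 1 iteration.
print(min_initiation_interval({'mem': 4, 'alu': 3}, {'mem': 2, 'alu': 2},
                              rec_latency=4, rec_distance=1))     # -> 4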
Remarks on Software Pipelining
Innermost loops, loops with larger trip counts, and loops without conditionals can be software pipelined.
Code size increases due to the prolog and epilog.
Code size increases due to unrolling for MVE (Modulo Variable Expansion).
Register allocation strategies are needed for software-pipelined loops.
Loops with conditionals can be software pipelined if predicated execution is supported.
Higher resource requirement, but a more efficient schedule.