15-740/18-740 Computer Architecture Lecture 3: Performance: Carnegie Mellon University

15-740/18-740
Computer Architecture
Lecture 3: Performance

Prof. Onur Mutlu
Carnegie Mellon University

Last Time …
- Some microarchitecture ideas
- Part of microarchitecture vs. ISA
- Some ISA level tradeoffs
  - Semantic gap
  - Simple vs. complex instructions -- RISC vs. CISC
  - Instruction length
  - Uniform decode
  - Number of registers

Review: ISA-level Tradeoffs: Number of Registers
- Affects:
  - Number of bits used for encoding register address
  - Number of values kept in fast storage (register file)
  - (uarch) Size, access time, power consumption of register file

- Large number of registers:
  + Enables better register allocation (and optimizations) by compiler → fewer saves/restores
  -- Larger instruction size
  -- Larger register file size
  -- (Superscalar processors) More complex dependency check logic
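The first "Affects" item can be made concrete. The C sketch below (function names are illustrative, not from the lecture) computes how many bits a register specifier needs, the ceiling of log2 of the register count, and how many bits the register fields of a hypothetical three-register-operand instruction consume.

```c
#include <assert.h>

/* Bits needed to name one of num_regs registers: ceil(log2(num_regs)). */
unsigned reg_specifier_bits(unsigned num_regs) {
    assert(num_regs > 0);
    unsigned bits = 0;
    /* Grow the field until 2^bits distinct names cover every register. */
    while ((1ull << bits) < num_regs) {
        bits++;
    }
    return bits;
}

/* Register-field bits in a hypothetical 3-operand format (rd, rs1, rs2). */
unsigned three_operand_reg_bits(unsigned num_regs) {
    return 3 * reg_specifier_bits(num_regs);
}
```

For example, moving from 8 to 32 registers grows the register fields of such an instruction from 9 to 15 bits, which is the "larger instruction size" cost listed above.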

ISA-level Tradeoffs: Addressing Modes
- Addressing mode specifies how to obtain an operand of an instruction
  - Register
  - Immediate
  - Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, …)

- More modes:
  + help better support programming constructs (arrays, pointer-based accesses)
  -- make it harder for the architect to design
  -- too many choices for the compiler?
     - Having many ways to do the same thing complicates compiler design
     - Read Wulf, “Compilers and Computer Architecture”

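x86 vs. Alpha Instruction Formats
- x86: [Figure: x86 instruction format]
- Alpha: [Figure: Alpha instruction formats]

x86
[Figure: x86 addressing-mode examples labeled "register indirect" and "absolute"]

x86
[Figure: x86 addressing-mode examples labeled "indexed (base + index)" and "scaled (base + index*4)"]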
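The memory addressing modes above differ only in how the effective address (EA) is formed. A minimal C sketch follows, assuming 32-bit addresses and register contents passed in as plain values; the function names are hypothetical. An array access such as a[i] on 4-byte elements maps naturally onto the scaled mode, with the base register holding &a[0], the index register holding i, and a scale of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Register indirect: EA = [base] */
uint32_t ea_register_indirect(uint32_t base) {
    return base;
}

/* Absolute: EA = address encoded in the instruction */
uint32_t ea_absolute(uint32_t addr) {
    return addr;
}

/* Displacement: EA = [base] + disp (disp may be negative; wraps mod 2^32) */
uint32_t ea_displacement(uint32_t base, int32_t disp) {
    return base + (uint32_t)disp;
}

/* Indexed: EA = [base] + [index] */
uint32_t ea_indexed(uint32_t base, uint32_t index) {
    return base + index;
}

/* Scaled indexed: EA = [base] + [index] * scale; x86 encodes
 * scale factors of 1, 2, 4, or 8. */
uint32_t ea_scaled(uint32_t base, uint32_t index, uint32_t scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}
```

For a hypothetical int array starting at address 0x1000, element a[3] is found at ea_scaled(0x1000, 3, 4), i.e. 0x100C, in a single addressing computation rather than a separate multiply and add.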

Other ISA-level Tradeoffs
- Load/store vs. Memory/Memory
- Condition codes vs. condition registers vs. compare&test
- Hardware interlocks vs. software-guaranteed interlocking
- VLIW vs. single instruction
- 0, 1, 2, 3 address machines
- Precise vs. imprecise exceptions
- Virtual memory vs. not
- Aligned vs. unaligned access
- Supported data types
- Software vs. hardware managed page fault handling
- Granularity of atomicity
- Cache coherence (hardware vs. software)
- …

Programmer vs. (Micro)architect
- Many ISA features are designed to aid programmers
- But they complicate the hardware designer’s job

- Virtual memory
  - vs. overlay programming
  - Should the programmer be concerned about the size of code blocks?
- Unaligned memory access
  - Compiler/programmer needs to align data
- Transactional memory?
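When the ISA does not support unaligned access (or makes it slow), keeping data aligned falls to the compiler or programmer. The C11 sketch below, with hypothetical helper names, shows two common portable idioms: test an address against the type's alignment, and load possibly unaligned data through memcpy, which compilers typically lower to a plain load on ISAs that allow unaligned access and to a safe instruction sequence on ISAs that do not.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if p meets the natural alignment of a 32-bit integer. */
bool is_aligned_u32(const void *p) {
    return ((uintptr_t)p % alignof(uint32_t)) == 0;
}

/* Load a 32-bit value from any byte address without an unaligned
 * dereference; memcpy has no alignment requirement. */
uint32_t load_u32_any(const unsigned char *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Dereferencing a misaligned uint32_t pointer directly is undefined behavior in C, so the memcpy form is the safe way to express "this value may not be aligned" and let the compiler pick the instructions.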

Transactional Memory
Lock-based version (THREAD 1 and THREAD 2 run identical code):

  enqueue (Q, v) {
    Node_t *node = malloc(…);
    node->val = v;
    node->next = NULL;
    acquire(lock);
    if (Q->tail)
      Q->tail->next = node;
    else
      Q->head = node;
    Q->tail = node;
    release(lock);
  }

Transactional version (THREAD 1 and THREAD 2 run identical code):

  begin-transaction
  …
  enqueue (Q, v); //no locks
  …
  end-transaction
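For comparison with the transactional version, here is the lock-based enqueue as runnable C using POSIX threads. It is a sketch under stated assumptions: the Queue type, the worker and run_two_threads helpers, and PER_THREAD are hypothetical, and pthread_mutex_lock/pthread_mutex_unlock stand in for the slide's acquire/release. Portable C has no standard transaction construct, so only the lock-based side is shown. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Node {
    int val;
    struct Node *next;
} Node_t;

typedef struct {
    Node_t *head;
    Node_t *tail;
    pthread_mutex_t lock;
} Queue;

/* Lock-based enqueue. Q->tail must be updated before the lock is
 * released; otherwise another thread could link its node onto a
 * stale tail and one of the two nodes would be lost. */
void enqueue(Queue *Q, int v) {
    Node_t *node = malloc(sizeof *node);
    assert(node != NULL);
    node->val = v;
    node->next = NULL;
    pthread_mutex_lock(&Q->lock);      /* acquire(lock) */
    if (Q->tail)
        Q->tail->next = node;
    else
        Q->head = node;
    Q->tail = node;
    pthread_mutex_unlock(&Q->lock);    /* release(lock) */
}

#define PER_THREAD 10000

/* Each thread enqueues PER_THREAD values into the shared queue. */
static void *worker(void *arg) {
    Queue *Q = arg;
    for (int i = 0; i < PER_THREAD; i++)
        enqueue(Q, i);
    return NULL;
}

/* Runs THREAD 1 and THREAD 2 concurrently, then frees the list and
 * returns how many nodes it contained. */
long run_two_threads(void) {
    Queue Q = { .head = NULL, .tail = NULL };
    pthread_mutex_init(&Q.lock, NULL);
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, worker, &Q) != 0 ||
        pthread_create(&t2, NULL, worker, &Q) != 0)
        abort();
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    long count = 0;
    for (Node_t *n = Q.head; n != NULL; count++) {
        Node_t *next = n->next;
        free(n);
        n = next;
    }
    pthread_mutex_destroy(&Q.lock);
    return count;
}
```

With the lock held across the whole update, every one of the 2 × PER_THREAD nodes ends up in the list. The programmer must get the lock placement right by hand; the transactional version moves that burden to the TM system.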

Transactional Memory
- A transaction is executed atomically: ALL or NONE

- If there is a data conflict between two transactions, only one of them completes; the other is rolled back
  - Both write to the same location
  - One reads from the location another writes
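The conflict rule above can be stated over each transaction's read set and write set. The C sketch below checks it for two transactions. TxSet, tx_conflict, and the use of plain integers as addresses are hypothetical simplifications (hardware proposals commonly track footprints at cache-block granularity instead), and the sketch only detects a conflict; choosing which transaction to roll back is a separate policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction footprint: addresses read and written. */
typedef struct {
    const unsigned *reads;  size_t n_reads;
    const unsigned *writes; size_t n_writes;
} TxSet;

static bool contains(const unsigned *set, size_t n, unsigned addr) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == addr)
            return true;
    return false;
}

/* Conflict: both write the same location, or one reads a location the
 * other writes. Two transactions that only read the same data do not
 * conflict. */
bool tx_conflict(const TxSet *a, const TxSet *b) {
    for (size_t i = 0; i < a->n_writes; i++) {
        unsigned w = a->writes[i];
        if (contains(b->writes, b->n_writes, w) ||
            contains(b->reads, b->n_reads, w))
            return true;
    }
    for (size_t i = 0; i < b->n_writes; i++)
        if (contains(a->reads, a->n_reads, b->writes[i]))
            return true;
    return false;
}
```

Read-read sharing is deliberately allowed, which is one reason TM can outperform a single coarse lock: transactions that happen not to touch each other's written data proceed in parallel.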

ISA-level Tradeoff: Supporting TM
- Still under research
- Pros:
  - Could make programming with threads easier
  - Could improve parallel program performance vs. locks. Why?

- Cons:
  - What if it does not pan out?
  - All future microarchitectures might have to support the new instructions (for backward compatibility reasons)
  - Complexity?

- How does the architect decide whether or not to support TM in the ISA? (How to evaluate the whole stack)

ISA-level Tradeoffs: Instruction Pointer
- Do we need an instruction pointer in the ISA?
  - Yes: Control-driven, sequential execution
    - An instruction is executed when the IP points to it
    - IP automatically changes sequentially (except control flow instructions)
  - No: Data-driven, parallel execution
    - An instruction is executed when all its operand values are available (data flow)

- Tradeoffs: MANY high-level ones
  - Ease of programming (for average programmers)?
  - Ease of compilation?
  - Performance: Extraction of parallelism?
  - Hardware complexity?
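The two answers can be contrasted in a few lines of C. The sketch below, with hypothetical types and names, runs the same three-instruction program, t2 = (a + b) * (c - d), in two ways: control-driven, where an instruction pointer visits instructions in order, and data-driven, where every instruction whose operands are available fires in the same step. It assumes each value slot is written only once, as in a dataflow graph, and supports at most MAX_INSTS instructions.

```c
#include <assert.h>
#include <stdbool.h>

enum { OP_ADD, OP_SUB, OP_MUL };

typedef struct {
    int op;
    int src1, src2;   /* indices of the operand value slots */
    int dst;          /* index of the result slot */
} Inst;

#define MAX_INSTS 16

static int apply(int op, int x, int y) {
    switch (op) {
    case OP_ADD: return x + y;
    case OP_SUB: return x - y;
    default:     return x * y;
    }
}

/* Control-driven: the IP advances sequentially; returns steps taken. */
int run_sequential(const Inst *prog, int n, int *val) {
    for (int ip = 0; ip < n; ip++)
        val[prog[ip].dst] = apply(prog[ip].op, val[prog[ip].src1], val[prog[ip].src2]);
    return n;   /* one instruction per step */
}

/* Data-driven: each step fires every instruction whose operands are
 * ready; returns steps taken, or -1 if some instruction can never fire. */
int run_dataflow(const Inst *prog, int n, int *val, bool *ready) {
    assert(n <= MAX_INSTS);
    bool fired[MAX_INSTS] = { false };
    int steps = 0, done = 0;
    while (done < n) {
        bool fire_now[MAX_INSTS] = { false };
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!fired[i] && ready[prog[i].src1] && ready[prog[i].src2]) {
                fire_now[i] = true;
                count++;
            }
        }
        if (count == 0)
            return -1;   /* no instruction can ever get its operands */
        /* Readiness was sampled before any result of this step was
         * produced, so the fired instructions behave as if parallel. */
        for (int i = 0; i < n; i++) {
            if (fire_now[i]) {
                val[prog[i].dst] = apply(prog[i].op, val[prog[i].src1], val[prog[i].src2]);
                ready[prog[i].dst] = true;
                fired[i] = true;
                done++;
            }
        }
        steps++;
    }
    return steps;
}
```

With a = 2, b = 3, c = 10, d = 4, both runs compute 30, but the dataflow run needs two steps instead of three because the add and the subtract do not depend on each other.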

The von Neumann Model
[Figure: block diagram with MEMORY (Mem Addr Reg, Mem Data Reg), PROCESSING UNIT (ALU, TEMP), INPUT, OUTPUT, and CONTROL UNIT (IP, Inst Register)]

The von Neumann Model
- Stored program computer (instructions in memory)
- One instruction at a time
- Sequential execution
- Unified memory
  - The interpretation of a stored value depends on the control signals

- All major ISAs today use this model

- Underneath (at uarch level), the execution model is very different
  - Multiple instructions at a time
  - Out-of-order execution
  - Separate instruction and data caches
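A minimal sketch of these properties in C, assuming a made-up accumulator machine whose instruction word is opcode * 100 + address: program and data share one memory array, the IP selects one instruction at a time, and execution is sequential. Whether a word acts as an instruction or as data depends only on how the machine uses it; a LOAD of address 0 reads the program's own first instruction as an ordinary number. The encoding, opcodes, and the run function are all hypothetical.

```c
#include <assert.h>

#define MEM_WORDS 100

/* Hypothetical encoding: word = opcode * 100 + address (address 0..99). */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

/* Executes the stored program starting at address 0 and returns the
 * number of instructions executed, or -1 on an invalid opcode. */
int run(int mem[MEM_WORDS]) {
    int ip = 0;      /* instruction pointer */
    int acc = 0;     /* accumulator (a temporary register) */
    int executed = 0;
    for (;;) {
        assert(ip >= 0 && ip < MEM_WORDS);
        int ir = mem[ip];           /* fetch into the instruction register */
        assert(ir >= 0);
        ip++;                       /* sequential by default */
        int opcode = ir / 100;
        int addr = ir % 100;
        executed++;
        switch (opcode) {
        case LOAD:  acc = mem[addr];  break;
        case ADD:   acc += mem[addr]; break;
        case STORE: mem[addr] = acc;  break;
        case HALT:  return executed;
        default:    return -1;
        }
    }
}
```

For instance, the words 110, 211, 312, 0 at addresses 0 to 3 load mem[10], add mem[11], store the sum to mem[12], and halt; the same memory array holds both those instructions and their operands.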

Fundamentals of Uarch Performance Tradeoffs

Instruction Supply:
- Zero-cycle latency (no cache miss)
- No branch mispredicts
- No fetch breaks

Data Path (Functional Units):
- Perfect data flow (reg/memory dependencies)
- Zero-cycle interconnect (operand communication)
- Enough functional units
- Zero latency compute?

Data Supply:
- Zero-cycle latency
- Infinite capacity
- Zero cost

We will examine all these throughout the course (especially data supply)

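How to Evaluate Performance Tradeoffs

$$
\text{Execution time} = \frac{\text{time}}{\text{program}}
= \frac{\#\ \text{instructions}}{\text{program}} \times \frac{\#\ \text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}
$$

- # instructions / program is affected by: Algorithm, Program, ISA, Compiler
- # cycles / instruction is affected by: ISA, Microarchitecture
- time / cycle is affected by: Microarchitecture, Logic design, Circuit implementation, Technology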
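A small worked example of the equation in C. The numbers are invented for illustration: a design that executes 20% fewer instructions but has a 25% higher CPI, at the same clock period, ends up with exactly the same execution time, which is why all three factors have to be evaluated together. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* time/program = (instructions/program) x (cycles/instruction) x (time/cycle) */
double exec_time_s(double instructions, double cpi, double clock_period_s) {
    assert(instructions >= 0 && cpi > 0 && clock_period_s > 0);
    return instructions * cpi * clock_period_s;
}

/* Speedup of design B over design A. */
double speedup(double time_a_s, double time_b_s) {
    assert(time_b_s > 0);
    return time_a_s / time_b_s;
}

/* Floating-point comparison with a small absolute tolerance. */
bool approx_eq(double x, double y) {
    double d = x - y;
    return d < 1e-9 && d > -1e-9;
}
```

Design A: 10^9 instructions at CPI 2.0 and 1 ns per cycle gives 2.0 s. Design B: 0.8 × 10^9 instructions at CPI 2.5 and 1 ns gives 2.0 s as well, a speedup of 1.0.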

Improving Performance
- Reducing instructions/program
- Reducing cycles/instruction (CPI)
- Reducing time/cycle (clock period)

Improving Performance (Reducing Exec Time)
- Reducing instructions/program
  - More efficient algorithms and programs
  - Better ISA?

- Reducing cycles/instruction (CPI)
  - Better microarchitecture design
    - Execute multiple instructions at the same time
    - Reduce latency of instructions (1-cycle vs. 100-cycle memory access)

- Reducing time/cycle (clock period)
  - Technology scaling
  - Pipelining
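The effect of instruction latency on CPI can be estimated with a weighted average over an instruction mix. The C sketch below assumes a hypothetical mix of 70% single-cycle ALU instructions and 30% memory instructions, and assumes instructions do not overlap (each finishes before the next starts); under those assumptions, going from 1-cycle to 100-cycle memory access raises CPI from 1.0 to 30.7. Executing multiple instructions at the same time is what relaxes the no-overlap assumption and lowers the effective CPI.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction class in a hypothetical dynamic instruction mix. */
typedef struct {
    double fraction;   /* share of executed instructions */
    double cycles;     /* cycles per instruction of this class */
} InstClass;

/* CPI = sum(fraction_i * cycles_i), assuming no overlap between instructions. */
double average_cpi(const InstClass *mix, size_t n) {
    double cpi = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += mix[i].fraction * mix[i].cycles;
        total += mix[i].fraction;
    }
    assert(total > 0.999 && total < 1.001);   /* fractions must sum to 1 */
    return cpi;
}
```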

Improving Performance: Semantic Gap
- Reducing instructions/program
  - Complex instructions: small code size (+)
  - Simple instructions: large code size (--)

- Reducing cycles/instruction (CPI)
  - Complex instructions: (can) take more cycles to execute (--)
    - REP MOVS
    - How about ADD with condition code setting?
  - Simple instructions: (can) take fewer cycles to execute (+)

- Reducing time/cycle (clock period)
  - Does instruction complexity affect this?
    - It depends
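The REP MOVS case can be phrased as a toy cost model. Everything in the C sketch below is an assumption chosen for illustration, not a measurement of any real processor: the simple-ISA copy loop is taken to be 4 single-cycle instructions per element copied, and the single complex copy instruction is taken to cost a fixed startup plus a per-element charge. The point is only the shape of the tradeoff: the complex instruction wins on instructions/program but has a far higher CPI, so the overall outcome depends on the cycle counts.

```c
#include <assert.h>

/* Hypothetical cost of a copy: dynamic instruction count and cycles. */
typedef struct {
    long instructions;
    long cycles;
} Cost;

/* Simple ISA: loop body assumed to be load, store, index update, and
 * branch, i.e. 4 single-cycle instructions per element. */
Cost copy_cost_simple(long n) {
    assert(n > 0);
    Cost c = { 4 * n, 4 * n };
    return c;
}

/* Complex ISA: one REP MOVS-style instruction, assumed to take a fixed
 * startup plus per_elem cycles for each element copied. */
Cost copy_cost_complex(long n, long startup, long per_elem) {
    assert(n > 0 && startup >= 0 && per_elem > 0);
    Cost c = { 1, startup + per_elem * n };
    return c;
}

/* Cycles per instruction for a given cost. */
double cpi(Cost c) {
    return (double)c.cycles / (double)c.instructions;
}
```

With these made-up parameters, copying 100 elements takes 400 instructions and 400 cycles (CPI 1) on the simple ISA, versus 1 instruction and 120 cycles (CPI 120) with a startup of 20 and 1 cycle per element; raising the assumed per-element charge to 4 makes the complex instruction slower overall.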