15-740/18-740 Computer Architecture Lecture 3: Performance: Carnegie Mellon University

The document discusses various instruction set architecture (ISA) level tradeoffs that affect computer performance. It covers topics like the number of registers, addressing modes, instruction formats, and whether to support features like transactional memory. One of the key messages is that while more complex ISA features can help programmers, they also complicate the hardware design. The document analyzes how different ISA decisions impact the number of instructions, number of cycles per instruction, and clock cycle time to ultimately influence overall performance.


15-740/18-740

Computer Architecture
Lecture 3: Performance

Prof. Onur Mutlu
Carnegie Mellon University

Last Time …
- Some microarchitecture ideas
- Part of microarchitecture vs. ISA
- Some ISA-level tradeoffs
  - Semantic gap
  - Simple vs. complex instructions: RISC vs. CISC
  - Instruction length
  - Uniform decode
  - Number of registers

Review: ISA-level Tradeoffs: Number of Registers
- Affects:
  - Number of bits used for encoding a register address
  - Number of values kept in fast storage (register file)
  - (uarch) Size, access time, and power consumption of the register file
- Large number of registers:
  + Enables better register allocation (and optimizations) by the compiler → fewer saves/restores
  -- Larger instruction size
  -- Larger register file size
  -- (Superscalar processors) More complex dependency check logic

ISA-level Tradeoffs: Addressing Modes
- An addressing mode specifies how to obtain an operand of an instruction
  - Register
  - Immediate
  - Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, …)
- More modes:
  + help better support programming constructs (arrays, pointer-based accesses)
  -- make it harder for the architect to design
  -- too many choices for the compiler?
    - Having many ways to do the same thing complicates compiler design
    - Read Wulf, “Compilers and Computer Architecture”

x86 vs. Alpha Instruction Formats
- x86: [figure: x86 instruction format]
- Alpha: [figure: Alpha instruction formats]

x86
[figure: example x86 instructions annotated with their addressing modes: register indirect, absolute, register + displacement, register]

x86
[figure: example x86 instructions annotated with their addressing modes: indexed (base + index), scaled (base + index*4)]

Other ISA-level Tradeoffs
- Load/store vs. memory/memory
- Condition codes vs. condition registers vs. compare&test
- Hardware interlocks vs. software-guaranteed interlocking
- VLIW vs. single instruction
- 0, 1, 2, 3 address machines
- Precise vs. imprecise exceptions
- Virtual memory vs. not
- Aligned vs. unaligned access
- Supported data types
- Software vs. hardware managed page fault handling
- Granularity of atomicity
- Cache coherence (hardware vs. software)
- …

Programmer vs. (Micro)architect
- Many ISA features are designed to aid programmers
- But they complicate the hardware designer’s job
- Virtual memory
  - vs. overlay programming
  - Should the programmer be concerned about the size of code blocks?
- Unaligned memory access
  - Without it, the compiler/programmer needs to align data
- Transactional memory?

Transactional Memory
THREAD 1 and THREAD 2 both execute:

  enqueue (Q, v) {
    Node_t *node = malloc(…);
    node->val = v;
    node->next = NULL;
    acquire(lock);
    if (Q->tail)
      Q->tail->next = node;
    else
      Q->head = node;
    Q->tail = node;
    release(lock);
  }

With transactional memory, THREAD 1 and THREAD 2 both execute:

  begin-transaction
  …
  enqueue (Q, v); // no locks
  …
  end-transaction

Transactional Memory
- A transaction is executed atomically: ALL or NONE
- If there is a data conflict between two transactions, only one of them completes; the other is rolled back
  - Both write to the same location
  - One reads from a location the other writes

ISA-level Tradeoff: Supporting TM
- Still under research
- Pros:
  - Could make programming with threads easier
  - Could improve parallel program performance vs. locks. Why?
- Cons:
  - What if it does not pan out?
  - All future microarchitectures might have to support the new instructions (for backward compatibility reasons)
  - Complexity?
- How does the architect decide whether or not to support TM in the ISA? (How to evaluate the whole stack)

ISA-level Tradeoffs: Instruction Pointer
- Do we need an instruction pointer in the ISA?
  - Yes: Control-driven, sequential execution
    - An instruction is executed when the IP points to it
    - The IP automatically advances sequentially (except for control flow instructions)
  - No: Data-driven, parallel execution
    - An instruction is executed when all its operand values are available (data flow)
- Tradeoffs: MANY high-level ones
  - Ease of programming (for average programmers)?
  - Ease of compilation?
  - Performance: Extraction of parallelism?
  - Hardware complexity?

The Von-Neumann Model
[figure: block diagram of the von Neumann model: MEMORY (Mem Addr Reg, Mem Data Reg); PROCESSING UNIT (ALU, TEMP); INPUT; OUTPUT; CONTROL UNIT (IP, Inst Register)]

The Von-Neumann Model
- Stored program computer (instructions in memory)
- One instruction at a time
- Sequential execution
- Unified memory
  - The interpretation of a stored value depends on the control signals
- All major ISAs today use this model
- Underneath (at the uarch level), the execution model is very different
  - Multiple instructions at a time
  - Out-of-order execution
  - Separate instruction and data caches

Fundamentals of Uarch Performance Tradeoffs
- Instruction Supply:
  - Zero-cycle latency (no cache miss)
  - No branch mispredicts
  - No fetch breaks
- Data Path (Functional Units):
  - Perfect data flow (reg/memory dependencies)
  - Zero-cycle interconnect (operand communication)
  - Enough functional units
  - Zero latency compute?
- Data Supply:
  - Zero-cycle latency
  - Infinite capacity
  - Zero cost

We will examine all these throughout the course (especially data supply)

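How to Evaluate Performance Tradeoffs

$$\text{Execution time} = \frac{\text{time}}{\text{program}} = \frac{\#\,\text{instructions}}{\text{program}} \times \frac{\#\,\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}$$

What determines each factor:
- Instructions/program: Algorithm, Program, ISA, Compiler
- Cycles/instruction: ISA, Microarchitecture
- Time/cycle: Microarchitecture, Logic design, Circuit implementation, Technology
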
Improving Performance (Reducing Exec Time)
- Reducing instructions/program
  - More efficient algorithms and programs
  - Better ISA?
- Reducing cycles/instruction (CPI)
  - Better microarchitecture design
    - Execute multiple instructions at the same time
    - Reduce latency of instructions (1-cycle vs. 100-cycle memory access)
- Reducing time/cycle (clock period)
  - Technology scaling
  - Pipelining

Improving Performance: Semantic Gap
- Reducing instructions/program
  - Complex instructions: small code size (+)
  - Simple instructions: large code size (--)
- Reducing cycles/instruction (CPI)
  - Complex instructions: (can) take more cycles to execute (--)
    - REP MOVS
    - How about ADD with condition code setting?
  - Simple instructions: (can) take fewer cycles to execute (+)
- Reducing time/cycle (clock period)
  - Does instruction complexity affect this?
  - It depends

