Introduction 2024

The document provides an overview of computer architecture, covering key concepts such as Instruction Set Architecture (ISA), organization, and hardware implementation. It discusses the evolution of computer technology, current trends in processors and memory, and the challenges of power consumption and energy efficiency. Additionally, it highlights the shift from uniprocessor to multicore architectures as a response to limitations in performance and power management.


Version 9/23/23

Computer Architecture

Introduction to Computer
Architecture and
Programming Models

Fernando.Rincón@uclm.es, JoseAntonio.Torre@uclm.es
Outline

● The concept of Architecture
● Computer technology evolution
● Types of computers
● Trends
– Processors, DRAM, flash, disks and LAN
● Power and Energy
● Architecture
● Parallel programming models
● Review
– Principles of quantitative design
Computer Architecture 2
The Concept of Architecture
● Considers three aspects:
● ISA (Instruction Set Architecture): design and properties of the
machine instructions.
– External architecture of the hardware.
● Organization: the structure formed by the main components of a
computer (memory, CPU, buses...), as well as the more abstract
characteristics of those components.
– Also referred to as internal architecture or microarchitecture.
● Hardware: detailed implementation of the structure. Main aspects:
– Detailed logical design
– Implementation as an Integrated Circuit (IC)

Computer Architecture 3
Main Aspects in the Design of a
Computer

[Diagram: how the degree's courses map onto the design aspects:]
● Computer Structure course covers the ISA introduction and the main
microarchitectural units
● Computer Organization course covers the memory hierarchy, CPU
pipelines and modern ISA foundations
● Computer Technology course partially covers: logic design,
clocking, memory technology, ...
● Computer Architecture: the main concern of this course, from a
global point of view
Computer Architecture 4
Organization & Implementation
● All 3 aspects are interdependent
● Ex.: an ideal architecture for an MSI implementation may
not be right for a VLSI one.
● The computer designer cannot design an ISA ignoring
implementation aspects
● Implementation technology evolution has a deep impact
on the performance of microarchitectural units:
– Ex: memory keeps improving bandwidth but not latency → need
for deeper caches
– Ex: power consumption increases while there is less area for
dissipation → downclocking

Computer Architecture 5
Current Architectures

[Collage of example devices:]
● Sensor networks
● Cameras
● Set-top boxes
● Game consoles
● Media players
● Servers
● Laptops
● Robots
● Smart phones
● Cars
● Supercomputers
Computer Architecture 6
But this is how we started
● ENIAC (1946)
● 30 tons
● 17,000 vacuum tubes
– Transistors had not been invented yet
● 5K additions per second
● 175 kW
● 167 m² area
● $6,904,265 equivalent price in 2022

What are the equivalent numbers for a modern laptop?

Computer Architecture 7
Computer technology evolution
● IC technology evolution quite steady
● Computer architecture evolution not that much
● 5 phases:
● Initial 25 years: improvement mainly due to IC technology
– Clock and size
● Birth of RISC architectures: performance boost (ILP)
– Better compilers thanks to simpler architectures
● Power & ILP wall
– Power dissipation in smaller areas → heat
– Bandwidth & latency mismatch
● Birth of multicores
– From implicit (instruction-level) parallelism to explicit parallelism
● The end of Moore’s Law
– Specialized processors as an alternative to the von Neumann architecture

Computer Architecture 8
CPUs Performance

[Figure: timeline of single-processor performance growth, annotated with the
milestones below (notes translated from Spanish):]
● Von Neumann model: data moves through registers
● First RISC: simplified ISAs; the compiler resolves dependencies
● Instruction-level parallelism: superscalar processors issue more than one
instruction at a time
● Raising the clock frequency runs into physical limits → Power wall
● ILP wall
● Moore's Law: smaller features concentrate the temperature
● Physical limitations → end of uniprocessors; Amdahl's Law
● End of Moore's Law
Computer Architecture 9
Trends in technology
● Integrated circuit technology (Moore’s Law)
● Transistor density: 35%/year
● Die size: 10-20%/year
● Integration overall: 40-55%/year
● Nowadays coming to an end
● DRAM capacity: 25-40%/year (slowing)
● 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb
● Flash capacity: 50-60%/year
● 8-10X cheaper/bit than DRAM
● Magnetic disk capacity: recently slowed to 5%/year
● Density increases may no longer be possible, maybe increase from 7 to 9 platters
● 8-10X cheaper/bit than Flash
● 200-300X cheaper/bit than DRAM

Computer Architecture 10
Trends in technology
Latency vs Bandwidth
● Memory Wall: the memory-CPU bottleneck
● Bandwidth or throughput: total work done in a given time
– 32,000-40,000X improvement for processors
– 300-1200X improvement for memory and disks
● Latency or response time: time between start and completion
of an event
– 50-90X improvement for processors
– 6-8X improvement for memory and disks
● Note: latency is measured for a simple operation without contention;
bandwidth figures are best-case
Computer Architecture 11
Trends in technology
Milestones

Computer Architecture 12
Trends in technology
Milestones

Computer Architecture 13
Power and Energy

● Energy = Power x time
● Power = Energy / time
– Power is the consumption capacity per unit of time
Computer Architecture 14
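Since the rest of the deck leans repeatedly on the Energy = Power x time relation, here is a one-line numeric sketch (the 50 W and 10 s figures are made up for illustration, not from the slides):

```python
# Energy = Power x time: a chip drawing 50 W for 10 s consumes 500 J.
power_w = 50.0               # sustained power draw (watts = joules/second)
time_s = 10.0                # run time in seconds
energy_j = power_w * time_s  # joules
print(energy_j)              # 500.0
```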
Power & Energy:
Systems perspective
● 3 primary concerns:
● Maximum power required
– To avoid voltage drops
– Many components can dynamically regulate their consumption
● Sustained power consumption:
– Referred to as thermal design power (TDP)
– Used to properly size the system for the typical case
● Power or energy to measure efficiency:
– In general energy. Determines life of batteries and electricity bill

Computer Architecture 15
Energy and power in the chips
● We assume CMOS technology
– Complementary MOS (silicon): transistors are combined so that they behave
as switches; logic gates are combinations of transistors
● Transistor consumption:
● Dynamic power: consumed because of logic transitions
(oscillations between 0 and 1); it grows with the clock switching activity
(main source of consumption)
● Static power: consumed when the transistor is not active,
and due to current leaks
– Permanent power consumption unless switched off
Computer Architecture 16
Dynamic energy and power
● Dynamic energy consumed by a single transistor during a
transition (0→1 or 1→0):
● Edyn = k x CL x V²
– CL: capacitive load; related to the physical characteristics of the transistors
– V: supply voltage
– k: proportionality constant
– These are the variables faced when designing the circuit
● Since Power = Energy / time, and assuming the number of
transitions is ∝ to the clock frequency:
● Pdyn = Edyn / transition time
● And since Tclock = 1 / F
● Pdyn = Edyn x F
● Pdyn = k x CL x V² x F
● Overheating → lower V or F
Computer Architecture 17
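The Pdyn formula can be played with numerically. A minimal sketch, assuming an arbitrary proportionality constant k and a ~1 fF capacitive load (both made up; real device values differ):

```python
# Dynamic power per transistor: Pdyn = k * CL * V^2 * F.
def p_dyn(k, c_load, v, f):
    return k * c_load * v ** 2 * f

k, c_load = 1.0, 1e-15                 # illustrative constants, not real values
base = p_dyn(k, c_load, v=1.2, f=3e9)  # nominal operating point
dvfs = p_dyn(k, c_load, v=1.0, f=2e9)  # DVFS: lower V and F together
print(dvfs / base)                     # ~0.46: power more than halved
```

Because V enters squared, voltage is the big lever; that is why DVFS lowers voltage and frequency together rather than frequency alone.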
Dynamic energy and power
● If there are N active transistors in the integrated
circuit (IC):
● PdynIC = Pdyn x N = k x CL x V² x F x N
– N: number of transistors producing transitions
● If the IC is active for a time t, the energy consumed
in terms of the cycle count will be:
● EdynIC = PdynIC x t
● But t = CC / F (CC: cycle count)
● From that, we deduce that:
● EdynIC = k x CL x V² x CC x N

Computer Architecture 18
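A consequence of EdynIC = k x CL x V² x CC x N worth making explicit: the frequency cancels out of the energy of a fixed task, so slowing the clock stretches the runtime but saves no dynamic energy, while lowering the voltage does. A sketch with illustrative constants (all values below are made up):

```python
# EdynIC = k * CL * V^2 * CC * N: note that F does not appear in the formula.
def e_dyn_ic(k, c_load, v, cc, n):
    return k * c_load * v ** 2 * cc * n

k, c_load, cc, n = 1.0, 1e-15, 1e9, 1e6   # made-up constants and workload
e_full_speed = e_dyn_ic(k, c_load, v=1.2, cc=cc, n=n)
e_half_speed = e_dyn_ic(k, c_load, v=1.2, cc=cc, n=n)  # same task at F/2: same CC
e_low_volt = e_dyn_ic(k, c_load, v=1.0, cc=cc, n=n)
print(e_half_speed == e_full_speed)  # True: frequency alone saves no energy
print(e_low_volt / e_full_speed)     # ~0.69: lowering V does save energy
```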
Static energy and power
● Static power per transistor (it is fixed):
● Pst ∝ Ist x V
● Ist: static (leakage) current
● Inside the chip:
● It increases due to the rising number of transistors
● Some subsystems may be switched off when not in use
– May affect performance
● It is around 25% of the overall power consumed
– Rising up to 50% for high-performance designs
Computer Architecture 19
Trends in power and energy
● Energy consumption per unit of time (power) in
microprocessors keeps rising:
● For a single transistor it decreased (lower V and CL)
● But the number of transistors increases
● And the clock frequency also increases (though not by much:
Power wall. See next slide)
● More heat is generated, which must be dissipated
● Otherwise the chip will burn
● Power consumption is one of the main concerns of
current chip designers
Computer Architecture 20
Evolution of clock frequency in
microprocessors

Computer Architecture 21
The problem of power
consumption
● How to design the power supply system to provide
as much power as required
● How to design the cooling system to avoid
overheating
● Current chips reduce their clock frequency when they
reach a critical temperature
– The task is then scheduled accordingly (raising or lowering V)
● But efficiency is better measured with energy than
with power. Which chip is more efficient, A or B?
● Compared to B, A requires 20% more power, but takes
75% of the time to complete a task
– A consumes 90% of the energy (1.2 x 0.75 = 0.9) to complete
the same task
Computer Architecture 22
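The A-versus-B comparison on this slide is just the Energy = Power x time identity applied to relative figures; a quick numeric check:

```python
# A needs 20% more power than B but finishes in 75% of the time.
power_a, time_a = 1.2, 0.75  # both relative to B (power_b = time_b = 1.0)
energy_a = power_a * time_a
energy_b = 1.0 * 1.0
print(energy_a / energy_b)   # ~0.9: A consumes 90% of B's energy
```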
Minimizing energy consumption
● Several alternatives:
● Turn off inactive modules
– Trades static consumption against the cost of waking the module up again
● Dynamic Voltage-Frequency Scaling (DVFS)
– Used by OS schedulers to reduce power consumption
● Design for the typical case
– The system is designed to work at a certain fraction of its capacity,
leaving a margin (overprovisioning)
● Underclocking
– Raising the frequency instead (overclocking) is risky
Computer Architecture 23
Dark silicon problem
● Some parts of the IC can't be powered on for a
given TDP restriction
– We cannot feed all the parts at once: a large fraction must stay off (dark)
● Solution:
● Fill it with specialized cores that won't be used at the
same time as the general-purpose ones

[Figure: evolution of the dark silicon problem with the arrival of multicores]
Computer Architecture 24
Trends in architecture
● Cannot continue to leverage Instruction-Level Parallelism (ILP)
– Implicit parallelism: steps that can run in parallel; compilers and
architectures are able to exploit it automatically
● Single processor performance improvement ended in 2003
● New models for performance (explicit parallelism, up to the programmer):
● Data-level parallelism (DLP), e.g. in GPUs
● Thread-level parallelism (TLP)
● Request-level parallelism (RLP), e.g. in the cloud
● These require explicit restructuring of the application
● ILP was automatically handled by the compiler
● Parallel programming requires the programmer to be aware of the architecture
– Granularity of the parallelism
– Memory hierarchy bandwidth and latencies
– Data structures and their behavior with respect to caches
Computer Architecture 25
Trends in architecture:
Ex: CPUs evolution

● 1982 Intel 80286
● 12.5 MHz
● 2 MIPS (peak)
● Latency 320 ns
● 134,000 xtors, 47 mm²
● 16-bit data bus, 68 pins
● Microcode interpreter, separate FPU chip
● (no caches)

● 2001 Intel Pentium 4
● 1500 MHz (120X)
● 4500 MIPS (peak) (2250X)
● Latency 15 ns (20X)
● 42,000,000 xtors, 217 mm²
● 64-bit data bus, 423 pins
● 3-way superscalar, dynamic translate to RISC,
superpipelined (22 stages), out-of-order execution
● On-chip 8KB data cache, 96KB instr. trace cache, 256KB L2 cache
Computer Architecture 26
Design complexity
● CPI evolution
● 80s: 5.0 → 1.15 (the decade of pipelining)
● 90s: 1.15 → 0.5 (the decade of superscalar)
– A proportional reduction factor, never exact: it is limited by the
fraction of parallelism in the program
● 2000s: core CPI unchanged; chip CPI scales with the number of cores
– Superscalar cores raise performance but consume more: they are more
aggressive and discard mis-speculated instructions
Computer Architecture 27
The End of Uni-processors
● In the early 2000s we got to a point where increasing
ILP and clock frequency resulted in more problems than
benefits
● Consequence:
● Around 2005 Intel and other manufacturers switched to
multicore as the way to increase performance
– Integrating several identical processors in the same chip vs a
bigger and more complex unicore is:
● Simpler to design and verify → cheaper
● More energy efficient (no clock increase and selective powering of only
the cores in use: more degrees of freedom)
● The birth of multi-cores is the most important landmark in
computer architecture since pipelining and ILP

Computer Architecture 28
The Context Changes
● The arrival of multicore can be seen as the response to new
big challenges
● With respect to consumption
● Before: Energy cheap / transistors expensive
● Now: “Power Wall”. Energy expensive / transistors very cheap
– A chip can integrate more transistors than can be powered
● With respect to parallelism
● Before: Enough with ILP extraction via compilers (out-of-order,
speculation, VLIW, …)
● Now: “ILP Wall” Hw complexity avoids improvements in the
parallelism degree

Computer Architecture 29
The Context Changes II
● With respect to memory:
● Before: Multiplications slow / access to memory fast
● Now: “Memory Wall” memory very slow / multiplications very fast
● Processor performance:
● Before: improvement 2x/1.5 years
● Now:
– Improvement 3% per year
– Improvement 2x processors per chip/ ~ 2 years
● With respect to complexity:
● Before: the design and verification of increasingly large cores was very
expensive ("Complexity Wall")
– 100s of engineers for 3-5 years
– And caches are easy to design, but locality is limited
● Now:
– 2x cores/chip with each CMOS generation
– It doesn’t compromise clock frequency

Computer Architecture 30
Superscalar vs Multicore
● Typical architectures
– Multicore: interconnection through buses; the work must be orchestrated,
describing how the different processors cooperate

[Figure: OOO Superscalar vs Multicore]

Computer Architecture 31
Trends in architecture
● Nowadays all computers are parallel
● A parallel computer is a system where several
processors or computers collaborate to
solve a problem
● Only those where parallelism is visible to the eyes of the
programmer are considered parallel computers
● Even older computers include some kind of parallelism:
– There is parallelism whenever, at least during some instants, several
computing events happen at the same time

Computer Architecture 32
Trends in architecture
● During the last decades of the 20th century, performance
improvement relied on the increase in ILP
● All current processors include ILP techniques (studied
during the course)
● Now we should outline:
● MIMD architectures
– Task parallelism
● Vector architectures & graphics processing units (GPU
[Graphics Processing Unit], GPGPU [General-Purpose GPU])
– They exploit data parallelism

Computer Architecture 33
Trends in architecture
● Combination of multi-core, GPU and other specific
processors results in heterogeneous systems:
● Multi-cores deal with task parallelism
● GPUs deal with data parallelism (SIMD)
● Can be integrated in a single chip
– Technology has led to Systems-on-Chip

Computer Architecture 34
Hybrid MIMD

● The combination of several multi-core
systems results in hybrid MIMD
– Task-level parallelism: the tasks do not collaborate with each other,
which adapts well to multicore
● Groups of processors share a common memory
● Communication between processors in different groups
takes place through the bus
– Two clusters of processors are combined; the system switches between one
and the other depending on the configuration and on how it adapts to the problem

Example:
ARM big.LITTLE architecture
Computer Architecture 35
Specific processors

● CPUs with specialized functional units
and instruction sets for image processing,
neural networks, …
● A way to keep improving the
power-performance-cost triad
– Very diverse applications lead to a heterogeneous architecture:
a collection of optimized elements

[Figure: SoC including a Neural Processing Unit]
Computer Architecture 36
Classes of Computers

● PMD (Personal Mobile Devices):
● e.g. smart phones, tablet computers, wearables
● Clusters / Warehouse Scale Computers
● Used for "Software as a Service (SaaS)"
– e.g. Amazon offers these services on a pay-per-use basis
● Sub-class: Supercomputers; emphasis: floating-point
performance and fast internal networks
Computer Architecture 37
Review: Computing
performance
● Latency & Throughput:
● Throughput or bandwidth:
– Amount of work performed per unit of time
● Latency or response time (wall-clock time):
– Time passed between the start and the end of an event
● If the event is the execution of a program: running time
● It is not CPU time: CPU time includes neither stalls nor idle times
Computer Architecture 38
CPU performance
● Measured as the inverse of the CPU execution time
● CPU time has two components:
● CPU user time: devoted to the user program
● CPU system time: devoted to OS tasks related to the program
● CPU time equations:
– TCPU = CC x T = IC x CPI x T = IC x CPI / F
● CC: total number of clock cycles
● IC: number of instructions executed
● CPI: average number of clock cycles per instruction
● T: clock period
● F: clock frequency
Computer Architecture 39
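The TCPU equation can be exercised with made-up numbers (the IC, CPI and F values below are illustrative, not from the slides):

```python
# TCPU = IC * CPI / F
ic = 2_000_000_000   # instructions executed
cpi = 1.5            # average clock cycles per instruction
f = 3e9              # clock frequency: 3 GHz
t_cpu = ic * cpi / f
print(t_cpu)         # 1.0 second
```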
Amdahl's law
● What is the global improvement when executing a
certain task if a part of that task has been
accelerated?
● Data:
– To: original time to perform the task
– F: fraction of the task that gets accelerated
– Gp: partial gain (acceleration rate)
– Note: the solutions form a cloud of points that depend on the
performance figures
● We want to compute Ti, the time of the improved version,
and the relationship between both of them

Exercise
● Processor A has 2 cores. Processor B has 4 cores and 70% more transistors
(so power consumption matters), and B's clock period is 20% longer than A's
(TclkB = 1.2 x TclkA), i.e. B's frequency is lower than A's.
● 70% of the runtime can be distributed among the cores (with a
synchronization point); the rest corresponds to the master processor of
the system.
● What is the global gain of B with respect to A? Plain Amdahl's law is not
enough: two factors affect the time (B's clock frequency is different).
● Cycle counts: CCA = (1 – 0.7) x CC + (0.7/2) x CC = 0.65 x CC;
CCB = (1 – 0.7) x CC + (0.7/4) x CC = 0.475 x CC
● TA = CCA x TclkA; TB = CCB x TclkB = CCB x 1.2 x TclkA
● Gain = TA / TB = 0.65 / (0.475 x 1.2) ≈ 1.14, so B is about 14% faster than A
Computer Architecture 40
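The exercise on this slide (A with 2 cores versus B with 4 cores, a 70% parallelizable cycle count, and a 20% longer clock period for B) can be checked numerically:

```python
# Normalized cycle counts after distributing 70% of the work over the cores.
cc_a = (1 - 0.7) + 0.7 / 2   # A, 2 cores: 0.65 of the original cycles
cc_b = (1 - 0.7) + 0.7 / 4   # B, 4 cores: 0.475 of the original cycles
t_a = cc_a * 1.0             # A's clock period normalized to 1
t_b = cc_b * 1.2             # B's clock period is 20% longer
gain = t_a / t_b
print(round(gain, 2))        # 1.14: B is about 14% faster than A
```

Working in cycle counts first is what makes the different clock periods easy to fold in; plain Amdahl's law alone would miss that factor.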
Amdahl's law
● How do we compare Ti versus To?
● Dividing To/Ti which represents the global gain
● Ti = F · To / Gp + (1 – F) · To
● Gg = To / Ti = 1 / (1-F + F / Gp)
● But:
● The law only applies if the accelerated and non-accelerated
parts don't overlap
● There's a limit on the global gain, no matter what the partial
gain is: 1 / (1 – F)
● We should focus the improvement on the bottlenecks

Computer Architecture 41
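The global-gain formula and its 1 / (1 – F) ceiling can be sketched directly; the F and Gp values below are arbitrary examples:

```python
# Amdahl's law: Gg = 1 / ((1 - F) + F / Gp); limit 1 / (1 - F) as Gp grows.
def global_gain(f, gp):
    return 1.0 / ((1.0 - f) + f / gp)

print(round(global_gain(0.7, 2), 3))    # 1.538: doubling speed of 70% of the task
print(round(global_gain(0.7, 1e9), 3))  # 3.333: capped near 1 / (1 - 0.7)
```

No matter how large the partial gain gets, the serial 30% bounds the speedup, which is why the slide says to focus improvements on the bottlenecks.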
