
High Performance Scientific Computing
Lecture 4
S. Gopalakrishnan
Memory Issues

Memory Hierarchy
[Figure: typical memory hierarchy - the levels closer to the processor are faster but costlier per byte.]
Memory Latency Problem
Processor-DRAM Memory Performance Gap: Motivation for Memory Hierarchy

[Figure: processor vs. DRAM performance, 1980-2000. µProc performance improves ~60%/yr (2x every 1.5 years) while DRAM improves only ~5%/yr (2x every 15 years), so the processor-memory performance gap grows roughly 50% per year.]

- Notice that the data width changes between levels: register-cache transfers are ~8 B, cache-memory transfers ~32 B (a cache block), and memory-disk transfers ~4 KB (a page). Why?
- Bandwidth: the transfer rate between the various levels
  - CPU-Cache: 24 GB/s
  - Cache-Main memory: 0.5-6.4 GB/s
  - Main memory-Disk: 187 MB/s (Serial ATA/1500)

Source: ECE232 Memory Hierarchy, UMass Amherst; adapted from Computer Organization and Design, Patterson & Hennessy, UCB (Kundu, Koren).
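To make the gap concrete, a short worked example (the 3 GHz clock is an assumed figure for illustration; the ~100 ns DRAM and ~10 ms disk latencies come from the hierarchy table later in these notes):

    cycle time at 3 GHz       = 1 / (3 x 10^9 Hz)  ~ 0.33 ns
    one DRAM access (100 ns)  ~ 100 / 0.33         ~ 300 CPU cycles stalled
    one disk access (~10 ms)  ~ 10,000,000 ns      ~ 30,000,000 cycles

This is why the hierarchy exists: keeping the working set in registers and caches hides most of this latency.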
Virtual Memory and Paging

[Figure: each process's virtual memory (per process) is mapped page by page onto physical memory (RAM) or onto disk; pages belonging to another process's memory share the same physical RAM. Source: wikipedia.org]
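As a small aside (an illustrative sketch, not from the slide, and POSIX-specific), the page size that paging operates on can be queried at run time:

    #include <stdio.h>
    #include <unistd.h>   /* sysconf, _SC_PAGESIZE (POSIX) */

    int main(void)
    {
        /* One virtual-memory page is the unit the OS moves between
           RAM and disk; 4096 bytes is typical on x86 Linux, matching
           the ~4 KB memory-to-disk transfer size mentioned earlier. */
        long page = sysconf(_SC_PAGESIZE);
        printf("page size: %ld bytes\n", page);
        return 0;
    }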
Memory Hierarchy: Terminology
- Hit: the data appears in some block X in the upper level
- Hit Rate: the fraction of memory accesses found in the upper level
- Miss: the data must be retrieved from a block Y in the lower level
- Miss Rate = 1 - (Hit Rate)
- Hit Time: time to access the upper level = time to determine hit/miss + upper-level access time
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Note: Hit Time << Miss Penalty

[Figure: the processor exchanges block X with the upper level and block Y with the lower level.]

Source: ECE232 Memory Hierarchy, UMass Amherst; adapted from Computer Organization and Design, Patterson & Hennessy, UCB (Kundu, Koren).
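These terms combine into the standard average memory access time formula: AMAT = Hit Time + Miss Rate x Miss Penalty. A minimal sketch in C (the numbers are illustrative assumptions, not values from the slides):

    #include <stdio.h>

    int main(void)
    {
        /* AMAT = hit time + miss rate * miss penalty */
        double hit_time_ns     = 2.0;    /* e.g. an L1 access          */
        double miss_rate       = 0.05;   /* 5% of accesses miss        */
        double miss_penalty_ns = 100.0;  /* e.g. a DRAM access         */

        double amat = hit_time_ns + miss_rate * miss_penalty_ns;
        printf("AMAT = %.1f ns\n", amat);   /* 2.0 + 0.05*100 = 7.0 ns */
        return 0;
    }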
Current Memory Hierarchy

[Figure: on-chip processor (control, datapath, registers, L1 cache), then L2 cache, main memory, and secondary (disk) storage, shown against the same processor-DRAM performance-gap plot as before.]

                  Regs      L1 Cache   L2 Cache   Main Memory   Secondary (Disk)
    Speed (ns):   1         2          6          100           10,000,000
    Size (MB):    0.0005    0.1        1-4        1000-6000     500,000
    Cost ($/MB):  --        $10        $3         $0.01         $0.002
    Technology:   Regs      SRAM       SRAM       DRAM          Disk

- Cache - main memory: provides speed
- Main memory - disk (virtual memory): provides capacity

Source: ECE232 Memory Hierarchy, UMass Amherst; adapted from Computer Organization and Design, Patterson & Hennessy, UCB (Kundu, Koren).
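The practical consequence for scientific code is that access order matters: C stores two-dimensional arrays row by row, so sweeping along rows reuses each fetched cache block while sweeping down columns does not. A hedged sketch (the array size is chosen only so that it does not fit in cache):

    #include <stdio.h>

    #define N 4096

    static double a[N][N];   /* ~128 MB, far larger than the caches above */

    int main(void)
    {
        double sum = 0.0;

        /* Row-major sweep: consecutive j values are contiguous in memory,
           so every byte of each fetched cache block is used. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Column-major sweep: successive accesses are N*8 bytes apart,
           so nearly every access touches a new cache block (and page). */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%f\n", sum);  /* keep the compiler from removing the loops */
        return 0;
    }

Timing the two sweeps (for example with clock()) typically shows the column sweep running several times slower, purely because of the memory hierarchy.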
Introduction to Parallel Programming

Shared-Memory Processing
Each processor can access the entire data space (see the sketch after this list).
- Pros:
  - Easier to program
  - Amenable to automatic parallelism
  - Can be used to run large-memory serial programs
- Cons:
  - Expensive
  - Difficult to implement at the hardware level
  - Processor count limited by contention/coherency (currently around 512)
  - Watch out for the "NU" part of "NUMA"
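The slides do not name a shared-memory programming API; OpenMP is the usual choice, so the sketch below assumes it (compile with, for example, gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double x[N], y[N];

    int main(void)
    {
        double dot = 0.0;

        /* All threads see the same x, y and dot (the entire data space);
           the reduction clause makes the shared update of dot safe. */
        #pragma omp parallel for reduction(+:dot)
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            y[i] = 2.0;
            dot += x[i] * y[i];
        }

        printf("dot = %.0f on up to %d threads\n", dot, omp_get_max_threads());
        return 0;
    }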
Distributed-Memory Machines
- Each node in the computer has a locally addressable memory space
- The computers are connected together via some high-speed network
  - Infiniband, Myrinet, Giganet, etc.
- Pros:
  - Really large machines; size limited only by gross physical considerations:
    - Room size
    - Cable lengths (10's of meters)
    - Power/cooling capacity
    - Money!
  - Cheaper to build and run
- Cons:
  - Harder to program: data locality (see the sketch after this list)
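Distributed-memory machines are normally programmed with explicit message passing; MPI is the de facto standard (the slides do not name it), and it makes data locality explicit because every exchange between nodes is a send/receive or a collective operation. A minimal sketch (build and run with mpicc / mpirun):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes  */

        /* Each rank owns only its local value; combining values from all
           nodes requires communication over the interconnect. */
        double local = (double)rank;
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %.0f\n", size, total);

        MPI_Finalize();
        return 0;
    }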
MPPs (Massively Parallel Processors)

Distributed memory at the largest scale; often shared memory at the lower hierarchies.

- IBM BlueGene/L (LLNL)
  - 131,072 700 MHz processors
  - 256 MB of RAM per processor
  - Balanced compute speed with interconnect
- Red Storm (Sandia National Labs)
  - 12,960 dual-core 2.4 GHz Opterons
  - 4 GB of RAM per processor
  - Proprietary SeaStar interconnect
Comparison of CPU vs GPU Architecture

CPUs and GPUs reflect fundamentally different design philosophies.

[Figure: a CPU devotes most of its area to control logic and cache with a few powerful ALUs; a GPU devotes most of its area to many small ALUs with little control and cache per ALU; both attach to DRAM. Source: Prof. Wen-mei W. Hwu, UIUC; © Wen-mei W. Hwu and David Kirk/NVIDIA, Chile G2S3.]
Parallelization: GPU vs CPU Computing, an Analogy

It is more effective to deliver pizzas with a fleet of light-duty scooters than with one big truck. Similarly, for parallel tasks it is effective to use many lightweight GPU processors rather than a few heavyweight CPU cores.
GPU Performance: Performance Advantage of GPUs
- An enlarging peak-performance advantage:
  - Calculation: ~1 TFLOPS (GPU) vs. ~100 GFLOPS (CPU), i.e. roughly 1 TFLOP on a desktop
  - Memory bandwidth: 100-150 GB/s (GPU) vs. 32-64 GB/s (CPU)
- A GPU in every PC and workstation: massive volume and potential

Courtesy: John Owens. Source: top500.org
Compute Unified Device Architecture (CUDA)

- CUDA is a set of APIs (application program interfaces) for using GPUs for general-purpose computing
- Developed and released by NVIDIA Inc.; works only on NVIDIA GPU hardware
- Works on commercial GPUs as well as on specialized ones for scientific computing (Tesla)
- The CUDA compiler supports the C programming language; extensions to FORTRAN are possible
- An open alternative is OpenCL
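As a hedged illustration of the C-based model (a standard vector-add sketch, not code from the lecture; it assumes a CUDA-capable GPU and uses managed memory to stay short):

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Kernel: each GPU thread adds one element of the two vectors. */
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *a, *b, *c;
        /* Managed memory keeps the sketch short; the traditional pattern
           is cudaMalloc on the device plus explicit cudaMemcpy calls. */
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(a, b, c, n);   /* launch on the GPU */
        cudaDeviceSynchronize();

        printf("c[0] = %.1f\n", c[0]);             /* expect 3.0 */
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Compiled with nvcc, this is plain C plus the __global__ qualifier, the <<<blocks, threads>>> launch syntax, and the built-in thread/block indices; that is the sense in which the CUDA compiler supports C.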
