High Performance Scientific Computing: S. Gopalakrishnan
Lecture 4
Memory Issues
Memory hierarchy

Typical hierarchy, from faster and costlier (per byte) at the top to slower and cheaper at the bottom: registers, cache, main memory, disk.
Memory Latency Problem

Cache and virtual memory bridge adjacent levels of the hierarchy: data moves between CPU and cache in words (~8 B), between cache and main memory in blocks (~32 B), and between main memory and disk in pages (~4 KB).

Processor-DRAM Memory Performance Gap: the motivation for the memory hierarchy.

[Figure: performance (log scale) vs. time, 1980-2000. µProc performance grows ~60%/yr (2X every 1.5 years), while DRAM performance grows only ~5%/yr (2X every 15 years), so the processor-memory performance gap widens steadily.]

! Notice that the data width is changing (grows 50% / year)
• Why?
! Bandwidth: transfer rate between the various levels
• CPU-Cache: 24 GBps
• Cache-Main: 0.5-6.4 GBps
• Main-Disk: 187 MBps (Serial ATA/1500)

Source: ECE232 (Memory Hierarchy), UMass-Amherst; adapted from Computer Organization and Design, Patterson & Hennessy, UCB; Kundu, Koren, UMass.
Virtual Memory and Paging

[Figure: each process's virtual memory (a per-process address space) is mapped page-by-page onto physical memory (RAM); another process's memory occupies other physical pages, and non-resident pages live on disk.]

Technology at each level of the hierarchy: registers, SRAM (L1 and L2 caches), DRAM (main memory), disk.
Introduction to Parallel Programming
Shared-Memory Processing

Each processor can access the entire data space.
– Pros
• Easier to program
• Amenable to automatic parallelism
• Can be used to run large-memory serial programs
– Cons
• Expensive
• Difficult to implement at the hardware level
• Processor count limited by contention/coherency (currently around 512)
• Watch out for the "NU" part of "NUMA"
Distributed-Memory Machines

! Each node in the computer has a locally addressable memory space
! The computers are connected together via some high-speed network
– Infiniband, Myrinet, Giganet, etc.
• Pros
– Really large machines
– Size limited only by gross physical considerations:
• Room size
• Cable lengths (10's of meters)
• Power/cooling capacity
• Money!
– Cheaper to build and run
• Cons
– Harder to program
– Data locality
MPPs (Massively Parallel Processors)

Distributed memory at the largest scale, often with shared memory at lower levels of the hierarchy.
• IBM BlueGene/L (LLNL)
– 131,072 700 MHz processors
– 256 MB of RAM per processor
– Balanced compute speed with interconnect
• Red Storm (Sandia National Labs)
– 12,960 dual-core 2.4 GHz Opterons
– 4 GB of RAM per processor
– Proprietary SeaStar interconnect
Comparison of CPU vs. GPU Architecture

The two reflect fundamentally different design philosophies.

[Figure: the CPU devotes much of its die to control logic and cache alongside a few ALUs; the GPU devotes most of its die to many ALUs with minimal control and cache. Each is backed by its own DRAM.]