13 Wrapup


University of Washington

thanks to Dan Grossman for the succinct definitions

What is parallel processing?

When can we execute things in parallel?

Parallelism: use extra resources to solve a problem faster.
Concurrency: correctly and efficiently manage access to shared resources.


What is parallel processing?

• Brief introduction to the key ideas of parallel processing:
  - instruction-level parallelism
  - data-level parallelism
  - thread-level parallelism


Exploiting Parallelism

• Of the computing problems for which performance is important, many have inherent parallelism.

• Computer games
  - Graphics, physics, sound, AI, etc. can be done separately
  - Furthermore, there is often parallelism within each of these:
    - The color of each pixel on the screen can be computed independently
    - Non-contacting objects can be updated/simulated independently
    - Artificial intelligence of non-human entities can be done independently

• Search engine queries
  - Every query is independent
  - Searches are (ehm, pretty much) read-only!


Instruction-Level Parallelism
add  %r2 <- %r3, %r4
or   %r2 <- %r2, %r4
lw   %r6 <- 0(%r4)
addi %r7 <- %r6, 0x5
sub  %r8 <- %r8, %r4

Dependences?
  RAW – read after write
  WAW – write after write
  WAR – write after read

When can we reorder instructions?
When should we reorder instructions?

add  %r2 <- %r3, %r4
or   %r5 <- %r2, %r4
lw   %r6 <- 0(%r4)
sub  %r8 <- %r8, %r4
addi %r7 <- %r6, 0x5

Superscalar processors: multiple instructions executing in parallel at the *same* stage.

Take 352 to learn more.



Data Parallelism
• Consider adding together two arrays:

void array_add(int A[], int B[], int C[], int length) {
    int i;
    for (i = 0; i < length; ++i) {
        C[i] = A[i] + B[i];
    }
}

Operating on one element at a time


Data Parallelism with SIMD


• Consider adding together two arrays:

void array_add(int A[], int B[], int C[], int length) {
    int i;
    for (i = 0; i < length; ++i) {
        C[i] = A[i] + B[i];
    }
}
Operate on MULTIPLE elements with a single instruction: Single Instruction, Multiple Data (SIMD).
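
To make the idea concrete, here is a minimal sketch of array_add written with x86 SSE2 intrinsics. The instruction set, the header <emmintrin.h>, and the name array_add_simd are assumptions of this sketch; the slide does not commit to a particular SIMD extension. Each _mm_add_epi32 performs four integer additions at once.

#include <emmintrin.h>   /* SSE2 intrinsics (assumed; not specified by the slide) */

/* Hypothetical SIMD version of array_add: 4 ints per instruction. */
void array_add_simd(int A[], int B[], int C[], int length) {
    int i;
    for (i = 0; i + 4 <= length; i += 4) {
        __m128i a = _mm_loadu_si128((__m128i *)&A[i]);  /* load 4 ints from A */
        __m128i b = _mm_loadu_si128((__m128i *)&B[i]);  /* load 4 ints from B */
        __m128i c = _mm_add_epi32(a, b);                /* 4 additions at once */
        _mm_storeu_si128((__m128i *)&C[i], c);          /* store 4 results */
    }
    for (; i < length; ++i) {                           /* scalar leftover elements */
        C[i] = A[i] + B[i];
    }
}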


Is it always that easy?


• Not always… a more challenging example:

unsigned sum_array(unsigned *array, int length) {
    unsigned total = 0;
    for (int i = 0; i < length; ++i) {
        total += array[i];
    }
    return total;
}

• Is there parallelism here?
• Each loop iteration uses data from the previous iteration.


Restructure the code for SIMD…


// one option...
unsigned sum_array2(unsigned *array, int length) {
    unsigned total, i;
    unsigned temp[4] = {0, 0, 0, 0};
    // chunks of 4 at a time
    for (i = 0; i < (length & ~0x3); i += 4) {
        temp[0] += array[i];
        temp[1] += array[i+1];
        temp[2] += array[i+2];
        temp[3] += array[i+3];
    }
    // add the 4 sub-totals
    total = temp[0] + temp[1] + temp[2] + temp[3];
    // add the non-4-aligned parts
    for ( ; i < length; ++i) {
        total += array[i];
    }
    return total;
}
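
The four sub-totals in temp[] map directly onto the lanes of one SIMD register. Purely as an illustration (the slide stops at the scalar restructuring), an SSE2 intrinsics version might look like the following; the name sum_array_simd and the use of <emmintrin.h> are assumptions of this sketch.

#include <emmintrin.h>   /* SSE2 intrinsics (assumed) */

unsigned sum_array_simd(unsigned *array, int length) {
    __m128i totals = _mm_setzero_si128();               /* {0, 0, 0, 0} */
    int i;
    for (i = 0; i + 4 <= length; i += 4) {
        __m128i chunk = _mm_loadu_si128((__m128i *)&array[i]);
        totals = _mm_add_epi32(totals, chunk);          /* 4 independent sub-totals */
    }
    unsigned temp[4];
    _mm_storeu_si128((__m128i *)temp, totals);          /* spill the 4 sub-totals */
    unsigned total = temp[0] + temp[1] + temp[2] + temp[3];
    for (; i < length; ++i) {                           /* non-multiple-of-4 tail */
        total += array[i];
    }
    return total;
}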

What are threads?


• Independent “thread of control” within a process
• Like multiple processes within one process, but sharing the same virtual address space
  - Each thread has its own logical control flow, program counter, and stack
  - Shared virtual address space: all threads in a process use the same virtual address space
• Lighter-weight than processes
  - faster context switching
  - the system can support more threads
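
As a minimal illustration (not part of the slides), the POSIX threads program below creates two threads inside one process. Both can touch the same global variable because they share the address space; the update is left unsynchronized on purpose, which is exactly the kind of shared access the concurrency slides warn about later.

#include <pthread.h>
#include <stdio.h>

int shared = 0;                                /* one copy, visible to every thread */

void *worker(void *arg) {
    shared += 1;                               /* unsynchronized shared write (a data race) */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);   /* two threads, same address space */
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);           /* usually 2, but not guaranteed */
    return 0;
}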


Thread-level parallelism: Multicore Processors


• Two (or more) complete processors, fabricated on the same silicon chip
• Execute instructions from two (or more) programs/threads at the same time

(figure: die photo of an IBM Power5 chip with two cores, #1 and #2)


Multicores are everywhere. (circa 2013)


• Laptops, desktops, servers
  - Most any machine from the past few years has at least 2 cores
• Game consoles:
  - Xbox 360: 3 PowerPC cores; Xbox One: 8 AMD cores
  - PS3: 9 Cell cores (1 master; 8 special SIMD cores); PS4: 8 custom AMD x86-64 cores
  - Wii U: 2 Power cores
• Smartphones
  - iPhone 4S, 5: dual-core ARM CPUs
  - Galaxy S II, III, IV: dual-core ARM or Snapdragon
  - …


Why Multicores Now?


• The number of transistors we can put on a chip keeps growing exponentially…
• But performance is no longer growing along with transistor count.
• So let's use those transistors to add more cores and do more at once…


As programmers, do we care?
• What happens if we run this program on a multicore?

void array_add(int A[], int B[], int C[], int length) {
    int i;
    for (i = 0; i < length; ++i) {
        C[i] = A[i] + B[i];
    }
}



What if we want one program to run on multiple processors (cores)?

• We have to explicitly tell the machine exactly how to do this
  - This is called parallel programming or concurrent programming
• There are many parallel/concurrent programming models
  - We will look at a relatively simple one: fork-join parallelism (sketched below)
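
To make fork-join concrete, here is a sketch of array_add split across POSIX threads. The thread count NTHREADS, the chunk struct, and the helper names are inventions of this sketch; the slides do not prescribe a particular threading API.

#include <pthread.h>

#define NTHREADS 4                       /* hypothetical thread count */

struct chunk { int *A, *B, *C; int lo, hi; };

static void *add_chunk(void *arg) {      /* each thread adds its own slice */
    struct chunk *ch = arg;
    for (int i = ch->lo; i < ch->hi; ++i)
        ch->C[i] = ch->A[i] + ch->B[i];
    return NULL;
}

void array_add_parallel(int A[], int B[], int C[], int length) {
    pthread_t tid[NTHREADS];
    struct chunk ch[NTHREADS];
    for (int t = 0; t < NTHREADS; ++t) { /* fork: one thread per slice */
        ch[t].A = A;  ch[t].B = B;  ch[t].C = C;
        ch[t].lo = t * length / NTHREADS;
        ch[t].hi = (t + 1) * length / NTHREADS;
        pthread_create(&tid[t], NULL, add_chunk, &ch[t]);
    }
    for (int t = 0; t < NTHREADS; ++t)   /* join: wait for every slice */
        pthread_join(tid[t], NULL);
}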


How does this help performance?

• Parallel speedup measures improvement from parallelization:

    speedup(p) = (time for best serial version) / (time for version with p processors)

• What can we realistically expect?


Reason #1: Amdahl’s Law


• In general, the whole computation is not (easily) parallelizable
• Serial regions limit the potential parallel speedup

(figure: an execution timeline with its serial regions highlighted)


Reason #1: Amdahl’s Law


• Suppose a program takes 1 unit of time to execute serially
• A fraction of the program, s, is inherently serial (unparallelizable)

    New execution time = (1 - s)/p + s

• For example, consider a program that, when executing on one processor, spends 10% of its time in a non-parallelizable region. How much faster will this program run on a 3-processor system?

    New execution time = 0.9T/3 + 0.1T = 0.4T        Speedup = 1T / 0.4T = 2.5

• What is the maximum speedup from parallelization?
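
Written out in full (the closed form below is not printed on the slide, but it follows directly from the definitions above):

    \text{speedup}(p) = \frac{1}{s + \frac{1-s}{p}}, \qquad
    \text{speedup}(3) = \frac{1}{0.1 + 0.9/3} = 2.5, \qquad
    \lim_{p \to \infty} \text{speedup}(p) = \frac{1}{s} = 10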


Reason #2: Overhead

• Forking and joining is not instantaneous
  - Involves communicating between processors
  - May involve calls into the operating system
• The overhead depends on the implementation

    New execution time = (1 - s)/p + s + overhead(p)


Multicore: what should worry us?


• Concurrency: what if we're sharing resources, memory, etc.?
• Cache Coherence
  - What if two cores have the same data in their own caches? How do we keep those copies in sync?
• Memory Consistency, Ordering, Interleaving, Synchronization…
  - With multiple cores, we can have truly concurrent execution of threads. In what order do their memory accesses appear to happen? Do the orders seen by different cores/threads agree?
• Concurrency Bugs
  - When it all goes wrong…
  - Hard to reproduce, hard to debug
  - http://cacm.acm.org/magazines/2012/2/145414-you-dont-know-jack-about-shared-variables-or-memory-models/fulltext

Summary
• Multicore: more than one processor on the same chip
  - Almost all devices now have multicore processors
  - Results from Moore's Law and the power constraint
• Exploiting multicore requires parallel programming
  - Automatically extracting parallelism is, in general, too hard for the compiler
  - But the compiler can do much of the bookkeeping for us (see the OpenMP sketch below)
• Fork-join model of parallelism
  - At a parallel region, fork a bunch of threads, do the work in parallel, and then join, continuing with just one thread
  - Expect a speedup of less than P on P processors
  - Amdahl's Law: speedup is limited by the serial portion of the program
  - Overhead: forking and joining are not free
• Take 332, 352, 451 to learn more!
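
As an aside on letting the compiler do the bookkeeping: with OpenMP, one common fork-join implementation that the slides do not cover, the parallel array_add reduces to a single pragma and the compiler generates the forking, work splitting, and joining. A sketch (assumes a compiler flag such as -fopenmp with gcc/clang):

/* A hypothetical OpenMP version of array_add; compile with -fopenmp. */
void array_add_omp(int A[], int B[], int C[], int length) {
    #pragma omp parallel for   /* fork a team of threads, split the iterations, join at the end */
    for (int i = 0; i < length; ++i) {
        C[i] = A[i] + B[i];
    }
}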

