Week 7

This document provides an outline for a lecture on parallelizing applications and performance aspects. It discusses key topics like automatic vs manual parallelization, understanding the problem and program, partitioning, communications, synchronization, data dependencies, load balancing, granularity, and limits of parallel programming. It also defines parallel computing terminology, describes Amdahl's law on speedup limits, and discusses other performance metrics.


Parallelizing Applications and Performance Aspects

(CS 526)

Muhammad Awais,

Department of Computer Science,


The University of Lahore
Lecture Outline
• Designing Parallel Programs
– Automatic vs. Manual Parallelization
– Understand the Problem and the Program
– Partitioning the Problem
– Communications
– Synchronization
– Data Dependencies
– Load Balancing
– Granularity
– Limits and Costs of Parallel Programming
• Performance Analysis
What is Parallel Computing? (1)
• Traditionally, software has been written for serial
computation:
– To be run on a single computer having a single Central
Processing Unit (CPU);
– A problem is broken into a discrete series of
instructions.
– Instructions are executed one after another.
– Only one instruction may execute at any moment in
time.
What is Parallel Computing? (2)
• In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem
– To be run using multiple CPUs
– A problem is broken into discrete parts that can be solved
concurrently
– Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on
different CPUs (a minimal sketch follows below)
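As a concrete illustration, here is a minimal sketch of this decomposition in Python, assuming only the standard multiprocessing module; the part() helper and the four-way split are illustrative choices, not part of the original slides.

```python
# Minimal sketch: break a problem into discrete parts, solve them concurrently.
import multiprocessing as mp

def part(chunk):
    # Each discrete part is itself a series of instructions,
    # executed serially inside one worker process.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    problem = list(range(1_000_000))
    # Break the problem into parts that can be solved concurrently.
    parts = [problem[i::4] for i in range(4)]
    with mp.Pool(processes=4) as pool:
        # The parts execute simultaneously on different CPUs.
        results = pool.map(part, parts)
    print(sum(results))
```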
Some General Parallel Terminologies
• Task
– A logically discrete section of computational work.
– A task is typically a program or program-like set of
instructions that is executed by a processor.

• Parallel Task
– A task that can be executed by multiple processors safely
(producing correct results)

• Serial Execution
– Execution of a program sequentially, one statement at a
time.
– In the simplest sense, this is what happens on a one
processor machine.
Some General Parallel Terminologies
• Parallel Execution
– Execution of a program by more than one task (threads)
– Each task being able to execute the same or different
statement at the same moment in time.

• Shared Memory
– where all processors have direct (usually bus based) access
to common physical memory
– In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory

• Distributed Memory
– Network based memory access for physical memory that is
not common.
– Tasks can only logically "see" local machine memory and
must use communications to access memory on other
machines.
Some General Parallel Terminologies
• Communications
– Parallel tasks typically need to exchange data. This can be
accomplished through shared memory or over a network.
– However, the actual event of data exchange is commonly
referred to as communications, regardless of the method
employed.

• Synchronization
– The coordination of parallel tasks in real time, very often
associated with communications
– Often implemented by establishing a synchronization point
within an application where a task may not proceed further
until another task (or tasks) reaches the same or a logically
equivalent point (a barrier sketch follows below).
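As a rough illustration of such a synchronization point, here is a minimal sketch using a barrier from Python's multiprocessing module; the worker function and its two "phases" are hypothetical.

```python
import multiprocessing as mp

def worker(barrier, rank):
    # ... phase 1: this task's share of the computation ...
    print(f"task {rank}: reached the synchronization point")
    # No task proceeds past this point until every task has reached it.
    barrier.wait()
    # ... phase 2: may now safely rely on all phase-1 results ...

if __name__ == "__main__":
    n_tasks = 4
    barrier = mp.Barrier(n_tasks)
    procs = [mp.Process(target=worker, args=(barrier, r)) for r in range(n_tasks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```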
Some General Parallel Terminologies
• Granularity
– In parallel computing, granularity is a measure of the ratio
of computation to communication.
– Coarse: relatively large amount of computational work are
done between communication events
– Fine: relatively small amounts of computational work are
done between communication events

• Observed Speedup
– Observed speedup of a code which has been parallelized:

    speedup = wall-clock time of serial execution / wall-clock time of parallel execution

– One of the simplest and most widely used indicators of a
parallel program's performance (a timing sketch follows below).
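A minimal timing sketch of this ratio, assuming Python's time and multiprocessing modules; the work() function and the chunk count are illustrative only.

```python
import time
import multiprocessing as mp

def work(chunk):
    return sum(i * i for i in chunk)   # stand-in for real computational work

if __name__ == "__main__":
    data = list(range(2_000_000))
    chunks = [data[i::4] for i in range(4)]

    t0 = time.perf_counter()
    serial_result = [work(c) for c in chunks]        # serial execution
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with mp.Pool(4) as pool:                         # parallel execution
        parallel_result = pool.map(work, chunks)
    t_parallel = time.perf_counter() - t0

    assert serial_result == parallel_result
    print("observed speedup =", t_serial / t_parallel)
```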
Definitions: Granularity
• Computation / Communication Ratio:
– In parallel computing, granularity is a qualitative
measure of the ratio of computation to
communication
– Periods of computation are typically separated from
periods of communication by synchronization events.

1. Fine-grain parallelism
2. Coarse-grain parallelism
Fine-grain Parallelism
• Relatively small amounts of computational work are done between
communication events
• Low computation to communication ratio
• Implies high communication overhead and less opportunity for
performance enhancement
• If granularity is too fine it is possible that the overhead required for
communications and synchronization between tasks takes longer
than the computation.
Coarse-grain Parallelism
• Relatively large amounts of computational work are done
between communication/synchronization events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently (a chunking sketch
contrasting fine and coarse granularity follows below)
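One way to see the fine/coarse trade-off in practice is through the chunk size used when distributing work; here is a minimal sketch with Python's multiprocessing.Pool, where the chunksize values are illustrative only.

```python
import multiprocessing as mp

def work(x):
    return x * x   # one small unit of computation

if __name__ == "__main__":
    data = range(100_000)
    with mp.Pool(4) as pool:
        # Fine-grained: one element per dispatch -> many communication events,
        # low computation-to-communication ratio, high overhead.
        fine = pool.map(work, data, chunksize=1)

        # Coarse-grained: large blocks per dispatch -> few communication events,
        # high computation-to-communication ratio, coarser load balancing.
        coarse = pool.map(work, data, chunksize=10_000)
```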


Amdahl's Law
Amdahl's Law states that potential program
speedup is defined by the fraction of code (P)
that can be parallelized:

    max. speedup = 1 / (1 - P)

• If none of the code can be parallelized, P = 0 and the speedup
= 1 (no speedup). If all of the code is parallelized, P = 1 and
the speedup is infinite (in theory).

• If 50% of the code can be parallelized, maximum speedup = 2,
meaning the code will run twice as fast (a small helper sketch
follows below).
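A small helper makes this bound easy to check; the function below is an illustrative sketch, not part of the original slides.

```python
def max_speedup(P: float) -> float:
    """Amdahl upper bound with an unlimited number of processors."""
    return float("inf") if P == 1.0 else 1.0 / (1.0 - P)

print(max_speedup(0.0))   # 1.0  -> no speedup
print(max_speedup(0.5))   # 2.0  -> at most twice as fast
print(max_speedup(0.9))   # 10.0
```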
Speedup (with N CPUs or Machines)
• Introducing the number of processors performing the
parallel fraction of work, the relationship can be
modelled by:
    speedup = 1 / (fS + fP / Proc)

• where fP = parallel fraction, Proc = number of processors, and
fS = serial fraction (a small helper sketch follows below)
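A minimal sketch of this relationship, assuming fS + fP = 1; the function name is illustrative, and the calls reproduce values from the table on the next slide.

```python
def speedup(fP: float, procs: int) -> float:
    """Amdahl speedup with a finite number of processors."""
    fS = 1.0 - fP                  # serial fraction
    return 1.0 / (fS + fP / procs)

print(round(speedup(0.50, 10), 2))     # 1.82
print(round(speedup(0.90, 100), 2))    # 9.17
print(round(speedup(0.99, 1000), 2))   # 90.99
```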
Amdahl's Law
• It soon becomes obvious that there are limits to the
scalability of parallelism.

• For example, at P = .50, .90 and .99 (50%, 90% and 99% of
the code is parallelizable):

                      speedup
    N        P = .50    P = .90    P = .99
    10        1.82       5.26       9.17
    100       1.98       9.17      50.25
    1000      1.99       9.91      90.99
    10000     1.99       9.91      99.02
Amdahl's Law

• With F = serial fraction, the maximum speedup (with an
unlimited number of processors) is 1/F.
• E.g., F = 0.05 (5% serial) gives a maximum speedup of
1/0.05 = 20.
Maximum Speedup (Amdahl's Law)
Maximum speedup is usually p with p processors
(linear speedup).

It is possible to get super-linear speedup (greater than p),
but there is usually a specific reason for it, such as:
• Extra memory in the multiprocessor system
• A nondeterministic algorithm
Maximum Speedup (Amdahl's Law)
• Speedup factor:

    S(p) = ts / tp

where ts is execution time on a single processor and tp is
execution time on a multiprocessor.

• S(p) gives the increase in speed gained by using the
multiprocessor.

• For ts, use the best sequential algorithm on a single-processor
system, rather than the parallel program run with 1 processor;
the underlying algorithm for the parallel implementation might
be (and usually is) different.
Speedup Factor
• Speedup factor can also be expressed in terms of
computational steps:

    S(p) = computational steps using one processor
           / parallel computational steps with p processors

• With f the fraction of the code that is serial, Amdahl's law in
these terms becomes:

    S(p) = 1 / (f + (1 - f) / p)

• E.g., if f = 1 (all of the code is serial), then the speedup will
be 1 no matter how many processors are used.
Speedup
• However, certain problems demonstrate increased performance
by increasing the problem size. For example:
– 2D grid calculations: 85 seconds (85%)
– Serial fraction: 15 seconds (15%)

• We can increase the problem size by doubling the grid dimensions
and halving the time step. This results in four times the number of
grid points and twice the number of time steps. The timings then
look like:
– 2D grid calculations: 680 seconds (97.84%)
– Serial fraction: 15 seconds (2.16%)

• Problems that increase the percentage of parallel time with their
size are more scalable than problems with a fixed percentage of
parallel time (a worked check of these numbers follows below).
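A quick check of this arithmetic; the scaling factors (4x grid points, 2x time steps) come straight from the example above.

```python
parallel_small, serial = 85.0, 15.0        # seconds
parallel_large = parallel_small * 4 * 2    # 680 seconds after scaling

for par in (parallel_small, parallel_large):
    total = par + serial
    print(f"parallel {100 * par / total:.2f}%   serial {100 * serial / total:.2f}%")
# parallel 85.00%   serial 15.00%
# parallel 97.84%   serial 2.16%
```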
Why use parallel processing?
▪ Save time: wall clock time
▪ Solve larger problems: increased extensibility
and configurability
▪ Possibly better fault tolerance; can take advantage of
non-local resources
▪ Cost savings
▪ Overcoming memory constraints
▪ Scientific interest
Other Metrics for Performance Evaluation
Some General Parallel Terminologies
• Parallel Overhead
– The amount of time required to coordinate parallel tasks, as
opposed to doing useful work (a small measurement sketch
follows below). Parallel overhead can include factors such as:
• Task start-up time
• Synchronisations
• Data communications
• Software overhead imposed by parallel compilers, libraries,
tools, operating system, etc.
• Task termination time
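A rough way to observe such overhead is to time parallel tasks that do essentially no useful work; here is a minimal sketch with Python's multiprocessing, where the worker and task count are illustrative.

```python
import time
import multiprocessing as mp

def do_nothing(x):
    return x   # no useful work, so elapsed time is mostly overhead

if __name__ == "__main__":
    t0 = time.perf_counter()
    with mp.Pool(4) as pool:                # task start-up time
        pool.map(do_nothing, range(1000))   # data communication cost
    # includes task termination time as the pool shuts down
    print(f"overhead-dominated time: {time.perf_counter() - t0:.3f} s")
```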

• Massively Parallel
– Refers to the hardware that comprises a given parallel
system: one having many processors (hundreds of processors or more)
Some General Parallel Terminologies
• Scalability
– Refers to a parallel system's (hardware and/or software)
ability to demonstrate a proportionate increase in
parallel speedup with the addition of more processors.

– Factors that contribute to scalability include:
• Hardware: particularly memory-CPU bandwidths and
network communications
• Application algorithm
• Parallel overhead
• Characteristics of your specific application and coding
