Week 7

This document provides an outline for a lecture on parallelizing applications and performance aspects. It discusses key topics like automatic vs manual parallelization, understanding the problem and program, partitioning, communications, synchronization, data dependencies, load balancing, granularity, and limits of parallel programming. It also defines parallel computing terminology, describes Amdahl's law on speedup limits, and discusses other performance metrics.


Parallelizing Applications and Performance Aspects

(CS 526)

Muhammad Awais,

Department of Computer Science,


The University of Lahore
Lecture Outline
• Designing Parallel Programs
– Automatic vs. Manual Parallelization
– Understand the Problem and the Program
– Partitioning the Problem
– Communications
– Synchronization
– Data Dependencies
– Load Balancing
– Granularity
– Limits and Costs of Parallel Programming
• Performance Analysis
What is Parallel Computing? (1)
• Traditionally, software has been written for serial
computation:
– To be run on a single computer having a single Central
Processing Unit (CPU);
– A problem is broken into a discrete series of
instructions.
– Instructions are executed one after another.
– Only one instruction may execute at any moment in
time.
What is Parallel Computing? (2)
• In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem
– To be run using multiple CPUs
– A problem is broken into discrete parts that can be solved
concurrently
– Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on
different CPUs (a minimal sketch follows below)
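As a concrete illustration, here is a minimal sketch of this decomposition in Python, assuming only the standard multiprocessing module; the part() helper and the four-way split are illustrative choices, not part of the original slides.

```python
# Minimal sketch: break a problem into discrete parts, solve them concurrently.
import multiprocessing as mp

def part(chunk):
    # Each discrete part is itself a series of instructions,
    # executed serially inside one worker process.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    problem = list(range(1_000_000))
    # Break the problem into parts that can be solved concurrently.
    parts = [problem[i::4] for i in range(4)]
    with mp.Pool(processes=4) as pool:
        # The parts execute simultaneously on different CPUs.
        results = pool.map(part, parts)
    print(sum(results))
```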
Some General Parallel Terminologies
• Task
– A logically discrete section of computational work.
– A task is typically a program or program-like set of
instructions that is executed by a processor.

• Parallel Task
– A task that can be executed by multiple processors safely
(producing correct results)

• Serial Execution
– Execution of a program sequentially, one statement at a
time.
– In the simplest sense, this is what happens on a one
processor machine.
Some General Parallel Terminologies
• Parallel Execution
– Execution of a program by more than one task (threads)
– Each task being able to execute the same or different
statement at the same moment in time.

• Shared Memory
– where all processors have direct (usually bus based) access
to common physical memory
– In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory

• Distributed Memory
– Network based memory access for physical memory that is
not common.
– Tasks can only logically "see" local machine memory and
must use communications to access memory on other
machines.
Some General Parallel Terminologies
• Communications
– Parallel tasks typically need to exchange data. This can be
accomplished through shared memory or over a network.
– However, the actual event of data exchange is commonly
referred to as communications, regardless of the method
employed.

• Synchronization
– The coordination of parallel tasks in real time, very often
associated with communications
– Often implemented by establishing a synchronization point
within an application where a task may not proceed further
until another task (or tasks) reaches the same or a logically
equivalent point (a barrier sketch follows below).
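As a rough illustration of such a synchronization point, here is a minimal sketch using a barrier from Python's multiprocessing module; the worker function and its two "phases" are hypothetical.

```python
import multiprocessing as mp

def worker(barrier, rank):
    # ... phase 1: this task's share of the computation ...
    print(f"task {rank}: reached the synchronization point")
    # No task proceeds past this point until every task has reached it.
    barrier.wait()
    # ... phase 2: may now safely rely on all phase-1 results ...

if __name__ == "__main__":
    n_tasks = 4
    barrier = mp.Barrier(n_tasks)
    procs = [mp.Process(target=worker, args=(barrier, r)) for r in range(n_tasks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```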
Some General Parallel Terminologies
• Granularity
– In parallel computing, granularity is a measure of the ratio
of computation to communication.
– Coarse: relatively large amount of computational work are
done between communication events
– Fine: relatively small amounts of computational work are
done between communication events

• Observed Speedup
– Observed speedup of a code which has been parallelized:

    speedup = wall-clock time of serial execution / wall-clock time of parallel execution

– One of the simplest and most widely used indicators of a
parallel program's performance (a timing sketch follows below).
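A minimal timing sketch of this ratio, assuming Python's time and multiprocessing modules; the work() function and the chunk count are illustrative only.

```python
import time
import multiprocessing as mp

def work(chunk):
    return sum(i * i for i in chunk)   # stand-in for real computational work

if __name__ == "__main__":
    data = list(range(2_000_000))
    chunks = [data[i::4] for i in range(4)]

    t0 = time.perf_counter()
    serial_result = [work(c) for c in chunks]        # serial execution
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with mp.Pool(4) as pool:                         # parallel execution
        parallel_result = pool.map(work, chunks)
    t_parallel = time.perf_counter() - t0

    assert serial_result == parallel_result
    print("observed speedup =", t_serial / t_parallel)
```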
Definitions: Granularity
• Computation / Communication Ratio:
– In parallel computing, granularity is a qualitative
measure of the ratio of computation to
communication
– Periods of computation are typically separated from
periods of communication by synchronization events.

1. Fine-grain parallelism
2. Coarse-grain parallelism
Fine-grain Parallelism
• Relatively small amounts of computational work are done between
communication events
• Low computation to communication ratio
• Implies high communication overhead and less opportunity for
performance enhancement
• If granularity is too fine it is possible that the overhead required for
communications and synchronization between tasks takes longer
than the computation.
Coarse-grain Parallelism
• Relatively large amounts of computational work are done
between communication/synchronization events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently (a chunking sketch
contrasting fine and coarse granularity follows below)
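One way to see the fine/coarse trade-off in practice is through the chunk size used when distributing work; here is a minimal sketch with Python's multiprocessing.Pool, where the chunksize values are illustrative only.

```python
import multiprocessing as mp

def work(x):
    return x * x   # one small unit of computation

if __name__ == "__main__":
    data = range(100_000)
    with mp.Pool(4) as pool:
        # Fine-grained: one element per dispatch -> many communication events,
        # low computation-to-communication ratio, high overhead.
        fine = pool.map(work, data, chunksize=1)

        # Coarse-grained: large blocks per dispatch -> few communication events,
        # high computation-to-communication ratio, coarser load balancing.
        coarse = pool.map(work, data, chunksize=10_000)
```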


Amdahl's Law
Amdahl's Law states that potential program
speedup is defined by the fraction of code (P)
that can be parallelized:

    max. speedup = 1 / (1 - P)

• If none of the code can be parallelized, P = 0 and the speedup
= 1 (no speedup). If all of the code is parallelized, P = 1 and
the speedup is infinite (in theory).

• If 50% of the code can be parallelized, maximum speedup = 2,
meaning the code will run twice as fast (a small helper sketch
follows below).
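A small helper makes this bound easy to check; the function below is an illustrative sketch, not part of the original slides.

```python
def max_speedup(P: float) -> float:
    """Amdahl upper bound with an unlimited number of processors."""
    return float("inf") if P == 1.0 else 1.0 / (1.0 - P)

print(max_speedup(0.0))   # 1.0  -> no speedup
print(max_speedup(0.5))   # 2.0  -> at most twice as fast
print(max_speedup(0.9))   # 10.0
```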
Speedup (with N CPUs or Machines)
• Introducing the number of processors performing the
parallel fraction of work, the relationship can be
modelled by:
    speedup = 1 / (fS + fP / Proc)

• where fP = parallel fraction, Proc = number of processors, and
fS = serial fraction (a small helper sketch follows below)
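A minimal sketch of this relationship, assuming fS + fP = 1; the function name is illustrative, and the calls reproduce values from the table on the next slide.

```python
def speedup(fP: float, procs: int) -> float:
    """Amdahl speedup with a finite number of processors."""
    fS = 1.0 - fP                  # serial fraction
    return 1.0 / (fS + fP / procs)

print(round(speedup(0.50, 10), 2))     # 1.82
print(round(speedup(0.90, 100), 2))    # 9.17
print(round(speedup(0.99, 1000), 2))   # 90.99
```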
Amdahl's Law
• It soon becomes obvious that there are limits to the
scalability of parallelism.

• For example, at P = .50, .90 and .99 (50%, 90% and 99% of
the code is parallelizable):

                      speedup
    N        P = .50    P = .90    P = .99
    10        1.82       5.26       9.17
    100       1.98       9.17      50.25
    1000      1.99       9.91      90.99
    10000     1.99       9.91      99.02
Amdahl's Law

• With F = serial fraction, the maximum speedup (with an
unlimited number of processors) is 1/F.
• E.g., F = 0.05 (5% serial) gives a maximum speedup of
1/0.05 = 20.
Maximum Speedup (Amdahl's Law)
Maximum speedup is usually p with p processors
(linear speedup).

It is possible to get super-linear speedup (greater than p),
but there is usually a specific reason for it, such as:
• Extra memory in the multiprocessor system
• A nondeterministic algorithm
Maximum Speedup (Amdahl's Law)
• Speedup factor:

    S(p) = ts / tp

where ts is execution time on a single processor and tp is
execution time on a multiprocessor.

• S(p) gives the increase in speed gained by using the
multiprocessor.

• For ts, use the best sequential algorithm on a single-processor
system, rather than the parallel program run with 1 processor;
the underlying algorithm for the parallel implementation might
be (and usually is) different.
Speedup Factor
• Speedup factor can also be expressed in terms of
computational steps:

    S(p) = computational steps using one processor
           / parallel computational steps with p processors

• With f the fraction of the code that is serial, Amdahl's law in
these terms becomes:

    S(p) = 1 / (f + (1 - f) / p)

• E.g., if f = 1 (all of the code is serial), then the speedup will
be 1 no matter how many processors are used.
Speedup
• However, certain problems demonstrate increased performance
by increasing the problem size. For example:
– 2D grid calculations: 85 seconds (85%)
– Serial fraction: 15 seconds (15%)

• We can increase the problem size by doubling the grid dimensions
and halving the time step. This results in four times the number of
grid points and twice the number of time steps. The timings then
look like:
– 2D grid calculations: 680 seconds (97.84%)
– Serial fraction: 15 seconds (2.16%)

• Problems that increase the percentage of parallel time with their
size are more scalable than problems with a fixed percentage of
parallel time (a worked check of these numbers follows below).
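A quick check of this arithmetic; the scaling factors (4x grid points, 2x time steps) come straight from the example above.

```python
parallel_small, serial = 85.0, 15.0        # seconds
parallel_large = parallel_small * 4 * 2    # 680 seconds after scaling

for par in (parallel_small, parallel_large):
    total = par + serial
    print(f"parallel {100 * par / total:.2f}%   serial {100 * serial / total:.2f}%")
# parallel 85.00%   serial 15.00%
# parallel 97.84%   serial 2.16%
```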
Why use parallel processing?
▪ Save time: wall clock time
▪ Solve larger problems: increased extensibility
and configurability
▪ Possibly better fault tolerance; can take advantage of
non-local resources
▪ Cost savings
▪ Overcoming memory constraints
▪ Scientific interest
Other Metrics for Performance Evaluation
Some General Parallel Terminologies
• Parallel Overhead
– The amount of time required to coordinate parallel tasks, as
opposed to doing useful work (a small measurement sketch
follows below). Parallel overhead can include factors such as:
• Task start-up time
• Synchronisations
• Data communications
• Software overhead imposed by parallel compilers, libraries,
tools, operating system, etc.
• Task termination time
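A rough way to observe such overhead is to time parallel tasks that do essentially no useful work; here is a minimal sketch with Python's multiprocessing, where the worker and task count are illustrative.

```python
import time
import multiprocessing as mp

def do_nothing(x):
    return x   # no useful work, so elapsed time is mostly overhead

if __name__ == "__main__":
    t0 = time.perf_counter()
    with mp.Pool(4) as pool:                # task start-up time
        pool.map(do_nothing, range(1000))   # data communication cost
    # includes task termination time as the pool shuts down
    print(f"overhead-dominated time: {time.perf_counter() - t0:.3f} s")
```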

• Massively Parallel
– Refers to the hardware that comprises a given parallel
system: one having many processors (hundreds of processors or more)
Some General Parallel Terminologies
• Scalability
– Refers to a parallel system's (hardware and/or software)
ability to demonstrate a proportionate increase in
parallel speedup with the addition of more processors.

– Factors that contribute to scalability include:
• Hardware: particularly memory-CPU bandwidths and
network communications
• Application algorithm
• Parallel overhead
• Characteristics of your specific application and coding
