PDC Lecture 02

The document discusses modern classifications of parallelism in computer architectures, including data, function, and control parallelism, which help in understanding advanced architecture designs. It also covers performance metrics such as throughput, latency, scalability, and speedup, along with the importance of efficiency and load balancing in parallel and distributed systems. Additionally, it explains parallel computing and programming concepts, computation graphs, scheduling algorithms, and the principles of speedup and scalability in relation to processor utilization.


CS-402 Parallel and Distributed Systems

Fall 2024

Lecture No. 02
Modern classification (Sima, Fountain, Kacsuk)

 The modern classification proposed by Sima, Fountain, and Kacsuk focuses on how
parallelism is achieved in computer architectures. Here are the key categories:

 Data Parallelism: In this approach, the same function operates on many data elements simultaneously. It
emphasizes parallelism at the data level.
 Function Parallelism: Multiple functions are performed in parallel. This category emphasizes parallelism at
the functional level.
 Control Parallelism: Different tasks are executed concurrently based on the program's control flow; together with task parallelism, it is a form of function parallelism distinguished by the level at which the parallelism appears.

These classifications help us understand the design spaces of advanced architectures.


Modern classification (Sima, Fountain, Kacsuk)

 Based on how parallelism is achieved


o Data parallelism: same function operating on many data
o Function parallelism: performing many functions in parallel
 Control parallelism or task parallelism, depending on the level of the functional parallelism.

Parallel architectures (classification tree):

 Data-parallel architectures
 Function-parallel architectures
  o Instruction-level parallel architectures (ILPs): pipelined, VLIW, and superscalar processors
  o Thread-level parallel architectures
  o Process-level parallel architectures (MIMDs): shared-memory MIMD and distributed-memory MIMD
Performance
In the context of parallel and distributed systems, performance refers to how efficiently a system
or application accomplishes its tasks. Let’s delve into the details:

1. Throughput: This measures the rate at which a system processes tasks or transactions. It
indicates how many tasks can be completed per unit of time. High throughput is desirable for
systems handling large workloads.
2. Latency: Latency is the time delay between initiating an action and receiving a response. In
parallel and distributed systems, minimizing latency is crucial for responsiveness. Examples
include network latency (time for data to travel between nodes) and memory access latency.
3. Scalability: Scalability assesses how well a system can handle increased load. It can be vertical
(adding more resources to a single node) or horizontal (adding more nodes). Good scalability
ensures consistent performance as the workload grows.
4. Speedup: Speedup measures the improvement achieved by parallelizing a task. It’s the ratio of
the execution time on a single processor (sequential) to the execution time on multiple
processors (parallel). Ideally, speedup should be close to the number of processors used.
Performance
Efficiency: Efficiency quantifies how effectively resources are utilized in a parallel system. It’s
calculated as the speedup divided by the number of processors. High efficiency means minimal
resource wastage.
Load Balancing: In distributed systems, load balancing ensures that tasks are evenly distributed
among nodes. Balanced loads prevent bottlenecks and maximize system performance.
Overhead: Overhead refers to additional computational costs incurred due to parallelization.
Examples include communication overhead (data exchange between nodes) and synchronization
overhead (ensuring consistency).
Remember that achieving optimal performance involves trade-offs, considering factors like
communication costs, synchronization, and hardware limitations.
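A quick worked example (numbers assumed for illustration): if a job takes 60 seconds on one processor and 10 seconds on 8 processors, the speedup is 60 / 10 = 6 and the efficiency is 6 / 8 = 0.75; the missing 25% is typically lost to communication and synchronization overhead and to imperfect load balancing.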
Performance

 Time and performance: Machine A is n times faster than Machine B if and only if

   Time(B) / Time(A) = n

 Program execution time = Instruction count × CPI (cycles per instruction) × cycle time

 Clock rate × (1 / CPI) can be approximated as instructions per second (IPS).
  o In a system with variable instruction cycles, IPS is program dependent – this metric can be misleading, and not very useful.
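As a worked example (assumed numbers): a program with an instruction count of 2 × 10^9, an average CPI of 1.5, and a 2 GHz clock (cycle time 0.5 ns) takes 2 × 10^9 × 1.5 × 0.5 ns = 1.5 seconds to execute.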
Performance

 MIPS (million instructions per second) – vendors sometimes report this value as an indication of computing speed. It is not a good performance metric.
 MFLOPS (million floating-point operations per second): This is more meaningful when it is measured over the time for completing a complex task.

   FLOPS = FP operations in the program / execution time

 For a CPU (or GPU), vendors compute the peak value with a formula of the form

   Peak FLOPS = sockets × cores per socket × clock frequency × FLOPs per cycle
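For instance, for a hypothetical CPU with 1 socket, 8 cores, a 3.0 GHz clock, and 16 FLOPs per cycle per core, the formula gives 1 × 8 × 3.0 × 10^9 × 16 ≈ 384 GFLOPS of peak performance.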
Performance Evaluation
The primary attributes used to measure the performance of a computer system are as follows.
Cycle time (T): The unit of time for all the operations of a computer system. It is the inverse of the clock rate (1/f). The cycle time is typically expressed in nanoseconds.
Cycles Per Instruction (CPI): Different instructions take different numbers of cycles to execute. CPI is the (average) number of cycles per instruction.
Instruction count (Ic): The number of instructions in a program. If we assume that all instructions take the same number of cycles, then the total execution time of a program = number of instructions in the program × number of cycles required by one instruction × time of one cycle.

Hence, execution time = Ic × CPI × T seconds. In practice, the clock frequency of the system is specified in MHz (or GHz), and processor speed is often quoted in millions of instructions per second (MIPS).
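For example (assumed values): with a clock frequency f = 500 MHz and an average CPI of 2, the processor executes f / CPI = 250 × 10^6 instructions per second, i.e. it is rated at 250 MIPS.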
FLOPS UNITS
Units                Order    Comments
KFLOPS (kiloFLOPS)   10^3
MFLOPS (megaFLOPS)   10^6
GFLOPS (gigaFLOPS)   10^9     Intel i9-9900K CPU at 98.88 GFLOPS, AMD Ryzen 9 3950X at 170.56 GFLOPS
TFLOPS (teraFLOPS)   10^12    Nvidia RTX 3090 at 36 TFLOPS (2002 No. 1 supercomputer NEC Earth Simulator at 36 TFLOPS)
PFLOPS (petaFLOPS)   10^15
EFLOPS (exaFLOPS)    10^18    The next milestone for supercomputers (exa-scale computing). We are almost there: Fugaku at 0.442 EFLOPS

You can find the FLOPS for up-to-date CPUs at


https://setiathome.berkeley.edu/cpu_list.php
Peak and sustained performance

 The FLOPS value reported by vendors is the peak performance that no program can achieve.
 Sustained FLOPS is the FLOPS rate that a program can achieve over its entire run.
 Peak FLOPS is usually much larger than sustained FLOPS.
 A set of standard benchmarks exists to measure the sustained performance.
  o LINPACK for supercomputers
Parallel computing and parallel programming

 Parallel computing: using multiple processing elements in parallel to solve problems more quickly than with a single processing element.
  o Fully utilize the computing power in contemporary computing systems.
 Parallel programming:
  o Specification of operations that can be executed in parallel
  o A parallel program is decomposed into sequential sub-computations (tasks)
  o Parallel programming constructs define task creation, termination, and interaction.
Parallel programming example

[Figure: computation graph of sum_omp.cpp – node A at the top, nodes B and C in parallel, node D at the bottom.]

o “#pragma omp parallel sections” specifies task creation and termination.

o “#pragma omp section” specifies the two tasks in the program.

o Notice that the parallel program runs slower when the array size is small.

o sum_omp.cpp consists of four sub-tasks:

 A: the sequential part before the parallel region

 B, C: the two parallel sub-tasks

 D: the part after the parallel sub-tasks join back.

Parallel programming example

 Notice that this parallel program is non-trivial.

 The calculation of sum in the loop has a dependency inside: each iteration adds the next element (T[0], T[1], ..., T[size-1]) into the running sum.

 To overcome that:
  o Decompose the problem into two tasks, each computing a partial sum (a sketch of such a program follows below).
  o Combine the results to get the final answer.
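The slides refer to sum_omp.cpp but do not include its source. Below is a minimal sketch of what such a program might look like, assuming a plain array summation split into two OpenMP sections; the array size, variable names, and output are illustrative assumptions, not the actual course code.

#include <omp.h>
#include <iostream>
#include <vector>

int main() {
    const std::size_t size = 100000000;          // assumed array size
    std::vector<double> T(size, 1.0);            // A: sequential part before the parallel region

    double sum1 = 0.0, sum2 = 0.0;               // partial sums for the two tasks

    #pragma omp parallel sections                // task creation and termination
    {
        #pragma omp section                      // B: first half of the array
        {
            for (std::size_t i = 0; i < size / 2; ++i) sum1 += T[i];
        }
        #pragma omp section                      // C: second half of the array
        {
            for (std::size_t i = size / 2; i < size; ++i) sum2 += T[i];
        }
    }

    double sum = sum1 + sum2;                    // D: combine after the sub-tasks join back
    std::cout << "sum = " << sum << std::endl;
    return 0;
}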


Modeling the execution of a parallel program

 The execution of a parallel program can be represented as a


computation graph (CG) or parallel program dependence graph that
allows for reasoning about the execution.
o Nodes are sequential subtasks
o Edges represent the dependency constraints
 Both control dependency and data dependency are captured.
o A CG is a directed acyclic graph (DAG) since a node cannot depend on itself.

 CG describes a set of computational tasks and the dependencies


between them.
Computation graph example

 The computation graph for sum_omp.cpp: node A at the top, nodes B and C in parallel below it, and node D at the bottom.
 Node A must be executed before Node B if there is a path from A to B in the graph.
 CG can be used as a visualization technique to help us understand the complexity of the algorithms.
 CG can also be used as a data structure for the compiler or the system to schedule the execution of the sub-tasks.
Complexity with computation graph
 Let T(N) be the execution time of node N.
 Work(CG) = Σ over all nodes N in CG of T(N) is the total work to be executed in CG.
  o This is the execution time with a single processor.
 Let span(CG) be the longest path in CG when adding the execution time of all nodes in the path.
  o span(CG) is the smallest possible execution time for the CG regardless of how many processors are used!
  o Note that CG is a DAG.
 CG's degree of parallelism is defined as parallelism = Work(CG) / span(CG). The parallelism of a computation provides a rough estimate of the maximum number of processors that can be used efficiently.
  o Consider two situations: parallelism = 1 and parallelism = N.
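As a small illustration of these definitions (not part of the slides), the following sketch computes work, span, and parallelism for a computation graph stored as a predecessor list. The node names follow the sum_omp.cpp example, and the execution times (B and C taking 2 units each) are assumptions made for the example.

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Node execution times T(N); the values here are illustrative.
    std::map<std::string, int> time = {{"A", 1}, {"B", 2}, {"C", 2}, {"D", 1}};
    // Predecessors of each node: A -> B, A -> C, B -> D, C -> D.
    std::map<std::string, std::vector<std::string>> preds = {
        {"A", {}}, {"B", {"A"}}, {"C", {"A"}}, {"D", {"B", "C"}}};

    // Work(CG) = sum of the execution times of all nodes.
    int work = 0;
    for (const auto& entry : time) work += entry.second;

    // span(CG) = longest path, computed over a topological order
    // (written out by hand for this small DAG).
    std::vector<std::string> topo = {"A", "B", "C", "D"};
    std::map<std::string, int> finish;           // longest path ending at each node
    int span = 0;
    for (const auto& node : topo) {
        int start = 0;
        for (const auto& p : preds[node]) start = std::max(start, finish[p]);
        finish[node] = start + time[node];
        span = std::max(span, finish[node]);
    }

    std::cout << "work = " << work << ", span = " << span
              << ", parallelism = " << static_cast<double>(work) / span << std::endl;
    return 0;
}

With these assumed times, work = 6, span = 4, and parallelism = 1.5, so using more than about two processors would not help this graph.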
Let the time for each node be 1, compute the work, span, and
parallelism for the following two computation graphs

[Figure: two computation graphs with nodes A–G; the first has levels A; B, C, D, E; F; G and the second has levels A; B, C, D; E, F; G, with edges as drawn on the slide.]
What are the work, span and parallelism of the
following CG?

[Figure: a computation graph whose node execution times are A: 1, B: 2, C: 4, D: 1, E: 2, F: 1, G: 2, H: 1, I: 2, with edges as drawn on the slide.]
Scheduling of a computation graph
[Figure: computation graph with node A at the top; B and C; D, E, F, and G; H and I; and J at the bottom.]

 Assume each node takes 1 time step.
 A task can be allocated only when all of its predecessors have been executed.

Task scheduling (2 processors):

Time step   P0   P1
1           A    -
2           B    C
3           D    E
4           H    F
5           G    -
6           I    -
7           J    -
Scheduling of a computation graph
[Figure: the same computation graph as on the previous slide.]

 Another schedule: better than the previous one.

Task scheduling (2 processors):

Time step   P0   P1
1           A    -
2           B    C
3           D    E
4           F    G
5           H    I
6           J    -
Greedy Scheduling
A greedy scheduling algorithm assigns tasks to processors as soon as they become available, without
waiting for other tasks to complete. This ensures that processors are always busy if there are tasks
ready to be executed.
Example: Task Dependency Graph
Consider a task dependency graph where nodes represent tasks and edges represent dependencies
between tasks. Here’s a simple example:
     A
    / \
   B   C
  / \ / \
 D   E   F

• Task A must be completed before tasks B and C can start.
• Tasks B and C must be completed before tasks D, E, and F can start.
Greedy Scheduling Steps
1. Initial State:
   • Ready tasks: A
   • Processors: P1, P2, P3
2. Step 1:
   • Assign task A to P1.
   • Ready tasks: B, C (after A completes)
3. Step 2:
   • Assign task B to P1 (since P1 is free after completing A).
   • Assign task C to P2.
   • Ready tasks: D, E, F (after B and C complete)
4. Step 3:
   • Assign task D to P1 (since P1 is free after completing B).
   • Assign task E to P2 (since P2 is free after completing C).
   • Assign task F to P3.

Visualization
Here’s a visual representation of the greedy scheduling process:
Time Step 1:  P1: A   P2: -   P3: -
Time Step 2:  P1: B   P2: C   P3: -
Time Step 3:  P1: D   P2: E   P3: F
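The steps above can be simulated mechanically. The following is a minimal sketch (not from the slides) of a greedy scheduler for unit-time tasks, using the task graph of this example with three processors; the data structures and variable names are illustrative assumptions.

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Successor lists for the example graph: A -> B, C;  B -> D, E;  C -> E, F.
    std::map<std::string, std::vector<std::string>> succs = {
        {"A", {"B", "C"}}, {"B", {"D", "E"}}, {"C", {"E", "F"}},
        {"D", {}}, {"E", {}}, {"F", {}}};

    // Count how many unfinished predecessors each task still has.
    std::map<std::string, int> indegree;
    for (const auto& entry : succs) indegree[entry.first];
    for (const auto& entry : succs)
        for (const auto& s : entry.second) ++indegree[s];

    const std::size_t P = 3;                     // number of processors
    std::vector<std::string> ready;
    for (const auto& entry : indegree)
        if (entry.second == 0) ready.push_back(entry.first);

    int step = 0;
    while (!ready.empty()) {
        ++step;
        // Greedy rule: never leave a processor idle while a task is ready.
        std::size_t n = std::min(P, ready.size());
        std::vector<std::string> running(ready.begin(), ready.begin() + n);
        ready.erase(ready.begin(), ready.begin() + n);

        std::cout << "Time step " << step << ":";
        for (const auto& t : running) std::cout << " " << t;
        std::cout << std::endl;

        // Finished tasks make their successors ready once all predecessors are done.
        for (const auto& t : running)
            for (const auto& s : succs[t])
                if (--indegree[s] == 0) ready.push_back(s);
    }
    return 0;
}

Running this sketch reproduces the three time steps shown in the visualization above.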
Greedy Scheduling
Key Points
• No Idle Processors: At each step, all available processors are assigned tasks if there are ready tasks.
• Minimized Idle Time: Processors are never idle if there are tasks that can be executed.
• Efficient Utilization: This approach ensures efficient utilization of processor resources.
Conclusion

A greedy schedule ensures that processors are always busy if there are tasks ready to be
executed, leading to efficient use of resources and reduced overall execution time. This is
particularly useful in parallel computing where maximizing processor utilization is crucial
for performance.
Greedy schedule

 A greedy schedule is one that never forces a processor to be idle when one or more nodes are ready for execution.
 A node is ready for execution if all its predecessors have been executed.
 With one processor, let T_1 be the time to execute the CG.
  o With any greedy schedule, T_1 = Work(CG).
 With an infinite number of processors, let T_∞ be the time to execute the CG.
  o With any greedy schedule, T_∞ = span(CG).
Greedy schedule

 With P processors, let T_P be the execution time of a schedule for the CG.
 For any greedy schedule:
  o T_P ≥ Work(CG) / P
  o T_P ≥ span(CG)
  o Hence, T_P ≥ max(Work(CG) / P, span(CG))
Greedy schedule

 For any greedy schedule:
  o T_P ≤ Work(CG) / P + span(CG)
  o A step is complete when all P processors are used, incomplete otherwise.
  o Number of complete steps ≤ Work(CG) / P
  o Number of incomplete steps ≤ span(CG)
  o Total steps ≤ Work(CG) / P + span(CG)

 Graham, R. L. “Bounds on Multiprocessing Timing Anomalies.” SIAM Journal on Applied Mathematics 17, no. 2 (1969): 416–29.
Greedy schedule

 Combining the results:

   max(Work(CG) / P, span(CG)) ≤ T_P ≤ Work(CG) / P + span(CG)

 Any greedy scheduler achieves a T_P that is within a factor of 2 of the optimal.
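As a quick check with assumed numbers: if Work(CG) = 100, span(CG) = 10, and P = 4, then any greedy schedule satisfies max(100/4, 10) = 25 ≤ T_P ≤ 100/4 + 10 = 35, so it is at most 35/25 = 1.4 times slower than the best possible 4-processor schedule.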
Speedup and Scalability
 Speedup, Scalability, strong scaling, weak scaling
 Amdahl’s law
 Gustafson’s law
Performance expectation

 When using 1 processor, the sequential program runs for 100 seconds. When we use 10 processors, should the program run 10 times faster?
 This works only for embarrassingly parallel computations – parallel computations that can be divided into completely independent computations that can be executed simultaneously. There may be no interaction between the separate processes; sometimes the results need to be collected.
 Embarrassingly parallel applications are the kind that can scale up to a very large
number of processors. Examples: Monte Carlo analysis, numerical integration, 3D
graphics rendering, and many more.
 In other types of applications, the computation components interact and have
dependencies, which prevents the applications from using a large number of processors.
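As an illustration of one of the embarrassingly parallel examples listed above (Monte Carlo analysis), here is a minimal OpenMP sketch (not from the slides) of Monte Carlo estimation of π: every thread samples points independently, and the only interaction is the final reduction that collects the counts. The sample count, seed, and variable names are assumptions made for the example.

#include <omp.h>
#include <cstdio>
#include <random>

int main() {
    const long long samples = 50000000;          // assumed number of random points
    long long hits = 0;

    // Each thread generates its own points independently; the reduction
    // combines the per-thread hit counts at the end.
    #pragma omp parallel reduction(+ : hits)
    {
        std::mt19937_64 gen(12345 + omp_get_thread_num());   // per-thread seed (assumed)
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        #pragma omp for
        for (long long i = 0; i < samples; ++i) {
            double x = dist(gen), y = dist(gen);
            if (x * x + y * y <= 1.0) ++hits;    // point falls inside the quarter circle
        }
    }

    std::printf("pi ~= %f\n", 4.0 * hits / samples);
    return 0;
}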
Scalability

 Scalability of a program measures how many processors the program can make effective use of.
  o For a computation represented by a computation graph, parallelism is a good indicator of scalability.
Speedup and Strong scaling

 Let T_1 be the execution time for a computation on 1 processor and T_P be the execution time for the same computation (with the same input – same problem) on P processors.

   Speedup(P) = T_1 / T_P

  o The factor by which the use of P processors speeds up execution time relative to 1 processor for the same problem.
  o Since the problem size is fixed, this is referred to as “strong scaling”.
  o Given a computation graph, what is the highest speedup that can be achieved?
Speedup

 =

 Typically, 1 ≤ ≤
 The speedup is ideal if =
 Linear speedup: = × for some constant 0 < <
1
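For example (assumed timings): if T_1 = 100 s and T_16 = 8 s, then Speedup(16) = 100 / 8 = 12.5, which is a linear speedup with c = 12.5 / 16 ≈ 0.78.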
Efficiency

 The efficiency of an algorithm using P processors is

   Efficiency = Speedup(P) / P

  o Efficiency estimates how well-utilized the processors are in running the parallel program.
  o Ideal speedup means Efficiency = 1 (100% efficiency).
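For instance (assumed numbers): a speedup of 12 on 16 processors gives Efficiency = 12 / 16 = 0.75, while the same speedup of 12 on 32 processors gives only 12 / 32 = 0.375 – the extra processors are mostly idle.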
