Unit 4, Session 4: Parallel Computing Examples – Machine Learning
Microprocessor & Computer Architecture (μpCA)
UE22CS251B
Session: 4.4
1. Adding Two 2D Matrices – Using a Uniprocessor
[Figure: a single processor P computes A[i][j] = B[i][j] + C[i][j] element by element; a minimal C sketch of this appears after the list below.]
• Suitable for Many Applications: Uniprocessors handle most everyday tasks efficiently.
• Disadvantages:
• Limited Performance: Uniprocessors may not meet the performance demands of complex applications.
• Lack of Parallelism: They cannot exploit parallelism effectively.
• Bottlenecks: Resource bottlenecks occur when multiple tasks compete for the same CPU.
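As a concrete reference point, here is a minimal sequential C sketch of the element-by-element addition (the matrix size and sample values are illustrative, not from the slide):

#include <stdio.h>

#define ROW 4
#define COL 4

int main(void) {
    int A[ROW][COL], B[ROW][COL], C[ROW][COL];

    /* Fill B and C with sample values. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++) {
            B[i][j] = i + j;
            C[i][j] = i * j;
        }

    /* Single processor: every element addition runs one after another. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++)
            A[i][j] = B[i][j] + C[i][j];

    printf("A[1][2] = %d\n", A[1][2]);
    return 0;
}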
Adding Two Matrices on a Multi-Processor System
• A multiprocessor system consists of two or more CPUs that share access to a common memory.
• Key Characteristics:
• Multiple Cores: Each CPU may have multiple cores.
• Parallel Execution: Multiple CPUs can execute instructions simultaneously.
• Enhanced Performance: Multiprocessors boost system performance by distributing the workload.
• Shared Memory: All CPUs access a common RAM.
How to utilize 4 processors to perform matrix addition?
for(i=0;i<row;i++)
    for(j=0;j<col;j++)
        A[i][j]=B[i][j]+C[i][j]
[Figure: the matrix elements are partitioned across processors P1–P4; each processor computes A[i][j] = B[i][j] + C[i][j] for its own partition.]
Not everything runs in parallel, however: each processor still performs its share of the additions (4 operations per processor in this example) sequentially. A sketch of this partitioning follows below.
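One way to express this partitioning on a shared-memory multiprocessor is with an OpenMP work-sharing loop (the pragma, the 4-thread count, and the sample values are illustrative choices, not part of the slide; compile with -fopenmp):

#include <stdio.h>

#define ROW 4
#define COL 4

int main(void) {
    int A[ROW][COL], B[ROW][COL], C[ROW][COL];

    /* Fill B and C with sample values. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++) {
            B[i][j] = i + j;
            C[i][j] = i * j;
        }

    /* Split the row loop across 4 threads: each thread adds its own rows,
       so the 16 element additions proceed 4 per thread, in parallel. */
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++)
            A[i][j] = B[i][j] + C[i][j];

    printf("A[3][3] = %d\n", A[3][3]);
    return 0;
}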
• Advantages:
• Improved Performance: Multiprocessors execute tasks faster by distributing the
workload.
• Scalability: Additional processors can be added to handle increased workloads.
• Reliability: The system can continue operating even if one processor fails.
• Disadvantages:
• Increased Complexity: Multiprocessor systems require additional hardware and software.
• Higher Power Consumption: Operating multiprocessors consumes more power.
• Synchronization Challenges: Ensuring correct task execution across CPUs can be complex.
2. Compute Euclidean Distance for a Given Dataset
• A common task in machine learning that is ripe for parallelization is distance calculation.
• Euclidean distance is a very common metric that must be calculated over and over again in
numerous algorithms.
• The individual distance calculations are not dependent on one another, so these calculations can be
performed in parallel.
Parallel k-Nearest Neighbor
• The Euclidean distance (also known as the L2 distance) measures the straight-line distance
between two points in Euclidean space.
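For two d-dimensional points p and q, the distance is sqrt((p1 - q1)^2 + ... + (pd - qd)^2). Below is a minimal C sketch that computes the distance from one query point to every point in a dataset, with the independent per-point calculations spread across threads via an OpenMP loop (the dataset size, feature count, sample values, and use of OpenMP are illustrative assumptions; compile with -fopenmp and link with -lm):

#include <stdio.h>
#include <math.h>

#define N 8   /* number of points in the dataset (illustrative) */
#define D 3   /* number of features per point (illustrative)    */

int main(void) {
    double data[N][D], query[D] = {1.0, 2.0, 3.0}, dist[N];

    /* Fill the dataset with sample values. */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < D; k++)
            data[i][k] = (double)(i + k);

    /* Each distance depends only on one data point and the query,
       so the N calculations can run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int k = 0; k < D; k++) {
            double diff = data[i][k] - query[k];
            sum += diff * diff;
        }
        dist[i] = sqrt(sum);   /* L2 distance of point i from the query */
    }

    for (int i = 0; i < N; i++)
        printf("dist[%d] = %.3f\n", i, dist[i]);
    return 0;
}

In k-nearest neighbor, this distance loop is the bulk of the work, which is why it is the natural place to parallelize.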
Speedup vs. Number of Processors
Speedup here means the single-processor execution time divided by the execution time on N processors.
• Superlinear Speedup:
• The curve labeled “Superlinear Speedup” rises steeply as the
number of processors increases.
• This indicates that adding more processors leads to
a disproportionately high increase in speed.
• In some cases, parallelization can yield unexpected
performance gains beyond linear scaling.
• Linear Speedup:
• The curve labeled “Linear Speedup” has a steady incline.
• It shows that as more processors are added, the speed
increases proportionally.
• Linear speedup means that the performance gain matches the
increase in the number of processors.
• Typical Speedup:
• The curve labeled “Typical Speedup” lies between the
superlinear and linear curves.
• It suggests that adding more processors results in a moderate
increase in speed.
• While not directly proportional, the speedup is still significant.
Can all types of problems be solved using Parallel Computing?
for(j=1;j<n;j++)
A[j]=B[j]+C[j]+A[j-1]
The above loop cannot be fully parallelized because of the dependency between iterations: computing A[j] requires A[j-1], which is produced by the previous iteration.
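A short C sketch of the contrast (array size and sample values are illustrative): the first loop carries the dependency and must run in order, while the second has independent iterations and could be split across processors, just like the matrix addition earlier.

#include <stdio.h>

#define N 8

int main(void) {
    int A[N] = {1}, B[N], C[N], S[N];

    for (int j = 0; j < N; j++) { B[j] = j; C[j] = 2 * j; }

    /* Loop-carried dependency: A[j] needs A[j-1], so the iterations
       cannot be reordered or executed at the same time. */
    for (int j = 1; j < N; j++)
        A[j] = B[j] + C[j] + A[j - 1];

    /* No dependency between iterations: each S[j] could be
       computed in parallel with the others. */
    for (int j = 0; j < N; j++)
        S[j] = B[j] + C[j];

    printf("A[%d] = %d, S[%d] = %d\n", N - 1, A[N - 1], N - 1, S[N - 1]);
    return 0;
}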
Parallel Execution – Design Issues
A program (or algorithm) which can be parallelized can be split up into two parts:
• A part which cannot be parallelized
• A part which can be parallelized
On a single processor the two parts simply run one after the other, so the total time is
T = (T - F) + F
where F is the time spent in the parallelizable part and (T - F) is the time spent in the part that cannot be parallelized.
Parallel Execution – Design Issues
When the program runs on N processors, the non-parallelizable part (T - F) still executes on a single processor, while the parallelizable part F is divided among the N processors so that each executes F/N of it (F/2 on 2 processors, F/3 on 3 processors, and so on). The execution time therefore becomes
T(N) = (T - F) + F / N
The total time to execute a program is set to 1. The parallelizable part of the program consumes
60% of the execution time. What is the execution time of the program when executed on 2
processors?
Solution:
The parallelizable part is F = 0.6.
The time for the non-parallelizable part is 1 - 0.6 = 0.4.
T(N) = (T - F) + F / N
T(2) = (1 - 0.6) + 0.6 / 2
     = 0.4 + 0.6 / 2
     = 0.4 + 0.3
     = 0.7
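A minimal C sketch that evaluates T(N) and the resulting speedup T(1)/T(N) for this example (the list of processor counts is an illustrative choice):

#include <stdio.h>

/* Execution time on n processors for total time T and parallelizable
   part F, following T(N) = (T - F) + F / N. */
static double exec_time(double T, double F, int n) {
    return (T - F) + F / n;
}

int main(void) {
    double T = 1.0, F = 0.6;
    int counts[] = {1, 2, 4, 8, 100};

    for (int i = 0; i < 5; i++) {
        double tn = exec_time(T, F, counts[i]);
        printf("N = %3d: T(N) = %.3f, speedup = %.2f\n", counts[i], tn, T / tn);
    }
    return 0;
}

Even with 100 processors T(N) never drops below the serial 0.4, which is exactly the point of the conclusion below.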
Parallel Computing - SOLVED EXAMPLE 1
Conclusion: merely increasing the number of processing units will not keep improving the execution time, because the non-parallelizable part still runs on a single processor. Instead, the parallelization in the program itself needs to be improved, i.e., writing the program to expose more parallelism is what makes sense.
Issues with Communication Time & Parallel Computing Topology?
[Figure: common interconnection topologies – Linear, Ring, Tree, Mesh, and Hypercube.]
Parallel Computation - Challenges!!
[Figure: 8 processors P0–P7 connected as a hypercube, each holding two of the 16 input values – P0: 7, 8; P1: 1, 2; P2: 9, 10; P3: 11, 13; P4: 3, 4; P5: 5, 6; P6: 16, 19; P7: 21, 23.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 1 – each processor adds its own two values: P0 = 15, P1 = 3, P2 = 19, P3 = 24, P4 = 7, P5 = 11, P6 = 35, P7 = 44.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 2 – partial sums are combined across one hypercube dimension: P0 = 22, P1 = 14, P2 = 54, P3 = 68.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 3 – partial sums are combined across the next dimension: P0 = 76, P1 = 82.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 4 – the final combination leaves the total sum, 158, in P0.]
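A minimal sequential C sketch that simulates the reduction shown on these slides (a real hypercube SIMD machine would exchange the partial sums over its links; this array-based loop only illustrates the logarithmic combining pattern):

#include <stdio.h>

#define P 8   /* number of processors in the hypercube */

int main(void) {
    /* Each processor's two input values, distributed as in the figures:
       P0:7,8  P1:1,2  P2:9,10  P3:11,13  P4:3,4  P5:5,6  P6:16,19  P7:21,23 */
    int vals[P][2] = {{7, 8}, {1, 2}, {9, 10}, {11, 13},
                      {3, 4}, {5, 6}, {16, 19}, {21, 23}};
    int sum[P];

    /* Step 1: each processor adds its own two values
       (15, 3, 19, 24, 7, 11, 35, 44). */
    for (int p = 0; p < P; p++)
        sum[p] = vals[p][0] + vals[p][1];

    /* Steps 2-4: combine partial sums across one hypercube dimension per
       step, halving the number of processors that hold partial sums. */
    for (int stride = P / 2; stride >= 1; stride /= 2)
        for (int p = 0; p < stride; p++)
            sum[p] += sum[p + stride];

    printf("total = %d\n", sum[0]);   /* prints 158, held by P0 */
    return 0;
}

With 8 processors the combining phase takes log2(8) = 3 steps instead of the 7 sequential additions a single processor would need after its local sums.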
Summation (MESH-SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
Team MPCA
Department of Computer Science and Engineering