Unit4 Session4 PC Examples Machine Learning

Microprocessor & Computer Architecture (μpCA)

Unit 4: Parallel Computing – Examples, Design Constraints

UE22CS251B
Session: 4.4
1. Adding Two 2D Matrices – Using a Uniprocessor

• A uniprocessor system consists of a single central processing unit (CPU).

for (i = 0; i < row; i++)
    for (j = 0; j < col; j++)
        A[i][j] = B[i][j] + C[i][j];

• Key Characteristics:
• Single Core: The CPU has only one processing core.
• Sequential Execution: Instructions are executed one after the other in a sequential manner.
• Limited Parallelism: Uniprocessor systems lack inherent parallelism, as they can execute only one instruction at a time.
• Advantages:
• Simplicity: Uniprocessor systems are straightforward and easy to design.
• Cost-Effective: They require less hardware and fewer resources.
• Suitable for Many Applications: Uniprocessors handle most everyday tasks efficiently.

[Figure: a single processor P reads B[i][j] and C[i][j] and writes A[i][j], computing A[][] = B[][] + C[][].]

• Disadvantages:
• Limited Performance: Uniprocessors may not meet the performance demands of complex applications.
• Lack of Parallelism: They cannot exploit parallelism effectively.
• Bottlenecks: Resource bottlenecks occur when multiple tasks compete for the same CPU.
Adding Two Matrices on a Multiprocessor System

• A multiprocessor system consists of two or more CPUs that share access to a common memory.
• Key Characteristics:
• Multiple Cores: Each CPU may have multiple cores.
• Parallel Execution: Multiple CPUs can execute instructions simultaneously.
• Enhanced Performance: Multiprocessors boost system performance by distributing the workload.
• Shared Memory: All CPUs access a common RAM.

How can 4 processors be utilized to perform matrix addition?

for (i = 0; i < row; i++)
    for (j = 0; j < col; j++)
        A[i][j] = B[i][j] + C[i][j];

[Figure: processors P1–P4 each read their share of B[i][j] and C[i][j] and write the corresponding A[i][j], computing A[][] = B[][] + C[][] in parallel.]

Everything is not parallel, however: each of the 4 processors still performs its own share of the additions as sequential operations.

• Advantages:
• Improved Performance: Multiprocessors execute tasks faster by distributing the
workload.
• Scalability: Additional processors can be added to handle increased workloads.
• Reliability: The system can continue operating even if one processor fails.
• Disadvantages:
• Increased Complexity: Multiprocessor systems require additional hardware and software.
• Higher Power Consumption: Operating multiprocessors consumes more power.
• Synchronization Challenges: Ensuring correct task execution across CPUs can be complex.
2. Compute Euclidean Distance for a Given Dataset

• A common task in machine learning that is ripe for parallelization is distance calculation.
• Euclidean distance is a very common metric that must be calculated over and over again in numerous algorithms, such as parallel k-Nearest Neighbor.
• Because the individual distance calculations of an iteration do not depend on the other calculations of the same iteration, these calculations can be performed in parallel.
• The Euclidean distance (also known as the L2 distance) measures the straight-line distance between two points in Euclidean space.
Speedup vs. Number of Processors

• Superlinear Speedup:
• The curve labeled “Superlinear Speedup” rises steeply as the
number of processors increases.
• This indicates that adding more processors leads to
a disproportionately high increase in speed.
• In some cases, parallelization can yield unexpected
performance gains beyond linear scaling.
• Linear Speedup:
• The curve labeled “Linear Speedup” has a steady incline.
• It shows that as more processors are added, the speed
increases proportionally.
• Linear speedup means that the performance gain matches the
increase in the number of processors.
• Typical Speedup:
• The curve labeled “Typical Speedup” lies between the
superlinear and linear curves.
• It suggests that adding more processors results in a moderate
increase in speed.
• While not directly proportional, the speedup is still significant.
Can all types of problems be solved using parallel computing?

for (j = 1; j < n; j++)
    A[j] = B[j] + C[j] + A[j-1];

The above program cannot be fully parallelized due to the dependency between iterations: each A[j] needs the A[j-1] computed by the previous iteration.
Parallel Execution – Design Issues

A program (or algorithm) which can be parallelized can be split into two parts:
• A part which cannot be parallelized
• A part which can be parallelized

• T = Total time of serial execution
• T − F = Total time of the non-parallelizable part
• F = Total time of the parallelizable part (when executed serially, not in parallel)

<----------------------T-------------------------------->
|    T − F    |               F               |        T = (T − F) + F
Parallel Execution – Design Issues

With N processors, the non-parallelizable part T − F is unchanged, while the parallelizable part F is divided among the processors, each executing F/N of that work (F/2 with 2 processors, F/3 with 3, and so on). The execution time therefore becomes:

T(N) = (T − F) + F/N
Parallel Computing - SOLVED EXAMPLE 1

The total time to execute a program is set to 1. The parallelizable part of the program consumes 60% of the execution time. What is the execution time of the program when executed on 2 processors?

Solution:
The parallelizable part is F = 0.6.
The time for the non-parallelizable part is T − F = 1 − 0.6 = 0.4.

The execution time with a parallelization factor of 2 (2 threads or CPUs executing the parallelizable part, so N = 2) would be:

T(N) = (T − F) + F/N
T(2) = (1 − 0.6) + 0.6/2
     = 0.4 + 0.3
     = 0.7
Making the same calculation with a parallelization factor of 5 instead of 2 looks like this:

T(5) = (1 − 0.6) + 0.6/5
     = 0.4 + 0.12
     = 0.52

Conclusion: simply increasing the number of processing units contributes less and less to improved execution time (2.5× the processors only reduced the time from 0.7 to 0.52). Instead, the parallelizable fraction of the program needs to be increased, i.e., writing more parallel programs is what makes sense.
Issues with Communication Time & Parallel Computing Topology

[Figure: common interconnection topologies – Linear, Ring, Tree, Mesh, Hypercube.]
Parallel Computation - Challenges!!

• Different (parallel) programming skills are required as topologies change.
• Communication cost (time) changes according to the topology.
• A thorough understanding of the hardware is required to write programs that fully utilize the computational power.
Summation (Hypercube SIMD)

Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23

The 16 input values are distributed two per processor across the 8 hypercube nodes; partial sums are then combined along one hypercube dimension per step:

• Initial distribution: P0: 7,8; P1: 1,2; P2: 9,10; P3: 11,13; P4: 3,4; P5: 5,6; P6: 16,19; P7: 21,23.
• Step 1 (local sums): P0: 15; P1: 3; P2: 19; P3: 24; P4: 7; P5: 11; P6: 35; P7: 44.
• Step 2 (combine across the first dimension): P0: 22; P1: 14; P2: 54; P3: 68.
• Step 3 (combine across the second dimension): P0: 76; P1: 82.
• Step 4 (combine across the last dimension): P0: 158, the total.
Summation (Mesh SIMD)

Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23

The 16 input values are distributed two per processor across a 2×4 mesh (P00–P03 in row 0, P10–P13 in row 1); partial sums are combined along each row and then down the column:

• Initial distribution: P00: 7,8; P01: 1,2; P02: 9,10; P03: 3,4; P10: 5,6; P11: 11,13; P12: 16,19; P13: 21,23.
• Step 1 (local sums): P00: 15; P01: 3; P02: 19; P03: 7; P10: 11; P11: 24; P12: 35; P13: 44.
• Step 2 (combine neighboring pairs in each row): row 0: 18, 26; row 1: 35, 79.
• Step 3 (combine the pair sums in each row): row 0: 44; row 1: 114.
• Step 4 (combine the row totals down the column): 44 + 114 = 158, the total.
THANK YOU

Team MPCA
Department of Computer Science and Engineering
