Unit 4, Session 4: Parallel Computing Examples – Machine Learning
Microprocessor & Computer Architecture (μpCA)
UE22CS251B
Session: 4.4
1. Adding Two 2D Matrices – Using a Uniprocessor
[Figure: a single processor P computes A[i][j] = B[i][j] + C[i][j] element by element; a minimal C sketch of this appears after the list below.]
• Suitable for Many Applications: Uniprocessors handle most everyday tasks efficiently.
• Disadvantages:
• Limited Performance: Uniprocessors may not meet the performance demands of complex applications.
• Lack of Parallelism: They cannot exploit parallelism effectively.
• Bottlenecks: Resource bottlenecks occur when multiple tasks compete for the same CPU.
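As a concrete reference point, here is a minimal sequential C sketch of the element-by-element addition (the matrix size and sample values are illustrative, not from the slide):

#include <stdio.h>

#define ROW 4
#define COL 4

int main(void) {
    int A[ROW][COL], B[ROW][COL], C[ROW][COL];

    /* Fill B and C with sample values. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++) {
            B[i][j] = i + j;
            C[i][j] = i * j;
        }

    /* Single processor: every element addition runs one after another. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++)
            A[i][j] = B[i][j] + C[i][j];

    printf("A[1][2] = %d\n", A[1][2]);
    return 0;
}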
Adding Two Matrices on a Multi-Processor System
• A multiprocessor system consists of two or more CPUs that share access to a common memory.
• Key Characteristics:
• Multiple Cores: Each CPU may have multiple cores.
• Parallel Execution: Multiple CPUs can execute instructions simultaneously.
• Enhanced Performance: Multiprocessors boost system performance by distributing the workload.
• Shared Memory: All CPUs access a common RAM.
How to utilize 4 processors to perform matrix addition?
for(i=0;i<row;i++)
    for(j=0;j<col;j++)
        A[i][j]=B[i][j]+C[i][j]
[Figure: the matrix elements are partitioned across processors P1–P4; each processor computes A[i][j] = B[i][j] + C[i][j] for its own partition.]
Not everything runs in parallel, however: each processor still performs its share of the additions (4 operations per processor in this example) sequentially. A sketch of this partitioning follows below.
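One way to express this partitioning on a shared-memory multiprocessor is with an OpenMP work-sharing loop (the pragma, the 4-thread count, and the sample values are illustrative choices, not part of the slide; compile with -fopenmp):

#include <stdio.h>

#define ROW 4
#define COL 4

int main(void) {
    int A[ROW][COL], B[ROW][COL], C[ROW][COL];

    /* Fill B and C with sample values. */
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++) {
            B[i][j] = i + j;
            C[i][j] = i * j;
        }

    /* Split the row loop across 4 threads: each thread adds its own rows,
       so the 16 element additions proceed 4 per thread, in parallel. */
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < ROW; i++)
        for (int j = 0; j < COL; j++)
            A[i][j] = B[i][j] + C[i][j];

    printf("A[3][3] = %d\n", A[3][3]);
    return 0;
}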
• Advantages:
• Improved Performance: Multiprocessors execute tasks faster by distributing the
workload.
• Scalability: Additional processors can be added to handle increased workloads.
• Reliability: The system can continue operating even if one processor fails.
• Disadvantages:
• Increased Complexity: Multiprocessor systems require additional hardware and software.
• Higher Power Consumption: Operating multiprocessors consumes more power.
• Synchronization Challenges: Ensuring correct task execution across CPUs can be complex.
2. Compute Euclidean Distance for a Given Dataset
• A common task in machine learning that is ripe for parallelization is distance calculation.
• Euclidean distance is a very common metric that must be calculated over and over again in
numerous algorithms.
• The individual distance calculations are not dependent on one another, so these calculations can be
performed in parallel.
Parallel k-Nearest Neighbor
• The Euclidean distance (also known as the L2 distance) measures the straight-line distance
between two points in Euclidean space.
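For two d-dimensional points p and q, the distance is sqrt((p1 - q1)^2 + ... + (pd - qd)^2). Below is a minimal C sketch that computes the distance from one query point to every point in a dataset, with the independent per-point calculations spread across threads via an OpenMP loop (the dataset size, feature count, sample values, and use of OpenMP are illustrative assumptions; compile with -fopenmp and link with -lm):

#include <stdio.h>
#include <math.h>

#define N 8   /* number of points in the dataset (illustrative) */
#define D 3   /* number of features per point (illustrative)    */

int main(void) {
    double data[N][D], query[D] = {1.0, 2.0, 3.0}, dist[N];

    /* Fill the dataset with sample values. */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < D; k++)
            data[i][k] = (double)(i + k);

    /* Each distance depends only on one data point and the query,
       so the N calculations can run in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int k = 0; k < D; k++) {
            double diff = data[i][k] - query[k];
            sum += diff * diff;
        }
        dist[i] = sqrt(sum);   /* L2 distance of point i from the query */
    }

    for (int i = 0; i < N; i++)
        printf("dist[%d] = %.3f\n", i, dist[i]);
    return 0;
}

In k-nearest neighbor, this distance loop is the bulk of the work, which is why it is the natural place to parallelize.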
Speedup vs. Number of Processors
Speedup here means the single-processor execution time divided by the execution time on N processors.
• Superlinear Speedup:
• The curve labeled “Superlinear Speedup” rises steeply as the
number of processors increases.
• This indicates that adding more processors leads to
a disproportionately high increase in speed.
• In some cases, parallelization can yield unexpected
performance gains beyond linear scaling.
• Linear Speedup:
• The curve labeled “Linear Speedup” has a steady incline.
• It shows that as more processors are added, the speed
increases proportionally.
• Linear speedup means that the performance gain matches the
increase in the number of processors.
• Typical Speedup:
• The curve labeled “Typical Speedup” lies between the
superlinear and linear curves.
• It suggests that adding more processors results in a moderate
increase in speed.
• While not directly proportional, the speedup is still significant.
Can all types of problems be solved using Parallel Computing?
for(j=1;j<n;j++)
A[j]=B[j]+C[j]+A[j-1]
The above loop cannot be fully parallelized because of the dependency between iterations: computing A[j] requires A[j-1], which is produced by the previous iteration.
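A short C sketch of the contrast (array size and sample values are illustrative): the first loop carries the dependency and must run in order, while the second has independent iterations and could be split across processors, just like the matrix addition earlier.

#include <stdio.h>

#define N 8

int main(void) {
    int A[N] = {1}, B[N], C[N], S[N];

    for (int j = 0; j < N; j++) { B[j] = j; C[j] = 2 * j; }

    /* Loop-carried dependency: A[j] needs A[j-1], so the iterations
       cannot be reordered or executed at the same time. */
    for (int j = 1; j < N; j++)
        A[j] = B[j] + C[j] + A[j - 1];

    /* No dependency between iterations: each S[j] could be
       computed in parallel with the others. */
    for (int j = 0; j < N; j++)
        S[j] = B[j] + C[j];

    printf("A[%d] = %d, S[%d] = %d\n", N - 1, A[N - 1], N - 1, S[N - 1]);
    return 0;
}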
Parallel Execution – Design Issues
A program (or algorithm) which can be parallelized can be split up into two parts:
• A part which cannot be parallelized
• A part which can be parallelized
On a single processor the two parts simply run one after the other, so the total time is
T = (T - F) + F
where F is the time spent in the parallelizable part and (T - F) is the time spent in the part that cannot be parallelized.
Parallel Execution – Design Issues
When the program runs on N processors, the non-parallelizable part (T - F) still executes on a single processor, while the parallelizable part F is divided among the N processors so that each executes F/N of it (F/2 on 2 processors, F/3 on 3 processors, and so on). The execution time therefore becomes
T(N) = (T - F) + F / N
The total time to execute a program is set to 1. The parallelizable part of the program consumes
60% of the execution time. What is the execution time of the program when executed on 2
processors?
Solution:
The parallelizable part is F = 0.6.
The time for the non-parallelizable part is 1 - 0.6 = 0.4.
T(N) = (T - F) + F / N
T(2) = (1 - 0.6) + 0.6 / 2
     = 0.4 + 0.6 / 2
     = 0.4 + 0.3
     = 0.7
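A minimal C sketch that evaluates T(N) and the resulting speedup T(1)/T(N) for this example (the list of processor counts is an illustrative choice):

#include <stdio.h>

/* Execution time on n processors for total time T and parallelizable
   part F, following T(N) = (T - F) + F / N. */
static double exec_time(double T, double F, int n) {
    return (T - F) + F / n;
}

int main(void) {
    double T = 1.0, F = 0.6;
    int counts[] = {1, 2, 4, 8, 100};

    for (int i = 0; i < 5; i++) {
        double tn = exec_time(T, F, counts[i]);
        printf("N = %3d: T(N) = %.3f, speedup = %.2f\n", counts[i], tn, T / tn);
    }
    return 0;
}

Even with 100 processors T(N) never drops below the serial 0.4, which is exactly the point of the conclusion below.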
Parallel Computing - SOLVED EXAMPLE 1
Conclusion: merely increasing the number of processing units will not keep improving the execution time, because the non-parallelizable part still runs on a single processor. Instead, the parallelization in the program itself needs to be improved, i.e., writing the program to expose more parallelism is what makes sense.
Issues with Communication Time & Parallel Computing Topology?
[Figure: common interconnection topologies – Linear, Ring, Tree, Mesh, and Hypercube.]
Parallel Computation - Challenges!!
[Figure: 8 processors P0–P7 connected as a hypercube, each holding two of the 16 input values – P0: 7, 8; P1: 1, 2; P2: 9, 10; P3: 11, 13; P4: 3, 4; P5: 5, 6; P6: 16, 19; P7: 21, 23.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 1 – each processor adds its own two values: P0 = 15, P1 = 3, P2 = 19, P3 = 24, P4 = 7, P5 = 11, P6 = 35, P7 = 44.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 2 – partial sums are combined across one hypercube dimension: P0 = 22, P1 = 14, P2 = 54, P3 = 68.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 3 – partial sums are combined across the next dimension: P0 = 76, P1 = 82.]
Summation (Hypercube SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
[Figure: step 4 – the final combination leaves the total sum, 158, in P0.]
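A minimal sequential C sketch that simulates the reduction shown on these slides (a real hypercube SIMD machine would exchange the partial sums over its links; this array-based loop only illustrates the logarithmic combining pattern):

#include <stdio.h>

#define P 8   /* number of processors in the hypercube */

int main(void) {
    /* Each processor's two input values, distributed as in the figures:
       P0:7,8  P1:1,2  P2:9,10  P3:11,13  P4:3,4  P5:5,6  P6:16,19  P7:21,23 */
    int vals[P][2] = {{7, 8}, {1, 2}, {9, 10}, {11, 13},
                      {3, 4}, {5, 6}, {16, 19}, {21, 23}};
    int sum[P];

    /* Step 1: each processor adds its own two values
       (15, 3, 19, 24, 7, 11, 35, 44). */
    for (int p = 0; p < P; p++)
        sum[p] = vals[p][0] + vals[p][1];

    /* Steps 2-4: combine partial sums across one hypercube dimension per
       step, halving the number of processors that hold partial sums. */
    for (int stride = P / 2; stride >= 1; stride /= 2)
        for (int p = 0; p < stride; p++)
            sum[p] += sum[p + stride];

    printf("total = %d\n", sum[0]);   /* prints 158, held by P0 */
    return 0;
}

With 8 processors the combining phase takes log2(8) = 3 steps instead of the 7 sequential additions a single processor would need after its local sums.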
Summation (MESH-SIMD)
Input: 7 8 1 2 9 10 3 4 5 6 11 13 16 19 21 23
Team MPCA
Department of Computer Science and Engineering