CA Lecture 11

CSE - 313

Computer Architecture

Faculty: Shoib Ahmed Shourav


United International University
Summer 2021
Parallel Processors from
Client to Cloud
Introduction
• Goal: connecting multiple computers to get higher performance
• Multiprocessors
• Scalability, availability, power efficiency
• Task-level (process-level) parallelism
• High throughput for independent jobs
• Parallel processing program
• Single program run on multiple processors
• Multicore microprocessors
• Chips with multiple processors (cores)
Hardware and Software
• Hardware
• Serial: e.g., Pentium 4
• Parallel: e.g., quad-core Xeon e5345
• Software
• Sequential: e.g., matrix multiplication
• Concurrent: e.g., operating system
• Sequential/concurrent software can run on serial/parallel hardware
• Challenge: making effective use of parallel hardware
Parallel Programming

• Parallel software is the problem
• Need to get significant performance improvement
• Otherwise, just use a faster uniprocessor, since it’s easier!
• Difficulties
• Partitioning
• Coordination
• Communications overhead
Amdahl’s Law

• Sequential part can limit speedup
• Example: 100 processors, 90× speedup?
• Tnew = Tparallelizable/100 + Tsequential
• Speedup = Told/Tnew = 1 / ((1 − Fparallelizable) + Fparallelizable/100) = 90
• Solving: Fparallelizable = 0.999
• Need sequential part to be 0.1% of original time
• Here T is time and F is the fraction of the program that is parallelizable.
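The algebra above can be checked numerically. A small sketch (function names are illustrative, not from the slides): one helper evaluates Amdahl's Law directly, and the other solves it for the parallelizable fraction needed to hit a target speedup.

```python
# Amdahl's Law helpers (illustrative names, not from the slides).

def amdahl_speedup(f, n):
    """Speedup with n processors when fraction f of the work is parallelizable:
    S = 1 / ((1 - f) + f / n)."""
    return 1.0 / ((1.0 - f) + f / n)

def required_fraction(target_speedup, n):
    """Solve S = 1 / ((1 - f) + f / n) for f:
    f = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n)

# Slide example: 100 processors, 90x speedup target.
print(required_fraction(90, 100))   # ~0.9989, i.e. 0.999 after rounding
print(amdahl_speedup(0.999, 100))   # ~91x, confirming f = 0.999 suffices
```

Note that with f = 0.999 the speedup slightly exceeds 90×, since 0.999 is the rounded-up solution.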
Scaling Example 1
• Workload: sum of 10 scalars, and 10 × 10 matrix sum
• Speed up from 10 to 100 processors
• Single processor: Time = (10 + 100) × tadd
• 10 processors
• Time = 10 × tadd + 100/10 × tadd = 20 × tadd
• Speedup = 110/20 = 5.5 (55% of potential)
• 100 processors
• Time = 10 × tadd + 100/100 × tadd = 11 × tadd
• Speedup = 110/11 = 10 (10% of potential)
• Assumes load can be balanced across processors
Scaling Example 2
• What if matrix size is 100 × 100?
• Single processor: Time = (10 + 10000) × tadd
• 10 processors
• Time = 10 × tadd + 10000/10 × tadd = 1010 × tadd
• Speedup = 10010/1010 = 9.9 (99% of potential)
• 100 processors
• Time = 10 × tadd + 10000/100 × tadd = 110 × tadd
• Speedup = 10010/110 = 91 (91% of potential)
• Assuming load balanced
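Both scaling examples follow the same time model: the 10 scalar additions stay sequential while the matrix sum divides evenly across processors. A short script reproduces the numbers (a sketch; time is in units of tadd, and perfect load balance is assumed as on the slides):

```python
def scaled_time(scalars, matrix_elems, procs):
    """Time in units of tadd: scalar sum is sequential,
    matrix sum is split evenly across processors."""
    return scalars + matrix_elems / procs

def speedup(scalars, matrix_elems, procs):
    serial = scalars + matrix_elems
    return serial / scaled_time(scalars, matrix_elems, procs)

# Example 1: 10 scalars + 10x10 matrix
print(speedup(10, 100, 10))     # 5.5
print(speedup(10, 100, 100))    # 10.0
# Example 2: 10 scalars + 100x100 matrix
print(speedup(10, 10000, 10))   # ~9.9
print(speedup(10, 10000, 100))  # 91.0
```

The contrast shows why problem size matters: the larger matrix amortizes the fixed sequential part, so the same hardware gets much closer to its potential speedup.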
Strong vs Weak Scaling

• Strong scaling: problem size fixed, as in the previous examples
• Weak scaling: problem size proportional to number of processors
• 10 processors, 10 × 10 matrix
• Time = 20 × tadd
• 100 processors, 32 × 32 matrix
• Time = 10 × tadd + 1024/100 × tadd ≈ 20 × tadd
• Constant performance in this example
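The weak-scaling claim can be verified with the same time model as before: when the matrix grows with the processor count (about 10 elements per processor in both cases), the total time stays roughly flat. A minimal sketch:

```python
def time_units(scalars, matrix_elems, procs):
    # Sequential scalar sum plus evenly divided matrix sum, in units of tadd.
    return scalars + matrix_elems / procs

# Weak scaling: ~10 matrix elements per processor in both configurations.
print(time_units(10, 10 * 10, 10))    # 20.0
print(time_units(10, 32 * 32, 100))   # ~20.24, roughly constant
```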
Any Questions?
