CSE - 313
Computer Architecture
Faculty: Shoib Ahmed Shourav
United International University
Summer 2021

Parallel Processors from Client to Cloud

Introduction
• Goal: connecting multiple computers to get higher performance
  • Multiprocessors
  • Scalability, availability, power efficiency
• Task-level (process-level) parallelism
  • High throughput for independent jobs
• Parallel processing program
  • Single program run on multiple processors
• Multicore microprocessors
  • Chips with multiple processors (cores)

Hardware and Software
• Hardware
  • Serial: e.g., Pentium 4
  • Parallel: e.g., quad-core Xeon e5345
• Software
  • Sequential: e.g., matrix multiplication
  • Concurrent: e.g., operating system
• Sequential/concurrent software can run on serial/parallel hardware
• Challenge: making effective use of parallel hardware

Parallel Programming
• Parallel software is the problem
• Need to get significant performance improvement
  • Otherwise, just use a faster uniprocessor, since it’s easier!
• Difficulties (see the sketch after this list)
  • Partitioning
  • Coordination
  • Communications overhead
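The difficulties above are easiest to see in code. The following is a minimal Python sketch, not taken from the slides, of a parallel sum using the standard multiprocessing module; the function names (parallel_sum, partial_sum) and the choice of four workers are illustrative assumptions. Partitioning shows up as the chunking step, coordination as gathering the partial results, and communication overhead as the cost of shipping chunks to the worker processes.

```python
# Minimal sketch (not from the slides) of the listed difficulties.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker handles one partition of the data.
    return sum(chunk)

def parallel_sum(data, n_workers=4):
    # Partitioning: split the data into one chunk per worker.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Communication + coordination: ship chunks to workers, gather results.
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_000))))
```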
Amdahl’s Law
• Sequential part can limit speedup
• T_new = T_parallelizable / N + T_sequential
• Speedup = 1 / ((1 − F_parallelizable) + F_parallelizable / N)
• Here T is time, F is the fraction of the program that can be parallelized, and N is the number of processors
• Example: a 90× speedup on 100 processors requires solving 1 / ((1 − F) + F/100) = 90, which gives F ≈ 0.999
• Need sequential part to be 0.1% of original time

Scaling Example 1
• Workload: sum of 10 scalars, and 10 × 10 matrix sum
  • Speed up from 10 to 100 processors
• Single processor: Time = (10 + 100) × tadd
• 10 processors
  • Time = 10 × tadd + 100/10 × tadd = 20 × tadd
  • Speedup = 110/20 = 5.5 (55% of potential)
• 100 processors
  • Time = 10 × tadd + 100/100 × tadd = 11 × tadd
  • Speedup = 110/11 = 10 (10% of potential)
• Assumes load can be balanced across processors

Scaling Example 2
• What if matrix size is 100 × 100?
• Single processor: Time = (10 + 10000) × tadd
• 10 processors
  • Time = 10 × tadd + 10000/10 × tadd = 1010 × tadd
  • Speedup = 10010/1010 = 9.9 (99% of potential)
• 100 processors
  • Time = 10 × tadd + 10000/100 × tadd = 110 × tadd
  • Speedup = 10010/110 = 91 (91% of potential)
• Assuming load balanced (a short sketch reproducing these numbers follows)
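As a cross-check on the arithmetic above, this small Python sketch recomputes the times and speedups of Scaling Examples 1 and 2 in units of tadd. The function name summing_time and its structure are illustrative assumptions, not part of the slides.

```python
# Small sketch (not from the slides) recomputing Scaling Examples 1 and 2.

def summing_time(scalar_adds, matrix_side, processors):
    # The scalar additions are assumed to stay sequential; the matrix-sum
    # additions are divided evenly across the processors.
    return scalar_adds + (matrix_side * matrix_side) / processors

for side in (10, 100):
    single = summing_time(10, side, 1)
    for p in (10, 100):
        t = summing_time(10, side, p)
        speedup = single / t
        print(f"{side} x {side} matrix on {p} processors: "
              f"time = {t:g} x t_add, speedup = {speedup:.1f} "
              f"({100 * speedup / p:.0f}% of potential)")
```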
Strong vs Weak Scaling
• Strong scaling: problem size fixed, as in the examples above
• Weak scaling: problem size proportional to number of processors
  • 10 processors, 10 × 10 matrix
    • Time = 20 × tadd
  • 100 processors, 32 × 32 matrix
    • Time = 10 × tadd + 1000/100 × tadd = 20 × tadd
  • Constant performance in this example (see the sketch below)
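The weak-scaling case above can be checked the same way. In this minimal sketch, with an assumed helper name weak_scaling_time, the matrix grows roughly in proportion to the processor count, so the total time stays close to 20 × tadd.

```python
# Minimal sketch (not from the slides) of the weak-scaling case above.

def weak_scaling_time(scalar_adds, matrix_elements, processors):
    # The scalar additions stay sequential; the matrix additions are
    # divided evenly across the processors.
    return scalar_adds + matrix_elements / processors

for processors, side in ((10, 10), (100, 32)):
    t = weak_scaling_time(10, side * side, processors)
    print(f"{processors} processors, {side} x {side} matrix: "
          f"time = {t:.1f} x t_add")
```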
Any Questions?

Reference: Chapter 6, Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design).