
Ch 3: Multithreading

Prepared by: Priyanka More

This chapter discusses techniques for achieving parallelism at several levels: application, thread, data, instruction, and hardware. It covers data parallelism, task parallelism, Flynn's classification, latency hiding techniques (prefetching, coherent caches, and multithreading), cache coherence protocols for shared-memory systems, and how directory-based protocols improve on snoopy protocols.


Application Level: Data vs Task Parallelism



Thread Level Parallelism



Data Level Parallelism



Computer Hardware Level Parallelism



Flynn’s Classification



Instruction stream and Data stream

• During the cycle of instruction execution, a flow of instructions is established from main memory to the central processing unit. This flow of instructions is known as the instruction stream.
• There is a bidirectional flow of operands between the processor and memory. This flow of operands is known as the data stream.



Limitations of Memory System Performance

• The memory system, not processor speed, is often the bottleneck for many applications.
• Memory system performance is largely captured by two parameters: latency and bandwidth.
• Latency is the time from the issue of a memory request to the time the data is available at the processor.
• Bandwidth is the rate at which data can be pumped to the processor by the memory system.



Memory System Performance: Bandwidth and Latency

• It is very important to understand the difference between latency and bandwidth.
• Consider the example of a fire hose. If the water comes out of the hose two seconds after the hydrant is turned on, the latency of the system is two seconds.
• Once the water starts flowing, if the hydrant delivers water at the rate of 5 gallons/second, the bandwidth of the system is 5 gallons/second.
• If you want an immediate response from the hydrant, it is important to reduce latency.
• If you want to fight big fires, you want high bandwidth.
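
To make the distinction concrete, here is a quick worked example (the numbers are illustrative assumptions, not taken from these slides):

    processor clock    = 1 GHz  (1 ns per cycle)
    memory latency     = 100 ns per request
    words per request  = 1

    If every operation must wait for one word from memory, the processor
    completes at most 1 word / 100 ns = 10 million operations per second,
    regardless of its clock rate. Doing better requires either raising
    bandwidth (more words per request) or hiding the latency.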



Latency Hiding Techniques
• Parallel and scalable systems typically use distributed shared memory. Accessing remote memory significantly increases memory latency.
• Processor speed has been increasing at a much faster rate than memory speed.
• Three latency hiding mechanisms are used to increase scalability and programmability:
1. Prefetching techniques
2. Coherent caches
3. Multiple-context processors
• Prefetching brings instructions or data close to the processor before they are actually needed (a small sketch follows below).
• Coherent caches, supported by hardware, reduce cache misses.
• Multiple-context processors allow a processor to switch from one context to another when it would otherwise stall.
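
As an illustration of the first mechanism, here is a minimal software-prefetching sketch in C. It assumes a GCC or Clang toolchain (__builtin_prefetch is a compiler intrinsic, not standard C), and the prefetch distance of 8 elements is an arbitrary illustrative choice:

    /* Sum an array, prefetching ahead so data is already in cache
       by the time the loop reaches it. */
    #include <stddef.h>

    double sum(const double *a, size_t n) {
        double s = 0.0;
        const size_t DIST = 8;               /* illustrative prefetch distance */
        for (size_t i = 0; i < n; i++) {
            if (i + DIST < n)
                /* hints: read-only access (0), moderate temporal locality (1) */
                __builtin_prefetch(&a[i + DIST], 0, 1);
            s += a[i];
        }
        return s;
    }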



2. Cache Coherence in Multiprocessor Systems

• Interconnects provide basic mechanisms for data transfer.
• In the case of shared-address-space machines, additional hardware is required to coordinate access to data that might have multiple copies in the network.
• The underlying technique must provide some guarantees on the semantics.
• This guarantee is generally one of serializability, i.e., there exists some serial order of instruction execution that corresponds to the parallel schedule.


Cache Coherence Protocols

1. Snoopy Bus Protocol
2. Directory-based Protocol



Performance of Snoopy Caches

• Once copies of data are tagged dirty, all subsequent operations can be performed locally on the cache without generating external traffic.
• If a data item is read by a number of processors, it transitions to the shared state in the cache, and all subsequent read operations become local.
• If processors read and update data at the same time, they generate coherence requests on the bus, which is ultimately bandwidth-limited.
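
The dirty/shared transitions described above can be made concrete with a minimal MSI-style state machine in C. This is an illustrative sketch of one cache line's controller, not a complete protocol:

    /* Minimal MSI-style state machine for a single cache line (illustrative). */
    typedef enum { INVALID, SHARED, MODIFIED } line_state;

    /* Actions by the local processor. */
    line_state on_proc_read(line_state s) {
        if (s == INVALID) return SHARED;   /* read miss: fetch; others may share */
        return s;                          /* SHARED or MODIFIED: local hit */
    }

    line_state on_proc_write(line_state s) {
        (void)s;            /* every starting state ends up MODIFIED; an
                               invalidate goes on the bus so all other
                               copies of the line drop to INVALID */
        return MODIFIED;
    }

    /* Reactions to snooped bus traffic from other processors. */
    line_state on_bus_read(line_state s) {
        if (s == MODIFIED) return SHARED;  /* supply the dirty data, demote */
        return s;
    }

    line_state on_bus_write(line_state s) {
        (void)s;
        return INVALID;                    /* another writer: drop our copy */
    }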



Directory Based Systems

• In snoopy caches, each coherence operation is sent to all processors. This is an inherent limitation.
• Why not send coherence requests to only those processors that need to be notified?
• This is done using a directory, which maintains a presence vector for each data item (cache line) along with its global state.
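
A minimal sketch of such a directory entry in C (illustrative; the field layout and the 64-processor limit are assumptions):

    /* Illustrative full-map directory entry for one cache line, assuming
       at most 64 processors (one presence bit per processor). */
    #include <stdint.h>

    typedef enum { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY } dir_state;

    typedef struct {
        dir_state state;     /* global state of the line */
        uint64_t  presence;  /* bit i set => processor i holds a copy */
    } dir_entry;

    /* On a write by processor p, invalidations go only to processors whose
       presence bits are set, instead of being broadcast to everyone. */
    uint64_t sharers_to_invalidate(const dir_entry *e, int p) {
        return e->presence & ~(1ULL << p);
    }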



Directory Based Systems

Architecture of typical directory-based systems: (a) a centralized directory; (b) a distributed directory.


Performance of Directory Based Schemes

• The need for a broadcast medium is replaced by the directory.
• The additional bits to store the directory may add significant overhead.
• The underlying network must be able to carry all the coherence requests.
• The directory is a point of contention; therefore, distributed directory schemes must be used.
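
To see why the directory bits can be significant, consider an illustrative full-map configuration (the numbers are assumptions, not from the slides):

    cache line size        = 64 bytes = 512 bits
    processors             = 256, one presence bit each
    directory bits / line  = 256 (plus a few state bits)
    storage overhead       ≈ 256 / 512 = 50% extra per line

Overheads on this scale are why limited-pointer and sparse directory organizations are often preferred at large processor counts.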



3. Multiple context processors

• Consider the problem of browsing the web on a very slow network connection. We deal with the problem in one of three possible ways:
– we anticipate which pages we are going to browse ahead of time and issue requests for them in advance;
– we open multiple browsers and access different pages in each browser, so that while we are waiting for one page to load, we can be reading others; or
– we access a whole batch of pages in one go, amortizing the latency across the various accesses.
• The first approach is called prefetching, the second multithreading, and the third corresponds to spatial locality in accessing memory words.


3. Multithreading for Latency Hiding

A thread is a single stream of control in the flow of a program. We illustrate threads with a simple example:

    for (i = 0; i < n; i++)
        c[i] = dot_product(get_row(a, i), b);

Each dot product is independent of the others, and therefore represents a concurrent unit of execution. We can safely rewrite the above code segment as:

    for (i = 0; i < n; i++)
        c[i] = create_thread(dot_product, get_row(a, i), b);
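
create_thread above is pseudocode. A possible rendering of the same idea with POSIX threads might look as follows; the matrix layout, the sizes N and M, and the inline dot product are illustrative assumptions, not part of the original example:

    /* Illustrative POSIX threads version; compile with: cc -pthread file.c */
    #include <pthread.h>

    #define N 4              /* number of rows / threads (assumed size) */
    #define M 8              /* vector length (assumed size)            */

    double a[N][M], b[M], c[N];

    typedef struct { int i; } task;   /* which row this thread handles */

    /* Each worker computes one dot product: c[i] = a[i] . b */
    static void *dot_product(void *arg) {
        int i = ((task *)arg)->i;
        double s = 0.0;
        for (int j = 0; j < M; j++)
            s += a[i][j] * b[j];
        c[i] = s;
        return NULL;
    }

    int main(void) {
        pthread_t tid[N];
        task t[N];
        for (int i = 0; i < N; i++) {   /* create one thread per row */
            t[i].i = i;
            pthread_create(&tid[i], NULL, dot_product, &t[i]);
        }
        for (int i = 0; i < N; i++)     /* wait for all dot products */
            pthread_join(tid[i], NULL);
        return 0;
    }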



Multithreading for Latency Hiding: Example

• In the code, the first instance of the function accesses a pair of vector elements and waits for them.
• In the meantime, the second instance of the function can access two other vector elements in the next cycle, and so on.
• After l units of time, where l is the latency of the memory system, the first function instance gets the requested data from memory and can perform the required computation.
• In the next cycle, the data items for the next function instance arrive, and so on. In this way, we can perform a computation in every clock cycle.
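
A rough calculation (with illustrative numbers) shows how many ready threads this scheme needs:

    memory latency l  = 100 cycles (assumed)
    work per thread   = issue 1 request, then wait l cycles
    threads needed    ≈ l = 100

    With about 100 ready threads, by the time the first thread's data
    arrives, the processor has stayed busy switching through the others,
    so a computation can complete in every cycle.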



Communication Model of Parallel Platforms

• There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
• Platforms that provide a shared data space are called shared-address-space machines or multiprocessors.
• Platforms that support messaging are called message-passing platforms or multicomputers.
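
For the message-passing style, a minimal sketch using MPI (assuming an MPI installation; the value sent and the ranks are arbitrary) contrasts with the implicit reads and writes of a shared address space:

    /* Process 0 sends one integer to process 1; run with: mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, x = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            x = 42;
            MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* explicit send */
        } else if (rank == 1) {
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                       /* explicit receive */
            printf("rank 1 received %d\n", x);
        }
        MPI_Finalize();
        return 0;
    }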



Shared-Address-Space Platforms

• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this shared address space.
• If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.



NUMA and UMA Shared-Address-Space Platforms

Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.



NUMA and UMA Shared-Address-Space Platforms

• The distinction between NUMA and UMA platforms is important from the point of view of algorithm design. NUMA machines require locality from the underlying algorithms for performance.
• Programming these platforms is easier, since reads and writes are implicitly visible to other processors.
• However, read/write access to shared data must be coordinated (this is discussed in greater detail in the context of threads programming).
• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence problem.
• A weaker model of these machines provides an address map but not coordinated access. Such machines are called non-cache-coherent shared-address-space machines.
