CS8083 Unit II Notes
2.1 PERFORMANCE
The performance of a system is basically determined by the performance of the
processor and memory. Performance can be defined as how quickly the machine
executes instructions.
Taking performance into consideration in the initial phases of application design will lead
to a better final product.
There are two approaches for improving the performance of an application:
✓ Performance is the problem to be solved once the program is functionally correct
✓ Performance is considered as one of the upfront specifications for the application
2.1.1 Common metrics for performance
Items per unit time
✓ Measures the ability of the system to complete tasks rather than the duration of
each individual task
✓ This is a measure of bandwidth
Ex: Transactions per second, jobs per hour
Time per item
✓ Measure of the time to complete a single task
✓ Measure of latency or response time
QoS Metrics
Specify the expectations of the users of the system as well as penalties if the system fails
to meet these expectations.
Examples:
• Number of transactions with latency greater than some threshold
• Amount of time that the system is unavailable, known as downtime (the
complementary measure is availability)
• To avoid false sharing, data structures can be padded so that the variable used by
each thread resides on a separate cache line, as sketched below.
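A minimal sketch of this padding idea; the 64-byte cache-line size, the thread count, and all names are assumptions for illustration:

    #define CACHE_LINE 64     /* assumed cache-line size in bytes */
    #define NUM_THREADS 4

    /* Each thread updates its own counter; the padding pushes each
       counter onto its own cache line (assuming the array itself is
       line-aligned), so updates by one thread do not invalidate the
       cache line holding another thread's counter. */
    typedef struct
    {
        volatile int counter;
        char pad[CACHE_LINE - sizeof(int)];
    } padded_counter_t;

    padded_counter_t counters[NUM_THREADS];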
Operating system constraints to scaling
• Problems arise when there are too many active threads on a system; the constraint
is not on the absolute number of threads on the system but only on the number of
active threads.
• If there are continuously more threads requiring CPU time than there are virtual
CPUs, then the system may be considered to be oversubscribed.
• Multiple threads sharing the same virtual CPU will typically lead to a greater-than-
linear reduction in performance because of a number of factors.
• There are overheads due to the cost of switching context between the active threads
and the cost associated with the migration of threads between virtual CPUs.
• Threads that are sleeping or blocked waiting for a resource do not consume CPU
time and should not impact the performance of the active threads.
Using Processor Binding to Improve Memory Locality
• Thread migration is the situation where a thread starts running on one virtual
processor but ends up migrated to a different virtual processor.
• If the new virtual processor shares the same core, or if it shares cache with the
original virtual processor, then the cost of migration is low.
• For a multicore processor, the effect of thread migration tends to be minor, since
the new location of the thread shares a close level of cache with the old location of
the thread.
• If the migration is from one physical chip to another, the cost can be quite high.
• Processor binding is undertaken to improve performance, but it also restricts the
freedom of the operating system to schedule dynamically for the best performance.
In the worst case, it can lead to the bound application taking many times longer
than its unbound run time.
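As an illustration, a hedged sketch of binding the calling thread to a single virtual CPU using the Linux-specific pthread_setaffinity_np() call; the choice of CPU 0 is arbitrary, and other systems use different interfaces (Solaris, for example, provides processor_bind()):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    void bind_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);                 /* start with an empty CPU set */
        CPU_SET(0, &set);               /* add virtual CPU 0 to the set */
        /* Bind the calling thread to the CPUs in the set. */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }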
Priority Inversion
• All processes and threads have an associated priority, which the operating system
uses to determine how much CPU time to assign to them.
• A thread with a higher priority will get more CPU time than a thread with a lower
priority.
• Priorities are also used to determine how time is distributed between the threads
that comprise the application.
• Priority inversion occurs when a high-priority thread is blocked waiting for a
resource held by a low-priority thread, so the low-priority thread effectively runs
ahead of the high-priority one.
2.3 SYNCHRONIZATION AND DATA SHARING
• Data races are situations where multiple threads update the same data in an unsafe
way.
• They are the most common programming error found in parallel code.
• One way to avoid data races is by using proper synchronization between threads.
2.3.1 Data races
Data races are the most common error found in parallel code. A data race occurs when
multiple threads use the same data item and one or more of them are updating it. It is best
illustrated by an example:
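A minimal sketch of such a race (not the book's original listing; the starting value of 10 and the increment of 4 match the discussion below):

    #include <pthread.h>

    int counter = 10;               /* shared variable, initially 10 */

    void *adder(void *arg)
    {
        /* Unsynchronized read-modify-write: both threads may read 10,
           add 4, and write back 14, losing one of the updates. */
        counter += 4;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, adder, NULL);
        pthread_create(&t2, NULL, adder, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* counter may end up 14 (race) or 18 (sequential execution). */
        return 0;
    }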
In the example, each thread adds 4 to the variable. Because they do it at exactly the same
time, the value 14 ends up being stored in the variable. If the two threads had executed
the code at different times, the variable would have ended up with the value 18. This is
the situation where both threads are running simultaneously, and it illustrates a common
kind of data race, possibly the easiest one to visualize.
Another tool that is able to detect potential data races is the Thread Analyzer in Oracle
Solaris Studio. This tool requires an instrumented build of the application, data collection
is done by the collect tool, and the graphical interface is launched with the command tha.
Listing 4.5 shows the steps to do this.
Listing 4.5 Detecting Data Races Using the Sun Studio Thread Analyzer
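The original listing is not reproduced in these notes. A rough reconstruction of the typical workflow follows; the exact compiler and collect flags are from memory, not from the source, and should be checked against the Solaris Studio documentation:

    % cc -g -xinstrument=datarace datarace.c
    % collect -r race ./a.out
    % tha tha.1.er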
Fig 2.4 List of data races detected by the Solaris Studio Thread Analyzer
The initial screen of the tool displays a list of data races, as shown in Figure 2.4. Once the
user has identified the data race they are interested in, they can view the source code for
the two locations in the code where the problem occurs.
Thread Analyzer GUI
Races tab – Shows a list of data races detected in the program and the associated call
stack traces. This tab is selected by default.
Dual source tab – Shows the two source locations corresponding to the two accesses of a
selected data race.
Experiments tab – Shows the load objects in the experiment, and lists error and warning
messages
2.3.3 Avoiding data races
• Place a synchronization lock around all accesses to that variable
• Ensure that each thread must acquire the lock before referencing the variable
Example:

    void *func(void *params)
    {
        pthread_mutex_lock(&mutex);
        counter++;
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

This uses a mutex lock to protect accesses to the variable counter.
In the example, the two routines Increment() and Decrement() will either increment or
decrement the variable counter. To modify the variable, a thread has to first acquire the
mutex lock. Only one thread at a time can do this; all the other threads that want to
acquire the lock need to wait until the thread holding the lock releases it.
Both routines use the same mutex; consequently, only one thread at a time can either
increment or decrement the variable counter. If multiple threads are attempting to acquire
the same mutex at the same time, then only one thread will succeed, and the other threads
will have to wait. This situation is known as a contended mutex.
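A hedged sketch of what such a pair of routines might look like; the mutex and counter declarations are assumptions, not the book's listing:

    #include <pthread.h>

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    volatile int counter = 0;

    void Increment(void)
    {
        pthread_mutex_lock(&mutex);    /* only one thread at a time passes */
        counter++;
        pthread_mutex_unlock(&mutex);
    }

    void Decrement(void)
    {
        pthread_mutex_lock(&mutex);    /* same mutex, so increments and
                                          decrements are serialized */
        counter--;
        pthread_mutex_unlock(&mutex);
    }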
If all the calls to malloc() are replaced with the threadSafeMalloc() call, then only one
thread at a time can be in the original malloc() code, and the calls to malloc() become
thread-safe. Threads block if they attempt to acquire a mutex lock that is already held by
another thread.
Blocking means that the threads are sent to sleep either immediately or after a few
unsuccessful attempts to acquire the mutex. One problem with this approach is that it can
serialize a program.
If multiple threads simultaneously call threadSafeMalloc(), only one thread at a time will
make progress. This causes the multithreaded program to have only a single executing
thread, which stops the program from taking advantage of multiple cores.
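For reference, a minimal sketch of such a wrapper; the function name follows the text, while the mutex is an assumption:

    #include <stdlib.h>
    #include <pthread.h>

    pthread_mutex_t malloc_mutex = PTHREAD_MUTEX_INITIALIZER;

    void *threadSafeMalloc(size_t size)
    {
        pthread_mutex_lock(&malloc_mutex);   /* serialize all allocations */
        void *ptr = malloc(size);
        pthread_mutex_unlock(&malloc_mutex);
        return ptr;
    }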
Spin Locks
Spin locks are essentially mutex locks. The difference between a mutex lock and a spin
lock is that a thread waiting to acquire a spin lock will keep trying to acquire the lock
without sleeping.
The advantage of using spin locks is that the waiting thread will acquire the lock as soon
as it is released, whereas a thread waiting on a mutex lock needs to be woken by the
operating system before it can get the lock.
The disadvantage is that a spin lock will spin on a virtual CPU monopolizing that
resource. In comparison, a mutex lock will sleep and free the virtual CPU for another
thread to use.
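A minimal spin lock sketch using C11 atomics (not from the original text):

    #include <stdatomic.h>

    atomic_flag lock_flag = ATOMIC_FLAG_INIT;

    void spin_lock(void)
    {
        /* Retry the atomic test-and-set until the flag was clear;
           the waiting thread never sleeps. */
        while (atomic_flag_test_and_set_explicit(&lock_flag,
                                                 memory_order_acquire))
            ;   /* busy-wait, monopolizing a virtual CPU */
    }

    void spin_unlock(void)
    {
        atomic_flag_clear_explicit(&lock_flag, memory_order_release);
    }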
Semaphores
Semaphores are counters that can be either incremented or decremented. An example
might be a buffer that has a fixed size. Every time an element is added to a buffer, the
number of available positions is decreased. Every time an element is removed, the
number available is increased.
Semaphores can also be used to mimic mutexes; if there is only one element in the
semaphore, then it can be either acquired or available, exactly as a mutex can be either
locked or unlocked.
Semaphores will also signal or wake up threads that are waiting on them to use available
resources; hence, they can be used for signaling between threads. For example, a thread
might set a semaphore once it has completed some initialization.
Other threads could wait on the semaphore and be signaled to start work once the
initialization is complete.
Semaphore can be acquired by wait operation and can be released by signal operation.
The wait operation decrements the value of semaphore variable by 1. If the value is less
than or equal to zero, then the operation blocks until the value of the semaphore becomes
positive. The signal or post operation increments the value of the semaphore by 1.
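A hedged sketch using POSIX semaphores for the fixed-size buffer described above; the 16-element capacity and all names are assumptions:

    #include <semaphore.h>

    sem_t free_slots;   /* counts empty positions in the buffer */
    sem_t used_slots;   /* counts filled positions in the buffer */

    void init(void)
    {
        sem_init(&free_slots, 0, 16);   /* assume a 16-element buffer */
        sem_init(&used_slots, 0, 0);    /* initially empty */
    }

    void producer_add(void)
    {
        sem_wait(&free_slots);   /* decrement; blocks if the buffer is full */
        /* ... add an element to the buffer ... */
        sem_post(&used_slots);   /* increment; may wake a waiting consumer */
    }

    void consumer_remove(void)
    {
        sem_wait(&used_slots);   /* blocks if the buffer is empty */
        /* ... remove an element from the buffer ... */
        sem_post(&free_slots);
    }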
Readers-Writer Locks
A readers-writer lock (or multiple-reader lock) allows many threads to read the shared
data but can then lock the reader threads out to allow one thread to acquire a writer lock
and modify the data. A writer cannot acquire the write lock until all the readers have
released their reader locks.
For this reason, the locks tend to be biased toward writers; as soon as one is queued, the
lock stops allowing further readers to enter. This action causes the number of readers
holding the lock to diminish and will eventually allow the writer to get exclusive access
to the lock.
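A hedged sketch using the POSIX readers-writer lock API; the shared variable is an assumption:

    #include <pthread.h>

    pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
    int shared_data;

    int reader(void)
    {
        pthread_rwlock_rdlock(&rwlock);   /* many readers may hold this */
        int value = shared_data;
        pthread_rwlock_unlock(&rwlock);
        return value;
    }

    void writer(int value)
    {
        pthread_rwlock_wrlock(&rwlock);   /* exclusive access */
        shared_data = value;
        pthread_rwlock_unlock(&rwlock);
    }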
Barriers
There are situations where a number of threads have to all complete their work before
any of the threads can start on the next task.
For example, suppose a number of threads compute the values stored in a matrix. The
variable total needs to be calculated using the values stored in the matrix. A barrier can
be used to ensure that all the threads complete their computation of the matrix before the
variable total is calculated. The code below shows a situation using a barrier to separate
the calculation of a variable from its use.
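The original listing is not reproduced in these notes. Below is a hedged sketch using a POSIX barrier; the helpers compute_matrix_rows() and sum_matrix() are hypothetical names introduced for illustration:

    #include <pthread.h>

    void compute_matrix_rows(int id);   /* hypothetical: fills this thread's rows */
    double sum_matrix(void);            /* hypothetical: sums the whole matrix */

    pthread_barrier_t barrier;          /* initialized elsewhere with the thread count */
    double total;

    void *worker(void *arg)
    {
        int id = *(int *)arg;
        compute_matrix_rows(id);        /* each thread fills its part of the matrix */
        pthread_barrier_wait(&barrier); /* no thread passes until all arrive */
        if (id == 0)
            total = sum_matrix();       /* safe: every value is now computed */
        return NULL;
    }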
The variable total can be computed only when all threads have reached the barrier. This
avoids the situation where one of the threads is still completing its computations while
the other threads start using the results of the calculations.
Notice that another barrier could well be needed after the computation of the value for
total if that value is then used in further calculations.
DEADLOCKS
A deadlock occurs when two or more threads each hold a lock that another needs, so
none of them can make progress.
Example:
• Two threads are deadlocked
• If thread1 has already acquired lock A and thread2 has already acquired lock B,
then
• Thread1 cannot make forward progress because it is waiting for lock B
• Thread2 cannot make forward progress because it is waiting for lock A
How deadlock occurs:
• The two routines update1() and update2() each have an outer loop
• Routine update1() acquires lock A and then attempts to acquire lock B
• Routine update2() acquires lock B and then attempts to acquire lock A; a sketch
of the two routines follows
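A hedged sketch of the two routines described above; the lock names come from the text, everything else (including omitting the outer loops) is an assumption:

    #include <pthread.h>

    pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

    void update1(void)
    {
        pthread_mutex_lock(&lockA);
        pthread_mutex_lock(&lockB);   /* blocks forever if update2 holds B
                                         and is waiting for A */
        /* ... update shared state ... */
        pthread_mutex_unlock(&lockB);
        pthread_mutex_unlock(&lockA);
    }

    void update2(void)
    {
        pthread_mutex_lock(&lockB);   /* opposite order: the deadlock recipe */
        pthread_mutex_lock(&lockA);
        /* ... update shared state ... */
        pthread_mutex_unlock(&lockA);
        pthread_mutex_unlock(&lockB);
    }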
Avoiding deadlocks
• Ensure that threads always acquire the locks in the same order
• If thread2 also acquired the locks in the order A and then B, it would stall while
waiting for lock A without having first acquired lock B
• This would enable thread1 to acquire B and then eventually release both locks,
allowing thread2 to make progress
LIVELOCKS
• A condition that occurs when two or more threads continually change their state in
response to changes in the other threads.
• Livelock differs from deadlock in that both processes involved repeatedly change
their states with regard to each other without making progress.
• It traps threads in an unending loop of releasing and acquiring locks
• It can be caused by code written to back out of deadlocks
• The threads are unable to make progress even though they are not blocked
• The tasks enter an infinite loop of operations that lead to nothing
Example:
• Two people meet on a landing
• Each time one takes a step to the side, the other mirrors that step; neither moves
forward
This can be avoided by ensuring that only one process (chosen randomly or by priority)
takes action.
Mechanism to avoid deadlock
▪ Each thread acquires one lock and then attempts to acquire the second lock that it
needs, using a try-lock operation such as canAcquire().
▪ canAcquire() returns immediately, either having acquired the lock or having failed
to acquire it.
▪ If the thread fails to acquire the second lock
o It releases the lock it already holds before attempting to acquire both locks
again
▪ When the thread successfully acquires both locks
o It exits the loop and performs its update
▪ If two threads repeatedly acquire their first lock, fail on the second, and back off
in lockstep, the application will make no forward progress; this is the livelock
described above. A sketch of the scheme follows.
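A hedged sketch of this back-off scheme, using pthread_mutex_trylock() in place of the canAcquire() operation named in the text:

    #include <pthread.h>

    void update(pthread_mutex_t *first, pthread_mutex_t *second)
    {
        for (;;)
        {
            pthread_mutex_lock(first);
            if (pthread_mutex_trylock(second) == 0)
                break;                      /* acquired both locks */
            pthread_mutex_unlock(first);    /* back off and try again */
        }
        /* ... update the shared data ... */
        pthread_mutex_unlock(second);
        pthread_mutex_unlock(first);
    }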
1. Shared memory
The pseudocode below depicts the procedure for using an existing shared memory
region. Each process should close the shared memory segment after using it.
ID = Open Shared Memory( Descriptor );
Memory = Map Shared Memory( ID );
...
Close Shared Memory( ID );
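A hedged sketch of the same steps using the POSIX shared memory API; the segment name and size are assumptions:

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define REGION_SIZE 4096            /* assumed size of the segment */

    void use_shared_memory(void)
    {
        int fd = shm_open("/my_region", O_RDWR, 0);   /* open existing segment */
        void *mem = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        /* ... use mem ... */
        munmap(mem, REGION_SIZE);
        close(fd);
    }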
2. Condition Variables
• Communicate readiness between threads by enabling a thread to be woken up
when a condition becomes true.
• Without condition variables, the waiting thread would have to use some form of
polling to check whether the condition had become true
• Example: Producer Consumer problem
A condition variable can be used to solve the producer-consumer problem. The producer
thread produces a data item and adds it to a queue (enqueue), while the consumer thread
consumes data from the queue. Before adding data, the producer has to check whether the
queue has an empty position to accommodate the new data item. Before consuming a data
item, the consumer checks whether the queue contains any items. These conditions are
checked using condition variables.
A producer can produce one item while the consumer is consuming another item, so the
producer and consumer must be synchronized. Therefore, before adding an item to the
queue, the producer thread acquires the mutex lock, and after adding the item, it releases
the lock. If there is now exactly one item in the queue (that is, the queue was previously
empty), the producer wakes up one consumer thread by signaling it.
Producer Thread Adding an Item to the Queue
Acquire Mutex();
Add Item to Queue();
If ( Only One Item on Queue )
{
Signal Conditions Met();
}
Release Mutex();
If there are no items in the queue, the consumer thread goes to sleep and has to be woken
up when the producer thread adds an item and signals the condition variable. If more
items exist in the queue, the consumer thread can consume them one by one. The
consumer thread releases the mutex lock while it waits for items to arrive and again once
it has removed an item from the queue.
Consumer Thread Removing Items from the Queue
Repeat
    Item = 0;
    If ( No Items on Queue() )
    {
        Acquire Mutex();
        Wait on Condition Variable();
        Release Mutex();
    }
    Acquire Mutex();
    If ( Item on Queue() )
    {
        Item = Remove from Queue();
    }
    Release Mutex();
Until ( Item != 0 );
▪ The problem with the code is the first if condition. If there are no items on the
queue, then the mutex lock is acquired, and the thread waits on the condition
variable.
▪ However, the producer thread could have placed an item and signaled the
consumer thread between the consumer thread executing the if statement and
acquiring the mutex.
▪ When this happens, the consumer thread waits on the condition variable
indefinitely, because the producer thread signals only when it places the first item
into the queue.
▪ Lost wake-up: occurs when the signal to wake up the waiting thread is sent before
the thread is ready to receive it.
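A hedged sketch of the standard pthreads pattern that fixes this: the emptiness check and the wait both happen while holding the mutex, and pthread_cond_wait() atomically releases the mutex while sleeping, so a signal sent after the check cannot be lost. The queue helpers and the item_t type are hypothetical names:

    #include <pthread.h>

    typedef struct item item_t;         /* hypothetical item type */
    int queue_is_empty(void);           /* hypothetical helpers */
    item_t *remove_from_queue(void);

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    item_t *consume(void)
    {
        pthread_mutex_lock(&mutex);
        while (queue_is_empty())
        {
            /* Atomically releases the mutex while waiting and
               reacquires it before returning. */
            pthread_cond_wait(&cond, &mutex);
        }
        item_t *item = remove_from_queue();
        pthread_mutex_unlock(&mutex);
        return item;
    }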
3. Signals and events
▪ A UNIX mechanism where one process can send a signal to another process when
an event occurs.
▪ The receiving process can have a handler to perform some task upon receipt of
the message.
Examples:
▪ Signals
o Pressing ^C
o Causes a SIGINT signal to be sent to the process
o By default this stops the running application
▪ Events
o Handling of Keyboard presses
o Pressing one of the mouse buttons will send a click event to the target
window
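A hedged sketch of installing a handler for SIGINT; the handler body and flag are illustrative only:

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    volatile sig_atomic_t got_sigint = 0;

    void handler(int sig)
    {
        got_sigint = 1;     /* handlers should only do async-signal-safe work */
    }

    int main(void)
    {
        signal(SIGINT, handler);    /* install the handler for ^C */
        while (!got_sigint)
            sleep(1);               /* wait until the signal arrives */
        printf("Received SIGINT\n");
        return 0;
    }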
4. Message queues
▪ A structure that can be shared between multiple processes
▪ Messages can be placed into the queue and will be removed in the same order
in which they were added
▪ Like a shared memory segment, a message queue is identified by a descriptor,
which is the location of a file in the file system.
▪ The descriptor is used to create the message queue and also to attach to an
existing message queue.
▪ Messages to be shared are placed into the message queue. The first message
added is at the head of the queue and is retrieved first.
Example:
Creating and Placing Messages into a Queue
ID = Open Message Queue( Descriptor );
Put Message in Queue( ID, Message );
...
Close Message Queue( ID );
Delete Message Queue( Descriptor );
Opening a Queue and Receiving Messages
ID=Open Message Queue ID(Descriptor);
Message=Remove Message from Queue(ID);
...
Close Message Queue(ID);
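A hedged sketch of the same steps using the POSIX message queue API; the queue name and buffer size are assumptions (on Linux, mq_receive needs a buffer at least as large as the queue's message-size attribute):

    #include <mqueue.h>
    #include <fcntl.h>

    void sender(void)
    {
        mqd_t mq = mq_open("/myqueue", O_CREAT | O_WRONLY, 0644, NULL);
        mq_send(mq, "hello", 6, 0);        /* priority 0 */
        mq_close(mq);
    }

    void receiver(void)
    {
        char buffer[8192];                 /* assumed >= queue message size */
        mqd_t mq = mq_open("/myqueue", O_RDONLY);
        mq_receive(mq, buffer, sizeof(buffer), NULL);
        mq_close(mq);
        mq_unlink("/myqueue");             /* delete the queue */
    }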
5. Named pipes
▪ Pipe – used to pass data from one process to another
▪ Example
▪ ls – lists all the files in a directory
▪ wc – counts the number of lines, words, and characters in its input
▪ The output of the ls command is piped into the wc command (ls | wc)
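A named pipe (FIFO) appears as a file in the file system; a hedged sketch of the writer side (the path is an assumption, and a separate reader process must open the FIFO before the open for writing returns):

    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    void fifo_writer(void)
    {
        mkfifo("/tmp/myfifo", 0644);             /* create the named pipe */
        int fd = open("/tmp/myfifo", O_WRONLY);  /* blocks until a reader opens */
        write(fd, "hello\n", 6);
        close(fd);
        unlink("/tmp/myfifo");                   /* remove the pipe when done */
    }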