
CS8083 MCP UNIT II

UNIT II PARALLEL PROGRAM CHALLENGES


Performance – Scalability – Synchronization and data sharing – Data races – Synchronization
primitives (mutexes, locks, semaphores, barriers) – deadlocks and livelocks – communication
between threads (condition variables, signals, message queues and pipes).

2.1 PERFORMANCE
The performance of a system is determined largely by the performance of the
processor and the memory. Performance can be defined as how quickly the machine
executes instructions.
Taking performance into consideration in the initial phases of application design
leads to a better final product.
Two approaches to the performance of an application:
✓ Performance is a problem to be solved once the program is functionally correct
✓ Performance is one of the upfront specifications for the application
2.1.1 Common metrics for performance
Items per unit time
✓ Focuses on the ability of the system to complete tasks rather than on the
duration of each individual task
✓ This is a measure of bandwidth
Ex: Transactions per second, jobs per hour
Time per item
✓ Measure of the time to complete a single task
✓ Measure of latency or response time
QoS metrics
Specify the expectations of the users of the system, as well as penalties if the system
fails to meet those expectations.
Examples:
• Number of transactions with latency greater than some threshold
• Amount of time that the system is unavailable, called downtime (or, conversely,
availability)


2.1.2 Algorithmic Complexity


• Measure of how much computation a program will perform when using a
particular algorithm
• Select the most efficient algorithms for every aspect of the code
• Algorithmic complexity is concerned with the operation count; it does not
consider the cost of the individual instructions
• Another criterion for selecting an algorithm may be whether it scales to multiple
processors
• An algorithm with low algorithmic complexity is likely to be more difficult to
implement than an algorithm with higher algorithmic complexity

2.1.3 How Structure Impacts performance


Three attributes:
▪ The build structure – how the source files are combined into applications and
supporting libraries
▪ The use of libraries to structure applications
▪ The way the data is organized in the application
Performance and Convenience Trade-Offs in Source Code and Build Structures
o The developer’s convenience is the main criterion for structuring the sources,
but it should not cause inconvenience to the user of the application.
o Performance opportunities are lost when the compiler sees only a single file at a
time.
o The single file may not present the compiler with all the opportunities for
optimizations that it might have had if it were to see more of the source code.
o Compilers support inlining optimizations within the same source file
o Cross file Optimization
• To improve performance by inlining a routine from one source file into the
place where it is called in another source file
• Use either static or archive libraries as part of the build process


Using Libraries to Structure Applications


o Common functionality, once extracted into a library, can be shared between
different projects or applications
o Only the library needs to be upgraded, instead of replacing all the executables
that use it
o Provide better separation between interface and implementation
o Libraries can be used as a mechanism to dynamically provide enhanced
functionality
o Functionality can be made available without having to change or even restart the
application
o Functionality can be selected based on the runtime environment characteristics of
the system

Impact of Data Structures on Performance


o When an application needs an item of data, it fetches it from memory and installs
it in the cache.
o Caches hold data that is frequently accessed. The cost of fetching data from the
cache is substantially lower than the cost of fetching it from memory.
o Amount of data loaded into each level of cache depends on the size of the cache
line


Fig 2.1 Fetching data from memory into caches


2.2 SCALABILITY
• Scalability is a measure of a parallel system’s capacity to increase speedup in
proportion to the number of processors.
• A technology is scalable if it can handle ever-increasing problem sizes.
• If we increase the problem size at the same rate that we increase the number of
processes/threads, then the efficiency will be unchanged and the program is
scalable.
• During program development, we may take timings in order to determine whether
the program is behaving as we intend.
Hardware constraints to Scaling
• Bandwidth is another resource shared between threads.
• The bandwidth capacity of a system depends on the design of the processor and
the memory system as well as the memory chips and their location in the system.
• Two systems can contain the same processor and same motherboard yet have two
different measurements for the bandwidth.
False Sharing
• Situation where multiple threads are accessing items of data held on a single cache
line.


• To avoid false sharing, data structures can be padded so that the variable used by
each thread resides on a separate cache line, as in the sketch below.
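A minimal sketch of this padding technique, assuming a 64-byte cache line (typical,
but not universal – the real line size should be checked for the target system):

#define CACHE_LINE_SIZE 64
#define NUM_THREADS 4

struct padded_counter
{
    volatile int value;                        /* the per-thread data item */
    char pad[CACHE_LINE_SIZE - sizeof(int)];   /* pushes the next element onto its own line */
};

/* One counter per thread; each now resides on a separate cache line. */
struct padded_counter counters[NUM_THREADS];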
Operating system constraints to scaling
• The operating system does not constrain the absolute number of threads on a
system; the constraint is on the number of active threads.
• If there are continuously more threads requiring CPU time than there are virtual
CPUs, then the system may be considered to be oversubscribed.
• Multiple threads sharing the same virtual CPU will typically suffer a greater-than-
linear reduction in performance because of a number of factors:
• Overheads due to the cost of switching context between the active threads, and the
cost associated with the migration of threads between virtual CPUs.
• Threads that are sleeping or blocked waiting for a resource do not consume CPU
time and should not impact the performance of the active threads.
Using Processor Binding to Improve Memory Locality
• Thread migration is the situation where a thread starts running on one virtual
processor but is later moved to a different virtual processor.
• If the new virtual processor shares the same core, or shares cache with the
original virtual processor, then the cost of migration is low.
• For a multicore processor, the effect of thread migration tends to be minor, since
the new location of the thread shares a close level of cache with the old location.
• If the migration is from one physical chip to another, the cost can be quite high.
• Processor binding is undertaken to improve performance (see the Linux sketch
below).
• However, binding also restricts the freedom of the operating system to schedule
dynamically for the best performance; it can lead to the bound application taking
many times longer than its unbound run time.
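A Linux-specific sketch of binding the calling thread to a virtual CPU; other operating
systems expose different interfaces (for example, processor_bind() on Solaris):

#define _GNU_SOURCE
#include <sched.h>

void bind_to_cpu( int cpu )
{
    cpu_set_t set;
    CPU_ZERO( &set );                            /* start with an empty CPU set */
    CPU_SET( cpu, &set );                        /* add the chosen virtual CPU */
    sched_setaffinity( 0, sizeof(set), &set );   /* 0 = bind the calling thread */
}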
Priority Inversion


• Every process and thread can have an associated priority, which the operating
system uses to determine how much CPU time to assign to the thread.
• A thread with a higher priority will get more CPU time than a thread with a lower
priority.
• Priorities are also used to determine how time is distributed between the threads
that comprise an application.
• Priority inversion occurs when a high-priority thread is blocked waiting for a
resource held by a low-priority thread, so the low-priority thread effectively
prevents the high-priority thread from running.
2.3 SYNCHRONIZATION AND DATA SHARING
• Data races arise in situations where multiple threads update the same data in an
unsafe way.
• They are the most common programming error found in parallel code.
• One way to avoid data races is to use proper synchronization between threads.
2.3.1 Data races
Data races are the most common error found in parallel code. A data race occurs when
multiple threads use the same data item and one or more of them are updating it. It is best
illustrated by an example:


In the example, the variable initially holds the value 10, and each thread adds 4 to it.
Because both threads read the variable at exactly the same time, each computes 10 + 4,
and the value 14 ends up being stored into the variable. If the two threads had executed
the code at different times, the variable would have ended up with the value 18. This is
the situation where both threads are running simultaneously, and it illustrates a common
kind of data race – possibly the easiest one to visualize.

2.3.2 The tools used for detecting data races


Consider a simple example written using POSIX threads. The code creates two threads,
both of which execute the routine func(). The main thread then waits for both the child
threads to complete their work. Both threads will attempt to increment the variable
counter. We can compile this code with GNU gcc and then use Helgrind, which is part of
the Valgrind suite, to identify the data race. Valgrind is a tool that enables an application
to be instrumented and its runtime behavior examined. The Helgrind tool uses this
instrumentation to gather data about data races. The figures below show the code and
the output from Helgrind.

Fig 2.2 Code containing data race
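The original listing is not reproduced here; a minimal reconstruction of the code it
describes, using POSIX threads, might look like this (file race.c):

#include <pthread.h>
#include <stdio.h>

volatile int counter = 0;

void *func( void *params )
{
    counter++;                          /* the racing access reported by Helgrind */
    return NULL;
}

int main( void )
{
    pthread_t thread1, thread2;
    pthread_create( &thread1, NULL, func, NULL );
    pthread_create( &thread2, NULL, func, NULL );
    pthread_join( thread1, NULL );      /* the main thread waits for both children */
    pthread_join( thread2, NULL );
    printf( "counter = %i\n", counter );
    return 0;
}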


Fig 2.3 Using Helgrind to detect data races
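The Helgrind output itself is not reproduced here; the steps to build the program and
run the check would look roughly like this:

$ gcc -g race.c -o race -lpthread
$ valgrind --tool=helgrind ./race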


The output from Helgrind shows that there is a potential data race between two threads,
both executing line 7 in the file race.c.

Another tool that is able to detect potential data races is the Thread Analyzer in Oracle
Solaris Studio. This tool requires an instrumented build of the application; data collection
is done by the collect tool, and the graphical interface is launched with the command tha.
The listing below shows the steps to do this.

Detecting data races using the Sun Studio Thread Analyzer


Fig 2.4 List of data races detected by the Solaris Studio Thread Analyzer
The initial screen of the tool displays a list of data races, as shown in Figure 2.4. Once the
user has identified the data race they are interested in, they can view the source code for
the two locations in the code where the problem occurs.
Thread Analyzer GUI
Races tab – Shows a list of data races detected in the program and the associated call
stack traces. This tab is selected by default.
Dual source tab – Shows the two source locations corresponding to the two accesses of a
selected data race.
Experiments tab – Shows the load objects in the experiment, and lists error and warning
messages
2.3.3 Avoiding data races
• Place a synchronization lock around all accesses to the variable.
• Ensure that a thread must acquire the lock before referencing the variable.
• Example, using a mutex lock to protect accesses to the variable counter:

void *func( void *params )
{
    pthread_mutex_lock( &mutex );
    counter++;
    pthread_mutex_unlock( &mutex );
    return NULL;
}

2.4 SYNCHRONIZATION PRIMITIVES


Synchronization is used to coordinate the activity of multiple threads. Most operating


systems provide a rich set of synchronization primitives. It is usually most appropriate to
use these rather than attempting to write custom methods of synchronization. The tools
will be able to do a better job of detecting data races or correctly labeling synchronization
costs.

2.4.1 Mutexes and Critical regions


• Mutex stands for mutually exclusive
• Only one thread at a time can acquire a lock
• A mutex is placed around a data structure to ensure that the data structure is
modified by only one thread at a time.
The code sketched below shows how a mutex lock could be used to protect access to a
variable.
In the example, the two routines Increment() and Decrement() will either increment or
decrement the variable counter. To modify the variable, a thread has to first acquire the
mutex lock. Only one thread at a time can do this; all the other threads that want to
acquire the lock need to wait until the thread holding the lock releases it.
Both routines use the same mutex; consequently, only one thread at a time can either
increment or decrement the variable counter. If multiple threads are attempting to acquire
the same mutex at the same time, then only one thread will succeed, and the other threads
will have to wait. This situation is known as a contended mutex.


Placing Mutex locks around accesses to variables
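The listing is not reproduced here; a minimal sketch of what it describes, using POSIX
threads, might look like this:

int counter = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void Increment( void )
{
    pthread_mutex_lock( &mutex );     /* only one thread at a time passes this point */
    counter++;
    pthread_mutex_unlock( &mutex );
}

void Decrement( void )
{
    pthread_mutex_lock( &mutex );
    counter--;
    pthread_mutex_unlock( &mutex );
}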


If all the calls to malloc() are replaced with the threadSafeMalloc() call, then only one


thread at a time can be in the original malloc() code, and the calls to malloc() become
thread-safe. Threads block if they attempt to acquire a mutex lock that is already held by
another thread.
Blocking means that the threads are sent to sleep either immediately or after a few
unsuccessful attempts to acquire the mutex. One problem with this approach is that it can
serialize a program.
If multiple threads simultaneously call threadSafeMalloc(), only one thread at a time will
make progress. This causes the multithreaded program to have only a single executing
thread, which stops the program from taking advantage of multiple cores.
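A sketch of the threadSafeMalloc() wrapper described above (purely illustrative – most
modern malloc() implementations are already thread-safe):

#include <stdlib.h>
#include <pthread.h>

pthread_mutex_t malloc_mutex = PTHREAD_MUTEX_INITIALIZER;

void *threadSafeMalloc( size_t size )
{
    pthread_mutex_lock( &malloc_mutex );    /* serializes every allocation */
    void *result = malloc( size );
    pthread_mutex_unlock( &malloc_mutex );
    return result;
}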
Spin Locks
Spin locks are essentially mutex locks. The difference between a mutex lock and a spin
lock is that a thread waiting to acquire a spin lock will keep trying to acquire the lock
without sleeping.
The advantage of using spin locks is that they will acquire the lock as soon as it is
released, whereas a thread waiting on a mutex lock needs to be woken by the operating
system before it can get the lock.
The disadvantage is that a spin lock will spin on a virtual CPU, monopolizing that
resource. In comparison, a mutex lock will sleep and free the virtual CPU for another
thread to use.
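A sketch using the POSIX spin lock interface:

#include <pthread.h>

pthread_spinlock_t spin;

void init( void )
{
    pthread_spin_init( &spin, PTHREAD_PROCESS_PRIVATE );
}

void update( void )
{
    pthread_spin_lock( &spin );       /* busy-waits instead of sleeping */
    /* ... short critical section ... */
    pthread_spin_unlock( &spin );
}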
Semaphores
Semaphores are counters that can be either incremented or decremented. An example
might be a buffer that has a fixed size. Every time an element is added to a buffer, the
number of available positions is decreased. Every time an element is removed, the
number available is increased.
Semaphores can also be used to mimic mutexes; if there is only one element in the
semaphore, then it can be either acquired or available, exactly as a mutex can be either
locked or unlocked.


Semaphores will also signal or wake up threads that are waiting on them to use available
resources; hence, they can be used for signaling between threads. For example, a thread
might set a semaphore once it has completed some initialization.
Other threads could wait on the semaphore and be signaled to start work once the
initialization is complete.
A semaphore is acquired by a wait operation and released by a signal (or post) operation.
The wait operation decrements the value of the semaphore by 1; if the value is already
zero, the operation blocks until the value of the semaphore becomes positive. The signal
or post operation increments the value of the semaphore by 1.
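A sketch of these operations using POSIX semaphores:

#include <semaphore.h>

sem_t sem;

void example( void )
{
    sem_init( &sem, 0, 1 );   /* 0 = shared between threads; initial value 1 */
    sem_wait( &sem );         /* decrement; blocks while the value is zero */
    /* ... use the protected resource ... */
    sem_post( &sem );         /* increment; wakes a waiting thread, if any */
    sem_destroy( &sem );
}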
Readers-Writer Locks
A readers-writer lock (or multiple-reader lock) allows many threads to read the shared
data but can then lock the reader threads out to allow one thread to acquire a writer lock
to modify the data. A writer cannot acquire the write lock until all the readers have
released their reader locks.
For this reason, the locks tend to be biased toward writers; as soon as one is queued, the
lock stops allowing further readers to enter. This action causes the number of readers
holding the lock to diminish and will eventually allow the writer to get exclusive access
to the lock.
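A sketch using the POSIX readers-writer lock:

#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_data;

int reader( void )
{
    pthread_rwlock_rdlock( &rwlock );   /* many readers may hold this concurrently */
    int value = shared_data;
    pthread_rwlock_unlock( &rwlock );
    return value;
}

void writer( int value )
{
    pthread_rwlock_wrlock( &rwlock );   /* exclusive; waits for all readers to leave */
    shared_data = value;
    pthread_rwlock_unlock( &rwlock );
}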

Barriers


There are situations where a number of threads have to all complete their work before
any of the threads can start on the next task.
For example, suppose a number of threads compute the values stored in a matrix. The
variable total needs to be calculated using the values stored in the matrix. A barrier can
be used to ensure that all the threads complete their computation of the matrix before the
variable total is calculated. The code below shows a situation using a barrier to separate
the calculation of a variable from its use.
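A sketch of this scenario using a POSIX barrier; compute_rows() and sum_matrix() are
hypothetical helpers standing in for the real work:

#include <pthread.h>

extern void compute_rows( void *param );   /* hypothetical: fills this thread's rows */
extern double sum_matrix( void );          /* hypothetical: sums the finished matrix */

pthread_barrier_t barrier;   /* initialized elsewhere with the thread count:
                                pthread_barrier_init( &barrier, NULL, nthreads ); */
double total;

void *worker( void *param )
{
    compute_rows( param );                       /* compute this thread's part of the matrix */
    int rc = pthread_barrier_wait( &barrier );   /* wait until every thread has finished */
    if ( rc == PTHREAD_BARRIER_SERIAL_THREAD )   /* exactly one thread sees this value */
        total = sum_matrix();                    /* safe: the whole matrix is complete */
    return NULL;
}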

The variable total can be computed only when all threads have reached the barrier. This
avoids the situation where one of the threads is still completing its computations while
the other threads start using the results of the calculations.
Notice that another barrier could well be needed after the computation of the value for
total if that value is then used in further calculations.

2.5 DEADLOCKS AND LIVELOCKS


Deadlock – occurs when two or more threads cannot proceed because each requires a
resource that is held by another thread, while that other thread is in turn waiting for a
resource held elsewhere.

Fig 2.5 Deadlocks


Example:
• Two threads are deadlocked
• If thread1 has already acquired lock A and thread2 has already acquired lock B,
then
• Thread1 cannot make forward progress because it is waiting for lock B
• Thread2 cannot make progress because it is waiting for lock A

How the deadlock occurs (see the sketch below):
• Two routines, update1() and update2(), each have an outer loop
• Routine update1() acquires lock A and then attempts to acquire lock B
• Routine update2() acquires lock B and then attempts to acquire lock A
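A minimal sketch of the two routines:

#include <pthread.h>

pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

void *update1( void *param )            /* run by thread1 */
{
    for (;;)
    {
        pthread_mutex_lock( &lockA );   /* acquires A ... */
        pthread_mutex_lock( &lockB );   /* ... then waits for B */
        /* ... update shared state ... */
        pthread_mutex_unlock( &lockB );
        pthread_mutex_unlock( &lockA );
    }
}

void *update2( void *param )            /* run by thread2 */
{
    for (;;)
    {
        pthread_mutex_lock( &lockB );   /* acquires B ... */
        pthread_mutex_lock( &lockA );   /* ... then waits for A: deadlock */
        /* ... update shared state ... */
        pthread_mutex_unlock( &lockA );
        pthread_mutex_unlock( &lockB );
    }
}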
Avoiding deadlocks
• Ensure that threads always acquire the locks in the same order.
• If thread2 also acquired the locks in the order A and then B, it would stall while
waiting for lock A without having first acquired lock B.
• This would enable thread1 to acquire B and then eventually release both locks,
allowing thread2 to make progress.
LIVELOCKS
• A condition that occurs when two or more threads continually change their state in
response to changes in the other threads.


• Livelock differs from deadlock in that the processes involved in a livelock
repeatedly change their states with regard to each other without progressing.
• A livelock traps threads in an unending loop of releasing and acquiring locks.
• Livelocks can be caused by code written to back out of deadlocks.
• The threads are unable to make progress even though they are not blocked.
• The tasks enter an infinite loop of operations that lead to nothing.
Example:
• Two people meeting on a landing
• Each time one takes a step to the side, the other mirrors that step, and neither
moves forward
This can be avoided by ensuring that only one process (chosen randomly or by priority)
takes action.
Mechanism to avoid Deadlock
▪ If a thread cannot obtain the second lock it requires, it releases the lock that it
already holds.
▪ A try-lock operation, here called canAcquire(), returns immediately, either having
acquired the lock or having failed to acquire it.
▪ Each thread acquires one lock and then attempts to acquire the second lock that it
needs.
▪ If it fails to acquire the second lock, it releases the lock it is holding before
attempting to acquire both locks again.
▪ When the thread successfully acquires both locks, it exits the loop.
▪ The risk is that if the two threads keep acquiring, failing, and releasing in step
with each other, the application will make no forward progress – a livelock. A
sketch using pthread_mutex_trylock() follows.
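A sketch of this back-off scheme, using pthread_mutex_trylock() in place of the
canAcquire() call:

#include <pthread.h>

extern pthread_mutex_t lockA, lockB;

void update_both( void )
{
    for (;;)
    {
        pthread_mutex_lock( &lockA );                /* take the first lock */
        if ( pthread_mutex_trylock( &lockB ) == 0 )  /* returns immediately */
            break;                                   /* got both locks: leave the loop */
        pthread_mutex_unlock( &lockA );              /* failed: release and retry */
    }
    /* ... update shared state under both locks ... */
    pthread_mutex_unlock( &lockB );
    pthread_mutex_unlock( &lockA );
}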


2.6 COMMUNICATION BETWEEN THREADS AND PROCESSES


Parallel processing requires communication between threads or processes. The popular
mechanisms are:
1. Shared memory and memory-mapped files
2. Condition variables
3. Signals and events
4. Message queues
5. Named pipes
1. Shared memory and memory-mapped files
▪ The easiest way for multiple threads to communicate is through memory
▪ Multiple threads can access the data available in the shared memory space.


▪ The same memory region can be shared by two or more processes.
▪ One process creates a shared memory portion, and other processes can access it.
▪ The operating system is involved in mapping the memory segment into the
address space of the processes.
▪ After mapping, all the processes can perform read and write operations on the
shared memory segment or region.
▪ A process can create, delete, or open this memory using a shared memory object.
▪ One process creates a shared memory region using a library call such as
Open Shared Memory() with a unique descriptor.
▪ The descriptor can be the name of a file in the file system. This call returns an
identifier, also called a shared memory object, used to map the shared memory
region into the address space of the client application.
▪ If several processes are accessing the shared memory segment, then the last
process deletes the shared region after use, using Delete Shared Memory().
ID = Open Shared Memory( Descriptor );
Memory = Map Shared Memory( ID );
...
Memory[100]++;
...
Close Shared Memory( ID );
Delete Shared Memory( Descriptor );

The pseudocode below depicts the procedure for using an existing shared memory
region. Each process should close the shared memory segment after using it.
ID = Open Shared Memory( Descriptor );
Memory = Map Shared Memory( ID );
...
Close Shared Memory( ID );
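A POSIX sketch of this pseudocode; the segment name "/my_segment" is an arbitrary
example descriptor (compile with -lrt on older Linux systems):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void example( void )
{
    int id = shm_open( "/my_segment", O_CREAT | O_RDWR, 0600 );  /* Open Shared Memory */
    ftruncate( id, 4096 );                                       /* set the segment size */
    char *memory = mmap( NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, id, 0 );                    /* Map Shared Memory */
    memory[100]++;                                               /* visible to other processes */
    munmap( memory, 4096 );                                      /* Close Shared Memory */
    close( id );
    shm_unlink( "/my_segment" );                                 /* Delete Shared Memory */
}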


2. Condition Variables
• Communicate readiness between threads by enabling a thread to be woken up
when a condition becomes true.
• Without condition variables, the waiting thread would have to use some form of
polling to check whether the condition had become true
• Example: Producer Consumer problem
Condition variables are used in solving the producer-consumer problem. The producer
thread produces a data item and adds it to a queue (enqueue). The consumer thread
consumes data from the queue (dequeue). Before adding data, the producer has to check
whether the queue has an empty position to accommodate the new data item. Before
consuming a data item, the consumer checks whether the queue contains any data items.
These conditions are checked using condition variables.
A producer can produce one item while the consumer is consuming another item, so the
producer and consumer must be synchronized. Before adding an item to the queue, the
producer thread acquires the mutex lock; after adding the item, it releases the mutex
lock. If there is only one item in the queue, the producer thread wakes up one consumer
thread by signaling it.
Producer Thread Adding an Item to the Queue
Acquire Mutex();
Add Item to Queue();
If ( Only One Item on Queue )
{
Signal Conditions Met();
}
Release Mutex();
If there are no items in the queue, the consumer thread goes to sleep and has to be woken
up, via the condition variable, when an item is added to the queue by the producer
thread. If there are more items in the queue, the consumer thread can consume the items
one by one. The consumer thread releases the mutex lock either when there are no items
in the queue or when all the items have been consumed.
Consumer Thread removing Items from Queue
Acquire Mutex();
Repeat
  Item = 0;
  If ( No Items on Queue() )
  {
    Wait on Condition Variable();
  }
  If ( Item on Queue() )
  {
    Item = Remove From Queue();
  }
Until ( Item != 0 );
Release Mutex();

Consumer Thread Code with Potential Lost Wake-Up Problem


Repeat
  Item = 0;
  If ( No Items on Queue() )
  {
    Acquire Mutex();
    Wait on Condition Variable();
    Release Mutex();
  }
  Acquire Mutex();
  If ( Item on Queue() )
  {
    Item = Remove From Queue();
  }
  Release Mutex();
Until ( Item != 0 );
▪ The problem with the code is the first if condition. If there are no items on the
queue, then the mutex lock is acquired, and the thread waits on the condition
variable.
▪ However, the producer thread could have placed an item and signaled the
consumer thread between the consumer thread executing the if statement and
acquiring the mutex.
▪ When this happens, the consumer thread waits on the condition variable
indefinitely, because the producer thread signals only when it places the first item
into the queue.
▪ A lost wake-up occurs when the signal to wake the waiting thread is sent before
the thread is ready to receive it.
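A sketch of a corrected consumer using POSIX condition variables: the condition is
rechecked in a loop while the mutex is held, which closes the window that causes the
lost wake-up. queue_empty() and queue_remove() are hypothetical queue helpers:

#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

extern int queue_empty( void );      /* hypothetical */
extern void *queue_remove( void );   /* hypothetical */

void *consume_item( void )
{
    pthread_mutex_lock( &mutex );
    while ( queue_empty() )                  /* check under the mutex, in a loop */
        pthread_cond_wait( &cond, &mutex );  /* atomically releases the mutex and sleeps */
    void *item = queue_remove();
    pthread_mutex_unlock( &mutex );
    return item;
}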
3. Signals and events
▪ Signals are a UNIX mechanism where one process can send a signal to another
process when an event occurs.
▪ The receiving process has a handler that performs some task upon receipt of the
signal.
Examples:
▪ Signals
o Pressing ^C causes a SIGINT signal to be sent to the process
o By default, this stops the running application
▪ Events
o Handling of keyboard presses
o Pressing one of the mouse buttons will send a click event to the target
window
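A sketch of installing a handler for SIGINT, the signal sent by ^C:

#include <signal.h>
#include <unistd.h>

void handler( int sig )
{
    /* only async-signal-safe calls belong in a handler */
    write( STDOUT_FILENO, "got SIGINT\n", 11 );
}

int main( void )
{
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset( &sa.sa_mask );
    sa.sa_flags = 0;
    sigaction( SIGINT, &sa, NULL );   /* install the handler */
    pause();                          /* sleep until a signal arrives */
    return 0;
}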


4. Message queues
▪ A message queue is a structure that can be shared between multiple processes.
▪ Messages can be placed into the queue and will be removed in the same order
in which they were added.
▪ Like a shared memory segment, a message queue uses a descriptor, which is
the location of a file in the file system.
▪ The descriptor is used to create the message queue and also to attach to an
existing message queue.
▪ Messages to be shared are placed into the message queue. The first message is
located at the head of the queue and is retrieved first.
Example:
Creating and Placing Messages into a Queue
ID = Open Message Queue( Descriptor );
Put Message in Queue( ID, Message );
...
Close Message Queue( ID );
Delete Message Queue( Descriptor );
Opening a Queue and Receiving Messages
ID = Open Message Queue( Descriptor );
Message = Remove Message from Queue( ID );
...
Close Message Queue( ID );
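A POSIX sketch of this pseudocode; the queue name "/my_queue" is an arbitrary example
descriptor (compile with -lrt on Linux):

#include <mqueue.h>
#include <fcntl.h>

void sender( void )
{
    mqd_t id = mq_open( "/my_queue", O_CREAT | O_WRONLY, 0600, NULL );
    mq_send( id, "hello", 6, 0 );              /* message, length, priority */
    mq_close( id );
}

void receiver( void )
{
    char buffer[8192];                         /* must hold the queue's largest message */
    mqd_t id = mq_open( "/my_queue", O_RDONLY );
    mq_receive( id, buffer, sizeof(buffer), NULL );   /* removes the oldest message */
    mq_close( id );
    mq_unlink( "/my_queue" );                  /* delete the queue when done */
}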

5. Named pipes
▪ Pipe – used to pass data from one process to another
▪ Example:
▪ ls – lists all the files in a directory
▪ wc – counts the number of lines, words, and characters in its input
▪ Piping the output of the ls command into the wc command ( ls | wc -l )
counts the number of files in the directory


▪ Named pipes
▪ File-like objects that are given a specific name and can be shared between
processes
▪ Concept:
▪ The pipe is opened
▪ Data is written into it or read from it
▪ Then the pipe is closed
Setting Up and Writing into a Pipe
Make Pipe( Descriptor );
ID = Open Pipe( Descriptor );
Write Pipe( ID, Message, sizeof(Message) );
...
Close Pipe( ID );
Delete Pipe( Descriptor );
Opening an Existing Pipe to Receive Messages
ID = Open Pipe( Descriptor );
Read Pipe( ID, buffer, sizeof(buffer) );
...
Close Pipe( ID );
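A POSIX sketch of this pseudocode; "/tmp/my_pipe" is an arbitrary example name:

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void writer( void )
{
    mkfifo( "/tmp/my_pipe", 0600 );             /* Make Pipe */
    int id = open( "/tmp/my_pipe", O_WRONLY );  /* blocks until a reader opens the pipe */
    write( id, "message", 8 );                  /* 8 bytes: "message" plus the terminator */
    close( id );                                /* Close Pipe */
    unlink( "/tmp/my_pipe" );                   /* Delete Pipe */
}

void reader( void )
{
    char buffer[64];
    int id = open( "/tmp/my_pipe", O_RDONLY );  /* Open Pipe */
    read( id, buffer, sizeof(buffer) );
    close( id );
}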
