UNIT-2 Parallel Programming Challenges
PERFORMANCE
The main goal for speed and efficiency is to divide the work equally among the cores, while at the same time
introducing no additional work for the cores.
If we succeed in doing this, and we run our program with p cores, one thread or process on
each core, then our parallel program will run p times faster than the serial program. If we call the
serial run-time Tserial and our parallel run-time Tparallel, then the best we can hope for is
Tparallel = Tserial / p. When this happens, we say that our parallel program has linear speedup.
In practice, we’re unlikely to get linear speedup because the use of multiple processes/threads
almost invariably introduces some overhead. For example, shared memory programs will almost
always have critical sections, which will require that we use some mutual exclusion mechanism
such as a mutex.
The calls to the mutex functions are overhead that’s not present in the serial program, and the use
of the mutex forces the parallel program to serialize execution of the critical section.
Distributed-memory programs will almost always need to transmit data across the network,
which is usually much slower than local memory access. Serial programs, on the other hand,
won’t have these overheads.
Thus, it will be very unusual for us to find that our parallel programs get linear speedup.
Furthermore, it's likely that the overheads will increase as we increase the number of processes
or threads; that is, more threads will probably mean more threads need to access a critical
section, and more processes will probably mean more data needs to be transmitted across the
network. So if we define the speedup of a parallel program to be

S = Tserial / Tparallel,

then linear speedup corresponds to S = p. Furthermore, as p increases, we expect S to become a smaller and smaller fraction of the ideal,
linear speedup p.
Another way of saying this is that S/p will probably get smaller and smaller as p increases.
This value, S/p, is called the efficiency of the parallel program. If we substitute the formula for S, we see that the efficiency is

E = S/p = (Tserial / Tparallel) / p = Tserial / (p * Tparallel).
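For example, with made-up numbers: if Tserial = 20 seconds and the program runs in Tparallel = 6 seconds on p = 4 cores, then S = 20/6 ≈ 3.33 and E = S/p ≈ 0.83, i.e., the program achieves about 83% of linear speedup.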
When we increase the problem size, the speedups and the efficiencies increase, while they
decrease when we decrease the problem size.
This behavior is quite common. Many parallel programs are developed by dividing the work of
the serial program among the processes/threads and adding in the necessary "parallel overhead"
such as mutual exclusion or communication. Therefore, if Toverhead denotes this parallel
overhead, it's often the case that

Tparallel = Tserial / p + Toverhead.

Furthermore, as the problem size is increased, Toverhead often grows more slowly than Tserial. When this is the case,
the speedup and the efficiency will increase. This is what your intuition should tell you: there's
more work for the processes/threads to do, so the relative amount of time spent coordinating the
work of the processes/threads should be less.
A final issue to consider is what values of Tserial should be used when reporting speedups and
efficiencies. Some authors say that Tserial should be the run-time of the fastest serial program on the
fastest processor available. Others say that Tserial should be the run-time of the serial program on which
the parallel program was based, run on a single processor of the parallel system. For example, when reporting
the performance of a parallel shell sort program, authors in the first group might use a serial
radix sort or quicksort on a single core of the fastest system available, while authors in the
second group would use a serial shell sort on a single processor of the parallel system.
(2) Amdahl's Law
Suppose, for example, that we're able to parallelize 90% of a serial program whose run-time is Tserial = 20 seconds,
and that the parallelization is "perfect," so the parallelized portion runs p times faster on p cores. Then the parallel
run-time will be

Tparallel = 0.9 * Tserial / p + 0.1 * Tserial = 18/p + 2.

Now as p gets larger and larger, 0.9 * Tserial / p = 18/p gets closer and closer to 0, so
the total parallel run-time can't be smaller than 0.1 * Tserial = 2. That is, the
denominator in S can't be smaller than 0.1 * Tserial = 2. The speedup must therefore be smaller
than

S ≤ Tserial / (0.1 * Tserial) = 20 / 2 = 10.

That is, S ≤ 10. This is saying that even though we've done a perfect job in parallelizing
90% of the program, and even if we have, say, 1000 cores, we'll never get a speedup better than 10.
More generally, if a fraction r of our serial program remains unparallelized, then Amdahl's law
says we can't get a speedup better than 1/r. In our example, r = 1 - 0.9 = 1/10, so we
couldn't get a speedup better than 10. Therefore, if a fraction r of our serial program is
"inherently serial," that is, cannot possibly be parallelized, then we can't possibly get a speedup
better than 1/r. Thus, even if r is quite small, say 1/100, and we have a system with thousands
of cores, we can't possibly get a speedup better than 100.
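To make the bound concrete, here is a small C sketch (not from the original text) that evaluates Amdahl's bound S(p) = 1 / (r + (1 - r)/p) for a hypothetical serial fraction r; the chosen values of r and p are illustrative only.

#include <stdio.h>

/* Amdahl's law: upper bound on speedup with serial fraction r and p cores. */
double amdahl_speedup(double r, int p)
{
    return 1.0 / (r + (1.0 - r) / p);
}

int main(void)
{
    double r = 0.1;                       /* hypothetical: 10% inherently serial */
    int cores[] = { 2, 8, 64, 1000 };
    for (int i = 0; i < 4; i++)
        printf("p = %4d  ->  S <= %.2f\n", cores[i], amdahl_speedup(r, cores[i]));
    return 0;                             /* the bound approaches 1/r = 10 as p grows */
}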
There are several reasons not to be too worried by Amdahl’s law. First, it doesn’t take into
consideration the problem size. For many problems, as we increase the problem size the
“inherently serial” fraction of the program decreases in size; a more mathematical version of this
statement is known as Gustafson’s law [25]. Second, there are thousands of programs used by
scientists and engineers that routinely obtain huge speedups on large distributed-memory
systems.
(3) Scalability
The word “scalable” has a wide variety of informal uses; indeed, we've used it informally already. In discussions of parallel program performance, however, scalability has a somewhat more formal definition.
Suppose we run a parallel program with a fixed number of processes/threads and a fixed input
size, and we obtain an efficiency E. Suppose we now increase the number of processes/threads
that are used by the program. If we can find a corresponding rate of increase in the problem size
so that the program always has efficiency E, then the program is scalable.
As an example, suppose that Tserial = n, where the units of Tserial are in microseconds, and n is
also the problem size. Also suppose that Tparallel = n/p + 1. Then the efficiency is

E = Tserial / (p * Tparallel) = n / (p(n/p + 1)) = n / (n + p).

To see whether the program is scalable, we increase the number of processes/threads by a factor of k
and ask by what factor x we need to increase the problem size so that the efficiency is unchanged; that is, we want

n / (n + p) = xn / (xn + kp).

If x = k, there will be a common factor of k in the denominator xn + kp = kn + kp = k(n + p), and we
can reduce the fraction to get

xn / (xn + kp) = kn / (k(n + p)) = n / (n + p).

In other words, if we increase the problem size at the same rate that we increase the number of
processes/threads, then the efficiency will be unchanged.
If when we increase the number of processes/threads, we can keep the efficiency fixed without
increasing the problem size, the program is said to be strongly scalable. If we can keep the
efficiency fixed by increasing the problem size at the same rate as we increase the number of
processes/threads, then the program is said to be weakly scalable. The program in our example
would be weakly scalable.
Taking Timings:
When we time programs, there are a few issues to consider. First, there are at least two reasons for taking timings:
during program development we may take timings to determine whether the program is behaving as we intend, while
once we've completed development, we're often interested in determining how good its performance is. Perhaps surprisingly,
the way we take these two timings is usually different. For the first timing, we usually need very detailed information
about how much time the program spends in each part; for the second, a single overall run-time is usually enough.
Second, we’re usually not interested in the time that elapses between the program’s start and the
program’s finish. We’re usually interested only in some part of the program. For example, if we
write a program that implements bubble sort, we’re probably only interested in the time it takes
to sort the keys, not the time it takes to read them in and print them out. We probably can’t use
something like the Unix shell command time, which reports the time taken to run a program
from start to finish. Third, we’re usually not interested in “CPU time.” This is the time reported
by the standard C function clock. It’s the total time the program spends in code executed as part
of the program. It would include the time for code we’ve written; it would include the time we
spend in library functions such as pow or sin; and it would include the time the operating system
spends in functions we call, such as printf and scanf. It would not include time the program was
idle, and this could be a problem.
The function Get_current_time() is a hypothetical function that's supposed to return the number of
seconds that have elapsed since some fixed time in the past. It's just a placeholder; the actual
function that is used will depend on the API. For example, MPI has a function MPI_Wtime that
could be used here, and the OpenMP API for shared-memory programming has a function
omp_get_wtime. Both functions return wall clock time instead of CPU time.
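As a hedged sketch of this kind of timing (assuming OpenMP is available and the program is compiled with -fopenmp; the timed region is a placeholder):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double start = omp_get_wtime();    /* wall-clock seconds since some fixed time */

    /* ... code to be timed ... */

    double finish = omp_get_wtime();
    printf("Elapsed time = %e seconds\n", finish - start);
    return 0;
}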
There may be an issue with the resolution of the timer function. The resolution is the unit of
measurement on the timer: it's the duration of the shortest event that can have a nonzero time.
When we’re timing parallel programs, we need to be a little more careful about how the timings
are taken. In our example, the code that we want to time is probably being executed by multiple
processes or threads and our original timing will result in the output of p elapsed times.
...
...
However, what we’re usually interested in is a single time: the time that has elapsed from when
the first process/thread began execution of the code to the time the last process/thread finished
execution of the code.We often can’t obtain this exactly, since there may not be any
correspondence between the clock on one node and the clock on another node. We usually settle
for a compromise that looks something like this:
Barrier();
my_start = Get_current_time();
/* Code to be timed */
...
my_finish = Get_current_time();
my_elapsed = my_finish - my_start;
/* Find the maximum elapsed time across all processes/threads */
elapsed = Global_max( my_elapsed );
if (my_rank == 0)
    printf("Elapsed time = %e seconds\n", elapsed);
Here, we first execute a barrier function that approximately synchronizes all of the
processes/threads. We would like for all the processes/threads to return from the call
simultaneously, but such a function usually can only guarantee that all the processes/ threads
have started the call when the first process/thread returns.We then execute the code as before and
each process/thread finds the time it took. Then all the processes/threads call a global maximum
function, which returns the largest of the elapsed times, and process/thread 0 prints it out.
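A hedged MPI version of this compromise (a sketch, assuming an MPI implementation is available and the program is launched with mpiexec; MPI_Barrier, MPI_Wtime, and MPI_Reduce with MPI_MAX provide the synchronization, timing, and global maximum):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_rank;
    double my_start, my_elapsed, elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    MPI_Barrier(MPI_COMM_WORLD);           /* approximately synchronize all processes */
    my_start = MPI_Wtime();

    /* ... code to be timed ... */

    my_elapsed = MPI_Wtime() - my_start;

    /* global maximum of the per-process elapsed times */
    MPI_Reduce(&my_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Elapsed time = %e seconds\n", elapsed);

    MPI_Finalize();
    return 0;
}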
We also need to be aware of the variability in timings. When we run a program several times, it’s
extremely likely that the elapsed time will be different for each run. This will be true even if each
time we run the program we use the same input and the same systems.
It might seem that the best way to deal with this would be to report either a mean or a median
run-time. However, outside events (such as other jobs running on the system) can only slow our
program down; they are very unlikely to make it run faster than its best possible run-time. So
instead of reporting the mean or median time, we usually report the minimum time.
Running more than one thread per core can cause dramatic increases in the variability of timings.
More importantly, if we run more than one thread per core, the system will have to take extra
time to schedule and deschedule threads, and this will add to the overall run-time. Therefore, we
rarely run more than one thread per core.
Finally, as a practical matter, since our programs won't be designed for high-performance I/O,
we'll usually not include I/O in our reported run-times.
Data Races
Data races are the most common programming error found in parallel code. A data race
occurs when multiple threads use the same data item and one or more of those threads
are updating it. It is best illustrated by an example. Suppose you have the code shown in
Listing 4.1, where a pointer to an integer variable is passed in and the function increments the value of that variable by 4.

void update( int * a )
{
    *a = *a + 4;
}
Suppose this code occurs in a multithreaded application and two threads try to increment
the same variable at the same time. Table 4.1 shows the resulting instruction stream.
Value of variable a = 10
Thread 1                                Thread 2
ld  [%o0], %o1  // Load  %o1 = 10       ld  [%o0], %o1  // Load  %o1 = 10
add %o1, 4, %o1 // Add   %o1 = 14       add %o1, 4, %o1 // Add   %o1 = 14
st  %o1, [%o0]  // Store %o1            st  %o1, [%o0]  // Store %o1
Value of variable a = 14
In the example, each thread adds 4 to the variable, but because they do it at exactly
the same time, the value 14 ends up being stored into the variable. If the two threads had
executed the code at different times, then the variable would have ended up with the
value of 18.
This is the situation where both threads are running simultaneously. This illustrates a
common kind of data race and possibly the easiest one to visualize.
Another situation might be when one thread is running, but the other thread has
been context switched off of the processor. Imagine that the first thread has loaded the
value of the variable a and then gets context switched off the processor. When it eventually
runs again, the value of the variable a will have changed, and the final store of the
restored thread will cause the value of the variable a to regress to an old value.
Consider the situation where one thread holds the value of a variable in a register and
a second thread comes in and modifies this variable in memory while the first thread is
running through its code. The value held in the register is now out of sync with the
value held in memory.
The point is that a data race situation is created whenever a variable is loaded and
another thread stores a new value to the same variable: one of the threads is now working
with an out-of-date value of the variable.
Data races can be hard to find. Take the previous code example to increment a variable:
it might reside in the context of a larger, more complex routine. It can be hard to
identify the sequence of problem instructions just by inspecting the code. The sequence
of instructions causing the data race is only three long, and it could be located anywhere
within a much larger body of code.
Not only is the problem hard to see from inspection, but the problem would occur
only when both threads happen to be executing the same small region of code. So even
if the data race is readily obvious and can potentially happen every time, it is quite possible
that an application with a data race may run for a long time before errors are observed.
In the example, unless you were printing out every value of the variable a and actually
saw the variable take the same value twice, the data race would be hard to detect.
The potential for data races is part of what makes parallel programming hard. It is a
common error to introduce data races into a code, and it is hard to determine, by
inspection, that one exists. Fortunately, there are tools to detect data races.
The listing below shows a simple program written using POSIX threads. The code creates two threads, both of which execute the routine
func(). The main thread then waits for both the child threads to complete their work.

#include <pthread.h>
int counter = 0;
void * func( void * params )
{
    counter++;     /* both threads update counter without synchronization */
    return 0;
}
int main()
{
    pthread_t thread1, thread2;
    pthread_create( &thread1, 0, func, 0 );
    pthread_create( &thread2, 0, func, 0 );
    pthread_join( thread1, 0 );
    pthread_join( thread2, 0 );
}
Both threads will attempt to increment the variable counter. We can compile this
code with GNU gcc and then use Helgrind, which is part of the Valgrind suite, to
identify the data race. Valgrind is a tool that enables an application to be instrumented
and its runtime behavior examined. The Helgrind tool uses this instrumentation to
gather data about data races. Listing 4.4 shows the output from Helgrind.
...
==4742==
(in /lib/tls/i686/cmov/libpthread-2.9.so)
(in /lib/tls/i686/cmov/libpthread-2.9.so)
The output from Helgrind shows that there is a potential data race between two threads,
both executing line 7 in the file race.c. This is the anticipated result, but it should be
pointed out that the tools will find some false positives. The programmer may write code
where different threads access the same variable, but the programmer may know that
there is an enforced order that stops an actual data race. The tools, however, may not be
able to detect the enforced order and will report the potential data race.
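The exact commands vary by system, but a typical invocation looks like the following (a sketch, assuming gcc and Valgrind are installed and the source file is named race.c):

$ gcc -g race.c -lpthread -o race
$ valgrind --tool=helgrind ./race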
Another tool that is able to detect potential data races is the Thread Analyzer in
Oracle Solaris Studio. This tool requires an instrumented build of the application, data
collection is done by the collect tool, and the graphical interface is launched with the
command tha.
Listing 4.5 Detecting Data Races Using the Sun Studio Thread Analyzer
$ cc -g -xinstrument=datarace race.c
$ collect -r on ./a.out
$ tha tha.1.er&
The initial screen of the tool displays a list of data races, as shown in Figure 4.1.
Once the user has identified the data race they are interested in, they can view the
source code for the two locations in the code where the problem occurs. In the example, shown
in Figure 4.2, both threads are executing the same source line.
Figure 4.2 Source code with data race shown in Solaris Studio
Thread Analyzer
Although it can be hard to identify data races, avoiding them can be very simple: Make
sure that only one thread can update the variable at a time. The easiest way to do this is
to place a synchronization lock around all accesses to that variable and ensure that before
referencing the variable, the thread must acquire the lock. Listing 4.6 shows a modified
version of the code. This version uses a mutex lock, described in more detail in the next
section, to protect accesses to the variable counter. Although this ensures the correctness of the code, it adds some locking overhead to every access of the variable.
pthread_mutex_lock( &mutex );
counter++;
pthread_mutex_unlock( &mutex );
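Pulling this together, a minimal corrected version of the earlier program might look like the following sketch (standard POSIX threads usage; this is illustrative, not the book's exact listing):

#include <pthread.h>

int counter = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void * func( void * params )
{
    pthread_mutex_lock( &mutex );    /* only one thread at a time past this point */
    counter++;
    pthread_mutex_unlock( &mutex );
    return 0;
}

int main()
{
    pthread_t thread1, thread2;
    pthread_create( &thread1, 0, func, 0 );
    pthread_create( &thread2, 0, func, 0 );
    pthread_join( thread1, 0 );
    pthread_join( thread2, 0 );
    return 0;
}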
Synchronization Primitives
Synchronization is used to coordinate the activity of multiple threads. There are various
situations where it is necessary; this might be to ensure that shared resources are not
accessed by multiple threads simultaneously or that all work on those resources is complete
before new work starts. Most operating systems provide a rich set of synchronization primitives,
and it is usually most appropriate to use these rather than attempting to write custom methods of synchronization.
There are two reasons for this. Synchronization primitives provided by the
operating system will usually be recognized by the tools provided with that operating
system. Hence, the tools will be able to do a better job of detecting data races or correctly
labeling synchronization costs. The operating system will often provide support for
sharing the primitives between threads or processes, which can be hard to do efficiently
without operating system support.
(1) Mutexes and Critical Regions
The simplest form of synchronization is a mutually exclusive (mutex) lock. Only one
thread at a time can acquire a mutex lock, so mutexes can be placed around a data structure
to ensure that the data structure is modified by only one thread at a time.
Placing Mutex Locks Around Accesses to Variables:
int counter;
mutex_lock mutex;

void Increment()
{
    acquire( &mutex );
    counter++;
    release( &mutex );
}

void Decrement()
{
    acquire( &mutex );
    counter--;
    release( &mutex );
}
In the example, the two routines Increment() and Decrement() will either increment
or decrement the variable counter. To modify the variable, a thread has to first
acquire the mutex lock. Only one thread at a time can do this; all the other threads that
want to acquire the lock need to wait until the thread holding the lock releases it. Both
routines use the same mutex; consequently, only one thread at a time can either increment
or decrement the variable counter.
If multiple threads are attempting to acquire the same mutex at the same time, then
only one thread will succeed, and the other threads will have to wait. This situation is
known as contention for the mutex.
The region of code between the acquisition and release of a mutex lock is called a
critical section, or critical region. Code in this region will be executed by only one thread at
a time.
As an example of a critical section, imagine that an operating system does not have
an implementation of malloc() that is thread-safe, that is, safe for multiple threads to call at
the same time. One way to fix this is to place the call to malloc() in a critical section
by surrounding it with a mutex lock:

void * threadSafeMalloc( size_t size )
{
    acquire( &mallocMutex );
    void * memory = malloc( size );
    release( &mallocMutex );
    return memory;
}

If all the calls to malloc() are replaced with the threadSafeMalloc() call, then
only one thread at a time can be in the original malloc() code, and the calls become thread-safe.
Threads block if they attempt to acquire a mutex lock that is already held by another
thread. Blocking means that the threads are sent to sleep either immediately or after a
few unsuccessful attempts to acquire the mutex.
One problem with this approach is that it can serialize a program. If multiple threads
simultaneously call threadSafeMalloc(), only one thread at a time will make progress.
This causes the multithreaded program to have only a single executing thread, which
removes any benefit from being multithreaded.
(2) Spin Locks
Spin locks are essentially mutex locks. The difference between a mutex lock and a spin
lock is that a thread waiting to acquire a spin lock will keep trying to acquire the lock
without sleeping. In comparison, a mutex lock may sleep if it is unable to acquire the
lock. The advantage of using spin locks is that they will acquire the lock as soon as it is
released, whereas a mutex lock will need to be woken by the operating system before it
can get the lock. The disadvantage is that a spin lock will spin on a virtual CPU, monopolizing
that resource. In comparison, a mutex lock will sleep and free the virtual CPU for another thread to use.
Often mutex locks are implemented to be a hybrid of spin locks and more traditional
mutex locks. The thread attempting to acquire the mutex spins for a short while before
blocking. There is a performance advantage to this. Since most mutex locks are held for
only a short period of time, it is quite likely that the lock will quickly become free for
the waiting thread to acquire. So, spinning for a short period of time makes it more
likely that the waiting thread will acquire the mutex lock as soon as it is released.
However, continuing to spin for a long period of time consumes hardware resources that could
be better used by other threads.
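To make the idea concrete, here is a minimal spin-lock sketch using C11 atomics (an illustration of the concept, not any particular library's implementation); spin_acquire()/spin_release() would wrap a critical section just like acquire()/release() above.

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_acquire(void)
{
    /* keep retrying until the previous value of the flag was "clear" */
    while (atomic_flag_test_and_set(&lock))
        ;   /* busy-wait: burns CPU instead of sleeping */
}

void spin_release(void)
{
    atomic_flag_clear(&lock);
}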
(3) Semaphores
Semaphores are counters that can be either incremented or decremented. They can be
used in situations where there is a finite limit to a resource and a mechanism is needed
to impose that limit. An example might be a buffer that has a fixed size. Every time an
element is added to the buffer, the number of available positions is decreased; every time
an element is removed, the number of available positions is increased.
Semaphores can also be used to mimic mutexes; if there is only one element in the
semaphore, then it can be either acquired or available, exactly as a mutex can be either
locked or unlocked.
Semaphores will also signal or wake up threads that are waiting on them to use available
resources; hence, they can be used for signaling between threads. For example, a thread
might set a semaphore once it has completed some initialization. Other threads could
wait on the semaphore and be signaled to start work once the initialization is complete.
Depending on the implementation, the method to decrement a semaphore might be
called wait, down, or acquire, and the method to release a semaphore might be called post,
up, signal, or release. When the semaphore no longer has resources available, threads
attempting to decrement it will block until resources become available.
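A hedged sketch of the signaling use described above, using the POSIX semaphore API (the thread functions and names are illustrative; thread creation is elided):

#include <semaphore.h>
#include <pthread.h>

sem_t init_done;

void * worker( void * arg )
{
    sem_wait( &init_done );        /* block until initialization is signaled */
    /* ... start work ... */
    return 0;
}

void * initializer( void * arg )
{
    /* ... perform initialization ... */
    sem_post( &init_done );        /* wake one waiting worker */
    return 0;
}

int main()
{
    sem_init( &init_done, 0, 0 );  /* shared between threads, initial count 0 */
    /* ... create the initializer and worker threads, then join them ... */
    return 0;
}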
(4) Barriers
There are situations where a number of threads have to all complete their work before
any of the threads can start on the next task. In these situations, it is useful to have a barrier
where the threads wait until all of them have arrived.
One common example of using a barrier arises when there is a dependence between
different sections of code. For example, suppose a number of threads compute the values
stored in a matrix. The variable total needs to be calculated using the values stored in
the matrix. A barrier can be used to ensure that all the threads complete their computation
of the matrix before the calculation of total starts:
Compute_values_held_in_matrix();
Barrier();
total = Calculate_value_from_matrix();
The variable total can be computed only when all threads have reached the barrier.
This avoids the situation where one of the threads is still completing its computations
while the other threads start using the results of the calculations. Notice that another
barrier could well be needed after the computation of the value for total if that value
is then used in further calculations:
Compute_values_held_in_matrix();
Barrier();
total = Calculate_value_from_matrix();
Barrier();
Perform_next_calculation( total );
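A concrete version of this pattern, sketched with the POSIX barrier API (the matrix and calculation steps are placeholders, not from the original text):

#include <pthread.h>

#define NTHREADS 4

pthread_barrier_t barrier;
double total = 0.0;

void * worker( void * arg )
{
    /* each thread computes its part of the matrix here */

    pthread_barrier_wait( &barrier );     /* wait until every thread has finished */

    if ((long)arg == 0)                   /* one thread computes the total */
        total = 42.0;                     /* stands in for Calculate_value_from_matrix() */

    pthread_barrier_wait( &barrier );     /* make sure total is ready before it is used */
    /* Perform_next_calculation( total ); */
    return 0;
}

int main()
{
    pthread_t t[NTHREADS];
    pthread_barrier_init( &barrier, 0, NTHREADS );
    for (long i = 0; i < NTHREADS; i++)
        pthread_create( &t[i], 0, worker, (void *)i );
    for (int i = 0; i < NTHREADS; i++)
        pthread_join( t[i], 0 );
    pthread_barrier_destroy( &barrier );
    return 0;
}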
DEADLOCK:
When a process requests resources that are not available at that time, the process enters
a waiting state. If the requested resources are held by other waiting processes, so that none of
them can ever proceed, this situation is said to be a "deadlock."
A deadlock can arise only if the following four conditions hold simultaneously:
1. MUTUAL EXCLUSION
2. HOLD & WAIT
3. NO PREEMPTION
4. CIRCULAR WAIT
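A minimal sketch of how hold-and-wait plus circular wait can arise in practice: two threads acquire two mutexes in opposite orders (illustrative code, not from the original text).

#include <pthread.h>

pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

void * thread1( void * arg )
{
    pthread_mutex_lock( &lockA );     /* hold & wait: holds A, then waits for B */
    pthread_mutex_lock( &lockB );
    pthread_mutex_unlock( &lockB );
    pthread_mutex_unlock( &lockA );
    return 0;
}

void * thread2( void * arg )
{
    pthread_mutex_lock( &lockB );     /* holds B, then waits for A: circular wait */
    pthread_mutex_lock( &lockA );
    pthread_mutex_unlock( &lockA );
    pthread_mutex_unlock( &lockB );
    return 0;
}

int main()
{
    pthread_t t1, t2;
    pthread_create( &t1, 0, thread1, 0 );
    pthread_create( &t2, 0, thread2, 0 );
    pthread_join( t1, 0 );            /* may hang here: that hang is the deadlock */
    pthread_join( t2, 0 );
    return 0;
}

Acquiring the two locks in the same global order in every thread breaks the circular-wait condition and prevents the deadlock.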
Deadlock Avoidance:
Avoid actions that may lead to a deadlock.
Think of it as a state machine moving from one state to another as each instruction is executed.
1.Safe State
2.Banker’s Algorithm
Banker's Algorithm
When a request is made, check to see whether, after the request is satisfied, there is at least one
sequence of moves that can satisfy all the requests, i.e., the new state is safe. If so, satisfy the
request; otherwise, make the request wait.
If we have a resource allocation system with only one instance of each resource type, a variant of
the resource allocation graph can be used for deadlock avoidance.
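A hedged sketch of the safety check at the heart of the Banker's algorithm (the Allocation/Need/Available layout follows the usual textbook presentation; the array sizes are illustrative):

#include <stdbool.h>

#define P 5   /* number of processes (illustrative) */
#define R 3   /* number of resource types (illustrative) */

/* Returns true if the state described by available/alloc/need is safe. */
bool is_safe(int available[R], int alloc[P][R], int need[P][R])
{
    int work[R];
    bool finish[P] = { false };

    for (int j = 0; j < R; j++) work[j] = available[j];

    for (int done = 0; done < P; ) {
        bool progress = false;
        for (int i = 0; i < P; i++) {
            if (finish[i]) continue;
            bool can_run = true;
            for (int j = 0; j < R; j++)
                if (need[i][j] > work[j]) { can_run = false; break; }
            if (can_run) {
                for (int j = 0; j < R; j++) work[j] += alloc[i][j];  /* process finishes and releases */
                finish[i] = true;
                done++;
                progress = true;
            }
        }
        if (!progress) return false;   /* no remaining process can finish: unsafe */
    }
    return true;
}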
DEADLOCK PREVENTION
The difference from avoidance is that here the system itself is built in such a way that deadlocks
cannot occur: at least one of the four necessary conditions is prevented from holding.
This may, however, be even more conservative than a deadlock avoidance strategy.
Deadlock Detection
The detection mechanism for deadlocks differs depending on whether each resource type has a
single instance or multiple instances. We can detect deadlocks using a wait-for graph for
single-instance resource types, and using a detection algorithm for multiple instances of a resource type.
A single instance of a resource type means the system has only one resource of that type.
We can detect this type of deadlock with the help of a wait-for graph.
Wait-for graph (example): P1 → P2 (via R1), P2 → P3 (via R2), P2 → P1 (via R3), P3 → P2 (via R4).
A system is in a deadlock state if and only if the wait-for graph contains a cycle, so
we can detect deadlocks by looking for cycles. In the figure there are two cycles, one P1 → P2 → P1
and the other P2 → P3 → P2, so the system contains deadlocks.
The wait-for graph is not applicable to resource types with several instances, so we need another
method for that case: a deadlock detection algorithm. This algorithm looks like the Banker's
algorithm and employs several data structures that are similar to those used in the Banker's
algorithm.
Deadlock recovery
Once deadlock has been detected, some strategy is needed for recovery. The various
approaches of recovering from deadlock are:
PROCESS TERMINATION
RESOURCE PREEMPTION
PROCESS TERMINATION
"Process termination" is one method to recover from deadlock. There are two ways to use process
termination:
ABORT ALL DEADLOCKED PROCESSES: Release all the processes in the deadlocked
state and restart the allocation from the starting point. This is an expensive method.
ABORT ONE PROCESS AT A TIME: Abort one deadlocked process at a time, rechecking for
deadlock after each abort, until the deadlock cycle is eliminated.
RESOURCE PREEMPTION
In resource preemption, we successively preempt some resources from processes and give these
resources to other processes until the deadlock cycle is broken.
There are three issues to address when using resource preemption. These are:
SELECTING A VICTIM : Select a victim resource from the deadlock state, and preempt that
one.
ROLLBACK: If a resource is preempted from a process, something must be done with that process.
The process must be rolled back to some safe state and restarted from that state.
STARVATION: It must be guaranteed that resources will not always be preempted from the same
process, to avoid the starvation problem.
Livelock:
A livelock is a situation in which two or more processes continuously change their states in response to
changes in the other process(es) without doing any useful work. It is somewhat similar to
deadlock, but the difference is that the processes keep "politely" yielding to let the other proceed.
This can happen when a process is trying to avoid a deadlock.
A livelock is similar to a deadlock, except that the states of the processes involved in the livelock
constantly change with regard to one another, none progressing. Livelock is a special case of
resource starvation; the general definition only states that a specific process is not progressing.
A real-world example of livelock occurs when two people meet in a narrow corridor, and each
tries to be polite by moving aside to let the other pass, but they end up swaying from side to side
without making any progress because they both repeatedly move the same way at the same time.
Livelock is a risk with some algorithms that detect and recover from deadlock. If more than one
process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be
avoided by ensuring that only one process (chosen randomly or by priority) takes action.
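A hedged sketch of how deadlock-avoidance code can turn into livelock: both threads use pthread_mutex_trylock, back off when the second lock is unavailable, and can keep retrying in lockstep (illustrative code, not from the original text). One thread would call acquire_both(&lockA, &lockB) while the other calls acquire_both(&lockB, &lockA).

#include <pthread.h>

pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

void acquire_both( pthread_mutex_t * first, pthread_mutex_t * second )
{
    for (;;) {
        pthread_mutex_lock( first );
        if (pthread_mutex_trylock( second ) == 0)
            return;                          /* got both locks */
        /* be "polite": give up the first lock and try again; if both
           threads keep doing this in lockstep, neither makes progress */
        pthread_mutex_unlock( first );
    }
}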
COMMUNICATION BETWEEN THREADS AND PROCESSES:
All parallel applications require some element of communication between either the
threads or the processes. There is usually an implicit or explicit action of one thread
sending data to another thread. For example, one thread might be signaling to another
that work is ready for them. We have already seen an example of this where a semaphore
might indicate to waiting threads that initialization has completed. The thread signaling
the semaphore does not know whether there are other threads waiting for that signal.
Alternatively, a thread might be placing a message on a queue, and the message would be
received by another thread.
These mechanisms usually require operating system support to mediate the sending of
messages between threads or processes. Programmers can invent their own implementations,
but it can be more efficient to rely on the operating system to put a thread to sleep until a
message arrives.
SHARED MEMORY:
The easiest way for multiple threads to communicate is through memory. If two threads
can access the same memory location, the cost of that access is little more than the
memory latency of the system. Of course, memory accesses still need to be controlled to
ensure that only one thread writes to the same memory location at a time. A multi-
threaded application will share memory between the threads by default, so this can be a
very low-cost approach. The only things that are not shared between threads are variables
on the stack of each thread (local variables) and thread-local variables, which will be
discussed later.
To set up shared memory between two processes, one process will make a library call
to create a shared memory region. The call will use a unique descriptor for that shared
memory. This descriptor is usually the name of a file in the file system. The create call
returns a handle identifier that can then be used to map the shared memory region into
the address space of the application. This mapping returns a pointer to the newly mapped
memory. This pointer is exactly like the pointer that would be returned by malloc() and can
be used in the same way. When each process exits, it detaches from the shared memory region, and
the last process to detach can delete it.
...
Memory[100]++;
...
The listing below shows the process of attaching to an existing shared memory segment. In
this instance, the shared region of memory is already created, so the same descriptor used
to create it can be used to attach to the existing shared memory region. This will provide
the process with an ID that can be used to map the region into the process.
...
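A hedged sketch of this create/attach/detach sequence using the POSIX shared memory API (the name "/my_shm" and the 4096-byte size are illustrative):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    /* create (or open) the shared memory object named by a file-like descriptor */
    int fd = shm_open( "/my_shm", O_CREAT | O_RDWR, 0600 );
    ftruncate( fd, 4096 );                    /* set the size of the region */

    /* map the region into this process's address space */
    char * memory = mmap( 0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );

    memory[100]++;                            /* use it like ordinary memory */

    munmap( memory, 4096 );                   /* detach from the region */
    shm_unlink( "/my_shm" );                  /* the last process deletes it */
    return 0;
}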
CONDITION VARIABLES:
Condition variables enable a thread to sleep until it is woken up when a condition becomes
true. Without condition variables, the waiting thread would have to use some form of polling
to check whether the condition had become true.
Condition variables work in conjunction with a mutex. The mutex is there to ensure
that only one thread at a time can access the variable. For example, consider the
producer-consumer model, which has one producer thread and one consumer thread. The producer adds data onto a
queue, and the consumer removes data from the queue. If there is no data on the queue,
then the consumer needs to sleep until it is signaled that an item of data has been placed
on the queue
Acquire Mutex();
Add Item to Queue();
If ( Only One Item on Queue() ) Signal Conditions Met();
Release Mutex();
The producer thread needs to signal a waiting consumer thread only if the queue was
empty and it has just added a new item into that queue. If there were multiple items
already on the queue, then the consumer thread must be busy processing those items and
cannot be sleeping. If there were no items in the queue, then it is possible that the
consumer thread is sleeping and needs to be woken up.
Acquire Mutex();
Repeat
    Item = 0;
    If ( No Items on Queue() )
        Wait on Condition Variable();
    If ( Item on Queue() )
        Item = Remove from Queue();
Until ( Item != 0 );
Release Mutex();
The consumer thread will wait on the condition variable if the queue is empty. When
the producer thread signals it to wake up, it will first check to see whether there is anything
on the queue. It is quite possible for the consumer thread to be woken only to
find the queue empty; it is important to realize that the thread waking up does not
imply that the condition is now true, which is why the code is in a repeat loop in the
example. If there is an item on the queue, then the consumer thread can handle that item.
The interaction with the mutex is interesting. The producer thread needs to acquire
the mutex before adding an item to the queue. It needs to release the mutex after adding
the item to the queue, but it still holds the mutex when signaling. The consumer thread
cannot be woken until the mutex is released. The producer thread releases the mutex
after the signaling has completed; releasing the mutex is necessary for the consumer
thread to be able to acquire it and make progress.
The consumer thread acquires the mutex; it will need it to be able to safely modify
the queue. If there are no items on the queue, then the consumer thread will wait for an
item to be added. The call to wait on the condition variable will cause the mutex to be
released, and the consumer thread will wait to be signaled. When the consumer thread
wakes up, it will hold the mutex; either it will release the mutex when it has removed an
item from the queue or, if there is still nothing in the queue, it will release the mutex
with another call to wait on the condition variable.
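For reference, a hedged sketch of the same producer/consumer signaling with POSIX condition variables (the integer items stands in for the queue; pthread_cond_wait releases the mutex while waiting and reacquires it on wake-up):

#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
int items = 0;                       /* stands in for the queue */

void produce(void)
{
    pthread_mutex_lock( &mutex );
    items++;                         /* add an item to the queue */
    if (items == 1)                  /* queue was empty: a consumer may be asleep */
        pthread_cond_signal( &cond );
    pthread_mutex_unlock( &mutex );
}

int consume(void)
{
    pthread_mutex_lock( &mutex );
    while (items == 0)               /* recheck the condition on every wake-up */
        pthread_cond_wait( &cond, &mutex );
    items--;                         /* remove an item from the queue */
    pthread_mutex_unlock( &mutex );
    return 1;
}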
The producer thread can use two types of wake-up calls: either it can wake up a single
thread or it can broadcast to all waiting threads. Which one to use depends on the
context. If there are multiple items of data ready for processing, it makes sense to wake
up multiple threads with a broadcast. On the other hand, if the producer thread has
added only a single item to the queue, it is more appropriate to wake up only a single
thread. If all the threads are woken, it can take some time for all the threads to wake up,
execute, and return to waiting, placing an unnecessary burden on the system. Notice that
because each thread has to own the mutex when it wakes up, the process of waking all
the waiting threads is serial; only a single thread can be woken at a time.
The other point to observe is that when a wake-up call is broadcast to all threads,
some of them may be woken when there is no work for them to do. This is one reason
why a woken thread should always recheck whether there is work before proceeding.
The other problem to be aware of with condition variables is the lost wake-up. This
occurs when the signal to wake up the waiting thread is sent before the thread is ready to receive
it. Listing 4.19 shows a version of the consumer thread code. This version of
the code can suffer from the lost wake-up problem.
Listing 4.19 Consumer Thread Code with Potential Lost Wake-Up Problem
Repeat
    Item = 0;
    If ( No Items on Queue() )
    {
        Acquire Mutex();
        Wait on Condition Variable();
        Release Mutex();
    }
    Acquire Mutex();
    If ( Item on Queue() )
        Item = Remove from Queue();
    Release Mutex();
Until ( Item != 0 );
The problem with the code is the first if condition. If there are no items on the
queue, then the mutex lock is acquired, and the thread waits on the condition variable.
However, the producer thread could have placed an item and signaled the consumer
thread between the consumer thread executing the if statement and acquiring the
mutex. When this happens, the consumer thread waits on the condition variable indefinitely,
because the producer thread, in Listing 4.17, signals only when it places the first item onto
an empty queue.
SIGNALS AND EVENTS:
Signals are a UNIX mechanism where one process can send a signal to another process
and have a handler in the receiving process perform some task upon the receipt of the
message. Many features of UNIX are implemented using signals. Stopping a running
application by typing Ctrl-C, for example, sends an interrupt signal (SIGINT) to the application.
Windows has a similar mechanism for events. The handling of keyboard presses and
mouse moves are performed through the event mechanism. Pressing one of the buttons
on the mouse will cause a click event to be sent to the target window.
Signals and events are really optimized for sending limited or no data along with the
signal, and as such they are probably not the best mechanism for communication when
large amounts of data need to be transferred between threads or processes.
Listing 4.20 shows how a signal handler is typically installed and how a signal can be
sent to that handler. Once the signal handler is installed, sending a signal to that thread
will cause the signal handler to be executed.

void signalHandler( void * signal ) { ... }

int main()
{
    installHandler( SIGNAL, signalHandler );
    sendSignal( SIGNAL );
}
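In real POSIX code the same pattern might be written with signal() and raise() (a minimal sketch; production code would normally use sigaction() and keep the handler minimal):

#include <signal.h>
#include <stdio.h>

volatile sig_atomic_t got_signal = 0;

void handler( int sig )
{
    got_signal = 1;                 /* just record that the signal arrived */
}

int main()
{
    signal( SIGUSR1, handler );     /* install the handler */
    raise( SIGUSR1 );               /* send the signal to this process */
    if (got_signal)
        printf( "Received SIGUSR1\n" );
    return 0;
}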
MESSAGE QUEUE:
A message queue is a structure that can be shared between multiple processes. Messages
can be placed into the queue and will be removed in the same order in which they were
added. Constructing a message queue looks rather like constructing a shared memory
segment. The first thing needed is a descriptor, typically the location of a file in the file
system. This descriptor can either be used to create the message queue or be used to
attach to an existing message queue. Once the queue is configured, processes can place
messages into it or remove messages from it. Once the queue is finished, it needs to be
deleted.
Listing 4.21 shows code for creating a message queue and placing messages into it.
...
Listing 4.22 shows the process for receiving messages from a queue. Using the descriptor
for an existing message queue enables two processes to communicate by sending and
receiving messages through the queue.
...
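A hedged sketch of this lifecycle using the POSIX message queue API (the name "/my_queue" and the sizes are illustrative; on Linux this typically links with -lrt):

#include <mqueue.h>
#include <fcntl.h>
#include <string.h>

int main()
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };   /* illustrative limits */

    /* create the queue (or attach to it if it already exists) */
    mqd_t q = mq_open( "/my_queue", O_CREAT | O_RDWR, 0600, &attr );

    const char * msg = "hello";
    mq_send( q, msg, strlen(msg) + 1, 0 );          /* place a message on the queue */

    char buffer[64];
    mq_receive( q, buffer, sizeof(buffer), 0 );     /* messages come off in FIFO order */

    mq_close( q );
    mq_unlink( "/my_queue" );                       /* delete the queue when finished */
    return 0;
}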
PIPES AND NAMED PIPES:
UNIX uses pipes to pass data from one process to another. For example, the output from
the command ls, which lists all the files in a directory, could be piped into the wc command,
which counts the number of lines, words, and characters in the input. The combination
of the two commands would be a count of the number of files in the directory.
Named pipes are file-like objects that are given a specific name that can be shared
between processes. Any process can write into the pipe or read from the pipe. There is
no concept of a “message”; the data is treated as a stream of bytes. The method for using
a named pipe is much like the method for using a file: the pipe is opened, data is written
into it or read from it, and then the pipe is closed.
Listing 4.23 shows the steps necessary to set up and write data into a pipe, before
closing and deleting the pipe. One process needs to actually make the pipe, and once it
has been created, it can be opened and used for either reading or writing. Once the
process has completed, the pipe can be closed, and one of the processes using it should also delete it.
...
Close Pipe( ID );
Listing 4.24 shows the steps necessary to open an existing pipe and read messages from
it. Processes using the same descriptor can open and use the same pipe for communication.
Listing 4.24 Opening an Existing Pipe to Receive Messages
...
Close Pipe( ID );
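A hedged sketch of the writer side using the POSIX named-pipe (FIFO) calls (the path "/tmp/my_pipe" is illustrative; a second process would open the same path with O_RDONLY and read from it):

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    mkfifo( "/tmp/my_pipe", 0600 );              /* one process creates the pipe */

    int fd = open( "/tmp/my_pipe", O_WRONLY );   /* blocks until a reader opens it */
    write( fd, "hello\n", 6 );                   /* the data is a stream of bytes */
    close( fd );

    unlink( "/tmp/my_pipe" );                    /* one of the processes should delete it */
    return 0;
}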