CS8083 Unit II Notes
2.1 PERFORMANCE
The performance of a system is basically determined by the performance of the
processor and memory. Performance can be defined as how quickly the machine
executes instructions.
Taking performance into consideration in the initial phases of application design will lead
to a better final product.
There are two approaches for improving the performance of an application:
✓ Performance is the problem to be solved once the program is functionally correct
✓ Performance is considered as one of the upfront specifications for the application
2.1.1 Common metrics for performance
Items per unit time
✓ Measures the ability of the system to complete tasks rather than the duration of
each individual task
✓ This is a measure of bandwidth
Ex: Transactions per second, jobs per hour
Time per item
✓ Measure of the time to complete a single task
✓ Measure of latency or response time
QoS Metrics
Specify the expectations of the users of the system as well as penalties if the system fails
to meet these expectations.
Examples:
• Number of transactions with latency greater than some threshold
• Amount of time that the system is unavailable, known as downtime (the
complementary measure is availability)
• To avoid false sharing, data structures can be padded so that the variable used by
each thread resides on a separate cache line, as sketched below.
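A minimal sketch of this padding idea; the 64-byte cache-line size, the thread count, and all names are assumptions for illustration:

    #define CACHE_LINE 64     /* assumed cache-line size in bytes */
    #define NUM_THREADS 4

    /* Each thread updates its own counter; the padding pushes each
       counter onto its own cache line (assuming the array itself is
       line-aligned), so updates by one thread do not invalidate the
       cache line holding another thread's counter. */
    typedef struct
    {
        volatile int counter;
        char pad[CACHE_LINE - sizeof(int)];
    } padded_counter_t;

    padded_counter_t counters[NUM_THREADS];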
Operating system constraints to scaling
• Problems arise when there are too many active threads on a system; the constraint
is not on the absolute number of threads on the system but only on the number of
active threads.
• If there are continuously more threads requiring CPU time than there are virtual
CPUs, then the system may be considered to be oversubscribed.
• Multiple threads sharing the same virtual CPU will typically lead to a greater-than-
linear reduction in performance because of a number of factors.
• There are overheads due to the cost of switching context between the active threads
and the cost associated with the migration of threads between virtual CPUs.
• Threads that are sleeping or blocked waiting for a resource do not consume CPU
time and should not impact the performance of the active threads.
Using Processor Binding to Improve Memory Locality
• Thread migration is the situation where a thread starts running on one virtual
processor but ends up migrated to a different virtual processor.
• If the new virtual processor shares the same core, or if it shares cache with the
original virtual processor, then the cost of migration is low.
• For a multicore processor, the effect of thread migration tends to be minor, since
the new location of the thread shares a close level of cache with the old location of
the thread.
• If the migration is from one physical chip to another, the cost can be quite high.
• Processor binding is undertaken to improve performance, but it also restricts the
freedom of the operating system to schedule dynamically for the best performance.
In the worst case, it can lead to the bound application taking many times longer
than its unbound run time.
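As an illustration, a hedged sketch of binding the calling thread to a single virtual CPU using the Linux-specific pthread_setaffinity_np() call; the choice of CPU 0 is arbitrary, and other systems use different interfaces (Solaris, for example, provides processor_bind()):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    void bind_to_cpu0(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);                 /* start with an empty CPU set */
        CPU_SET(0, &set);               /* add virtual CPU 0 to the set */
        /* Bind the calling thread to the CPUs in the set. */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }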
Priority Inversion
• All processes and threads have an associated priority, which the operating system
uses to determine how much CPU time to assign to them.
• A thread with a higher priority will get more CPU time than a thread with a lower
priority.
• Priorities are also used to determine how time is distributed between the threads
that comprise the application.
• Priority inversion occurs when a high-priority thread is blocked waiting for a
resource held by a low-priority thread, so the low-priority thread effectively runs
ahead of the high-priority one.
2.3 SYNCHRONIZATION AND DATA SHARING
• Data races are situations where multiple threads update the same data in an unsafe
way.
• They are the most common programming error found in parallel code.
• One way to avoid data races is by using proper synchronization between threads.
2.3.1 Data races
Data races are the most common error found in parallel code. A data race occurs when
multiple threads use the same data item and one or more of them are updating it. It is best
illustrated by an example:
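A minimal sketch of such a race (not the book's original listing; the starting value of 10 and the increment of 4 match the discussion below):

    #include <pthread.h>

    int counter = 10;               /* shared variable, initially 10 */

    void *adder(void *arg)
    {
        /* Unsynchronized read-modify-write: both threads may read 10,
           add 4, and write back 14, losing one of the updates. */
        counter += 4;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, adder, NULL);
        pthread_create(&t2, NULL, adder, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* counter may end up 14 (race) or 18 (sequential execution). */
        return 0;
    }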
In the example, each thread adds 4 to the variable. Because they do it at exactly the same
time, the value 14 ends up being stored in the variable. If the two threads had executed
the code at different times, the variable would have ended up with the value 18. This is
the situation where both threads are running simultaneously, and it illustrates a common
kind of data race, possibly the easiest one to visualize.
Another tool that is able to detect potential data races is the Thread Analyzer in Oracle
Solaris Studio. This tool requires an instrumented build of the application, data collection
is done by the collect tool, and the graphical interface is launched with the command tha.
Listing 4.5 shows the steps to do this.
Listing 4.5 Detecting Data Races Using the Sun Studio Thread Analyzer
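The original listing is not reproduced in these notes. A rough reconstruction of the typical workflow follows; the exact compiler and collect flags are from memory, not from the source, and should be checked against the Solaris Studio documentation:

    % cc -g -xinstrument=datarace datarace.c
    % collect -r race ./a.out
    % tha tha.1.er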
Fig 2.4 List of data races detected by the Solaris Studio Thread Analyzer
The initial screen of the tool displays a list of data races, as shown in Figure 2.4. Once the
user has identified the data race they are interested in, they can view the source code for
the two locations in the code where the problem occurs.
Thread Analyzer GUI
Races tab – Shows a list of data races detected in the program and the associated call
stack traces. This tab is selected by default.
Dual source tab – Shows the two source locations corresponding to the two accesses of a
selected data race.
Experiments tab – Shows the load objects in the experiment, and lists error and warning
messages
2.3.3 Avoiding data races
• Place a synchronization lock around all accesses to that variable
• Ensure that each thread must acquire the lock before referencing the variable
Example:

    void *func(void *params)
    {
        pthread_mutex_lock(&mutex);
        counter++;
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

This uses a mutex lock to protect accesses to the variable counter.
In the example, the two routines Increment() and Decrement() will either increment or
decrement the variable counter. To modify the variable, a thread has to first acquire the
mutex lock. Only one thread at a time can do this; all the other threads that want to
acquire the lock need to wait until the thread holding the lock releases it.
Both routines use the same mutex; consequently, only one thread at a time can either
increment or decrement the variable counter. If multiple threads are attempting to acquire
the same mutex at the same time, then only one thread will succeed, and the other threads
will have to wait. This situation is known as a contended mutex.
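A hedged sketch of what such a pair of routines might look like; the mutex and counter declarations are assumptions, not the book's listing:

    #include <pthread.h>

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    volatile int counter = 0;

    void Increment(void)
    {
        pthread_mutex_lock(&mutex);    /* only one thread at a time passes */
        counter++;
        pthread_mutex_unlock(&mutex);
    }

    void Decrement(void)
    {
        pthread_mutex_lock(&mutex);    /* same mutex, so increments and
                                          decrements are serialized */
        counter--;
        pthread_mutex_unlock(&mutex);
    }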
If all the calls to malloc() are replaced with the threadSafeMalloc() call, then only one
thread at a time can be in the original malloc() code, and the calls to malloc() become
thread-safe. Threads block if they attempt to acquire a mutex lock that is already held by
another thread.
Blocking means that the threads are sent to sleep either immediately or after a few
unsuccessful attempts to acquire the mutex. One problem with this approach is that it can
serialize a program.
If multiple threads simultaneously call threadSafeMalloc(), only one thread at a time will
make progress. This causes the multithreaded program to have only a single executing
thread, which stops the program from taking advantage of multiple cores.
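For reference, a minimal sketch of such a wrapper; the function name follows the text, while the mutex is an assumption:

    #include <stdlib.h>
    #include <pthread.h>

    pthread_mutex_t malloc_mutex = PTHREAD_MUTEX_INITIALIZER;

    void *threadSafeMalloc(size_t size)
    {
        pthread_mutex_lock(&malloc_mutex);   /* serialize all allocations */
        void *ptr = malloc(size);
        pthread_mutex_unlock(&malloc_mutex);
        return ptr;
    }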
Spin Locks
Spin locks are essentially mutex locks. The difference between a mutex lock and a spin
lock is that a thread waiting to acquire a spin lock will keep trying to acquire the lock
without sleeping.
The advantage of using spin locks is that the waiting thread will acquire the lock as soon
as it is released, whereas a thread waiting on a mutex lock needs to be woken by the
operating system before it can get the lock.
The disadvantage is that a spin lock will spin on a virtual CPU monopolizing that
resource. In comparison, a mutex lock will sleep and free the virtual CPU for another
thread to use.
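A minimal spin lock sketch using C11 atomics (not from the original text):

    #include <stdatomic.h>

    atomic_flag lock_flag = ATOMIC_FLAG_INIT;

    void spin_lock(void)
    {
        /* Retry the atomic test-and-set until the flag was clear;
           the waiting thread never sleeps. */
        while (atomic_flag_test_and_set_explicit(&lock_flag,
                                                 memory_order_acquire))
            ;   /* busy-wait, monopolizing a virtual CPU */
    }

    void spin_unlock(void)
    {
        atomic_flag_clear_explicit(&lock_flag, memory_order_release);
    }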
Semaphores
Semaphores are counters that can be either incremented or decremented. An example
might be a buffer that has a fixed size. Every time an element is added to a buffer, the
number of available positions is decreased. Every time an element is removed, the
number available is increased.
Semaphores can also be used to mimic mutexes; if there is only one element in the
semaphore, then it can be either acquired or available, exactly as a mutex can be either
locked or unlocked.
Semaphores will also signal or wake up threads that are waiting on them to use available
resources; hence, they can be used for signaling between threads. For example, a thread
might set a semaphore once it has completed some initialization.
Other threads could wait on the semaphore and be signaled to start work once the
initialization is complete.
Semaphore can be acquired by wait operation and can be released by signal operation.
The wait operation decrements the value of semaphore variable by 1. If the value is less
than or equal to zero, then the operation blocks until the value of the semaphore becomes
positive. The signal or post operation increments the value of the semaphore by 1.
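A hedged sketch using POSIX semaphores for the fixed-size buffer described above; the 16-element capacity and all names are assumptions:

    #include <semaphore.h>

    sem_t free_slots;   /* counts empty positions in the buffer */
    sem_t used_slots;   /* counts filled positions in the buffer */

    void init(void)
    {
        sem_init(&free_slots, 0, 16);   /* assume a 16-element buffer */
        sem_init(&used_slots, 0, 0);    /* initially empty */
    }

    void producer_add(void)
    {
        sem_wait(&free_slots);   /* decrement; blocks if the buffer is full */
        /* ... add an element to the buffer ... */
        sem_post(&used_slots);   /* increment; may wake a waiting consumer */
    }

    void consumer_remove(void)
    {
        sem_wait(&used_slots);   /* blocks if the buffer is empty */
        /* ... remove an element from the buffer ... */
        sem_post(&free_slots);
    }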
Readers-Writer Locks
A readers-writer lock (or multiple-reader lock) allows many threads to read the shared
data but can then lock the reader threads out to allow one thread to acquire a writer lock
and modify the data. A writer cannot acquire the write lock until all the readers have
released their reader locks.
For this reason, the locks tend to be biased toward writers; as soon as one is queued, the
lock stops allowing further readers to enter. This action causes the number of readers
holding the lock to diminish and will eventually allow the writer to get exclusive access
to the lock.
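A hedged sketch using the POSIX readers-writer lock API; the shared variable is an assumption:

    #include <pthread.h>

    pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
    int shared_data;

    int reader(void)
    {
        pthread_rwlock_rdlock(&rwlock);   /* many readers may hold this */
        int value = shared_data;
        pthread_rwlock_unlock(&rwlock);
        return value;
    }

    void writer(int value)
    {
        pthread_rwlock_wrlock(&rwlock);   /* exclusive access */
        shared_data = value;
        pthread_rwlock_unlock(&rwlock);
    }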
Barriers
There are situations where a number of threads have to all complete their work before
any of the threads can start on the next task.
For example, suppose a number of threads compute the values stored in a matrix. The
variable total needs to be calculated using the values stored in the matrix. A barrier can
be used to ensure that all the threads complete their computation of the matrix before the
variable total is calculated. The code below shows a situation using a barrier to separate
the calculation of a variable from its use.
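The original listing is not reproduced in these notes. Below is a hedged sketch using a POSIX barrier; the helpers compute_matrix_rows() and sum_matrix() are hypothetical names introduced for illustration:

    #include <pthread.h>

    void compute_matrix_rows(int id);   /* hypothetical: fills this thread's rows */
    double sum_matrix(void);            /* hypothetical: sums the whole matrix */

    pthread_barrier_t barrier;          /* initialized elsewhere with the thread count */
    double total;

    void *worker(void *arg)
    {
        int id = *(int *)arg;
        compute_matrix_rows(id);        /* each thread fills its part of the matrix */
        pthread_barrier_wait(&barrier); /* no thread passes until all arrive */
        if (id == 0)
            total = sum_matrix();       /* safe: every value is now computed */
        return NULL;
    }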
The variable total can be computed only when all threads have reached the barrier. This
avoids the situation where one of the threads is still completing its computations while
the other threads start using the results of the calculations.
Notice that another barrier could well be needed after the computation of the value for
total if that value is then used in further calculations.
DEADLOCKS
A deadlock occurs when two or more threads each hold a lock that another needs, so
none of them can make progress.
Example:
• Two threads are deadlocked
• If thread1 has already acquired lock A and thread2 has already acquired lock B,
then
• Thread1 cannot make forward progress because it is waiting for lock B
• Thread2 cannot make forward progress because it is waiting for lock A
How deadlock occurs:
• The two routines update1() and update2() each have an outer loop
• Routine update1() acquires lock A and then attempts to acquire lock B
• Routine update2() acquires lock B and then attempts to acquire lock A; a sketch
of the two routines follows
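A hedged sketch of the two routines described above; the lock names come from the text, everything else (including omitting the outer loops) is an assumption:

    #include <pthread.h>

    pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

    void update1(void)
    {
        pthread_mutex_lock(&lockA);
        pthread_mutex_lock(&lockB);   /* blocks forever if update2 holds B
                                         and is waiting for A */
        /* ... update shared state ... */
        pthread_mutex_unlock(&lockB);
        pthread_mutex_unlock(&lockA);
    }

    void update2(void)
    {
        pthread_mutex_lock(&lockB);   /* opposite order: the deadlock recipe */
        pthread_mutex_lock(&lockA);
        /* ... update shared state ... */
        pthread_mutex_unlock(&lockA);
        pthread_mutex_unlock(&lockB);
    }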
Avoiding deadlocks
• Ensure that threads always acquire the locks in the same order
• If thread2 also acquired the locks in the order A and then B, it would stall while
waiting for lock A without having first acquired lock B
• This would enable thread1 to acquire B and then eventually release both locks,
allowing thread2 to make progress
LIVELOCKS
• A condition that occurs when two or more threads continually change their state in
response to changes in the other threads.
• Livelock differs from deadlock in that both processes involved repeatedly change
their states with regard to each other without making progress.
• It traps threads in an unending loop of releasing and acquiring locks
• It can be caused by code written to back out of deadlocks
• The threads are unable to make progress even though they are not blocked
• The tasks enter an infinite loop of operations that lead to nothing
Example:
• Two people meet on a landing
• Each time one takes a step to the side, the other mirrors that step; neither moves
forward
This can be avoided by ensuring that only one process (chosen randomly or by priority)
takes action.
Mechanism to avoid deadlock
▪ Each thread acquires one lock and then attempts to acquire the second lock that it
needs, using a try-lock operation such as canAcquire().
▪ canAcquire() returns immediately, either having acquired the lock or having failed
to acquire it.
▪ If the thread fails to acquire the second lock
o It releases the lock it already holds before attempting to acquire both locks
again
▪ When the thread successfully acquires both locks
o It exits the loop and performs its update
▪ If two threads repeatedly acquire their first lock, fail on the second, and back off
in lockstep, the application will make no forward progress; this is the livelock
described above. A sketch of the scheme follows.
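A hedged sketch of this back-off scheme, using pthread_mutex_trylock() in place of the canAcquire() operation named in the text:

    #include <pthread.h>

    void update(pthread_mutex_t *first, pthread_mutex_t *second)
    {
        for (;;)
        {
            pthread_mutex_lock(first);
            if (pthread_mutex_trylock(second) == 0)
                break;                      /* acquired both locks */
            pthread_mutex_unlock(first);    /* back off and try again */
        }
        /* ... update the shared data ... */
        pthread_mutex_unlock(second);
        pthread_mutex_unlock(first);
    }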
1. Shared memory
The pseudocode below depicts the procedure for using an existing shared memory
region. Each process should close the shared memory segment after using it.
ID = Open Shared Memory( Descriptor );
Memory = Map Shared Memory( ID );
...
Close Shared Memory( ID );
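A hedged sketch of the same steps using the POSIX shared memory API; the segment name and size are assumptions:

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define REGION_SIZE 4096            /* assumed size of the segment */

    void use_shared_memory(void)
    {
        int fd = shm_open("/my_region", O_RDWR, 0);   /* open existing segment */
        void *mem = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        /* ... use mem ... */
        munmap(mem, REGION_SIZE);
        close(fd);
    }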
2. Condition Variables
• Communicate readiness between threads by enabling a thread to be woken up
when a condition becomes true.
• Without condition variables, the waiting thread would have to use some form of
polling to check whether the condition had become true
• Example: Producer Consumer problem
A condition variable can be used to solve the producer-consumer problem. The producer
thread produces a data item and adds it to a queue (enqueue), while the consumer thread
consumes data from the queue. Before adding data, the producer has to check whether the
queue has an empty position to accommodate the new data item. Before consuming a data
item, the consumer checks whether the queue contains any items. These conditions are
checked using condition variables.
A producer can produce one item while the consumer is consuming another item, so the
producer and consumer must be synchronized. Therefore, before adding an item to the
queue, the producer thread acquires the mutex lock, and after adding the item, it releases
the lock. If there is now exactly one item in the queue (that is, the queue was previously
empty), the producer wakes up one consumer thread by signaling it.
Producer Thread Adding an Item to the Queue
Acquire Mutex();
Add Item to Queue();
If ( Only One Item on Queue )
{
Signal Conditions Met();
}
Release Mutex();
If there are no items in the queue, the consumer thread goes to sleep and has to be woken
up when the producer thread adds an item and signals the condition variable. If more
items exist in the queue, the consumer thread can consume them one by one. The
consumer thread releases the mutex lock while it waits for items to arrive and again once
it has removed an item from the queue.
Consumer Thread Removing Items from the Queue
Repeat
    Item = 0;
    If ( No Items on Queue() )
    {
        Acquire Mutex();
        Wait on Condition Variable();
        Release Mutex();
    }
    Acquire Mutex();
    If ( Item on Queue() )
    {
        Item = Remove from Queue();
    }
    Release Mutex();
Until ( Item != 0 );
▪ The problem with the code is the first if condition. If there are no items on the
queue, then the mutex lock is acquired, and the thread waits on the condition
variable.
▪ However, the producer thread could have placed an item and signaled the
consumer thread between the consumer thread executing the if statement and
acquiring the mutex.
▪ When this happens, the consumer thread waits on the condition variable
indefinitely, because the producer thread signals only when it places the first item
into the queue.
▪ Lost wake-up: occurs when the signal to wake up the waiting thread is sent before
the thread is ready to receive it.
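A hedged sketch of the standard pthreads pattern that fixes this: the emptiness check and the wait both happen while holding the mutex, and pthread_cond_wait() atomically releases the mutex while sleeping, so a signal sent after the check cannot be lost. The queue helpers and the item_t type are hypothetical names:

    #include <pthread.h>

    typedef struct item item_t;         /* hypothetical item type */
    int queue_is_empty(void);           /* hypothetical helpers */
    item_t *remove_from_queue(void);

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    item_t *consume(void)
    {
        pthread_mutex_lock(&mutex);
        while (queue_is_empty())
        {
            /* Atomically releases the mutex while waiting and
               reacquires it before returning. */
            pthread_cond_wait(&cond, &mutex);
        }
        item_t *item = remove_from_queue();
        pthread_mutex_unlock(&mutex);
        return item;
    }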
3. Signals and events
▪ A UNIX mechanism where one process can send a signal to another process when
an event occurs.
▪ The receiving process can have a handler to perform some task upon receipt of
the message.
Examples:
▪ Signals
o Pressing ^C
o Causes a SIGINT signal to be sent to the process
o By default this stops the running application
▪ Events
o Handling of Keyboard presses
o Pressing one of the mouse buttons will send a click event to the target
window
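A hedged sketch of installing a handler for SIGINT; the handler body and flag are illustrative only:

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    volatile sig_atomic_t got_sigint = 0;

    void handler(int sig)
    {
        got_sigint = 1;     /* handlers should only do async-signal-safe work */
    }

    int main(void)
    {
        signal(SIGINT, handler);    /* install the handler for ^C */
        while (!got_sigint)
            sleep(1);               /* wait until the signal arrives */
        printf("Received SIGINT\n");
        return 0;
    }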
4. Message queues
▪ A structure that can be shared between multiple processes
▪ Messages can be placed into the queue and will be removed in the same order
in which they were added
▪ Like a shared memory segment, a message queue is identified by a descriptor,
which is the location of a file in the file system.
▪ The descriptor is used to create the message queue and also to attach to an
existing message queue.
▪ Messages to be shared are placed into the message queue. The first message
added is at the head of the queue and is retrieved first.
Example:
Creating and Placing Messages into a Queue
ID = Open Message Queue( Descriptor );
Put Message in Queue( ID, Message );
...
Close Message Queue( ID );
Delete Message Queue( Descriptor );
Opening a Queue and Receiving Messages
ID=Open Message Queue ID(Descriptor);
Message=Remove Message from Queue(ID);
...
Close Message Queue(ID);
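A hedged sketch of the same steps using the POSIX message queue API; the queue name and buffer size are assumptions (on Linux, mq_receive needs a buffer at least as large as the queue's message-size attribute):

    #include <mqueue.h>
    #include <fcntl.h>

    void sender(void)
    {
        mqd_t mq = mq_open("/myqueue", O_CREAT | O_WRONLY, 0644, NULL);
        mq_send(mq, "hello", 6, 0);        /* priority 0 */
        mq_close(mq);
    }

    void receiver(void)
    {
        char buffer[8192];                 /* assumed >= queue message size */
        mqd_t mq = mq_open("/myqueue", O_RDONLY);
        mq_receive(mq, buffer, sizeof(buffer), NULL);
        mq_close(mq);
        mq_unlink("/myqueue");             /* delete the queue */
    }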
5. Named pipes
▪ Pipe – used to pass data from one process to another
▪ Example
▪ ls – lists all the files in a directory
▪ wc – counts the number of lines, words, and characters in its input
▪ The output of the ls command is piped into the wc command (ls | wc)
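A named pipe (FIFO) appears as a file in the file system; a hedged sketch of the writer side (the path is an assumption, and a separate reader process must open the FIFO before the open for writing returns):

    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    void fifo_writer(void)
    {
        mkfifo("/tmp/myfifo", 0644);             /* create the named pipe */
        int fd = open("/tmp/myfifo", O_WRONLY);  /* blocks until a reader opens */
        write(fd, "hello\n", 6);
        close(fd);
        unlink("/tmp/myfifo");                   /* remove the pipe when done */
    }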