
Hardware and Software Synchronization

Advanced Computer Architecture


COMP 140
Thursday June 26, 2014
Synchronization

■ In a multithreaded system, synchronization is extremely
important in a memory system — the key hardware capability
is an uninterruptible instruction or instruction sequence
capable of atomically retrieving and changing a value.

■ Basic problem:

■ If two concurrent threads access a shared variable, and that
variable is read, modified, and written by those threads, then
access to the variable must be controlled to avoid erroneous
behavior
Synchronization

■ Once a hardware synchronization mechanism exists, software
synchronization mechanisms are then constructed using the
capability.

■ “Locks” are used to create “mutual exclusion” and to
implement more complex synchronization mechanisms.
Synchronization
■ Threads cooperate in multithreaded programs

■ To share resources, access shared data structures

■ Threads accessing a memory cache in a Web server

■ To coordinate their execution

■ One thread executes relative to another (recall ping-pong)

■ For correctness, we need to control this cooperation

■ Threads interleave executions arbitrarily and at different rates

■ Scheduling is not under program control

■ Cooperation is controlled using synchronization

■ Restrict the possible interleavings

■ We’ll discuss in terms of threads; the same applies to processes


Synchronization Example
■ Suppose we have to implement a function to handle
withdrawals from a bank account:

withdraw (account, amount) {
    balance = get_balance(account);
    balance = balance - amount;
    put_balance(account, balance);
    return balance;
}

• Now suppose that you and your significant other share a bank
account with a balance of $1000.

• Then you each go to separate ATMs and simultaneously
withdraw $100 from the account.
Synchronization Example
■ We’ll represent the situation by creating a separate thread for
each person to do the withdrawals

■ These threads run in the same bank process, each executing:

withdraw (account, amount) {
    balance = get_balance(account);
    balance = balance - amount;
    put_balance(account, balance);
    return balance;
}

■ What’s the problem with this implementation?

■ Think about potential schedules of these two threads


Synchronization Example
■ The problem is that the execution of the two threads can be
interleaved:

! balance = get_balance(account);
balance = balance – amount;
! Execution

balance = get_balance(account);
! sequence
Context

balance = balance – amount;


by CPU put_balance(account, balance); Switch
!
put_balance(account, balance);
!
■ What is the balance of the account now?

■ This is known as a race condition

■ Context switch

■ Each thread is “racing” to put_balance() before the other

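The lost update above can be reproduced deterministically by replaying the bad schedule by hand. A minimal C sketch of that replay (get_balance/put_balance are stand-ins for the slide's pseudocode, and the "context switch" is simulated by manually interleaving the statements):

```c
#include <stdio.h>

/* Stand-ins for the slide's pseudocode helpers. */
static int get_balance(const int *account) { return *account; }
static void put_balance(int *account, int balance) { *account = balance; }

/* Replays the interleaving from the slide and returns the final balance. */
int lost_update_demo(void) {
    int account = 1000;

    int t1_balance = get_balance(&account); /* thread 1 reads 1000 ...         */
    t1_balance = t1_balance - 100;          /* ... computes 900 ...             */
                                            /* ... then a context switch occurs */
    int t2_balance = get_balance(&account); /* thread 2 also reads 1000         */
    t2_balance = t2_balance - 100;
    put_balance(&account, t2_balance);      /* account is now 900               */
                                            /* context switch back to thread 1  */
    put_balance(&account, t1_balance);      /* overwrites with the stale 900    */

    return account;                         /* 900: one $100 withdrawal lost    */
}
```

The final balance is $900, not $800: one of the two withdrawals vanished, which is exactly the race the slide asks about.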

Mutual Exclusion
■ One way to ensure who wins the race is to only let one thread
“compete”; this is called mutual exclusion

■ Code that uses mutual exclusion to synchronize its execution
is called a critical section

■ Only one thread at a time can execute in the critical section

■ All other threads are forced to wait on entry

■ When a thread leaves a critical section, another can enter

withdraw (account, amount) {
    /* critical section: */
    balance = get_balance(account);
    balance = balance - amount;
    put_balance(account, balance);
    /* end critical section */
    return balance;
}
Critical Section Requirements
1. Mutual exclusion

• If one thread is in the critical section, then no other is

2. Progress

• If some thread T is not in the critical section, then T cannot
prevent some other thread S from entering the critical section

3. Bounded waiting (no starvation)

• If some thread T is waiting on the critical section, then T will
eventually enter the critical section

4. No assumptions on performance

• Requirements must be met with any number of CPUs with
arbitrary relative speeds
Locks

• One way to implement critical sections is to “lock the door”


on the way in, and unlock it again on the way out

• A lock is an object in memory providing two operations

• acquire(): before entering the critical section

• release(): after leaving a critical section

• Threads pair calls to acquire() and release()

• Between acquire()/release(), the thread holds the lock

• acquire() does not return until any previous holder releases

• What can happen if the calls are not paired?

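The acquire()/release() pairing above maps directly onto POSIX mutexes. A minimal sketch of the withdraw routine with a real lock (the shared balance variable and the two-thread harness are illustrative, not from the slides):

```c
#include <pthread.h>

static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;
static int the_balance = 1000;            /* hypothetical shared account */

int withdraw(int amount) {
    pthread_mutex_lock(&account_lock);    /* acquire(): blocks until the holder releases */
    int balance = the_balance;            /* critical section: read...  */
    balance = balance - amount;           /* ...modify...               */
    the_balance = balance;                /* ...write                   */
    pthread_mutex_unlock(&account_lock);  /* release()                  */
    return balance;     /* safe outside the lock: balance is a private copy */
}

/* Two threads each withdraw $100, as in the ATM example. */
static void *withdraw_worker(void *arg) {
    (void)arg;
    withdraw(100);
    return 0;
}

int run_two_withdrawals(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, withdraw_worker, 0);
    pthread_create(&t2, 0, withdraw_worker, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return the_balance;                   /* always 800 with the lock held */
}
```

Because the read-modify-write is inside the lock, every schedule yields $800; without the lock, the $900 outcome from the race example is possible.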

Using Locks
■ Lock-protected version of withdraw:

withdraw (account, amount) {
    acquire(lock);
    /* critical section: */
    balance = get_balance(account);
    balance = balance - amount;
    put_balance(account, balance);
    /* end critical section */
    release(lock);
    return balance;
}

■ One possible execution with two threads (blue and green):

    Thread 1 (blue)                       Thread 2 (green)
    acquire(lock);
    balance = get_balance(account);
    balance = balance - amount;
                                          acquire(lock);   <- blocks
    put_balance(account, balance);
    release(lock);
                                          <- acquire returns
                                          balance = get_balance(account);
                                          balance = balance - amount;
                                          put_balance(account, balance);
                                          release(lock);

• What happens when green tries to acquire the lock?

• Why is the “return” outside the critical section? Is this ok?

• What happens when a third thread calls acquire?


First Attempt: Spin Locks
■ How do we implement locks? Here is one attempt:

struct lock {
    int held = 0;
}

void acquire (lock) {
    while (lock->held);   /* busy-wait (spin-wait) for lock to be released */
    lock->held = 1;
}

void release (lock) {
    lock->held = 0;
}

■ This is called a spinlock because a thread spins waiting for
the lock to be released

■ Does this work?


First Attempt: Spin Locks
■ Does this work? Nope. Two independent threads may both
notice that a lock has been released and thereby acquire it.

struct lock {
    int held = 0;
}

void acquire (lock) {
    while (lock->held);
    /* A context switch can occur here, causing a race condition */
    lock->held = 1;
}

void release (lock) {
    lock->held = 0;
}
Can we take turns?
■ What if we took turns (assuming only two threads)?

struct lock {
    int turn = 0;
}

void acquire (lock) {
    while (lock->turn != this_thread);   /* spin until it is our turn */
}

void release (lock) {
    lock->turn = other_thread;
}

■ Does this work? Why not?
Can we take turns? Nope. Can we declare intent?
■ A thread doesn’t know if the other thread is ready.

■ What if we wait until the other thread isn’t interested:

struct lock {
    int interested[2] = [FALSE, FALSE];
}

void acquire (lock) {
    lock->interested[this_thread] = TRUE;
    while (lock->interested[other_thread]);   /* wait out the other thread */
}

void release (lock) {
    lock->interested[this_thread] = FALSE;
}

■ Now does it work?
Peterson’s Algorithm
■ Take turns only if somebody else is interested; otherwise just go!

struct lock {
    int turn = 0;
    int interested[2] = [FALSE, FALSE];
}

void acquire (lock) {
    lock->interested[this_thread] = TRUE;
    lock->turn = other_thread;
    while (lock->interested[other_thread] && lock->turn == other_thread);
}

void release (lock) {
    lock->interested[this_thread] = FALSE;
}
■ Finally works, but only if we know who else is playing.
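On modern hardware the pseudocode above also needs its loads and stores kept in order; a runnable sketch using C11 sequentially consistent atomics (the two-thread counter is just an illustrative test harness, not part of the algorithm):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>

static atomic_bool interested[2];
static atomic_int turn;

/* Default (seq_cst) atomic operations keep the store to turn
   and the loads in the spin loop from being reordered. */
static void peterson_acquire(int self) {
    int other = 1 - self;
    atomic_store(&interested[self], true);
    atomic_store(&turn, other);              /* offer the turn away */
    while (atomic_load(&interested[other]) &&
           atomic_load(&turn) == other)
        ;                                    /* spin while the other goes first */
}

static void peterson_release(int self) {
    atomic_store(&interested[self], false);
}

/* Harness: two threads increment a plain int under the lock. */
static int counter;
enum { ITERS = 100000 };

static void *count_worker(void *arg) {
    int self = (int)(long)arg;
    for (int i = 0; i < ITERS; i++) {
        peterson_acquire(self);
        counter++;                           /* critical section */
        peterson_release(self);
    }
    return 0;
}

int run_peterson_test(void) {
    pthread_t t0, t1;
    pthread_create(&t0, 0, count_worker, (void *)0L);
    pthread_create(&t1, 0, count_worker, (void *)1L);
    pthread_join(t0, 0);
    pthread_join(t1, 0);
    return counter;                          /* 2 * ITERS if exclusion holds */
}
```

If mutual exclusion were violated, some increments would be lost and the count would fall short, so the final total doubles as a correctness check.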
Hardware Support to the Rescue!
■ If we build atomic operations into the hardware, then we can
enforce single-thread critical sections, enforcing mutual
exclusion:

■ Two processes can never be in the critical section at the same


time.

■ We now have code that executes “all or nothing”

Hardware Synchronization
■ Basic building blocks:

■ Atomic exchange

■ Swaps register with memory location

■ Test-and-set

■ Sets under condition

■ Fetch-and-increment

■ Reads original value from memory and increments it in memory

■ Requires a memory read and write in one uninterruptible instruction
■ Load linked/store conditional

■ If the contents of the memory location specified by the load linked


are changed before the store conditional to the same address, the
store conditional fails
Atomic Exchange Synchronization
■ Atomic exchange interchanges a value in a register for a value
in memory

■ Assume a simple lock, where value 0 is used to indicate that


the lock is free, and 1 is used to indicate that the lock is
unavailable.

■ The processor tries to set the lock by doing an exchange of 1,
which is in a register, with the memory address corresponding
to the lock. The value that is returned is 1 if some other
processor had already claimed access, and 0 otherwise (success).

■ If successful, the value is changed to 1, preventing any
competing exchange from also retrieving 0.
Test and Set Synchronization
■ Test and Set semantics (similar to Atomic Exchange):

■ Record the old value and

■ Set the value to indicate unavailable (lock held) and

■ Return the old value

■ Hardware automatically executes test and set atomically:

bool test_and_set (bool *flag) {
    bool old = *flag;
    *flag = True;
    return old;
}

■ When executing test-and-set on “flag”:

■ What is the value of flag afterwards if it was initially False? True?

■ What is the return result if flag was initially False? True?
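The two quiz questions can be checked by running a software model of the primitive (the real operation is a single atomic instruction; this single-threaded model only shows the semantics):

```c
#include <stdbool.h>

/* Software model of test-and-set; hardware does this atomically. */
bool test_and_set(bool *flag) {
    bool old = *flag;   /* record the old value       */
    *flag = true;       /* mark the lock as taken     */
    return old;         /* report what was there before */
}
```

After any call, flag is true; the return value tells the caller whether it was the one that changed it (false returned means the caller won the lock).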

Test and Set Synchronization
■ Simple lock implementation with Test and Set:

struct lock {
    int held = 0;
}

void acquire (lock) {
    while (test_and_set(&lock->held));   /* When will the while return? */
}

void release (lock) {
    lock->held = 0;
}

■ The problem with this spin lock is that it is wasteful:

■ On single-processor machines, when a thread is spinning, the thread
holding the lock isn’t making progress!

■ The lock holder gave up the CPU in the first place by either
sleeping/yielding, or involuntarily.

■ Not as bad on multiprocessor machines, but spin locks still need to be
used cautiously: only for a short period of time.
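C11 exposes this primitive directly as atomic_flag. A sketch of the same spin lock built on it (the counter harness is an illustrative test, not from the slides):

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_flag held = ATOMIC_FLAG_INIT;

static void spin_acquire(void) {
    while (atomic_flag_test_and_set(&held))
        ;   /* returns true while someone else holds the lock */
}

static void spin_release(void) {
    atomic_flag_clear(&held);
}

/* Two threads hammer a shared counter to check mutual exclusion. */
static int shared;
enum { N = 100000 };

static void *spin_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++) {
        spin_acquire();
        shared++;            /* critical section */
        spin_release();
    }
    return 0;
}

int run_spinlock_test(void) {
    pthread_t a, b;
    pthread_create(&a, 0, spin_worker, 0);
    pthread_create(&b, 0, spin_worker, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return shared;           /* 2 * N if the lock works */
}
```

Unlike the broken first attempt, the test and the set here happen in one atomic step, so no window exists for two threads to both see the lock as free.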


Spin Locks
■ With no cache coherence, we could build a spin lock by
keeping the lock variable in memory.

■ A processor could continually try to acquire the lock using


an atomic operation (e.g., atomic exchange) and test
whether the exchange returned the lock as free.

■ To release the lock, the processor just stores the value 0 to


the lock.

■ MIPS spin lock whose address is in $1:

        DADDUI $2,$0,#1     ;put 1 into $2
lockit: EXCH   $2,0($1)     ;atomic exchange
        BNEZ   $2,lockit    ;already locked?

Spin Locks: coherence
■ With coherence, we can use the coherence mechanism to
cache the locks and maintain them.

■ Caching has two advantages:

■ Spinning happens on cached values, not main memory

■ There is often locality in lock accesses: the processor

that used the lock last will use it again. The lock resides
in that processor’s cache, reducing the time to acquire it.

■ We do need a change to our spin procedure

■ we don’t want multiple processors to generate a write

while spinning — most would be write misses

■ each processor would be trying to obtain the lock

variable in an exclusive state


Spin Locks: coherence
■ Modify our spin lock procedure so it spins by doing reads
on a local copy of the lock until it successfully sees that
the lock is available.

■ Then, attempt to acquire the lock by doing a swap.

■ First, read the lock variable to test its state

■ Keep reading and testing until the value of the read

indicates the lock is unlocked.

■ Race against all other processes that want the lock, by

attempting a swap.

■ The winner will see 0 (from the swap) and the losers will

see 1. When the winner finishes its critical section, it


stores a 0 in the lock, and the race is on again.

Spin Locks: coherence
■ Spin lock code (0 is unlocked, 1 is locked):

lockit: LD     R2,0(R1)     ;load of lock
        BNEZ   R2,lockit    ;not available—spin
        DADDUI R2,R0,#1     ;load locked value
        EXCH   R2,0(R1)     ;swap
        BNEZ   R2,lockit    ;branch if lock wasn’t 0

■ How does this use the coherence mechanism?
Spin Locks: coherence
■ Once the processor with the lock stores a 0 into the lock,
all other caches are invalidated, and must fetch the new
value to update their copy of the lock.

■ One cache gets the copy of the unlocked value (0) first and
performs the swap.

■ When the cache miss of other processors is satisfied, they


find that the variable is already locked, so they go back to
testing and spinning.
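The load-then-exchange loop above is often called test-and-test-and-set. A C11 sketch of the same structure as the MIPS code (names are illustrative):

```c
#include <stdatomic.h>

static atomic_int lockvar;       /* 0 = unlocked, 1 = locked */

void tts_acquire(void) {
    for (;;) {
        /* Spin on read-only loads: they hit in the local cache
           and generate no invalidation traffic. */
        while (atomic_load(&lockvar) != 0)
            ;
        /* Lock looks free: race everyone else with one atomic swap. */
        if (atomic_exchange(&lockvar, 1) == 0)
            return;              /* winner saw 0: lock acquired */
        /* loser saw 1: someone else swapped first; back to reading */
    }
}

void tts_release(void) {
    atomic_store(&lockvar, 0);   /* invalidates the spinners' cached copies */
}
```

The exchange is attempted only when a read has already suggested the lock is free, which is exactly why most of the waiting causes no coherence traffic.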
Models of Memory Consistency

    Processor 1:            Processor 2:
    A = 0                   B = 0
    ...                     ...
    A = 1                   B = 1
    if (B == 0) ...         if (A == 0) ...

• It should be impossible for both if statements to evaluate to true

• Delayed write invalidate?

• Sequential consistency:

• The result of execution should be the same as long as:

• Accesses on each processor were kept in order

• Accesses on different processors were arbitrarily interleaved


Memory consistency
• To implement, delay completion of all memory accesses
until all invalidations caused by the access are completed

• Reduces performance!

• Example: Suppose we have a processor where a write miss takes 50
cycles to establish ownership, 10 cycles to issue each invalidate after
ownership is established, and 80 cycles for an invalidate to complete
and be acknowledged once it is issued. Assuming that four other
processors share a cache block, how long does a write miss stall the
writing processor if the processor is sequentially consistent? Assume
that the invalidates must be explicitly acknowledged before the
coherence controller knows they are completed. Suppose we could
continue executing after obtaining ownership for the write miss without
waiting for the invalidates; how long would the write take?
Memory Consistency
• Answer: When we wait for invalidates, each write takes the sum of the
ownership time plus the time to complete the invalidates. Since the
invalidates can overlap, we need only worry about the last one, which
starts 10 + 10 + 10 + 10 = 40 cycles after ownership is established.
Hence, the total time for the write is 50 + 40 + 80 = 170 cycles. In
comparison, the ownership time is only 50 cycles. With appropriate
write-buffer implementations, it is even possible to continue before
ownership is established.

• Alternatives:

• Program-enforced synchronization to force the write on one
processor to occur before the read on the other

• Requires a synchronization object for A and another for B

• “Unlock” after write

• “Lock” before read

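The unlock-after-write / lock-before-read idea corresponds to C11 release/acquire ordering; a sketch for publishing A from one thread to another (the flag name and values are illustrative):

```c
#include <stdatomic.h>

static int A;                 /* ordinary shared data */
static atomic_int A_ready;    /* plays the role of the synchronization object */

void writer(void) {
    A = 1;
    /* "Unlock" after the write: the release store makes the
       store to A visible to whoever acquires A_ready. */
    atomic_store_explicit(&A_ready, 1, memory_order_release);
}

int reader(void) {
    /* "Lock" before the read: the acquire load pairs with the
       release store above. */
    while (atomic_load_explicit(&A_ready, memory_order_acquire) == 0)
        ;
    return A;   /* guaranteed to observe A == 1 */
}
```

The release/acquire pair restores ordering only where the program asks for it, which is how relaxed hardware avoids paying the sequential-consistency cost on every access.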

Relaxed Consistency Models
• Rules:

• X → Y

• Operation X must complete before operation Y is done

• Sequential consistency requires:

• R → W, R → R, W → R, W → W

• Relax W → R

• “Total store ordering” (“processor consistency”). Retains order among
writes, so many programs operate just fine.

• Relax W → W

• “Partial store order”

• Relax R → W and R → R

• “Weak ordering” and “release consistency”

• If the programmer doesn’t understand the model under which the
processor behaves — use expert-built standard synchronization libraries!
Multicore Characteristics Summary
References

• Peterson’s Algorithm: https://en.wikipedia.org/wiki/Peterson's_algorithm

• Synchronization with Locks: http://cseweb.ucsd.edu/classes/fa05/cse120/lectures/120-l5.pdf

• Dijkstra, E. W. (1965). “Solution of a problem in concurrent programming
control.” Communications of the ACM 8 (9): 569. doi:10.1145/365559.365617

• Shared Memory Tutorial: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
