Hardware and Software Synchronization Advanced Computer Architecture COMP 140 Thursday June 26, 2014
Synchronization
■ Threads cooperate in multithreaded programs
■ “Locks” are used to create “mutual exclusion” and to implement more complex synchronization mechanisms.
■ Consider two threads concurrently running the same withdraw() code on a shared account:

withdraw (account, amount) {
    balance = get_balance(account);
    balance = balance – amount;
    put_balance(account, balance);
    return balance;
}
■ What’s the problem with this implementation?
■ Execution sequence with context switches in the middle:

Thread 1                              Thread 2
balance = get_balance(account);
balance = balance – amount;
        <context switch>
                                      balance = get_balance(account);
                                      balance = balance – amount;
                                      put_balance(account, balance);
                                      return balance;
        <context switch>
put_balance(account, balance);
return balance;

■ Thread 1 then overwrites Thread 2's update, so only one withdrawal takes effect. The statements that read, modify, and write the balance form a critical section and must not be interleaved.
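The lost update is easy to reproduce. Below is a small sketch, not from the slides, that runs two POSIX threads through an unsynchronized withdraw(); the starting balance, amounts, and iteration count are made up for illustration.

#include <pthread.h>
#include <stdio.h>

static long balance = 1000000;               /* shared account balance */

static void withdraw(long amount) {
    long b = balance;                        /* read   */
    b = b - amount;                          /* modify */
    balance = b;                             /* write: not atomic with the read */
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        withdraw(1);                         /* each thread withdraws 100000 in total */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Without a lock the two read-modify-write sequences can interleave,
       so the final balance is often greater than the expected 800000.    */
    printf("final balance = %ld\n", balance);
    return 0;
}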
Critical Section Requirements
1. Mutual exclusion
2. Progress
3. Bounded waiting (no starvation)
4. No assumptions on performance
• A lock is an object in memory providing two operations: acquire() and release()
• Threads pair calls to acquire() and release()
withdraw (account, amount) {
    acquire(lock);
    balance = get_balance(account);      // critical
    balance = balance – amount;          // section
    put_balance(account, balance);       //
    release(lock);
    return balance;
}

■ Two threads (blue and green) call withdraw() at the same time; blue acquires the lock first and runs the critical section while green calls acquire() on the same lock.
• What happens when green tries to acquire the lock?
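Before looking at how acquire() itself can be built, here is a sketch, not from the slides, of the same pairing using an off-the-shelf lock, a POSIX pthread_mutex_t; with this API green's lock call simply blocks until blue unlocks, whereas the implementations that follow make the waiter spin. The names and starting balance are illustrative.

#include <pthread.h>

static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance_store = 1000;            /* stand-in for the account's balance */

long withdraw(long amount) {
    pthread_mutex_lock(&account_lock);       /* acquire(lock) */
    long balance = balance_store;            /* balance = get_balance(account) */
    balance = balance - amount;
    balance_store = balance;                 /* put_balance(account, balance)  */
    pthread_mutex_unlock(&account_lock);     /* release(lock) */
    return balance;
}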
struct lock {
    int held = 0;
}

void acquire (lock) {
    while (lock->held);      // busy-wait (spin-wait) for lock to be released
    lock->held = 1;
}

void release (lock) {
    lock->held = 0;
}
■ This is called a spinlock because a thread spins waiting for the lock to be released.
■ But this acquire() is not atomic: a thread can be interrupted between testing lock->held and setting it, so two threads can both see the lock as free and both enter the critical section.
struct lock {
    int turn = 0;
}

void acquire (lock) {
    while (lock->turn != this_thread);   // wait until it is our turn
}

void release (lock) {
    lock->turn = other_thread;           // hand the turn to the other thread
}
■ Does this work? Why not?
Can we take turns? Nope. Can we declare intent?
■ A thread doesn’t know if the other thread is ready, so strict turn-taking can leave it waiting even when the lock is free.
struct lock {
    int interested[2] = [FALSE, FALSE];
}

void acquire (lock) {
    lock->interested[this_thread] = TRUE;     // declare intent
    while (lock->interested[other_thread]);   // wait until the other thread is done
}

void release (lock) {
    lock->interested[this_thread] = FALSE;
}
■ Now does it work?
■ Not quite: if both threads set their interested flag at the same time, each spins waiting for the other and neither makes progress.
Peterson’s Algorithm
■ Take turns only if somebody else is interested; otherwise just go!
struct lock {
    int turn = 0;
    int interested[2] = [FALSE, FALSE];
}

void acquire (lock) {
    lock->interested[this_thread] = TRUE;
    lock->turn = other_thread;
    while (lock->interested[other_thread] && lock->turn == other_thread);
}

void release (lock) {
    lock->interested[this_thread] = FALSE;
}
■ Finally works, but only if we know who else is playing.
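For reference, here is a minimal runnable sketch of Peterson's algorithm for two threads, assuming C11 sequentially consistent atomics to provide the ordering the pseudocode takes for granted; the peterson_lock and worker names and the iteration count are illustrative, not from the slides.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

typedef struct {
    atomic_bool interested[2];   /* interested[i]: thread i wants the lock */
    atomic_int  turn;            /* whose turn it is to wait               */
} peterson_lock;

static peterson_lock lk = { { false, false }, 0 };
static long counter = 0;         /* shared data protected by the lock      */

static void acquire(peterson_lock *l, int self) {
    int other = 1 - self;
    atomic_store(&l->interested[self], true);   /* declare intent            */
    atomic_store(&l->turn, other);              /* defer to the other thread */
    /* Spin only while the other thread is interested AND it is its turn. */
    while (atomic_load(&l->interested[other]) && atomic_load(&l->turn) == other)
        ;
}

static void release(peterson_lock *l, int self) {
    atomic_store(&l->interested[self], false);
}

static void *worker(void *arg) {
    int self = (int)(intptr_t)arg;
    for (int i = 0; i < 100000; i++) {
        acquire(&lk, self);
        counter++;               /* critical section */
        release(&lk, self);
    }
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, (void *)(intptr_t)0);
    pthread_create(&t1, NULL, worker, (void *)(intptr_t)1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}

The sequentially consistent stores and loads matter on real hardware: with plain variables the compiler or CPU may reorder the interested/turn accesses, which is exactly the memory consistency issue discussed at the end of this lecture.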
Hardware Support to the Rescue!
■ If we build atomic operations into the hardware, we can use them to let only one thread at a time execute a critical section, enforcing mutual exclusion.
Hardware Synchronization
■ Basic building blocks:
■ Atomic exchange
■ Test-and-set
■ Fetch-and-increment
■ Load linked/store conditional
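As a point of reference, here is a sketch, not from the slides, of how these primitives appear as C11 <stdatomic.h> operations; compare_exchange plays roughly the role that load linked/store conditional plays at the ISA level.

#include <stdatomic.h>

void primitives_demo(atomic_int *p) {
    int old  = atomic_exchange(p, 1);        /* atomic exchange: swap in 1, return old value */
    int prev = atomic_fetch_add(p, 1);       /* fetch-and-increment                          */

    /* Test-and-set is an atomic exchange of 1: the lock was held iff the old value was nonzero. */
    int was_held = atomic_exchange(p, 1);

    /* Load linked/store conditional style retry loop: reread and retry if *p changed. */
    int expected = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &expected, expected + 1))
        ;                                    /* on failure, expected is reloaded from *p */

    (void)old; (void)prev; (void)was_held;   /* silence unused-variable warnings */
}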
struct lock {
    int held = 0;
}

void acquire (lock) {
    while (test-and-set(&lock->held));   // When will the while return?
}

void release (lock) {
    lock->held = 0;
}
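A minimal sketch of the same test-and-set spinlock using C11's atomic_flag (the spinlock_t name is illustrative); atomic_flag_test_and_set atomically sets the flag and returns its previous value, which is exactly the test-and-set primitive assumed above.

#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} spinlock_t;

/* Initialize with: spinlock_t lock = { ATOMIC_FLAG_INIT }; */

void spin_acquire(spinlock_t *l) {
    while (atomic_flag_test_and_set(&l->held))   /* true while someone else holds it   */
        ;                                        /* spin until the old value was clear */
}

void spin_release(spinlock_t *l) {
    atomic_flag_clear(&l->held);
}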
■ The problem with this spin-lock is that it is wasteful: a spinning thread occupies the CPU while doing no useful work.
■ On a single CPU the waste is total: the lock holder gave up the CPU in the first place, either voluntarily (by sleeping or yielding) or involuntarily (by being preempted), and cannot run to release the lock while another thread spins.
■ A spin lock built on an atomic exchange (MIPS-style), which keeps swapping a 1 into the lock until the old value comes back 0:

        DADDUI R2,R0,#1     ;put 1 into R2
lockit: EXCH   R2,0(R1)     ;atomic exchange
        BNEZ   R2,lockit    ;already locked?

■ There is often locality in lock accesses: the processor that used the lock last will use it again. The lock resides in that processor’s cache, reducing the time to acquire it.
■ When the lock is released, all waiting processors race for it by attempting a swap. The winner will see 0 (from the swap) and the losers will see the 1 written by the winner, so they go back to spinning.
■ A better version spins on a cached copy of the lock with ordinary loads and only attempts the swap when the lock looks free:

lockit: LD     R2,0(R1)     ;load of lock
        BNEZ   R2,lockit    ;not available, keep spinning
        DADDUI R2,R0,#1     ;put 1 into R2
        EXCH   R2,0(R1)     ;swap
        BNEZ   R2,lockit    ;branch if lock was not 0
■ How does this use the coherence mechanism?
Spin Locks: coherence
■ Once the processor with the lock stores a 0 into the lock, all other caches are invalidated and must fetch the new value to update their copy of the lock.
■ One cache gets the copy of the unlocked value (0) first and
performs the swap.
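The same spin-then-swap idea expressed in C11, as a sketch not from the slides: only the exchange performs a write that triggers invalidations, while the inner loop spins on the locally cached copy with ordinary loads.

#include <stdatomic.h>

void ttas_acquire(atomic_int *lock) {
    for (;;) {
        while (atomic_load(lock) != 0)       /* spin on the cached copy (reads only) */
            ;
        if (atomic_exchange(lock, 1) == 0)   /* lock looked free: try to grab it     */
            return;                          /* winner: old value was 0              */
        /* loser: another processor swapped first; go back to spinning on reads */
    }
}

void ttas_release(atomic_int *lock) {
    atomic_store(lock, 0);                   /* storing 0 invalidates the spinners' copies */
}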
Memory Consistency
• Example: should it be possible for both if statements below to find the other variable still 0?

P1:  A = 0;                 P2:  B = 0;
     ...                         ...
     A = 1;                      B = 1;
     if (B == 0) ...             if (A == 0) ...

• Sequential consistency: the result of any execution is the same as if the accesses of each processor were kept in program order and the accesses from different processors were arbitrarily interleaved.
• Reduces performance!
• Example: Suppose we have a processor where a write miss takes 50
cycles to establish ownership, 10 cycles to issue each invalidate after
ownership is established, and 80 cycles for an invalidate to complete
and be acknowledged once it is issued. Assuming that four other
processors share a cache block, how long does a write miss stall the
writing processor if the processor is sequentially consistent? Assume
that the invalidates must be explicitly acknowledged before the
coherence controller knows they are completed. Suppose we could
continue executing after obtaining ownership for the write miss without
waiting for the invalidates; how long would the write take?
Memory Consistency
• Answer: When we wait for invalidates, each write takes the sum of the
ownership time plus the time to complete the invalidates. Since the
invalidates can overlap, we need only worry about the last one, which
starts 10 + 10 + 10 + 10 = 40 cycles after ownership is established.
Hence, the total time for the write is 50 + 40 + 80 = 170 cycles. In
comparison, the ownership time is only 50 cycles. With appropriate
write buffer implementations, it is even possible to continue before
ownership is established.
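A small sketch, not from the text, that makes the cycle accounting explicit; the variable names are illustrative.

#include <stdio.h>

int main(void) {
    int ownership = 50;   /* cycles to establish ownership on the write miss        */
    int issue     = 10;   /* cycles to issue each invalidate                        */
    int ack       = 80;   /* cycles for an invalidate to complete and be acked      */
    int sharers   = 4;    /* other processors sharing the block                     */

    /* Invalidates overlap, so only the last one matters; it is issued
       sharers * issue cycles after ownership is established.            */
    int last_issue = sharers * issue;                 /* 40  */
    int sc_stall   = ownership + last_issue + ack;    /* 170 */
    int relaxed    = ownership;                       /* 50: continue after ownership */

    printf("sequentially consistent stall: %d cycles\n", sc_stall);
    printf("continue after ownership:      %d cycles\n", relaxed);
    return 0;
}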
• Alternatives: relaxed memory consistency models
• Notation: X → Y means that operation X must complete before operation Y
• Sequential consistency maintains all four orderings: R → W, R → R, W → R, W → W
• Relax the W → R ordering (total store ordering / processor consistency)
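To tie this back to the A/B example above, here is a sketch, not from the slides, of that example in C11. With sequentially consistent atomics the outcome r1 == 0 and r2 == 0 is forbidden; a hardware model that relaxes the W → R ordering (each processor's store may be delayed in a write buffer past its following load) can allow it.

#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

atomic_int A, B;
int r1, r2;

void *p1(void *arg) {
    (void)arg;
    atomic_store(&A, 1);          /* A = 1                      */
    r1 = atomic_load(&B);         /* corresponds to if (B == 0) */
    return NULL;
}

void *p2(void *arg) {
    (void)arg;
    atomic_store(&B, 1);          /* B = 1                      */
    r2 = atomic_load(&A);         /* corresponds to if (A == 0) */
    return NULL;
}

int main(void) {
    atomic_store(&A, 0);
    atomic_store(&B, 0);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* With sequentially consistent atomics, r1 == 0 && r2 == 0 cannot happen;
       if the W -> R ordering were relaxed, it could.                          */
    printf("r1 = %d, r2 = %d\n", r1, r2);
    return 0;
}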