Barrier Synchronization
By Samir Najar
By graphics coprocessor
TWO-PHASE RENDERING
// Thread that displays frames
while (true) {
  if (phase) {
    frame[0].display();
  } else {
    frame[1].display();
  }
  phase = !phase;
}

// Thread that prepares frames
while (true) {
  if (phase) {
    frame[1].prepare();
  } else {
    frame[0].prepare();
  }
  phase = !phase;
}
Even phases: one frame is displayed while the other is prepared.
Odd phases: the two frames swap roles.
SYNCHRONIZATION PROBLEMS
How do threads stay in phase?
Too early? "We render no frame before its time."
Too late? Memory is recycled before the frame is displayed.
BARRIER SYNCHRONIZATION
[Figure: threads in phase 0 wait at the barrier; once all have arrived, they continue in phase 1.]
BARRIER DEF.
A barrier is a coordination mechanism (an algorithm) that forces the processes participating in a concurrent (or distributed) algorithm to wait until each of them has reached a certain point in its program. The collection of these coordination points is called the barrier. Once all the processes have reached the barrier, they are all permitted to continue past it.
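As a minimal sketch of this definition, the following uses the standard library class java.util.concurrent.CyclicBarrier as a stand-in for the barrier. The class name BarrierDemo and the counters arrived and sawAll are illustrative assumptions, not part of the slides.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierDemo {
    // Returns true if every thread, after passing the barrier,
    // observed that all n threads had already reached it.
    static boolean run(int n) {
        CyclicBarrier barrier = new CyclicBarrier(n);
        AtomicInteger arrived = new AtomicInteger(0);
        AtomicInteger sawAll = new AtomicInteger(0);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            threads[t] = new Thread(() -> {
                try {
                    arrived.incrementAndGet();   // work done before the barrier
                    barrier.await();             // wait until all n threads arrive
                    if (arrived.get() == n) {    // work done after the barrier
                        sawAll.incrementAndGet();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sawAll.get() == n;
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}
```

No thread can read arrived before the last thread has incremented it, which is the "wait until each one has reached a certain point" property.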
WHY DO WE CARE?
Mostly of interest to
Elsewhere
  Garbage collection
  Less common in systems programming
  Still important topic
PARALLEL PREFIX
before: a, b, c, d
after: a, a+b, a+b+c, a+b+c+d
[Figure: one round turns a, b, c, d into a, a+b, b+c, c+d; the next round yields a, a+b, a+b+c, a+b+c+d.]
PREFIX
class Prefix extends Thread {
  private int[] a;     // array of input values
  private int i;       // thread index
  private Barrier b;   // shared barrier
  public Prefix(int[] a, Barrier b, int i) {
    // initialize fields
    this.a = a;
    this.b = b;
    this.i = i;
  }
  // ...
}
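The Prefix class above stops at its constructor. As a hedged sketch of how such threads might cooperate, the following computes a parallel prefix sum with one thread per array slot. It substitutes java.util.concurrent.CyclicBarrier for the slides' Barrier type, and the class name PrefixDemo and method parallelPrefix are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

class PrefixDemo {
    // One thread per slot; in the round with distance d, slot i adds the
    // value d positions to its left. Two barrier waits per round separate
    // all reads from all writes.
    static int[] parallelPrefix(int[] input) {
        int n = input.length;
        int[] a = input.clone();
        if (n == 0) return a;
        CyclicBarrier b = new CyclicBarrier(n);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;
            threads[t] = new Thread(() -> {
                try {
                    for (int d = 1; d < n; d *= 2) {
                        int sum = 0;
                        if (i >= d) sum = a[i - d];  // read the neighbor's value
                        b.await();                   // everyone has finished reading
                        if (i >= d) a[i] += sum;     // update own slot
                        b.await();                   // everyone has finished writing
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return a;
    }

    public static void main(String[] args) {
        // prints [1, 3, 6, 10]
        System.out.println(Arrays.toString(parallelPrefix(new int[] {1, 2, 3, 4})));
    }
}
```

Every thread runs the same number of rounds, including threads with i < d that do no arithmetic, so each barrier episode always sees exactly n arrivals.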
BARRIER IMPLEMENTATIONS
Cache coherence
Latency
Symmetry
ATOMIC COUNTER
Shared
  counter: atomic counter ranges over {0..n}, initially 0
  go: atomic bit, initial value is immaterial
Local
  local.go: a bit, initial value is immaterial

local.go := 1 - go
counter := counter + 1
if counter = n then
  counter := 0
  go := 1 - go            /* notify all */
else
  await (local.go = go)
fi
REMARKS
Only the atomic counter needs to be initialized.
The number of remote memory references per process in the CC model is O(1).
Small number of steps.
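The atomic-counter barrier can be sketched in Java as follows. This is one possible rendering, not the slides' own code: the class name CounterBarrier and the demo method are assumptions, AtomicInteger plays the role of the atomic counter, and a volatile int plays the role of the go bit.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterBarrier {
    private final int n;
    private final AtomicInteger counter = new AtomicInteger(0);
    private volatile int go = 0;   // any initial value would work

    CounterBarrier(int n) { this.n = n; }

    void await() {
        int localGo = 1 - go;                  // value go takes once all arrive
        if (counter.incrementAndGet() == n) {  // last process to arrive
            counter.set(0);                    // reset for the next episode
            go = localGo;                      // notify all
        } else {
            while (go != localGo) Thread.yield();  // spin on the shared bit
        }
    }

    // Runs n threads through several episodes. After episode r every thread
    // must see at least r * n arrivals; returns true if that always held.
    static boolean demo(int n, int episodes) {
        CounterBarrier barrier = new CounterBarrier(n);
        AtomicInteger arrivals = new AtomicInteger(0);
        AtomicInteger errors = new AtomicInteger(0);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= episodes; r++) {
                    arrivals.incrementAndGet();
                    barrier.await();
                    if (arrivals.get() < r * n) errors.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return errors.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100));
    }
}
```

The counter is reset before go is flipped, so a process released into the next episode always increments a fresh counter.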
ATOMIC COUNTER
Shared
  counter: atomic counter ranges over {0..n}, initially 0
  go[1..n]: array of atomic bits, initial value is immaterial
Local
  local.go: a bit, initial value is immaterial

local.go := 1 - go[i]
counter := counter + 1
if counter = n then
  counter := 0
  for j = 1 to n do go[j] := 1 - go[j] od
else
  await (local.go = go[i])
fi
REMARKS
Memory contention is reduced by letting each process spin only on a locally accessible variable.
There is still memory contention when accessing the counter, which is shared by all processes.
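A hedged Java sketch of this variant, in which process i spins only on its own bit go[i], is shown below. The class name FlagArrayBarrier and the demo method are assumptions, and processes are numbered 0..n-1 rather than 1..n. AtomicIntegerArray does not place each element in its own cache line, so the "locally accessible" property is only approximated here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class FlagArrayBarrier {
    private final int n;
    private final AtomicInteger counter = new AtomicInteger(0);
    private final AtomicIntegerArray go;   // go[i] is read only by process i

    FlagArrayBarrier(int n) {
        this.n = n;
        this.go = new AtomicIntegerArray(n);
    }

    void await(int i) {
        int localGo = 1 - go.get(i);
        if (counter.incrementAndGet() == n) {      // last process to arrive
            counter.set(0);
            for (int j = 0; j < n; j++) {          // flip every process's bit
                go.set(j, 1 - go.get(j));
            }
        } else {
            while (go.get(i) != localGo) Thread.yield();  // spin on own bit
        }
    }

    // Same check as for a single shared bit: after episode r, every thread
    // must see at least r * n arrivals.
    static boolean demo(int n, int episodes) {
        FlagArrayBarrier barrier = new FlagArrayBarrier(n);
        AtomicInteger arrivals = new AtomicInteger(0);
        AtomicInteger errors = new AtomicInteger(0);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= episodes; r++) {
                    arrivals.incrementAndGet();
                    barrier.await(i);
                    if (arrivals.get() < r * n) errors.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return errors.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100));
    }
}
```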
[Figure: a tree built from 2-barriers.]
node := i
level := -1
repeat
  node := ⌈node / degree⌉      /* access barrier number node; each level has 1/degree as many nodes (half for 2-barriers) */
  level := level + 1
  local.go := go[level,node]
  counter[level,node] := counter[level,node] + 1
  if counter[level,node] = degree then
    counter[level,node] := 0
    if (level = log_degree n - 1) then       /* root */
      go[level,node] := 1 - go[level,node]   /* notify at the root */
    fi
  else
    await (local.go ≠ go[level,node])
  fi
until (local.go ≠ go[level,node])
while (level > 0) do
  level := level - 1
  node := ⌈i / degree^(level+1)⌉
  go[level,node] := 1 - go[level,node]
od
/* end barrier */
REMARKS
Small barriers
Cache behavior
Local spinning on bus-based architectures, but not on statically assigned locations.
The number of remote memory references per process is at most O(log n) in the CC model and is unbounded in the DSM model.
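The tree of small barriers can be sketched in Java as below. This is an illustrative rendering under stated assumptions: n = degree^levels processes numbered 0..n-1 (so node indices use integer division instead of a ceiling), the class name TreeBarrier, the path array, and the demo method are all hypothetical, and Thread.yield() stands in for await.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class TreeBarrier {
    private final int degree, levels;
    private final AtomicIntegerArray[] counter;  // counter[level] indexed by node
    private final AtomicIntegerArray[] go;       // go[level] indexed by node

    // Assumes levels >= 1 and exactly degree^levels participating processes.
    TreeBarrier(int degree, int levels) {
        this.degree = degree;
        this.levels = levels;
        counter = new AtomicIntegerArray[levels];
        go = new AtomicIntegerArray[levels];
        int nodes = 1;                            // the root level has one node
        for (int l = levels - 1; l >= 0; l--) {
            counter[l] = new AtomicIntegerArray(nodes);
            go[l] = new AtomicIntegerArray(nodes);
            nodes *= degree;
        }
    }

    void await(int i) {
        int[] path = new int[levels];             // node visited at each level
        int node = i, level = -1, localGo = 0;
        do {                                      // climb while we are last to arrive
            node = node / degree;
            level++;
            path[level] = node;
            localGo = go[level].get(node);
            if (counter[level].incrementAndGet(node) == degree) {
                counter[level].set(node, 0);
                if (level == levels - 1) {        // root: start the release
                    go[level].set(node, 1 - localGo);
                }
            } else {
                while (go[level].get(node) == localGo) Thread.yield();
            }
        } while (go[level].get(node) == localGo);
        while (level > 0) {                       // release the nodes we climbed past
            level--;
            node = path[level];
            go[level].set(node, 1 - go[level].get(node));
        }
    }

    static boolean demo(int degree, int levels, int episodes) {
        int n = 1;
        for (int l = 0; l < levels; l++) n *= degree;
        final int total = n;
        TreeBarrier barrier = new TreeBarrier(degree, levels);
        AtomicInteger arrivals = new AtomicInteger(0);
        AtomicInteger errors = new AtomicInteger(0);
        Thread[] threads = new Thread[total];
        for (int t = 0; t < total; t++) {
            final int i = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= episodes; r++) {
                    arrivals.incrementAndGet();
                    barrier.await(i);
                    if (arrivals.get() < r * total) errors.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return errors.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println(demo(2, 2, 100));      // 4 threads, tree of 2-barriers
    }
}
```

Each process touches at most one counter per level, which is where the O(log n) bound on remote references in the CC model comes from.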
DISSEMINATION BARRIER
n processes, {0..n-1}
At round r
  Thread i notifies thread i + 2^r (mod n)
  Waits for notification from the process j such that j + 2^r (mod n) = i
[Figure: in successive rounds each thread notifies the thread at distance +1, +2, +4.]
DISSEMINATION DETAILS
Avoids interference
Avoid reinitializing fields
Use sense-reversing
Thread Arguments
  Parity of round
  Sense (flips when round becomes 0)
DISSEMINATION BARRIER
Type
  flags = two-dimensional array[0..1, 0..⌈log n⌉-1] of atomic bits
Shared
  allflags: array[0..n-1] of flags, initial values are all 0
Local
  parity, sense: bits, range over {0..1}, initially parity = 0 and sense = 1
  round: register, range over {0,...,⌈log n⌉-1}

for round := 0 to ⌈log n⌉ - 1 do
  /* process i notifies process i + 2^round (mod n) that it has arrived */
  allflags[i + 2^round (mod n)][parity,round] := sense
  /* process i waits for notification from process i - 2^round (mod n) by spinning on its own bit */
  await (allflags[i][parity,round] = sense)
od
/* prepare for the next barrier */
if parity = 1 then sense := 1 - sense fi
parity := 1 - parity

In even barrier episodes process i spins on the bits of allflags[i][0,*]; in odd barrier episodes it spins on the bits of allflags[i][1,*].
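A hedged Java sketch of the dissemination barrier follows. The class name DisseminationBarrier, the flattened flag index parity * rounds + round, and the demo method are assumptions made for illustration. Flags start at 0 and sense starts at 1, so the first two episodes cannot be passed on the initial flag values alone.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class DisseminationBarrier {
    private final int n, rounds;
    // allflags[i] holds process i's bits, indexed by parity * rounds + round.
    private final AtomicIntegerArray[] allflags;
    // Local state of each process; index i is touched only by process i.
    private final int[] parity, sense;

    DisseminationBarrier(int n) {
        this.n = n;
        int r = 0;
        while ((1 << r) < n) r++;                 // r = ceil(log2 n)
        this.rounds = r;
        allflags = new AtomicIntegerArray[n];
        for (int i = 0; i < n; i++) {
            allflags[i] = new AtomicIntegerArray(2 * r);   // all bits start at 0
        }
        parity = new int[n];                      // initially 0
        sense = new int[n];
        Arrays.fill(sense, 1);                    // initially 1
    }

    void await(int i) {
        int p = parity[i], s = sense[i];
        for (int round = 0; round < rounds; round++) {
            int partner = (i + (1 << round)) % n;
            allflags[partner].set(p * rounds + round, s);       // notify partner
            while (allflags[i].get(p * rounds + round) != s) {  // spin on own bit
                Thread.yield();
            }
        }
        if (p == 1) sense[i] = 1 - s;             // prepare for the next barrier
        parity[i] = 1 - p;
    }

    static boolean demo(int n, int episodes) {
        DisseminationBarrier barrier = new DisseminationBarrier(n);
        AtomicInteger arrivals = new AtomicInteger(0);
        AtomicInteger errors = new AtomicInteger(0);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= episodes; r++) {
                    arrivals.incrementAndGet();
                    barrier.await(i);
                    if (arrivals.get() < r * n) errors.incrementAndGet();
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return errors.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println(demo(5, 100));         // n need not be a power of two
    }
}
```

There is no shared counter: each process writes one bit per round and spins only on bits in its own flag set.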
REMARKS
IDEAS
Sense-reversing
Bibliography
[HS08] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008.