Treedoc Largescale Ladis09
Consistency without concurrency control in large, dynamic systems
Marc Shapiro, INRIA & LIP6
Nuno Preguiça, Universidade Nova de Lisboa
Mihai Leția, ENS Lyon
[Figure: operations f and g applied to replicas x1, x2, x3 of object x]
[Figure: binary naming tree with edges labelled 0 and 1; root R, children I and A, then L, N, I, and '; contents = L'INRIA]
Naming tree: minimal, self-adjusting: logarithmic
TID: path = [0|1]*
Contents: infix order
Insert: adds leaf ⇒ non-destructive, TIDs don't change
Delete: tombstone, TIDs don't change
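The naming scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `Treedoc`, its methods, and the flat dictionary representation are assumptions made for brevity, and disambiguators for concurrent inserts are left out. The TIDs follow the L'INRIA example.

```python
class Treedoc:
    """Toy Treedoc: every atom is named by a TID, a path string over {'0', '1'}."""

    def __init__(self):
        self.nodes = {}  # TID -> [atom, is_tombstone]

    def insert(self, tid, atom):
        # An insert only ever adds a new leaf, so existing TIDs stay valid.
        if tid in self.nodes:
            raise ValueError(f"TID {tid!r} is already in use")
        if tid and tid[:-1] not in self.nodes:
            raise ValueError(f"parent of {tid!r} does not exist")
        self.nodes[tid] = [atom, False]

    def delete(self, tid):
        # Keep the node as a tombstone so the paths below it remain valid.
        self.nodes[tid][1] = True

    @staticmethod
    def _infix_key(tid):
        # Left subtree ('0') < the node itself < right subtree ('1').
        return tuple(0 if bit == "0" else 2 for bit in tid) + (1,)

    def contents(self):
        ordered = sorted(self.nodes, key=self._infix_key)
        return "".join(self.nodes[t][0] for t in ordered if not self.nodes[t][1])


doc = Treedoc()
for tid, atom in [("", "R"), ("0", "I"), ("1", "A"), ("01", "N"), ("10", "I")]:
    doc.insert(tid, atom)
doc.insert("00", "L")     # ins(L,00)
doc.insert("001", "'")    # ins(',001)
before = doc.contents()   # "L'INRIA"
doc.delete("1")           # tombstone for "A"; no other TID changes
after = doc.contents()    # "L'INRI"
```

Because both operations leave every existing path untouched, replicas can apply them in any order without coordination.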
Consistency without concurrency control in large, dynamic systems
Wikipedia GWB page: space overhead
[Figure: size (kB) vs. revisions (×10); legend labels: serialised, Treedoc, wikidoc]
Rebalance
[Figure: naming tree before and after rebalancing; contents = L'INR in both]
Invalidates TIDs:
Frame of reference = epoch
Requires agreement
Pervasive!
e.g. Vector Clocks
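A small sketch may help show why rebalancing invalidates TIDs. The functions `rebalance` and `check_epoch` and the exception `StaleEpoch` are hypothetical names, and the midpoint-split strategy is an assumption; the slides do not specify how the balanced tree is built.

```python
def rebalance(live_atoms):
    """Rebuild a minimal binary tree over the live atoms, in infix order.

    Returns {TID: atom}; tombstones are simply not carried over.
    """
    tree = {}

    def build(lo, hi, path):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[path] = live_atoms[mid]
        build(lo, mid, path + "0")
        build(mid + 1, hi, path + "1")

    build(0, len(live_atoms), "")
    return tree


class StaleEpoch(Exception):
    pass


def check_epoch(op_epoch, site_epoch):
    # A TID is only meaningful in the epoch that issued it.
    if op_epoch != site_epoch:
        raise StaleEpoch(f"operation from epoch {op_epoch}, site is in epoch {site_epoch}")


# Epoch 0: the live atoms of "L'INR", named by their pre-rebalance TIDs.
epoch0 = {"00": "L", "001": "'", "0": "I", "01": "N", "": "R"}
# Epoch 1: same contents, new names.
epoch1 = rebalance(list("L'INR"))
```

In this example TID `0` names `I` in epoch 0 but `'` in epoch 1, so an operation must carry its epoch and can only be applied at a site in the same epoch.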
Rebalance in large, dynamic systems
Rebalance requires consensus
Consensus requires small, stable membership
Large communities?!
Dynamic scenarios?!
Solution: two tiers
Core: rebalancing (and updates)
Nebula: updates (and rebalancing)
Migration protocol
Core:
Group membership
Small, stable
Rebalance: unanimous agreement (2-phase commit)
All core sites in same epoch
Nebula:
Arbitrary membership
Large, dynamic
Communicate with sites in same epoch only
Catch-up to rebalance, join core epoch
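The unanimous-agreement step among core sites can be sketched as a two-phase commit. This is a single-process toy under stated assumptions: `CoreSite`, `prepare`, `commit`, and `try_rebalance` are hypothetical names, there is no networking or failure handling, and the `willing` flag stands in for whatever local condition would make a site vote no.

```python
class CoreSite:
    def __init__(self, name, epoch=0, willing=True):
        self.name = name
        self.epoch = epoch
        self.willing = willing  # stand-in for a site's local reason to refuse

    def prepare(self, new_epoch):
        # Phase 1: vote yes only if this site is in the epoch being replaced.
        return self.willing and self.epoch == new_epoch - 1

    def commit(self, new_epoch):
        # Phase 2: the rebalance takes effect and the site enters the new epoch.
        self.epoch = new_epoch


def try_rebalance(core_sites):
    """Move every core site to the next epoch, or none of them."""
    new_epoch = core_sites[0].epoch + 1
    votes = [site.prepare(new_epoch) for site in core_sites]
    if not all(votes):
        return False  # a single "no" aborts; all epochs stay unchanged
    for site in core_sites:
        site.commit(new_epoch)
    return True


core = [CoreSite("a"), CoreSite("b"), CoreSite("c")]
ok = try_rebalance(core)                   # everyone votes yes

core2 = [CoreSite("a"), CoreSite("b", willing=False)]
refused = try_rebalance(core2)             # one site votes no
```

Either every core site advances to the new epoch or none does, which keeps all core sites in the same epoch as the slide requires.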
[Figure: one site applies del(1) and then rebalance, while another site applies ins(L,00) and ins(',001)]
Catch-up protocol
[Figure: tree after catch-up; the inserts appear as ins(L,000) and ins(',0001)]
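One way to picture catch-up is as a translation of a site's buffered operations from the old epoch into the new one. The sketch below is an assumption, not the protocol from the talk: the `old_to_new` mapping is written by hand so that ins(L,00) and ins(',001) come out as ins(L,000) and ins(',0001), and the prefix-substitution rule only covers the simple case where the translated position is still free.

```python
def translate(tid, old_to_new):
    """Rewrite an old-epoch TID by swapping its longest known prefix."""
    for cut in range(len(tid), -1, -1):
        prefix = tid[:cut]
        if prefix in old_to_new:
            return old_to_new[prefix] + tid[cut:]
    raise KeyError(f"no ancestor of {tid!r} survived the rebalance")


def catch_up(buffered_inserts, old_to_new):
    """Replay buffered (atom, old_tid) inserts in the new epoch."""
    taken = set(old_to_new.values())
    replayed = []
    for atom, old_tid in buffered_inserts:
        new_tid = translate(old_tid, old_to_new)
        if new_tid in taken:
            # Outside the scope of this sketch: the slot is already occupied.
            raise ValueError(f"TID {new_tid!r} is already in use")
        taken.add(new_tid)
        replayed.append((atom, new_tid))
    return replayed


# Hypothetical mapping for the nodes that survive del(1) and the rebalance:
# old TID in epoch 0 -> new TID in epoch 1.
old_to_new = {"0": "00", "01": "0", "": "", "10": "1"}
buffered = [("L", "00"), ("'", "001")]
replayed = catch_up(buffered, old_to_new)
```

After the translation the site holds only new-epoch TIDs and can communicate with other sites in that epoch.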
Disambiguator options (200 revisions only)
Wikipedia GWB benchmark
en.wikipedia.org/George_W_Bush
150 kB text
42,000 revisions: most frequently revised
Biggest revision: 100 kB
Benchmark data
Treedoc node = paragraph
First 15,000 revisions = 350,000 updates
Biggest revision < 2 s; average: 0.24 s/revision
Rebalance every 1,000 revisions
256-ary tree
[Figure: tombstones and live nodes vs. operations (×1000)]
Time per operation
[Figure: time per operation (μs) vs. operations (×1000); series: no rebalance, with rebalance; additional labels: bytes [0,255], relative]