Treedoc Largescale Ladis09

A Commutative Replicated Data Type (CRDT) is one where all concurrent operations commute. The replicas of a CRDT converge automatically, without complex concurrency control. This paper describes Treedoc, a novel CRDT design for cooperative text editing. An essential property is that the identifiers of Treedoc atoms are selected from a dense space. We discuss practical alternatives for implementing the identifier space based on an extended binary tree. We also discuss storage alternatives for dat…


Consistency without
concurrency control
in large, dynamic systems

Marc Shapiro, INRIA & LIP6
Nuno Preguiça, Universidade Nova de Lisboa
Mihai Leția, ENS Lyon
Consistency without
concurrency control

[Figure: replicas x1, x2, x3 of object x, each applying operations f and g in a different order]

Object x, operation f(x)
Propose f(x1)
Eventually replay f(x2), f(x3), …
If f || g commute: converges safely without concurrency control
Commutative Replicated Data Type (CRDT): designed for commutativity
A sequence CRDT

Treedoc = sequence of elements: insert-at-pos, delete
Commutative when concurrent
Minimise overhead
Scalable
"A commutative replicated data type for cooperative editing", ICDCS 2009
Focus today: garbage collection vs. scale
Commutative updates

[Figure: naming tree with root R; children I (path 0) and A (path 1); leaves L (00), N (01), I (10); infix contents = "L'INRIA" after inserting ' at 001]

Naming tree: minimal, self-adjusting, logarithmic depth
TID: path = [0|1]*
Contents: infix order
Insert adds a leaf ⇒ non-destructive, TIDs don't change
Delete: tombstone, TIDs don't change
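The naming scheme above can be sketched in code. This is an illustrative model, not the paper's implementation: the class, method names and the example paths are assumptions for exposition.

```python
# Treedoc-style binary naming tree (illustrative sketch).
# A TID is the path from the root, a string over {0, 1};
# the document is the infix (in-order) traversal of non-tombstone atoms.
class Node:
    def __init__(self, atom):
        self.atom = atom
        self.left = None       # subtree of TIDs continuing with '0'
        self.right = None      # subtree of TIDs continuing with '1'
        self.tombstone = False

class Treedoc:
    def __init__(self):
        self.root = None

    def insert(self, tid, atom):
        """Insert a leaf at path `tid`; existing TIDs never change."""
        if self.root is None and tid == "":
            self.root = Node(atom)
            return
        node = self.root
        for bit in tid[:-1]:
            node = node.left if bit == "0" else node.right
        if tid[-1] == "0":
            node.left = Node(atom)
        else:
            node.right = Node(atom)

    def delete(self, tid):
        """Mark the atom as a tombstone; the TID stays valid."""
        node = self.root
        for bit in tid:
            node = node.left if bit == "0" else node.right
        node.tombstone = True

    def contents(self):
        out = []
        def infix(n):
            if n is None:
                return
            infix(n.left)
            if not n.tombstone:
                out.append(n.atom)
            infix(n.right)
        infix(self.root)
        return "".join(out)

# Rebuild the slide's example tree: R at the root, I/A below, then L/N/I.
d = Treedoc()
d.insert("", "R")
d.insert("0", "I")
d.insert("1", "A")
d.insert("00", "L")
d.insert("01", "N")
d.insert("10", "I")
print(d.contents())  # LINRIA
```

Inserting ' at path 001 then yields "L'INRIA" without touching any existing TID, which is exactly why concurrent inserts and deletes commute.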
Wikipedia GWB page: space overhead

[Figure: serialised size (kB) vs. revisions (×10), comparing Treedoc with the raw wiki document]
Rebalance

[Figure: the naming tree before and after rebalancing; the infix contents "L'INRI" are unchanged]
Rebalance

[Figure: rebalancing restructures the tree; the contents "L'INRI" are preserved but every path changes]

Invalidates TIDs:
Frame of reference = epoch
Requires agreement
Pervasive! e.g. vector clocks
Rebalance in large, dynamic systems

Rebalance requires consensus
Consensus requires small, stable membership
Large communities?! Dynamic scenarios?!
Solution: two tiers:
Core: rebalancing (and updates)
Nebula: updates (and rebalancing)
Migration protocol

Core:
Group membership: small, stable
Rebalance: unanimous agreement (2-phase commit)
All core sites in the same epoch

Nebula:
Arbitrary membership: large, dynamic
Communicate with sites in the same epoch only
Catch-up to rebalance, join the core epoch


Catch-up protocol summary

Core sends its old updates; the nebula replays the core's updates.
Core sends the rebalance; the nebula replays the rebalance ignoring its own updates, transforms its updates, and sends the transformed updates.
The core replays the nebula's updates.
Catch-up protocol

[Figure: the core site has executed del(1); the nebula site has executed ins(L,00) and ins(',001); each side's updates are still to be exchanged]
Catch-up protocol

[Figure: the core site rebalances; its log now holds del(1) and rebalance, while the nebula still holds ins(L,00) and ins(',001)]
Catch-up protocol

[Figure: after the rebalance, the core's tree is restructured; the nebula's tree is still in the old epoch]
Catch-up protocol

[Figure: the core sends del(1) and the rebalance to the nebula site, which replays them]
Catch-up protocol

[Figure: after replaying the rebalance, the nebula transforms its updates into ins(L,000) and ins(',0001) and sends them to the core; both sites converge]
Summary

CRDT:
Convergence ensured
Design for commutativity
GC cannot be ignored:
Requires commitment
Pervasive issue
Large-scale commitment: Core / Nebula
To synchronise: catch-up + migration
Future work

More CRDTs
Understanding CRDTs: what invariants can be CRDTized
Approximations of CRDTs
Data types for consistent cloud computing without concurrency control

Replicated sequence

[Figure: two replicas of "INRIA" exchange edits.
Replica 1: insert(0, "L") → "LINRIA"; insert(1, "'") → "L'INRIA"; then the remote delete(5) → "L'INIA".
Replica 2: delete(5) → "INRI"; then the remote insert(0, "L") and insert(1, "'") → "L'INRI".
The replicas diverge: positional operations do not commute.]
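The divergence in the figure can be replayed directly. This sketch uses 0-based positions (an assumption; the slide's numbering is not explicit), with the delete targeting the same character in both replicas' initial view:

```python
# Naive positional insert/delete on a replicated string: the same set of
# edits applied in different orders yields different final states.
def ins(s, pos, ch):
    return s[:pos] + ch + s[pos:]

def dele(s, pos):
    return s[:pos] + s[pos + 1:]

base = "INRIA"
# Replica 1: local inserts first, then the remote delete.
r1 = dele(ins(ins(base, 0, "L"), 1, "'"), 4)
# Replica 2: local delete first, then the remote inserts.
r2 = ins(ins(dele(base, 4), 0, "L"), 1, "'")
print(r1, r2)   # L'INIA L'INRI — the replicas disagree
assert r1 != r2
```

The delete's position means different characters depending on what was applied before it, which is exactly the problem Treedoc's stable TIDs avoid.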



State of the art

Serialising updates:
Single, global execution order
Lock-step: poor user experience
Doesn't scale
Operational Transformation:
Local execution orders
Modify arguments to take into account concurrent operations scheduled before
Weak theory, hidden complexity
Insight: design the data type to be commutative



Commutative Replicated Data Type (CRDT)

Assuming:
All concurrent operations commute
Non-concurrent operations execute in happens-before order
All operations eventually execute at every replica
Then replicas eventually converge to the correct value
Design data types to support commutative operations
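A minimal data type satisfying these assumptions is a grow-only set (chosen here purely as an illustration; it is not the talk's sequence type):

```python
# Grow-only set: "add" is commutative and idempotent, so replicas that
# receive the same operations in any order converge without coordination.
class GSet:
    def __init__(self):
        self.items = set()

    def add(self, x):   # the only update operation
        self.items.add(x)

a, b = GSet(), GSet()
ops = ["x", "y", "z"]
for op in ops:
    a.add(op)            # replica A applies x, y, z
for op in reversed(ops):
    b.add(op)            # replica B applies z, y, x
assert a.items == b.items   # converged, no concurrency control
```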
Concurrent inserts

Exceptions to the binary tree: disambiguators
Concurrent inserts ordered by disambiguator
Path = < [0|1], disambiguator? >*
Disambiguator alternatives:
Site identifier of the initiator: short, but delete leaves a tombstone
Unique ID of the operation: long, but allows immediate delete
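A sketch of how disambiguators break ties (an assumed, simplified encoding; the paper's mini-node layout is richer): a TID is a list of (bit, disambiguator) pairs, here with site IDs as disambiguators.

```python
# Two sites concurrently insert at the same tree position. Their TIDs share
# the path bits but differ in the disambiguator, so the atoms get distinct
# identifiers and a deterministic order (lexicographic comparison).
parent = [(0, "s1")]
tid_a = parent + [(1, "siteA")]   # site A's concurrent insert
tid_b = parent + [(1, "siteB")]   # site B's concurrent insert

assert tid_a != tid_b                             # distinct TIDs, same position
assert sorted([tid_b, tid_a]) == [tid_a, tid_b]   # site ID breaks the tie
```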



Causal ordering

Vector clocks:
Number of messages received from each site
Causal ordering
Filter duplicate messages
Efficient, but grow indefinitely
Treedoc:
TID encodes causal order
Duplicates are idempotent
Approximate vector clocks + Treedoc
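For reference, a minimal vector-clock sketch (standard technique, not specific to this talk): one counter per site, merged by pointwise maximum, with causal precedence as pointwise comparison.

```python
# Vector clocks as site -> counter maps.
def vc_merge(a, b):
    """Pointwise max: the clock after receiving a message."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}

def vc_leq(a, b):
    """True iff a happened-before-or-equals b (pointwise <=)."""
    return all(v <= b.get(s, 0) for s, v in a.items())

v1 = {"A": 2, "B": 1}
v2 = {"A": 2, "B": 3}
assert vc_leq(v1, v2) and not vc_leq(v2, v1)   # v1 causally precedes v2
assert vc_merge(v1, v2) == {"A": 2, "B": 3}
```

The "grow indefinitely" point is visible here: the map gains an entry per site that ever participates, which is what motivates approximating them when combined with Treedoc.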
Rebalance requires commitment

Commutativity of update || rebalance: updates win
Rebalance does not impact update performance
Rebalance: unanimous agreement
Standard 2- or 3-phase commit
Initiator is the coordinator
Other sites: if there is a concurrent update, vote "Not OK"
Off the critical path!


Experiments

Estimate overheads, compare design alternatives:
Atom granularity: word vs. line
Disambiguator: site-ID + tombstone vs. unique ID + immediate delete
Are trees unbalanced?
Effect of rebalance, heuristics
Replay the update history of CVS, SVN and Wiki repositories



Implementation alternatives

Disambiguator: next slide
Atom: character, word, line, paragraph
Fine grain: structural overhead
Coarse grain: "delete" artefacts
Data structure:
Tree: no TID storage, no GC of interior nodes
vs. { (TID, atom) } set: flexible, GC
Mixed
Arity: binary vs. 256-ary



Disambiguator design

Disambiguator options:
1 byte, no GC until rebalance
or 10 bytes, immediate GC (if leaf)
Stored in every node

Intuitively, which do you think is best? (I thought n° 1… but I was wrong)


LaTeX files

LaTeX / line

Summary: % deleted nodes (mean)

Atom granularity and deletes

granularity | % deleted | longest TID | avg TID
character   | 73.61     | 53          | 27.907
word        | 69.27     | 16          | 6.877
paragraph   | 81.90     | 11          | 4.125

(200 revisions only)
Wikipedia GWB benchmark

en.wikipedia.org/George_W_Bush
150 kB text
42,000 revisions: most frequently revised
Biggest revision: 100 kB
Benchmark data:
Treedoc node = paragraph
First 15,000 revisions = 350,000 updates
Biggest revision < 2 s; average: 0.24 s/revision
Rebalance every 1,000 revisions
256-ary tree

Wikipedia GWB page: nodes

[Figure: node counts (tombstones vs. live) as a function of operations (×1000)]
Time per operation

[Figure: time per operation (μs), with and without rebalance, vs. operations (×1000)]

Size, frequency of TIDs

[Figure: distribution of TID size in bytes, and relative frequency of values in [0,255]]
flat vs. WOOT vs. Treedoc

Summary: garbage collection

Efficiency and GC are important
Tree + rebalance
Requires commitment (move off the critical path)
Pervasive issue
Large-scale commitment:
Core: commitment
Nebula: asynchronous updates
Occasional synchronisation: migration

Summary: CRDT

CRDT:
Convergence ensured
Design for commutativity
Techniques for commutativity:
Partial order
Non-destructive updates
Identifiers don't depend on concurrent activity
Consensus: off the critical path

Commutativity:
genuine vs. precedence

Commutativity: ∀S, u, v : S.u.v ≡ S.v.u
Genuine: both u and v take effect
"Addition and subtraction commute"
Non-destructive updates
Precedence: only one of u, v ultimately takes effect
"File writes commute under Last-Writer-Wins"
Destructive updates
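The precedence case can be made concrete with a Last-Writer-Wins register (an illustrative sketch matching the slide's file-write remark; timestamps are assumed totally ordered):

```python
# LWW register: state is (value, timestamp). A write only takes effect if
# its timestamp is higher, so concurrent writes commute: in either order,
# the write with the highest timestamp ultimately wins.
def lww_write(state, value, ts):
    cur_value, cur_ts = state
    return (value, ts) if ts > cur_ts else state

s0 = ("", 0)
u = ("hello", 5)   # write u at timestamp 5
v = ("world", 7)   # concurrent write v at timestamp 7

s_uv = lww_write(lww_write(s0, *u), *v)   # apply u then v
s_vu = lww_write(lww_write(s0, *v), *u)   # apply v then u
assert s_uv == s_vu == ("world", 7)       # only v takes effect: precedence
```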

Future work

Integrate with a real editor
(Single-document) transactions
Generalisation:
Characterise invariants
More data types: set, multilog, others?
Mixed data types: improve; cf. persistent data structures
When consensus is required:
Move off the critical path
Speculation + conflict resolution