
Memory Reclamation for Recoverable Mutual Exclusion

SAHIL DHOKED, The University of Texas at Dallas, USA


NEERAJ MITTAL, The University of Texas at Dallas, USA

arXiv:2103.01538v1 [cs.DC] 2 Mar 2021

Mutual exclusion (ME) is a commonly used technique to handle conflicts in concurrent systems. With recent advancements in non-volatile memory technology, there is an increased focus on the problem of recoverable mutual exclusion (RME), a special case of ME where processes can fail and recover. However, for the RME problem to be of practical and not merely theoretical interest, memory reclamation must be addressed: it poses a major obstacle in several RME algorithms. Often RME algorithms need to allocate
memory dynamically, which increases the memory footprint of the algorithm over time. These algorithms are typically not equipped
with suitable garbage collection due to concurrency and failures.
In this work, we present the first “general” recoverable algorithm for memory reclamation in the context of recoverable mutual
exclusion. Our algorithm can be plugged into any RME algorithm very easily and preserves all correctness properties and most desirable properties of the algorithm. The space overhead of our algorithm is O(𝑛² · sizeof(node)), where 𝑛 is the total number of processes in the system. In terms of remote memory references (RMRs), our algorithm is RMR-optimal, i.e., it has a constant RMR overhead per passage. Our RMR and space complexities are applicable to both CC and DSM memory models.

1 INTRODUCTION
Mutual exclusion (ME) is a commonly used technique to handle conflicts in concurrent systems. The problem of mutual
exclusion was first defined by Dijkstra [8] more than half a century ago. Mutual exclusion algorithms, commonly
known as locks, are used by processes to execute a part of code, called critical section (CS) in isolation without
any interference from other processes. The CS typically consists of code that involves access to shared resources,
which when accessed concurrently could potentially cause undesirable race conditions. The mutual exclusion problem
involves designing algorithms to ensure processes enter the CS one at a time.
Generally, algorithms for mutual exclusion are designed with the assumption that failures do not occur, especially
while a process is accessing a lock or a shared resource. However, such failures can occur in the real world. A power
outage or network failure might create an unrecoverable situation causing processes to be unable to continue. If such
failures occur, traditional mutual exclusion algorithms, which are not designed to operate properly under failures, may
deadlock or otherwise fail to guarantee important safety and liveness properties. In many cases, such failures may
have disastrous consequences. This gave rise to the problem of recoverable mutual exclusion (RME). The RME problem
involves designing an algorithm that ensures mutual exclusion under the assumption that process failures may occur
at any point during their execution, but the system is able to recover from such failures and proceed without any
adverse consequences.
Traditionally, concurrent algorithms use checkpointing and logging to tolerate failures by regularly saving relevant
portion of application state to a persistent storage such as hard disk drive (HDD). Accessing a disk is orders of
magnitude slower than accessing main memory. As a result, checkpointing and logging algorithms are often designed
to minimize disk accesses. Non-volatile random-access memory (NVRAM) is a new class of memory technologies that
combines the low latency and high bandwidth of traditional random access memory with the density, non-volatility,
and economic characteristic of traditional storage media (e.g., HDD). Existing checkpointing and logging algorithms
can be modified to use NVRAMs instead of disks to yield better performance, but, in doing so, we would not be
Authors’ addresses: Sahil Dhoked, The University of Texas at Dallas, TX, 75080, USA, sahil.dhoked@utdallas.edu; Neeraj Mittal, The University of Texas
at Dallas, TX, 75080, USA, neerajm@utdallas.edu.

leveraging the true power of NVRAMs [13, 21]. NVRAMs can be used to directly store implementation specific variables
and, as such, have the potential for providing near-instantaneous recovery from failures.
By directly storing implementation variables on NVRAMs, most of the application data can be easily recovered
after failures. However, recovery of implementation variables alone is not enough. Processor state information such
as contents of program counter, CPU registers and execution stack cannot be recovered completely and need to be
handled separately. Due to this reason, there is a renewed interest in developing fast and dependable algorithms for
solving many important computing problems in software systems vulnerable to process failures using NVRAMs. Using
innovative methods, with NVRAMs in mind, we aim to design efficient and robust fault-tolerant algorithms for solving
mutual exclusion and other important concurrent problems.
The RME problem in the current form was formally defined a few years ago by Golab and Ramaraju in [12]. Several
algorithms have been proposed to solve this problem [7, 10, 13, 16, 17]. However, for the RME problem to be of practical and not merely theoretical interest, memory reclamation must be addressed: it poses a major obstacle in several RME algorithms. Often, RME algorithms allocate memory dynamically, which increases the memory footprint
of the algorithm over time. These algorithms are typically not equipped with suitable garbage collection to avoid errors
that may arise from concurrency and potential failures.
Memory reclamation, in single-process systems without failures, follows a straightforward pattern. The process allocates a “node” dynamically, consumes it, and frees it once it has no more need of the node. Freed nodes may later
be reused (as part of a different allocation) or returned to the operating system. However, due to some programmer
error, if a node that is freed is later accessed by the process in the context of the previous allocation, it may cause
some serious damage to the program and the operating system as well. In the context of multi-process systems, when
a process frees a node, we may face the same issue without any programmer error. Even if the process that frees the
node is able to guarantee that it will not access that node again, there may exist another process that is just about to
access or dereference the node in the context of the old allocation.
In order to avoid the aforementioned error, freeing a node is broken down into two tasks. First, a process retires the
node, after which, any process that did not have access to the node may no longer be able to get access to the node.
Second, the node needs to be reclaimed once it is deemed to be “safe”, i.e., no process can obtain any further access
to the node in the context of the previous allocation. A memory reclamation service is responsible for providing safe reclamation of a node once it is retired. The responsibility of retiring the node, on the other hand, typically rests with the programmer who uses the memory reclamation service.
Prior works on memory reclamation [3, 5, 9, 18, 20, 23] provide safe memory reclamation in the absence of failures,
but are not trivially suited to account for failures and subsequent recovery using persistent memory. Moreover, most
works focus on providing memory reclamation in the context of lock-free data structures.
In this work, we present the first “general” recoverable algorithm (that we know of) for memory reclamation in
the context of recoverable mutual exclusion. Our algorithm is general enough that it can be plugged into any RME
algorithm very easily, while preserving all correctness properties and most desirable properties of the algorithm. On
the other hand, it is specific enough to take advantage of assumptions made by RME algorithms. In particular, our
algorithm may be blocking, but this is acceptable in the context of RME given the inherently blocking nature of the RME problem itself.
Our approach derives from prior works of EBR [9] (epoch based reclamation) and QSBR [18] (quiescent state based
reclamation). However, unlike EBR and QSBR, where the memory consumption may grow unboundedly due to a
slow process, our algorithm guarantees bounded memory consumption. The space overhead of our algorithm is O(𝑛² · sizeof(node)), where 𝑛 is the total number of processes in the system, and a “node” is a collection of all the resources used in one passage of the CS.
One of the most important measures of performance of an RME algorithm is the maximum number of remote
memory references (RMRs) made by a process per critical section request in order to acquire and release the lock as
well as recover the lock after a failure. Whether or not a memory reference incurs an RMR depends on the underlying
memory model. The two most common memory models used to analyze the performance of an RME algorithm are
cache-coherent (CC) and distributed shared memory (DSM) models. In terms of remote memory references (RMRs), our algorithm is RMR-optimal, i.e., it has a constant RMR overhead per passage for both the CC and DSM memory models. Moreover, this algorithm uses only read, write and comparison-based primitives.
The main idea behind our approach is to (1) maintain two pools of “nodes”, clean (reclaimed) and dirty (retired); (2) consume the clean pool while waiting for the dirty nodes to become clean; and (3) swap the dirty and clean pools. Our algorithm
operates in tandem with any RME algorithm via two methods/APIs that can be invoked by the programmer to allocate
new nodes and retire old nodes.
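As an illustration only, the two-pool idea can be sketched as follows. This is a toy, not the paper's algorithm: the class name, the pool bookkeeping, and the placeholder `_wait_for_grace_period` are all hypothetical, and real safety requires actually waiting out a grace period as discussed in section 4.

```python
# Hypothetical sketch of the two-pool idea: consume nodes from a clean
# pool; retired nodes go to a dirty pool; when the clean pool runs out,
# wait for the dirty nodes to become safe (abstracted away here) and
# swap the two pools. Pre-allocating a bounded set of nodes mirrors the
# bounded space consumption claimed in the text.
class TwoPoolAllocator:
    def __init__(self, num_nodes):
        self.clean = [{"id": k} for k in range(num_nodes)]
        self.dirty = []
        self.current = None

    def new_node(self):
        # Idempotent: keeps returning the same node until it is retired.
        if self.current is None:
            if not self.clean:
                self._wait_for_grace_period()  # placeholder for the real wait
                self.clean, self.dirty = self.dirty, self.clean  # swap pools
            self.current = self.clean.pop()
        return self.current

    def retire(self, node):
        # Idempotent: retiring the current node twice has no extra effect.
        if self.current is node:
            self.dirty.append(node)
            self.current = None

    def _wait_for_grace_period(self):
        # In the real algorithm this waits until every other process has
        # passed through a quiescent state (e.g. its NCS segment).
        pass

alloc = TwoPoolAllocator(num_nodes=2)
a = alloc.new_node()
assert alloc.new_node() is a   # idempotent allocation
alloc.retire(a)
b = alloc.new_node()
assert b is not a
```

Note that a single process, as in the paper's setting, holds at most one allocated node at a time, which is what makes the idempotent `new_node`/`retire` pair sufficient.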

Roadmap: The rest of the text is organized as follows. We describe our system model and formally define the RME
and the memory reclamation problem in section 2. We define a new object, called the Broadcast object and its properties
in section 3. We also present an RMR-optimal solution to the Broadcast object for both the CC and DSM model in
section 3. In section 4, we present an algorithm that provides memory reclamation for RME algorithms. This algorithm
is RMR-optimal, but not lock-free. In section 5, we describe how our memory reclamation algorithm can be incorporated into existing RME algorithms. A detailed description of the related work is given in section 6. Finally, in section 7, we
present our conclusions and outline directions for future research.

2 SYSTEM MODEL AND PROBLEM FORMULATION


We assume that RME algorithms follow the same model and formulation as used by Golab and Ramaraju [13].

2.1 System model


We consider an asynchronous system of 𝑛 processes 𝑝₁, 𝑝₂, . . . , 𝑝𝑛. Processes can only communicate by performing
read, write and read-modify-write (RMW) instructions on shared variables. Besides shared memory, each process also
has its private local memory that stores variables only accessible to that process (e.g., program counter, CPU registers,
execution stack, etc.). Processes are not assumed to be reliable and may fail.
A system execution is modeled as a sequence of process steps. In each step, some process either performs some local
computation affecting only its private variables or executes one of the available instructions (read, write or RMW) on
a shared variable or fails. Processes may run at arbitrary speeds and their steps may interleave arbitrarily. In any
execution, between two successive steps of a process, other processes can perform an unbounded but finite number of
steps.
To access the critical section, processes synchronize using a recoverable lock that provides mutual exclusion (ME)
despite failures.

Algorithm 1: Process execution model


1 while true do
2 Non-Critical Section (NCS)
3 Recover
4 Enter
5 Critical Section (CS)
6 Exit
7 end while

2.2 Failure model


We assume the crash-recover failure model. A process may fail at any time during its execution by crashing. A crashed
process recovers eventually and restarts its execution from the beginning. A crashed process does not perform any
steps until it has restarted. A process may fail multiple times, and multiple processes may fail concurrently.
On crashing, a process loses the contents of all volatile private variables, including but not limited to the contents of
its program counter, CPU registers and execution stack. However, the contents of the shared variables and non-volatile
private variables remain unaffected and are assumed to persist despite any number of failures. When a crashed process
restarts, all its volatile private variables are reset to their initial values.
Processes that have crashed are difficult to distinguish from processes that are running arbitrarily slow. However,
we assume that every process is live in the sense that a process that has not crashed eventually executes its next step
and a process that has crashed eventually recovers. In this work, we consider a failure to be associated with a single
process.
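The split between volatile and persistent state in the crash-recover model can be encoded directly. The sketch below is an illustrative toy (the class and field names are hypothetical, not from the paper): a crash wipes volatile private state, while shared and non-volatile private variables persist, and the process restarts from its initial program point.

```python
# Toy encoding of the crash-recover failure model described above.
class ModelProcess:
    def __init__(self):
        # Non-volatile private state: survives crashes (lives in NVRAM).
        self.nonvolatile = {"checkpoint": None}
        # Shared-memory writes also persist across crashes.
        self.shared_writes = []
        self._restart()

    def _restart(self):
        # Volatile private state (program counter, registers, execution
        # stack) is reset to its initial values on every restart.
        self.pc = "NCS"
        self.registers = {}

    def crash(self):
        # A crashed process takes no steps until it restarts; on restart
        # it resumes execution from the beginning.
        self._restart()

p = ModelProcess()
p.pc = "CS"
p.nonvolatile["checkpoint"] = "in-cs"
p.crash()
assert p.pc == "NCS"                           # volatile state is reset
assert p.nonvolatile["checkpoint"] == "in-cs"  # persistent state survives
```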

2.3 Process execution model


The process execution for RME algorithms is modeled using two types of computations, namely non-critical section
and critical section. A critical section refers to the part of the application program in which a process needs to access
shared resources in isolation. A non-critical section refers to the remainder of the application program.
The execution model of a process with respect to a lock is depicted in Algorithm 1. As shown, a process repeatedly
executes the following five segments in order: NCS, Recover, Enter, CS and Exit. The first segment, referred to as
NCS, models the steps executed by a process in which it only accesses variables outside the lock. The second segment,
referred to as Recover, models the steps executed by a process to perform any cleanup required due to past failures
and restore the internal structure of the lock to a consistent state. The third segment, referred to as Enter, models the
steps executed by a process to acquire the lock so that it can execute its critical section in isolation. The fourth segment,
referred to as CS, models the steps executed by a process in the critical section where it accesses shared resources in
isolation. Finally, the fifth segment, referred to as Exit, models the steps executed by a process to release the lock it
acquired earlier in Enter segment.
It is assumed that in the NCS segment, a process does not access any part of the lock or execute any computation
that could potentially cause a race condition. Moreover, in Recover, Enter and Exit segments, processes access shared
variables pertaining to the lock (and the lock only). A process may crash at any point during its execution, including
while executing NCS, Recover, Enter, CS or Exit segment.

Definition 2.1 (passage). A passage of a process is defined as the sequence of steps executed by the process from
when it begins executing Recover segment to either when it finishes executing the corresponding Exit segment or
experiences a failure, whichever occurs first.

Definition 2.2 (super-passage). A super-passage of a process is a maximal non-empty sequence of consecutive passages
executed by the process, where only the last passage of the process in the sequence is failure-free.

2.4 RME problem definition


A history is a collection of steps taken by processes. A process 𝑝 is said to be active in a history 𝐻 if 𝐻 contains
at least one step by 𝑝. We assume that every critical section is finite. A history 𝐻 is said to be fair if (a) it is finite, or (b) it is infinite and every active process in 𝐻 either executes infinitely many steps or stops taking steps after a failure-free passage. Designing a recoverable mutual exclusion (RME) algorithm involves designing the Recover, Enter
and Exit segments such that the following correctness properties are satisfied.
Mutual Exclusion (ME) For any finite history 𝐻 , at most one process is in its CS at the end of 𝐻 .
Starvation Freedom (SF) Let 𝐻 be an infinite fair history in which every process crashes only a finite number of times in each super-passage. Then, if a process 𝑝 leaves the NCS segment in some step of 𝐻, 𝑝 eventually enters its CS segment.
Bounded Critical Section Reentry (BCSR) For any history 𝐻 , if a process 𝑝 crashes inside its CS segment, then,
until 𝑝 has reentered its CS segment at least once, any subsequent execution of Enter segment by 𝑝 either
completes within a bounded number of 𝑝’s own steps or ends with 𝑝 crashing.
Note that mutual exclusion is a safety property, and starvation freedom is a liveness property. The bounded critical
section reentry is a safety as well as a liveness property. If a process fails inside its CS, then a shared object or resource
(e.g., a shared data structure) may be left in an inconsistent state. The bounded critical section reentry property allows
such a process to “fix” the shared resource before any other process can enter its CS (e.g., [10, 13, 16]). This property assumes that the CS is idempotent, i.e., the CS is designed so that, in a super-passage, multiple executions of the CS are equivalent to one execution of the CS.
Our correctness properties are the same as those used in [10, 13, 16]. We have stated them here for the sake of
completeness. In addition to the correctness properties, it is also desirable for an RME algorithm to satisfy the following
additional properties.
Bounded Exit (BE) For any infinite history 𝐻 , any execution of the Exit segment by any process 𝑝 either completes
in a bounded number of 𝑝’s own steps or ends with 𝑝 crashing.
Bounded Recovery (BR) For any infinite history 𝐻 , any execution of Recover segment by process 𝑝 either completes
in a bounded number of 𝑝’s own steps or ends with 𝑝 crashing.

2.5 Memory Reclamation problem definition


We only consider those RME algorithms that need to allocate nodes dynamically on the heap. We assume that the
underlying RME algorithm needs to use a new node per request. A node is a collection of resources required by the
underlying RME algorithm per request.
The general memory reclamation problem involves designing two methods, (1) new_node( ), and (2) retire(node).
These methods are used to allocate and deallocate nodes dynamically. The retire(node) method assumes a node is
retired only when there are no more references to it in shared memory, and no new shared references will be created.

[Figure: Free → Allocated → Retired → Reclaimed → (reused) Allocated → …]

Fig. 1. The lifecycle of a node

The responsibility of a memory reclamation service is to provide safe reclamation (defined later) of a node once it is
retired. The responsibility of retiring the node, on the other hand, typically rests with the programmer who uses the memory reclamation service.
In our work, we assume that nodes are reused (instead of freed), once they are reclaimed. As a result, the lifecycle of
a node follows four (logical) stages: (1) Free (2) Allocated (3) Retired (4) Reclaimed. The lifecycle of a node follows
a pattern as shown in Figure 1. Initially, a node is assumed to be in the Free stage. Once it is assigned by the new_node( )
method, it is in the Allocated stage. After getting retired, it is in the Retired stage, and finally, it is moved to the
Reclaimed stage by the memory reclamation algorithm. Once a node is reclaimed, it can be reused and will move to
the Allocated stage, and so on.
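The four-stage lifecycle can be written out as a small state machine. The sketch below is a direct encoding of Figure 1; the transition table and class are illustrative, not part of the paper's algorithm.

```python
# Direct encoding of the node lifecycle from Figure 1:
# Free -> Allocated -> Retired -> Reclaimed -> Allocated -> ...
ALLOWED = {
    "Free": {"Allocated"},
    "Allocated": {"Retired"},
    "Retired": {"Reclaimed"},
    "Reclaimed": {"Allocated"},  # reclaimed nodes are reused, not freed
}

class LifecycleNode:
    def __init__(self):
        self.stage = "Free"

    def move_to(self, stage):
        if stage not in ALLOWED[self.stage]:
            raise ValueError(f"illegal transition {self.stage} -> {stage}")
        self.stage = stage

n = LifecycleNode()
for s in ["Allocated", "Retired", "Reclaimed", "Allocated"]:
    n.move_to(s)
assert n.stage == "Allocated"
```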
Designing a memory reclamation scheme for recoverable mutual exclusion (RME) algorithms involves designing
the new_node( ) and retire(node) methods such that the following correctness properties are satisfied.

Safe reclamation For any history 𝐻 , if process 𝑝𝑖 accesses a node 𝑥, then either 𝑥 is local to 𝑝𝑖 , or 𝑥 is in Allocated
or Retired stages.

Note that any RME algorithm only requires a single node at any given point in time. Thus, we would want multiple
executions of the new_node( ) method to return the same node until the node is retired. Similarly, we want to allow
the same node to be retired multiple times until a new node is requested.

Idempotent allocation Given any history 𝐻, process 𝑝𝑖 and a pair of operations, 𝑜𝑝₁ and 𝑜𝑝₂, of the new_node( ) method invoked by 𝑝𝑖, if there does not exist an invocation of retire(node) by 𝑝𝑖 between 𝑜𝑝₁ and 𝑜𝑝₂, then either both these operations returned the same node in 𝐻, or at least one of these operations ended with a crash.
Idempotent retirement Given any history 𝐻, process 𝑝𝑖 and a pair of operations, 𝑜𝑝₁ and 𝑜𝑝₂, of the retire(node) method invoked by 𝑝𝑖, if there does not exist an invocation of new_node( ) by 𝑝𝑖 between 𝑜𝑝₁ and 𝑜𝑝₂, then either history 𝐻′ = 𝐻 − {𝑜𝑝₁} or 𝐻″ = 𝐻 − {𝑜𝑝₂} or both are equivalent to 𝐻.

In case of failures, it is the responsibility of the underlying algorithm to detect if the failure occurred while executing
any method of the memory reclamation code and if so, re-execute the same method.

2.6 Performance measures


We measure the performance of RME algorithms in terms of the number of remote memory references (RMRs) incurred
by the algorithm during a single passage. Similarly, the performance of a memory reclamation algorithm for RME is
measured in terms of RMR overhead per passage. The definition of a remote memory reference depends on the memory
model implemented by the underlying hardware architecture. In particular, we consider the two most popular shared
memory models:

Cache Coherent (CC) The CC model assumes a centralized main memory. Each process has access to the central
shared memory in addition to its local cache memory. The shared variables, when needed, are cached in the
local memory. These variables may be invalidated if updated by another process. Reading from an invalidated
variable causes a cache miss and requires the variable's value to be fetched from the main memory. Similarly, writes to shared variables are performed on the main memory. Under this model, a remote memory reference occurs each time there is a fetch operation from, or a write operation to, the main memory.
Distributed Shared Memory (DSM) The DSM model has no centralized memory. Shared variables reside on individual
process nodes. These variables may be accessed by processes either via the interconnect or a local memory read,
depending on where the variable resides. Under this model, a remote memory reference occurs when a process
needs to perform any operation on a variable that does not reside in its own node’s memory.

2.7 Synchronization primitives


We assume that, in addition to read and write instructions, the system also supports the compare-and-swap (CAS) read-modify-write (RMW) instruction.
A compare-and-swap instruction takes three arguments: 𝑎𝑑𝑑𝑟𝑒𝑠𝑠, 𝑜𝑙𝑑 and 𝑛𝑒𝑤; it compares the contents of a memory
location (𝑎𝑑𝑑𝑟𝑒𝑠𝑠) to a given value (𝑜𝑙𝑑) and, only if they are the same, modifies the contents of that location to a given
new value (𝑛𝑒𝑤). It returns true if the contents of the location were modified and false otherwise.
This instruction is commonly available in many modern processors such as Intel 64 [15] and AMD64 [1].
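The CAS semantics just described can be written out directly. The sketch below is illustrative: real hardware CAS is a single atomic instruction, simulated here with a lock; the class and method names are hypothetical.

```python
import threading

class AtomicCell:
    """Simulates one memory location supporting compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def cas(self, old, new):
        # Compare the contents of the location to `old`; only if they
        # are the same, modify the contents to `new`. Return True iff
        # the location was modified, False otherwise.
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

cell = AtomicCell(5)
assert cell.cas(5, 7) is True    # expected value matched: 5 -> 7
assert cell.cas(5, 9) is False   # stale expected value: no change
assert cell.read() == 7
```

The failed second CAS illustrates why CAS is useful in the wakeup chain of section 3: a stale waker cannot clobber a slot that some other process has already updated.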

3 THE BROADCAST OBJECT


Our memory reclamation technique utilizes a recoverable Broadcast object whose primary function is to allow a
designated process to signal (and free) multiple waiting processes. The Broadcast object is inspired by the SIGNAL
object used by Jayanti, Jayanti and Joshi [16] to solve the RME problem. Unlike the SIGNAL object, which can signal only one waiting process and perform signalling only once, the Broadcast object allows a designated process to signal multiple waiting processes and can be reused, even in the presence of failures.
In essence, the Broadcast object is a recoverable MRSW (Multi Reader Single Writer) counter object that supports
three operations: Set, Wait and Read.
(1) Set(𝑥) is invoked by a process to set the counter value to 𝑥.
(2) Wait(𝑥) is invoked by a process that intends to wait till the counter value is greater than or equal to 𝑥.
(3) Read( ) is invoked to read the current value of the counter.
This object assumes the following usage:
(1) The Set operation is only invoked by a designated process 𝑝𝑤.
(2) The Wait operation can be invoked by all processes except 𝑝𝑤.
(3) The Set operation must only be invoked in an incremental fashion. In other words, if the last successful Set operation was Set(𝑦), then the next invocation may only be Set(𝑦 + 1) or, to maintain idempotence, Set(𝑦).
(4) The Wait operation must not be invoked with a parameter more than one unit greater than the current counter value. Formally, if the last successful Set operation was Set(𝑦), and Wait(𝑧) is invoked, then 𝑧 ≤ 𝑦 + 1.

An implementation of the Broadcast object is trivial in the CC model. Using a shared MRSW atomic integer, processes
can achieve O (1) RMR-complexity for Set, Wait and Read. However, this approach does not work for the DSM model.
In the DSM model, each shared variable resides on a single processor node. Thus, if processes wait by spinning on the
same shared variable, some processes (from remote nodes) will incur an unbounded number of RMRs. Thus, each
process needs to spin on a variable stored in its local processor node. In this case, process 𝑝 𝑤 needs to broadcast
its Set(𝑥) operation to ensure that all processes that are spinning due to an invocation of the Wait(𝑥) operation are
subsequently signalled. This action could potentially incur O (𝑛) RMRs for the Set(𝑥) operation. Thus, a constant-RMR
implementation of the Broadcast object for the DSM model is non-trivial.
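For concreteness, the trivial CC-model version mentioned above can be sketched as a single shared counter. This is an illustrative single-process sketch (lowercase method names and the assertion encoding the incremental-usage rule are ours, not the paper's); under CC, the spin in `wait` is cheap because the cached value is re-fetched only after the writer invalidates it, which is precisely what fails under DSM.

```python
class BroadcastCC:
    """CC-model Broadcast object: one shared MRSW atomic counter.

    Per the usage rules in the text: set(x) is invoked only by the
    designated writer, and only incrementally (x equals the current
    count or count + 1); wait(z) is invoked only with z at most one
    above the current count.
    """

    def __init__(self):
        self.count = 0  # in the paper's setting this lives in NVRAM

    def set(self, x):
        assert x in (self.count, self.count + 1), "incremental Sets only"
        self.count = x

    def wait(self, x):
        # Under CC this spin incurs O(1) RMRs: one miss when the writer
        # invalidates the cached copy. Under DSM, remote processes
        # spinning here would incur unbounded RMRs.
        while self.count < x:
            pass

    def read(self):
        return self.count

b = BroadcastCC()
b.set(1)
b.wait(1)            # returns immediately: the counter already reached 1
assert b.read() == 1
b.set(1)             # idempotent re-execution of the same Set is allowed
assert b.read() == 1
```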
We present an efficient implementation of the Broadcast object for the DSM model in Algorithm 2. This implementation incurs O(1) RMRs for Set, Wait and Read and utilizes O(𝑛) space per Broadcast object. The main idea in our
implementation of the Broadcast object is a wakeup chain, created by the designated process 𝑝 𝑤 , such that each process
in the wakeup chain wakes up the next process in the chain. To trigger the wakeup in the wakeup chain, process 𝑝 𝑤
only needs to wake up the first process in the wakeup chain.

3.1 Variables used


The variable 𝑐𝑜𝑢𝑛𝑡 is used to store the counter value. The variable 𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡 is used by Set(𝑥) to temporarily store
the new counter value until the operation terminates. Variables 𝑡𝑎𝑟𝑔𝑒𝑡, 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒 and 𝑤𝑎𝑘𝑒𝑢𝑝 are arrays of integers
with one entry for each process. The 𝑖-th entry of 𝑡𝑎𝑟𝑔𝑒𝑡 is used by process 𝑝𝑖 to spin on until the counter value of the
Broadcast object reaches 𝑡𝑎𝑟𝑔𝑒𝑡[𝑖]. The 𝑖-th entry of 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒 is used by process 𝑝𝑖 to inform process 𝑝𝑤 of its intention to wait for the counter value of the Broadcast object to be set to 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑖]. The 𝑤𝑎𝑘𝑒𝑢𝑝 array is used to
create a wakeup chain. The 𝑖-th entry of 𝑤𝑎𝑘𝑒𝑢𝑝 is used by process 𝑝𝑖 to determine the next process in the wakeup
chain. The 𝑤𝑎𝑘𝑒𝑢𝑝 and 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒 arrays are local to process 𝑝 𝑤 .

3.2 Algorithm description


In Wait(𝑥), a process 𝑝𝑖 first sets 𝑡𝑎𝑟𝑔𝑒𝑡 [𝑖] and then 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒 [𝑖] to 𝑥, to announce its intention to wait for the counter
value to reach 𝑥. It then checks whether Set has already been invoked for this particular value of 𝑥 by reading the variable 𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡. If so, it resets 𝑡𝑎𝑟𝑔𝑒𝑡[𝑖] to 0. It then spins, if required, till 𝑡𝑎𝑟𝑔𝑒𝑡[𝑖] is set to 0 by some process in
the wakeup chain. Process 𝑝𝑖 then clears its announcement and determines the next process 𝑝𝑘 in the wakeup chain,
where 𝑤𝑎𝑘𝑒𝑢𝑝 [𝑖] = 𝑘 ≠ 0. It wakes up 𝑝𝑘 by updating the value of 𝑡𝑎𝑟𝑔𝑒𝑡 [𝑘] from 𝑥 to 0 using a CAS instruction.
Note that there could be multiple wakeup chains in the algorithm for different 𝑡𝑎𝑟𝑔𝑒𝑡 values. However, the algorithm
maintains the invariant that all processes in a particular wakeup chain have the same 𝑡𝑎𝑟𝑔𝑒𝑡 value.
In Set(𝑥), process 𝑝 𝑤 first sets the 𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡 to 𝑥 such that any process that invokes Wait from this point
does not get blocked. Then, it creates the wakeup chain of processes in a reverse order by keeping track of the last
process in the wakeup chain and double checking the announce array to ensure the process is indeed waiting. Lastly,
𝑝 𝑤 wakes up the first process in the wakeup chain (indicated by variable 𝑙𝑎𝑠𝑡) and subsequently all waiting processes
will be woken up.
The only use of the Read( ) operation is to keep track of the last successful Set operation. The Set(𝑥), Wait(𝑥) and Read( ) operations are all idempotent and execute correctly even if run multiple times, as long as each is eventually run to completion once.

Algorithm 2: Pseudocode for Broadcast object for process 𝑝𝑖 for the DSM model

1  shared non-volatile variables
       /* counter value; initially zero */
2      𝑐𝑜𝑢𝑛𝑡: atomic integer
       /* internal counter to synchronize Set and Wait; initially zero */
3      𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡: atomic integer
       /* value to spin on; the 𝑖-th entry is local to process 𝑝𝑖 */
4      𝑡𝑎𝑟𝑔𝑒𝑡[1 . . . 𝑛]: array [1 . . . 𝑛] of integer
       /* announcement of target value to 𝑝𝑤; all entries are local to process 𝑝𝑤 */
5      𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[1 . . . 𝑛]: array [1 . . . 𝑛] of integer
       /* id of next process in wakeup chain; all entries are local to process 𝑝𝑤 */
6      𝑤𝑎𝑘𝑒𝑢𝑝[1 . . . 𝑛]: array [1 . . . 𝑛] of integer

7  Function Wait(𝑥)
       /* wait till counter value reaches 𝑥; 𝑝𝑤 should never invoke Wait(𝑥) */
8      𝑡𝑎𝑟𝑔𝑒𝑡[𝑖] ← 𝑥
9      𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑖] ← 𝑥
       /* no need to wait if 𝑝𝑤 intends to set counter to 𝑥 */
10     if 𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡 ≥ 𝑥 then
11         𝑡𝑎𝑟𝑔𝑒𝑡[𝑖] ← 0
12     end if
       /* spin till some process resets the target value */
13     await 𝑡𝑎𝑟𝑔𝑒𝑡[𝑖] = 0
14     𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑖] ← 0
15     𝑘 ← 𝑤𝑎𝑘𝑒𝑢𝑝[𝑖]
16     if 𝑘 > 0 then
           /* wake up next process in wakeup chain */
17         CAS(𝑡𝑎𝑟𝑔𝑒𝑡[𝑘], 𝑥, 0)
18     end if
19 end

20 Function Read( )
21     return 𝑐𝑜𝑢𝑛𝑡
22 end

23 designated process
       /* host process for the Broadcast object */
24     𝑝𝑤: writer process

25 Initialization
26     𝑐𝑜𝑢𝑛𝑡 ← 0
27     𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡 ← 0
28     foreach 𝑗 ∈ {1, 2, . . . , 𝑛} do
           /* processes are not waiting initially */
29         𝑡𝑎𝑟𝑔𝑒𝑡[𝑗] ← 0
30         𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑗] ← 0
31         𝑤𝑎𝑘𝑒𝑢𝑝[𝑗] ← 0
32     end foreach

33 Function Set(𝑥)
       /* sets the counter value to 𝑥; Set(𝑥) may only be invoked by process 𝑝𝑤 */
34     𝑙𝑎𝑠𝑡 ← 0
35     𝑗 ← 1
       /* inform all processes about an incoming set operation */
36     𝑖𝑛𝑡𝑒𝑟𝑖𝑚_𝑐𝑜𝑢𝑛𝑡 ← 𝑥
37     while 𝑗 ≤ 𝑛 do
38         if 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑗] = 𝑥 then
               /* assign 𝑗 to wake up the last waiting process */
39             𝑤𝑎𝑘𝑒𝑢𝑝[𝑗] ← 𝑙𝑎𝑠𝑡
40             if 𝑎𝑛𝑛𝑜𝑢𝑛𝑐𝑒[𝑗] = 𝑥 then
                   /* 𝑗 is the last process if it is still waiting */
41                 𝑙𝑎𝑠𝑡 ← 𝑗
42             end if
43         end if
44         𝑗 ← 𝑗 + 1
45     end while
46     if 𝑙𝑎𝑠𝑡 > 0 then
           /* release the last process; all waiting processes will be released
              automatically through the wakeup chain */
47         CAS(𝑡𝑎𝑟𝑔𝑒𝑡[𝑙𝑎𝑠𝑡], 𝑥, 0)
48     end if
49     𝑐𝑜𝑢𝑛𝑡 ← 𝑥
50 end
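To make the Set/Wait/Read semantics concrete, the following is a minimal Python model of the Broadcast object (a sketch, not the paper's implementation: the class and method names are ours, CAS is emulated with a lock, the writer 𝑝𝑤 is simply whoever calls set, and persistence concerns are ignored):

```python
import threading
import time

class Broadcast:
    """Toy model of the Broadcast object of Algorithm 2 (our naming)."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.interim_count = 0
        self.target = [0] * (n + 1)    # 1-indexed; entry i is spun on by process i
        self.announce = [0] * (n + 1)
        self.wakeup = [0] * (n + 1)
        self._lock = threading.Lock()  # stands in for a hardware CAS

    def _cas(self, arr, idx, old, new):
        with self._lock:
            if arr[idx] == old:
                arr[idx] = new
                return True
            return False

    def wait(self, i, x):
        """Called by waiter process i; blocks until the counter reaches x."""
        self.target[i] = x
        self.announce[i] = x
        if self.interim_count >= x:        # writer already started Set(x)
            self.target[i] = 0
        while self.target[i] != 0:         # spin until released
            time.sleep(0.001)
        self.announce[i] = 0
        k = self.wakeup[i]
        if k > 0:                          # release next process in wakeup chain
            self._cas(self.target, k, x, 0)

    def set(self, x):
        """Called only by the writer; builds the wakeup chain, then releases
        the last waiter, which cascades down the chain."""
        last = 0
        self.interim_count = x             # announce the incoming Set
        for j in range(1, self.n + 1):
            if self.announce[j] == x:
                self.wakeup[j] = last
                if self.announce[j] == x:  # double check: j is still waiting
                    last = j
        if last > 0:
            self._cas(self.target, last, x, 0)
        self.count = x

    def read(self):
        return self.count
```

Note that the writer performs a single CAS and each released waiter performs at most one more; every waiter spins only on its own target entry, which is what keeps the per-process RMR cost constant in the DSM model.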

4 THE MEMORY RECLAMATION ALGORITHM


Our idea relies on the notion of a grace period and quiescent states [3, 14, 18].

Definition 4.1 (Grace period). A grace period is a time interval [𝑎, 𝑏] such that all nodes retired before time 𝑎 are safe
to be reclaimed after time 𝑏.

Definition 4.2 (Quiescent state). A process is said to be in a quiescent state at a certain point in time if it cannot
access any node from another process using only its local variables.
10 Sahil Dhoked and Neeraj Mittal

Note that quiescent states are defined within the context of an algorithm. Different algorithms may encompass
different quiescent states. In the context of quiescent states, a grace period is a time interval that overlaps with at least
one quiescent state of each process. In order to reuse a node, a process, say 𝑝𝑖 , must first retire its node and then wait
for at least one complete grace period to safely reclaim the node. After one complete grace period has elapsed, it is safe
to assume that no process would be able to acquire any access to that node.
Main idea: In the case of RME algorithms, we assume that when a process is in the NCS, it is in a quiescent state.
It suffices to say that after 𝑝𝑖 retires its node, if some process 𝑝 𝑗 ( 𝑗 ≠ 𝑖) is in the NCS segment, then 𝑝 𝑗 would be
unable to access that node thereafter. In order to safely reuse (reclaim) a node, process 𝑝𝑖 determines its grace period
in two phases, the snapshot phase and the waiting phase. In the snapshot phase, 𝑝𝑖 takes a snapshot of the status of all
processes and, in the waiting phase, 𝑝𝑖 waits till each process has been in the NCS segment at least once during or after
its respective snapshot. To avoid the RMR overhead of scanning through all processes at once, 𝑝𝑖 executes each phase
incrementally, one step per passage.
Our memory reclamation algorithm provides two methods: 1) new_node( ), and 2) retire_last_node( ). Pseudocode
for the memory reclamation algorithm is presented in Algorithm 3. Any RME algorithm that needs to dynamically
allocate memory can utilize our memory reclamation algorithm by invoking these two methods. The new_node( )
method returns a “node” that is required by a process to enter the CS of the RME algorithm. Similarly, while leaving
the CS, the retire_last_node( ) method retires the node used to enter the CS. Our algorithm assumes (and relies on)
the fact that each process requests a new node each time before entering the CS and retires its node prior to entering
the NCS segment. The RMR overhead of our algorithm is O (1), while the space overhead is O (𝑛² ∗ 𝑠𝑖𝑧𝑒𝑜𝑓 (𝑛𝑜𝑑𝑒)).

4.1 Variables used


There are two types of non-volatile variables used in this algorithm, shared and local. Variables 𝑠𝑡𝑎𝑟𝑡 and 𝑓 𝑖𝑛𝑖𝑠ℎ are
shared non-volatile arrays of integers and Broadcast objects respectively, with one entry per process. The 𝑖-th entry of
𝑠𝑡𝑎𝑟𝑡 is used by process 𝑝𝑖 to indicate the number of new nodes requested. The 𝑖-th entry of 𝑓 𝑖𝑛𝑖𝑠ℎ is used by process
𝑝𝑖 to indicate the number of nodes retired. The entries of 𝑓 𝑖𝑛𝑖𝑠ℎ are Broadcast objects for other processes to spin on
until 𝑓 𝑖𝑛𝑖𝑠ℎ[𝑖] exceeds a particular value.
In addition, we use five local non-volatile variables. The variable 𝑠𝑛𝑎𝑝𝑠ℎ𝑜𝑡 is an array of integers to take a snapshot of
𝑠𝑡𝑎𝑟𝑡 array. Variable 𝑝𝑜𝑜𝑙 is a collection of 2 ∗ (2𝑛 + 2) nodes that are employed by the underlying mutual exclusion
algorithm to enter the CS. Variable 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙 is used to keep track of the active nodes in 𝑝𝑜𝑜𝑙 and 𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙 is used
to account for failures while switching the value of 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙. Variable 𝑖𝑛𝑑𝑒𝑥 is used to keep track of the number of
times the method 𝑆𝑡𝑒𝑝 ( ) has been invoked since the last time 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙 was switched.

4.2 Algorithm Description


Each process maintains two pools locally, reserve and active, each of 2𝑛 + 2 nodes (𝑝𝑜𝑜𝑙 [0, 1] [1, . . . , 2𝑛 + 2]). The
reserve pool contains nodes that have previously been retired and are in the process of reclamation. The active pool
contains a mix of reclaimed nodes that are ready for reuse, and retired nodes that were consumed from the active pool
while trying to reclaim the nodes from the reserve pool. The retired and safe nodes in the active pool are separated by
the local variable 𝑖𝑛𝑑𝑒𝑥.
The 𝑠𝑡𝑎𝑟𝑡 and 𝑓 𝑖𝑛𝑖𝑠ℎ counters function in sync and differ by at most one. If 𝑠𝑡𝑎𝑟𝑡 [𝑖] − 𝑓 𝑖𝑛𝑖𝑠ℎ[𝑖] = 1 for some 𝑖,
it implies that process 𝑝𝑖 has left the NCS. On the other hand, if 𝑠𝑡𝑎𝑟𝑡 [𝑖] − 𝑓 𝑖𝑛𝑖𝑠ℎ[𝑖] = 0, it implies that process 𝑝𝑖
is in the NCS and in a quiescent state. In order to enter the CS, process 𝑝𝑖 first requests a new node by invoking the

Algorithm 3: Pseudocode for memory reclamation for process 𝑝𝑖

1  shared non-volatile variables
       /* counter of CS attempts; the 𝑖-th entry is local to process 𝑝𝑖 */
2      𝑠𝑡𝑎𝑟𝑡[1 . . . 𝑛]: array of integer
       /* Broadcast objects to wait on for CS completion; 𝑝𝑖 is the writer for the 𝑖-th entry */
3      𝑓𝑖𝑛𝑖𝑠ℎ[1 . . . 𝑛]: array of Broadcast object

4  local non-volatile variables
       /* array to store the last observed value of 𝑠𝑡𝑎𝑟𝑡 of each other process */
5      𝑠𝑛𝑎𝑝𝑠ℎ𝑜𝑡[1 . . . 𝑛]: array of integer
       /* pools of nodes for memory management */
6      𝑝𝑜𝑜𝑙[0, 1][1 . . . 2𝑛 + 2]: two pools of 2𝑛 + 2 nodes each
       /* index of current pool */
7      𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙: integer
       /* index of backup pool */
8      𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙: integer
       /* counter to track steps taken since last pool switch */
9      𝑖𝑛𝑑𝑒𝑥: integer

10 initialization
11     foreach 𝑗 ∈ {1, 2, . . . , 𝑛} do
12         𝑠𝑡𝑎𝑟𝑡[𝑗] ← 0
13     end foreach
14     foreach 𝑝 ∈ {𝑝1, 𝑝2, . . . , 𝑝𝑛} do
15         𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙 ← 0
16         𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙 ← 1
17         𝑖𝑛𝑑𝑒𝑥 ← 1
18     end foreach

19 Function new_node( )
20     if 𝑠𝑡𝑎𝑟𝑡[𝑖] = 𝑓𝑖𝑛𝑖𝑠ℎ[𝑖].Read( ) then
21         Step( )
22         𝑠𝑡𝑎𝑟𝑡[𝑖] ← 𝑠𝑡𝑎𝑟𝑡[𝑖] + 1
23     end if
       /* return the node in 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙 pointed to by 𝑖𝑛𝑑𝑒𝑥 */
24     return 𝑝𝑜𝑜𝑙[𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙][𝑖𝑛𝑑𝑒𝑥]
25 end

26 Function retire_last_node( )
27     if 𝑠𝑡𝑎𝑟𝑡[𝑖] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ[𝑖].Read( ) then
28         𝑓𝑖𝑛𝑖𝑠ℎ[𝑖].Set(𝑠𝑡𝑎𝑟𝑡[𝑖])
29     end if
30 end

31 Function Step( )
       /* 𝑖𝑛𝑑𝑒𝑥 advances in each complete execution */
32     if 𝑖𝑛𝑑𝑒𝑥 ≤ 𝑛 then
           /* snapshot phase */
33         𝑠𝑛𝑎𝑝𝑠ℎ𝑜𝑡[𝑖𝑛𝑑𝑒𝑥] ← 𝑠𝑡𝑎𝑟𝑡[𝑖𝑛𝑑𝑒𝑥]
34         𝑖𝑛𝑑𝑒𝑥 ← 𝑖𝑛𝑑𝑒𝑥 + 1
35     else if 𝑛 < 𝑖𝑛𝑑𝑒𝑥 ≤ 2𝑛 then
           /* waiting phase: wait for others to finish old attempts (blocking) */
36         if 𝑖𝑛𝑑𝑒𝑥 − 𝑛 ≠ 𝑖 then
               /* no need to wait for self */
37             𝑓𝑖𝑛𝑖𝑠ℎ[𝑖𝑛𝑑𝑒𝑥 − 𝑛].Wait(𝑠𝑛𝑎𝑝𝑠ℎ𝑜𝑡[𝑖𝑛𝑑𝑒𝑥 − 𝑛])
38         end if
39         𝑖𝑛𝑑𝑒𝑥 ← 𝑖𝑛𝑑𝑒𝑥 + 1
40     else if 𝑖𝑛𝑑𝑒𝑥 = 2𝑛 + 1 then
           /* backup pool is now reliable; make it current */
41         𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙 ← 𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙
42         𝑖𝑛𝑑𝑒𝑥 ← 𝑖𝑛𝑑𝑒𝑥 + 1
43     else
           /* reset 𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙 and 𝑖𝑛𝑑𝑒𝑥 */
44         𝑏𝑎𝑐𝑘𝑢𝑝𝑝𝑜𝑜𝑙 ← 1 − 𝑐𝑢𝑟𝑟𝑒𝑛𝑡𝑝𝑜𝑜𝑙
45         𝑖𝑛𝑑𝑒𝑥 ← 1
46     end if
47 end

new_node( ) method (line 19). This indicates that the process has left the NCS and thus increments the 𝑠𝑡𝑎𝑟𝑡 [𝑖] counter
(line 22). Similarly, once a process needs to retire a node, it invokes the retire_last_node method (line 26) wherein it
updates the 𝑓 𝑖𝑛𝑖𝑠ℎ counter (line 28). The 𝑠𝑡𝑎𝑟𝑡 and 𝑓 𝑖𝑛𝑖𝑠ℎ counters are guarded by if-blocks (line 20, line 27) to ensure
idempotence in case of multiple failures.
A process can consume nodes from the active pool only after taking steps towards reclaiming nodes from the reserve
pool (line 21). The memory reclamation steps are implemented in the 𝑆𝑡𝑒𝑝 ( ) method. The role of the 𝑆𝑡𝑒𝑝 ( ) method
is two-fold. First, it advances the local variable 𝑖𝑛𝑑𝑒𝑥 during each successful execution in order to guarantee a fresh
node on every invocation of the new_node( ) method. Second, the 𝑆𝑡𝑒𝑝 ( ) method performs memory reclamation in
three phases.

(1) Snapshot (line 33): 𝑝𝑖 takes a snapshot of 𝑠𝑡𝑎𝑟𝑡 [ 𝑗] for all 𝑗 ∈ {1, . . . , 𝑛}.
(2) Waiting (line 37): 𝑝𝑖 waits for 𝑓 𝑖𝑛𝑖𝑠ℎ[ 𝑗] to “catch up” to the snapshotted value of 𝑠𝑡𝑎𝑟𝑡 [ 𝑗] using a Broadcast
object as described in Section 3. Simply put, 𝑝𝑖 waits for very old unsatisfied requests of other processes to be
satisfied. In this context, a request is very old if 𝑝𝑖 overtook it 𝑛 times. This ensures that each process has been
in a quiescent state before 𝑝𝑖 moves on to the pool-swap phase.
(3) Pool swap (line 41 and line 44): If process 𝑝𝑖 reaches this phase, it implies that at least one grace period has
elapsed since the nodes in the reserve pool were retired. At this point it is safe to reuse nodes from the reserve
pool, and 𝑝𝑖 simply swaps the active and reserve pools. To account for failures, this swap occurs over two
invocations of the 𝑆𝑡𝑒𝑝 ( ) method, after which the 𝑖𝑛𝑑𝑒𝑥 variable is reset (line 45).
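The three phases can be illustrated with a simplified, single-process Python simulation of Algorithm 3 (the names are ours). In this sketch the 𝑓𝑖𝑛𝑖𝑠ℎ entries are plain integers instead of Broadcast objects, and the waiting phase is reduced to an assertion that would otherwise block, so only the index and pool bookkeeping is exercised:

```python
N = 3                         # number of processes (assumed for the sketch)
start = [0] * (N + 1)         # shared: CS attempts per process (1-indexed)
finish = [0] * (N + 1)        # shared: retirements per process (plain ints here)

class Reclaimer:
    """Per-process state of Algorithm 3 (simplified sketch)."""

    def __init__(self, i):
        self.i = i
        self.snapshot = [0] * (N + 1)
        # Two pools of 2n+2 nodes each (slot 0 unused: pools are 1-indexed).
        self.pool = [[f"pool{s}-slot{k}" for k in range(2 * N + 3)]
                     for s in (0, 1)]
        self.currentpool = 0
        self.backuppool = 1
        self.index = 1

    def step(self):
        if self.index <= N:                      # (1) snapshot phase
            self.snapshot[self.index] = start[self.index]
            self.index += 1
        elif self.index <= 2 * N:                # (2) waiting phase
            j = self.index - N
            if j != self.i:
                # Stand-in for finish[j].Wait(snapshot[j]); a real
                # implementation blocks here until p_j catches up.
                assert finish[j] >= self.snapshot[j]
            self.index += 1
        elif self.index == 2 * N + 1:            # (3) pool swap, part 1
            self.currentpool = self.backuppool
            self.index += 1
        else:                                    # (3) pool swap, part 2
            self.backuppool = 1 - self.currentpool
            self.index = 1

    def new_node(self):
        if start[self.i] == finish[self.i]:      # idempotence guard
            self.step()
            start[self.i] += 1
        return self.pool[self.currentpool][self.index]

    def retire_last_node(self):
        if start[self.i] != finish[self.i]:      # idempotence guard
            finish[self.i] = start[self.i]
```

Driving a single process through 2𝑛 + 2 attempt/retire cycles hands out 2𝑛 + 2 distinct nodes and swaps the pools exactly once, matching the amortized one-step-per-passage design.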

Note that the algorithm is designed in such a way that multiple executions of the new_node( ) method return the
same node until the retire_last_node( ) method is called, and vice versa. This design helps introduce idempotence
and accommodates the failure scenario where a process crashes before it can capture the node returned by the
new_node( ) method.

5 APPLICATIONS
Golab and Ramaraju's algorithms [12] otherwise have bounded space complexity, but they use the MCS algorithm as
their base lock, and the space complexity of the MCS algorithm may grow unboundedly. Using our memory reclamation
algorithm, we can bound the space complexity of their algorithms.
Two known sub-logarithmic RME algorithms, from Golab and Hendler [10], and, from Jayanti, Jayanti and Joshi
[16], both use MCS queue-based structures. Memory reclamation in these algorithms is not trivial and requires careful
analysis and proofs. Our memory reclamation algorithm fits perfectly with these algorithms. The main idea is to employ
one instance of the memory reclamation algorithm at each level of the sub-logarithmic arbitration tree. As a result, the
overall space complexity of these algorithms can be bounded by O (𝑛³).
Dhoked and Mittal's algorithm [7] also uses an MCS-queue-based structure where the space complexity may grow
unboundedly. Using a separate instance of our memory reclamation algorithm for each level of their adaptive algorithm,
we can bound the space complexity of their algorithm to O (𝑛² ∗ log 𝑛/log log 𝑛).

6 RELATED WORK
6.1 Memory reclamation
In [20], Michael introduced hazard pointers, a wait-free memory reclamation technique that requires only a bounded
amount of space. Hazard pointers are special shared pointers that protect nodes from being reclaimed; a protected
node can be safely accessed. Any retired node that is not protected by a hazard pointer is assumed to be safe to reclaim.
Being shared pointers, hazard pointers are expensive to read and update.
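As a toy illustration of the idea (our own simplification; a real implementation additionally needs memory fences, per-thread retire lists, and amortized scanning):

```python
class HazardPointers:
    """Toy hazard-pointer scheme: one hazard pointer per thread."""

    def __init__(self, nthreads):
        self.hp = [None] * nthreads   # published (shared) hazard pointers
        self.retired = []             # retired nodes awaiting reclamation

    def protect(self, tid, node):
        self.hp[tid] = node           # publish before dereferencing `node`
        return node

    def clear(self, tid):
        self.hp[tid] = None           # done accessing the node

    def retire(self, node):
        self.retired.append(node)     # logically deleted, not yet freed

    def scan(self):
        # A retired node is safe to free only if no hazard pointer covers it.
        protected = {id(n) for n in self.hp if n is not None}
        safe = [n for n in self.retired if id(n) not in protected]
        self.retired = [n for n in self.retired if id(n) in protected]
        return safe
```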
In [9], Fraser devised a technique called epoch based reclamation (EBR). As the name suggests, the algorithm
maintains an epoch counter 𝑒 and three limbo lists corresponding to epochs 𝑒 − 1, 𝑒 and 𝑒 + 1. The main idea is
that nodes retired in epoch 𝑒 − 1 are safe to be reclaimed in epoch 𝑒 + 1. This approach is not lock-free, and a slow
process may cause the size of the limbo lists to increase unboundedly.
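The epoch rotation can be sketched as follows (a single-threaded toy of our own; real EBR must also verify that every thread has announced the current epoch before advancing it):

```python
class EpochReclaimer:
    """Toy epoch-based reclamation with three limbo lists (epoch mod 3)."""

    def __init__(self):
        self.epoch = 0
        self.limbo = [[], [], []]     # one limbo list per epoch mod 3

    def retire(self, node):
        # Nodes are retired into the current epoch's limbo list.
        self.limbo[self.epoch % 3].append(node)

    def advance(self):
        # On entering epoch e, nodes retired in epoch e-2 (stored in slot
        # (e+1) mod 3) can no longer be reached and are reclaimed.
        self.epoch += 1
        slot = (self.epoch + 1) % 3
        freed, self.limbo[slot] = self.limbo[slot], []
        return freed
```

A node retired in epoch 𝑒 − 1 is thus handed back exactly when the counter reaches 𝑒 + 1, two epoch advances later.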
In [18], McKenney and Slingwine present the RCU framework and demonstrate the use of quiescent-state-based
reclamation (QSBR). QSBR relies on detecting quiescent states and a grace period during which each thread
passes through at least one quiescent state. Nodes retired before the grace period are safe to be reclaimed after the

grace period. In [3], Arcangeli et al. make use of the RCU framework and QSBR reclamation for the System V IPC in
the Linux kernel.
In [5], Brown presents DEBRA and DEBRA+ reclamation schemes. DEBRA is a distributed extension of EBR where
each process maintains its own limbo lists instead of shared limbo lists, and epoch computation is performed
incrementally. DEBRA+ relies on assistance from the operating system to provide signalling in order to prevent
slow or stalled processes from accessing reclaimed memory.

6.2 Recoverable Mutual Exclusion


Golab and Ramaraju formally defined the RME problem in [12]. They also presented four different RME algorithms—a 2-
process RME algorithm and three 𝑛-process RME algorithms. The first algorithm is based on Yang and Anderson’s lock
[24], and is used as a building block to design an 𝑛-process RME algorithm. Both these RME algorithms use only read,
write and comparison-based primitives. The worst-case RMR complexity of the 2-process algorithm is O (1) whereas
that of the resultant 𝑛-process algorithm is O (log 𝑛). Both RME algorithms have optimal RMR complexity because,
as shown in [2, 4, 24], any mutual exclusion algorithm that uses only read, write and comparison-based primitives
has worst-case RMR complexity of Ω(log 𝑛). The remaining two algorithms are used as transformations which can be
applied to the MCS algorithm. The third algorithm transforms the MCS algorithm to yield a constant RMR complexity
in the absence of failures, but unbounded worst case RMR complexity. The fourth algorithm transforms the MCS
algorithm to achieve bounded RMR complexity in the worst case.
Later, Golab and Hendler [10] proposed an RME algorithm with sub-logarithmic RMR complexity of O ( log 𝑛/log log 𝑛)
under the CC model using the MCS queue-based lock [19] as a building block. This algorithm was later shown to be
vulnerable to starvation [16]. Ramaraju showed in [22] that it is possible to design an RME algorithm with O (1) RMR
complexity provided the hardware supports a special RMW instruction that atomically swaps the contents of two
arbitrary memory locations. Unfortunately, at present, no hardware supports such an instruction to our knowledge.
In [17], Jayanti and Joshi presented a fair RME algorithm with O (log 𝑛) RMR complexity. Their algorithm satisfies
bounded (wait-free) exit and FCFS (first-come-first-served) properties and only requires a bounded amount of space
consumption. In [16], Jayanti, Jayanti and Joshi proposed an RME algorithm that uses MCS queue-based structure to
achieve sub-logarithmic RMR complexity of O ( log 𝑛/log log 𝑛). To our knowledge, this is the best known RME algorithm
as far as the worst-case RMR complexity is concerned that also satisfies bounded recovery and bounded exit properties.
In [7], Dhoked and Mittal use the MCS queue-based lock to present an adaptive transformation to any RME algorithm
whose RMR complexity is constant in the absence of failures and gradually adapts to the number of failures. The RMR
complexity of their algorithm is given by O (min{√𝐹, log 𝑛/log log 𝑛}), where 𝐹 denotes the number of failures. Using
a weaker version of starvation freedom, Chan
and Woelfel [6] present a novel solution to the RME problem that incurs a constant number of RMRs in the amortized
case, but its worst case RMR complexity may be unbounded.
In [11], Golab and Hendler proposed an RME algorithm under the assumption of system-wide failure (all processes
fail and restart) with O (1) RMR complexity.

7 CONCLUSION AND FUTURE WORK


In this work, we formalized the problem of memory reclamation for recoverable mutual exclusion algorithms and
presented a plug-and-play solution that can be used by existing and new RME algorithms. Our algorithm is RMR-optimal
for both the CC and DSM models. A natural next step is to design a recoverable memory reclamation algorithm for RME that

satisfies some notion of fairness. Another direction of work involves formulating the problem of memory reclamation
for recoverable lock-free data structures and designing algorithms for the same.

REFERENCES
[1] AMD 2019. AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions. AMD.
https://www.amd.com/system/files/TechDocs/24594.pdf
[2] J. H. Anderson and Y.-J. Kim. 2002. An Improved Lower Bound for the Time Complexity of Mutual Exclusion. Distributed Computing (DC) 15, 4
(Dec. 2002), 221–253. https://doi.org/10.1007/s00446-002-0084-2
[3] Andrea Arcangeli, Mingming Cao, Paul E McKenney, and Dipankar Sarma. 2003. Using Read-Copy-Update Techniques for System V IPC in the
Linux 2.5 Kernel.. In USENIX Annual Technical Conference, FREENIX Track. 297–309.
[4] H. Attiya, D. Hendler, and P. Woelfel. 2008. Tight RMR Lower Bounds for Mutual Exclusion and Other Problems. In Proceedings of the
40th Annual ACM Symposium on Theory of Computing (STOC) (Victoria, British Columbia, Canada). ACM, New York, NY, USA, 217–226.
https://doi.org/10.1145/1374376.1374410
[5] T. A. Brown. 2015. Reclaiming Memory for Lock-Free Data Structures: There Has to Be a Better Way. In Proceedings of the ACM Symposium on
Principles of Distributed Computing (PODC). ACM, Donostia-San Sebastián, Spain, 261–270.
[6] D. Y. C. Chan and P. Woelfel. 2020. Recoverable Mutual Exclusion with Constant Amortized RMR Complexity from Standard Primitives. In
Proceedings of the 39th ACM Symposium on Principles of Distributed Computing (PODC). ACM, New York, NY, USA, 10 pages.
[7] Sahil Dhoked and Neeraj Mittal. 2020. An Adaptive Approach to Recoverable Mutual Exclusion. In Proceedings of the 39th Symposium
on Principles of Distributed Computing (Virtual Event, Italy) (PODC ’20). Association for Computing Machinery, New York, NY, USA, 1–10.
https://doi.org/10.1145/3382734.3405739
[8] E. W. Dijkstra. 1965. Solution of a Problem in Concurrent Programming Control. Communications of the ACM (CACM) 8, 9 (1965), 569.
[9] K. Fraser. 2004. Practical Lock-Freedom. Ph.D. Dissertation. University of Cambridge.
[10] W. Golab and D. Hendler. 2017. Recoverable Mutual Exclusion in Sub-Logarithmic Time. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Washington, DC, USA). ACM, New York, NY, USA, 211–220. https://doi.org/10.1145/3087801.3087819
[11] W. Golab and D. Hendler. 2018. Recoverable Mutual Exclusion Under System-Wide Failures. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Egham, United Kingdom). ACM, New York, NY, USA, 17–26. https://doi.org/10.1145/3212734.3212755
[12] W. Golab and A. Ramaraju. 2016. Recoverable Mutual Exclusion: [Extended Abstract]. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Chicago, Illinois, USA). ACM, New York, NY, USA, 65–74. https://doi.org/10.1145/2933057.2933087
[13] W. Golab and A. Ramaraju. 2019. Recoverable Mutual Exclusion. Distributed Computing (DC) 32, 6 (Nov. 2019), 535–564.
[14] Thomas E Hart, Paul E McKenney, Angela Demke Brown, and Jonathan Walpole. 2007. Performance of memory reclamation for lockless
synchronization. J. Parallel and Distrib. Comput. 67, 12 (2007), 1270–1285.
[15] Intel 2016. Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M. Intel.
https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf
[16] P. Jayanti, S. Jayanti, and A. Joshi. 2019. A Recoverable Mutex Algorithm with Sub-logarithmic RMR on Both CC and DSM. In
Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC) (Toronto ON, Canada). ACM, New York, NY, USA, 177–186.
https://doi.org/10.1145/3293611.3331634
[17] P. Jayanti and A. Joshi. 2017. Recoverable FCFS Mutual Exclusion with Wait-Free Recovery. In Proceedings of the 31st Symposium on Distributed
Computing (DISC) (Vienna, Austria), Andréa W. Richa (Ed.), Vol. 91. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 30:1–
30:15. https://doi.org/10.4230/LIPIcs.DISC.2017.30
[18] Paul E McKenney and John D Slingwine. 1998. Read-copy update: Using execution history to solve concurrency problems. In Parallel and Distributed
Computing and Systems. 509–518.
[19] J. M. Mellor-Crummey and M. L. Scott. 1991. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Transactions on
Computer Systems 9, 1 (Feb. 1991), 21–65. https://doi.org/10.1145/103727.103729
[20] M. M. Michael. 2004. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Transactions on Parallel and Distributed Systems
(TPDS) 15, 6 (2004), 491–504.
[21] D. Narayanan and O. Hodson. 2012. Whole-System Persistence. In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS) (London, UK). ACM, New York, NY, USA, 401–410.
[22] A. Ramaraju. 2015. RGLock: Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems. Master’s thesis. Electrical and Computer
Engineering Department, University of Waterloo. http://hdl.handle.net/10012/9473
[23] Haosen Wen, Joseph Izraelevitz, Wentao Cai, H. Alan Beadle, and Michael L. Scott. 2018. Interval-Based Memory Reclamation. In Proceedings of
the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Vienna, Austria) (PPoPP ’18). Association for Computing
Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3178487.3178488
[24] J.-H. Yang and J. H. Anderson. 1995. A Fast, Scalable Mutual Exclusion Algorithm. Distributed Computing (DC) 9, 1 (March 1995), 51–60.
https://doi.org/10.1007/BF01784242
