Memory Reclamation for Recoverable Mutual Exclusion
For recoverable mutual exclusion (RME) to be of practical interest, and not just a theoretical one, memory reclamation must be addressed: it poses a major obstacle in several RME algorithms. Often, RME algorithms need to allocate memory dynamically, which increases the memory footprint of the algorithm over time. These algorithms are typically not equipped with suitable garbage collection due to concurrency and failures.
In this work, we present the first "general" recoverable algorithm for memory reclamation in the context of recoverable mutual exclusion. Our algorithm can be plugged into any RME algorithm very easily and preserves all correctness properties and most desirable properties of the algorithm. The space overhead of our algorithm is O(n² · sizeof(node)), where n is the total number of processes in the system. In terms of remote memory references (RMRs), our algorithm is RMR-optimal, i.e., it has a constant RMR overhead per passage. Our RMR and space complexities are applicable to both CC and DSM memory models.
1 INTRODUCTION
Mutual exclusion (ME) is a commonly used technique to handle conflicts in concurrent systems. The problem of mutual
exclusion was first defined by Dijkstra [8] more than half a century ago. Mutual exclusion algorithms, commonly
known as locks, are used by processes to execute a part of code, called the critical section (CS), in isolation without
any interference from other processes. The CS typically consists of code that involves access to shared resources,
which when accessed concurrently could potentially cause undesirable race conditions. The mutual exclusion problem
involves designing algorithms to ensure processes enter the CS one at a time.
Generally, algorithms for mutual exclusion are designed with the assumption that failures do not occur, especially
while a process is accessing a lock or a shared resource. However, such failures can occur in the real world. A power
outage or network failure might create an unrecoverable situation causing processes to be unable to continue. If such
failures occur, traditional mutual exclusion algorithms, which are not designed to operate properly under failures, may
deadlock or otherwise fail to guarantee important safety and liveness properties. In many cases, such failures may
have disastrous consequences. This gave rise to the problem of recoverable mutual exclusion (RME). The RME problem
involves designing an algorithm that ensures mutual exclusion under the assumption that process failures may occur
at any point during their execution, but the system is able to recover from such failures and proceed without any
adverse consequences.
Traditionally, concurrent algorithms use checkpointing and logging to tolerate failures by regularly saving the relevant
portion of application state to persistent storage such as a hard disk drive (HDD). Accessing a disk is orders of
magnitude slower than accessing main memory. As a result, checkpointing and logging algorithms are often designed
to minimize disk accesses. Non-volatile random-access memory (NVRAM) is a new class of memory technologies that
combines the low latency and high bandwidth of traditional random access memory with the density, non-volatility,
and economic characteristics of traditional storage media (e.g., HDD). Existing checkpointing and logging algorithms
can be modified to use NVRAMs instead of disks to yield better performance, but, in doing so, we would not be leveraging the true power of NVRAMs [13, 21]. NVRAMs can be used to directly store implementation-specific variables and, as such, have the potential for providing near-instantaneous recovery from failures.

Authors' addresses: Sahil Dhoked, The University of Texas at Dallas, TX, 75080, USA, sahil.dhoked@utdallas.edu; Neeraj Mittal, The University of Texas at Dallas, TX, 75080, USA, neerajm@utdallas.edu.
By directly storing implementation variables on NVRAMs, most of the application data can be easily recovered
after failures. However, recovery of implementation variables alone is not enough. Processor state information such
as contents of program counter, CPU registers and execution stack cannot be recovered completely and need to be
handled separately. For this reason, there is renewed interest in developing fast and dependable algorithms for
solving many important computing problems in software systems vulnerable to process failures using NVRAMs. Using
innovative methods, with NVRAMs in mind, we aim to design efficient and robust fault-tolerant algorithms for solving
mutual exclusion and other important concurrent problems.
The RME problem in the current form was formally defined a few years ago by Golab and Ramaraju in [12]. Several
algorithms have been proposed to solve this problem [7, 10, 13, 16, 17]. However, for the RME problem to be of practical interest, and not just a theoretical one, memory reclamation must be addressed: it poses a major obstacle in several RME algorithms. Often, RME algorithms allocate memory dynamically, which increases the memory footprint of the algorithm over time. These algorithms are typically not equipped with suitable garbage collection to avoid errors that may arise from concurrency and potential failures.
Memory reclamation, in single-process systems without failures, follows a straightforward pattern. The process allocates a "node" dynamically, consumes it, and frees it once it has no further need of the node. Freed nodes may later be reused (as part of a different allocation) or returned to the operating system. However, if, due to a programmer error, a freed node is later accessed by the process in the context of the previous allocation, it may cause serious damage to the program, and to the operating system as well. In the context of multi-process systems, when
a process frees a node, we may face the same issue without any programmer error. Even if the process that frees the
node is able to guarantee that it will not access that node again, there may exist another process that is just about to
access or dereference the node in the context of the old allocation.
In order to avoid the aforementioned error, freeing a node is broken down into two tasks. First, a process retires the node, after which any process that does not already have access to the node can no longer obtain access to it.
Second, the node needs to be reclaimed once it is deemed to be “safe”, i.e., no process can obtain any further access
to the node in the context of the previous allocation. A memory reclamation service is responsible for providing safe reclamation of a node once it is retired. The responsibility of retiring the node, on the other hand, typically rests with the programmer who consumes the memory reclamation service.
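As a toy illustration of this retire/reclaim split (the interface below is hypothetical, not the paper's API; the is_safe predicate stands in for however the service determines that no process can still reach a node):

```python
class ReclamationService:
    """Illustrative sketch of the retire/reclaim split; not the paper's algorithm."""

    def __init__(self):
        self.retired = []  # nodes withdrawn from any new accesses
        self.free = []     # nodes deemed safe, available for reuse

    def retire(self, node):
        # Task 1: after this call, no process that lacks a reference
        # to the node may obtain one.
        self.retired.append(node)

    def reclaim_safe(self, is_safe):
        # Task 2: move a retired node to the free pool only once it is
        # "safe", i.e., unreachable via the previous allocation.
        still_retired = []
        for node in self.retired:
            (self.free if is_safe(node) else still_retired).append(node)
        self.retired = still_retired


svc = ReclamationService()
svc.retire("n1")
svc.retire("n2")
svc.reclaim_safe(lambda n: n == "n1")  # suppose only n1 is safe so far
```

Note that retirement is the caller's job, while the safety determination belongs to the service.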
Prior works on memory reclamation [3, 5, 9, 18, 20, 23] provide safe memory reclamation in the absence of failures,
but are not trivially suited to account for failures and subsequent recovery using persistent memory. Moreover, most
works focus on providing memory reclamation in the context of lock-free data structures.
In this work, we present the first “general” recoverable algorithm (that we know of) for memory reclamation in
the context of recoverable mutual exclusion. Our algorithm is general enough that it can be plugged into any RME
algorithm very easily, while preserving all correctness properties and most desirable properties of the algorithm. On
the other hand, it is specific enough to take advantage of assumptions made by RME algorithms. In particular, our
algorithm may be blocking, but this is acceptable in the context of RME due to the inherently blocking nature of the RME problem.
Our approach derives from prior works of EBR [9] (epoch based reclamation) and QSBR [18] (quiescent state based
reclamation). However, unlike EBR and QSBR, where the memory consumption may grow unboundedly due to a
slow process, our algorithm guarantees a bounded memory consumption. The space overhead of our algorithm is
O(n² · sizeof(node)), where n is the total number of processes in the system, and a "node" is a collection of all the
resources used in one passage of the CS.
One of the most important measures of performance of an RME algorithm is the maximum number of remote
memory references (RMRs) made by a process per critical section request in order to acquire and release the lock as
well as recover the lock after a failure. Whether or not a memory reference incurs an RMR depends on the underlying
memory model. The two most common memory models used to analyze the performance of an RME algorithm are
cache-coherent (CC) and distributed shared memory (DSM) models. In terms of remote memory references (RMRs), our
algorithm is RMR-optimal, i.e., it has a constant RMR overhead per passage for both the CC and DSM memory models. Moreover, this algorithm uses only read, write, and comparison-based primitives.
The main idea behind our approach is to (1) maintain two pools of "nodes", clean (reclaimed) and dirty (retired), (2) wait for dirty nodes to become clean while consuming the clean pool, and (3) swap the dirty and clean pools. Our algorithm operates in tandem with any RME algorithm via two methods/APIs that can be invoked by the programmer to allocate new nodes and retire old nodes.
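The three steps above can be sketched in a few lines (illustrative only; the class name TwoPoolAllocator is hypothetical, and the sketch assumes a pool swap is always safe, which the actual algorithm establishes by waiting for a grace period):

```python
class TwoPoolAllocator:
    """Sketch of the clean/dirty two-pool idea; omits the recoverability
    and RMR machinery of the actual algorithm."""

    def __init__(self, pool_size):
        self.clean = [object() for _ in range(pool_size)]  # reclaimed nodes
        self.dirty = []                                    # retired nodes
        self.current = None

    def new_node(self):
        if not self.clean:
            # Steps (2)+(3): once the dirty nodes are known safe (a grace
            # period has elapsed), swap pools; here safety is assumed.
            self.clean, self.dirty = self.dirty, self.clean
        self.current = self.clean.pop()
        return self.current

    def retire(self):
        self.dirty.append(self.current)
        self.current = None


alloc = TwoPoolAllocator(pool_size=2)
n1 = alloc.new_node(); alloc.retire()
n2 = alloc.new_node(); alloc.retire()
n3 = alloc.new_node()  # clean pool exhausted: pools swap, a node is reused
```

The point of the two pools is that nodes in the reserve (dirty) pool age for a full cycle before reuse, which is where the grace-period wait attaches in the real algorithm.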
Roadmap: The rest of the text is organized as follows. We describe our system model and formally define the RME
and the memory reclamation problem in section 2. We define a new object, called the Broadcast object and its properties
in section 3. We also present an RMR-optimal solution to the Broadcast object for both the CC and DSM model in
section 3. In section 4, we present an algorithm that provides memory reclamation for RME algorithms. This algorithm
is RMR-optimal, but not lock-free. In section 5, we describe how our memory reclamation algorithm can be equipped
to existing RME algorithms. A detailed description of the related work is given in section 6. Finally, in section 7, we
present our conclusions and outline directions for future research.
Definition 2.1 (passage). A passage of a process is defined as the sequence of steps executed by the process from when it begins executing the Recover segment to either when it finishes executing the corresponding Exit segment or experiences a failure, whichever occurs first.
Definition 2.2 (super-passage). A super-passage of a process is a maximal non-empty sequence of consecutive passages
executed by the process, where only the last passage of the process in the sequence is failure-free.
[Figure 1: Lifecycle of a node: Free → Allocated → Retired → Reclaimed (and back to Allocated on reuse)]
The responsibility of a memory reclamation service is to provide safe reclamation (defined later) of a node once it is retired. The responsibility of retiring the node, on the other hand, typically rests with the programmer who consumes the memory reclamation service.
In our work, we assume that nodes are reused (instead of freed), once they are reclaimed. As a result, the lifecycle of
a node follows four (logical) stages: (1) Free, (2) Allocated, (3) Retired, (4) Reclaimed. The lifecycle of a node follows the pattern shown in Figure 1. Initially, a node is assumed to be in the Free stage. Once it is assigned by the new_node() method, it is in the Allocated stage. After getting retired, it is in the Retired stage, and finally, it is moved to the Reclaimed stage by the memory reclamation algorithm. Once a node is reclaimed, it can be reused and will move to the Allocated stage, and so on.
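The lifecycle can be captured as a small state machine (illustrative; the stage names follow Figure 1, and the transition table is introduced here only to make the cycle explicit):

```python
from enum import Enum


class Stage(Enum):
    FREE = 1
    ALLOCATED = 2
    RETIRED = 3
    RECLAIMED = 4


# Legal transitions in the node lifecycle of Figure 1. Note that a
# reclaimed node moves back to ALLOCATED on reuse, not to FREE.
TRANSITIONS = {
    Stage.FREE: Stage.ALLOCATED,       # new_node()
    Stage.ALLOCATED: Stage.RETIRED,    # retire(node)
    Stage.RETIRED: Stage.RECLAIMED,    # memory reclamation algorithm
    Stage.RECLAIMED: Stage.ALLOCATED,  # reuse via new_node()
}


def advance(stage):
    return TRANSITIONS[stage]


s = Stage.FREE
for _ in range(4):
    s = advance(s)
# FREE -> ALLOCATED -> RETIRED -> RECLAIMED -> ALLOCATED
```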
Designing a memory reclamation scheme for recoverable mutual exclusion (RME) algorithms involves designing
the new_node( ) and retire(node) methods such that the following correctness properties are satisfied.
Safe reclamation For any history H, if process p_i accesses a node x, then either x is local to p_i, or x is in the Allocated or Retired stage.
Note that any RME algorithm only requires a single node at any given point in time. Thus, we would want multiple
executions of the new_node( ) method to return the same node until the node is retired. Similarly, we want to allow
the same node to be retired multiple times until a new node is requested.
Idempotent allocation Given any history H, a process p_i, and a pair of operations, op1 and op2, of the new_node() method invoked by p_i, if there does not exist an invocation of retire(node) by p_i between op1 and op2, then either both these operations returned the same node in H, or at least one of these operations ended with a crash.
Idempotent retirement Given any history H, a process p_i, and a pair of operations, op1 and op2, of the retire(node) method invoked by p_i, if there does not exist an invocation of new_node() by p_i between op1 and op2, then either history H' = H − {op1} or H'' = H − {op2} or both are equivalent to H.
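A toy single-process sketch of the two idempotence properties (the class IdempotentAllocator is hypothetical and ignores crashes and concurrency; it only shows the intended repeat-call behaviour):

```python
class IdempotentAllocator:
    """Repeated new_node() calls return the same node until retire()
    intervenes, and repeated retire() calls are no-ops until a new node
    is requested. Illustrative only."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.current = None  # node handed out by the last new_node()

    def new_node(self):
        if self.current is None:     # only advance after a retire()
            self.current = self.nodes.pop(0)
        return self.current

    def retire(self):
        self.current = None          # retiring again has no further effect


a = IdempotentAllocator(["x", "y"])
assert a.new_node() == a.new_node() == "x"  # idempotent allocation
a.retire()
a.retire()                                   # idempotent retirement
b = a.new_node()
```

This repeat-call behaviour is what lets a recovering process safely re-execute whichever method it crashed in.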
In case of failures, it is the responsibility of the underlying algorithm to detect if the failure occurred while executing
any method of the memory reclamation code and if so, re-execute the same method.
model implemented by the underlying hardware architecture. In particular, we consider the two most popular shared
memory models:
Cache Coherent (CC) The CC model assumes a centralized main memory. Each process has access to the central
shared memory in addition to its local cache memory. The shared variables, when needed, are cached in the
local memory. These variables may be invalidated if updated by another process. Reading from an invalidated
variable causes a cache miss and requires the variable's value to be fetched from the main memory. Similarly, writes to shared variables are performed on the main memory. Under this model, a remote memory reference occurs each time there is a fetch operation from or a write operation to the main memory.
Distributed Shared Memory (DSM) The DSM model has no centralized memory. Shared variables reside on individual
process nodes. These variables may be accessed by processes either via the interconnect or a local memory read,
depending on where the variable resides. Under this model, a remote memory reference occurs when a process
needs to perform any operation on a variable that does not reside in its own node’s memory.
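As a toy illustration of the difference between the two models (the helper functions below are hypothetical accounting rules, not from the paper): a process spinning on a flag it hosts locally incurs no RMRs under DSM, and only the initial cache miss under CC.

```python
# Toy RMR accounting for a single read of a variable.
def rmr_cost_cc(cached_valid):
    # CC model: a read is remote only on a cache miss
    # (first read, or after invalidation by a writer).
    return 0 if cached_valid else 1


def rmr_cost_dsm(var_home, reader):
    # DSM model: a read is remote whenever the variable
    # resides on another process's node.
    return 0 if var_home == reader else 1


spin_reads = 1000

# Process 1 spins on a flag stored on its own node: free under DSM.
dsm_total = sum(rmr_cost_dsm(var_home=1, reader=1) for _ in range(spin_reads))

# Under CC, only the first read misses; subsequent reads hit the cache
# until a writer invalidates the line.
cc_total = rmr_cost_cc(cached_valid=False) + sum(
    rmr_cost_cc(cached_valid=True) for _ in range(spin_reads - 1))
```

This is why RME algorithms arrange for each process to spin on a locally hosted variable: unbounded spinning then costs O(1) RMRs in both models.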
An implementation of the Broadcast object is trivial in the CC model. Using a shared MRSW atomic integer, processes can achieve O(1) RMR complexity for Set, Wait and Read. However, this approach does not work for the DSM model.
In the DSM model, each shared variable resides on a single processor node. Thus, if processes wait by spinning on the
same shared variable, some processes (from remote nodes) will incur an unbounded number of RMRs. Thus, each
process needs to spin on a variable stored in its local processor node. In this case, process p_w needs to broadcast its Set(x) operation to ensure that all processes that are spinning due to an invocation of the Wait(x) operation are subsequently signalled. This action could potentially incur O(n) RMRs for the Set(x) operation. Thus, a constant-RMR implementation of the Broadcast object for the DSM model is non-trivial.
We present an efficient implementation of the Broadcast object for the DSM model in algorithm 2. This implementation incurs O(1) RMRs for Set, Wait and Read and utilizes O(n) space per Broadcast object. The main idea in our implementation of the Broadcast object is a wakeup chain, created by the designated process p_w, such that each process in the wakeup chain wakes up the next process in the chain. To trigger the wakeup in the wakeup chain, process p_w only needs to wake up the first process in the chain.
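The wakeup-chain idea can be illustrated with a toy thread-based sketch (this is not the paper's Algorithm 2: threading.Event stands in for spinning on a per-process local variable, and the chain order is fixed in advance for simplicity):

```python
import threading

N = 4
flags = [threading.Event() for _ in range(N)]  # one "local" flag per process
woken = []

def waiter(i):
    flags[i].wait()          # stand-in for spinning on a locally hosted variable
    woken.append(i)          # this process is now signalled...
    if i + 1 < N:
        flags[i + 1].set()   # ...and wakes the next process in the chain

threads = [threading.Thread(target=waiter, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

flags[0].set()               # p_w signals only the head of the chain
for t in threads:
    t.join()
```

Each signalling step touches exactly one remote flag, so the broadcaster's own cost stays constant while the O(n) work of waking everyone is spread across the waiters themselves.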
Algorithm 2: Pseudocode for the Broadcast object for process p_i for the DSM model
Definition 4.1 (Grace period). A grace period is a time interval [𝑎, 𝑏] such that all nodes retired before time 𝑎 are safe
to be reclaimed after time 𝑏.
Definition 4.2 (Quiescent state). A process is said to be in a quiescent state at a certain point in time if it cannot
access any node from another process using only its local variables.
Note that quiescent states are defined within the context of an algorithm. Different algorithms may encompass
different quiescent states. In the context of quiescent states, a grace period is a time interval that overlaps with at least
one quiescent state of each process. In order to reuse a node, a process, say 𝑝𝑖 , must first retire its node and then wait
for at least one complete grace period to safely reclaim the node. After one complete grace period has elapsed, it is safe
to assume that no process would be able to acquire any access to that node.
Main idea: In the case of RME algorithms, we assume that when a process is in the NCS, it is in a quiescent state. It suffices to say that after p_i retires its node, if some process p_j (j ≠ i) is in the NCS segment, then p_j would be unable to access that node thereafter. In order to safely reuse (reclaim) a node, process p_i determines its grace period in two phases, the snapshot phase and the waiting phase. In the snapshot phase, p_i takes a snapshot of the status of all
processes and, in the waiting phase, p_i waits till each process has been in the NCS segment at least once during or after its respective snapshot. In order to avoid the RMR overhead of scanning through all processes at once, p_i executes each phase incrementally, one step at a time.
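Under these assumptions, the two phases can be sketched as follows (single-threaded and with hypothetical counter values; the actual algorithm spreads this work across invocations and waits via the Broadcast object rather than polling):

```python
N = 3
start = [5, 2, 7]   # per-process count of NCS departures (hypothetical values)
finish = [5, 1, 7]  # per-process count of completed passages

snapshot = list(start)  # snapshot phase: record every process's progress


def grace_period_elapsed():
    # Waiting phase: every process has been quiescent (back in the NCS)
    # at least once at or after its snapshotted point.
    return all(finish[j] >= snapshot[j] for j in range(N))


before = grace_period_elapsed()  # process 1 has not caught up yet
finish[1] = 2                    # process 1 completes its passage (enters NCS)
after = grace_period_elapsed()
```

Once every finish counter has caught up, the interval from the snapshot to that point overlaps a quiescent state of each process, i.e., a grace period has elapsed.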
Our memory reclamation algorithm provides two methods: (1) new_node() and (2) retire_last_node(). Pseudocode of the memory reclamation algorithm is presented in algorithm 3. Any RME algorithm that needs to dynamically allocate memory can utilize our memory reclamation algorithm by invoking these two methods. The new_node() method returns a "node" that is required by a process to enter the CS of the RME algorithm. Similarly, while leaving the CS, the retire_last_node() method retires the node used to enter the CS. Our algorithm assumes (and relies on) the fact that each process requests a new node each time before entering the CS and retires its node prior to entering the NCS segment. The RMR overhead of our algorithm is O(1), while the space overhead is O(n² · sizeof(node)).
To obtain a node, a process invokes the new_node() method (line 19). This indicates that the process has left the NCS, so it increments the start[i] counter (line 22). Similarly, once a process needs to retire a node, it invokes the retire_last_node() method (line 26), wherein it updates the finish counter (line 28). The start and finish counters are guarded by if-blocks (line 20, line 27) to guarantee idempotence in case of multiple failures.
A process can consume nodes from the active pool only after taking steps towards reclaiming nodes from the reserve pool (line 21). The memory reclamation steps are implemented in the Step() method, whose role is two-fold. First, it advances the local variable index during each successful execution in order to guarantee a fresh node on every invocation of the new_node() method. Second, the Step() method performs memory reclamation in three phases.
(1) Snapshot (line 33): p_i takes a snapshot of start[j] for all j ∈ {1, . . . , n}.
(2) Waiting (line 37): p_i waits for finish[j] to "catch up" to start[j] using a Broadcast object as described in section 3. Simply put, p_i waits for very old unsatisfied requests of other processes to be satisfied. In this context, a request is very old if p_i overtook it n times. This ensures that each process has been in a quiescent state before p_i moves to the pool swapping phase.
(3) Pool swap (line 41 and line 44): If process p_i reaches this phase, it implies that at least one grace period has elapsed since the nodes in the reserve pool were retired. At this point, it is safe to reuse nodes from the reserve pool, and p_i simply swaps the active and reserve pools. In order to account for failures, this swap occurs over two invocations of the Step() method, and the index variable is then reset (line 45).
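The three phases, executed one O(1) step per invocation, can be sketched as follows (the class name Stepper and the pool contents are hypothetical; the real Step() additionally persists its state for recovery and blocks via the Broadcast object instead of simply not advancing):

```python
class Stepper:
    """Sketch of the incremental three-phase Step(). Illustrative only."""

    def __init__(self, n, start, finish):
        self.n, self.start, self.finish = n, start, finish
        self.snap = [0] * n
        self.phase, self.j = 0, 0            # 0: snapshot, 1: waiting, 2: swap
        self.active, self.reserve = list("ab"), list("cd")
        self.swapped = False

    def step(self):
        if self.phase == 0:                  # phase 1: snapshot one counter
            self.snap[self.j] = self.start[self.j]
            self.j += 1
            if self.j == self.n:
                self.phase, self.j = 1, 0
        elif self.phase == 1:                # phase 2: check one counter
            if self.finish[self.j] >= self.snap[self.j]:
                self.j += 1                  # (the real algorithm waits here)
                if self.j == self.n:
                    self.phase = 2
        else:                                # phase 3: swap the pools
            self.active, self.reserve = self.reserve, self.active
            self.phase, self.j, self.swapped = 0, 0, True


s = Stepper(2, start=[1, 1], finish=[1, 1])
for _ in range(5):   # 2 snapshot steps + 2 waiting steps + 1 swap step
    s.step()
```

Because every invocation does a constant amount of work, the reclamation cost is amortized at O(1) RMRs per passage.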
Note that the algorithm is designed in such a way that multiple executions of the new_node() method return the same node until the retire_last_node() method is called, and vice versa. This design introduces idempotence and accommodates the failure scenario where a process crashes before being able to capture the node returned by the new_node() method.
5 APPLICATIONS
Golab and Ramaraju’s algorithms [12] have a bounded space complexity, but use the MCS algorithm as their base lock.
The space complexity of the MCS algorithm may grow unboundedly. Using our memory reclamation algorithm, we
can bound the space complexity of their algorithms.
Two known sub-logarithmic RME algorithms, one by Golab and Hendler [10] and one by Jayanti, Jayanti, and Joshi [16], both use MCS queue-based structures. Memory reclamation in these algorithms is not trivial and requires careful analysis and proofs. Our memory reclamation algorithm fits perfectly with these algorithms. The main idea is to employ one instance of the memory reclamation algorithm at each level of the sub-logarithmic arbitration tree. As a result, the overall space complexity of these algorithms can be bounded by O(n³).
Dhoked and Mittal's algorithm [7] also uses an MCS queue-based structure whose space complexity may grow unboundedly. Using a separate instance of our memory reclamation algorithm for each level of their adaptive algorithm, we can bound the space complexity of their algorithm to O(n² · log n / log log n).
6 RELATED WORK
6.1 Memory reclamation
In [20], Michael introduced hazard pointers, a wait-free technique for memory reclamation that only requires a bounded amount of space. Hazard pointers are special shared pointers that protect nodes from getting reclaimed; such nodes can be safely accessed. Any node that is not protected by a hazard pointer is assumed to be safe to reclaim. Being shared pointers, hazard pointers are expensive to read and update.
In [9], Fraser devised a technique called epoch-based reclamation (EBR). As the name suggests, the algorithm maintains an epoch counter e and three limbo lists corresponding to epochs e − 1, e, and e + 1. The main idea is that nodes retired in epoch e − 1 are safe to be reclaimed in epoch e + 1. This approach is not lock-free, and a slow process may cause the size of the limbo lists to increase unboundedly.
In [18], Mckenney and Slingwine present the RCU framework where they demonstrate the use of quiescent state
based reclamation (QSBR). QSBR relies on detecting quiescent states and a grace period during which each thread
passes through at least one quiescent state. Nodes retired before the grace period are safe to be reclaimed after the
grace period. In [3], Arcangeli et al. make use of the RCU framework and QSBR reclamation for the System V IPC in
the Linux kernel.
In [5], Brown presents DEBRA and DEBRA+ reclamation schemes. DEBRA is a distributed extension of EBR where
each process maintains its individual limbo lists instead of shared limbo lists, and epoch computation is performed incrementally. DEBRA+ relies on hardware assistance from the operating system to provide signalling in order to prevent slow or stalled processes from accessing reclaimed memory.
satisfies some notion of fairness. Another direction of work involves formulating the problem of memory reclamation
for recoverable lock-free data structures and designing algorithms for the same.
REFERENCES
[1] AMD 2019. AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions. AMD.
https://www.amd.com/system/files/TechDocs/24594.pdf
[2] J. H. Anderson and Y.-J. Kim. 2002. An Improved Lower Bound for the Time Complexity of Mutual Exclusion. Distributed Computing (DC) 15, 4
(Dec. 2002), 221–253. https://doi.org/10.1007/s00446-002-0084-2
[3] Andrea Arcangeli, Mingming Cao, Paul E McKenney, and Dipankar Sarma. 2003. Using Read-Copy-Update Techniques for System V IPC in the
Linux 2.5 Kernel. In USENIX Annual Technical Conference, FREENIX Track. 297–309.
[4] H. Attiya, D. Hendler, and P. Woelfel. 2008. Tight RMR Lower Bounds for Mutual Exclusion and Other Problems. In Proceedings of the
40th Annual ACM Symposium on Theory of Computing (STOC) (Victoria, British Columbia, Canada). ACM, New York, NY, USA, 217–226.
https://doi.org/10.1145/1374376.1374410
[5] T. A. Brown. 2015. Reclaiming Memory for Lock-Free Data Structures: There Has to Be a Better Way. In Proceedings of the ACM Symposium on
Principles of Distributed Computing (PODC). ACM, Donostia-San Sebastián, Spain, 261–270.
[6] D. Y. C. Chan and P. Woelfel. 2020. Recoverable Mutual Exclusion with Constant Amortized RMR Complexity from Standard Primitives. In
Proceedings of the 39th ACM Symposium on Principles of Distributed Computing (PODC). ACM, New York, NY, USA, 10 pages.
[7] Sahil Dhoked and Neeraj Mittal. 2020. An Adaptive Approach to Recoverable Mutual Exclusion. In Proceedings of the 39th Symposium
on Principles of Distributed Computing (Virtual Event, Italy) (PODC ’20). Association for Computing Machinery, New York, NY, USA, 1–10.
https://doi.org/10.1145/3382734.3405739
[8] E. W. Dijkstra. 1965. Solution of a Problem in Concurrent Programming Control. Communications of the ACM (CACM) 8, 9 (1965), 569.
[9] K. Fraser. 2004. Practical Lock-Freedom. Ph.D. Dissertation. University of Cambridge.
[10] W. Golab and D. Hendler. 2017. Recoverable Mutual Exclusion in Sub-Logarithmic Time. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Washington, DC, USA). ACM, New York, NY, USA, 211–220. https://doi.org/10.1145/3087801.3087819
[11] W. Golab and D. Hendler. 2018. Recoverable Mutual Exclusion Under System-Wide Failures. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Egham, United Kingdom). ACM, New York, NY, USA, 17–26. https://doi.org/10.1145/3212734.3212755
[12] W. Golab and A. Ramaraju. 2016. Recoverable Mutual Exclusion: [Extended Abstract]. In Proceedings of the ACM Symposium on Principles of
Distributed Computing (PODC) (Chicago, Illinois, USA). ACM, New York, NY, USA, 65–74. https://doi.org/10.1145/2933057.2933087
[13] W. Golab and A. Ramaraju. 2019. Recoverable Mutual Exclusion. Distributed Computing (DC) 32, 6 (Nov. 2019), 535–564.
[14] Thomas E Hart, Paul E McKenney, Angela Demke Brown, and Jonathan Walpole. 2007. Performance of memory reclamation for lockless
synchronization. J. Parallel and Distrib. Comput. 67, 12 (2007), 1270–1285.
[15] Intel 2016. Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M. Intel.
https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf
[16] P. Jayanti, S. Jayanti, and A. Joshi. 2019. A Recoverable Mutex Algorithm with Sub-logarithmic RMR on Both CC and DSM. In
Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC) (Toronto ON, Canada). ACM, New York, NY, USA, 177–186.
https://doi.org/10.1145/3293611.3331634
[17] P. Jayanti and A. Joshi. 2017. Recoverable FCFS Mutual Exclusion with Wait-Free Recovery. In Proceedings of the 31st Symposium on Distributed
Computing (DISC) (Vienna, Austria), Andréa W. Richa (Ed.), Vol. 91. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 30:1–
30:15. https://doi.org/10.4230/LIPIcs.DISC.2017.30
[18] Paul E McKenney and John D Slingwine. 1998. Read-copy update: Using execution history to solve concurrency problems. In Parallel and Distributed
Computing and Systems (PDCS). 509–518.
[19] J. M. Mellor-Crummey and M. L. Scott. 1991. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Transactions on
Computer Systems 9, 1 (Feb. 1991), 21–65. https://doi.org/10.1145/103727.103729
[20] M. M. Michael. 2004. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Transactions on Parallel and Distributed Systems
(TPDS) 15, 6 (2004), 491–504.
[21] D. Narayanan and O. Hodson. 2012. Whole-System Persistence. In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS) (London, UK). ACM, New York, NY, USA, 401–410.
[22] A. Ramaraju. 2015. RGLock: Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems. Master’s thesis. Electrical and Computer
Engineering Department, University of Waterloo. http://hdl.handle.net/10012/9473
[23] Haosen Wen, Joseph Izraelevitz, Wentao Cai, H. Alan Beadle, and Michael L. Scott. 2018. Interval-Based Memory Reclamation. In Proceedings of
the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Vienna, Austria) (PPoPP ’18). Association for Computing
Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3178487.3178488
[24] J.-H. Yang and J. H. Anderson. 1995. A Fast, Scalable Mutual Exclusion Algorithm. Distributed Computing (DC) 9, 1 (March 1995), 51–60.
https://doi.org/10.1007/BF01784242