
Define The Terms: Rollback Propagation.: Coordinated Checkpointing

The document discusses various concepts in distributed computing, including rollback propagation, outside world processes, and types of checkpointing. It compares coordinated and uncoordinated checkpointing, details the Koo–Toueg coordinated checkpointing algorithm, and explains the rollback recovery algorithm. Additionally, it describes the Juang–Venkatesan algorithm for asynchronous checkpointing and recovery, emphasizing the importance of consistent checkpoints and message logging.

Uploaded by

spartanzaugust2019


FT 6 Distributed Computing

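1. Define the term: rollback propagation.

In a distributed system, processes exchange messages, and each message creates a dependency between its sender and its receiver. Upon a failure of one or more processes, these dependencies may force some of the processes that did not fail to roll back as well, creating what is commonly called a rollback propagation. In the worst case the rollbacks cascade all the way back to the initial state of the computation (the domino effect).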
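Rollback propagation can be sketched as a fixpoint computation. This is a minimal illustration under stated assumptions, not an algorithm from these notes. Events are numbered 1, 2, 3, ... per process. A checkpoint value c means "events 1..c are recorded", and every process's checkpoint list includes 0 for the initial state. The function name `propagate_rollback` and the message tuple layout are hypothetical. The failed process restarts from its latest checkpoint. Any message whose receive survives while its send was rolled back (an orphan) then forces the receiver back to an earlier checkpoint, even if the receiver never failed.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (sender, send_event, receiver, receive_event).
# Events are numbered 1, 2, 3, ... per process; a checkpoint value c means
# "events 1..c are recorded". Every checkpoint list must contain 0 (initial state).
Message = Tuple[str, int, str, int]


def propagate_rollback(checkpoints: Dict[str, List[int]],
                       current: Dict[str, int],
                       messages: List[Message],
                       failed: str) -> Dict[str, int]:
    """Return how many events of each process survive after rollback propagation."""
    cut = dict(current)
    # The failed process restarts from its latest checkpoint.
    cut[failed] = max(c for c in checkpoints[failed] if c <= current[failed])
    changed = True
    while changed:
        changed = False
        for sender, s_evt, receiver, r_evt in messages:
            # Orphan: the receive survives but the matching send was rolled back.
            if r_evt <= cut[receiver] and s_evt > cut[sender]:
                # The receiver (failed or not) must roll back to its latest
                # checkpoint taken before that receive.
                cut[receiver] = max(c for c in checkpoints[receiver] if c < r_evt)
                changed = True
    return cut
```

For example, suppose P fails at event 4 with checkpoints [0, 2] and Q has checkpoints [0, 1]. The messages are Q's event 2 received as P's event 1, and P's event 3 received as Q's event 3. P's rollback to 2 orphans the second message, which drags Q back to 1. That in turn orphans the first message and drags P back to its initial state, a small instance of the domino effect. The result is `{"P": 0, "Q": 1}`.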
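2. What is meant by “outside world process (OWP)”?
A distributed system often interacts with the outside world to receive input data or deliver the outcome of a computation. If a failure occurs, the outside world cannot be expected to roll back: for example, a printer cannot undo a printed character, and an automatic teller machine cannot recover money it has already dispensed.
Outside World Process (OWP)
A special process that interacts with the rest of the system through message passing. It stands for the outside world, which must see consistent behavior of the system despite failures.
A common approach
Save each input message on stable storage before allowing the application program to process it, because the OWP may not be able to regenerate that input after a failure.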
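Input logging for OWP messages can be sketched as follows, assuming a local file that is flushed and `fsync`ed is an acceptable stand-in for stable storage. That is an approximation: real stable storage may need replication. The class `StableInputLog`, its methods and the `demo` scenario are hypothetical. The key point is ordering: the input is durable before the application handler runs, so it can be replayed after a crash even though the outside world will not resend it.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple


class StableInputLog:
    """Hypothetical append-only log of OWP inputs kept on 'stable' storage."""

    def __init__(self, path: str) -> None:
        self.path = path

    def deliver(self, msg: dict, handler: Callable[[dict], int]) -> int:
        # Force the input to disk BEFORE the application sees it.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        return handler(msg)

    def replay(self, handler: Callable[[dict], int]) -> List[int]:
        # After recovery, re-feed every logged input in its original order.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [handler(json.loads(line)) for line in f]


def demo() -> Tuple[List[int], List[int]]:
    path = os.path.join(tempfile.mkdtemp(), "owp_inputs.log")
    log = StableInputLog(path)
    totals: List[int] = []

    def handler(msg: dict) -> int:
        # Toy application state: a running sum of the amounts received.
        totals.append(msg["amount"])
        return sum(totals)

    live = [log.deliver({"amount": a}, handler) for a in (10, 20, 5)]
    totals.clear()                    # crash: the volatile application state is lost
    replayed = log.replay(handler)    # recovery: rebuild it from the stable log
    return live, replayed
```

`demo()` returns `([10, 30, 35], [10, 30, 35])`: the state rebuilt from the log matches the state before the simulated crash.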

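3. What are the two types of communication-induced checkpointing?

Two types of communication-induced checkpointing:
Model-based checkpointing: prevents patterns of communications and checkpoints that could result in inconsistent states among the existing checkpoints.
Index-based checkpointing: assigns monotonically increasing indices to checkpoints, such that checkpoints having the same index at different processes form a consistent state.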
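The index-based idea can be sketched with one common rule, offered here as an illustration rather than the only variant. Each process piggybacks its current checkpoint index on every message. A receiver whose index is lower takes a forced checkpoint with the sender's index before delivering the message. The class `IndexedProcess` and the `demo` scenario are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexedProcess:
    """Hypothetical process for an index-based CIC sketch (index 0 = initial state)."""
    name: str
    index: int = 0                                   # index of latest local checkpoint
    checkpoints: List[Tuple[int, str]] = field(default_factory=list)

    def basic_checkpoint(self) -> None:
        # Checkpoint taken on the process's own schedule: next index.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload: str) -> Tuple[int, str]:
        # Piggyback the current checkpoint index on every application message.
        return (self.index, payload)

    def receive(self, message: Tuple[int, str]) -> str:
        sender_index, payload = message
        if sender_index > self.index:
            # Forced checkpoint BEFORE delivery: the message was sent after the
            # sender's checkpoint `sender_index`, so this process's checkpoint with
            # that index must not record its receipt (no orphan in that line).
            self.index = sender_index
            self.checkpoints.append((self.index, "forced"))
        return payload


def demo():
    p, q = IndexedProcess("P"), IndexedProcess("Q")
    p.basic_checkpoint()          # P takes checkpoint #1
    q.receive(p.send("m1"))       # Q must first take a forced checkpoint #1
    q.receive(p.send("m2"))       # indices are now equal: no forced checkpoint
    return p.checkpoints, q.checkpoints
```

`demo()` returns `([(1, "basic")], [(1, "forced")])`, so P's and Q's checkpoints with index 1 form a consistent pair.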

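4. Formulate the different types of messages.

In-transit messages: messages that have been sent but not yet received.
Lost messages: messages whose send is not undone but whose receive is undone due to rollback.
Orphan messages: messages whose receive is recorded but whose send is not; they do not arise if processes roll back to a consistent global state.
Duplicate messages: messages that arise from message logging and replaying during process recovery.
Delayed messages: messages whose receive is not recorded because the receiving process was either down or the message arrived after the receiving process rolled back.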
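Given a recovery line, some of these categories can be told apart mechanically. The sketch below is a minimal illustration assuming events are numbered 1, 2, ... per process, `cut[p]` is the number of events of p kept after rollback, and `recv_evt` is `None` when the message was still in the channel at the time of failure. It covers in-transit, lost and orphan messages only. Delayed and duplicate messages need timing and replay information that this model does not track. The function `classify` and the label `"undone"` are hypothetical.

```python
from typing import Dict, Optional


def classify(cut: Dict[str, int], sender: str, send_evt: int,
             receiver: str, recv_evt: Optional[int]) -> str:
    """Classify one message against a recovery line (hypothetical helper)."""
    send_kept = send_evt <= cut[sender]
    recv_kept = recv_evt is not None and recv_evt <= cut[receiver]
    if recv_kept and not send_kept:
        return "orphan"        # receive recorded, send undone
    if send_kept and recv_evt is None:
        return "in-transit"    # sent, not yet received
    if send_kept and not recv_kept:
        return "lost"          # send kept, receive undone by rollback
    if not send_kept:
        return "undone"        # both ends rolled back; re-execution regenerates it
    return "consistent"        # both send and receive are recorded
```

With `cut = {"P": 2, "Q": 3}`, a message sent at P's event 2 and received at Q's event 5 is `"lost"`, while one sent at Q's event 4 and received at P's event 1 is an `"orphan"`.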
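5. Compare coordinated checkpointing and uncoordinated checkpointing.

Coordinated Checkpointing:

Processes coordinate their checkpoints (for example, with a two-phase protocol such as Koo–Toueg) so that the set of local checkpoints always forms a consistent global state. This simplifies recovery: each process restarts from its latest permanent checkpoint, the domino effect cannot occur, and each process needs to keep only one permanent checkpoint on stable storage. The cost is the extra coordination messages and, in blocking protocols, a pause in the computation while the checkpoint is being taken.

Uncoordinated Checkpointing:

Each process takes checkpoints independently, whenever it is convenient, without synchronizing with other processes. This reduces checkpointing overhead and increases flexibility, but the local checkpoints may not form a consistent state. Recovery must then search for a consistent set of checkpoints (a recovery line), which may lead to the domino effect. Processes must also keep multiple checkpoints, periodically garbage-collect obsolete ones, and record dependency information, for example by piggybacking it on messages or through message logging.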

6. i) Summarize the Koo–Toueg coordinated checkpointing algorithm.

A coordinated checkpointing and recovery technique that takes a consistent set of checkpoints and avoids the domino effect and livelock problems during recovery.
It has 2 parts: the checkpointing algorithm and the recovery algorithm.
Checkpointing algorithm
Assumptions: FIFO channels; end-to-end protocols (to cope with message loss due to rollback recovery and communication failures); communication failures do not partition the network; a single process initiates the algorithm; no process fails during the execution of the algorithm.
Two kinds of checkpoints: permanent and tentative.
Permanent checkpoint: a local checkpoint that is part of a consistent global checkpoint.
Tentative checkpoint: a temporary checkpoint that becomes a permanent checkpoint when the algorithm terminates successfully.

Checkpointing algorithm
First phase

An initiating process Pi takes a tentative checkpoint and requests all other processes to take tentative
checkpoints. Each process informs Pi whether it succeeded in taking a tentative checkpoint. A process
says “no” to a request if it fails to take a tentative checkpoint, which could be due to several reasons,
depending upon the underlying application. If Pi learns that all the processes have successfully taken
tentative checkpoints, Pi decides that all tentative checkpoints should be made permanent; otherwise,
Pi decides that all the tentative checkpoints should be discarded.

Second phase

Pi informs all the processes of the decision it reached at the end of the first phase. A process, on
receiving the message from Pi, will act accordingly.

Therefore, either all or none of the processes advance the checkpoint by taking permanent checkpoints.

The algorithm requires that after a process has taken a tentative checkpoint, it cannot send messages
related to the underlying computation until it is informed of Pi’s decision.

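Correctness: the set of permanent checkpoints taken by the algorithm is consistent for 2 reasons:

Either all or none of the processes take permanent checkpoints.
No process sends a message after taking a tentative checkpoint until it receives the initiator's decision, by which time all processes have taken their checkpoints.
Optimization: not all processes need to take checkpoints. A process needs a new checkpoint only if, since its last checkpoint, it has sent a message whose receipt is recorded in another process's new tentative checkpoint; otherwise its last permanent checkpoint is already consistent with the new ones.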
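The two phases can be sketched as a single-threaded simulation. This assumes no process fails during the run and that each process's state can be copied as a plain value. Message passing is modeled as direct method calls. The `Process` fields (in particular `can_checkpoint`, which models a "no" vote) and the function `koo_toueg_checkpoint` are hypothetical names, not part of the original description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Process:
    name: str
    state: int = 0                       # stand-in for the process's local state
    can_checkpoint: bool = True          # hypothetical: False models a "no" reply
    permanent: Optional[int] = None      # last permanent checkpoint
    tentative: Optional[int] = None
    blocked: bool = False                # no application sends while True

    def take_tentative(self) -> bool:
        if not self.can_checkpoint:
            return False                 # reply "no" to the request
        self.tentative = self.state
        self.blocked = True              # wait for the initiator's decision
        return True

    def on_decision(self, commit: bool) -> None:
        if commit:
            self.permanent = self.tentative   # tentative -> permanent
        self.tentative = None                  # otherwise it is discarded
        self.blocked = False                   # sending may resume


def koo_toueg_checkpoint(initiator: Process,
                         others: List[Process]) -> Tuple[bool, List[Optional[int]]]:
    everyone = [initiator, *others]
    # Phase 1: the initiator takes a tentative checkpoint and asks all others to.
    votes = [p.take_tentative() for p in everyone]
    commit = all(votes)
    # Phase 2: broadcast the decision; all or none make their checkpoints permanent.
    for p in everyone:
        p.on_decision(commit)
    return commit, [p.permanent for p in everyone]
```

If every process says "yes", all tentative checkpoints become permanent, e.g. `(True, [5, 7, 9])`. A single "no" discards them all and leaves the previous permanent checkpoints untouched.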

ii) Explain the rollback recovery algorithm.

The rollback recovery algorithm

Goal: restore the system to a consistent state after a failure. Assumptions: a single initiator, and the checkpointing and rollback recovery algorithms are not invoked concurrently.
It has 2 phases:
The initiating process sends a message to all other processes asking whether they are willing to restart from their previous checkpoints; all of them must agree for the rollback to take place.
The initiating process sends the final decision to all processes, and each process acts accordingly after receiving it.

First phase

An initiating process Pi sends a message to all other processes to check whether they are all willing to restart from their previous checkpoints. A process may reply “no” to a restart request for any reason (e.g., it is already participating in a checkpoint or recovery process initiated by some other process). If Pi learns that all processes are willing to restart from their previous checkpoints, Pi decides that all processes should roll back to their previous checkpoints.

Otherwise, Pi aborts the rollback attempt and it may attempt a recovery at a later time.

Second phase

Pi propagates its decision to all the processes. On receiving Pi’s decision, a process acts accordingly.

During the execution of the recovery algorithm, a process cannot send messages related to the
underlying computation while it is waiting for Pi’s decision.
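The recovery protocol can be sketched in the same single-threaded style, assuming each process's state and last permanent checkpoint are plain values. Message passing is modeled as method calls. The `busy` flag is a hypothetical stand-in for "already participating in another checkpoint or recovery session", and `rollback_recovery` is an illustrative name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Process:
    name: str
    checkpoint: int            # state saved in the last permanent checkpoint
    state: int                 # current state (possibly inconsistent after a failure)
    busy: bool = False         # hypothetical: already in another session
    waiting: bool = False      # no application sends while waiting for the decision

    def willing_to_restart(self) -> bool:
        if self.busy:
            return False       # reply "no" to the restart request
        self.waiting = True
        return True

    def on_decision(self, restart: bool) -> None:
        if restart:
            self.state = self.checkpoint   # roll back to the previous checkpoint
        self.waiting = False


def rollback_recovery(initiator: Process,
                      others: List[Process]) -> Tuple[bool, List[int]]:
    everyone = [initiator, *others]
    # Phase 1: ask every process (no short-circuit) whether it will restart.
    restart = all([p.willing_to_restart() for p in everyone])
    # Phase 2: propagate the decision; on "abort" the initiator may retry later.
    for p in everyone:
        p.on_decision(restart)
    return restart, [p.state for p in everyone]
```

When all processes agree, every state is reset to its checkpoint, e.g. `(True, [3, 4])`. If any process is busy, nobody rolls back and the initiator can try again later.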

7. Describe the Juang–Venkatesan algorithm for asynchronous checkpointing and recovery.

Assumptions: communication channels are reliable, messages are delivered in FIFO order, buffers are infinite, and message transmission delay is arbitrary but finite.
The underlying computation (application) is event-driven: a process P in state s receives a message m, processes it, moves to state s’ and sends messages out. The triplet (s, m, msgs_sent) therefore represents the state of P.

Two types of log storage are maintained:

Volatile log: short access time, but lost if the processor crashes; its contents are moved to the stable log periodically.
Stable log: longer access time, but survives a crash.
Asynchronous checkpointing:
After executing an event, a process records the triplet without any synchronization with other processes.
A local checkpoint consists of a set of such records; they are first stored in the volatile log and then moved to the stable log.
Recovery algorithm
Notations:
$\mathrm{RCVD}_{i \leftarrow j}(\mathrm{CkPt}_i)$: number of messages received by $p_i$ from $p_j$, from the beginning of the computation up to checkpoint $\mathrm{CkPt}_i$
$\mathrm{SENT}_{i \rightarrow j}(\mathrm{CkPt}_i)$: number of messages sent by $p_i$ to $p_j$, from the beginning of the computation up to checkpoint $\mathrm{CkPt}_i$
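Idea:
From the set of checkpoints, find a set of consistent checkpoints, based on the number of messages sent and received. A process that recovers from a failure broadcasts that it has failed, which starts the recovery at every process.
Initially, a process recovering from a failure sets $\mathrm{CkPt}_i$ to the latest event logged in its stable storage; every other process sets $\mathrm{CkPt}_i$ to the latest event that took place at it.
The algorithm then runs for $N$ iterations ($N$ = number of processes). In each iteration, $p_i$ sends $\mathrm{ROLLBACK}(i, \mathrm{SENT}_{i \rightarrow j}(\mathrm{CkPt}_i))$ to every neighbor $p_j$. For every $\mathrm{ROLLBACK}(j, c)$ it receives, if $\mathrm{RCVD}_{i \leftarrow j}(\mathrm{CkPt}_i) > c$, then $p_i$ has recorded the receipt of messages whose sending $p_j$ has undone (orphan messages). It therefore sets $\mathrm{CkPt}_i$ to the latest event $e$ such that $\mathrm{RCVD}_{i \leftarrow j}(e) \le c$.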
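Example with three processors X, Y and Z: Y fails and restarts from its latest checkpointed event $e_{y2}$. In the first iteration, Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z, X sends ROLLBACK(X, 2) to Y, and Z sends ROLLBACK(Z, 1) to Y.
Since $\mathrm{RCVD}_{X \leftarrow Y}(\mathrm{CkPt}_X) = 3 > 2$ (2 is the value received in the ROLLBACK(Y, 2) message from Y), X sets $\mathrm{CkPt}_X$ to $e_{x2}$, which satisfies $\mathrm{RCVD}_{X \leftarrow Y}(e_{x2}) = 2 \le 2$. Since $\mathrm{RCVD}_{Z \leftarrow Y}(\mathrm{CkPt}_Z) = 2 > 1$, Z sets $\mathrm{CkPt}_Z$ to $e_{z1}$, which satisfies $\mathrm{RCVD}_{Z \leftarrow Y}(e_{z1}) = 1 \le 1$. At Y, $\mathrm{RCVD}_{Y \leftarrow X}(\mathrm{CkPt}_Y) = 1 < 2$ and $\mathrm{RCVD}_{Y \leftarrow Z}(\mathrm{CkPt}_Y) = 1 = \mathrm{SENT}_{Z \rightarrow Y}(\mathrm{CkPt}_Z)$.

Hence, Y need not roll back further. In the second iteration, Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z; Z sends ROLLBACK(Z, 1) to Y and ROLLBACK(Z, 0) to X; X sends ROLLBACK(X, 0) to Z and ROLLBACK(X, 1) to Y. Note that if Y rolls back beyond $e_{y3}$ and loses the message from X that caused $e_{y3}$, X can resend this message to Y because $e_{x2}$ is logged at X and this message is available in the log. The second and third iterations progress in the same manner. Note that the set of recovery points chosen at the end of the first iteration, $\{e_{x2}, e_{y2}, e_{z1}\}$, is consistent, and no further rollback occurs.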
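The iterations can be sketched as a simplified, round-based simulation under these assumptions:

- Every process is a neighbor of every other process.
- Each logged event carries cumulative RCVD and SENT counters per neighbor.
- Index 0 of each history is the initial state.
- Message exchange within an iteration is modeled as a send step followed by a receive step.

The function `juang_venkatesan_recover`, the `Event` layout and the two-process `demo` scenario are hypothetical. The scenario is not the X, Y, Z example from the notes.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of a logged event: (RCVD counts per neighbor, SENT counts per neighbor).
Event = Tuple[Dict[str, int], Dict[str, int]]


def juang_venkatesan_recover(history: Dict[str, List[Event]],
                             start: Dict[str, int]) -> Dict[str, int]:
    """Return the index of the recovery event CkPt_i chosen for every process.

    start[i]: latest stably logged event for the failed process, latest executed
    event for every other process.
    """
    ckpt = dict(start)
    names = list(history)
    for _ in range(len(names)):  # N iterations, N = number of processes
        # Send step: p_i sends ROLLBACK(i, SENT_{i->j}(CkPt_i)) to each neighbor j.
        inbox: Dict[str, List[Tuple[str, int]]] = {j: [] for j in names}
        for i in names:
            sent = history[i][ckpt[i]][1]
            for j in names:
                if j != i:
                    inbox[j].append((i, sent.get(j, 0)))
        # Receive step: roll back while p_i records more receipts from j than j sent.
        for i in names:
            for j, c in inbox[i]:
                k = ckpt[i]
                while history[i][k][0].get(j, 0) > c:
                    k -= 1  # latest event e with RCVD_{i<-j}(e) <= c
                ckpt[i] = k
    return ckpt


def demo() -> Dict[str, int]:
    history = {
        "X": [({}, {}),                  # x0: initial state
              ({}, {"Y": 1}),            # x1: send to Y
              ({"Y": 1}, {"Y": 1}),      # x2: receive Y's 1st message
              ({"Y": 2}, {"Y": 1})],     # x3: receive Y's 2nd message
        "Y": [({}, {}),                  # y0: initial state
              ({}, {"X": 1}),            # y1: send to X (latest stably logged event)
              ({"X": 1}, {"X": 1}),      # y2: receive X's message (lost in the crash)
              ({"X": 1}, {"X": 2})],     # y3: send 2nd message to X (lost in the crash)
    }
    # Y crashed and restarts from y1; X starts from its latest event x3.
    return juang_venkatesan_recover(history, {"X": 3, "Y": 1})
```

`demo()` returns `{"X": 2, "Y": 1}`. X must drop x3 because Y no longer sent that message. X's message sent at x1 becomes a lost message, which X can resend from its log, in the same way as in the example above.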
