CIS 763: Notes On Faults and Fault-Tolerances

The document discusses different types of faults that can occur in distributed programs, including stuck-at, crash, and Byzantine faults. It then defines three types of fault tolerance: masking tolerance, where the program satisfies its safety and liveness specifications even in the presence of faults; nonmasking tolerance, where the program eventually satisfies both specifications after faults stop occurring; and fail-safe tolerance, where the program satisfies safety but not necessarily liveness in the presence of faults. Examples using sensor readings and modular redundancy are given to illustrate the different types of fault tolerance.

Uploaded by

harshvch

CIS 763: Notes on Faults and Fault-tolerances

Recall that, in the absence of faults, a program satisfies its safety and liveness specification. We prove this satisfaction by exhibiting an invariant predicate such that, in the absence of faults, the program is always at a state where the invariant predicate is true.

Faults. The faults that a distributed/network program is subject to may be categorized in a variety of ways:

Type: e.g., the faults are stuck-at, fail-stop, crash, omission, timing, performance, or Byzantine.
Duration: e.g., the faults are permanent, intermittent, or transient.
Observability: e.g., the faults are detectable or not.
Repair: e.g., the faults are correctable or not.

To reason about faults in a simple and uniform manner, we adopt the following thesis: Faults are systematically represented by actions whose execution perturbs the program state.

Definition (Fault-class). A fault-class for a program p is a set of actions over the variables of p.

Consider, for example, a fault that corrupts the state of a wire. The wire itself is represented by the following program action over two bit variables in and out:

out ≠ in → out := in

The fault that corrupts the state of the wire is represented by the fault action:

out = in → out := ?

where ? denotes a nondeterministically chosen binary value.

For this representation to capture all of the categories mentioned above sometimes requires the use of auxiliary state. For example, consider the fault by which the wire is stuck-at-low-voltage. In this case, the correct behavior of the wire is represented by using an auxiliary boolean variable broken and the program action:

out ≠ in ∧ ¬broken → out := in

If a fault occurs, the incorrect behavior of the wire is represented by the program action that sets out to 0 provided that the state of the wire is broken:

broken → out := 0

The stuck-at-low-voltage fault is represented by the fault action:

¬broken → broken := true

Continuing along these lines, consider process crashes. The crash of a process is represented by introducing an auxiliary variable up for that process, as follows. Each action of that process is to be executed only if up is true. The crash itself is modeled as the occurrence of a fault that corrupts up by setting it to false.

Similarly, the Byzantine behavior of a process can be captured by introducing an auxiliary variable good, as follows. If the variable good is true, then the process executes its normal actions. When a fault action corrupts good to false, the process executes actions whose behavior is nondeterministic.
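The stuck-at-low-voltage model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state is a dictionary with keys "in", "out", and "broken", and the function names wire_step and stuck_at_low_fault are hypothetical, not part of the notes.

```python
def wire_step(state):
    """Execute one enabled program action of the wire (illustrative sketch)."""
    s = dict(state)
    if s["broken"]:
        s["out"] = 0              # broken -> out := 0
    elif s["out"] != s["in"]:
        s["out"] = s["in"]        # out != in and not broken -> out := in
    return s


def stuck_at_low_fault(state):
    """Fault action: not broken -> broken := true."""
    s = dict(state)
    if not s["broken"]:
        s["broken"] = True
    return s


# A healthy wire copies its input to its output.
healthy = wire_step({"in": 1, "out": 0, "broken": False})

# After the fault action, the same program drives the output low.
faulty = wire_step(stuck_at_low_fault({"in": 1, "out": 1, "broken": False}))
```

The fault never touches out directly. It only changes the auxiliary variable, and the program's own actions then produce the faulty behavior, which is the point of representing faults as state perturbations.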

Tolerances. We are now ready to define what it means for a program p with an invariant S to tolerate a fault-class F.

Definition (Fault-span). Let S be an invariant of a program p and F be a fault-class. T is an F-span of p from S iff S ⇒ T, T is closed in p, and each action of F preserves T.

Definition (F-tolerant for SPEC from S). p is F-tolerant for SPEC from S iff there exists a state predicate T that satisfies the following three conditions:

At any state where S is true, T is also true. (In other words, S ⇒ T.)
Starting from any state where T is true, if any action in p or F is executed, the resulting state is also one where T is true. (In other words, T is closed in p and T is closed in F.)
Starting from any state where T is true, every computation of p alone eventually reaches a state where S is true. (In other words, T leads to S in p.)

This definition may be understood as follows. The state predicate T is an F-span of p from S, that is, a boundary in the state space of p up to which (but not beyond which) the state of p may be perturbed by the occurrence of faults in F. If faults in F continue to occur, the state of p remains within this boundary. When faults in F stop occurring, p converges from this boundary to the stricter boundary in the state space where the invariant S is true.

It is important to note that there may be multiple such state predicates T from which p meets the above three requirements. Each of these state predicates T captures a (potentially different) type of fault-tolerance of p.
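For a finite state space, the three conditions on T can be checked mechanically. The sketch below does so for the two-bit wire with the transient corruption fault. The encoding is an assumption made for illustration: states are (in, out) pairs, each action is a function returning the set of successor states (empty when its guard is false), and all function names are hypothetical.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))  # every (in, out) pair


def program(state):
    i, o = state
    return {(i, i)} if o != i else set()       # out != in -> out := in


def fault(state):
    i, o = state
    return {(i, 0), (i, 1)} if o == i else set()  # out = in -> out := ?


def invariant(state):   # S: the wire has copied its input
    return state[0] == state[1]


def span(state):        # T: here, the whole state space
    return True


def implies(a, b):
    return all(b(s) for s in STATES if a(s))


def closed(pred, action):
    """pred is closed in action: no step leaves pred."""
    return all(pred(t) for s in STATES if pred(s) for t in action(s))


def leads_to(t_pred, s_pred, action):
    """Every computation of `action` from t_pred reaches s_pred."""
    good = {s for s in STATES if t_pred(s) and s_pred(s)}
    changed = True
    while changed:
        changed = False
        for s in STATES:
            if t_pred(s) and s not in good:
                succ = action(s)
                # s is good if it can move and every move lands in good.
                if succ and succ <= good:
                    good.add(s)
                    changed = True
    return all(s in good for s in STATES if t_pred(s))


def is_f_tolerant(s_pred, t_pred, prog, flt):
    return (implies(s_pred, t_pred)
            and closed(t_pred, prog) and closed(t_pred, flt)
            and leads_to(t_pred, s_pred, prog))


result = is_f_tolerant(invariant, span, program, fault)
```

The check covers only the three listed conditions. The leads_to function assumes no fairness and treats a non-invariant state with no enabled action as a failure to converge.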

Types of Tolerances.

We now proceed to classify three types of fault-tolerances that a program can exhibit, namely masking, nonmasking, and fail-safe tolerance.

1. In the presence of faults, a masking tolerant program always satisfies its safety specification, and the execution of p after execution of actions in F yields a computation that is in both the safety and liveness specification of p, i.e., the computation is in the problem specification of p.

Definition (masking tolerant). p is masking tolerant to F for SPEC from S iff p is F-tolerant for SPEC from S, and S is closed in F. (In other words, if a fault in F occurs in a state where S is true, p continues to be in a state where S is true.)

We prove this tolerance by exhibiting an invariant predicate such that even in the presence of faults the program is always at a state where the invariant predicate is true.

2. Nonmasking tolerance is less strict than masking tolerance: in the presence of faults, the program need not satisfy its safety specification but, when faults stop occurring, the program eventually satisfies both its safety and liveness specification; i.e., the computation has a suffix that is in the problem specification.

Definition (nonmasking tolerant). p is nonmasking tolerant to F for SPEC from S iff p is F-tolerant for SPEC from S, and S is not closed in F. (In other words, if a fault in F occurs in a state where S is true, p may be perturbed to a state where S is violated. However, p then recovers to a state where S is true.)

We prove this tolerance by exhibiting an invariant predicate such that when faults stop occurring the computation eventually reaches (recovers to) a state where the invariant predicate is true. More specifically, this would involve calculating a fault-span predicate T, and showing that T leads to S in p.

We distinguish a special case of nonmasking tolerance: p is stabilizing tolerant to F iff p is nonmasking tolerant to F, and true converges to S in p. (In other words, stabilizing tolerant programs recover from any state in the program state space to S.)

3. Fail-safe tolerance is also less strict than masking: in the presence of faults, the program satisfies its safety specification but, when faults stop occurring, the program need not satisfy its liveness specification; i.e., the computation is in the safety specification but not necessarily in the liveness specification.

Definition (fail-safe tolerant). Let SSPEC be the minimal safety specification that contains SPEC. p is fail-safe tolerant to F for SPEC from S iff there exists a state predicate R such that p is F-tolerant for SSPEC from S ∨ R, and S ∨ R is closed in p and in F. (In other words, if a fault in F occurs in a state where S is true, p may be perturbed to a state where S or R is true. In the latter case, the subsequent execution of p yields a computation that is in SSPEC but not necessarily in SPEC.)

We prove this satisfaction by exhibiting an invariant predicate and a safe predicate such that when faults occur the program is always at a state where the invariant predicate is true or at least the safe predicate is true.
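The nonmasking case can also be illustrated on a single computation rather than on the whole state space. The sketch below interleaves finitely many fault steps with program steps of a two-bit wire (program action: out ≠ in → out := in; fault action: out = in → out := ?) and then locates the suffix on which the invariant out = in holds. The schedule format (a set of step indices at which a fault occurs) and all function names are assumptions made for illustration.

```python
import random


def program_step(state):
    i, o = state
    return (i, i) if o != i else state          # out != in -> out := in


def fault_step(state, rng):
    i, o = state
    # out = in -> out := ?  (disabled, hence a no-op, when out != in)
    return (i, rng.randint(0, 1)) if o == i else state


def invariant(state):
    return state[0] == state[1]


def run(state, fault_at, n_steps, seed=0):
    """Interleave program and fault actions; faults occur at steps in fault_at."""
    rng = random.Random(seed)
    trace = [state]
    for step in range(n_steps):
        if step in fault_at:
            state = fault_step(state, rng)
        else:
            state = program_step(state)
        trace.append(state)
    return trace


def recovery_index(trace):
    """First index from which the invariant holds to the end, or None."""
    idx = len(trace)
    while idx > 0 and invariant(trace[idx - 1]):
        idx -= 1
    return idx if idx < len(trace) else None


trace = run((1, 1), fault_at={0, 3, 4}, n_steps=10)
recovered_from = recovery_index(trace)
```

Because the last fault occurs at step 4 and one program step suffices to restore out = in, the invariant holds from index 6 of the trace at the latest, whatever values the fault chooses. The invariant may be violated earlier in the trace, which is exactly what nonmasking tolerance permits.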

Examples of Types of Tolerances. Consider the critical section problem. Its safety specification is mutual exclusion (multiple processes cannot simultaneously be in the critical section) and its liveness specification is freedom from deadlock (if some process requests critical section access then eventually some process accesses its critical section).

For the critical section problem, a masking fault-tolerant solution would both preserve mutual exclusion in the presence of faults and satisfy freedom from deadlock if only finitely many faults occurred. A nonmasking fault-tolerant solution would eventually satisfy both mutual exclusion and freedom from deadlock if only finitely many faults occurred. Observe that this is equivalent to saying that the solution would satisfy freedom from deadlock and eventually satisfy mutual exclusion if only finitely many faults occurred. A fail-safe fault-tolerant solution would satisfy mutual exclusion in the presence of faults, but not necessarily freedom from deadlock.

Next, we give an example in the use of double/triple modular redundancy. The problem is to assign the value of an input variable into the variable out. Sensors named x, y, z contain the value of the input variable. Faults may corrupt the value of at most one of the sensors.

Fault-intolerant program IR. Program IR consists of a single action that copies the value of x into out. The value ⊥ of out denotes that out has not been assigned. Thus, the action of IR is as follows:

IR :: out = ⊥ → out := x

IR satisfies the specification in the absence of sensor corruption, but not in the presence of one sensor corruption.

Fail-safe fault-tolerant program SR. To preserve safety in the presence of one corrupted sensor, we use another sensor y, thus obtaining double modular redundancy:

SR :: out = ⊥ ∧ x = y → out := x

SR does not satisfy its liveness specification in the presence of one sensor corruption.

Nonmasking fault-tolerant program NR. To restore safety in the presence of one corrupted sensor, while preserving liveness, we use yet another sensor z, thus obtaining triple modular redundancy:

NR1 :: out = ⊥ → out := x
NR2 :: out = x ∧ (x ≠ y ∧ x ≠ z) → out := y or out := z

NR satisfies the liveness specification and eventually satisfies the safety specification in the presence of one sensor corruption.

Masking fault-tolerant program MR. In fact, triple modular redundancy suffices to preserve both safety and liveness in the presence of a sensor corruption:

MR1 :: out = ⊥ ∧ (x = y ∨ x = z) → out := x
MR2 :: out = ⊥ ∧ (y = x ∨ y = z) → out := y
MR3 :: out = ⊥ ∧ (z = y ∨ z = x) → out := z

MR satisfies the specification in the presence of one sensor corruption.
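The difference between the fault-intolerant, fail-safe, and masking programs can be seen by computing the value each one assigns to out for given sensor readings. The following is a minimal sketch under stated assumptions: None stands for the unassigned value of out, each function returns the value of out once no action is enabled, and the names ir, sr, and mr are hypothetical.

```python
BOTTOM = None  # stands for the "unassigned" value of out


def ir(x, y, z):
    """IR: copy x unconditionally."""
    return x


def sr(x, y, z):
    """SR: copy x only when sensor y agrees with it."""
    return x if x == y else BOTTOM


def mr(x, y, z):
    """MR: copy any sensor that agrees with at least one other sensor."""
    if x == y or x == z:
        return x
    if y == x or y == z:
        return y
    if z == y or z == x:
        return z
    return BOTTOM


correct = 7
readings = {
    "no fault":    (7, 7, 7),
    "x corrupted": (9, 7, 7),
    "y corrupted": (7, 9, 7),
    "z corrupted": (7, 7, 9),
}

results = {
    name: {"IR": ir(*r), "SR": sr(*r), "MR": mr(*r)}
    for name, r in readings.items()
}
```

With x corrupted, IR assigns the wrong value (safety is violated), SR leaves out unassigned (safe, but liveness is lost), and MR still assigns the correct value. With y corrupted, SR again withholds the assignment even though x is correct, which is the price of detecting faults with only two sensors.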

Remarks.
"In the absence of faults" means that each computation consists of program actions only.

"In the presence of faults" means that each computation is an interleaving of program and fault actions.

"When faults stop occurring" means that the computation has only finitely many occurrences of fault actions.

"A computation eventually satisfies a property" means that the computation has a suffix that satisfies the property.

For design and engineering purposes, it is important to characterize the classes of faults that the program is subject to. This characterization involves analyzing the environment of the program (the environment includes other programs with which this program interacts). In some cases, exhaustively characterizing the fault classes is difficult. In such cases, one should choose some fault-class that is large enough to accommodate all possible faults. It is often for this reason that designers choose weak fault-models such as transient state failures (where the state may be perturbed arbitrarily) or Byzantine failures (where the program may behave arbitrarily).

We have made an assumption in this discussion: execution of any fault action in F always maintains the problem specification, i.e., if a prefix σ maintains the problem specification and σs is the extended prefix obtained by execution of a fault action in F (where s is a state and σs is the concatenation of σ and s), then σs also maintains the problem specification.
