Lecture 7 - FAULT-TOLERANT COMPUTING

Fault tolerance is the ability of a system to continue operating properly despite failures or faults in some of its components. It allows systems to maintain correct functionality even in the presence of faults. Faults can be hardware or software related and can be transient, intermittent, or permanent. Common fault tolerance techniques include hardware redundancy, software redundancy, time redundancy, and information redundancy.

Uploaded by

Dj Mckenzie

FAULT-TOLERANT COMPUTING
• INTRODUCTION:
• What is fault tolerance?
• Fault tolerance is the property that enables a system to continue
operating properly in the event of the failure of some of its
components.
• Fault tolerance is particularly sought in high-availability or life-critical
systems. It is the art and science of building computing systems that
continue to operate satisfactorily in the presence of faults.

• Fault Tolerance Requirements: The basic characteristics of fault
tolerance are:
• No single point of failure
• No single point of repair
• Fault isolation to the failing component
• Fault containment to prevent propagation of the failure
• A system fails if it behaves in a way that is not consistent with its
specification. Such a failure is the result of a fault in a system
component.
• Systems are fault-tolerant if they behave in a predictable manner,
according to their specification, in the presence of faults.
• ⇒ There are no failures in a fault-tolerant system.
• Several application areas need systems to maintain correct
(predictable) functionality in the presence of faults.
• What is correct functionality in the presence of faults?
• The answer depends on the particular application (on the
specification of the system):
• The system stops and doesn’t produce any erroneous (dangerous)
result/behaviour.
• The system stops and restarts after a while without loss of
information.
• The system keeps functioning without any interruption and
(possibly) with unchanged performance.
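The first behaviour above (stop rather than produce an erroneous result) can be sketched as follows. This is only an illustration, assuming two replicas of the same computation whose outputs are compared; the names `FailStop` and `fail_stop_compute` are hypothetical and do not come from the lecture.

```python
class FailStop(Exception):
    """Hypothetical signal: the system halts instead of emitting a doubtful result."""


def fail_stop_compute(primary, duplicate, x):
    """Run two replicas on the same input and stop if they disagree.

    `primary` and `duplicate` are assumed to be two independent
    implementations (or units) of the same function.
    """
    a = primary(x)
    b = duplicate(x)
    if a != b:
        # A mismatch means at least one replica is faulty; we cannot tell
        # which one, so the safe reaction is to stop without a result.
        raise FailStop(f"replicas disagree on input {x!r}")
    return a
```

Comparing two replicas only detects a fault. Continuing without interruption, the third behaviour, needs enough redundancy to also mask the fault.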
• A fault can be:
1. Hardware fault: malfunction of a hardware component (processor,
communication line, switch, etc.).
2. Software fault: malfunction due to a software bug.
• A fault can be the result of:
1. Mistakes in specification or design: such mistakes are at the origin of
all software faults and of some hardware faults.
2. Defects in components: hardware faults can be produced by
manufacturing defects or by defects caused by deterioration over
time.
3. Operating environment: hardware faults can be the result of stress
produced by an adverse environment: temperature, radiation, vibration, etc.
• Fault types according to their temporal behavior:
• 1. Permanent fault: the fault remains until it is repaired or the
affected unit is replaced.
• 2. Intermittent fault: the fault vanishes and reappears (e.g. caused by
a loose wire).
• 3. Transient fault: the fault dies away after some time (caused by
environmental effects).
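The temporal behaviour of a fault decides whether simply repeating an operation helps. The sketch below is a minimal illustration, not a method from the lecture: `DetectedFault`, `make_unit`, and `run_with_retries` are hypothetical names, and the faulty unit is simulated. A transient fault clears after a few attempts, so a retry succeeds. A permanent fault never clears, so every retry fails and the unit must be repaired or replaced.

```python
import itertools


class DetectedFault(Exception):
    """Hypothetical exception standing in for a detected fault."""


def make_unit(failures_before_success):
    """Simulate a unit that fails a fixed number of times, then works.

    Pass None to simulate a permanent fault that never clears.
    """
    calls = itertools.count()

    def operation():
        n = next(calls)
        if failures_before_success is None or n < failures_before_success:
            raise DetectedFault("fault detected")
        return "ok"

    return operation


def run_with_retries(operation, max_attempts=3):
    """Repeat the operation up to max_attempts times.

    Returns (result, attempts_used); result is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except DetectedFault:
            continue  # the fault may be transient, so try again
    return None, max_attempts


transient = run_with_retries(make_unit(2))     # fault dies away after two attempts
permanent = run_with_retries(make_unit(None))  # fault remains until repair
```

An intermittent fault sits between the two cases: a retry may succeed, but the fault can reappear later, so a successful retry does not show that the unit is healthy.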
• Fault tolerance
• A system or a component fails due to a fault.
• Fault tolerance means that the system continues to provide its services in
the presence of faults.
• A system may experience partial failures and should also recover from
them.
• Fault categories in time
• Transient: occurs once and disappears
• Intermittent: occurs many times in an irregular way
• Permanent


• Fault-tolerant computing is the art and science of building computing
systems that continue to operate satisfactorily in the presence of
faults.
• A fault-tolerant system may be able to tolerate one or more fault
types including
• Transient, intermittent or permanent hardware faults,
• Software and hardware design errors,
• Operator errors, or
• Externally induced upsets or physical damage.
Techniques of Fault Tolerant Systems
• There are four main techniques:
• The first technique is hardware (HW) redundancy, which is done by using
more than one unit of the component whose faults are to be tolerated.
• The second technique is software (SW) redundancy, which is done by using
additional programs, subprograms, and program
• The third technique is time redundancy, which is useful in the
case of soft errors.
• The last technique is information redundancy, which relies on
coding theory.
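Two of these techniques can be illustrated with a short sketch. The lecture does not name specific schemes, so the choices here are assumptions: majority voting over redundant units stands in for hardware redundancy, and a single even-parity bit stands in for information redundancy. The function names are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of redundant units.

    Raises ValueError if no value has a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among redundant units")
    return value


def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True if the word (data bits plus parity bit) has even parity."""
    return sum(word) % 2 == 0


# Hardware redundancy: three units, one of them faulty, the fault is masked.
voted = majority_vote([7, 7, 9])

# Information redundancy: a single flipped bit is detected (not corrected).
word = add_parity([1, 0, 1, 1])
corrupted = word.copy()
corrupted[0] ^= 1
```

With three units the voter masks one faulty unit. A single parity bit detects any odd number of flipped bits but cannot locate or correct them; correcting errors requires a stronger code.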
