Equipment Failure Modes: Goble05.book Page 83 Thursday, March 31, 2005 10:39 PM
6
Equipment Failure Modes
Introduction
A reliability engineer’s first design priority is successful operation, and great effort must be made to ensure that things work. This priority is logical for most systems, where the failure mode is not relevant.
In safety instrumented systems, however, the failure mode is very important. It matters whether a failure causes a false trip or prevents the automatic protection.
Consider the possible failure modes of a PLC with a digital input and a digital output, both in a de-energize-to-trip (logic 0) design. The PLC failure modes can be categorized relative to the safety function as shown in Table 6-2.
Table 6-2. PLC Failure Mode Categories

Instrument Failure Mode                 SIF Failure Mode
Input stuck high                        Fail-Danger
Input stuck low                         Fail-Safe
Input circuit oscillates                Fail-Danger*
Output stuck high                       Fail-Danger
Output stuck low                        Fail-Safe
Improper CPU execution                  50% Fail-Safe, 50% Fail-Danger
Memory transient failure                50% Fail-Safe, 50% Fail-Danger
Memory permanent failure                50% Fail-Safe, 50% Fail-Danger
Power supply low (out of tolerance)     Fail-Danger*
Power supply high (out of tolerance)    Fail-Danger*
Power supply zero                       Fail-Safe
Diagnostic timer failure                Annunciation
Loss of communication link              No Effect
Display panel failed                    No Effect

* Unpredictable; assume the worst case.
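The classification in Table 6-2 can be treated as a lookup from instrument failure mode to the fraction of failures that fall into each SIF category. The sketch below is a minimal illustration, assuming each mode is assigned its own failure rate and treating the starred "unpredictable" modes as 100% Fail-Danger per the table's worst-case note. The names `PLC_MODES` and `split_rates`, and the per-mode rates in the example, are hypothetical and are not data from this book.

```python
# Hypothetical encoding of Table 6-2: each PLC failure mode maps to the
# fraction of its failures that fall into each SIF failure-mode category.
# Starred ("unpredictable") modes are treated as 100% Fail-Danger,
# following the table's "assume the worst case" note.
PLC_MODES = {
    "input stuck high": {"danger": 1.0},
    "input stuck low": {"safe": 1.0},
    "input circuit oscillates": {"danger": 1.0},  # * worst case
    "output stuck high": {"danger": 1.0},
    "output stuck low": {"safe": 1.0},
    "improper CPU execution": {"safe": 0.5, "danger": 0.5},
    "memory transient failure": {"safe": 0.5, "danger": 0.5},
    "memory permanent failure": {"safe": 0.5, "danger": 0.5},
    "power supply low": {"danger": 1.0},  # * worst case
    "power supply high": {"danger": 1.0},  # * worst case
    "power supply zero": {"safe": 1.0},
    "diagnostic timer failure": {"annunciation": 1.0},
    "loss of communication link": {"no effect": 1.0},
    "display panel failed": {"no effect": 1.0},
}

CATEGORIES = ("safe", "danger", "annunciation", "no effect")


def split_rates(mode_rates):
    """Sum per-mode failure rates (failures/hr) into SIF categories."""
    totals = dict.fromkeys(CATEGORIES, 0.0)
    for mode, rate in mode_rates.items():
        for category, fraction in PLC_MODES[mode].items():
            totals[category] += fraction * rate
    return totals


# Hypothetical per-mode rates in failures per hour, for illustration only.
example = {
    "input stuck high": 1.0e-7,
    "input stuck low": 1.0e-7,
    "improper CPU execution": 4.0e-7,
    "display panel failed": 2.0e-7,
}
print(split_rates(example))
```

Keeping the fractions per mode, rather than a single safe/danger split for the whole PLC, makes it easy to revise one row of the table without recomputing the others.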
Final element components also fail, and again their specific failure modes can be classified into the relevant failure modes depending on the application. It is important to know whether a valve will open or close on trip. Table 6-3 shows an example failure mode classification based on a close-to-trip configuration.
Fail-Safe
Most practitioners define “Fail-Safe” for an instrument as a failure that causes a “false or spurious” trip of a safety instrumented function unless that trip is prevented by the architecture of the safety instrumented function. Many formal definitions have been attempted, including “a failure which causes the system to go to a safe state or increases the probability of going to a safe state.” This definition is useful at the system level and covers many cases where redundant architectures are used.
IEC 61508 uses the definition “failure which does not have the potential to
put the safety-related system in a hazardous or fail-to-function state.” This
definition includes many failures that do not cause a false trip under any
circumstances and is quite different from the definition practitioners need
to calculate the false trip probability. Under this definition, all failure modes that are NOT dangerous are called “safe.” This definition is not used in this book because most practitioners require more detail.
Fail-Danger
Many practitioners define “Fail-Danger” as a failure that prevents a
safety instrumented function from performing its automatic protection
function. Variations of this definition exist in standards. IEC 61508
provides a definition similar to the one used herein, which reads: “failure
which has the potential to put the safety-related system in a hazardous or
fail-to-function state.” The definition from IEC 61508 goes on to add a
note: “Whether or not the potential is realized may depend on the channel
architecture of the system; in systems with multiple channels to improve
safety, a dangerous hardware failure is less likely to lead to the overall
dangerous or fail-to-function state.” The note from IEC 61508 recognizes
that a definition for a piece of equipment may not have the same meaning
at the safety instrumented function level or the system level.
Annunciation
Some practitioners recognize that certain failures within equipment used in a safety instrumented function prevent the automatic diagnostics from operating correctly. Many reliability models account for the ability of automatic diagnostics to reduce the probability of failure. When these diagnostics stop working, the probability of a dangerous failure or a false trip increases. These effects may not be significant, but unless they are modeled, their magnitude is not known.
No Effect
Some failures within a piece of equipment have no effect on the safety instrumented function: they neither cause a false trip nor prevent automatic diagnostics from working. Some functionality performed by the equipment is impaired, but that functionality is not needed by the safety instrumented function. These may simply be called “No Effect” failures. They are typically not used in any reliability model intended to obtain the probability of a false trip or the probability of a Fail-Danger failure. Per IEC 61508, these would be classified as “Fail-Safe,” or they may be excluded from the analysis entirely, depending on the analyst’s interpretation.
Detected/Undetected
Failure modes can be further classified as “detected” or “undetected” by automatic diagnostics. In this book, the classification is done at the instrument level, while the specific diagnostics may be performed automatically anywhere in the safety instrumented system.
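Combining the safe/dangerous and detected/undetected classifications yields four failure-rate categories. The sketch below assumes a constant total failure rate, a safe fraction for the instrument, and a "coverage" value for each mode, meaning the fraction of failures in that mode that automatic diagnostics detect. The coverage model, the function name `classify`, and all numbers in the example are assumptions for illustration. It also shows the Annunciation effect: if the diagnostics stop working, detected failures become undetected, so the dangerous undetected rate rises.

```python
from dataclasses import dataclass


@dataclass
class FailureRates:
    """Failure rates in failures per hour, split four ways."""
    safe_detected: float
    safe_undetected: float
    dangerous_detected: float
    dangerous_undetected: float


def classify(total_rate, safe_fraction, safe_coverage, dangerous_coverage,
             diagnostics_working=True):
    """Split a total failure rate into detected/undetected safe and dangerous.

    Coverage is the assumed fraction of failures in a mode that automatic
    diagnostics detect. If the diagnostics themselves have failed (an
    Annunciation failure), nothing is detected.
    """
    if not diagnostics_working:
        safe_coverage = dangerous_coverage = 0.0
    safe = total_rate * safe_fraction
    dangerous = total_rate * (1.0 - safe_fraction)
    return FailureRates(
        safe_detected=safe * safe_coverage,
        safe_undetected=safe * (1.0 - safe_coverage),
        dangerous_detected=dangerous * dangerous_coverage,
        dangerous_undetected=dangerous * (1.0 - dangerous_coverage),
    )


# Hypothetical instrument: 2e-6 f/hr total, 60% safe, with 50% of safe
# failures and 90% of dangerous failures detected by diagnostics.
normal = classify(2.0e-6, 0.6, safe_coverage=0.5, dangerous_coverage=0.9)
degraded = classify(2.0e-6, 0.6, 0.5, 0.9, diagnostics_working=False)
print(normal.dangerous_undetected, degraded.dangerous_undetected)
```

In this hypothetical case, losing the diagnostics raises the dangerous undetected rate tenfold, which is why Annunciation failures can matter in a reliability model even though they cause neither a trip nor a failure to trip.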
PFS/PFD
There is a probability that a safety instrumented function will fail and cause a spurious/false trip of the process. This is called the probability of failing safely (PFS). There is also a probability that a safety instrumented function will fail such that it cannot respond to a potentially dangerous condition. This is called the probability of failure on demand (PFD).
PFDavg
PFD average (PFDavg) is a term used to describe the average probability of failure on demand. PFD varies as a function of the operating time interval of the equipment. It will not reach a steady-state value if any periodic inspection, test, and repair is done. Therefore, the average value of PFD over a period of time can be a useful metric if it is assumed that the potentially dangerous condition (also called the hazard) is independent of equipment failures in the safety instrumented function.
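The statement that PFD does not reach a steady state under periodic testing can be illustrated with a simple model. This sketch assumes a single non-redundant component with a constant dangerous failure rate λ_D, so that PFD(t) = 1 - exp(-λ_D t) between tests, and assumes each proof test is perfect and instantaneous, restoring the component to as-new condition. These assumptions and the function name `pfd_at` are illustrative, not taken from this chapter.

```python
import math


def pfd_at(t_hours, lambda_d, test_interval):
    """PFD at time t for one component tested and repaired every test_interval.

    Assumes a constant dangerous failure rate lambda_d (per hour) and perfect,
    instantaneous proof tests, so PFD restarts from zero after each test.
    At an exact test time the value returned is the one just after the test.
    """
    time_since_test = t_hours % test_interval
    return 1.0 - math.exp(-lambda_d * time_since_test)


# Sample the resulting sawtooth over three yearly (8760 h) test intervals.
lam = 1.0e-6  # hypothetical dangerous failure rate, failures per hour
for t in range(0, 3 * 8760 + 1, 2190):
    print(f"t = {t:6d} h  PFD = {pfd_at(t, lam, 8760):.6f}")
```

Because PFD climbs during each interval and drops back at every test, no single value describes it, which motivates averaging over the test interval.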
PFDavg is defined as the arithmetic mean over a defined time interval. For situations where a safety instrumented function is periodically inspected and tested, the test interval (TI) is the correct time period. Therefore:
$$\mathrm{PFD}_{\mathrm{avg}}(TI) = \frac{1}{TI}\int_{0}^{TI} \mathrm{PFD}(t)\,dt \tag{6-1}$$
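Equation 6-1 can be evaluated numerically for any PFD(t) model. As an illustration, assume a single non-redundant component with a constant dangerous failure rate λ_D and perfect proof testing, so PFD(t) = 1 - e^(-λ_D t) within each test interval. For that assumed model the integral also has a closed form, PFDavg = 1 - (1 - e^(-λ_D TI))/(λ_D TI), which is approximately λ_D TI / 2 when λ_D TI is much less than 1. The sketch compares a trapezoidal-rule evaluation of Equation 6-1 with both; the model, function names, and numbers are assumptions for illustration.

```python
import math


def pfd(t, lambda_d):
    """Assumed PFD model: one component, constant dangerous failure rate."""
    return 1.0 - math.exp(-lambda_d * t)


def pfd_avg_numeric(lambda_d, test_interval, steps=10_000):
    """Equation 6-1 evaluated with the trapezoidal rule."""
    h = test_interval / steps
    total = 0.5 * (pfd(0.0, lambda_d) + pfd(test_interval, lambda_d))
    for i in range(1, steps):
        total += pfd(i * h, lambda_d)
    return total * h / test_interval


def pfd_avg_exact(lambda_d, test_interval):
    """Closed-form integral of the assumed model over one test interval."""
    x = lambda_d * test_interval
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return 1.0 - (-math.expm1(-x)) / x


lam, ti = 1.0e-6, 8760.0  # hypothetical rate (per hour), one-year test interval
print(pfd_avg_numeric(lam, ti), pfd_avg_exact(lam, ti), lam * ti / 2)
```

Because 1 - e^(-λ_D t) never exceeds λ_D t, the simplified value λ_D TI / 2 slightly overstates PFDavg for this model, which errs on the conservative side.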
Exercises
6-1. A solenoid is energized during normal process operation. It is de-energized when a dangerous condition is detected, venting air from a pneumatic actuator. If the solenoid coil fails short-circuited and burns out, the solenoid will de-energize. How should this failure mode be classified?
6-3. The failure rate (λ) for a pressure transmitter is 1.2 × 10⁻⁶ f/hr. The safe failure mode split is 50%. What is the dangerous failure rate?