
Goble05.book Page 117 Thursday, March 31, 2005 10:39 PM

8
Getting Failure Rate Data

Introduction
When ISA84.01 (Ref. 1) was first released in 1996, several people commented, "No one has good failure rate data." This led some to believe that the whole idea behind probabilistic failure calculations was impractical.
In the early years of the functional safety standards, industry failure
databases could provide failure data information. While this failure data
was not product specific or application specific, it helped designers
recognize problems in their designs. One such problem was the “weak
link” design (Ref. 2). Such a design included a high quality SIL3 safety
PLC that was connected to a switch and a solenoid. Many of the engineers
thought they had a SIL3 design until they did the safety verification
calculations. Such a design may not even meet SIL1 depending on proof
test effectiveness and manual proof test time interval!

Even with approximate data, the methods began to show how designers could achieve higher levels of safety while optimizing costs. The safety verification calculations required by the new functional safety standards have shown designers how to produce much more balanced designs and to do a better job overall. But failure rate and failure mode data for random failures of the chosen equipment are required.

Random Failures versus Systematic Failures

The concept of random failures versus systematic failures was presented in Chapter 3. One must understand the differences in order to understand failure rate data. For safety instrumented function verification calculations, the failure rate data due to random failures during the useful life of a product is required.

The "Well Designed System"

The concept of the "well designed system" was also presented in Chapter 3. A simplistic definition of such a system would be one where all the techniques and measures presented in our functional safety standards to prevent systematic failures are followed. These techniques and measures are planned to significantly reduce the chance of a systematic fault to a tolerable level. Therefore, systematic failure rates caused by human error, including failures due to installation errors, failures due to calibration errors and failures due to choosing equipment not suited for purpose, are not included in the calculation.
This is not to say that systematic errors cannot happen. It is clearly
recognized that these failures do occur and that they do impact safety
integrity. One field failure study done by one of the authors traced
instrument failure reports to specific end user sites. The results showed
that failure rates for the same instrument varied by over an order of
magnitude from site to site. There is no doubt that this is significant. But
the site specific and even person specific variables preclude an “average”
probabilistic approach. That is why it is so important to understand and
follow all the procedures, techniques and measures presented in the
functional safety standards to avoid and control systematic failures. It is so
important to have a “well designed system” for any safety instrumented
function.

Industry Failure Databases

Several industry failure databases exist. Analysts gather failure records, make estimates of time in operation and calculate failure rates. The resulting information is published in various forms in a book or provided in a computer database. The main advantage of such documents is that they provide information based on actual field failures.

Several problems exist with this method of getting failure rate data. Often
needed information about a failure is not collected. This includes total
time in operation, failure confirmation, technology class, failure cause and
stress conditions. The results are usually a significantly higher failure rate
than the number needed for probabilistic SIF verification. This is due to:
1. Lack of distinction between random failures and wear out failures,

2. Lack of distinction between systematic failures and random failures,

3. Merging of technology classes,

4. Incomplete fault isolation, and

5. Other issues.

When total time in operation is not recorded, failures due to wear out
cannot be distinguished from random failures during the useful life. If
these failures are grouped together, the data analyst cannot distinguish
between them and will typically assume that all failures are random. The
resulting failure rate number is too high. In addition, the opportunity to
establish the useful life period is also lost.
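To make the distinction concrete, here is a brief Python sketch of how failure records might be partitioned into random and wear-out categories when total time in operation has been recorded. The record times, fleet size and useful life figure are all invented for illustration.

```python
# Hypothetical sketch: partition failure records into "random" (within the
# useful life) and "wear-out" (beyond the useful life), which is only
# possible when total time in operation was recorded for each failure.

USEFUL_LIFE_HOURS = 10 * 8760  # assumed 10-year useful life

# Hours in operation at the time of each failure (invented records)
failure_times = [12_000, 35_000, 61_000, 90_000, 110_000, 125_000]

random_failures = [t for t in failure_times if t <= USEFUL_LIFE_HOURS]
wearout_failures = [t for t in failure_times if t > USEFUL_LIFE_HOURS]

# Estimate a constant (random) failure rate from the useful-life data only,
# assuming a fleet of 100 units each observed over the full useful life.
unit_hours = 100 * USEFUL_LIFE_HOURS
lambda_random = len(random_failures) / unit_hours  # failures per hour

print(f"random: {len(random_failures)}, wear-out: {len(wearout_failures)}")
print(f"estimated random failure rate: {lambda_random:.2e} /hr")
```

If the wear-out records were lumped in with the random ones, the estimated rate would double here, which is exactly the inflation the text describes.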

In safety instrumented function verification calculations, the task is to calculate the probability of failure on demand due to random failures. This is done assuming that a preventative maintenance program has been established per the requirements of IEC 61508 (Ref. 3) to replace instruments before the end of their useful life.
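As a sketch of what such a calculation looks like, the widely used simplified approximation for a single (1oo1) channel is PFDavg ≈ λDU × TI / 2, where TI is the manual proof test interval. The failure rate value below is illustrative, not taken from any product, and the approximation assumes perfect proof tests.

```python
# Simplified 1oo1 PFDavg sketch: PFDavg ~= lambda_DU * TI / 2, the common
# approximation for a single channel proof tested at interval TI, assuming
# perfect proof tests and replacement before the end of useful life.

def pfd_avg_1oo1(lambda_du, proof_test_interval_hr):
    return lambda_du * proof_test_interval_hr / 2

def sil_band(pfd):
    """Map a PFDavg to its IEC 61508 low-demand SIL band (0 = below SIL 1)."""
    if 1e-5 <= pfd < 1e-4:
        return 4
    if 1e-4 <= pfd < 1e-3:
        return 3
    if 1e-3 <= pfd < 1e-2:
        return 2
    if 1e-2 <= pfd < 1e-1:
        return 1
    return 0

lambda_du = 2.0e-7   # dangerous undetected failures per hour (illustrative)
ti = 8760            # one-year manual proof test interval, in hours

pfd = pfd_avg_1oo1(lambda_du, ti)
print(f"PFDavg = {pfd:.2e}, meets SIL {sil_band(pfd)}")
```

A longer proof test interval or a less effective proof test would raise the PFDavg, which is how the "weak link" designs mentioned in the introduction fail to meet their target SIL.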

When details about failure cause are not collected, failures due to maintenance errors, calibration errors and other systematic faults cannot be distinguished from random failures. The result is a number that can be too high.
When failure confirmation is not done, there are times when multiple instruments are replaced during system restoration. When the exact cause of a failure is not identified, multiple "failures" are reported because the maintenance technician replaces several items in an effort to find out which one has actually failed. During a period of unexpected downtime, the emphasis is clearly on system restoration, and often time is not allocated for failure identification. This is understandable, given that many restoration situations involve a harsh environment, little time and a lack of test equipment. The result of recording multiple failures when only one exists is a failure rate that is too high.

In some databases, technology classes are mixed. The authors have seen equipment more than fifty years old in operation in industrial processes. Some of this equipment, with vacuum tube technology, has a significantly different failure rate than solid state integrated circuit technology. When failures from these different technology classes are mixed, the resulting failure rate data is often too high, being dominated by the older, less reliable equipment.

In spite of their limitations, industry databases can be extremely valuable, especially when no other data source exists. If the failure rate data is too high, the result will be a higher PFH/PFDavg. If this occurs and too much safety integrity is designed into a safety instrumented function, that is tolerable.

Available Databases

One of the most popular failure rate databases is the OREDA database (Ref. 4). OREDA stands for "Offshore Reliability Data." This book presents detailed statistical analysis of many types of process equipment. Many engineers use it as a source of failure rate data to perform safety verification calculations. It is an excellent reference for all who do data analysis.

Other industry failure database sources include:

1. FMD-97: Failure Mode / Mechanism Distributions. Reliability Analysis Center, 1997 (Ref. 5)

2. Guidelines for Process Equipment Reliability Data, with Data Tables. Center for Chemical Process Safety of AIChE, 1989 (Ref. 6)

3. NPRD-95: Nonelectronic Parts Reliability Data. Reliability Analysis Center, 1995 (Ref. 7)

4. IEEE Std. 500: IEEE Guide to the Collection and Presentation of Electrical, Electronic, Sensing Component, and Mechanical Equipment Reliability Data for Nuclear-Power Generating Stations. IEEE, 1984 (Ref. 8)

Many companies have an internal expert who has studied these sources,
as well as their own internal failure records, and maintains the company
failure rate database. Some use failure data compilations found on the
Internet. While the data in industry databases is not product specific or
application specific, it does provide useful failure rate information for
specific industries (nuclear, offshore, etc.) and a comparison of the data
provides information about failure rates versus stress factors.

Failure Mode and Diagnostic Effectiveness Data

Failure rate data alone is not enough to do a good job with probabilistic
safety verification. A probability of fail-danger calculation for safety
verification purposes requires failure mode data. For each piece of
equipment, one must know the failure modes (safe versus dangerous) and
the effectiveness of any automatic diagnostics (the diagnostics coverage
factor). This information is included in industry databases only in rough form, if at all. So, many engineers doing safety verification calculations provide an educated, conservative estimate. For most electronic equipment, the safe percentage is set to 50%. Relays have a higher percentage of safe failures, with many picking a value of 70% or 80%. Mechanical components like solenoids might be closer to 40% safe, with many failure modes causing stuck-in-place failures that end up being dangerous in a safety protection application.

Diagnostic coverage can also be estimated. If "normal" diagnostics are available in a microprocessor based product, diagnostic coverage can be conservatively credited at 50%. Diagnostics for mechanical devices are usually given no credit (0% detected failures), unless there is some special testing, like automatic partial valve stroke testing via a smart valve positioner.
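The estimates above can be sketched as a simple split of a total failure rate into safe and dangerous portions, with a diagnostic coverage credit applied to the dangerous portion. The total failure rates used below are invented for illustration; the percentages are the conservative values from the text.

```python
# Sketch of the conservative estimates described above: split a total
# failure rate into safe/dangerous modes, then apply a diagnostic
# coverage credit to the dangerous portion.

def split_failure_rate(lambda_total, safe_fraction, dangerous_coverage):
    """Return (safe, dangerous-detected, dangerous-undetected) rates."""
    lambda_safe = lambda_total * safe_fraction
    lambda_dangerous = lambda_total - lambda_safe
    lambda_dd = lambda_dangerous * dangerous_coverage
    lambda_du = lambda_dangerous - lambda_dd
    return lambda_safe, lambda_dd, lambda_du

# Electronic transmitter: 50% safe, 50% coverage credit (per the text)
ls, ldd, ldu = split_failure_rate(1.0e-6, safe_fraction=0.5,
                                  dangerous_coverage=0.5)
print(f"transmitter: safe={ls:.1e} DD={ldd:.1e} DU={ldu:.1e}")

# Solenoid: 40% safe, no diagnostic credit for mechanical devices
ls, ldd, ldu = split_failure_rate(2.0e-6, safe_fraction=0.4,
                                  dangerous_coverage=0.0)
print(f"solenoid:    safe={ls:.1e} DD={ldd:.1e} DU={ldu:.1e}")
```

The dangerous undetected rate that falls out of this split is the number that dominates the PFDavg in a verification calculation, which is why these estimates should be conservative.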

In spite of their limitations, industry databases have served an important purpose. Using a combination of industry databases, company data and experience, the data needed for our safety lifecycle can be estimated. Fortunately, other data sources are also available.

Product Specific Failure Data

It is clear that some are uncomfortable with the level of accuracy in the failure data estimated from industry databases and experience. Questions about failure rate versus stress conditions in particular applications come up. Questions about specific products are constantly being asked, especially when one must attempt to pick a better product to achieve higher safety.
Fortunately, several instrumentation manufacturers are providing detailed
analysis of their products to determine a more accurate set of numbers
useful for safety verification purposes. A Failure Modes Effects and
Diagnostic Analysis (FMEDA) will provide specific failure rates for each
failure mode of an instrumentation product. The percentage of failures
that are safe versus dangerous is clear and relatively precise for each
specific product. The diagnostic ability of the instrument is precisely
measured. Overall, the numbers from such an analysis are indeed product
specific and provide a much higher level of accuracy when compared to
industry database numbers and experience based estimates.

A FMEDA is done by examining each component in a product. For each failure mode of each component, the random failure rate and the effect on the product are recorded. Will this resistor failure cause the product to fail safely, fail dangerously or lose calibration? If the serial communication line from the A/D to the microprocessor gets shorted, how does the product respond? If this spring fractures, does that cause a dangerous or a safe failure? The failure rate of each component is entered according to component failure mode, and the various categories are added. The end result is a product specific set of failure data that includes failure rates for each failure mode, failure rates that are detected and undetected by diagnostics, safe failure fraction calculations and often an explanation of how to use the numbers to do safety verification calculations.
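A minimal sketch of this tally might look like the following. The component list and rates are invented for illustration; the SD/SU/DD/DU categories are safe/dangerous crossed with detected/undetected, and the safe failure fraction is computed as everything except dangerous undetected failures, divided by the total.

```python
# Minimal FMEDA tally sketch: each component failure mode carries a rate
# and a classification. Summing by category gives the product-level rates
# and the safe failure fraction (SFF).

from collections import defaultdict

# (component, failure mode, rate per hour, category) -- invented rows
fmeda_rows = [
    ("R12 resistor", "open",     5e-9, "SD"),
    ("R12 resistor", "short",    2e-9, "DU"),
    ("A/D serial",   "shorted",  8e-9, "DD"),
    ("spring",       "fracture", 4e-9, "DU"),
    ("capacitor C3", "short",    6e-9, "SU"),
]

totals = defaultdict(float)
for _, _, rate, category in fmeda_rows:
    totals[category] += rate

lambda_total = sum(totals.values())
# SFF: all failures except dangerous undetected, over the total
sff = (lambda_total - totals["DU"]) / lambda_total

for cat in ("SD", "SU", "DD", "DU"):
    print(f"lambda_{cat} = {totals[cat]:.1e} /hr")
print(f"SFF = {sff:.0%}")
```

A real FMEDA covers hundreds of component failure modes, but the arithmetic is exactly this category-by-category summation.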

A FMEDA is sometimes done by the instrument manufacturer but typically done by third party experts. Often a product manufacturer does the work as part of an IEC 61508 functional safety certification effort. Many different types of instruments have had this analysis done. A listing of instrumentation assessments including FMEDA analysis is available at www.exida.com/applications/sael/index.asp.

It should be emphasized that a FMEDA provides failure rates, failure modes and diagnostic coverage effectiveness for random hardware failures. If done properly, it does not include failure rates due to "systematic" causes, including incorrect installation, inadvertent damage, incorrect calibration or any other human error.

A Comparison of Failure Rates

Failure rates obtained from industry databases, manufacturer FMEDA analysis, manufacturer field failure studies, company failure records or other sources can be compared. The results will be different, as described above.

Generally, less specific data turns out to be more conservative and that is
appropriate for safety verification purposes following the rule that “the
less one knows, the more conservative one must be.” Remember that
industry databases may include systematic failures, multiple technology
classes, wear out failures and possible multiple reports per failure. These
issues naturally cause the numbers from such sources to be high.

Table 8-1 shows a comparison of data for a pressure transmitter. The failure rate numbers from the industry database sources are significantly higher than those in the FMEDA reports.

Comprehensive Failure Data Sources

Recently, some analysis organizations have compiled comprehensive failure data source books and computer databases. The information is formatted to give failure rate as a function of failure mode. Often additional information about the product, such as Type A versus Type B, is provided. Some example pages are shown in Figures 8-1, 8-2 and 8-3 (Ref. 9).

The Future of Failure Data

Although product specific FMEDA reports offer superior data sources when compared to industry databases, they still do not account for application specific stress conditions that may affect actual failure rates. Ideally, in the future, manufacturers will be able to provide not only point estimates of failure rates but perhaps even equations with application specific variables to more precisely calculate the needed numbers. That will happen if there is demand and the needed data is collected.
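As a purely hypothetical illustration of what such an equation could look like, the sketch below scales a base failure rate by temperature and environment stress factors, in the spirit of MIL-HDBK-217 style pi-factor models. The base rate, activation energy and factor values are invented, not taken from any real product or standard.

```python
import math

# Hypothetical application-specific failure rate equation: a base rate
# scaled by an Arrhenius-style temperature factor and an environment
# factor. All numeric values below are invented for illustration.

def adjusted_rate(lambda_base, temp_c, pi_env):
    """Scale a 25 C base rate by temperature and environment stress."""
    t_ref, t_op = 298.15, temp_c + 273.15       # kelvin
    ea_over_k = 0.3 / 8.617e-5                  # assumed 0.3 eV activation energy
    pi_t = math.exp(ea_over_k * (1 / t_ref - 1 / t_op))
    return lambda_base * pi_t * pi_env

base = 3.0e-7  # failures per hour at 25 C, benign environment (invented)
print(f"control room, 25 C: {adjusted_rate(base, 25, pi_env=1.0):.2e} /hr")
print(f"outdoor unit, 60 C: {adjusted_rate(base, 60, pi_env=2.0):.2e} /hr")
```

The point of the sketch is only that the same instrument can carry very different rates in different services, which a single point estimate cannot express.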

Table 8-1. Failure Rate Data for a Pressure Transmitter

Source         Component                              Total Failure   % Safe     Safe Cov.    Dangerous Cov.   Range
                                                      Rate (1/hr)     Failures   Factor (%)   Factor (%)
CCPS-89        Transmitters - Differential Pressure   1.01E-06        -          -            -                low
CCPS-89        Transmitters - Differential Pressure   6.56E-05        -          -            -                mean
CCPS-89        Transmitters - Differential Pressure   2.54E-04        -          -            -                high
NPRD-95        Transducer, Pressure                   8.13E-06        -          -            -                mean
FMEDA, exida   Rosemount 3051T Pressure Transmitter   4.46E-07        64         100          27.5             FMEDA
FMEDA, exida   Honeywell ST3000 Pressure Transmitter  4.90E-07        60         100          24.7             FMEDA

Figure 8-1. SERH FMEDA Based Data Page (Ref. 9) (reprinted with permission of exida)

Figure 8-2. SERH Generic Failure Data – Switch (Ref. 9) (reprinted with permission of exida)

Figure 8-3. SERH Generic Failure Data – Transmitter (Ref. 9) (reprinted with permission of exida)

One effort in the right direction is the PERD (Process Equipment Reliability Database) initiative (Ref. 10) from the Center for Chemical Process Safety (CCPS) of the AIChE (www.aiche.org/ccps/perd/). That group has defined data gathering techniques (Ref. 11) and failure taxonomies for various types of process equipment. The important data that must be collected for a failure event has been defined. Operating companies from chemical, petrochemical, industrial gases and other industries become members and are working to set up inspection and failure reporting. They have created data collection software that members use to report field failures to a central database. There is potential that this information could someday become the best possible source of product specific and application specific failure rate and failure mode data. We look forward to better data with more accuracy as we move forward.
Exercises
8-1. What are the key objectives and differences between FMEA and
FMEDA analysis?

8-2. Explain why the failure rate data for SIS components obtained
from field data can be considerably different from that obtained by
means of FMEDA analysis for the components.

8-3. A valve manufacturer has issued a certificate for their valve stating
that according to IEC 61508 the Mean Time between Failures
(MTBF) for the valve was found to be 12000 years. No additional
performance data was listed on the certificate. Can this data be
used for SIL verification calculations?

8-4. The failure rate for an electronic component is 15.6 FITs. What is the equivalent failure rate in units of failures per hour?

8-5. A FMEDA report has been issued for a Solenoid valve based on
IEC 61508 requirements. What parts of the solenoid valve would
have been included in the FMEDA analysis?

8-6. List some of the limitations in using industry database data for SIL
verification calculations.

8-7. Does FMEDA based failure rate data include failure rates due to
expected human error?

8-8. Based on the information in this Chapter, what is the expected dangerous undetected failure rate for a generic DP/pressure transmitter?

REFERENCES AND BIBLIOGRAPHY

1. ANSI/ISA S84.01-1996. Application of Safety Instrumented Systems for the Process Industries.

2. Gruhn, P. "Safety Systems: Peering Past the Hype." Proceedings of the 51st Instrumentation Symposium for the Process Industries. Texas A&M University and ISA, 1996.

3. IEC 61508. Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems, 2000.

4. OREDA-97. Offshore Reliability Data. DNV Industry, 1997.

5. FMD-97: Failure Mode / Mechanism Distributions. Reliability Analysis Center, 1997.

6. Guidelines for Process Equipment Reliability Data. Center for Chemical Process Safety, American Institute of Chemical Engineers, 1989.

7. NPRD-95. Nonelectronic Parts Reliability Data. Reliability Analysis Center, 1995.

8. IEEE Std. 500. IEEE Guide to the Collection and Presentation of Electrical, Electronic, Sensing Component, and Mechanical Equipment Reliability Data for Nuclear-Power Generating Stations. IEEE, 1984.

9. Safety Equipment Reliability Handbook. exida, 2003. (available from ISA)

10. Arner, D. C. and W. C. Angstadt. "Where For (sic) Art Thou Failure Rate Data." ISA Technology Update, 2001.

11. Guidelines for Improving Plant Reliability Through Data Collection and Analysis. Center for Chemical Process Safety, American Institute of Chemical Engineers, 1998.
