
Fault Tree Analysis

Rodney J. Simmons, Ph.D., CSP


rod_simmons@me.com
Associate Professor of Health, Safety & Environmental Engineering, The Petroleum Institute, Abu Dhabi, UAE
Visiting Professor of Industrial Management, National Taiwan University of Science & Technology (2010-2011)
Visiting Professor of Safety Engineering, Hong Kong Polytechnic University (2008-2010)
Visiting Associate Professor of Safety Engineering, Tunghai University (2008-2009)
Associate Professor and Safety Program Director, Illinois State University (2001-2008)
Adjunct Associate Professor of Industrial Engineering, University of Cincinnati (1986-2007)

5th Edition, July 2011

Based on 4th Edition by P.L. Clemens (Jacobs-Sverdrup), February 2002

Topics Covered

- Fault Tree Definition
- Developing the Fault Tree
- Structural Significance of the Analysis
- Quantitative Significance of the Analysis
- Diagnostic Aids and Shortcuts
- Finding and Interpreting Cut Sets and Path Sets
- Success-Domain Counterpart Analysis
- Assembling the Fault Tree Analysis Report
- Fault Tree Analysis vs. Alternatives
- Fault Tree Shortcomings/Pitfalls/Abuses

First – A Bit of Background

- Origins of the technique
- Fault Tree Analysis defined
- Where best to apply the technique
- What the analysis produces
- Symbols and conventions


Origins

Fault tree analysis was developed in 1962 for the U.S. Air Force by Bell Telephone Laboratories for use with the Minuteman system. It was later adopted and extensively applied by the Boeing Company, and is one of many symbolic logic analytical techniques found in the operations research discipline.

The Fault Tree is

- A graphic "model" of the pathways within a system that can lead to a foreseeable, undesirable loss event. The pathways interconnect contributory events and conditions, using standard logic symbols. Numerical probabilities of occurrence can be entered and propagated through the model to evaluate the probability of the foreseeable, undesirable event.
- Only one of many System Safety analytical tools and techniques.


Fault Tree Analysis is Best Applied to Cases with

- Large, perceived threats of loss, i.e., high risk.
- Numerous potential contributors to a mishap.
- Complex or multi-element systems/processes.
- Already-identified undesirable events. (a must!)
- Indiscernible mishap causes (i.e., autopsies).

Caveat: Large fault trees are resource-hungry and should not be undertaken without reasonable assurance of need.

Fault Tree Analysis Produces

- Graphic display of chains of events/conditions leading to the loss event.
- Identification of those potential contributors to failure that are "critical."
- Improved understanding of system characteristics.
- Qualitative/quantitative insight into probability of the loss event selected for analysis.
- Identification of resources committed to preventing failure.
- Guidance for redeploying resources to optimize control of risk.
- Documentation of analytical results.

Some Definitions
– FAULT
• An abnormal undesirable state of a system or a system element*
induced 1) by presence of an improper command or absence of a
proper one**, or 2) by a failure (see below). All failures cause
faults; not all faults are caused by failures. A system which has
been shut down by safety features has not faulted.
– FAILURE
• Loss, by a system or system element*, of functional integrity to
perform as intended, e.g., relay contacts corrode and will not pass
rated current closed, or the relay coil has burned out and will not
close the contacts when commanded – the relay has failed; a
pressure vessel bursts – the vessel fails. A protective device which
functions as intended has not failed, e.g., a blown fuse.
* System element: a subsystem, assembly, component, piece part, etc.
** This (#1) also describes a "command" failure, which is one of the features of the "state of component" approach to fault tree analysis.

Definitions

– PRIMARY (OR BASIC) FAILURE


• The failed element has seen no exposure to environmental or
service stresses exceeding its ratings to perform. E.g., fatigue
failure of a relay spring within its rated lifetime; leakage of a
valve seal within its pressure rating.
– SECONDARY FAILURE
• Failure induced by exposure of the failed element to
environmental and/or service stresses exceeding its intended
ratings. E.g., the failed element has been improperly designed,
or selected, or installed, or calibrated for the application; the
failed element is overstressed/underqualified for its burden.

Assumptions and Limitations

- Non-repairable system.
- No sabotage.
- Markov…
  – Fault rates are constant: λ = 1/MTBF = constant.
  – The future is independent of the past – i.e., future states available to the system depend only upon its present state and pathways now available to it, not upon how it got where it is.
- Bernoulli…
  – Each system element analyzed has two, mutually exclusive states.

The Logic Symbols

Most Fault Tree Analyses can be carried out using only these four symbols:

- TOP Event – foreseeable, undesirable event, toward which all fault tree logic paths flow; or Intermediate Event – describing a system state produced by antecedent events.
- "OR" Gate – produces output if any input exists. Any input, individually, must be (1) necessary and (2) sufficient to cause the output event.
- "AND" Gate – produces output if all inputs co-exist. All inputs, collectively, must be (1) necessary and (2) sufficient to cause the output event.
- Basic Event – initiating fault/failure, not developed further. (Called "Leaf," "Initiator," or "Basic.") The Basic Event marks the limit of resolution of the analysis.

Events and Gates are not component parts of the system being analyzed. They are symbols representing the logic of the analysis. They are bi-modal. They function flawlessly.


Steps in Fault Tree Analysis

1. Identify the undesirable TOP event.
2. Identify first-level contributors.
3. Link contributors to TOP by logic gates.
4. Identify second-level contributors.
5. Link second-level contributors to the first level by logic gates.
6. Repeat/continue.

A Basic Event ("Leaf," "Initiator," or "Basic") indicates the limit of analytical resolution.

Some Rules and Conventions

- Do use single-stem gate-feed inputs.
- Don't let gates feed gates.

(Figure: "NO" and "YES" examples of gate-to-gate connections.)


More Rules and Conventions

- Be CONSISTENT in naming fault events/conditions. Use the same name for the same event/condition throughout the analysis. (Use index numbering for large trees.)
- Say WHAT failed/faulted and HOW – e.g., "Switch Sw-418 contacts fail closed."
- Don't expect miracles to "save" the system. Lightning will not recharge the battery. A large bass will not plug the hole in the hull.

Some Conventions Illustrated

- No miracles! Don't count on "MAYBE" events such as:
  – A gust of wind will come along and correct the skid.
  – A sudden cloudburst will extinguish the ignition source.
  – There'll be a power outage when the worker's hand contacts the high-voltage conductor.
- Initiators must be statistically independent of one another.
- Name basics consistently!

(Figure: a small example branch – "Air Escapes From Casing" and "Tire Pressure Drops" leading to "Tire Deflates"/"Flat Tire.")


Identifying TOP Events

- Explore historical records (own and others).
- Look to energy sources.
- Identify potential mission failure contributors.
- Develop "what-if" scenarios.
- Use "shopping lists."

Example TOP Events

- Wheels-up landing
- Mid-air collision
- Subway derailment
- Turbine engine FOD
- Rocket failure to ignite
- Irretrievable loss of primary test data
- Dengue fever pandemic
- Sting failure
- Inadvertent nuke launch
- Reactor loss of cooling
- Uncommanded ignition
- Inability to dewater buoyancy tanks

TOP events represent potential high-penalty losses (i.e., high risk). Either severity of the outcome or frequency of occurrence can produce high risk.


"Scope" the Tree TOP

Too Broad                 | Improved
--------------------------|----------------------------------------------------------------
Computer Outage           | Outage of primary data collection computer, exceeding eight hours, from external causes
Exposed Conductor         | Unprotected body contact with potential greater than 40 volts
Foreign Object Ingestion  | Ingestion of foreign object weighing more than 5 grams and having density greater than 3.2 gm/cc
Jet Fuel Dispensing Leak  | Fuel dispensing fire resulting in loss exceeding $2,500

"Scoping" reduces effort spent in the analysis by confining it to relevant considerations. To "scope," describe the level of penalty or the circumstances for which the event becomes intolerable – use modifiers to narrow the event description.

Adding Contributors to the Tree

(1) EACH CONTRIBUTING ELEMENT beneath a gate (2) must be an INDEPENDENT* FAULT or FAILURE CONDITION (typically described by a noun, an action verb, and specifying modifiers), (3) and each element must be an immediate contributor to the level above (the cause producing the effect at the next level up).

* At a given level, under a given gate, each fault must be independent of all others. However, the same fault may appear at other points on the tree.

Examples:
- Electrical power fails off
- Low-temp. alarm fails off
- Solar flux q > 0.043 Btu/ft²/sec
- Relay K-28 contacts freeze closed
- Transducer case ruptures
- Proc. Step 42 omitted

NOTE: As a group under an AND gate, and individually under an OR gate, contributing elements must be both necessary and sufficient to serve as immediate cause for the output event.


Example Fault Tree Development

- Constructing the logic
- Spotting/correcting some common errors
- Adding quantitative data

An Example Fault Tree

Undesirable Event: "Late for Work."

Causative modalities* beneath the TOP event:
- Sequence Initiation Failures (e.g., Oversleep)
- Transport Failures
- Life Support Failures
- Process and Misc. System Malfunctions

* Partitioned aspects of system function, subdivided as to purpose, physical arrangement, or sequence of operation.


Sequence Initiation Failures

(Fault tree branch: "Oversleep" develops through "No 'Start' Pulse" and "Natural Apathy"; "No 'Start' Pulse" develops through "Bio-rhythm Fails" and "Artificial Wakeup Fails.")

Verifying Logic

Re-examine the "Oversleep" branch: Does this "look" correct? Should the gate be OR?


Test Logic in SUCCESS Domain

Redraw the branch – invert all statements and gates. In the failure domain, "Oversleep" develops through "No 'Start' Pulse" (the "trigger") and "Natural Apathy" (the "motivation"), with "No 'Start' Pulse" fed by "Bio-rhythm Fails" and "Artificial Wakeup Fails." In the success domain, the inverted branch reads "Wakeup Succeeds," fed by "'Start' Pulse Works" and "Natural High Torque," with "'Start' Pulse Works" fed by "Bio-rhythm Works" and "Artificial Wakeup Works."

If it was wrong here… it'll be wrong here, too!

Artificial Wakeup Fails – the branch developed further:

- Artificial Wakeup Fails
  - Alarm Clocks Fail (both clocks must fail)
    – Main (Plug-in) Clock Fails
      - Power Outage
      - Faulty Innards (Electrical Fault; Mechanical Fault: Hour Hand Falls Off, Hour Hand Jams Works)
      - Forget to Set
    – Backup (Windup) Clock Fails
      - Faulty Mechanism
      - Forget to Set
      - Forget to Wind
  - Nocturnal Deafness

What does the tree tell us about system vulnerability at this point?


Background for Numerical Methods

- Relating PF to R
- The Bathtub Curve
- Exponential Failure Distribution
- Propagation through Gates
- PF Sources

Reliability and Failure Probability Relationships

- S = Successes
- F = Failures
- Reliability: R = S / (S + F)
- Failure Probability: PF = F / (S + F)
- R + PF = (S + F) / (S + F) ≡ 1
- λ = Fault Rate = 1 / MTBF


Significance of PF

The Bathtub Curve: most system elements have fault rates (λ = 1/MTBF) that are constant (λ0) over long periods of useful life, between an early "infant mortality" region and a late "burnout" region. During the constant-rate period, faults occur at random times.

Fault probability is modeled acceptably well as a function of exposure interval (T) by the exponential distribution:

  PF = 1 – e^(–λT)    R = e^(–λT)

At T = 1 MTBF, PF ≈ 0.63. For exposure intervals that are brief (T < 0.2 MTBF), PF is approximated closely by λT:

  PF ≈ λT  (the approximation errs high – by roughly 10% at λT = 0.2, and by much less for shorter exposures)

(Figures: the Bathtub Curve and the exponentially modeled failure probability curve.)

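
The exponential model is easy to explore numerically. The short sketch below is not from the tutorial; it simply evaluates PF = 1 – e^(–λT) for an assumed MTBF and compares it with the λT approximation at a few exposure intervals.

import math

def failure_probability(fault_rate: float, exposure: float) -> float:
    """Exponential model: PF = 1 - e^(-lambda * T)."""
    return 1.0 - math.exp(-fault_rate * exposure)

mtbf = 10_000.0      # hours -- assumed value, for illustration only
lam = 1.0 / mtbf     # constant fault rate, lambda = 1/MTBF

for exposure in (100.0, 1_000.0, 2_000.0, 10_000.0):   # hours
    exact = failure_probability(lam, exposure)
    approx = lam * exposure                   # PF ~ lambda*T for brief exposures
    print(f"T = {exposure:>7.0f} h   exact = {exact:.4f}   "
          f"lam*T = {approx:.4f}   error = {(approx - exact) / exact:5.1%}")
# At T = MTBF the exact value is 1 - 1/e ~ 0.63 and the linear approximation is useless.
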
R and PF Through Gates (two inputs)

OR Gate – either of two independent element failures produces system failure:
  RT = RA · RB
  PF = 1 – RT = 1 – [(1 – PA)(1 – PB)] = PA + PB – PA·PB   [Union / ∪]

AND Gate – both of two independent elements must fail to produce system failure:
  RT = RA + RB – RA·RB
  PF = 1 – RT = 1 – [(1 – PA) + (1 – PB) – (1 – PA)(1 – PB)] = PA·PB   [Intersection / ∩]

In each case R + PF ≡ 1.

"Rare Event Approximation" for the OR gate: PF ≈ PA + PB, with error ≤ 11% for PA, PB ≤ 0.2.

For 3 inputs:
  OR:  PF = PA + PB + PC – PA·PB – PA·PC – PB·PC + PA·PB·PC  (omit the cross terms for the approximation)
  AND: PF = PA·PB·PC
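
These gate formulas are simple enough to wrap in small helpers. The sketch below is mine (the names are arbitrary, not from the tutorial); it shows the exact propagation for independent inputs and the rare-event approximation for the OR gate.

from math import prod

def and_gate(*p: float) -> float:
    """Exact AND-gate output: all independent inputs must fail (intersection)."""
    return prod(p)

def or_gate(*p: float) -> float:
    """Exact OR-gate output: at least one independent input fails (union)."""
    return 1.0 - prod(1.0 - x for x in p)

def or_gate_rare_event(*p: float) -> float:
    """Rare-event approximation: just the sum of the input probabilities."""
    return sum(p)

pa, pb, pc = 0.1, 0.05, 0.02                   # assumed input probabilities
print(and_gate(pa, pb, pc))                    # 1.0e-04
print(or_gate(pa, pb, pc))                     # ~0.1621
print(or_gate_rare_event(pa, pb, pc))          # 0.17 (errs slightly high)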

PF Propagation Through Gates

AND Gate (inputs 1 and 2 are INDEPENDENT events):
  PT = Π Pe = P1 · P2   [Intersection / ∩]

OR Gate:
  PT = P1 + P2 – P1·P2  (the P1·P2 term is usually negligible)
  PT ≈ Σ Pe ≈ P1 + P2   [Union / ∪]

"Ipping" Gives Exact OR Gate Solutions

For an AND gate in the failure domain, PT = Π Pe. For an OR gate, work through the success domain: the probability that no input occurs is Π (1 – Pe), so the exact failure-domain result is

  PT = 1 – Π (1 – Pe) = 1 – [(1 – P1)(1 – P2)(1 – P3) … (1 – Pn)]

The "ip" operator (an inverted pi) is the co-function of pi (Π). It provides an exact solution for propagating probabilities through the OR gate. Its use is rarely justifiable.


More Gates and Symbols

- Inclusive OR Gate: opens when any one or more events occur.
  PT = P1 + P2 – (P1 × P2)
- Exclusive OR Gate: opens when any one (but only one) event occurs.
  PT = P1 + P2 – 2(P1 × P2)
- Mutually Exclusive OR Gate: opens when any one of two or more events occurs; all other events are then precluded.
  PT = P1 + P2

For all OR Gate cases, the Rare Event Approximation, PT ≈ Σ Pe, may be used for small values of Pe.
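
For two inputs, the three OR variants differ only in how the overlap term P1·P2 is handled. A two-input sketch (helper names are mine):

def inclusive_or(p1: float, p2: float) -> float:
    return p1 + p2 - p1 * p2        # one or both events occur

def exclusive_or(p1: float, p2: float) -> float:
    return p1 + p2 - 2 * p1 * p2    # exactly one event occurs

def mutually_exclusive_or(p1: float, p2: float) -> float:
    return p1 + p2                  # the events cannot co-exist, so no overlap term

p1, p2 = 0.05, 0.02                 # assumed values
print(inclusive_or(p1, p2), exclusive_or(p1, p2), mutually_exclusive_or(p1, p2))
# 0.069  0.068  0.07 -- all close to the rare-event approximation p1 + p2
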
Still More Gates and Symbols

- Priority AND Gate: opens when input events occur in predetermined sequence.
  PT = P1 × P2
- Inhibit Gate: opens when the (single) input event occurs in the presence of an enabling condition.
- External Event: an event normally expected to occur.
- Undeveloped Event: an event not further developed.
- Conditioning Event: applies conditions or restrictions to other symbols.


Some Failure Probability Sources

- Manufacturer's Data
- Industry Consensus Standards
- MIL Standards
- Historical Evidence – Same or Similar Systems
- Simulation/Testing
- Delphi Estimates
- ERDA Log Average Method
Log Average Method*

If probability is not estimated easily, but upper and lower credible bounds can be judged…
- Estimate upper and lower credible bounds of probability for the phenomenon in question.
- Average the logarithms of the upper and lower bounds.
- The antilogarithm of the average of the logarithms is less than the upper bound and greater than the lower bound by the same factor. Thus, it is geometrically midway between the limits of estimation.

Example: lower probability bound PL = 10⁻², upper probability bound PU = 10⁻¹.

  Log Average = antilog[(log PL + log PU) / 2] = antilog[((–2) + (–1)) / 2] = 10^(–1.5) = 0.0316

Note that, for the example shown, the arithmetic average would be (0.01 + 0.1)/2 = 0.055, i.e., 5.5 times the lower bound and 0.55 times the upper bound.

* Reference: Briscoe, Glen J.; "Risk Management Guide;" System Safety Development Center; SSDC-11; DOE 76-45/11; September 1982.
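
The log average is simply the geometric mean of the two bounds. A minimal check of the slide's example (the function name is mine, not from the reference):

import math

def log_average(p_lower: float, p_upper: float) -> float:
    """Antilog of the mean of the log bounds -- i.e., the geometric mean of the bounds."""
    return 10 ** ((math.log10(p_lower) + math.log10(p_upper)) / 2)
    # equivalently: math.sqrt(p_lower * p_upper)

print(log_average(0.01, 0.1))   # ~0.0316 = 10**(-1.5)
print((0.01 + 0.1) / 2)         # 0.055 -- the arithmetic average, biased toward the upper bound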

More Failure Probability Sources

- WASH-1400 (NUREG-75/014); "Reactor Safety Study – An Assessment of Accident Risks in US Commercial Nuclear Power Plants;" 1975
- IEEE Standard 500
- Government-Industry Data Exchange Program (GIDEP)
- Rome Air Development Center Tables
- NUREG-0492; "Fault Tree Handbook;" (Table XI-1); 1986
- Many others, including numerous industry-specific proprietary listings

Typical Component Failure Rates
(Failures per 10⁶ Hours)

Device                   | Minimum | Average | Maximum
-------------------------|---------|---------|--------
Semiconductor Diodes     | 0.10    | 1.0     | 10.0
Transistors              | 0.10    | 3.0     | 12.0
Microwave Diodes         | 3.0     | 10.0    | 22.0
MIL-R-11 Resistors       | 0.0035  | 0.0048  | 0.016
MIL-R-22097 Resistors    | 29.0    | 41.0    | 80.0
Rotary Electrical Motors | 0.60    | 5.0     | 500.0
Connectors               | 0.01    | 0.10    | 10.0

Source: Willie Hammer, "Handbook of System and Product Safety," Prentice Hall


Typical Human Operator Failure Rates

Activity                                                     | Error Rate
-------------------------------------------------------------|---------------------------
*Error of omission/item embedded in procedure                | 3 × 10⁻³
*Simple arithmetic error with self-checking                  | 3 × 10⁻²
*Inspector error of operator oversight                       | 10⁻¹
*General rate/high stress/dangerous activity                 | 0.2–0.3
**Check-off provision improperly used                        | 0.1–0.9 (0.5 avg.)
**Error of omission/10-item check-off list                   | 0.0001–0.005 (0.001 avg.)
**Carry out plant policy/no check on operator                | 0.005–0.05 (0.01 avg.)
**Select wrong control/group of identical, labeled controls  | 0.001–0.01 (0.003 avg.)

Sources: * WASH-1400 (NUREG-75/014); "Reactor Safety Study – An Assessment of Accident Risks in U.S. Commercial Nuclear Power Plants," 1975
** NUREG/CR-1278; "Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications," 1980
Some Factors Influencing Human Operator Failure Probability

- Experience
- Stress
- Training
- Individual self-discipline/conscientiousness
- Fatigue
- Perception of error consequences (…to self/others)
- Use of guides and checklists
- Realization of failure on prior attempt
- Character of task – complexity/repetitiveness


Artificial Wakeup Fails – Quantified

KEY: the upper figure at each event is faults/operation; the lower figure is the approximate rate in faults/year. Assume 260 operations/year.

- Artificial Wakeup Fails: 4.14 × 10⁻⁴ per operation (≈ 0.1/yr)
  - Alarm Clocks Fail (AND): 4.14 × 10⁻⁴
    – Main (Plug-in) Clock Fails (OR): 2.03 × 10⁻²
      - Power Outage: 1.2 × 10⁻² (3/yr)
      - Faulty Innards (OR): ≈ 3 × 10⁻⁴
        – Electrical Fault: 3 × 10⁻⁴ (≈ 1 per 15 yr)
        – Mechanical Fault (AND): 8 × 10⁻⁸
          - Hour Hand Falls Off: 4 × 10⁻⁴ (1 per 10 yr)
          - Hour Hand Jams Works: 2 × 10⁻⁴ (1 per 20 yr)
      - Forget to Set: 8 × 10⁻³ (2/yr)
    – Backup (Windup) Clock Fails (OR): 2.04 × 10⁻²
      - Faulty Mechanism: 4 × 10⁻⁴ (1 per 10 yr)
      - Forget to Set: 8 × 10⁻³ (2/yr)
      - Forget to Wind: 1.2 × 10⁻² (3/yr)
  - Nocturnal Deafness: negligible
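
Because every event in this example carries a probability, the propagation is easy to reproduce. The sketch below is mine (event names abbreviated); it uses products for AND gates and rare-event sums for OR gates, which is how the slide's intermediate values (2.03 × 10⁻², 2.04 × 10⁻², 4.14 × 10⁻⁴) arise.

from math import prod

def AND(*p: float) -> float:
    return prod(p)

def OR(*p: float) -> float:
    return sum(p)          # rare-event approximation, as used on the slide

# Leaf probabilities, per operation (values from the slide)
power_outage   = 1.2e-2
electrical     = 3.0e-4
hand_falls_off = 4.0e-4
hand_jams      = 2.0e-4
forget_to_set  = 8.0e-3
faulty_mech    = 4.0e-4
forget_to_wind = 1.2e-2

mechanical     = AND(hand_falls_off, hand_jams)                    # 8.0e-8
faulty_innards = OR(electrical, mechanical)                        # ~3.0e-4
main_clock     = OR(power_outage, faulty_innards, forget_to_set)   # ~2.03e-2
backup_clock   = OR(faulty_mech, forget_to_set, forget_to_wind)    # ~2.04e-2
alarm_clocks   = AND(main_clock, backup_clock)                     # ~4.14e-4
wakeup_fails   = alarm_clocks                                      # nocturnal deafness: negligible

print(f"{wakeup_fails:.2e} per operation")                   # ~4.14e-04
print(f"{wakeup_fails * 260:.2f} faults/year (260 ops/yr)")  # ~0.11
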
HOW Much PT is TOO Much?

Consider "bootstrapping" comparisons with known risks…

- Human operator error (response to repetitive stimulus): ≈ 10⁻²–10⁻³ /exp MH †
- Internal combustion engine failure (spark ignition): ≈ 10⁻³ /exp hr †
- Pneumatic instrument recorder failure: ≈ 10⁻⁴ /exp hr †
- Distribution transformer failure: ≈ 10⁻⁵ /exp hr †
- U.S. motor vehicle fatalities: ≈ 10⁻⁶ /exp MH †
- Death by disease (U.S. lifetime avg.): ≈ 10⁻⁶ /exp MH
- U.S. employment fatalities: ≈ 10⁻⁷–10⁻⁸ /exp MH †
- Death by lightning: ≈ 10⁻⁹ /exp MH *
- Meteorite (>1 lb) hit on 10³ × 10³ ft area of U.S.: ≈ 10⁻¹⁰ /exp hr ‡
- Earth destroyed by extraterrestrial hit: ≈ 10⁻¹⁴ /exp hr †

† Browning, R.L., "The Loss Rate Concept in Safety Engineering"
* National Safety Council, "Accident Facts"
‡ Kopecek, J.T., "Analytical Methods Applicable to Risk Assessment & Prevention," Tenth International System Safety Conference


Apply Scoping

What power outages are of concern for the "Power Outage" initiator (1 × 10⁻², 3/yr)? Not all of them! Only those that…
- Are undetected/uncompensated
- Occur during the hours of sleep
- Have sufficient duration to fault the system

This probability must reflect these conditions!

Single-Point Failure

"A failure of one independent element of a system which causes an immediate hazard to occur and/or causes the whole system to fail."

Professional Safety – March 1980


Some AND Gate Properties

PT = P1 × P2

- Cost: assume two identical elements having P = 0.1. Then PT = 0.01. Two elements having P = 0.1 may cost much less than one element having P = 0.01.
- Freedom from single-point failure: redundancy ensures that either element 1 or element 2 may fail without inducing TOP.

Failures at Any Analysis Level Must Be

- Independent of each other
- True contributors to the level above

Independent:
- Don't – under "Mechanical Fault," place "Hour Hand Falls Off" and "Hour Hand Jams Works" side by side (they are not independent).
- Do – under "Faulty Innards," use independent contributors: "Hand Falls Off/Jams Works," "Electrical Fault," "Gearing Fails," "Other Mechanical Fault."

True contributors:
- Don't – under "Alarm Failure," include "Toast Burns" alongside "Alarm Clock Fails" and "Backup Clock Fails."
- Do – under "Alarm Failure," include only "Alarm Clock Fails" and "Backup Clock Fails."


Common Cause Events/Phenomena

"A Common Cause is an event or a phenomenon which, if it occurs, will induce the occurrence of two or more fault tree elements."

Oversight of Common Causes is a frequently found fault tree flaw!

Common Cause Oversight – An Example

TOP event: "Unannunciated Intrusion by Burglar," fed by an AND gate over four detector/alarm failures: Microwave, Electro-Optical, Seismic Footfall, and Acoustic.

Four, wholly independent alarm systems are provided to detect and annunciate intrusion. No two of them share a common operating principle. Redundancy appears to be absolute. The AND gate to the TOP event seems appropriate. But, suppose the four systems share a single source of operating power, and that source fails, and there are no backup sources?


Common Cause Oversight Correction

TOP event "Unannunciated Intrusion by Burglar" is now fed by an OR gate with two inputs:
- Detector/Alarm Failure – an AND gate over the four detector failures (Microwave, Electro-Optical, Seismic Footfall, Acoustic).
- Detector/Alarm Power Failure – an AND gate over Basic Power Failure and Emergency Power Failure.

Here, power source failure has been recognized as an event which, if it occurs, will disable all four alarm systems. Power failure has been accounted for as a common cause event, leading to the TOP event through an OR gate. OTHER COMMON CAUSES SHOULD ALSO BE SEARCHED FOR.

Example Common Cause Fault/Failure Sources

- Utility Outage
  – Electricity
  – Cooling Water
  – Pneumatic Pressure
  – Steam
- Moisture
- Corrosion
- Seismic Disturbance
- Dust/Grit
- Temperature Effects (Freezing/Overheat)
- Electromagnetic Disturbance
- Single Operator Oversight
- Many Others


Example Common Cause Suppression Methods

- Separation/Isolation/Insulation/Sealing/Shielding of system elements.
- Using redundant elements having differing operating principles.
- Separately powering/servicing/maintaining redundant elements.
- Using independent operators/inspectors.

Missing Elements?

Contributing elements must combine to satisfy all conditions essential to the TOP event. The logic criteria of necessity and sufficiency must be satisfied.

The corrected tree now pairs the detection failures with the SYSTEM CHALLENGE:
- Unannunciated Intrusion by Burglar (TOP)
  - Detector/Alarm Failure
    – Detector/Alarm System Failure (Microwave, Electro-Optical, Seismic Footfall, Acoustic)
    – Detector/Alarm Power Failure (Basic Power Failure, Emergency Power Failure)
  - Intrusion by Burglar (system challenge)
    – Burglar Present
    – Barriers Fail


Example Problem – Sclerotic Scurvy – The Astronaut's Scourge

- BACKGROUND: Sclerotic scurvy infects 10% of all returning astronauts. Incubation period is 13 days. For a week thereafter, victims of the disease display symptoms which include malaise, lassitude, and a very crabby outlook. A test can be used during the incubation period to determine whether an astronaut has been infected. Anti-toxin administered during the incubation period is 100% effective in preventing the disease when administered to an infected astronaut. However, for an uninfected astronaut, it produces disorientation, confusion, and intensifies all undesirable personality traits for about seven days. The test for infection produces a false positive result in 2% of all uninfected astronauts and a false negative result in one percent of all infected astronauts. Both treatment of an uninfected astronaut and failure to treat an infected astronaut constitute malpractice.
- PROBLEM: Using the test for infection, and the anti-toxin if the test indicates need for it, what is the probability that a returning astronaut will be a victim of malpractice?

Sclerotic Scurvy Malpractice

- Malpractice: 0.019
  - Fail to Treat Infection (Disease): 0.001
    – Infected Astronaut: 0.1
    – False Negative Test: 0.01
  - Treat Needlessly (Side Effects): 0.018
    – Healthy Astronaut: 0.9
    – False Positive Test: 0.02

10% of returnees are infected – 90% are not infected. 1% of infected cases test falsely negative, receive no treatment, and succumb to the disease; 2% of uninfected cases test falsely positive, receive treatment, and succumb to the side effects.

What is the greatest contributor to this probability? Should the test be used?
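
The slide's numbers follow directly from the problem statement; a quick check (variable names are mine):

p_infected  = 0.10   # fraction of returning astronauts who are infected
p_healthy   = 0.90
p_false_neg = 0.01   # infected, but the test says healthy -> no treatment -> disease
p_false_pos = 0.02   # healthy, but the test says infected -> needless treatment -> side effects

fail_to_treat    = p_infected * p_false_neg    # 0.001
treat_needlessly = p_healthy * p_false_pos     # 0.018
malpractice      = fail_to_treat + treat_needlessly

print(malpractice)   # 0.019 -- dominated by needless treatment of healthy astronauts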

Cut Sets

Aids to…
- System Diagnosis
- Reducing Vulnerability
- Linking to Success Domain

Cut Sets

- A CUT SET is any group of fault tree initiators which, if all occur, will cause the TOP event to occur.
- A MINIMAL CUT SET is a least group of fault tree initiators which, if all occur, will cause the TOP event to occur.


Finding Cut Sets

- Ignore all tree elements except the initiators ("leaves/basics").
- Starting immediately below the TOP event, assign a unique letter to each gate, and assign a unique number to each initiator.
- Proceeding stepwise from the TOP event downward, construct a matrix using the letters and numbers. The letter representing the TOP event gate becomes the initial matrix entry. As the construction progresses:
  – Replace the letter for each AND gate by the letter(s)/number(s) for all gates/initiators which are its inputs. Display these horizontally, in matrix rows.
  – Replace the letter for each OR gate by the letter(s)/number(s) for all gates/initiators which are its inputs. Display these vertically, in matrix columns. Each newly formed OR gate replacement row must also contain all other entries found in the original parent row.

Finding Cut Sets (continued)

- A final matrix results, displaying only numbers representing initiators. Each row of this matrix is a Boolean-Indicated Cut Set.
- By inspection, eliminate any row that contains all elements found in a lesser row. Also eliminate redundant elements within rows and rows that duplicate other rows.
- The rows that remain are Minimal Cut Sets.


A Cut Set Example

PROCEDURE:
- Assign letters to gates. (The TOP gate is "A.") Do not repeat letters.
- Assign numbers to basic initiators. If a basic initiator appears more than once, represent it by the same number at each appearance.
- Construct a matrix, starting with the TOP "A" gate.

Example tree: TOP gate A is an AND gate with inputs B and D; B is an OR gate with inputs 1 and C; C is an AND gate with inputs 2 and 3; D is an OR gate with inputs 2 and 4.

A Cut Set Example (continued)

Construct the matrix step by step:
1. The TOP event gate is A; it is the initial matrix entry: [A].
2. A is an AND gate; B and D, its inputs, replace it horizontally: [B D].
3. B is an OR gate; 1 and C, its inputs, replace it vertically – each requires a new row: [1 D], [C D].
4. C is an AND gate; 2 and 3, its inputs, replace it horizontally: [1 D], [2 3 D].
5. D (top row) is an OR gate; 2 and 4, its inputs, replace it vertically: [1 2], [1 4], [2 3 D].
6. D (remaining row) is an OR gate; replace as before: [1 2], [1 4], [2 3], [2 3 4].

These Boolean-Indicated Cut Sets – {1,2}, {1,4}, {2,3}, {2,3,4} – reduce to the Minimal Cut Sets {1,2}, {1,4}, {2,3}: least groups of initiators which will induce TOP.
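
The matrix procedure above is, in effect, the MOCUS algorithm: an AND gate expands a row horizontally, an OR gate splits a row into several rows, and supersets are then pruned. The sketch below is my own implementation of that procedure (not code from the tutorial); run on the example tree, it reproduces the Boolean-indicated and minimal cut sets shown above.

# Fault tree: gate -> (type, inputs); inputs are gate names or initiator numbers.
# Example tree: TOP gate A = AND(B, D); B = OR(1, C); C = AND(2, 3); D = OR(2, 4).
TREE = {
    "A": ("AND", ["B", "D"]),
    "B": ("OR",  [1, "C"]),
    "C": ("AND", [2, 3]),
    "D": ("OR",  [2, 4]),
}

def boolean_indicated_cut_sets(tree, top):
    rows = [[top]]                      # the matrix starts with the TOP gate
    while any(entry in tree for row in rows for entry in row):
        new_rows = []
        for row in rows:
            gate = next((e for e in row if e in tree), None)
            if gate is None:            # row already contains only initiators
                new_rows.append(row)
                continue
            kind, inputs = tree[gate]
            rest = [e for e in row if e != gate]
            if kind == "AND":           # replace horizontally: inputs join the same row
                new_rows.append(rest + inputs)
            else:                       # OR: replace vertically: one new row per input
                new_rows.extend(rest + [inp] for inp in inputs)
        rows = new_rows
    return [set(row) for row in rows]   # sets also drop duplicate entries within a row

def minimal_cut_sets(cut_sets):
    minimal = []
    for cs in sorted(cut_sets, key=len):
        if not any(m <= cs for m in minimal):   # discard supersets of an existing set
            minimal.append(cs)
    return minimal

bics = boolean_indicated_cut_sets(TREE, "A")
print("Boolean-indicated:", bics)
print("Minimal:", minimal_cut_sets(bics))       # {1, 2}, {1, 4}, {2, 3}
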

An "Equivalent" Fault Tree

An Equivalent Fault Tree can be constructed from the Minimal Cut Sets. For example, the Minimal Cut Sets {1,2}, {2,3}, {1,4} represent a tree whose TOP is an OR gate fed by three AND gates – (1 AND 2), (1 AND 4), (2 AND 3) – and this Fault Tree is a Logic Equivalent of the original, for which the Minimal Cut Sets were derived.

Equivalent Trees Aren't Always Simpler

A Fault Tree with 4 gates and 6 initiators – a TOP AND gate over the OR gates OR(1,2), OR(3,4), OR(5,6) – has this logic equivalent: a TOP OR gate over its eight Minimal Cut Sets (1/3/5, 1/3/6, 1/4/5, 1/4/6, 2/3/5, 2/3/6, 2/4/5, 2/4/6), i.e., 9 gates and 24 initiator entries.


Another Cut Set Example

Compare this case to the first Cut Set example and note the differences: the TOP gate here is OR; in the first example, the TOP gate was AND. Proceed as with the first example.

Example tree: TOP gate A is an OR gate with inputs B and C. B is an AND gate with inputs 1 and D; D is an OR gate with inputs 2 and E; E is an OR gate with inputs 3 and 4. C is an AND gate with inputs 6 and F; F is an AND gate with inputs 3, 5, and G; G is an OR gate with inputs 4 and 1.

Another Cut Set Example (continued)

Construct the matrix – make step-by-step substitutions:

  [A] → [B], [C] → [1 D], [F 6] → [1 2], [1 E], [F 6] → [1 2], [1 E], [3 5 G 6] → …

Boolean-Indicated Cut Sets: {1,2}, {1,3}, {1,4}, {3,5,4,6}, {3,5,1,6}.
Minimal Cut Sets: {1,2}, {1,3}, {1,4}, {3,4,5,6}.

Note that there are four Minimal Cut Sets. Co-existence of all of the initiators in any one of them will precipitate the TOP event. An EQUIVALENT FAULT TREE can again be constructed…


Another "Equivalent" Fault Tree

The Minimal Cut Sets {1,2}, {1,3}, {1,4}, {3,4,5,6} represent a Fault Tree whose TOP is an OR gate fed by four AND gates – (1 AND 2), (1 AND 3), (1 AND 4), (3 AND 4 AND 5 AND 6) – a Logic Equivalent of the original tree.

From Tree to Reliability Block Diagram

The tree models a system fault, in the failure domain. Let that fault be "System Fails to Function as Intended." Its opposite, "System Succeeds in Functioning as Intended," can be represented by a Reliability Block Diagram in which success flows through system element functions from left to right. Blocks represent functions of system elements; paths through them represent success. "Barring" a term (n̄) denotes consideration of its success properties. Any path through the block diagram that is not interrupted by a fault of an element results in system success.

(Figure: the example fault tree redrawn as a Reliability Block Diagram; its left-to-right paths are 1̄–3̄, 1̄–4̄, 1̄–5̄, 1̄–6̄, and 2̄–3̄–4̄.)


Cut Sets and Reliability Blocks

Each Cut Set (a horizontal row in the matrix) interrupts all left-to-right paths through the Reliability Block Diagram. The Minimal Cut Sets are {1,2}, {1,3}, {1,4}, {3,4,5,6}.

Note that 3/5/1/6 is a Cut Set, but not a Minimal Cut Set. (It contains 1/3, a true Minimal Cut Set.)

Cut Set Uses

- Evaluating PT
- Finding Vulnerability to Common Causes
- Analyzing Common Cause Probability
- Evaluating Structural Cut Set "Importance"
- Evaluating Quantitative Cut Set "Importance"
- Evaluating Item "Importance"


Cut Set Uses / Evaluating PT

Cut Set Probability (Pk), the product of probabilities for the events within a Cut Set, is the probability that the Cut Set being considered will induce TOP:

  Pk = Π Pe = P1 × P2 × P3 × … × Pn

For the Minimal Cut Sets {1,2}, {1,3}, {1,4}, {3,4,5,6}:

  PT ≈ Σ Pk = P1·P2 + P1·P3 + P1·P4 + P3·P4·P5·P6

Note that propagating probabilities through an "unpruned" tree, i.e., using Boolean-Indicated Cut Sets rather than Minimal Cut Sets, would produce a falsely high PT.
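
With probabilities assigned to the initiators, the rare-event estimate of PT is just the sum of the minimal cut set products. A small sketch with assumed (purely illustrative) initiator probabilities:

from math import prod

P = {1: 1e-3, 2: 2e-3, 3: 5e-3, 4: 1e-2, 5: 3e-3, 6: 4e-3}   # assumed values only
minimal_cut_sets = [{1, 2}, {1, 3}, {1, 4}, {3, 4, 5, 6}]

def cut_set_probability(cs):
    return prod(P[e] for e in cs)        # Pk = product of Pe within the cut set

PT = sum(cut_set_probability(cs) for cs in minimal_cut_sets)  # rare-event sum over cut sets
print(f"PT ~ {PT:.2e}")                                       # ~1.7e-05 for these numbers
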
Cut Set Uses / Common Cause Vulnerability

Uniquely subscript initiators, using letter indicators of common cause susceptibility, e.g.:

  l = location (code where), m = moisture, h = human operator, q = heat, f = cold, v = vibration, …etc.

Minimal Cut Sets (subscripted): {1v, 2h}, {1v, 3m,v}, {1v, 4m}, {3m,v, 4m, 5m, 6m}

All initiators in the last Cut Set are vulnerable to moisture. Moisture is a Common Cause and can induce TOP. ADVICE: moisture-proof one or more of those items.

Some initiators may be vulnerable to several Common Causes and receive several corresponding subscript designators. Some may have no Common Cause vulnerability and receive no subscripts.


Analyzing Common Cause Probability

Place an OR gate beneath the TOP with two branches: "System Fault" (analyze as usual) and "Common-Cause-Induced Fault" (moisture, vibration, human operator, heat, …others).

Introduce each Common Cause identified as a "Cut Set Killer" at its individual probability level of both (1) occurring, and (2) inducing all terms within the affected cut set.

Cut Set Structural "Importance"

For the Minimal Cut Sets {1,2}, {1,3}, {1,4}, {3,4,5,6} – all other things being equal…
- A LONG Cut Set signals low vulnerability.
- A SHORT Cut Set signals higher vulnerability.
- Presence of NUMEROUS Cut Sets signals high vulnerability.
- …and a singlet Cut Set signals a Potential Single-Point Failure.

Analyzing Structural Importance enables qualitative ranking of contributions to System Failure.


Cut Set Quantitative "Importance"

The quantitative importance of a Cut Set (Ik) is the numerical probability that, given that TOP has occurred, that Cut Set has induced it:

  Ik = Pk / PT    …where Pk = Π Pe (e.g., P3 × P4 × P5 × P6 for the Cut Set {3,4,5,6})

Analyzing Quantitative Importance enables numerical ranking of contributions to System Failure. To reduce system vulnerability most effectively, attack Cut Sets having greater Importance. Generally, short Cut Sets have greater Importance, long Cut Sets have lesser Importance.

Item "Importance"

The quantitative Importance of an item (Ie) is the numerical probability that, given that TOP has occurred, that item has contributed to it:

  Ie ≈ Σ Ike   (summed over the Ne Minimal Cut Sets containing item e, where Ike is the Importance of each such Cut Set)

Example – Importance of item 1, which appears in the Minimal Cut Sets {1,2}, {1,3}, {1,4}:

  I1 ≈ [(P1·P2) + (P1·P3) + (P1·P4)] / PT
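
Both importance measures reuse the cut set probabilities. Continuing the earlier sketch (same assumed probabilities, same hypothetical names):

from math import prod

P = {1: 1e-3, 2: 2e-3, 3: 5e-3, 4: 1e-2, 5: 3e-3, 6: 4e-3}   # assumed values, as before
cut_sets = [{1, 2}, {1, 3}, {1, 4}, {3, 4, 5, 6}]

Pk = {frozenset(cs): prod(P[e] for e in cs) for cs in cut_sets}
PT = sum(Pk.values())

# Cut set importance: Ik = Pk / PT
for cs, pk in Pk.items():
    print(sorted(cs), f"Ik = {pk / PT:.3f}")

# Item importance: Ie ~ sum of Ik over the minimal cut sets containing item e
for item in sorted(P):
    Ie = sum(pk for cs, pk in Pk.items() if item in cs) / PT
    print(f"item {item}: Ie = {Ie:.3f}")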

Path Sets

Aids to…
- Further Diagnostic Measures
- Linking to Success Domain
- Trade/Cost Studies

Path Sets

- A PATH SET is a group of fault tree initiators which, if none of them occurs, will guarantee that the TOP event cannot occur.
- TO FIND PATH SETS,* change all AND gates to OR gates and all OR gates to AND gates. Then proceed using matrix construction as for Cut Sets. Path Sets will be the result.

* This Cut Set-to-Path Set conversion takes advantage of de Morgan's duality theorem. Path Sets are complements of Cut Sets.
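
Since the success-domain (dual) tree is obtained just by swapping gate types, the cut-set routine sketched earlier can be reused as-is. The fragment below assumes the boolean_indicated_cut_sets and minimal_cut_sets helpers from that earlier sketch are in scope; run on the second example tree, it yields the path sets shown on the next slide.

# Second example tree: A = OR(B, C); B = AND(1, D); D = OR(2, E); E = OR(3, 4);
# C = AND(6, F); F = AND(3, 5, G); G = OR(4, 1).
TREE = {
    "A": ("OR",  ["B", "C"]),
    "B": ("AND", [1, "D"]),
    "D": ("OR",  [2, "E"]),
    "E": ("OR",  [3, 4]),
    "C": ("AND", [6, "F"]),
    "F": ("AND", [3, 5, "G"]),
    "G": ("OR",  [4, 1]),
}

def dual(tree):
    """de Morgan dual: swap AND and OR gates (initiators are read as their complements)."""
    return {gate: ("OR" if kind == "AND" else "AND", inputs)
            for gate, (kind, inputs) in tree.items()}

# The minimal cut sets of the dual tree are the path sets of the original tree.
path_sets = minimal_cut_sets(boolean_indicated_cut_sets(dual(TREE), "A"))
print(path_sets)   # the path sets {1,3}, {1,4}, {1,5}, {1,6}, {2,3,4}, in some order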

A Path Set Example

Path Sets are least groups of initiators which, if they cannot occur, guarantee against TOP occurring.

The example Fault Tree has these Minimal Cut Sets: {1,2}, {1,3}, {1,4}, {3,4,5,6} …and these Path Sets: {1̄,3̄}, {1̄,4̄}, {1̄,5̄}, {1̄,6̄}, {2̄,3̄,4̄}.

"Barring" a term (n̄) denotes consideration of its success properties.

Path Sets and Reliability Blocks

Each Path Set (a horizontal row in the matrix) – {1,3}, {1,4}, {1,5}, {1,6}, {2,3,4} – represents a left-to-right path through the Reliability Block Diagram.


Path Sets and Trade Studies

Path Set Probability (Pp ≈ Σ Pe) is the probability that the system will suffer a fault at one or more points along the operational route modeled by the path. To minimize system failure probability, minimize path set probability.

Sprinkle countermeasure resources amongst the Path Sets (a: {1,3}, b: {1,4}, c: {1,5}, d: {1,6}, e: {2,3,4}, with probabilities Ppa…Ppe and costs $a…$e). Compute the probability decrement for each newly adjusted Path Set option. Pick the countermeasure ensemble(s) giving the most favorable ΔPp/Δ$. (Selection results can be verified by computing ΔPT/Δ$ for competing candidates.)

Reducing Vulnerability – A Summary

- Inspect the tree – find/operate on major PT contributors…
  – Add interveners/redundancy (lengthen cut sets).
  – Derate components (increase robustness/reduce Pe).
  – Fortify maintenance/parts replacement (increase MTBF).
- Examine/alter system architecture – increase the path set/cut set ratio.
- Evaluate Cut Set Importance (Ik = Pk / PT). Rank items using Ik. Identify items amenable to improvement.
- Evaluate Item Importance (Ie ≈ Σ Ike). Rank items using Ie. Identify items amenable to improvement.
- Evaluate Path Set probability (Pp ≈ Σ Pe). Reduce Pp at the most favorable ΔP/Δ$.

For all new countermeasures, THINK… COST, EFFECTIVENESS, FEASIBILITY (incl. schedule) – AND does the new countermeasure introduce new HAZARDS? Cripple the system?


Some Diagnostic and Analytical Gimmicks

- A Conceptual Probabilistic Model
- Sensitivity Testing
- Finding a PT Upper Limit
- Limit of Resolution – Shutting off Tree Growth
- State-of-Component Method
- When to Use Another Technique – FMECA

Some Diagnostic Gimmicks

(Figure: a "generic" all-purpose fault tree – a TOP event with probability PT, developed through several gate levels down to initiators numbered 1–34 – used to illustrate the gimmicks that follow.)


Think "Roulette Wheels"

A convenient thought-tool model of probabilistic tree modeling: imagine a roulette wheel representing each initiator. The "peg count" ratio for each wheel is determined by the probability for that initiator – e.g., for P22 = 3 × 10⁻³, a wheel with 1,000 peg spaces, 997 white and 3 red. Spin all initiator wheels once for each system exposure interval. Wheels "winning" in gate-opening combinations provide a path to the TOP.

Use Sensitivity Tests

Gauging the "nastiness" of untrustworthy initiators: embedded within the tree, there's a bothersome initiator with an uncertain Pe (say P10 = ?). Perform a crude sensitivity test to obtain quick relief from worry, or to justify the urgency of need for more exact input data:

1. Compute PT for a nominal value of Pe. Then recompute PT for a new Pe′ = Pe + ΔPe. Now compute the sensitivity ΔPT/ΔPe. If this sensitivity is large (roughly, if it exceeds ~0.1 in a large tree), work to find a value for Pe having less uncertainty… or…
2. Compute PT for a value of Pe at its upper credible limit. Is the corresponding PT acceptable? If not, get a better Pe.
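
The sensitivity test is a finite difference of PT with respect to the suspect Pe. A minimal sketch, assuming a hypothetical top_probability() function that re-propagates the tree for a given value of the uncertain initiator (all numbers are illustrative):

def top_probability(pe_uncertain: float) -> float:
    # Placeholder: in practice, re-evaluate the whole fault tree with the
    # uncertain initiator set to pe_uncertain.
    other_contributions = 2.0e-4            # assumed
    return other_contributions + 0.05 * pe_uncertain

pe_nominal = 1.0e-3
delta_pe   = 1.0e-4

pt_nominal   = top_probability(pe_nominal)
pt_perturbed = top_probability(pe_nominal + delta_pe)
sensitivity  = (pt_perturbed - pt_nominal) / delta_pe

print(f"dPT/dPe ~ {sensitivity:.3f}")   # if this is large, go get a better Pe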

Find a Max PT Limit Quickly

The "parts-count" approach gives a sometimes-useful early estimate of PT. PT cannot exceed an upper bound given by:

  PT(max) = Σ Pe = P1 + P2 + P3 + … + Pn

How Far Down Should a Fault Tree Grow?

Where do you stop the analysis? The analysis is a Risk Management enterprise. The TOP statement gives severity. The tree analysis provides probability. ANALYZE NO FURTHER DOWN THAN IS NECESSARY TO ENTER PROBABILITY DATA WITH CONFIDENCE. Is the risk acceptable? If YES, stop. If NO, use the tree to guide risk reduction.

SOME EXCEPTIONS…
1. An event within the tree has alarmingly high probability. Dig deeper beneath it to find the source(s) of the high probability.
2. Mishap autopsies must sometimes analyze down to the cotter-pin level to produce a "credible cause" list.

Initiators / leaves / basics define the LIMIT OF RESOLUTION of the analysis.


State-of-Component Method

WHEN – Analysis has proceeded to the device level – i.e., valves, pumps, switches, relays, etc.

HOW – Show the device fault/failure in the mode needed for upward propagation (e.g., "Relay K-28 Contacts Fail Closed"). Install an OR gate and place these three events beneath it:
- Basic Failure/Fault of Relay K-28 – internal "self" failures under normal environmental and service stresses, e.g., coil burnout, spring failure, contacts drop off…
- Command Fault of Relay K-28 – analyze further to find the source of the fault condition, induced by presence/absence of external command "signals." (Omit for most passive devices – e.g., piping.)
- Secondary Fault of Relay K-28 – faults from environmental and service stresses for which the device is not qualified – e.g., component struck by foreign object, wrong component selection/installation. (Omit if negligible.)

The Fault Tree Analysis Report

- Executive Summary (abstract of the complete report)
- Scope of the analysis… say what is analyzed and what is not analyzed:
  – Title, TOP Description/Severity, Company, Author, Date, etc.
  – Brief system description, bounding the analysis
  – Analysis Boundaries: Physical Boundaries, Operational Boundaries, Operational Phases, Human Operator In/Out, Interfaces Treated, Resolution Limit, Exposure Interval, Others…
- The Analysis:
  – Discussion of Method (cite refs.), Software Used
  – Presentation/Discussion of the Tree (show the tree as a figure)
  – Source(s) of Probability Data (if quantified)
  – Common Cause Search (if done)
  – Sensitivity Test(s) (if conducted)
  – Cut Sets (Structural and/or Quantitative Importance, if analyzed)
  – Path Sets (if analyzed)
  – Trade Studies (if done)
  – Include data sources, cut sets, path sets, etc. as tables.
- Findings…
  – TOP Probability (give confidence limits)
  – Comments on System Vulnerability
  – Chief Contributors
  – Candidate Reduction Approaches (if appropriate)
- Conclusions and Recommendations…
  – Risk Comparisons ("bootstrapping" data, if appropriate)
  – Is further analysis needed? By what method(s)?


FTA vs. FMECA Selection Criteria*

Selection characteristics, each marked in the original table with the preferred technique (FTA or FMECA):
- Safety of public/operating/maintenance personnel
- Small number/clearly defined TOP events
- Indistinctly defined TOP events
- Full-mission completion critically important
- Many, potentially successful missions possible
- "All possible" failure modes are of concern
- High potential for "human error" contributions
- High potential for "software error" contributions
- Numerical "risk evaluation" needed
- Very complex system architecture/many functional parts
- Linear system architecture with little human/software influence
- System irreparable after mission starts

* Adapted from "Fault Tree Analysis Application Guide," Reliability Analysis Center, Rome Air Development Center.

Fault Tree Constraints and Shortcomings

- Undesirable events must be foreseen and are only analyzed singly.
- All significant contributors to fault/failure must be anticipated.
- Each fault/failure initiator must be constrained to two conditional modes when modeled in the tree.
- Initiators at a given analysis level beneath a common gate must be independent of each other.
- Events/conditions at any analysis level must be true, immediate contributors to next-level events/conditions.
- Each initiator's failure rate must be a predictable constant.


Common Fault Tree Abuses

- Over-analysis – "Fault Kudzu"
- Unjustified confidence in numerical results – 6.0232 × 10⁻⁵ … +/–?
- Credence in preposterously low probabilities – 1.666 × 10⁻²⁴/hour
- Unpreparedness to deal with results (particularly quantitative) – is 4.3 × 10⁻⁷/hour acceptable for a catastrophe?
- Overlooking common causes – will a roof leak or a shaking floor wipe you out?
- Misapplication – would Event Tree Analysis (or another technique) serve better?
- Scoping changes in mid-tree

Fault Tree Payoffs

- Gauging/quantifying system failure probability.
- Assessing system Common Cause vulnerability.
- Optimizing resource deployment to control vulnerability.
- Guiding system reconfiguration to reduce vulnerability.
- Identifying main paths to disaster.
- Identifying potential single-point failures.
- Supporting trade studies with differential analyses.

FAULT TREE ANALYSIS is a risk assessment enterprise. Risk Severity is defined by the TOP event. Risk Probability is the result of the tree analysis.


Closing Caveats

- Be wary of the ILLUSION of SAFETY. Low probability does not mean that a mishap won't happen!
- THERE IS NO ABSOLUTE SAFETY! An enterprise is safe only to the degree that its risks are tolerable!
- Apply broad confidence limits to probabilities representing human performance!
- A large number of systems having low probabilities of failure means that A MISHAP WILL HAPPEN – somewhere among them!
  P1 + P2 + P3 + P4 + … + Pn → 1

More…

Caveats

Do you REALLY have enough data to justify QUANTITATIVE ANALYSIS?

For 95% confidence, we must have no failures in… | …to give PF ≈ | …and R ≈
--------------------------------------------------|---------------|----------
1,000 tests                                       | 3 × 10⁻³      | 0.997
300 tests                                         | 10⁻²          | 0.99
100 tests                                         | 3 × 10⁻²      | 0.97
30 tests                                          | 10⁻¹          | 0.9
10 tests                                          | 3 × 10⁻¹      | 0.7

Assumptions: stochastic system behavior; constant system properties; constant service stresses; constant environmental stresses.

Don't drive the numbers into the ground!
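
The tabulated values follow from the binomial distribution: with zero failures observed in n trials, the 95% upper confidence bound on PF satisfies (1 – PF)^n = 0.05, i.e. PF ≈ –ln(0.05)/n ≈ 3/n. A quick check (my own sketch, not from the slides):

def pf_upper_bound(n_tests: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on PF given zero failures in n_tests trials."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)

for n in (1000, 300, 100, 30, 10):
    pf = pf_upper_bound(n)
    print(f"{n:>5} tests: PF <= {pf:.2g}, R >= {1 - pf:.3g}")
# Matches the table: roughly 3e-3, 1e-2, 3e-2, 1e-1, 3e-1.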

Analyze Only to Turn Results Into Decisions

"Perform an analysis only to reach a decision. Do not perform an analysis if that decision can be reached without it. It is not effective to do so. It is a waste of resources."

Dr. V.L. Grose
George Washington University

Bibliography

Selected references for further study…
- Center for Chemical Process Safety; "Guidelines for Hazard Evaluation Procedures, 3rd Edition with Worked Examples;" 2008 (576 pp); John Wiley & Sons.
- Mannan, Sam (Ed.); "Lees' Loss Prevention in the Process Industries, 3rd Edition;" Butterworth-Heinemann; 2004 (3680 pp – three volumes).
- Henley, Ernest J. and Hiromitsu Kumamoto; "Reliability Engineering and Risk Assessment;" 1981 (568 pp).


Additional Reading

- Clemens, P.L. and R.J. Simmons; "System Safety and Risk Management;" Cincinnati, OH: National Institute for Occupational Safety and Health, 208 pp (1998).
- Clemens, P.L. and R.J. Simmons; "The Exposure Interval: Too Often the Analysts' Trap;" Journal of System Safety, Vol. 37, No. 1, pp 8-11, 1st Quarter, 2001.
- Clemens, P.L., Pfitzer, T.F., Simmons, R.J., Dwyer, S., Frost, J. and E. Olson; "The RAC Matrix: A Universal Tool or a Toolkit?" Journal of System Safety, Vol. 41, No. 2, pp 14-19, 41-42, March/April 2005.
