Engineering Reliability - Richard E. Barlow
Reliability
ASA-SIAM Series on
Statistics and Applied Probability
SIAM
The ASA-SIAM Series on Statistics and Applied Probability is published jointly by
the American Statistical Association and the Society for Industrial and Applied Mathematics. The
series consists of a broad spectrum of books on topics in statistics and applied probability. The
purpose of the series is to provide inexpensive, quality publications of interest to the intersecting
membership of the two societies.
Editorial Board
Donald P. Gaver, Naval Postgraduate School, Editor-in-Chief
Alan F. Karr, National Institute of Statistical Sciences
John Lehoczky, Carnegie Mellon University
Robert L. Mason, Southwest Research Institute
Robert Rodriguez, SAS Institute
Andrew Solow, Woods Hole Oceanographic Institution
Werner Stuetzle, University of Washington
Grace Wahba, University of Wisconsin
Eric Ziegel, Amoco Corporation
Czitrom, V. and Spagon, P. D., Statistical Case Studies for Industrial Process Improvement
Engineering
Reliability
Richard E. Barlow
University of California, Berkeley
Berkeley, California
ASA
Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania
American Statistical Association, Alexandria, Virginia
© 1998 by the American Statistical Association and the Society for Industrial and Applied
Mathematics.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600
University City Science Center, Philadelphia, PA 19104-2688.
Barlow, Richard E.
Engineering reliability / Richard E. Barlow.
p. cm. — (ASA-SIAM series on statistics and applied
probability)
Includes bibliographical references and index.
ISBN 0-89871-405-2 (pbk.)
1. Reliability (Engineering). I. Title. II. Series.
TA169.B36 1998
620'.004'52-dc21 97-44019
SIAM is a registered trademark.
This work is dedicated to my wife, Barbara.
Contents
Preface xi
Acknowledgments xiii
Introduction xv
Chapter 6: Network Reliability 105
6.1 Network Reliability Analysis: The Factoring Algorithm 105
6.2 General Methods for System Reliability Calculation 110
6.3 Analysis of Systems with Two Modes of Failure: Switching and
Monitoring Systems 115
6.4 Notes and References 119
References 191
Index 195
Preface
Richard E. Barlow
Berkeley, CA
Acknowledgments
Introduction
The frequency definition sets P(A) = NA/N, where N is the number of experiments and NA is the number of times the event
A occurs. Note that this definition requires repetition. There are at least two
snags to this idea, namely, the following.
(1) No event is exactly repeatable. At the very least, the occurrence times
are different.
(2) Our probability may change as our knowledge changes.
*Feynman notes in Vol. I, p. 6-2 another rather "subjective" aspect of his definition of
probability.
INTRODUCTION xvii
The reason for the name is that we will use convex combinations of probability
weights in computing expectations.
(2) Addition law. If E1 and E2 are any two mutually exclusive events, then

P(E1 or E2 | H) = P(E1 | H) + P(E2 | H).

This too can be extended to include any n events. The event H to the right of the
vertical bar is either given or assumed.
When writing P(E1 | E2), we mean the probability of E1 were E2 known. It
is important to use the subjunctive tense to assess the conditional probability,
since it is not necessary that E2 be known. Also it is important to emphasize
that P(• | •) is a probability function of the first argument but not the second.
Bayes' Formula.

P(E2 | E1) = P(E1 | E2) P(E2) / P(E1).

This is Bayes' formula in its most simple form. If event E1 is observed and we
wish to calculate the conditional probability of an event E2 that is unknown, then
we can use the above formula to make this inference. P(E2) in the formula is
our prior probability for the event E2. P(E1 | E2) is the conditional probability
of E1 conditional on E2. However, since we suppose that E1 has been observed,
P(E1 | E2) is called the likelihood of E2 given the data E1. It is important to
remark that the likelihood is not a probability function of its second argument E2
but simply a nonnegative real valued function. P(E2 | E1) is called the posterior
probability of E2 conditional on E1.
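As a minimal numerical sketch of this formula (the events and numbers here are hypothetical, not from the text):

    # Bayes' formula: P(E2 | E1) = P(E1 | E2) P(E2) / P(E1),
    # with P(E1) expanded by the law of total probability.

    def posterior(prior_E2, lik_E1_given_E2, lik_E1_given_not_E2):
        """Posterior probability of E2 after observing E1."""
        p_E1 = lik_E1_given_E2 * prior_E2 + lik_E1_given_not_E2 * (1 - prior_E2)
        return lik_E1_given_E2 * prior_E2 / p_E1

    # Hypothetical numbers: prior P(E2) = 0.2, P(E1 | E2) = 0.9, P(E1 | not E2) = 0.3.
    print(posterior(0.2, 0.9, 0.3))   # 0.18 / (0.18 + 0.24) = 0.4286...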
P(x occurrences in N trials) = ∫ C(N, x) π^x (1 − π)^(N−x) p(π) dπ    (2)

for some density p(•) (or, more generally, a distribution). Bayesians would inter-
pret p(•) as a prior for π. The parameter π could be thought of as a "propensity"
or "chance" and is related to the long-run ratio of occurrences to trials if such
an infinite sequence of trials could be actually carried out. Given data, namely,
x occurrences in N trials, the conditional probability of an additional k occur-
rences in an additional n trials looks like (2) with p(π) replaced by the posterior
density for π, namely, calculated via Bayes' formula using densities.
Now suppose a light source is used to "see" each electron as it leaves the
original slit and is used to determine which of the two slits to the right that
it "goes through." In this case, when each electron event is observed at x, the
addition law does hold! These are experimental results that we are describing.
When we do not observe each electron's trajectory but only count frequencies
in a kind of objective manner, the laws of probability enunciated above do
not hold. When the experiment includes an observer who actually observes
each electron, the frequencies at positions x now satisfy the addition
law of probability. When we look, the distribution is different from that when
we don't look!
We may adopt a frequency as our probability, but it is a probability only after
we adopt it. Probability must be considered a degree of belief relative to some
observer or analyst even when we have large numbers of apparently repetitive
events. However, a consensus on probabilities is sometimes possible, and that is
usually what we desire. What is called "objectivity" is perhaps better described
as "consensus."
PART I
be this average lifetime. Let x^N = (x1, x2, ..., xN) and suppose we are indif-
ferent to vectors of size N of such possible lifetimes conditional on θ; i.e., our
generic joint probability function p for such vectors will satisfy

p(x^N) = p(y^N)

in the sense that, conditional on θ, our joint probability function p(•) satisfies
p(x^N) = p(y^N) whenever x1 + x2 + ··· + xN = y1 + y2 + ··· + yN = Nθ. Then

p(x^N | θ) = constant    (1.1.3)

for all vectors x^N with xi ≥ 0 and x1 + x2 + ··· + xN = Nθ.
Now consider the limit as N becomes large and make the appropriate change of
variable, so that (1.1.1) becomes, in the limit, the joint exponential density with
mean θ.
can be calculated and used to make decisions concerning inspection intervals and
planned maintenance. The time horizon can be specified in terms of N equal
time units. The probability density for s is derived from (1.1.1).
Exercises
1.1.1. Using computer graphics and (1.1.1), plot the marginal density of a
single lifetime versus x for N = 2, 5, and 10 on the same graph. Let θ = 10
in all cases. (A numerical sketch follows the exercise list below.)
1.1.2. Let X1, X2, ..., Xn be random quantities having joint probability
density conditional on θ given by (1.1.1).
(a) Show that X1 and X2 have the same univariate density.
(b) Calculate the conditional mean and variance of the random quantity X1;
i.e., E[X1 | θ] and Var[X1 | θ].
1.1.3. Using the definition of Euler's gamma function Γ, prove (1.1.2).
1.1.4*. Fix θ and let S denote the sum of the lifetimes. Use the joint density
given by (1.1.1) to derive the density for S.
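Exercise 1.1.1 can be done numerically. Under (1.1.1) the marginal density of a single lifetime is ((N − 1)/(Nθ))(1 − x/(Nθ))^(N−2) on [0, Nθ]; this closed form is our reading of the model, consistent with the N = 2 and N = 10 behavior described in section 1.2. A sketch:

    import numpy as np
    import matplotlib.pyplot as plt

    def marginal_density(x, N, theta):
        """Marginal density of one lifetime under the finite population
        exponential model: uniform on the simplex {x_i >= 0, sum = N*theta}."""
        return (N - 1) / (N * theta) * (1 - x / (N * theta)) ** (N - 2)

    theta = 10.0
    for N in (2, 5, 10):
        x = np.linspace(0.0, N * theta, 400)
        plt.plot(x, marginal_density(x, N, theta), label=f"N = {N}")
    plt.plot(x, np.exp(-x / theta) / theta, "k--", label="exponential limit")
    plt.legend(); plt.xlabel("x"); plt.ylabel("density"); plt.show()

For N = 2 this reduces to the uniform density 1/20 on [0, 20], matching the discussion in section 1.2.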
1.2. Likelihood.
The conditional density for θ given observed lifetimes is

p(θ | x1, ..., xn) = L(θ) p(θ) / ∫ L(u) p(u) du ∝ L(θ) p(θ),    (1.2.2)

where ∝ indicates "proportional to," meaning that the right-hand side is missing
a factor, not dependent on θ, required to make it a probability density. Since
the denominator in equation (1.2.2) does not depend on θ, it can be ignored
for most applications where data have been observed. However, in the case
of experimental designs, where data have not yet been observed, we would be
interested in the exact expression with the denominator intact.
for N = 2, 5, and 10 when θ = 10. Note that when N = 2, the conditional
density is the uniform density on [0, 20], while for N = 10 the density is already
approaching the limiting exponential form although it is only positive for the
interval [0, 100].
The likelihood for θ in the limiting case is

L(θ) ∝ θ^(−n) exp[−(x1 + ··· + xn)/θ].    (1.2.3)

In the limiting form, the n lifetimes, before they are observed, are conditionally
independent given θ. This is not true in the finite population case. The maximizer
of the likelihood,

θ̂ = (x1 + ··· + xn)/n,    (1.2.5)

is called the maximum likelihood estimator (MLE) for θ and is the mode for
the posterior density.
Exercises
1.2.1. Derive the MLE for θ in the finite population case.
1.2.2. Suppose there are only N = 10 special switches produced for a par-
ticular space probe. Also suppose n = 2 have been tested and have failed after
x1 = 3.5 weeks and x2 = 4.2 weeks, respectively.
(a) What is θ in this case?
(b) Graph the likelihood as a function of θ.
(c) Using a flat prior for θ (i.e., p(θ) = constant), graph the posterior density
for θ given the data. (A numerical sketch follows below.)
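A numerical sketch for exercise 1.2.2, using the limiting exponential likelihood (1.2.3) as an approximation to the exact finite population likelihood (so treat the plot as illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([3.5, 4.2])          # observed failure ages (weeks)
    n = len(x)

    theta = np.linspace(0.5, 40.0, 800)
    lik = theta ** (-n) * np.exp(-x.sum() / theta)   # (1.2.3), up to a constant

    # With a flat prior, the posterior is proportional to the likelihood.
    post = lik / (lik.sum() * (theta[1] - theta[0]))

    plt.plot(theta, post)
    plt.xlabel("theta (weeks)"); plt.ylabel("posterior density"); plt.show()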
1.3. Total time on test (TTT) for the finite population exponential
model.
Failures and survivors. Often failure data will consist of both item lifetimes
and survival times for items that have not yet failed. Since survival times also
constitute information, we would like to make use of them. To do this we need to
calculate the appropriate likelihood function. This can be done using the follow-
ing result letting X1,X2,..., XN be the unknown (random) lifetime quantities
corresponding to N labelled items.
THEOREM 1.3.1. Let N be the population size. Fix θ > 0 and assume that
we are indifferent to points on the manifold

{x^N : xi ≥ 0, x1 + x2 + ··· + xN = Nθ}

in the sense that, conditional on θ, our joint probability function p(•) satisfies
p(x^N) = p(y^N) for any two points on this manifold. Suppose

x(1) ≤ x(2) ≤ ··· ≤ x(r)

are the first r observed failure ages while n − r devices are still operating at age
x(r). (These are the first r order statistics.)
We require the likelihood for θ with respect to the finite population expo-
nential model. In this case it is unnecessary to calculate the exact conditional
probability for the event that r items fail at ages x(1) ≤ x(2) ≤ ··· ≤ x(r)
and n − r survive to age x(r) given θ. It is enough to assume that the first
r labelled devices failed at ages x1, x2, ..., xr and the remaining n − r devices
survived to age xr and calculate the conditional probability of this event. To
obtain the exact conditional probability with respect to the order statistics we
need only multiply this calculation by

n! / [1! 1! ··· 1! (n − r)!],

where there are r terms 1! in the denominator. However, since this
factor does not involve θ, it will cancel when we compute the posterior.
Using (1.3.1) we take the r-fold partial derivative with respect to x1, x2, ..., xr,
letting

T(x(r)) = x(1) + x(2) + ··· + x(r) + (n − r) x(r)

denote the total time on test. The likelihood then depends on the data only
through the pair (r, T(x(r))), the number of observed failures and the total time
on test.
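As a sketch, the statistic is a one-liner in code (the function name and example values are ours, hypothetical):

    def total_time_on_test(failure_ages, n):
        """T(x_(r)) = sum of the r ordered failure ages + (n - r) * x_(r),
        i.e., lifetime accumulated by all n items up to the r-th failure."""
        xs = sorted(failure_ages)
        r = len(xs)
        return sum(xs) + (n - r) * xs[-1]

    # Example: r = 3 failures at ages 2, 5, 9 out of n = 10 items on test.
    print(total_time_on_test([2.0, 5.0, 9.0], n=10))   # 16 + 7*9 = 79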
Exercises
1.3.1. Derive the MLE θ̂ given in equation (1.3.4).
1.3.2. Graph the likelihood as a function of θ for given (r, T(x(r))).
1.3.4. Using (1.1.1) calculate the exact joint probability density for the order
statistics x(1) ≤ x(2) ≤ ··· ≤ x(n) from a sample of size n given a population of
size N and conditional on θ.
1.3.5*. Show that (1.3.1) still holds when n = N.
1.4. Exchangeability.
In Theorem 1.3.1, we derived the joint density of random quantities under the
indifference judgment relative to sums. Obviously in calculating a sum, the order
in which we do the sum is immaterial. The joint density in this case is said to
be exchangeable.
Let N items or devices be labelled {1, 2, ..., N}. Let X1, X2, ..., XN be po-
tential measurements on these items with respect to some quantity of interest.
Then the items are said to be exchangeable with respect to this quantity of interest
if in our judgment

P(X1 > x1, ..., XN > xN) = P(X1 > x_π(1), ..., XN > x_π(N))    (1.4.1)

for our generic joint survival probability function P(•), for any permutation π
of the integers {1, 2, ..., N} and all vectors x^N. In other words,
in our judgment items are similar, and so order does not count in the joint
probability function.
Notice that it is the items that are actually exchangeable and only with re-
spect to a specified quantity of interest such as lifetime. If we were, for example,
interested in lifetimes, the colors of the items might not be relevant so far as ex-
changeability with respect to lifetime is concerned. Also, the measurements are
potential. Once the measurements are known, the items are no longer necessarily
exchangeable. Although exchangeability is a probabilistic expression of a certain
degree of ignorance, it is a rather strong judgment and in the case of 0-1 ran-
dom quantities implies a joint probability function related to the hypergeometric
distribution as we show in Chapter 3, section 3.1.
The judgment of exchangeability implies that univariate, bivariate, etc. dis-
tributions are all the same. It does not imply that exchangeable random quanti-
ties are necessarily independent. However, iid random quantities are exchange-
able.
Exercises
1.4.1. In the example concerning dike deterioration in section 1.1, what are
the exchangeable "items," and with respect to what are they exchangeable?
for some density p(•) (or, more generally, a distribution). Bayesians would in-
terpret p(•) as a prior. This was proved by Bruno de Finetti (1937).
Show that for infinite exchangeability

Cov(Xi, Xj) ≥ 0

for any i, j pair in the sequence. Cov(•, •) means covariance of the quantities in
question. Hence random quantities in an infinite exchangeable sequence are
positively correlated unless the sequence is iid, in which case Cov(Xi, Xj) = 0.
Hint. You may want to use the following formula with respect to three related
random quantities, X, Y, and Z:

Cov(X, Y) = E[Cov(X, Y | Z)] + Cov(E[X | Z], E[Y | Z]).
in his 1996 Ph.D. thesis, Optimal Maintenance Decisions for Hydraulic Struc-
tures under Isotropic Deterioration applied many of the ideas of Mendel's thesis
to maintenance problems involving dikes. Generic joint probability densities are
isotropic if

p(x1, x2, ..., xN) = p(y1, y2, ..., yN)

whenever x1² + ··· + xN² = y1² + ··· + yN².
Notation
x1, x2, ..., xN: vector elements or arguments (not random quantities)
x(1) ≤ x(2) ≤ ··· ≤ x(N): order statistics based on x1, x2, ..., xN (i.e., ordered observations)
x^N: boldface denotes a vector; the vector has size N
X1, X2, ..., XN: random quantities (also called random variables)
P(E): probability of event E
p(x | θ): the conditional probability of the random quantity X given the random quantity θ; this is an abuse of notation wherein the arguments tell us which random quantities are being considered
p(•): a joint probability density function
θ, λ: parameters, usually θ = population mean and λ = 1/θ
θ̂: MLE of θ
conditional on θ, the exponential model has mean = θ and variance = θ²
CHAPTER 2

Lifetime Data Analysis

P(X > x | θ) = exp(−x/θ),  x ≥ 0.    (2.1.1)
To illustrate, we will use the so-called natural conjugate prior density for θ,
namely,

p(θ) = [b^a / Γ(a)] θ^(−(a+1)) exp(−b/θ)    (2.1.2)

for a, b > 0. This is called the inverted gamma density, IVG(a, b), since, if θ is a
random quantity with density (2.1.2), then λ = 1/θ has a gamma density. We call
λ the failure rate. It only applies to the infinite population model. In this case, let
T denote the total time on test when observation stops at time t, where only r of
n possible lifetimes have been observed while n − r items survive the interval
(0, t]. The likelihood L(θ) is proportional to

θ^(−r) exp(−T/θ),

and the posterior density for θ is again inverted gamma, now with parameters
a + r and b + T.
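In code the conjugate update is a pair of additions; a minimal sketch, with hypothetical prior and data values:

    def ivg_update(a, b, r, T):
        """Posterior IVG parameters after r failures and total time on test T."""
        return a + r, b + T

    def ivg_mean(a, b):
        """Mean of IVG(a, b); finite only for a > 1."""
        return b / (a - 1)

    a, b = 3.0, 2000.0           # hypothetical prior: prior mean 1000 hours
    a_post, b_post = ivg_update(a, b, r=4, T=5200.0)
    print(ivg_mean(a_post, b_post))   # (2000 + 5200) / (3 + 4 - 1) = 1200.0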
FIG. 2.1.1. Posterior mean, posterior standard deviation, and posterior coefficient of
variation as a function of elapsed test time.
For the inverted gamma posterior density in which the parameters are a + r (in
place of a) and b + T (in place of b), the corresponding mean takes the form
(b + T)/(a + r − 1). When a failure occurs, the posterior mean, the posterior
standard deviation, and the posterior coefficient of variation decrease.
In Figure 2.1.2 we have plotted the posterior density for θ at selected times
during the life test. The posterior density for t = 0 is, of course, the prior
density. Notice the shape of the posterior density at t = 1− (i.e., just before the
first observed failure) and at t = 1 (i.e., just after the first failure).
Table 2.1.1 summarizes the properties of the natural conjugate prior density
and of the corresponding posterior density for the two possible parametrizations
of the exponential model.
TABLE 2.1.1

The unconditional (predictive) density for a single lifetime X,

p(x) = a b^a / (b + x)^(a+1),

is Pareto(a, b); i.e., P(X > x) = [b/(b + x)]^a for x > 0. Having observed x1,
p(x | x1) is Pareto(a + 1, b + x1) so that p(x | x1) is different from p(x).
Suppose now that the lifetime X of an item is judged exponential and is
observed to survive to time t. Then, conditional on θ,

P(X > t + s | X > t, θ) = P(X > s | θ),    (2.1.4)

while unconditionally

P(X > t + s | X > t) > P(X > s),    (2.1.5)

so that X and X − t conditional on survival to time t do not have the same distri-
bution. Equation (2.1.5) is true for arbitrary nondegenerate prior distributions
for θ [Barlow and Proschan (1985)]. It is obvious in the case of the IVG(a, b)
prior for θ.
The result given by (2.1.4) is called the memoryless property of the expo-
nential model. It is only valid conditional on θ. It is not true for the finite
population exponential model derived in Chapter 1.
Exercises
2.1.1. Show that if θ has the inverted gamma density (2.1.2), then λ = 1/θ has a gamma density.
2.2. TTT plots.
In the introduction we briefly discussed the frequency interpretation of proba-
bility. In this interpretation, lifetimes of items are considered to be outcomes of
repetitive trials generated by some "probability mechanism." This is similar to
the numbers that would be generated by Monte Carlo trials on a computer us-
ing a random number generator. This is the interpretation of probability which
resulted from mathematical studies of gambling.
In his fundamental paper on probability, "Foresight: Its Logical Laws, Its
Subjective Sources" (1937), Bruno de Finetti found the connection between the
subjective concept of probability and the frequentist concept of probability. The
connection is based on the concept of exchangeable random quantities. Random
quantities X1, X2, ..., Xn are said to be exchangeable if and only if their joint
probability measure is invariant relative to their order. That is, the joint cumu-
lative distribution of lifetimes is invariant under permutations of the arguments;
i.e., F(x1, ..., xn) = F(x_π(1), ..., x_π(n)) for every permutation π. It follows
from this judgment that all univariate (and multivariate) marginal distributions
are the same. Exchangeability does not imply independence. However, iid random
quantities are exchangeable.
In 1937 de Finetti showed that an infinite sequence of exchangeable ran-
dom quantities X1, X2, ..., Xn, ... are independent, conditional on the common
univariate marginal distribution F. In introducing the concept of exchangeabil-
ity, de Finetti proves both the weak and strong laws of large numbers under
the judgment of exchangeability and finite first and second cross moments of
the exchangeable random quantities. These "laws" are valid conditional on the
univariate marginal distribution F. This is the mathematical result which "jus-
tifies," to a very limited extent, the TTT plotting method for analyzing life data
TABLE 2.2.1
Jet engine corrective maintenance removal times.

Engine     Corrective maintenance removal time (hours)
ABC123     389
ABC124     588
ABC133     840
ABC134     853
ABC135     868
ABC137     931
ABC139     997
ABC141     1060
ABC146     1200
ABC147     1258
ABC148     1337
ABC149     1411
Example 2.2.1. Jet engine life. The turbofan jet engine entered service over
20 years ago as the modern means of propulsion for commercial aircraft. It has
provided a very economical and reliable way of transporting cargo and people
throughout the world. Maintenance on such jet engines is determined partly
by the operating time or the "time-on-wing" (TOW) measurement. If removal
of an engine requires penetration into a module (standardized breakdown of
the engine into workable sections), then it is classified as a shop visit. Such
corrective maintenance removals will be considered failure times in our analysis.
The following data on n = 12 corrective maintenance removal times in operating
hours was recorded in a recent paper on the subject. See Table 2.2.1.
The question we wish to answer is the following: Were we to observe a very
large population of removal times exchangeable with these observations, would
the empirical cumulative life distribution based on such a large population look
like the exponential cumulative distribution? Obviously a definitive answer based
on only 12 observations is not possible. However, there is a plotting technique
that can at least give us a clue to the answer.
Note that multiplying all observations by the same constant would not change the plot.
Hence we can only "identify" a probability distribution up to a scale parameter.
It is important to emphasize that the plotting method depends on the a priori
judgment that items are exchangeable with respect to lifetime.
Consider the corrective maintenance removal data for jet engines in Ta-
ble 2.2.1. In that case n = 12. The ordered removal times correspond to the ages
(measured in TOW hours) at which the jet engines were removed for corrective
maintenance. To plot the sample data, first order the observed failure times in
increasing order. Let x(1) ≤ x(2) ≤ ··· ≤ x(n) be the n ordered failure
ages. Let n(u) be the number of items "under test" at age u, i.e., not in need of
corrective removal. Then

T(t) = ∫ from 0 to t of n(u) du

is the TTT to age t (actually T(t) is piecewise linear since n(u) is a step function).
If t = x(i), then

T(x(i)) = x(1) + x(2) + ··· + x(i) + (n − i) x(i).    (2.2.1)

Plot the points (i/n, T(x(i))/T(x(n))) for i = 1, 2, ..., n; this is the scaled TTT
plot.
FIG. 2.2.1. Scaled TTT plot of engine data (Series 1) and exponential transform (Series 2).
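The plot in Figure 2.2.1 can be reproduced directly from (2.2.1); a minimal sketch using the Table 2.2.1 removal times:

    import numpy as np
    import matplotlib.pyplot as plt

    times = np.array([389, 588, 840, 853, 868, 931, 997,
                      1060, 1200, 1258, 1337, 1411], dtype=float)
    x = np.sort(times)
    n = len(x)

    # T(x_(i)) per (2.2.1): sum of first i order statistics + (n - i) x_(i).
    T = np.array([x[:i].sum() + (n - i) * x[i - 1] for i in range(1, n + 1)])

    u = np.arange(1, n + 1) / n        # i/n
    v = T / T[-1]                      # scaled TTT

    plt.plot(u, v, "o-", label="scaled TTT plot (engine data)")
    plt.plot([0, 1], [0, 1], "k--", label="exponential (45 degree line)")
    plt.legend(); plt.show()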
TABLE 2.2.2
Ordered random quantities distributed as 1 − exp(−x).

i      x(i)
1      0.001
2      0.17
3      0.25
4      0.41
5      0.44
6      0.71
7      0.78
8      0.97
9      1.26
10     1.44
11     1.65
12     1.78
FIG. 2.2.2. Scaled TTT plot for exponential random quantities (Series 1) and exponential
transform (Series 2).
Comparison of Figures 2.2.1 and 2.2.3 suggests that perhaps a Weibull dis-
tribution could be used as a conditional probability model to analyze the data
in Table 2.2.1. The shape parameter α could also be inferred but not the scale
parameter λ, since the plot is scale invariant.
the sum of all observed complete and incomplete lifetimes, or the TTT. This is
also a very useful statistic for more general models.
Now suppose that the life distribution F (F(0−) = 0) is arbitrary. Another
interpretation of T(t) can be given in terms of the empirical distribution

Fn(u) = (number of observations ≤ u)/n.    (2.2.5)

Namely,

T(t)/n = ∫ from 0 to t of [1 − Fn(u)] du.    (2.2.6)

The last equality follows from the expression for the TTT (2.2.4).
Assuming infinite exchangeability and conditional on F,

T(t)/n → ∫ from 0 to t of [1 − F(u)] du

uniformly in t. We call

H_F^(-1)(t) = ∫ from 0 to F^(-1)(t) of [1 − F(u)] du,  0 ≤ t ≤ 1,

the TTT transform of F. For the exponential distribution G(x) = 1 − exp(−x/θ)
we get H_G^(-1)(t) = θt; i.e., G is transformed into the 45° line on [0, 1] after
scaling. We call H_F^(-1)(t)/H_F^(-1)(1) the scaled TTT transform.
An important property of the transform is that

(d/dt) H_F^(-1)(t) = 1 / r(F^(-1)(t)),    (2.2.8)

where

r(x) = f(x) / [1 − F(x)]

is the failure rate function at x. It follows from (2.2.8) that if r(•) is increasing,
then H_F^(-1) is concave in t.
Using (2.2.8) it is easy to see that H_F determines F if F is absolutely con-
tinuous. Since absolutely continuous F's are dense in the class of all failure
distributions, we see that H_F determines F in general.
Di = X(i) − X(i−1),  i = 1, 2, ..., n  (with X(0) = 0).    (2.2.10)

Note that the ith factor (n − i + 1) exp[−(n − i + 1) di] represents the marginal
density of Di obtained in (a) above. Since the joint density of D1, ..., Dn factors
into the product of the individual densities of the Di, the quantities D1, ..., Dn
are mutually independent. ∎
It follows from Theorem 2.2.1 that the normalized spacings (n − i + 1) Di are
independent and identically distributed exponential quantities. For iid exponential
Y1, Y2, ..., Yr, the sum

Z = Y1 + ··· + Yr

has a gamma density. For the induction step write Y1 + ··· + Yk = x,
Yk+1 = z − x, and the range [0, z] of integration corresponds to the set of mutually
exclusive and exhaustive outcomes for Y1 + ··· + Yk. Hence
(2.2.11) holds for n = k + 1. By induction it follows that the theorem holds
for n = 1, 2, ....
THEOREM 2.2.3. Let X(1) ≤ X(2) ≤ ··· ≤ X(n) be
ordered random quantities from the distribution 1 − exp(−x), and let ... .
Then ... .
Exercises
2.2.1. Show that if the random quantity X has distribution F(x) =
1 − exp(−x) for x ≥ 0, then F(X) has a uniform distribution on [0, 1].
2.2.2. Generate n = 20 exponentially distributed random quan-
tities using a random number generator with a computer using the result in
Exercise 2.2.1. Construct a TTT data plot using the 20 numbers generated and
count the number of crossings with respect to the 45° line. (A sketch appears
after this exercise list.)
2.2.3. Construct a TTT data plot for the lifetimes of Kevlar 49/epoxy
strands given in Table 2.2.3. The strands were put on life test at the 80% stress
level, i.e., at 80% of mean rupture strength. Does the TTT plot suggest that
an exponential model might be appropriate? If so, what calculations would you
perform next?
TABLE 2.2.3
Lifetimes of Kevlar 49/epoxy strands loaded at 80% of mean rupture strength.
i.e., the probability that the first point on the TTT plot lies above the 45° line.
What is this probability?
2.2.6. State and prove Theorems 2.2.1 and 2.2.2 for the case F(x) =
1 − exp(−λx); i.e., λ is not necessarily 1.
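A sketch of exercises 2.2.1-2.2.2 (the crossing count here is a simple sign-change count, one reasonable reading of the exercise):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.sort(-np.log(rng.uniform(size=20)))   # Exercise 2.2.1: -log U is exponential

    n = len(x)
    T = np.array([x[:i].sum() + (n - i) * x[i - 1] for i in range(1, n + 1)])
    u = np.arange(1, n + 1) / n
    v = T / T[-1]

    # Sign changes of (plot minus the 45 degree line) count the crossings.
    crossings = np.count_nonzero(np.diff(np.sign(v - u)))
    print(crossings)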
TABLE 2.2.4
Breaking load strength of individual Kevlar 49/epoxy strands.
2.3. The Weibull distribution.
A random quantity X has a Weibull distribution with scale parameter λ and
shape parameter α if its survival probability is P(X > x | λ, α) = exp[−(λx)^α],
with density

f(x | λ, α) = αλ(λx)^(α−1) exp[−(λx)^α]    (2.3.2)

and failure rate function

r(x | λ, α) = αλ(λx)^(α−1)    (2.3.3)

for x > 0 and 0 otherwise. When α = 1, the failure rate is constant, and the
Weibull distribution reduces to the exponential distribution. For α > 1 (α < 1)
the failure rate is increasing (decreasing). The mean of the Weibull distribution
is

(1/λ) Γ(1 + 1/α)    (2.3.4)

and the variance is

(1/λ²) [Γ(1 + 2/α) − Γ²(1 + 1/α)].    (2.3.5)
Sometimes three parameters are used where the third parameter is the smallest
possible judged value of X.
where the new Weibull parameters are λ and α. The connection with data
analysis, however, is unclear!
Gnedenko (1943) showed that there are only three possible limiting distri-
butions for n independent minima when the (arbitrary) survival distribution is
normalized as
TABLE 2.3.1
A 1 in the third column indicates a service removal while a 0 indicates survival and no
need for removal.
where an and bn are suitable norming constants depending on n and the par-
ticular F. One of these distributions (the only one with positive support) is the
Weibull distribution. Again, it is not clear what, if any, connection exists with
inferential data analysis.
the parameters. This has far-reaching consequences for the way in which data
is analyzed, not only with respect to the Weibull distribution. But the Weibull
distribution by its form makes the likelihood particularly easy to compute. How-
ever, before we calculate the likelihood for the type of data in Table 2.3.1, it will
be convenient to calculate the likelihood for general failure rate functions
The likelihood for general failure rate functions. Both the life distribu-
tion F and the density f can be represented in terms of the failure rate function
r(•). To see this note that

r(x) = f(x) / [1 − F(x)] = −(d/dx) log[1 − F(x)],

so that

1 − F(x) = exp[−∫ from 0 to x of r(u) du]

and

f(x) = r(x) exp[−∫ from 0 to x of r(u) du].

The following result will be very useful in computing the likelihood in the Weibull
case.
THEOREM 2.3.1. Given the failure rate function r(x), conditionally inde-
pendent observations are made. Let x1, x2, ..., xk denote k observed failure ages
(in Table 2.3.1, k = 12) and x1*, x2*, ..., xm* the m survival ages (in Table 2.3.1,
m = 17). The notation * denotes loss, as lost from further observation. Let n(u)
denote the number of items or devices under observation at age u, and
let r(u) denote the failure rate function of the item at age u. Then the likelihood
corresponding to the failure rate function, considered as a continuous parameter,
having observed data, is given by

L(r(•)) = [product over i = 1 to k of r(xi)] exp[−∫ from 0 to ∞ of n(u) r(u) du].    (2.3.8)
Proof. To justify this likelihood expression, we first note that the underlying
random events are the ages at failure or withdrawal. Thus the likelihood of the
observed outcome is specified by the likelihood of the failure ages and survival
ages until withdrawal.
To calculate the likelihood, we use the fact that given r(•), an item observed
from age 0 until withdrawal at age x* contributes a factor

exp[−∫ from 0 to x* of r(u) du]

to the likelihood. Thus, if no items fail during the test (i.e., k = 0),
the likelihood of the observed outcome is proportional to the expression given
for k = 0.
On the other hand, if an item is observed from age 0 until it fails at age xj,
a factor

r(xj) exp[−∫ from 0 to xj of r(u) du]

is contributed. Collecting factors, the combined exponent is

−[Σ ∫ from 0 to xj of r(u) du + Σ ∫ from 0 to x* of r(u) du] = −∫ from 0 to ∞ of n(u) r(u) du,

where the first sum is taken over items that failed while the second sum is taken
over items that were withdrawn. The upper limit ∞ is for simplicity and
introduces no technical difficulty, since n(u) = 0 after observation ends. ∎
The likelihood we have calculated applies for any absolutely continuous
life distribution. In the special case of an exponential life distribution model
with failure rate λ, the likelihood of the observed outcome takes the simpler form

L(λ) ∝ λ^k exp[−λ ∫ from 0 to ∞ of n(u) du].    (2.3.9)
Back to the Weibull example. Although failures and losses will usually be
recorded in calendar time, for the purpose of analysis we relate all ages from the
"birth" date of the item or the time at which the item was put into operation.
Let x1, x2, ..., xk denote the unordered observed failure ages and n(u) the
number of devices surviving to age u. (Note that since observation must stop at
some finite time, n(u) = 0 for u sufficiently large. Furthermore, n(u) is a step
function.) Using Theorem 2.3.1 we can compute the likelihood for the Weibull
distribution, namely,

L(λ, α) = [product over i = 1 to k of αλ(λ xi)^(α−1)] exp[−∫ from 0 to ∞ of n(u) αλ(λu)^(α−1) du].    (2.3.10)
Two important deductions can be made from this.
(1) From the likelihood (2.3.10) we can see that the only sufficient statistic
for all parameters in the case of Weibull distributions is the entire data set. In
other words there is no lower-dimensional statistical summary of the data.
(2) No natural conjugate family of priors is available for all parameters. Con-
sequently, the posterior distribution must be computed using numerical integra-
tion.
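Deduction (2) can be carried out by brute force: evaluate the likelihood on a grid and normalize. A minimal sketch, assuming the parametrization of (2.3.2), a flat prior, and hypothetical complete (uncensored) failure ages:

    import numpy as np

    ages = np.array([0.8, 1.3, 2.1, 2.9, 3.4])    # hypothetical complete failure ages

    def log_lik(lam, alpha, x):
        """Weibull log likelihood, survival exp(-(lam*x)**alpha), complete data."""
        return (len(x) * np.log(alpha * lam)
                + (alpha - 1) * np.log(lam * x).sum()
                - ((lam * x) ** alpha).sum())

    lam_grid = np.linspace(0.05, 1.5, 200)
    alpha_grid = np.linspace(0.3, 5.0, 200)
    L, A = np.meshgrid(lam_grid, alpha_grid)
    logpost = np.vectorize(lambda l, a: log_lik(l, a, ages))(L, A)
    post = np.exp(logpost - logpost.max())
    post /= post.sum()                            # normalized grid posterior

    i, j = np.unravel_index(post.argmax(), post.shape)
    print("posterior mode near lambda =", L[i, j], "alpha =", A[i, j])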
Using three-dimensional graphics, contour plots for the posterior density of
(λ, α) can be made corresponding to selected probability regions. Often .99, .95,
.50 probability regions are used although there is no special reason for this
selection.
The MLE λ̂ can be computed in closed form when α is specified.
To determine the accuracy of the MLE for λ we plot the ratio of the likelihood
divided by the likelihood evaluated at λ̂; i.e.,

L(λ) / L(λ̂)

for λ in a suitably chosen interval about λ̂. This is the modified posterior for
λ corresponding to a flat prior.
Exercises
2.3.1. Let X have a Weibull distribution with parameters λ and α.
(a) Show that the distribution of (λX)^α given λ and α is exponential.
(b) What are the mean and variance of (λX)^α given λ and α?
2.3.2. Assume α = 3. Use the data in Table 2.3.1 to calculate the MLE λ̂.
Graph

L(λ) / L(λ̂)

to assess the accuracy of λ̂. What is the MLE for the mean of the Weibull
distribution?
2.3.3. Show that (2.3.4) and (2.3.5) are the mean and variance, respectively,
of the Weibull distribution.
2.3.4. Why is the entire data set the only sufficient statistic for both param-
eters λ and α in the case of the Weibull distribution?
2.4. Notes and references.
Section 2.1. The material on the influence of failures on the posterior den-
sity was taken from the 1985 paper, "Inference for the Exponential Life Distribu-
tion" by Barlow and Proschan. The appendix to that paper contains interesting
proofs of similar results for the exponential model and arbitrary prior distribu-
tions for θ.
Section 2.2. TTT plots were first described in a 1975 paper by Barlow
and Campo, "Total Time on Test Processes and Applications to Failure Data
Analysis." A 1977 paper by Bo Bergman derived the distribution for the number
of crossings of the TTT plot in the case of the exponential distribution model.
Further properties of the TTT transform were described in the 1979 paper by
Barlow, "Geometry of the Total Time on Test Transform."
The TTT plotting technique fails when data is incomplete. In the complete
data case, the argument for the technique rests on the assumption of exchange-
ability, a stochastic comparison with the exponential case, and convergence for
infinite populations. The argument is not inductive.
Section 2.3. The basis for the Weibull analysis presented here is the likeli-
hood principle [cf. Basu (1988)].
The likelihood principle: If two experiments E1 and E2 resulting in corre-
sponding data sets x1 and x2 generate equivalent likelihood functions for the
parameter of interest, then the information provided by E1 is equal to the infor-
mation provided by E2.
The likelihood functions are equivalent if they are equal up to a constant fac-
tor independent of the parameter of interest. Information is anything concerning
the parameter of interest that changes belief, expressed as a probability, about
the parameter.
The likelihood for general failure rate functions was described in the paper by
Barlow and Proschan (1988), "Life Distribution Models and Incomplete Data."
Notation
gamma(a, b): gamma density
Pareto(a, b): Pareto density
T(t): TTT to time t
H_F^(-1): TTT transform
CHAPTER 3

Counting the Number of Failures

Often, the only failure information available for a particular item or device is the
number of defectives in a given sample or the number that failed in a given time
interval. In section 3.1 we derive conditional probability models for the number
of defectives relative to both finite and infinite populations. In sections 3.2, 3.3,
and 3.4 we consider conceptually infinite populations. In section 3.4 the number
of failures may be observed in finite time intervals of length Δ and relative to a
finite time horizon T.
is in fact a fairly strong judgment about items before they are tested. Of course,
after testing, the items would no longer be considered exchangeable.
Let x^N = (x1, x2, ..., xN), where each xi is 0 or 1, and let the parameter of
interest be

S = x1 + x2 + ··· + xN,    (3.1.1)

the number of defective items in the lot. Using this notation, p(x^N) is invariant
under permutations of the coordinates; i.e.,

p(x1, ..., xN) = p(x_π(1), ..., x_π(N))    (3.1.2)

for any permutation π.
To motivate the beta-binomial prior for S, suppose we fix N, S, s and let S*
be the number defective in N* > N. See Figure 3.1.3. Now suppose S*, N* → ∞
in such a manner that S*/N* → p. Then the number defective in N, namely, S,
is binomially distributed with parameters (N, p). If we use the beta prior for p,
namely, Be(A, B), then the unconditional probability distribution for S given N,

P_{N,A,B}(S),

will be the beta-binomial probability function given by (3.1.6) with N replacing
n and S replacing s.
In the same way, S − s given N, n, and p will be binomial with parameters
(N − n, p). Having observed s defectives in a sample of size n, we update the prior
for p to Be(A + s, B + n − s). Using this updated prior for p and the binomial for
S − s with parameters (N − n, p), we obtain the posterior probability distribution
for S − s given n, s as in (3.1.7), namely, the Bb(N − n, A + s, B + n − s) probability
distribution.
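A sketch of the resulting Bb(N − n, A + s, B + n − s) computation using scipy; the lot and sample numbers below are hypothetical:

    from scipy.stats import betabinom

    # Lot of N = 50; sample of n = 10 inspected, s = 1 defective found.
    N, n, s = 50, 10, 1
    A, B = 1, 1                        # Be(1, 1) prior for p

    # Posterior for S - s, the defectives among the N - n uninspected items:
    post = betabinom(N - n, A + s, B + n - s)
    print(post.mean())                 # expected number of additional defectives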
Failure diagnosis. Off-shore oil pipelines on the seabed corrode with time
and need to be tested for significant internal corrosion. This can be done using
TABLE 3.1.1
Pipe corrosion test data.
an instrument called a "pig" which is pushed through the pipe. The accuracy of
the pig is to be determined and the probability calculated that a specified pipe
section is significantly corroded when this is indicated by the pig.
Let

θ = 1 if there is significant pipe corrosion, θ = 0 otherwise,

and

t = 1 if the test indicates significant corrosion, t = 0 otherwise.
We know that the test, although not perfect, depends on the state of the pipe
section. Hence we want to reverse the arrow in Figure 3.1.4. But first, suppose
we implement a planned experiment to determine the accuracy of the pig test.
Suppose m pipe sections known to be corroded are tested and the test indicates
that x = 60 of the m are corroded. Suppose also that n pipe sections known to
be uncorroded are tested and the test indicates that y = 141 are uncorroded.
Table 3.1.1 summarizes the experiment outcome.
Let p1 denote the proportion of times in a very large experiment that a
corroded pipe section is correctly detected. Let

p1 = P[t = 1 | pipe section is corroded].

Likewise, let

p2 = P[t = 0 | pipe section is not corroded].

In engineering parlance, 1 − p1 is called the probability of a "false negative,"
while 1 − p2 is called the probability of a "false positive." In biometry, p1 is
called the "sensitivity" of the test, while p2 is called the "specificity" of the test.
We will extend Figure 3.1.4 to include this additional information; see Figure
3.1.5.
Suppose that priors for p1 and p2 are assessed and updated using the data in
Table 3.1.1. Reversing the top arrows in Figure 3.1.5,
Figure 3.1.5 is now transformed into Figure 3.1.6. Suppose that a priori we
judge P(θ = 1) = 0.10.
Reversing the arc from θ to t and using Bayes' formula we finally calculate
P(θ = 1 | t = 1) = 0.40.
The test has increased the probability of corrosion from 0.10 to 0.40 if t = 1.
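A sketch of the arc-reversal arithmetic; the prior 0.10 is from the text, while the sensitivity and specificity values below are made up, chosen only so the result lands on the quoted 0.40:

    def p_corroded_given_positive(prior, sensitivity, specificity):
        """P(theta = 1 | t = 1) by Bayes' formula."""
        num = sensitivity * prior
        den = num + (1 - specificity) * (1 - prior)
        return num / den

    # Prior 0.10 as in the text; 0.75 and 0.875 are hypothetical accuracy values.
    print(p_corroded_given_positive(0.10, 0.75, 0.875))   # 0.4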
Comments. In the 0-1 case, probability invariance with respect to permuta-
tions of vector components implies probability invariance with respect to sums
of vector components and vice versa. In Chapter 1 this was not true. Although
probability invariance with respect to sums does imply probability invariance
with respect to permutations of vector components, the converse is false in
general.
Exercises
3.1.1. Failure diagnosis. Pipes on the sea floor are tested for significant
internal corrosion using a magnetic flux pig. The accuracy of the pig is to
be determined and the probability calculated that a specified pipe section is
significantly corroded when this is indicated by the pig.
Let

θ = 1 if there is significant corrosion, θ = 0 otherwise,

and

t = 1 if the test indicates significant corrosion, t = 0 otherwise.

Let p1 denote the proportion of times in a very large experiment that the test
correctly detects a corroded pipe section. Let

p1 = P[t = 1 | pipe section is corroded].

Likewise, let

p2 = P[t = 0 | pipe section is not corroded].

Suppose n = 10 pipe sections on the sea floor are tested and the tests indicate
that all n are corroded. An "autopsy" is performed on each of the pipes and it
is found that x = 2 are not significantly corroded.
(a) If a priori p1 and p2 are given independent beta priors, what are the posterior
distributions of p1 and p2?
(b) Let θ′ denote the indicator for a different pipe section not yet tested. If a
priori P(θ′ = 1) is assessed, what is P(θ′ = 1 | the test indicates corrosion)?
3.1.2. Inspection sampling. An inspector records x = 2 defectives from a
sample but forgets to record the sample size n. The sample is from a production
process considered to be in statistical control; i.e., observations are considered to
be exchangeable and as a consequence each item has the same a priori chance
of being defective. The inspector remembers that n is either 10 or 12 with equal
probability. Suppose p is judged to have a Be(1, 1) probability distribution.
(a) Draw an influence diagram containing nodes x, p, and n by adding the ap-
propriate arrows together with the appropriate conditional probabilities. Using
arc reversals, calculate the posterior probability function for n given x.
(b) Using the influence diagram and arc reversals, find a computing expression
for the posterior distribution of p given x.
3.1.3. The problem of two designs. There are two possible designs for
constructing columns to be used in buildings. Each design involves the same
construction cost. Let πi be the "chance" of failure of a column due to
a large earthquake if design one (two) is used (i = 1, 2). Suppose some tests
have been performed using a shaker table (an earthquake simulator) and
p(π1, π2) is our probability function for (π1, π2). Suppose in addition that
E[π1] = E[π2] but Var(π1) ≠ Var(π2).
In each case below, let Xi be the number of columns that fail in an earthquake
if you use design i (i = 1, 2). What is the probability function for Xi if
columns fail independently given the earthquake occurs?
(a) Suppose you are considering building a structure requiring only one col-
umn. If an earthquake occurs and the column fails, you lose $1 million and
nothing otherwise. Which design would you choose?
(b) Suppose you are considering building another structure requiring two
columns. If either or both columns fail in an earthquake, you lose $1 million
and nothing otherwise. Which design would you choose? (Assume columns fail
independently given the occurrence of an earthquake.)
(c) Suppose you are considering constructing yet another building requiring
three columns. Suppose π1 is Be(4, 6) and π2 is Be(2, 3) so that E[π1] = E[π2] = 0.4
but Var(π1) < Var(π2). If any columns fail in an earthquake, you lose $1 million
and nothing otherwise. Which design would you choose?
3.1.4*. Conditional dependence and unconditional independence in
the finite population binary case. Let x^N = (x1, ..., xN) and S = x1 + ··· + xN,
where xi is either 0 or 1. Show that X1, X2, ..., Xn are unconditionally independent

if and only if p(S) is bi(N, p). In this case conditionally dependent random quanti-
ties X1, X2, ..., Xn with conditional density given by (3.1.2) are unconditionally
independent when the prior distribution for S is binomial with parameters N
(the population size) and p specified.
3.1.5*. Conditional dependence and unconditional independence in
the finite population exponential case. Let S = X1 + ··· + XN and show that
the analogous statement holds.
TABLE 3.2.1
Comparison of two certifiers.
Certifier 1
Certifier 2 Top grade Not top grade
Top grade n11 = 21 n12 = 8 n10 = 29
Not top grade n21 = 3 n22 = 18 n20 = 21
n01 = 24 n02 = 26 n00 = 50
TABLE 3.2.2
Chance matrix of two certifiers.

                   Certifier 1
Certifier 2        Top grade      Not top grade
Top grade          θ11            θ12
Not top grade      θ21            θ22
they can never be operationally observed since they involve infinite populations.
In Table 3.2.2, θ11 is the chance that a future item, exchangeable with the
recorded items, will be given the designation top grade by both certifier 1 and
2. We are interested in estimating θ11 + θ22, the chance that both certifiers
will agree that a given item is either top grade or not top grade. To do this
we need to first assign a joint prior probability function to (θ11, θ12, θ21, θ22).
Note, however, that θ11 + θ12 + θ21 + θ22 = 1, since only three chances are free
variables.
Random quantities (Y1, Y2, ..., Yk) are said to have a Dirichlet distribution
if the joint probability distribution of (Y1, Y2, ..., Yk) has joint probability density
function

p(y1, ..., yk) ∝ y1^(a1−1) ··· yk^(ak−1) (1 − y1 − ··· − yk)^(a_{k+1}−1)    (3.2.2)

for yi ≥ 0 and y1 + ··· + yk ≤ 1.
probability density. Using this result we can finally answer the question posed at
the beginning of this section. Namely, what is the posterior probability distribu-
tion of θ11 + θ22, the chance that both certifiers will give the same classification
to a future item exchangeable with the ones observed?
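The posterior can also be examined by Monte Carlo; a minimal sketch, assuming a uniform Dirichlet(1, 1, 1, 1) prior over (θ11, θ12, θ21, θ22) (an assumption on our part; the book's own prior choice is stated just below):

    import numpy as np

    rng = np.random.default_rng(0)
    counts = np.array([21, 8, 3, 18])        # (n11, n12, n21, n22) from Table 3.2.1
    prior = np.ones(4)                       # assumed Dirichlet(1,1,1,1) prior

    theta = rng.dirichlet(prior + counts, size=100_000)
    agree = theta[:, 0] + theta[:, 3]        # theta11 + theta22

    print(agree.mean(), agree.std())         # posterior mean approx (22+19)/54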
If we use an initial Dirichlet prior the answer is
TABLE 3.2.3
Missing failure mode data.
TABLE 3.2.4
Chance table for data Table 3.2.3.
so that there would not be much point in graphing the probability density.
so that the random quantity of interest with respect to this change of variables
is

Z = θ11 + θ22.

The likelihood in terms of the new variables factors, and by matching the Dirich-
let prior to this likelihood we obtain the posterior distribution of Z. It can be
shown that the new variables are also independent a posteriori.
It follows then that the joint probability density function of (Z, Y1, ..., Yk) is
Exercises
3.2.1. A cheap method versus a master method. Exchangeable items
can be tested nondestructively using either a cheap method or a more expensive
master method. The following table gives the data from a test wherein each of
40 items were tested with both methods.
                 Master method
Cheap method     Go      No go
Go               23      6
No go            5       7
Using the same Dirichlet prior employed for the example with two certifiers,
calculate the expected proportion of times both methods would agree for a con-
ceptually infinite population exchangeable with the tested items.
3.2.2. Some failure m o d e s not reported. Calculate the expected pro-
portion of ruptured pipes in a conceptually infinite population exchangeable with
the items in the following table.
Using the same Dirichlet prior as used for the missing data example, calculate
the standard deviation of the posterior distribution for the proportion of ruptured
pipes.
3.2.3*. Sum of Dirichlet random quantities. If (Y1, Y2, ..., Yk) has a
Dirichlet distribution, show that Z = Y1 + ··· + Yj (j ≤ k) has a beta distribution.
TABLE 3.3.1
(M denotes million seeks.)
Example 3.3.1. Hard disk failures. Two hard disk manufacturers were
competing for a very large government contract. One condition of the competi-
tion was that the winner should have a lower failure rate with respect to hard
disk failures. In Table 3.3.1 we have the following data based on field experience
for many hard disks. Company A claimed that since they had many more disks
in use, the apparent superiority of Company B was void. Would you agree? How
would you analyze this data? The number of seeks is a measure of time passage
in this case. We have recorded the total number of seeks over all hard disks of a
particular type.
TABLE 3.3.3
In both examples, the time horizons T in terms of the number of seeks in the
first example or the number of shots in the second example are very large. The
total time on test (TTT) for all hard disks in the field in the first example is 250
million seeks for Company A and 1 million seeks for Company B. In the second
example, we multiply the number of shots, 700, by the number of such devices
in the system. Table 3.3.3 gives the MLEs per 1000 shots for the data in Table
3.3.2.
With a flat prior

p(λ) = constant,    (3.3.2)

and remembering the form of the gamma density

f(λ | a, b) = [b^a / Γ(a)] λ^(a−1) exp(−bλ)

for λ > 0, we calculate the posterior for λ, given r failures in total time on test T,
to be gamma with parameters r + 1 and T, with MLE

λ̂ = r/T.

Comparing (3.3.6) and (3.3.7) we see that there is virtually no overlap in the
posterior probability densities. It follows that the failure rate for hard disks
from Company B is significantly better than the failure rate for hard disks from
Company A.
Figure 3.3.2 demonstrates graphically the differences in the failure rates of
Companies A and B.
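Under the flat prior, each posterior is gamma(r + 1, T). The failure counts in Table 3.3.1 did not survive in this copy, so the r values below are hypothetical placeholders; only the TTT values are from the text:

    import numpy as np
    from scipy.stats import gamma
    import matplotlib.pyplot as plt

    # TTT in millions of seeks from the text; failure counts r are hypothetical.
    cases = {"Company A": dict(r=2000, T=250.0), "Company B": dict(r=1, T=1.0)}

    lam = np.linspace(0.0, 12.0, 600)        # failures per million seeks
    for name, c in cases.items():
        # flat prior  =>  posterior gamma with shape r + 1 and rate T
        plt.plot(lam, gamma.pdf(lam, a=c["r"] + 1, scale=1.0 / c["T"]), label=name)
    plt.xlabel("failure rate (per million seeks)"); plt.legend(); plt.show()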
Exercises
Compute all posterior densities using the "flat" prior for λ.
3.3.1. Using the data in Table 3.3.2, compute and graph the posterior den-
sities for fuse and capacitor failure rates. For fuses, compute and graph the
posterior density per 1000 shots.
We will derive the Poisson process starting with finite time interval and finite
time horizon models to show the assumptions underlying the Poisson process.
A random process (or stochastic process) is a collection of random quantities,
say, {X(t1), X(t2), ..., X(tn)}, at time points (or epochs) t1 < t2 < ··· < tn. In
our modeling leading to the Poisson process, time intervals will originally have
length Δ; e.g., ti = iΔ,

where T = KΔ is the time horizon and θ(T) is the total number of failures in
the time interval [0, T].
Let Yi be the number of failures in the ith time interval
[ti−1, ti), where ti = iΔ. In our derivation of the Poisson process, we consider
four related models depending on the finiteness of T and whether or not Δ → 0.
In the first case, with both Δ and T finite, we derive the finite population
version of the geometric discrete distribution. This is the discrete version of
the finite population joint exponential density presented in Chapter 1. The
parameter θ(T) is now the total number of failures in the time interval [0, T].
Marginal random variables Y1, Y2, ..., YK are dependent given θ(T).
In the second case, we keep the length Δ of an interval fixed but allow the
time horizon T to become infinite. In this fashion we derive the usual form of
the geometric discrete distribution. This is the discrete version of the infinite
population joint exponential density derived in Chapter 1. Note that when
T → ∞, Y1, Y2, ..., YM are now conditionally independent given the parameter.
In the third case, the time horizon T is kept finite and fixed while the time in-
terval Δ is allowed to approach zero. In this way we derive the multinomial discrete
distribution with fixed parameter θ(T). Fix 0 < t1 < ··· < tm = T and let n(ti)
be the number of failures in [ti−1, ti). Random quantities n(t1), n(t2), ..., n(tm)
are still dependent given θ(T) as in the first case. However, now we have a prop-
erty called orderliness. By orderliness we mean that two or more failures will
not occur in the same "infinitesimal" time interval with probability one.
In the fourth and final case, we have the Poisson process, where the time
horizon has become infinite and the time interval is allowed to approach zero.
For Poisson processes, we have (1) independent increments Y(ti) = X(ti) −
X(ti−1), (2) orderliness, and (3) marginal random variables have the Poisson
distribution; i.e.,

P[X(t) − X(s) = k | λ] = exp[−λ(t − s)] [λ(t − s)]^k / k!  for 0 ≤ s < t, k = 0, 1, 2, ....
TABLE 3.4.1
Derivation of the Poisson process.

I. Finite population version of the geometric discrete distribution.
II. Independent geometrics (independent increments).
III. Multinomial (orderliness).
IV. Poisson process (independent increments and orderliness).
To prove this we need only prove that there are exactly C(S + K − 1, K − 1)
vectors (y1, ..., yK) whose sum of coordinates equals S. Denote the S defects by
stars and indicate the division into the K intervals by bars; then

p(y1, ..., yK | θ(T) = S) = 1 / C(S + K − 1, K − 1),    (3.4.2)

where y1 + ··· + yK = S. Is p exchangeable?
For n = 1, (3.4.3) gives the marginal probability function of Y1, and (3.4.4)
gives the joint probability function for y^n = (y1, y2, ..., yn).
In the limit (as T → ∞) random quantities Y1, Y2, ..., Yn, ... are condition-
ally independent with univariate geometric probability function.
Remark. Note that exchangeability alone does not imply that all vectors with
the same sum have constant probability.
Letting Δ → 0 and using the same argument as above, we can show that the
limit holds, where λ is interpreted as the average number of events occurring per
unit time. In general, fixing k and letting S and K become infinite in such a way
that S/K approaches a limit, we have the limiting probability function.
Since ti − ti−1 = (ki − ki−1)Δ, it follows that in taking the limit, ki − ki−1 must
go to infinity as Δ → 0.
as was to be shown.
Independence of n(t1), n(t2), ..., n(tm) is true for each Δ, where ti = kiΔ > 0,
in case II and therefore true in the limit as Δ → 0.
Choose time points so that these coincide with integer multiples of Δ. Let n(ti)
be the number of events that occur in [ti−1, ti). Since the ti's are integer multiples
of Δ, we may write ti = kiΔ where k1 < k2 < ··· < km.
We will only give the result for m = 2 since the general result follows along
similar lines. Fix t < T. We show that in the limit, as Δ → 0, n(t) given θ(T) and
T is binomial with parameters θ(T) and t/T.
In the numerator, the first factor has n(t) terms while the second factor has
θ(T) − n(t) terms. The denominator has θ(T) terms. Multiply numerator and
denominator by appropriate powers of Δ and let Δ → 0. We have
III → IV. The second derivation of the Poisson process. We can derive
the Poisson probability function from the multinomial. To see how that approach
would work, fix m and t1 < t2 < ··· < tm < T. Since we are now going to
let T → ∞, consider an additional interval [tm, T) so that the intervals partition
[0, T). As before, let n(ti) be the number of events in [ti−1, ti). We next let
tm+1 = T while keeping t1, ..., tm fixed.

Let T → ∞ so that θ(T)/T → λ; we obtain

P[n(t1) = n1, ..., n(tm) = nm] = product over i of exp[−λ(ti − ti−1)] [λ(ti − ti−1)]^(ni) / ni!,

which is the joint probability function for independent Poisson random quanti-
ties.
Exercises
3.4.1. For the finite population exponential model
Notation
N(t): cumulative number of failures in [0, t]
P[N(t) = k | λ] = exp(−λt)(λt)^k / k!: Poisson
unconditional distribution of N(t) when λ is gamma(a, b): negative binomial
hypergeometric
binomial, bi(n, p)
beta, Be(A, B)
beta-binomial, Bb[N, A, B]
CHAPTER 4 *
Strength of Materials
4.1. Introduction.
There are two basic kinds of failure:
(1) wearout
and
(2) overstress.
Wearout implies that a device or material becomes unusable through long or
heavy use. It implies the using up or gradual consuming of material. We have
already discussed the analysis of wearout data. The jet engine data analyzed in
Chapter 2 are an example of the occurrence of wearout.
Overstress, on the other hand, refers to the event that an applied stress
exceeds the strength of a device or material. This is the type of failure examined
in this chapter. The problem is to determine whether or not a specified or
random load will exceed the strength of a device or material. Both the strength
and the load may be random. However, they are usually considered independent
of one another. In section 4.2 we consider the problem of assigning a probability
distribution to the strength of a material based on test data. In section 4.3
we discuss design reliability. That is, based on projected principal stresses, we
calculate the maximum shear stress that could result.
According to classical theory, the ultimate strength of a material is deter-
mined by the internal stresses at a point. We begin with a simple illustrative
example, called a uniaxial tensile test. A coupon of material is stressed by pulling
on it with an applied force until it yields (Figure 4.1.1). The applied force re-
quired to cause yielding is equivalent to the internal strength of the material. By
repeating this experiment for a given type of material, aluminum, for example,
we will obtain material strength data. However, experimental measurements will
give many results that may hardly agree with any deterministic theory. For this
reason, we require a probability distribution to predict strength.
How can we predict failure? We will consider two criteria relevant to this
prediction:
(1) the distortion energy criterion due to von Mises
and
(2) the maximum shear stress criterion. This will be discussed in section 4.3.
Volumetric energy is the energy stored in a material due to a pure volume change.
For example, were we to submerge a cube of material at some depth in the ocean,
the symmetric effect of hydrostatic pressure on the cube would result in a pure
volume change.
According to the von Mises-Hencky criterion, when the distortion energy of a
material reaches a certain value, the material will yield. Since we are interested
in material failure, the distortion energy component of total elastic energy is
important to us. This is the only energy component relevant to a tensile test. A
basic reference is The Feynman Lectures on Physics, Vol. II, Chapter 38.
Hooke's law. Imagine pulling the coupon in Figure 4.1.1 until yielding occurs.
The stress imposed is by definition the force imposed per unit area. Although we
have given a two-dimensional picture of our coupon, it is really three dimensional,
and the area in question is not shown. As we pull the coupon, it elongates and
at the same time the widths contract. As the experiment proceeds, we obtain
the graph in Figure 4.1.2. This is called Hooke's law and is based on empirical
observations which indicate a linear relationship between stress σ and strain ε
at the moment of yielding; i.e., Hooke's law is

F/A = Eε,    (4.1.2)

where A is the bottom area of the coupon and E is the modulus of elasticity, a
material constant. Since stress by definition is the force per unit area, we have

σ = F/A = Eε.
The work done to effect yielding is equal to the total elastic energy corresponding
to strength. This can be calculated as the area under the line in the graph to
the point of yielding, or

u = σε/2 = σ²/(2E).    (4.1.5)

The total elastic energy splits into a volumetric part and a distortion part, and
the split involves ν, where ν is Poisson's ratio. The constants E and ν completely
specify the elastic properties of a homogeneous isotropic (noncrystalline) material.
(See Figure 4.1.3.)
Exercises
4.1.1. Calculate the total elastic energy in terms of σ when instead of Hooke's
law we have

σ = Kε^m,

where K > 0 and m > 0.
Hence

P(S > s) = exp(−λs²)    (4.2.3)

is the Weibull survival distribution with shape parameter α = 2.
The Weibull shape parameter is not always 2. It does not follow, how-
ever, that a Weibull distribution with shape parameter 2, as in formula (4.2.3),
can always be defended as the distribution for strength of a material. For exam-
ple, suppose we judge the following:
(1) there exists a minimum distortion energy u0, which any specimen can
absorb before yielding, and
(2) the material is no longer linear elastic. The distortion energy for the ith
specimen in uniform tension (or compression) is for this case represented as a
more general function
Load application. Let the random quantity S denote the strength of a spec-
ified material. Suppose a "coupon" of this material is subjected to uniaxial
loading as in Figure 4.1.1. Let the random quantity L denote the maximum
loading from the beginning of application until the load is removed. We require
P(S > L).
Exercises
4.2.1. Suppose that instead of formula (4.2.1), the distortion energy based
on Hooke's law, we use the formula
where K and m are positive constants for the material in question. Following
the method of deriving the Weibull distribution in this section, derive the
corresponding Weibull survival distribution (4.2.6) for the strength S and identify
its shape parameter.
4.2.2. The strength S of a certain material is judged to have a Weibull
(α = 2, λ) probability distribution. Suppose a maximum uniaxial load L is
applied to a "coupon" of the material as in Figure 4.1.1. If S and L are judged
independent and L is judged normal, where now σ is the standard deviation of the
normal distribution and μ is its mean, then develop a computing formula for

P(S > L)

which can be easily calculated based on the cumulative normal probability dis-
tribution. (A Monte Carlo sketch follows.)
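Pending the closed-form answer, P(S > L) can be checked by simulation; a minimal sketch with made-up parameter values:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    lam = 0.01                                    # hypothetical Weibull scale parameter
    S = rng.weibull(2.0, size=n) / lam            # strength: alpha = 2 Weibull
    L = rng.normal(loc=60.0, scale=10.0, size=n)  # load: normal(mu, sigma), made-up values

    print((S > L).mean())                         # Monte Carlo estimate of P(S > L)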
By a suitable rotation of the coordinate system we can eliminate the shear
stresses, retaining only the principal stresses σ1, σ2, σ3. The two stress-free
surfaces of the element are by definition principal planes since they are subject
to zero shear stress.
In the tensile test experiment, there was only one nonzero principal stress,
namely, σ1. The other principal stresses were zero. In general, these principal
stresses are not zero and may be oriented in any direction along the principal
orthogonal coordinate axes. Assessing the distribution of the three principal
stresses is essential for predicting static failure under multi-axial stress.
According to the von Mises-Hencky criterion for failure, a material fails when
the total distortion energy per unit volume due to external loads exceeds the
internal distortion energy determined by a tensile test as in the previous section.
The total distortion energy per unit volume due to external stresses is

u_d = [(1 + ν)/(3E)] [σ1² + σ2² + σ3² − σ1σ2 − σ2σ3 − σ3σ1].    (4.3.2)

Setting this equal to the internal distortion energy determined by a tensile test
defines an ellipsoid within which external principal stresses can lie without caus-
ing failure:

(σ1 − σ2)² + (σ2 − σ3)² + (σ3 − σ1)² = 2 Sy²,    (4.3.3)

where Sy is the yield strength in uniaxial tension.
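Equivalently, the criterion compares the von Mises equivalent stress with the tensile yield strength; a minimal sketch (stress values hypothetical):

    import math

    def von_mises(s1, s2, s3):
        """Equivalent tensile stress from the three principal stresses."""
        return math.sqrt(((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 2.0)

    def yields(s1, s2, s3, yield_strength):
        """True if the distortion energy criterion predicts yielding."""
        return von_mises(s1, s2, s3) >= yield_strength

    # Hypothetical state of stress (MPa) against a 250 MPa yield strength:
    print(von_mises(180.0, 60.0, -30.0), yields(180.0, 60.0, -30.0, 250.0))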
The volumetric energy can be shown to be

u_v = [(1 − 2ν)/(6E)] (σ1 + σ2 + σ3)²,

where σi, εi, i = 1, 2, 3, are the stresses and strains at the moment of yielding.
Suppose σ2 = σ3 = 0; then by the principle of superposition (i.e., when a
material is stretched it contracts at right angles to the stretch),

ε1 = σ1/E,  ε2 = −ν σ1/E,  ε3 = −ν σ1/E.    (4.3.4)

It follows that the distortion energy in the uniaxial case is

u_d = [(1 + ν)/(3E)] σ1².    (4.3.5)
Mohr's circle. In the case of biaxial loading, all stresses on a stress element act
on only two pairs of faces as shown in Figure 4.3.2. The two stress-free surfaces
of the element are by definition principal planes since they are subjected to zero
shear stress. The axis normal to the two stress-free surfaces is a principal axis.
The other two principal directions, therefore, are parallel to these stress-free
surfaces, and the magnitudes of these two principal stresses are denoted by σ₁ and σ₂ as before. Mohr's circle provides a convenient means for representing
the normal and shear stresses acting on planes perpendicular to the stress-free
surfaces.
Let σ and τ denote the normal stress and shear stress acting on the x-plane that is perpendicular to the stress-free surfaces. Given σ_x, σ_y, and τ_xy, Mohr's circle, Figure 4.3.3, corresponds to realizations of (σ, τ) as we rotate the x- and y-axes. Let φ be the angle between the positive direction of the x-axis and the first principal stress direction. The rotation angle in Mohr's circle is 2φ. The point (σ_x, τ_xy) on the circle corresponds to the state of stress acting on the x-plane.
The state of stress on an infinitesimal cube surrounding a point in a piece of
material can be represented by a matrix. In the biaxial case this matrix is
[ σ_x   τ_xy ]
[ τ_xy  σ_y  ].   (4.3.6)

The normal and shear stresses (σ, τ) on planes perpendicular to the stress-free surfaces then lie on the circle

[σ − (σ_x + σ_y)/2]² + τ² = [(σ_x − σ_y)/2]² + τ_xy²   (4.3.7)

for σ_min ≤ σ ≤ σ_max.
Exercises
4.3.1. Justify equations (4.3.4) using the principle of superposition.
4.3.2. Given that σ_x, σ_y, and τ_xy are determined, use Mohr's circle to calculate σ₁ and σ₂ in terms of σ_x, σ_y, and τ_xy. Assume σ₁ ≥ σ₂.
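As a quick check on such Mohr's-circle calculations, here is a minimal Python sketch; the function and the sample numbers are illustrative, not from the text.

import math

def principal_stresses(sx, sy, txy):
    # Center c and radius r of Mohr's circle for plane stress.
    c = (sx + sy) / 2.0
    r = math.hypot((sx - sy) / 2.0, txy)
    s1, s2 = c + r, c - r                       # sigma_1 >= sigma_2
    phi = 0.5 * math.atan2(2.0 * txy, sx - sy)  # physical angle; 2*phi on the circle
    return s1, s2, phi

print(principal_stresses(80.0, 20.0, 30.0))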
y = a ln x + b.

The slope of the line, a, "estimates" the shape parameter, while the scale parameter is "estimated" from the intercept b. The justification for these estimates rests on the hope that (*) holds for the sample at hand; as a limiting result, (*) is mathematically true. For a small sample, (*) is indeed a stretch!
Engineering textbooks dealing with probability advocate using probability plots; see, e.g., Hahn and Shapiro (1974). These authors feel that even with relatively small sample sizes they can obtain estimates of probability distribution parameters as well as a graphic picture of how well the distribution fits the data. Statistics textbooks written by mathematical statisticians do not mention probability plots. Their reason may be that since estimates are not "true" parameter values, confidence limits need to be determined for the parameters. To obtain confidence limits they use other methods, often likelihood methods.
where x is the stress, and x_u, x_0, and m are parameters. The only one of these parameters for which some physical interpretation applies is x_u. The basis for this selection of distribution is purely empirical since it is determined by curve fitting.
Section 4.3. For further discussion of Mohr's circles see Hoffman and Sachs
(1953). The derivation of
Notation
σ      stress, i.e., force per unit area
G      shear modulus
τ_xy   shear stress acting on the x-plane in the y-direction
PART II

CHAPTER 5

The Economics of Maintenance and Inspection

5.1. Introduction.
So far we have only concentrated on the uncertainty relative to component and
system lifetime. In practice, the costs of inspection and maintenance decisions are often of paramount importance. These costs too are of course not known with
certainty, but it is conventional to use expected costs and compute expected
present worth or expected future worth. First, in section 5.2, we discuss the
economic analysis of replacement decisions based on discounted expected costs
and present worth but without considering the uncertainty in lifetime.
In Example 5.2.1, the impact of depreciation and other income tax considerations relative to replacement decisions is not included, as it would be in a realistic comprehensive analysis. Also, the revenues that might be generated
by the machines considered are omitted. Due to the time value of money, it is
necessary to use a discount rate (or interest rate), call it i. Let N be the time
horizon under consideration. In our first example N = 3.
In section 5.3 we discuss Deming's ALL or NONE rule in inspection sampling.
In section 5.4 we study age replacement when lifetimes are uncertain. In section
5.4, we use the expected present worth criterion with respect to continuous
discounting; i.e., the interest rate is calculated continuously for an unbounded
time horizon.
TABLE 5.2.1
Keep or replace policies: cost comparisons.

Period      Keep old    Replace                    Optimal policy
0           -$1500      -$8000
1           -$3000      -$2500
2           -$3500      -$2500
3           -$1800      -$3500 + $7000 (salvage)
Total       -$9800      -$9500                     Replace
PW(12%)     -$8249      -$9733                     Keep old
To further bring our problem to life, suppose that our discount rate is i = 0.12
or 12% per year. This may represent our minimum attractive rate of return, i.e.,
the interest rate we could or at least would like to achieve. Using a spreadsheet
such as EXCEL, for example, PW(i) is easy to calculate in this and in much
more complicated situations.
Table 5.2.1 summarizes our costs and calculations. The totals are the sums
of costs without discounting.
Since the discounted cost at rate i = 0.12 (or interest rate 12%) of keeping
the old machine for three years is $8249, while the discounted cost of replacing
the old machine with a new machine is $9733.90, it is clear that our decision
should be to keep the old machine.
Were we not to use discounting (i.e., i = 0) we would have computed $9800
for the cost of keeping the old machine and a cost of only $9500 for replacing
the old with a new machine. Hence without discounting we would have decided
to replace the old machine, whereas with discounting we would keep the old
machine. Cost decisions over more than one year should be based on discounting.
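A minimal Python sketch of the comparison in Table 5.2.1 follows; it treats the +$7000 entry at period 3 as the salvage value recovered under the replace alternative, an assumption consistent with the table's totals.

def present_worth(cash_flows, i):
    # cash_flows[t] is the net cash flow at the end of period t (t = 0 is now).
    return sum(c / (1 + i) ** t for t, c in enumerate(cash_flows))

keep_old = [-1500, -3000, -3500, -1800]
replace  = [-8000, -2500, -2500, -3500 + 7000]

for name, flows in (("Keep old", keep_old), ("Replace", replace)):
    print(f"{name:8s} total = {sum(flows):6.0f}  PW(12%) = {present_worth(flows, 0.12):8.2f}")

Running this reproduces the table: PW(12%) is about −$8250 for keeping the old machine and about −$9734 for replacing it.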
Exercises
5.2.1. Using the data in Table 5.2.1 and an interest rate of i = 0.10, that
is, 10%, determine the optimal policy regarding whether or not to keep the old
machine.
5.2.2. Using the data in Table 5.2.1 and an interest rate of i = 0.10, compute
the annualized expected cost per time period starting with period number 1.
(By annualized we mean that the discounted cost is the same at the end of each
period, usually one year, and has equivalent PW.) What is the optimal policy
regarding whether or not to keep the old machine in this case?
5.2.3. Using the data in Table 5.2.1 and a random interest rate of either
i = 0.10 or i = 0.12, each with equal probability, determine the optimal policy
regarding whether or not to keep the old machine using the expected future worth
criterion. What is the optimal decision now?
5.2.4. Why are optimal decisions relative to alternative decision comparisons
based on future worth in general different from those based on present worth?
Percent defective specified. Let π be the percent defective over many lots obtained from the same vendor. Suppose we believe that the vendor's production of items is in statistical control. A sequence of measurements will be said to be in statistical control if it is (infinitely) exchangeable. That is, each item, in our judgment, has the same chance of being defective or good regardless of how many items we have already examined. Let p(π) be our probability assessment for the parameter π based on previous experience. It could, for example, be degenerate at, say, π₀. Inspection results are independent only in the case that p(π) is degenerate, i.e., π is specified.
Let k1 be the cost to inspect one item before installation. Let k2 be the cost
of a defective item that gets into the production line. This cost will include the
cost to tear down the assembly at the point where the defect is discovered. If
an item is inspected and found defective, additional items from another lot are
inspected until a good item is found. (We make this model assumption since
all items that are found to be defective will be replaced at vendor's expense.)
Figure 5.3.1 illustrates our production line problem.
We assume the inspection process is perfect; i.e., if an item is inspected and
found good then it will also be found good in the assembly test and if a defective
item is not inspected, it will be found defective at assembly test.
The all or none rule. It has been suggested [cf. Deming, 1986, Chapter 15] that the optimal decision rule is always to either choose n = N or n = 0, depending on the costs involved and π, the chance that an item is defective. Later we will show that this is not always valid when π is unknown.
In this example we consider the problem under the restriction that the initial
inspection sample size is either n = N or n = 0. Assuming that π is known, we cannot learn anything new about π by sampling. On the other hand, we
can discover defective units by inspecting all N. Hence the only decisions to be
considered are n = N and n = 0. Figure 5.3.2 is an influence diagram for this
problem, where
x = number of defectives in the lot and
y = number of additional inspections required to find good replacement items
for bad items found either at the point of installation or the point of assembly
test. These come from another lot.
The value (loss) function is the expected total cost incurred under each decision. Hence n = 0 is best if

π < k₁/k₂,   (5.3.1)

and n = N is best if π > k₁/k₂.
In the case that π is unknown and the only two decisions to be considered are either n = N or n = 0, then the solution is also valid if we replace π by E(π). If equality holds, then we may as well let n = N since we may obtain additional information without additional expected cost.
Suppose now that we allow 0 < n < N. If we are certain that π < k₁/k₂, i.e., p(π) = 0 for π ≥ k₁/k₂, then n = 0 is always best. On the other hand, if we are certain that π > k₁/k₂, then n = N is always best.
the optimal sample inspection size may be neither 0 nor N. In section 10.4, we consider the optimal solution when 0 < n < N can also be considered: n = 0 if the corresponding cost condition holds, and n = N otherwise. If the loss is 1 if an item is defective and 0 otherwise, then n = 0 if

(5.3.2)

and n = N otherwise.
The intuitive reason for the difference between (5.3.1) and (5.3.2) is as follows. If the variance is large, then π is either close to zero or close to one. If we believe π is close to zero, there will be few defective items, so we should not inspect before installation. On the other hand, if we believe π is close to one, then there will be so many defective items that the cost due to defectives discovered at the point of assembly test will tend to exceed the cost of inspecting items before installation.
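A minimal sketch of the known-π comparison in (5.3.1); the costs k₁, k₂ and the π values below are assumptions for the example only.

k1, k2 = 0.50, 20.0      # inspection cost vs. cost of a defective reaching assembly
break_even = k1 / k2

for pi in (0.005, 0.025, 0.10):
    choice = "n = 0 (no inspection)" if pi < break_even else "n = N (inspect all)"
    print(f"pi = {pi:5.3f}, k1/k2 = {break_even:5.3f}: choose {choice}")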
Exercises
5.3.1. Deming's inspection problem for a unique lot. Suppose we are
considering whether or not to inspect items in a unique lot of size N. That is,
there are no other items except those in this lot. Figure 5.3.1 still applies with
some exceptions. There are no additional inspections in this case because the lot
is unique.
Let θ be the (random) number defective in the lot of size N. If an inspection sample size of n is chosen and x is the number found defective in the sample, then we have a second decision to make, namely, either stop inspection or continue on and inspect all items.
(i) If the decision is to stop inspecting, the total inspection cost is k₁n.
(ii) If the decision is to continue on and inspect all remaining items, the total inspection cost is k₁N.
If the prior distribution for θ is beta-binomial(N, A, B), determine the optimal decision after inspecting n (specified) items; i.e., when is it optimal to stop inspection and when is it optimal to continue on when x defectives are found in a sample of size n?
Z_i = 1 with probability π,
Z_i = 0 with probability 1 − π.

Binomial. If Z₁, Z₂, ..., Z_n are judged independent given π, then

P(Z₁ + ··· + Z_n = x | π) = C(n, x) π^x (1 − π)^(n−x),  x = 0, 1, ..., n.
R(T) is the long run expected discounted cost when the replacement interval T is set at each stage. If r(x) is continuous and strictly increasing to ∞, then the minimizing T* exists and is unique. Note that we need only specify the ratio of costs and not the costs separately.
If we take the limit as the discount rate tends to zero, we obtain the long run average cost per unit of time. There is no discounting of costs, and the optimal replacement interval T minimizes long run average cost. If the failure rate r(x) is strictly increasing to ∞, the minimizing x = T* satisfies the marginal condition whose left-hand side is tabulated below. Note that we need only specify the ratio of costs and not the costs separately.
Then
TABLE 5.4.1
Calculation of optimal replacement interval.
x in Cumulative Left-hand
years normal side
0.1 0.54 0.03
0.2 0.58 0.04
0.3 0.62 0.09
0.4 0.66 0.17
0.5 0.69 0.28
0.53 0.70 0.32
0.535 0.70 0.33
0.54 0.71 0.34
0.55 0.71 0.35
0.6 0.73 0.43
0.7 0.76 0.63
0.8 0.79 0.90
to find T*.
This can be done using a spreadsheet such as EXCEL, for example. Table 5.4.1 shows the calculations. From the table, T* = 0.535 years, or 6.42 months, compared with the mean life of 10.71 months.
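The same search is easy in a few lines of Python. The sketch below assumes a normal lifetime distribution and the standard age-replacement marginal condition r(T)∫₀ᵀ F̄(x)dx − F(T) = c₁/(c₂ − c₁); the mean, standard deviation, and cost ratio are illustrative stand-ins, not the text's values.

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.89, 0.30   # lifetime mean and standard deviation in years (illustrative)
cost_ratio = 0.33        # c1 / (c2 - c1) (illustrative)

F = lambda x: norm.cdf(x, mu, sigma)
r = lambda x: norm.pdf(x, mu, sigma) / (1.0 - F(x))   # failure rate

def lhs(T):
    integral, _ = quad(lambda x: 1.0 - F(x), 0.0, T)
    return r(T) * integral - F(T)

for T in np.arange(0.05, 1.0, 0.005):
    if lhs(T) >= cost_ratio:      # first crossing gives T*
        print(f"T* is about {T:.3f} years")
        break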
Exercises
5.4.1. Solve for T* using
CHAPTER 6

Network Reliability
Let x_i = 1 if arc i works, and x_i = 0 otherwise. Let

φ(x₁, x₂, ..., x_n) = 1 if the system works, and 0 otherwise.   (6.1.1)
Such abstract systems are among a class of systems called coherent. The structure function φ is the system organizing function that relates component operation to system operation.
DEFINITION. A system of components is called coherent if (a) its structure
function is increasing coordinatewise and (b) each component is relevant.
A physical system in general (there are exceptions) has the property that
replacing a failed component by a functioning component causes the system to
pass from a failed state to a nonfailed state, or at least the system is no worse
than before the replacement. This is our justification for requirement (a).
Component i is irrelevant if φ(1_i, x) = φ(0_i, x) for all choices of the other arguments x; i.e., it does not matter whether component i is working or not working. Component i is relevant otherwise. It follows that φ(1, 1, ..., 1) = 1 since all components in a coherent system are relevant.
Figure 6.1.2 shows an example of a modified network that is not coherent
because arc 9 is not relevant to the question regarding the existence of a working
path from the source to the terminal. It may, however, be relevant in some other
configuration of distinguished nodes.
Let X₁, X₂, ..., X₈ be binary random variables corresponding to arc success; i.e., X_i = 1 if arc i works, and X_i = 0 otherwise. Given P(X_i = 1) = p_i for i = 1, ..., 8 and assuming conditional independence given the p_i's, our problem is to compute the probability that there is at least one working path at a given instant in time between the distinguished nodes s and t. For a network with n arcs we will denote this probability by

h(p₁, ..., p_n) = P[φ(X₁, ..., X_n) = 1].   (6.1.2)

It follows that h(1, 1, ..., 1) = 1 since all components in a coherent system are relevant.
FIG. 6.1.7. Illustration of the pivoting algorithm: the binary computational tree.
and, pivoting on component i,

h(p) = p_i h(1_i, p) + (1 − p_i) h(0_i, p).

Then
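The pivoting idea can be stated compactly in code. The sketch below is a minimal version of pivotal decomposition applied by brute-force recursion; the five-arc bridge structure function and the arc reliabilities are illustrative assumptions, and no series–parallel reductions are performed, so this is the full binary computational tree rather than the efficient factoring algorithm.

def reliability(phi, arcs, p, state=None):
    # Pivot on the next undetermined arc and weight the two subproblems.
    state = {} if state is None else state
    if len(state) == len(arcs):
        return float(phi(state))
    i = arcs[len(state)]
    works = reliability(phi, arcs, p, {**state, i: 1})
    fails = reliability(phi, arcs, p, {**state, i: 0})
    return p[i] * works + (1 - p[i]) * fails

def bridge(x):
    # Minimal path sets {1,4}, {2,5}, {1,3,5}, {2,3,4} of the bridge network.
    return max(x[1] * x[4], x[2] * x[5], x[1] * x[3] * x[5], x[2] * x[3] * x[4])

p = {i: 0.9 for i in range(1, 6)}
print(reliability(bridge, [1, 2, 3, 4, 5], p))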
Exercises
6.1.1. What is the sum of the coefficients of the reliability polynomial for an
arbitrary undirected network with n arcs?
6.1.2. Calculate the reliability polynomial for the two terminal undirected
network in Figure 6.1.1 using the factoring algorithm. Show your work.
6.1.3. Use the factoring algorithm to calculate the reliability polynomial
for the all terminal undirected network below, i.e., the probability that any
node can communicate with any other node in the network when all arc success
probabilities are set equal to p. In this case all nodes are distinguished and each
needs to communicate with all other nodes.
6.1.4. The network reliability problem for the figure below is source to
terminal connectivity and arcs are unreliable.
(a) Find the domination and signed domination for the undirected network
below.
(b) Is this network coherent?
(c) Using the factoring algorithm, would it be permissible to first pivot on
arc 6 and then on arc 7? Why or why not?
6.1.5. Show that the domination of the undirected two-terminal graph below is 2^(n−1). The point of this example is to show that the factoring algorithm is not a polynomial time algorithm. The intermediate nodes between s and t are labelled.
For example, if arcs 1 and 4 in P₁ are working but all other arcs are failed, then there is still a working path from source s to terminal t. In addition, the set P₁ cannot be reduced and still constitute a working path. So {1, 4, 3} would also constitute a path set but would not be minimal.
Likewise for the bridge, the minimal cut sets are
K₁ = {1,2}, K₂ = {4,5}, K₃ = {1,3,5}, K₄ = {2,3,4}.   (6.2.2)
If arcs 1 and 2 are failed, for example, then there is no path from s to t regardless
of the state of other arcs and K1 is called a cut set. As in the case of path sets,
we are only interested in minimal cut sets.
Using either the minimal path sets or the minimal cut sets we can define a
Boolean representation for a network that, although artificial, provides another
method for calculating network reliability.
φ(x) = 1 − ∏_{j=1}^{p} [1 − ∏_{i∈P_j} x_i],   (6.2.3)

where ∏_{i∈P_j} x_i is the series structure function for the jth minimal path set. There are p minimal path sets. Arcs within a minimal path set are connected in series; minimal path sets are in turn connected in parallel. The representation is not necessarily physically realizable since the same arc (or component) can occur in several minimal path sets.
A similar representation holds with respect to minimal cut sets, where now the network system structure function is

φ(x) = ∏_{j=1}^{k} [1 − ∏_{i∈K_j} (1 − x_i)].   (6.2.4)
The factoring algorithm does not work for rooted directed graphs. A
directed graph (or digraph) has arcs with arrows indicating direction. Figure
6.2.3 is a rooted directed graph. The "root" is vertex s, the unique node with no
entering arcs. The problem is to compute the probability that s can communicate
with both v and t. The topology for this problem is defined by the family of
minimal path sets
P = [{2,5},{1,3,5},{2,e,4},{1,3,4},{1,2,4}]. (6.2.5)
Graphically, a minimal path for this problem is a "rooted tree." For example,
the acyclic rooted directed graph in Figure 6.2.4 is a "tree" corresponding to
a minimal path for the network in Figure 6.2.3. Unfortunately, the factoring
algorithm is not valid for this problem or for directed graphs in general. The
binary computational tree shown in Figure 6.2.5 illustrates what goes wrong.
Note that arc 3 is now a self-loop after nodes u and v are coalesced. If arc e
is perfect, then the corresponding minimal path sets are
whereas from the leaf graph on the left we would infer that {1,4} is also a minimal path, which is FALSE!
minimal path sets for the coherent system. Let E_r, r = 1, 2, ..., p, be the event that all components (or arcs in the case of a network) in a minimal path set of the coherent system work. By the inclusion-exclusion principle, we can calculate system reliability as

h = P(E₁ ∪ E₂ ∪ ··· ∪ E_p) = Σ_r P(E_r) − Σ_{r<s} P(E_r ∩ E_s) + ··· + (−1)^{p+1} P(E₁ ∩ ··· ∩ E_p).
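A minimal sketch of this inclusion-exclusion computation, using the bridge's minimal path sets and an assumed common arc reliability of 0.9:

from itertools import combinations
from math import prod

paths = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]   # bridge minimal path sets
p = {i: 0.9 for i in range(1, 6)}                # illustrative arc reliabilities

h = 0.0
for r in range(1, len(paths) + 1):
    for subset in combinations(paths, r):
        union = set().union(*subset)   # the intersection event: every arc in any chosen path works
        h += (-1) ** (r + 1) * prod(p[i] for i in union)
print(h)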
Exercises
6.2.1. Use Boolean reduction to calculate the reliability for the network in
Figure 6.2.3.
6.2.2. What is the domination number for the directed graph in Figure
6.2.3?
6.2.3. Show that h(p) is nondecreasing coordinatewise for coherent systems.
6.3. Analysis of systems with two modes of failure: Switching and
monitoring systems.
Switching systems can not only fail to close when commanded to close but can
also fail to open when commanded to open. Likewise, safety monitoring systems
can not only fail to give warning when they should but can also give false alarms.
In either case, the switching or the monitoring systems are not working properly.
It is desirable to design switching and monitoring systems that are reliable and
also do not have too many failures of the second type. The design of such systems
can be analyzed using coherent system duality and the reliability polynomial.
where P_j and K_j are minimal path and minimal cut sets, respectively. If we interchange the OR (i.e., ∐) and AND (i.e., ∏) operations to get φ^D, we have the same representation, where now P_j is a minimal cut set for φ^D and K_j is a minimal path set for φ^D. It follows that

φ^D(x) = 1 − φ(1 − x₁, ..., 1 − x_n),   (6.3.1)

since interchanging ∐ and ∏ complements both the structure and its arguments. For the 2-out-of-3 smoke detector example the minimal path sets are

P₁ = {1,2}, P₂ = {1,3}, P₃ = {2,3},
Let p be the probability that a detector alarms when smoke is present. Then the reliability polynomial for the 2-out-of-3 alarm system is

h(p) = 3p² − 2p³.
The dual coherent system and two modes of failure. First we consider
a switching system which is subject to two modes of failure: failure to close
and failure to open. Similarly, circuits constructed of switches (to mitigate such
failures) are subject to the same two modes of failure.
Let φ be a coherent switching system with n exchangeable and statistically independent components. Let x_i = 1 if the ith switch correctly closes when commanded to close and x_i = 0 otherwise. Let φ(x) = 1 if the circuit responds correctly to a command to close and 0 otherwise. Then φ(x) represents the structure function representing the correct system response to a command to close.
Now let y_i = 1 if the ith switch responds correctly to a command to open (that is, opens) and 0 otherwise. Let ψ be a similar structure function such that ψ(y) = 1 if the circuit responds correctly to a command to open (that is, opens) and 0 otherwise. Then

ψ(y) = φ^D(y),

from (6.3.1) and since the minimal cut sets for φ are the minimal path sets for φ^D. Recall that a minimal cut set of switches for φ is a set of switches whose failure to close means the switch system cannot close, and therefore the system is open when the switches in a minimal cut set for φ are open.
Let X_i = 1 with probability p_i, so that the generating function for the number of closed switches is

∏_{i=1}^{n} (p_i z + q_i),

where q_i = 1 − p_i. Expand and sum coefficients of z^j for j = k, ..., n to obtain the probability that at least k components operate. Computing time is of the order n² for fixed k. A computer program written in BASIC follows:
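A minimal Python equivalent of that BASIC program (a sketch, on the assumption that it implements the coefficient expansion just described; the component reliabilities in the example are illustrative):

def prob_at_least_k(p, k):
    # Expand prod_i (q_i + p_i z) one factor at a time;
    # coeff[j] is the probability that exactly j components operate.
    coeff = [1.0]
    for pi in p:
        qi = 1.0 - pi
        new = [0.0] * (len(coeff) + 1)
        for j, c in enumerate(coeff):
            new[j] += c * qi        # component fails: power of z unchanged
            new[j + 1] += c * pi    # component works: power of z increases
        coeff = new
    return sum(coeff[k:])

print(prob_at_least_k([0.9, 0.8, 0.95, 0.7, 0.85], 3))   # 3-out-of-5 example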
Exercises
6.3.1. Verify equation (6.3.3).
6.3.2. Calculate and graph h(p) when φ is a 4-out-of-5 system. Use (6.3.1) to calculate and graph h for the dual system.
6.3.3. Show that the dual of a k-out-of-n system is an (n − k + 1)-out-of-n system.
6.3.4. Use the BASIC program above to calculate h(p₁, ..., p₅) when φ is a 3-out-of-5 system with component reliabilities
(a) Assuming each component has reliability p, compute the reliability func-
tion for each structure.
(b) Which system is most reliable for p close to 1?
(c) Which system is most prone to false alarms when the probability of a
component false alarm is very low?
Notation
CHAPTER 7

System Failure Analysis: Fault Trees
1. Safety
(a) Fire due to a system fault
(b) Explosion due to a system fault
(i) Tank rupture
(ii) Gas air explosion
(c) Toxic effects
(i) Gas asphyxiation
(ii) Carbon monoxide poisoning
2. Reliability
(a) Failure to supply heated water
(b) Water supplied at excess temperature
(c) Water supplied at excess pressure
Event symbols. The symbols shown in Figure 7.2.1 represent specific types
of fault and normal events in fault tree analysis. The rectangle defines an event
that is the output of a logic gate and is dependent on the type of logic gate and
the inputs to the logic gate. A "fault event" is an abnormal system state. It is
not necessarily due to a component failure event. For example, the fault event
could occur due to a command or a communication error.
The circle defines a basic inherent failure of a system element when operated
within its design specifications. We refer to this event as a primary basic event or
"end of life" failure event. The diamond represents a failure, other than a primary
SYSTEM FAILURE ANALYSIS: FAULT TREES
125
failure that is purposely not developed further. We call this a secondary basic
event. Basic events are either primary events (denoted by a circle) or secondary
basic events (denoted by a diamond). The switch event represents an event that
is expected to occur or to never occur because of design and normal conditions
such as a phase change in the system.
Logic gates. Fault trees use "OR gates" and "AND gates." OR gates are
inclusive in the sense that the output occurs if one or more input events occur.
The output event from an AND gate occurs only if all input events occur. In
addition, there may be a NOT gate in a fault tree. A NOT gate has a single
input event. The output occurs if the input event does not occur. Any logical proposition can be represented using OR and AND together with NOT logic symbols.
The output event from an EXCLUSIVE OR gate occurs in the case of two
input events if either one input occurs or the other input occurs but not both.
III. For each event in a rectangle we ask the question, Can this fault consist
of a component failure? If the answer is Yes, classify the event as a "state-
of-component fault." Add an OR gate below the event and look for primary,
secondary, and command faults.
If the answer is No, classify the event as a "state-of-system" fault event.
In this case, the appropriate gate could be OR, AND, or perhaps no gate at
all. In the last case, another rectangle is added to define the event more pre-
cisely.
IV. The no miracles allowed rule. If the normal functioning of a component
propagates a fault sequence, then it is assumed that the component functions
normally.
V. All inputs to a particular gate should be completely defined before further
analysis of any one of them is undertaken.
Exercises
7.2.1. Using the information on the hot water tank, construct a fault tree
similar to Figure 7.2.3 for the top event: failure to supply heated water.
7.2.2. For the simple battery/motor diagram below, construct a fault tree
for the top event: motor does not start. Assume the wires and connections are
perfect. Consider two additional components of the motor: (1) the brushes and
(2) the coil. Initially both switches are open.
7.2.3. Redraw the pressure tank fault tree, Figure 7.2.5, with the following design modification. Put the timer contacts in both the control circuit and the pump circuit.
7.2.4. Represent an EXCLUSIVE OR gate as a fault tree using only OR, AND, and NOT gates. Make a "truth table" for an EXCLUSIVE OR gate and see if the tree constructed satisfies the truth table. The following is a truth table for an INCLUSIVE OR gate.
use the term "cut set" since the top event is usually a failure event even though
the term "path set" may seem more natural. The set is minimal if it cannot be
reduced and still cause the top event to occur. We can calculate the probability
of the top event using the minimal cut sets and the probabilities of primary
events and undeveloped events.
An algorithm called MOCUS (method for obtaining cut sets) starts at the
top and works its way down. Using Figure 7.3.1 to illustrate, start with the top
event. Since the gate below it is an OR gate, list immediate input events in
separate rows as follows:
1
2
G1

Since G1 is also an OR gate, replace it by its input events in separate rows:

1
2
G2
G3
Since G2 is an AND gate, put its input events in the same row, obtaining
1
2
G4, G5
G3
We continue in this way, always
(1) listing input events in separate rows if the gate is an OR gate, and
(2) listing input events in the same row if the gate is an AND gate.
Finally, we obtain
1
2
4,6
4,7
5,6
5,7
3
6
These are the Boolean indicated cut sets (BICS). However, they are not the minimal cut sets, since basic event 6 appears in three rows. We discard the supersets containing event 6 since they are not minimal, leaving the family of minimal cut sets:

{1}, {2}, {3}, {6}, {4,7}, {5,7}.
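MOCUS is short enough to sketch in a few lines of Python. The gate table below is inferred from the rows worked above (an assumption, since Figure 7.3.1 itself is not reproduced): TOP = OR(1, 2, G1), G1 = OR(G2, G3), G2 = AND(G4, G5), G3 = OR(3, 6), G4 = OR(4, 5), G5 = OR(6, 7).

gates = {
    "TOP": ("OR",  [1, 2, "G1"]),
    "G1":  ("OR",  ["G2", "G3"]),
    "G2":  ("AND", ["G4", "G5"]),
    "G3":  ("OR",  [3, 6]),
    "G4":  ("OR",  [4, 5]),
    "G5":  ("OR",  [6, 7]),
}

def mocus(gates, top="TOP"):
    rows = [[top]]
    while any(isinstance(e, str) for row in rows for e in row):
        new_rows = []
        for row in rows:
            gate = next((e for e in row if isinstance(e, str)), None)
            if gate is None:
                new_rows.append(row)
                continue
            rest = [e for e in row if e != gate]
            op, inputs = gates[gate]
            if op == "OR":                  # one new row per input event
                new_rows += [rest + [i] for i in inputs]
            else:                           # AND: input events share one row
                new_rows.append(rest + list(inputs))
        rows = new_rows
    cuts = {frozenset(r) for r in rows}     # the Boolean indicated cut sets
    return [c for c in cuts if not any(o < c for o in cuts)]   # keep minimal

print(sorted(mocus(gates), key=lambda c: (len(c), sorted(c))))

Running it reproduces the family of minimal cut sets just listed.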
Dual fault trees and BICS. The dual tree to Figure 7.3.1 can be found by
replacing OR gates by AND gates and AND gates by OR gates. Basic events and
gate events are replaced by their negation. The minimal cut sets of either the
primal tree or its dual can be used to calculate the probability of the top event.
Sometimes it is advantageous to use the dual tree. We can quickly determine the
number of BICS for both the primal tree and its dual and use this information
to determine which tree to use to find the minimal cut sets.
Calculating the maximum size of BICS. We can also determine the max-
imum number of events in a BICS as follows. As before, assign weight 1 to each
basic event in the tree. Starting at the bottom of the tree,
(1) for each OR gate assign the MAXIMUM of the input weights;
(2) for each AND gate assign the SUM of the input weights.
The final weight assigned to the top event is the maximum number of events
in a BICS. This is a very fast algorithm.
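A sketch of this weight propagation, reusing the hypothetical gates table from the MOCUS sketch above:

def max_bics_size(gates, node):
    # Basic events have weight 1; OR takes the MAXIMUM, AND takes the SUM.
    if not isinstance(node, str):
        return 1
    op, inputs = gates[node]
    sizes = [max_bics_size(gates, i) for i in inputs]
    return max(sizes) if op == "OR" else sum(sizes)

print(max_bics_size(gates, "TOP"))   # 2 for the tree above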
Boolean reduction. If the fault tree has no NOT gates we can use the
method of Boolean reduction to calculate the probability of the top event. Let
K₁, K₂, ..., K_k be the k minimal cut sets for a given fault tree. Then

φ(x) = 1 − ∏_{j=1}^{k} [1 − ∏_{i∈K_j} x_i].   (7.3.2)

The structure function in this case is 1 if the top event occurs (a failure event) and 0 otherwise. This was called the minimal path representation in Chapter 6, but because we are now failure oriented it is now the minimal cut representation for fault trees.
Substituting random quantities X_i for x_i and taking the expectation, we obtain the probability of the top event, P(top event) = E[φ(X₁, ..., X_n)].
The principle of inclusion-exclusion. If the fault tree has no NOT gates, we can use the principle of inclusion-exclusion to calculate the probability of the top event. Let E_i be the event in which all basic events in K_i occur. Then by the principle of inclusion-exclusion we have

P(E₁ ∪ ··· ∪ E_k) = Σ_i P(E_i) − Σ_{i<j} P(E_i ∩ E_j) + ··· + (−1)^{k+1} P(E₁ ∩ ··· ∩ E_k).
Exercises
7.3.1. Use the following probabilities of failure per hot water tank use for events in the fault tree in Figure 7.2.3 to calculate the probability of the top event. The probability for tank rupture is 10⁻⁶ per use. All other primary basic events (denoted by circles) have probability 10⁻⁴ per use. Diamond events or undeveloped events have probability 10⁻³ per use.
7.3.2. For the pressure tank fault tree, Figure 7.2.5, find the minimal cuts using the MOCUS algorithm. Calculate the probability of the top event assuming the following probabilities. Pressure tank rupture is 10⁻⁸ per use, while the event that the pressure switch contacts are jammed closed is 10⁻⁴. All other primary (circle) events are 10⁻⁵ per use. Assume that secondary diamond events have probability 0 per use.
7.3.3. (a) Apply both the number of the BICS algorithm and the maximum
size of the BICS algorithm to Figure 7.3.1.
(b) Give short proofs of both the number of the BICS algorithm and the
maximum size of the BICS algorithm.
7.3.4. Give the Boolean representation for the structure function correspond-
ing to an EXCLUSIVE OR gate.
CHAPTER 8

System Availability and Maintainability

In this chapter we assume some familiarity with renewal theory, although most of the formulas used will be intuitive. (See Barlow and Proschan (1996), Chapter 3, section 2.)
8.1. Introduction.
Since many systems are subject to inspection and repair policies, availability is
often the main reliability measure of interest. In sections 8.3 and 8.4 we discuss
methodology for this calculation.
As in Chapter 6, we start with a network representation of a system of in-
terest. In place of unreliable arcs we now refer to unreliable components. Again
we assume that component failure events are statistically independent. However,
now component indicator random variables will depend on time as the following
notation suggests. We refer to component positions since with repair there may
be many components used in a particular position. Let X_i(t) = 1 if the component in position i is working at time t, and X_i(t) = 0 otherwise, for i = 1, 2, ..., n.
Let φ be a coherent system structure (organizing) function as in Chapter 6 and let

φ(X(t)) = 1 if the system is working at time t, and 0 otherwise.   (8.1.2)

Availability at time t, A(t), is defined as

A(t) = P[φ(X(t)) = 1],

i.e., the probability that the system is working at time t.
the system could have failed before time t but could also have been repaired
before time t. Even assuming component failure events are independent, A(t) is
in general difficult to calculate. For this reason we will obtain asymptotic
results, which should provide good approximations for long time intervals or, at
the very least, some rule-of-thumb measures.
as t → ∞. This is intuitively true since, roughly, the probability that the system
is up will be the mean uptime divided by the mean length of a cycle consisting
of an uptime and a downtime.
When we consider a complex system with many component positions with
different component lifetimes and component repair times, it is not easy to cal-
culate system availability.
T = min over minimal cut sets K_j of max_{i∈K_j} T_i, where T_i is the lifetime of the component in position i. Then, without the possibility of repair, which we assume at this point, T is the time to system failure.
Given F_i(t), i = 1, 2, ..., n, our first problem is to compute F̄(t) = P(T > t), the probability that the system survives t time units in the absence of any component repair.
Fix t for the moment and let p_i = P[X_i(t) = 1] = F̄_i(t), where p_i stands for the probability that the component in the ith position is working at time t (or the arc reliability at time t as in Chapter 6). We claim that network system reliability at time t is, in our new notation, the same as the probability that the system survives to time t, so that

F̄(t) = P(T > t) = h(F̄₁(t), ..., F̄_n(t)).
This is so since the system is coherent and thus fails as soon as the first network
minimal cut set fails (in the sense that all components in some minimal cut set
fail first before all components fail in any other minimal cut set).
To compute F(t) = P(T > t), we need only employ a suitable network
reliability algorithm from Chapter 6 to compute the system reliability at time
then the system failure rate function is also constant, say, λ, and

(8.2.5)

(8.2.6)
Exercises
8.2.1. Use the reliability function for the bridge from equation (6.1.3), namely,

h(p) = 2p² + 2p³ − 5p⁴ + 2p⁵,
Since components in position i have mean lifetime μ_i and mean repair time ν_i, and since components are separately maintained, system availability is, in the limit as t → ∞,

A = h(μ₁/(μ₁ + ν₁), ..., μ_n/(μ_n + ν_n)).   (8.3.2)

This can be computed using the network algorithms for calculating h(p) in Chapter 6.
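For a series system, h(p) = p₁p₂···p_n, and (8.3.2) is a one-liner. A minimal sketch with hypothetical mean lifetimes and repair times:

mu = [100.0, 120.0, 80.0]   # mean lifetimes (illustrative units)
nu = [2.0, 3.0, 1.0]        # mean repair times (illustrative units)

availability = 1.0
for m, v in zip(mu, nu):
    availability *= m / (m + v)   # p_i = mu_i / (mu_i + nu_i)
print(f"limiting series-system availability = {availability:.4f}")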
and
We may ask many more questions concerning system performance in the case
of separately maintained components in coherent systems:
(1) How often do components in position i fail and "cause" the system to
fail? By "cause" we mean that the component in position i fails and the system
then fails due to component i failure.
(2) What is the long run (asymptotic as t → ∞) system failure rate, i.e., the
expected frequency of system failures per unit time?
(3) What is the long run average system up- (down) time?
In each component position we assume that we have a renewal counting process. The renewal counting process in component position i will be {N_i(t), t ≥ 0}. Let N_i(t) be the number of times in [0, t] that components in position i fail and cause system failure in [0, t]. We will show that, asymptotically, the expected frequency with which components in position i fail and cause system failure is
(8.3.3)

where μ_i and ν_i are the mean lifetime and mean repair time for position i.
Proof of formula (8.3.3). To show (8.3.3) it is sufficient to note that since the
system is assumed coherent
This follows because if the component in position i were to fail in the next instant
after time t, the system would also fail.
1 if a failure occurs in [u, u + du), and 0 otherwise.
Hence
Let M_i(t) = E[N_i(t)], the renewal function for components in position i. Again, approximately,

is the probability that the component in position i fails in the interval [u, u + du). Hence, as t → ∞,

and

to prove (8.3.4). □
Let N(t) = Σᵢ N_i(t) be the number of system failures in [0, t]. It follows that the long run expected frequency of system failures is obtained by summing equation (8.3.3) over the component positions i.
System uptimes and downtimes. At any one time the system may be down
due to more than one component. Hence we cannot simply add up component
downtimes contributing to system failure.
Since U₁, U₂, ... are successive system uptimes and D₁, D₂, ... are successive system downtimes, it can be shown that

This is true by the strong law of large numbers and since we assume the limiting average exists.
Since
Also, the long run average of system downtimes can be similarly calculated,
namely,
(8.3.9)
(8.3.10)
while the long run series system expected frequency of system failures from
(8.3.5) is
(8.3.11)
(8.3.12)
(8.3.13)
Exercises
8.3.1. For the network in Figure 8.3.1, calculate the limiting availability for each of the three components in the figure. Use these to calculate the limiting system availability. What
(8.4.1)
almost surely. (Recall that shut-off components are in a state of suspended ani-
mation.) For a proof, see Barlow and Proschan (1973).
From (8.4.1) we can calculate the limiting average availability, namely,
(8.4.2)
(8.4.3)
so that for n = 1 (8.4.2) agrees with the availability formula (8.1.3) given at the
beginning of this chapter.
Let D_i(t) be the total system downtime in [0, t] due to failures of component i, i = 1, ..., n. Let
be the limiting average of series system uptimes [cf. (8.3.12)]. Using Theorem
8.4.1 we can prove the following corollary.
COROLLARY 8.4.2. For series systems with independent component failure
and repair events and the shut-off policy in Figure 8.4.1, the limiting average
system downtime due to component i is
while the limiting average system downtime due to all component failure events
Exercises
8.4.1. Verify that when n = 1, all formulas for systems with separately main-
tained components and systems with suspended animation components agree.
8.4.2. Compare system availability when μ_i/ν_i is constant for i = 1, 2, ..., n, using the formula for separately maintained components versus the formula for the case when shut-off components are in suspended animation.
8.4.3* (this exercise presumes knowledge of Markov processes).
Suppose components 1 and 2 are in series. Component 1 shuts down compo-
nent 2 but not vice versa. When component 1 fails and shuts down component
2, component 2 is in suspended animation and cannot fail. On the other hand, if
component 2 fails, component 1 continues to operate and may fail while compo-
nent 2 is being repaired. Repair commences immediately upon failure of either
Assume all failure and repair distributions are exponential, with failure rates λ_i and repair rates θ_i for components i = 1, 2. Remember that exponential distributions have the memoryless property.
(a) Identify all possible system states. Let 0 denote the state when both
components are working. Draw a sample space diagram of a possible scenario.
(b) Draw a state space diagram with transition rates indicated.
(c) If {Z(t),t > 0} is the system state process, how would you calculate
Notation
A(t)     system availability at time t
λ        asymptotic expected system failure rate
MTBF     system mean time between failures
MTBF_∞   asymptotic system MTBF
CHAPTER 9 *
Influence Diagrams
A double circle (or double oval) denotes a deterministic node, which is a node
with only one possible state, given the states of the adjacent predecessor nodes;
i.e., it denotes a deterministic function of all adjacent predecessors. Thus, to
include the background information H in the graph, we would have to use a
double circle around H. See Figure 9.2.1.
The following concepts formalize the ideas used in drawing the diagrams.
DEFINITION. A directed graph is cyclic, and is called a cyclic directed graph, if there exists a sequence of ordered pairs in A such that the initial and terminal vertices are identical; i.e., there exists an integer k ≥ 1 and a sequence of k arcs of the following type:
The random quantities associated with these nodes are conditionally independent given the states of all adjacent predecessor nodes. For example, in Figure 9.2.2 root node random quantities X₁ and X₂ are independent, while the deterministic function f(X₁, X₂) depends on both X₁ and X₂.
Remark. Let x_i and x_j represent two nodes in a probabilistic influence diagram. If there is no arc connecting x_i and x_j, then x_i and x_j are conditionally independent given the states of the adjacent predecessor nodes; i.e.,

p(x_i, x_j | w_i, w_j, w_ij) = p(x_i | w_i, w_ij) p(x_j | w_j, w_ij),

where w_i (w_j) denotes the set of adjacent predecessor nodes to only x_i (x_j), while w_ij denotes the set of adjacent predecessor nodes to both x_i and x_j.
Figure 9.2.3 illustrates conditional independence implied by the absence of
an arc connecting x and y. In a probabilistic influence diagram, if two nodes, xi
and xj , are root nodes then they are independent.
Exercises
9.2.1. In the following influence diagram, which nodes are root nodes?
Which nodes are conditionally independent? Which nodes are independent?
Which nodes are sink nodes?
9.2.2. In the following influence diagram, which nodes are root nodes?
Which nodes are conditionally independent? Which nodes are independent?
Which nodes are sink nodes?
FIG. 9.3.1. Equivalent probabilistic influence diagrams for two random quantities.
Hence, Figure 9.3.1 presents the three possible probabilistic influence diagrams
that can be used in this case showing the two ways of splitting node (x, y).
The following property is also a direct consequence of the laws of probability
and it is of special interest for statistical applications.
Property. Let x be a random quantity represented by a node of a probabilistic influence diagram and let f(x) be a (deterministic) function of x. Suppose we connect to the original diagram a deterministic node representing f(x) using a directed arc from x to f(x). Then the joint probability distributions for the two diagrams are equal. (See Figure 9.3.2 for illustration.)
Proof. Let w and y represent the sets of random quantities that precede and succeed x, respectively, in a list ordering. Note that p(f(x) | x) = 1 and consequently, from the product law,

That is, node x may be replaced by node (x, f(x)) without changing the joint probability of the graph nodes. Using the splitting node operation in node (x, f(x)) with x preceding f(x), we obtain the original graph with the additional deterministic node f(x) and a directed arc from x to f(x). Note also that no other arc is necessary since f(x) is determined by x alone.
Merging nodes. The second probabilistic influence diagram operation is the
merging of nodes. Consider first a probabilistic influence diagram with two
nodes x and y with a directed arc from x to y. The product law states that
p(x, y) = p(x)p(y | x). Hence, without changing the joint probability of x and
y, the original diagram can be replaced by a single node diagram representing
the vector (x,y). The first two diagrams of Figure 9.3.1 in the reverse order
illustrate this operation. In general, two nodes, x and y, can be replaced by a
single node, representing the vector (x,y), if there is a list ordering such that x
is an immediate predecessor or successor of y.
FIG. 9.3.3. Diagram with adjacent nodes w and y not allowed to be merged.
FIG. 9.3.4. Reversing arc operation in a two node probabilistic influence diagram.
must appear in the representation for the joint probability function for all probabilistic nodes based on the list ordering. Hence, by using the theorem of total probability and Bayes' formula when performing an arc reversal operation, we can go directly from the left diagram to the right one in Figure 9.3.4 without having to consider the one in the center.
Although the diagrams are different, they have the same joint probability
function for node random quantities. This fact is formalized in the following
definition.
DEFINITION. TWO probabilistic influence diagrams are said to be equivalent
in probability if they have the same joint probability function for node random
quantities.
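The arc reversal operation is just Bayes' formula plus total probability, as a minimal Python sketch makes plain; the two binary nodes and their probabilities below are illustrative assumptions.

# From p(x) and p(y | x) (arc [x, y]), compute p(y) and p(x | y) (arc [y, x]).
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

p_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in (0, 1)}
p_x_given_y = {y: {x: p_x[x] * p_y_given_x[x][y] / p_y[y] for x in p_x}
               for y in (0, 1)}
print(p_y)
print(p_x_given_y)

Both diagrams carry the same joint probability function, which is exactly the equivalence just defined.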
Consider the diagram of Figure 9.3.5 where wx, wy, and wxy are sets of
adjacent predecessors of x and (or) y as indicated by the figure. If arc [x, y] is
the only directed path from node x to node y, we may add arcs [wx, y] and [wy, x]
to the diagram without introducing any cycles. (See left diagram of Figure 9.3.6.)
FIG. 9.3.5.
The first equality is due to the fact that x and wy are conditionally indepen-
dent given (wx,wy,wxy), and y and wx are conditionally independent given
(wy,wx,wxy). (See Figure 9.3.5.) The other two equalities follow from the prod-
uct law.
Replacing in the product of the conditional
probability functions for the original diagram by
we obtain the product of the conditional probability functions for the second
diagram. This proves that the joint probability functions of the two diagrams
are equal. Finally, we notice that if there were another directed path from x to
y, we would create a cycle by reversing arc [x, y], which is not allowed. □
In general, reversing an arc corresponds to applying Bayes' formula and the
theorem of total probability. However, it may also involve the addition of arcs,
and such arcs in some cases represent only pseudo dependencies. In this sense,
some relevant information may have been lost after arc reversal.
The following example of the influence diagram for a fault tree OR gate illus-
trates the arc reversal operation. The deterministic node becomes probabilistic after the arc is reversed.
Exercises
9.3.1. Fill in the conditional probabilities in the following influence diagram for a fault tree AND gate. The fill-in should be similar to the OR gate illustration.
otherwise.
Let x be the number found defective in a sample of size n and y the number defective in the remainder. What are the initial distributions of x and y? Reverse the relevant arcs in the right sequence in the following diagram to obtain p(π | x) and p(y | x). What are the distributions of π given x and y given x?
9.3.4. Unique lot example. Suppose a unique lot contains N items and that is all there are. Let θ be the number defective in the lot. Suppose a sample of size n is taken and x are found to be defective. Initially we have the influence diagram
Let π represent the proportion of people in the population with blood type A, and let each jury member's probability that the suspect is guilty, before the juror has learned about the blood stain evidence, be specified. Use the following probability assignments:

Using the influence diagram above and appropriate arc reversals, calculate the probability of guilt given the evidence.
9.4*. Conditional independence. The objective of this section is to
study the concept of conditional independence and introduce its basic proper-
ties. We believe that the simplest and most intuitive way that this study can
be performed is by using the total visual force of the probabilistic influence
diagrams.
We now introduce the two most common definitions of conditional indepen-
dence.
DEFINITION 9.4.1 (intuitive). Given random quantities x, y, and z, we say
that y is conditionally independent of x given z if the conditional distribution of
y given (x,z) is equal to the conditional distribution of y given z.
The interpretation of this concept is that, if z is given, no additional infor-
mation about y can be extracted from x. The influence diagram representing this
statement is presented in Figure 9.4.1.
DEFINITION 9.4.2 (symmetric). Given random quantities x, y, and z, we say
that x and y are conditionally independent given z if the conditional distribution
of (x, y) given z is the product of the conditional distributions of x given z and
that of y given z.
The interpretation is that, if z is given, x and y share no additional infor-
mation. The influence diagram representing this statement is displayed in Fig-
ure 9.4.2.
Using the arc reversal operation, we can easily prove that the probabilistic
influence diagrams in Figures 9.4.1 and 9.4.2 are equivalent. Thus, Definitions
9.4.1 and 9.4.2 are equivalent, which means that in a specific problem we can use
either one. To represent the conditional independence² described by both Figures 9.4.1 and 9.4.2, we can write either y ⊥ x | z or x ⊥ y | z. This is a very general notation since x, y, and z are general random quantities (scalars, vectors, events, etc.). If in place of ⊥ we use ⊥̸, then x and y are said to be strictly dependent given z. We obtain independence (dependence) and write x ⊥ y (x ⊥̸ y) if z is an event that occurs with probability one. It is important to notice that the symbol ⊥ corresponds to the absence of an arc in a probabilistic influence diagram. However, the existence of an arc only indicates possible dependence. Although ⊥̸ is the negation of ⊥, the "absence of an arc" is included in the "presence of an arc."
The following proposition introduces the essence of the DROP/ADD princi-
ples for conditional independence, which are briefly discussed in the sequel.
PROPOSITION 9.4.3. If x ⊥ y | z, then for every f = f(x) we have
(i) f ⊥ y | z, and
(ii) x ⊥ y | (z, f).
The proof of this property is the sequence of diagrams of Figure 9.4.3. First
note that (by the splitting nodes operation) to obtain the second diagram from
the first we can connect to x a deterministic node f using arc [x, f] without
changing the joint probability function. Consequently, by reversing arc [x, f] we obtain the third diagram. To obtain the last diagram from the third we
use the merging nodes operation. Relations (i) and (ii) of Proposition 9.4.3 are
represented by the second and the third diagrams of Figure 9.4.3.
As direct consequences of Proposition 9.4.3 we have
(C1) If g = g(z), then x ⊥ y | z if and only if x ⊥ y | (z, g).
(C2) Let f = f(x, z) and g = g(y, z). If x ⊥ y | z, then f ⊥ y | z and f ⊥ g | z.

² In the literature, the symbol ⊥⊥ is often used instead of ⊥.
Figure 9.4.4 is the proof of Proposition 9.4.4. Again, only the basic prob-
abilistic influence diagrams operations are used. The second graph is obtained
from the first by merging nodes w and y. The third graph is obtained from
the second by splitting node (w,y), and the first is obtained from the third by
reversing arc [w,y].
The above simple properties are very useful in some statistical applications,
and they are related to the concept of sufficient statistic.
Notation
x ⊥ y        x and y are independent
x ⊥ y | z    x and y are independent conditional on z
x ⊥̸ y        x and y are dependent
x ⊥̸ y | z    x and y are dependent given z
CHAPTER 10

Making Decisions Using Influence Diagrams

10.1. Introduction.
The previous chapter was restricted to questions that could be answered by computing conditional probabilities, that is, the calculation of probabilities on the receipt of data, or by information processing. This chapter is concerned with
the evaluation and use of information through the use of influence diagrams with
decision nodes and value nodes. In section 10.5 we compare influence diagrams
and decision trees. Both can be used for decision making.
In real life we are continually required to make decisions. Often these de-
cisions are made in the face of a great deal of uncertainty. However, time and
resources (usually financial) are the forcing functions for decision. That is, de-
cisions must be made even though there may be a great deal of uncertainty
regarding the unknown quantities related to our decision problem.
In considering a decision problem, we must first consider those things that are
known as well as those things that are unknown but are relevant to our decision
problem. It is very important to restrict our analysis to those things that are
relevant, since we cannot possibly make use of all that we know in considering a
decision problem. So, the first step in formulating a decision problem is to limit
the universe of discourse for the problem.
A decision problem begins with a list of the possible alternative decisions that
may be taken. We must seriously consider all the exclusive decision alternatives
that are allowed. That is, the set of decisions should be exhaustive as well as
exclusive. We then attempt to list the advantages and disadvantages of taking
the various decisions. This requires consideration of the possible uncertain events
related to the decision alternatives. From these considerations we determine the
consequences corresponding to decisions and possible events. At this point, in
most instances, the decisions are "weighed" and that decision is taken that is
deemed to have the most "weight." It is this process of "weighing" alternative
decisions that concerns us in this chapter.
An important distinction needs to be made between decision and outcome.
A good outcome is a future state of the world that we prefer relative to other
possibilities. A good decision is an action we take that is logically consistent with
the alternatives we perceive, the information we have, and the preferences we feel
at the time of decision. In an uncertain world, good decisions could lead to bad
outcomes and bad decisions can conceivably lead to good outcomes. Making the
distinction allows us to separate action from consequence. The same distinction
needs to be made between prior and posterior. In retrospect, a prior distribution
may appear to have been very bad. However, based on prior knowledge alone, it
may be the most logical assessment. The statement, "Suppose you have a 'bad'
prior" is essentially meaningless unless "bad" means that a careful judgment was
not used in prior assessment.
The purpose of this chapter is to introduce a "rational method" for mak-
ing decisions. By a "rational method," we mean a method which is internally
consistent—that is, it could never lead to a logical contradiction. The method
we will use for making decisions can be described in terms of influence diagrams.
So far we have only discussed probabilistic influence diagrams.
Example 10.2.1 (two-headed coins). Suppose your friend tells you that he
has a coin that is either a "fair" coin or a coin with two heads. He will toss the
coin and you will see which side comes face up. If you correctly decide which
kind of a coin it is, he will give you one dollar. Otherwise you will give him one
dollar. If you accept his offer, what decision rule should you choose? That is,
based on the outcome of the toss, what should your decision be? In terms of
probabilistic influence diagrams we only have Figure 10.2.1,
where

θ = fair if the coin is fair, θ = two-headed otherwise;

and

x = T if the toss results in a tail, x = H otherwise.
FIG. 10.2.1.
Clearly, if x = T, you know the coin does not have two heads. The question
is, what should you decide if x = H? To solve your problem, we introduce a
decision node which is represented by a box or rectangle. The decision node
Attached to the decision node is a set of allowed decision rules, which depend on the outcome of the toss x. For example, one decision rule might be: choose d₁ if x = T and d₂ if x = H. Another is to choose d₁ for all x.
The influence diagram helpful for solving our problem is Figure 10.2.2.
In this diagram, the double bordered node represents a value or utility node.
The value node represents the set of consequences corresponding to possible pairs
of decisions and states (d, θ). Attached to the value node is the value function v(d, θ), a deterministic function of the states of adjacent predecessor nodes. In
this example,

v(d, θ) = $1 if d = d₁ and θ = fair, or if d = d₂ and θ = two-headed;
v(d, θ) = −$1 otherwise.
The reason for initially drawing the arc [θ, x] rather than the arc [x, θ] is that, in general, it is easier to first assess p(θ) and p(x | θ) rather than to directly assess p(x) and p(θ | x).
The optimal decision will depend on our initial information concerning θ, namely, p(θ). However, since θ is unknown at the time of decision, there is no arc [θ, d]. At the time of decision, we know x but not θ. Input arcs to a decision node indicate the information available at the time of decision. In general, there can be more than one decision node, as the next example illustrates.
a unit is inspected and found defective, additional units from another lot are
inspected until a good unit is found. (We make this model assumption since
all defective units that are found will be replaced at vendor's expense.) Figure
10.2.4 illustrates our production line problem.
We assume the inspection process is perfect; i.e., if a unit is inspected and found
good, then it will also be found good in the assembly test.
The all or none rule. It has been suggested [cf. Deming (1986)] that the optimal decision rule is always to either choose n = 0 or n = N, depending on the costs involved and π, the chance that a unit is defective. Later we will show that this is not always valid.
In this example we consider the problem under the restriction that the initial
inspection sample size is either n = 0 or n = N. The decisions are n = 0 and
n = N. Figure 10.2.5 is an influence diagram for this problem, where
x = number of defectives in the lot and
y = number of additional inspections required to find good
replacement units for bad units.
We have already solved this problem in Chapter 5, section 5.3. The solution is that n = 0 is best if π < k₁/k₂ and n = N otherwise. See Figure 10.2.5.
FIG. 10.3.2.
knowledge, not possible knowledge. In this sense the arc is different from arcs
between probability nodes that can indicate only possible dependence. Since
decision nodes imply a time ordering, the corresponding directed arcs can never
be reversed.
Directed arcs emanating from a decision node denote possible dependence of
adjacent successor nodes on the decision taken. These arcs can likewise never be
reversed.
Example 10.3.3 (selling a car). Suppose you plan to sell your car tomorrow,
but the finish on your car is bad. Your decision problem is whether or not to have
your car painted today in order to increase the value of your car tomorrow. Note
that tomorrow's selling price depends on today's decision d. Let x be yesterday's
blue book value for your car and c the cost of a paint job. Let θ be the price you will be able to obtain for your car tomorrow. Let v(d, θ) be the value to you of today's decision d and tomorrow's selling price θ. The influence diagram associated with your decision problem is shown in Figure 10.3.2. Obviously you cannot reverse arc [x, d], since today's decision cannot alter yesterday's blue book value. Likewise, you cannot reverse arc [d, θ], since we cannot know today, for sure, what tomorrow will bring.
The value node is similar to a deterministic node. What makes it different is
that it has no successor nodes.
DEFINITION 10.3.4. A value node is a sink node that
(1) represents possible consequences corresponding to the states of adjacent
predecessor nodes (a consequence can, for example, be a monetary loss or gain),
(2) has an attached utility or loss function which is a deterministic function
of the consequence represented by the value node itself.
A value node shares with a decision node the property that input arcs cannot
be reversed. However, if a value node v has a probabilistic adjacent predecessor
node, say, θ, and v is the only adjacent successor of θ (as in Figure 10.3.2), then
node θ can be eliminated.
We now give a formal definition of a decision influence diagram.
Hence you might start by drawing the decision node d with the two allowable
choices d1 and d2. You might then draw the value node v, since the value node
denotes the objective of your decision analysis. Your objective is to calculate
v(d), the unconditional value function, as a deterministic function of the deci-
sion taken d. However, it may be easier to first determine the conditional value
function as a deterministic function of relevant unknown quantities as well as
perhaps prior decisions relevant to your problem.
DEFINITION 10.3.6. A value function is called unconditional if it only de-
pends on the decision taken and not on relevant unknown quantities. It is called
conditional when it also depends on relevant unknown quantities.
Since it is easier to think of the value function as a conditional deterministic
function of your decision and the property of the coin, say, θ, you also need to
draw a node for θ. Since θ is unknown to you, node θ is a probabilistic node.
Since the value node v depends on both the decision taken d and the
property of the coin θ, draw arcs [d, v] and [θ, v]. Attach a deterministic function
to node v. Figure 10.3.3 shows the influence diagram at this stage of the
analysis.
You are allowed to see the result of one coin toss, and this information is
available at the time you make your decision. Hence draw a node x for the
outcome of the toss. Since the outcome of the toss (before you see it) is an
unknown random quantity for you, draw a probabilistic node for x. Since the
probability function for x depends on θ, draw arc [θ, x] and assess p(x | θ) and
p(θ). Draw arc [x, d] since you will know x at the time you make your decision
d. The diagram now looks like Figure 10.2.2, which we repeat below.
d1: decide fair, or
d2: decide two-headed.
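One concrete, purely illustrative way to keep track of this diagram is an adjacency structure; the node kinds and arcs below follow the text, while the representation itself is just one possible choice.

    # The diagram of Figure 10.2.2 as an adjacency list: arcs [theta, x],
    # [x, d], [d, v], and [theta, v]; note there is no arc [theta, d].
    influence_diagram = {
        "theta": {"kind": "probability", "successors": ["x", "v"]},
        "x":     {"kind": "probability", "successors": ["d"]},  # known at decision time
        "d":     {"kind": "decision",    "successors": ["v"]},
        "v":     {"kind": "value",       "successors": []},     # sink node
    }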
You must eliminate θ since, as it stands, the optimal decision would depend on
θ, which is unknown. If the arc [θ, v] were the only arc emanating from θ, you
would be able to do this immediately by summation or by integration. Since
there is also another arc, namely, [θ, x], emanating from θ, this is not possible.
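In computable terms the solution goes in two steps: reverse [θ, x] by Bayes' rule, then average v(d, θ) against p(θ | x) and optimize over d. A self-contained sketch, with an illustrative prior weight p_fair:

    # Backward induction for the coin problem: reverse the arc, eliminate
    # theta by summation, then choose the decision with larger expected value.
    def best_decision(x, p_fair):
        lik = {"fair": 0.5, "two-headed": 1.0 if x == "H" else 0.0}
        prior = {"fair": p_fair, "two-headed": 1 - p_fair}
        px = sum(lik[t] * prior[t] for t in prior)          # p(x)
        post = {t: lik[t] * prior[t] / px for t in prior}   # p(theta | x)

        def value(d, t):                                    # v(d, theta) from the text
            good = (d == "d1" and t == "fair") or (d == "d2" and t == "two-headed")
            return 1.0 if good else -1.0

        # v(d | x) = sum over theta of v(d, theta) p(theta | x).
        return max(("d1", "d2"),
                   key=lambda d: sum(value(d, t) * post[t] for t in post))

    # A tail rules out the two-headed coin, so d1 (decide fair) is optimal:
    assert best_decision("T", p_fair=0.5) == "d1"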
Solution of value added to substrate. Delete node y2 and all arcs into and
out of this node in Deming's first model. Note that when n = N, the STOP or
CONTINUE decision node does not apply since there is no remainder to inspect.
Also when n = N, x2 = 0 since there can be no defectives in an empty sample.
In general, we should not have an arc from a decision node to a probability
node. In the case of the x2 node, the distribution of x2 does not depend on
the decision taken but does depend on n and θ. The technical name of this
requirement is the "sure thing principle," or axiom 0 in Appendix B. Node n is a
special case since it is the initial condition determining sampling distributions.
See Figure 10.4.3.
At this point we can calculate the expected loss if we STOP and also if we
CONTINUE, namely,
if we STOP, the LOSS is k1(n + y1) + k2 E[x2 | n, y1];
if we CONTINUE, the LOSS is k1(N + y1).
We take the decision with minimum expected loss given n and y1, thus elimi-
nating the STOP or CONTINUE decision node. See Figure 10.4.7.
Next we calculate the minimum of the expected losses following either the
STOP or CONTINUE policy:
minimum[k1(n + y1) + k2 E[x2 | n, y1], k1(N + y1)].
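A minimal sketch of this elimination step follows; the costs k1 and k2 are as above, while E[x2 | n, y1] enters as an input rather than being rederived here.

    # STOP/CONTINUE elimination: take the branch with minimum expected loss.
    def stop_or_continue(N, n, y1, k1, k2, e_x2):
        loss_stop = k1 * (n + y1) + k2 * e_x2   # LOSS if we STOP
        loss_cont = k1 * (N + y1)               # LOSS if we CONTINUE
        if loss_stop <= loss_cont:
            return "STOP", loss_stop
        return "CONTINUE", loss_cont

    # Illustrative only: with a Be(A, B) prior and x1 defectives seen in the
    # first n inspections, one standard beta-binomial update would give
    # e_x2 = (N - n) * (A + x1) / (A + B + n).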
Exercises
10.4.1. Two-headed coins. Let p(θ) be your prior for θ and determine your optimal
decision rule as a function of the outcome of the coin toss x and p(θ).
10.4.2. Deming's inspection problem. Figure 10.4.2 models Deming's inspec-
tion problem. Fix n and N > n. Assume your prior for θ, the percent defective,
is Be(A, B). The optimal solution should depend on the costs, N, n, x1, and
y1. Note that at the time of decision, you do not know either x2 or y2.
(a) Why is … ?
(b) Determine the optimal inspection decision after sampling; i.e., should we
STOP inspection or CONTINUE and inspect all of the remaining items?
10.4.3. Find the optimal decisions for the sequential decision-making exam-
ple, Figure 10.2.3, as a function of the following prior for θ:
is the unique path from the root node to the top right-hand leaf node in Figure
10.5.2. There are 12 such paths corresponding to the 12 leaves in Figure 10.5.2.
All possible paths corresponding to leaves constitute the sample space. The
decision tree diagram can be used to enumerate all such possible states in the
sample space. Obviously, the paths to leaf nodes constitute a set of mutually
exclusive and exhaustive states corresponding to our decision problem.
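The enumeration itself is mechanical; the sketch below walks a toy nested-dictionary tree (a stand-in, not the 12-leaf tree of Figure 10.5.2) and yields each root-to-leaf path as one state of the sample space.

    # Enumerate root-to-leaf paths of a decision tree; each path is one
    # mutually exclusive, exhaustive state of the decision problem.
    def leaf_paths(tree, path=()):
        if not tree:                  # an empty dict marks a leaf
            yield path
            return
        for branch, subtree in tree.items():
            yield from leaf_paths(subtree, path + (branch,))

    toy = {"d1": {"H": {}, "T": {}}, "d2": {"H": {}, "T": {}}}
    assert len(list(leaf_paths(toy))) == 4   # four mutually exclusive states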
DEFINITION 10.5.2. A decision tree is a rooted directed graph that is a tree
in the graph theory sense and in which
10.6. Notes and references.
Section 1 0 . 1 . An excellent discussion of decision influence diagrams is the
paper, "From Influence to Relevance to Knowledge" by R. A. Howard (1990).
Section 10.2. The two-headed coin example is due to Carlos Pereira. Ex-
ample 10.2.2 on sequential decision making is considered in much greater detail
in de Finetti (1954). He considers the problem of combining different opinions
in that paper.
Section 10.3. The intent of this section is to demonstrate the thinking
process behind influence diagram construction.
Section 10.4. The intent of this section is to demonstrate the thinking
process behind solving influence diagrams. The algorithmic process is called
backward induction since we start with the value node. There are several com-
puter programs available for solving decision influence diagrams. We mentioned
Netica and MSBN in Chapter 9, section 9.5.
Section 10.5. An introduction to decision trees is also available in Lindley
(1985).
APPENDIX A
We have not mentioned in this book most of the methodology of so-called clas-
sical statistics. The reason is that classical statistics is based on deductive anal-
ysis (the logic of mathematics), whereas statistical inference and decision theory
are concerned with inductive analysis (probability judgments). Why is classical
statistics logically untenable? D. Basu (1988), in his careful examination of the
methodology due to R. A. Fisher, summed up his answer to this question as
follows: "It took me the greater part of the next two decades to realize that
statistics deals with the mental process of induction and is therefore essentially
antimathematics. How can there be a deductive theory of statistics?"
To be more concrete, we consider two favorite ideas of classical statistics,
namely, unbiasedness and confidence intervals.
To find this c, consider Y, and note that E(Y) = 1. Then we need only find c
such that … , so that … and … .
Suppose we accept this probability statement and that T is now observed. Con-
sider the following hypothetical bet:
(i) if … , we lose the amount … ;
(ii) if … , we win the amount … .
We can pretend that the "true" value is somehow revealed and bets are paid off.
If we believe the probability statement above, then such a bet is certainly fair,
given T.
Now let us compute our expected gain before T is observed (preposterior
analysis). This is easily seen (conditional on A) to be negative for all A > 0.
Note that this is what we would expect were we
to make this bet infinitely often.
But this situation is again not self-consistent. Classical statistics is not self-
consistent, though it does claim to be "objective" and "scientific." Objectivity
is really nothing more than a consensus, and scientists use induction as well as
deduction to extrapolate from experiments.
APPENDIX B
The influence diagram operations developed in Section 10.4 depend on the prin-
ciple that you should be self-consistent in making decisions. Bayesian decision
analysis is, above all, based on the premise that you should, at the very least,
be self-consistent in the way in which you make decisions. Self-consistency in
decision making has been defined by H. Rubin (1987) in terms of what he calls
the axioms of rational behavior.
Of course, by self-consistency we do not mean that you are not capable of
revising your opinion and possibly changing your decision (if allowed) upon re-
ceipt of new information. The Bayesian approach to decision analysis, which we
use in this book, is self-consistent. But what is, perhaps, even more important is
that self-consistency implies Bayesian behavior.
The two influence diagram graph operations that we now justify (and sketch below) are
(1) the elimination of decision nodes and
(2) the elimination of probability nodes.
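In expected-utility terms the first operation optimizes and the second averages; the sketch below states both in that form, with function and argument names that are ours rather than the book's notation.

    # Eliminating a decision node optimizes over its alternatives; eliminating
    # a probability node averages the value function against its distribution.
    def eliminate_probability_node(value, dist):
        # Replace v(d, theta) by v(d) = sum over theta of v(d, theta) p(theta).
        return lambda d: sum(value(d, t) * p for t, p in dist.items())

    def eliminate_decision_node(value, alternatives):
        # Replace the decision node by the alternative maximizing v(d).
        return max(alternatives, key=value)

    # Example: with prior weights 0.7 / 0.3 on the two coin states, guessing
    # "fair" maximizes expected value.
    v_of_d = eliminate_probability_node(
        lambda d, t: 1.0 if d == t else -1.0,
        {"fair": 0.7, "two-headed": 0.3})
    assert eliminate_decision_node(v_of_d, ["fair", "two-headed"]) == "fair"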
FIG. B.1.
FIG. B.2.
y and a gamble, gp, where you would choose x with probability p and z with
probability 1 − p. That is, c({y, gp}) = {y, gp}.
In addition to the axioms above, there is an axiom stating that for every deci-
sion problem based on 1, 2, or 3 actions only, the choice set should be nonempty.
This, together with the continuity axiom, is required for the construction of a
utility function. In addition, H. Rubin has required two purely technical axioms
to take care of the fact that we must deal with infinite sets since the use of
randomization will result in conceptually infinite choice sets. He proves, using
these five axioms, that if you are self-consistent in the sense of the axioms, then
using your choice set function is equivalent to basing your decision on a utility
function which is real valued and unique up to positive linear transformations.
Rubin's first theorem states that to base your decision on a utility function
v(d) means to prefer d' to d'' if and only if v(d') > v(d'').
FIG. B.3.
The mathematical justification for this elimination operation depends on the sure thing
principle, or axiom 0, in Rubin's (1987) paper.
BARLOW, R. E., 1979, Geometry of the total time on test transform, Naval Res. Logist. Quar-
terly, 26, pp. 393-402.
BARLOW, R. E. AND R. CAMPO, 1975, Total time on test processes and applications to failure
data analysis, in Reliability and Fault Tree Analysis, R. E. Barlow, J. B. Fussell, and N.
Singpurwalla, eds., SIAM, Philadelphia.
BARLOW, R. E. AND K. D. HEIDTMANN, 1984, Computing k-out-of-n system reliability, IEEE
Trans. Reliability, R-33, pp. 322-323.
BARLOW, R. E. AND S. IYER, 1988, Computational complexity of coherent systems and the
reliability polynomial, Probab. Engrg. Inform. Sci., 2, pp. 461-469.
BARLOW, R. E. AND M. B. MENDEL, 1992, De Finetti-type representations for life distribu-
tions, J. Amer. Statist. Assoc., 87, pp. 1116-1122.
BARLOW, R. E. AND C. A. PEREIRA, 1991, Conditional independence and probabilistic influ-
ence diagrams, in Topics in Statistical Dependence, IMS Lecture Notes-Monograph Series,
Vol. 16, H. W. Block, A. R. Sampson, and T. H. Savits, eds., Institute of Mathematical
Statistics, Hayward, CA, pp. 19-33.
BARLOW, R. E. AND C. A. PEREIRA, 1993, Influence diagrams and decision modeling, in
Reliability and Decision Making, R. E. Barlow, C. Clarotti, and F. Spizzichino, eds.,
Chapman and Hall, New York, pp. 87-99.
BARLOW, R. E. AND F. PROSCHAN, 1973, Availability theory for multicomponent systems, in
Multivariate Analysis III, Academic Press, Inc., New York, pp. 319-335.
BARLOW, R. E. AND F. PROSCHAN, 1975, Importance of system components and fault tree
events, Stochastic Process. Appl., 3, pp. 153-173.
BARLOW, R. E. AND F. PROSCHAN, 1981, Statistical Theory of Reliability and Life Testing,
To Begin With, c/o Gordon Pledger, 1142 Hornell Drive, Silver Spring, MD 20904.
BARLOW, R. E. AND F. PROSCHAN, 1985, Inference for the exponential life distribution, in
Theory of Reliability, A. Serra and R. E. Barlow, eds., Soc. Italiana di Fisica, Bologna,
Italy.
BARLOW, R. E. AND F. PROSCHAN, 1988, Life distribution models and incomplete data, in
Handbook of Statistics, Vol. 7, P. R. Krishnaiah and C. R. Rao, eds., pp. 225-249.
BARLOW, R. E. AND F. PROSCHAN, 1996, Mathematical Theory of Reliability, SIAM,
Philadelphia, PA.
BARLOW, R. E. AND X. ZHANG, 1986, A critique of Deming's discussion of acceptance sampling
procedures, in Reliability and Quality Control, A. P. Basu, ed., Elsevier-North Holland,
Amsterdam, pp. 9-19.
BARLOW, R. E. AND X. ZHANG, 1987, Bayesian analysis of inspection sampling procedures
discussed by Deming, J. Statist. Plann. Inference, 16, pp. 285-296.
BASU, D., 1988, Statistical Information and Likelihood, Lecture Notes in Statistics 45, Springer-
Verlag, New York.
BASU, D. AND C. A. PEREIRA, 1982, On the Bayesian analysis of categorical data: The problem
of nonresponse, J. Statist. Plann. Inference, 6, pp. 345-362.
BAZOVSKY, I., N. R. MACFARLANE, AND R. WUNDERMAN, 1962, Study of Maintenance Cost
Optimization and Reliability of Shipboard Machinery, Report for ONR contract Nonr-
37400, United Control, Seattle, WA.
BERGMAN, B., 1977, Crossings in the total time on test plot, Scand. J. Statist., 4, pp. 171-177.
BICKEL, P. J. AND D. BLACKWELL, 1967, A note on Bayes estimates, Ann. Math. Statist.,
38, pp. 1907-1911.
BIRNBAUM, Z. W., J. D. ESARY, AND A. W. MARSHALL, 1966, Stochastic characterization of
wearout for components and systems, Ann. Math. Statist., 37, pp. 816-825.
BIRNBAUM, Z. W., J. D. ESARY, AND S. C. SAUNDERS, 1961, Multi-component systems and
structures and their reliability, Technometrics, 3, pp. 55-77.
CHANG, M. K. AND A. SATYANARAYANA, 1983, Network reliability and the factoring theorem,
Networks, 13, pp. 107-120.
COLBOURN, C. J., 1987, The Combinatorics of Network Reliability, Oxford University Press,
Oxford, England.
DAWID, A. P., 1979, Conditional independence in statistical theory, J. Roy. Statist. Soc. Ser.
B, 41, pp. 1-31.
DE FINETTI, B., 1937, Foresight: Its logical laws, its subjective sources, Ann. Inst. H. Poincaré,
pp. 1-68; in Studies in Subjective Probability, 2nd ed., H. E. Kyburg, Jr. and H. E. Smokler,
eds., Robert E. Krieger Pub. Co., Huntington, NY, 1980, pp. 53-118 (in English).
DE FINETTI, B., 1954, Media di decisioni e media di opinioni, Bull. Inst. Inter. Statist., 34,
pp. 144-157; in Induction and Probability, a cura di Paola Monari e Daniela Cocchi,
Cooperativa Libraria Universitaria Editrice, Bologna, 1993, pp. 421-438 (in English).
DE FINETTI, B., 1970 (reprinted 1978), Theory of Probability, Vols. I and II, J. Wiley & Sons,
New York.
DEGROOT, M. H., 1970, Optimal Statistical Decisions, McGraw-Hill, New York.
DEMING, W. E., 1986, Out of the Crisis, MIT Center for Advanced Engineering Study, Cam-
bridge, MA.
EPSTEIN, B. AND M. SOBEL, 1953, Life testing, J. Amer. Statist. Assoc., 48, pp. 486-502.
FEYNMAN, R., 1964 (sixth printing 1977), The Feynman Lectures on Physics, Vols. I and II,
Addison-Wesley, Reading, MA.
FOX, B., 1966, Age replacement with discounting, Oper. Res., 14, pp. 533-537.
FUSSELL, J. B. AND W. E. VESELY, 1972, A new methodology for obtaining cut sets for fault
trees, American Nuclear Trans., 15, pp. 262-263.
GERTSBAKH, I. B., 1989, Statistical Reliability Theory, Marcel Dekker, New York.
GNEDENKO, B. V., 1943, Sur la distribution limite du terme maximum d'une série aléatoire,
Ann. of Math., 44, pp. 423-453.
HAHN, G. J. AND S. S. SHAPIRO, 1974, Statistical Models in Engineering, John Wiley, New
York.
HESSELAGER, O., M. B. MENDEL, AND J. F. SHORTLE, 1995, When to Use the Poisson Model
(and when not to), manuscript.
HOFFMAN, O. AND G. SACHS, 1953, Introduction to the Theory of Plasticity for Engineers,
McGraw-Hill, New York.
HOWARD, R. A., 1990, From influence to relevance to knowledge, in Belief Nets and Decision
Analysis, R. M. Oliver and J. Q. Smith, eds., John Wiley, New York, pp. 3-23.
HOWARD, R. A. AND J. E. MATHESON, 1984, Influence diagrams, in The Principles and
Applications of Decision Analysis, Vol. II, R. A. Howard and J. E. Matheson, eds., Strategic
Decisions Group, Menlo Park, CA.
HØYLAND, A. AND M. RAUSAND, 1994, System Reliability Theory, John Wiley, New York.
LEITCH, R. D., 1995, Reliability Analysis for Engineers, Oxford University Press, Oxford,
England.
LINDLEY, D. V., 1985, Making Decisions, 2nd ed., John Wiley, New York.
LINDLEY, D. V. AND M. R. NOVICK, 1981, The role of exchangeability in inference, Ann.
Statist., 9, pp. 45-58.
LINDQUIST, E. S., 1994, Strength of materials and the Weibull distribution, Probab. Engrg.
Mech., 9, pp. 191-194.
MENDEL, M. B., 1989, Development of Bayesian Parametric Theory with Applications to
Control, Ph.D. thesis, MIT, Cambridge, MA.
MOORE, E. F. AND C. E. SHANNON, 1956, Reliable circuits using less reliable relays, J.
Franklin Inst., 262, Part I, pp. 191-208, and 262, Part II, pp. 281-297.
PARK, C. S., 1997, Contemporary Engineering Economics, 2nd ed., Addison-Wesley, Menlo
Park, CA.
PEREIRA, C. A., 1990, Influence diagrams and medical diagnosis, in Influence Diagrams, Belief
Nets and Decision Analysis, R. M. Oliver and J. Q. Smith, eds., John Wiley, New York,
pp. 351-358.
RAI, S. AND D. P. AGRAWAL, 1990, Distributed Computing Network Reliability, IEEE Com-
puter Society Press, Los Alamitos, CA.
ROSS, S. M., 1975, On the Calculation of Asymptotic System Reliability Characteristics, in
Reliability and Fault Tree Analysis, SIAM, Philadelphia.
RUBIN, H., 1987, A weak system of axioms for "rational" behavior and the non-separability of
utility from prior, Statist. Decisions, 5, pp. 47-58.
SHACHTER, R., 1986, Evaluating influence diagrams, Oper. Res., 34, pp. 871-882.
SHORTLE, J. AND M. B. MENDEL, 1994, The geometry of Bayesian inference, in Bayesian Statis-
tics 5, J. M. Bernardo, A. P. Dawid, and J. Q. Smith, eds., Oxford University Press, Oxford,
England.
SHORTLE, J. AND M. B. MENDEL, 1996, Predicting dynamic imbalance in rotors, Probab.
Engrg. Mech., 11, pp. 31-35.
TSAI, P., 1994, Probability Applications in Engineering, Ph.D. thesis, University of California,
Berkeley, CA.
VAN NOORTWIJK, J. M., 1996, Optimal Maintenance Decisions for Hydraulic Structures under
Isotropic Deterioration, Ph.D. thesis, Delft University of Technology.
VESELY, W. E., F. F. GOLDBERG, N. H. ROBERTS, AND D. F. HAASL, 1981, Fault Tree
Handbook, NUREG-0492, GPO Sales Program, Division of Technical Inf. and Doc. Control,
U.S. Nuclear Regulatory Commission, Washington, DC 20555.
VINOGRADOV, O., 1991, Introduction to Mechanical Reliability: A Designer's Approach, Hemi-
sphere Pub. Corp., New York.
WEIBULL, W., 1939, A statistical theory of the strength of materials, Ingeniorsvetenskap-
sakademiens Handlingar, 151, pp. 1-45.