MATH1299 - Maintenance and Reliability
1 MAINTENANCE
1.1 PREVENTIVE MAINTENANCE VERSUS BREAKDOWN MAINTENANCE
1.2 EXPECTED VALUE MODEL FOR OPTIMIZATION
1.3 EXERCISES
2 RELIABILITY
2.1 FAILURE RATES
2.2 EXERCISES
2.3 CALCULATING RELIABILITY
2.4 EXERCISES
2.5 MEAN TIME BETWEEN FAILURES (MTBF)
2.6 EXERCISES
2.7 RELIABILITY OF SERIES AND PARALLEL SYSTEMS
2.7.1 Series Systems
2.7.2 Parallel Systems
2.7.3 Series-Parallel Systems
2.8 EXERCISES
3 ANSWERS TO EXERCISES
1 Maintenance
Systems (machines, devices, etc.) are optimal when they are properly maintained. The goal of a maintenance
plan is to keep a system running and thereby avoid major downtimes which can be costly. There are two main
categories of maintenance:
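Preventive Maintenance, which involves routine inspection and servicing intended to keep components
from failing.
Breakdown Maintenance, which involves emergency repairs of components that have failed to
operate. This may be a costly approach when the operation of the system is critical to production.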
Allocating more money and personnel to preventive maintenance is a strategy used to reduce the number of
breakdowns. As shown in the figure below, the more we spend on preventive maintenance, the less it costs
for breakdown maintenance.
However, preventive maintenance does not always lower the total cost. That depends on how expensive the
preventive maintenance is, how frequent the breakdowns are and how costly they are to production. In the
figure below, we see that the total cost will begin to rise when the increase in preventive maintenance
costs is greater than the decrease in breakdown maintenance costs.
[Figure: Cost versus Maintenance, showing the Preventive Maintenance Costs, Breakdown Maintenance Costs
and Total Costs curves, with the Optimal Point marked.]
1.2 Expected Value Model for Optimization
In this section we examine how to determine whether or not a preventive maintenance plan actually reduces
the total cost of operations. This procedure involves four steps:
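Step 1. Compute the expected number of breakdowns per unit time with no preventive maintenance (PM).
Using historical data, divide the total number of breakdowns by the length of the observation period.
Step 2. Compute the total cost per unit time with no PM.
Again, using historical data, determine the cost of breakdowns per unit time.
Step 3. Compute the total cost per unit time with PM.
This involves the PM costs plus the breakdown costs per unit time. If a PM plan has been
implemented you can calculate the cost of breakdowns per unit time. Otherwise, use an estimate
based on the PM plan.
Step 4. Compare the two total costs and choose the less expensive option.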
Example 1
Atticus Finch and Associates is a law firm with computerized processing and reports.
Over the past two years, the computer system has broken down as indicated:
Number of Breakdowns                          0   1   2   3
Number of Months that Breakdowns Occurred     5   7   8   4
Each computer breakdown currently costs the firm $400 for emergency repairs.
The firm is considering a preventive maintenance contract that will cost $125 per month.
The contract will ensure an average of only one computer breakdown per month.
Should the firm adopt the new preventive maintenance arrangement?
Step 1:
Total number of breakdowns = 5 x 0 + 7 x 1 + 8 x 2 + 4 x 3 = 0 + 7 + 16 + 12 = 35
Expected number of breakdowns with no PM = 35 breakdowns / 24 months ≈ 1.4583 breakdowns per month.
Step 2:
Total cost with no PM = (1.4583 breakdowns/month) x ($400/breakdown) ≈ $583.33 per month.
Step 3:
Total cost with PM = $125 + 1 x $400 = $525 per month.
Step 4:
Accepting the preventive maintenance contract will result in average monthly savings of $58.33.
Therefore, the firm should adopt the new preventive maintenance agreement.
Example 2
Marcello Manufacturing employs An Ounce of Prevention at $450 per week for routine maintenance. For 30
weeks prior to the contract with An Ounce of Prevention, the company experienced the following pattern of
breakdowns:
Number of Breakdowns                         0   1   2   3   4   5   6   7   8
Number of Weeks that Breakdowns Occurred     1   2   5   4   8   3   1   0   6
Every repair to machinery on an emergency basis costs $175 in service charges and lost production. Despite
the preventive maintenance, Marcello Manufacturing still experiences 2 breakdowns per week on average.
Should Marcello Manufacturing continue to employ An Ounce of Prevention?
Step 1:
Total number of breakdowns = 1x0 + 2x1 + 5x2 + 4x3 + 8x4 + 3x5 + 1x6 + 0x7 + 6x8 = 125
Average number of breakdowns with no PM = 125 breakdowns / 30 weeks ≈ 4.1667 breakdowns per week.
Step 2:
Total cost with no PM = (4.1667 breakdowns/week) x ($175/breakdown) ≈ $729.17 per week.
Step 3:
Total cost with PM = $450 + 2 x $175 = $800 per week.
Step 4:
The total cost with PM exceeds that with no PM, so the preventive maintenance schedule should be
terminated. Marcello Manufacturing can be expected to save $70.83 each week.
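The four steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `expected_breakdowns` and `compare_costs` are made up for this example, and the breakdown tables are entered as dictionaries mapping "number of breakdowns" to "number of periods in which that many occurred".

```python
def expected_breakdowns(periods_by_count):
    """Step 1: average breakdowns per period from a {breakdowns: periods} table."""
    total_breakdowns = sum(k * n for k, n in periods_by_count.items())
    total_periods = sum(periods_by_count.values())
    return total_breakdowns / total_periods


def compare_costs(periods_by_count, cost_per_breakdown, pm_fee, breakdowns_with_pm):
    """Steps 2 and 3: return (cost with no PM, cost with PM) per period."""
    cost_no_pm = expected_breakdowns(periods_by_count) * cost_per_breakdown
    cost_with_pm = pm_fee + breakdowns_with_pm * cost_per_breakdown
    return cost_no_pm, cost_with_pm


# Example 1: law firm, monthly data
law_firm = {0: 5, 1: 7, 2: 8, 3: 4}
no_pm_1, with_pm_1 = compare_costs(law_firm, 400, 125, 1)

# Example 2: Marcello Manufacturing, weekly data
marcello = {0: 1, 1: 2, 2: 5, 3: 4, 4: 8, 5: 3, 6: 1, 7: 0, 8: 6}
no_pm_2, with_pm_2 = compare_costs(marcello, 175, 450, 2)

# Step 4: pick the cheaper option in each case
for name, no_pm, with_pm in [("Example 1", no_pm_1, with_pm_1),
                             ("Example 2", no_pm_2, with_pm_2)]:
    choice = "adopt PM" if with_pm < no_pm else "do not use PM"
    print(f"{name}: no PM ${no_pm:.2f}, with PM ${with_pm:.2f} -> {choice}")
```

Running the sketch reproduces the totals worked out by hand in the two examples ($583.33 versus $525 per month, and $729.17 versus $800 per week).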
1.3 Exercises
1. Evan Enterprises has experienced the following computer breakdowns throughout the past year.
Number of Breakdowns                          0   1   2   3   4
Number of Months that Breakdowns Occurred     1   3   2   5   1
Each failure costs the corporation $500 in repairs and production losses. The corporation is
considering a preventive maintenance schedule, which will limit the breakdowns to an average of one
per month. The preventive maintenance will cost the corporation $400 each month. Determine the
most viable maintenance arrangement.
2. ABC Electronics produces general circuit boards for the computer industry. Over the past 15 weeks,
their manufacturing process has experienced the following numbers of breakdowns that are quite
typical:
Number of breakdowns 2 4 3 2 5 4 5 3 2 6 7 2 4 3 2
To cut costs, they are entertaining proposals from three different preventive maintenance suppliers.
Alpha PM guarantees an average of only 1 breakdown/wk and charges $425/wk for their service.
Beta PM guarantees an average of only 2 breakdowns/wk and charges $350/wk for their service.
Gamma PM guarantees an average of only 3 breakdowns/wk and charges $300/wk for their service.
As Production Manager, you must decide which scheme is best for the company and justify your
decision to the board of directors. Which one, if any, should you recommend?
3. The manufacturing process in a steel rolling mill is subject to breakdowns. Over the past 18 months
there have been 81 breakdowns at an average cost of $750 each.
ACME Maintenance is offering a preventive service and they guarantee an average of only 2
breakdowns per month.
Suppose you want to hire ACME but only if your total monthly cost for breakdowns is $500 per month
less than it currently is. What is the maximum monthly amount you would be willing to pay ACME?
4. Dynamic DVD has multiple production lines. A breakdown on a line is not only costly to repair, but
also hinders the company’s production. Every breakdown is projected to cost the company $850 in
repairs and production losses. The following are the recorded breakdowns over the past 15 months.
Number of Breakdowns                          0   1   2   3   4
Number of Months that Breakdowns Occurred     2   4   5   3   1
Dynamic DVD is considering a preventive maintenance schedule which will limit breakdowns to an
average of one per month. This package will cost the company $550 per month. Should they adopt
this maintenance contract? Why?
5. The main pump at a chemical plant is prone to breakdowns. Over the past year it has broken down 21
times. The average costs to repair each breakdown are $150 in parts, $300 in labour and $500 in lost
production.
The plant is considering hiring a preventive maintenance company. This PM company would
overhaul the pump once per month and thereby guarantee no breakdowns. The PM company would
supply the labour and parts, but there would be a short downtime of the pump each month resulting
in a lost production cost of $350. The PM company would charge the plant $1500 per month for this
service. Should the plant hire the PM company?
2 Reliability
Reliability is the probability that a device or system will perform the intended purpose, in the
specified manner, for a specified duration of time.
If we have a large number of devices that we can test over a long period of time, we can expect
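some of them to fail as time goes by. We can compute the reliability of the device at time t as:
\[ R(t) = \frac{\text{number of devices surviving at time } t}{\text{number of devices at the start of the test}} \]
Obviously, at the start of the test (t = 0), none of the items have failed so, R(0) = 1 or 100%.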
R(t) is the probability that a device or system experiences no failures during the time interval [0, t].
Note: Reliability can be expressed as a decimal number or as a percent (e.g., 0.875 or 87.5%).
But how can we predict reliability? For example, what is the probability that a device will work for 200 hours
without a failure?
The failure rate of a device or system (represented by the Greek letter λ, lambda) is defined as the probability
of it failing in one unit of time.
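\[ \lambda(t) = \frac{n_f}{n_s \, \Delta t} \]
where n_s is the number of devices surviving at time t and n_f is the number of those devices that fail
in the next Δt units of time.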
For example, suppose that after 500 hours there are 200 surviving devices. In the next 2 hours, 6 devices fail.
The failure rate after 500 hours is given by:
\[ \lambda(500) = \frac{6}{(200)(2)} = 0.015 \text{ or } 1.5\% \text{ per hour.} \]
Example 1
A researcher tests 1000 devices and records the number of failures at 50-hour intervals.
For these data, the failure rates and reliabilities are calculated as shown in this table:
Time   Survivors at        Failures in   Failure Rate   Reliability
       start of interval   next 50h      (per hour)
   0        1000               85           0.17%          100%
  50         915               55           0.12%           92%
 100         860               39           0.09%           86%
 150         821               38           0.09%           82%
 200         783               39           0.10%           78%
 250         744               36           0.10%           74%
 300         708               36           0.10%           71%
 350         672               34           0.10%           67%
 400         638               31           0.10%           64%
 450         607               32           0.11%           61%
 500         575               29           0.10%           58%
 550         546               26           0.10%           55%
 600         520                                            52%
For example, at 150 hours, \( \lambda(150) = \frac{38}{(821)(50)} \approx 0.0009 \) or 0.09% per hour.
For example, after 300 hours, \( R(300) = \frac{708}{1000} = 0.708 \) or 70.8%.
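The whole table can be regenerated from the survivor counts alone. The following is a minimal sketch, assuming the survivor column is typed in as a Python list; the helper names `failure_rates` and `reliabilities` are illustrative and not part of the course material.

```python
# Survivors at the start of each 50-hour interval, taken from the table in Example 1.
survivors = [1000, 915, 860, 821, 783, 744, 708, 672, 638, 607, 575, 546, 520]
interval_hours = 50


def failure_rates(survivors, dt):
    """Failure rate per hour for each interval: failures / (survivors * dt)."""
    rates = []
    for n_start, n_end in zip(survivors, survivors[1:]):
        n_failed = n_start - n_end
        rates.append(n_failed / (n_start * dt))
    return rates


def reliabilities(survivors):
    """Reliability at each time: survivors / number of devices at the start."""
    return [n / survivors[0] for n in survivors]


rates = failure_rates(survivors, interval_hours)
rel = reliabilities(survivors)

for i, rate in enumerate(rates):
    t = i * interval_hours
    print(f"t = {t:3d} h   failure rate = {rate:.2%} per hour   reliability = {rel[i]:.1%}")
```

The last survivor count (520 at 600 hours) has no failure rate because no later count is available to difference against.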
Here are graphs of the failure rates and reliability:
[Graph: Reliability (0% to 100%) versus Time (0 to 650 hours).]
2.2 Exercises
1. At a testing facility, 500 electronic calculators are tested continuously. Each 8-hour shift at the facility
records the number of failures that occurred during the shift.
When the 6th shift starts, 37 of the calculators have already failed and 7 fail on that shift.
When the 15th shift starts, 103 of the calculators have already failed and 6 fail on that shift.
When the 27th shift starts, 169 of the calculators have already failed and 5 fail on that shift.
When the 42nd shift starts, 235 of the calculators have already failed and 4 fail on that shift.
When the 62nd shift starts, 301 of the calculators have already failed and 3 fail on that shift.
a) Using the formula for λ(t) in this section, calculate the failure rate (per hour) at the start of each
shift.
b) Comment on your answers from part a).
2. Two thousand transistors were tested for 1200 hours and the results are tabulated below:
a) Using these data, calculate the failure rates per hour over time and graph your results.
b) Using these data, calculate the reliabilities over time and graph your results.
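2.3 Calculating Reliability
The failure rate of many devices follows a "bathtub curve" over time, with three phases:
Early Life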
The failure rate at the left side of the curve is sharply decreasing. This phase of the life cycle is known as
the Early Life. Initially, the failure rate is high, but quickly stabilizes. This can be attributed to “infant
mortality” – initially there are many failures because of a lack of adequate testing of the devices or a
defect in the manufacturing procedure. This levels off after an initial burn-in or other stress procedures.
Useful Life
The middle section of the curve characterizes a constant failure rate. It is during this Useful Life phase that
we will predict reliability.
Wear-out
The final phase, Wear-out, shows the failure rate increasing. This can be attributed to devices wearing out
or to a lack of maintenance. At this stage, devices should be repaired or replaced.
From Example 1 in the previous section, we saw the following failure rate:
This graph indicates an early life (infant mortality) phase, a useful life (constant failure rate) phase, but no
wear-out phase (likely because testing was limited to 600 hours).
Reliability calculations can be made only in the Useful Life phase of a device or system. So, in order to make
reliability predictions, we use only the time interval where the failure rate is constant. Because it is constant
over this interval, we represent it as simply λ instead of λ(t). The reliability is then determined by the following
exponential equation:
\[ R(t) = e^{-\lambda t} \]
where e is a constant ≈ 2.718281. (Use the e^x key on your calculator.)
For example, suppose the failure rate, λ, is 10% or 0.10 per hour. Then, after 3 hours,
\[ R(3) = e^{-(0.10)(3)} = e^{-0.3} \approx 0.741 \text{ or } 74.1\%. \]
To see how good this estimate is, imagine we started with 1000 devices.
After one hour, about 10% of the 1000 devices will have failed, leaving 900 survivors.
After two hours, about 10% of the remaining 900 will have failed, leaving 810.
After three hours, about 10% of the 810 will have failed, leaving 729.
Thus, the reliability after three hours is 0.729 or 72.9%.
This result is close to that obtained using the exponential formula. The reason they differ is that the
exponential formula takes into account that failure can occur at any time – not just every hour.
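The gap between the hour-by-hour count and the exponential formula can be explored numerically. The sketch below is illustrative only: it repeats the hourly calculation, then applies the same failure rate in smaller and smaller time steps (a choice made for this illustration, not a method from the notes) to show the result approaching the exponential value.

```python
import math

failure_rate = 0.10  # per hour
hours = 3
devices = 1000

# Hour-by-hour count: 10% of the current survivors fail each hour.
survivors = devices
for _ in range(hours):
    survivors -= survivors * failure_rate
stepwise = survivors / devices  # 0.729, as in the text

# Exponential formula R(t) = e^(-lambda * t).
exponential = math.exp(-failure_rate * hours)  # about 0.741

# Splitting each hour into more steps lets failures occur "at any time".
for steps_per_hour in (1, 10, 100, 1000):
    per_step = failure_rate / steps_per_hour
    r = (1 - per_step) ** (steps_per_hour * hours)
    print(f"{steps_per_hour:5d} steps per hour: R(3) = {r:.4f}")

print(f"exponential formula:  R(3) = {exponential:.4f}")
```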
If you look back at the graph of reliabilities in Example 1 in the previous section, it is interesting to note that
the graph resembles an exponential curve – except for the steeper section at the beginning.
Example 1
Referring to the example in the previous section (A researcher tests 1000 devices and records the number of
failures at 50-hour intervals ...), you should notice that there are time periods when the failure rate is
approximately constant. If we wanted to predict reliability for these devices, we would do so using the useful
life data where the failure rate is approximately constant at about 0.10% per hour. That is, λ = 0.001.
For example, we can predict the reliability of these devices after 300 hours:
\[ R(300) = e^{-(0.001)(300)} = e^{-0.3} \approx 0.741 \text{ or } 74.1\%. \]
2.4 Exercises
1. Think of some everyday objects that exhibit a bathtub curve for failures.
2. The failure rate of a heavily-used electronic gadget is fairly constant over time at around 0.05% per
hour.
a) Predict the reliability of this gadget after 100 hours.
b) Predict the reliability of this gadget after 1000 hours.
c) Predict the reliability of this gadget after 5000 hours.
d) Would you purchase one of these gadgets that has been used for 100 hours? 1000 hours?
5000 hours?
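2.5 Mean Time Between Failures (MTBF)
Reliability of a product is often expressed as the Mean Time Between Failures or MTBF. It refers to the average
time between failures for repairable items and is calculated as:
\[ \text{MTBF} = \frac{\text{total operating time}}{\text{number of failures}} \]
For example, suppose a power supply has failed 3 times over a period of 48000 hours.
Then the MTBF = 48000/3 = 16000 hours.
When the failure rate is constant, MTBF = 1/λ, so the reliability formula can be written as:
\[ R(t) = e^{-t/\text{MTBF}} \]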
Notice that if the probability of no failures before time T is given by \( R(T) = e^{-T/\text{MTBF}} \),
then the probability of a failure before time T is given by \( 1 - R(T) \) or \( 1 - e^{-T/\text{MTBF}} \).
Example 1
A manufacturer has determined that its stereos have an expected life that is exponential with a mean time
between failures of 5 years.
a) Determine the probability of failure before four years of ownership.
b) Determine the likelihood of experiencing the first breakdown after 6 years of service.
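a) The probability of no failures before 4 years is \( R(4) = e^{-4/5} = e^{-0.8} \approx 0.449 \) or 44.9%,
so the probability of a failure before 4 years is 1 - 0.449 = 0.551 or 55.1%.
b) The probability of no breakdown before 6 years is \( R(6) = e^{-6/5} = e^{-1.2} \approx 0.301 \) or 30.1%.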
Example 2
A watch has an MTBF of 4 years. Determine the probability of failure between the second and seventh years.
Probability of a failure before 7 years: \( 1 - R(7) = 1 - e^{-7/4} = 1 - e^{-1.75} = 1 - 0.174 = 0.826 \) or 82.6%.
Probability of a failure before 2 years: \( 1 - R(2) = 1 - e^{-2/4} = 1 - e^{-0.5} = 1 - 0.607 = 0.393 \) or 39.3%.
Probability of a failure between the second and seventh year is 0.826 – 0.393 = 0.433 or 43.3%.
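These MTBF calculations follow one pattern, which can be sketched as three small Python functions. The function names are invented for this illustration, and the sketch assumes a constant failure rate (the exponential model used throughout this section).

```python
import math


def reliability(t, mtbf):
    """Probability of no failure before time t: R(t) = e^(-t / MTBF)."""
    return math.exp(-t / mtbf)


def prob_failure_before(t, mtbf):
    """Probability of at least one failure before time t: 1 - R(t)."""
    return 1 - reliability(t, mtbf)


def prob_failure_between(t1, t2, mtbf):
    """Probability that the first failure falls between t1 and t2 (t1 < t2)."""
    return reliability(t1, mtbf) - reliability(t2, mtbf)


# Example 1: stereos with an MTBF of 5 years
print(f"Failure before 4 years:    {prob_failure_before(4, 5):.3f}")
print(f"No breakdown before 6 yrs: {reliability(6, 5):.3f}")

# Example 2: watch with an MTBF of 4 years
print(f"Failure between years 2 and 7: {prob_failure_between(2, 7, 4):.3f}")
```

Note that `prob_failure_between(2, 7, 4)` equals `prob_failure_before(7, 4) - prob_failure_before(2, 4)`, which is the subtraction carried out in Example 2.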
Example 3
After 50 weeks of regular use, a product has a reliability of 45%. Determine the mean time between failures
for the product.
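\[ 0.45 = R(50) = e^{-50/\text{MTBF}} \]
\[ \ln(0.45) = \ln\left(e^{-50/\text{MTBF}}\right) = -\frac{50}{\text{MTBF}} \]
Rearranging, we solve:
\[ \text{MTBF} = \frac{-50}{\ln(0.45)} = \frac{-50}{-0.799} \approx 62.62 \text{ weeks} \]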
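The same rearrangement works for any known reliability at a known time. Below is a minimal sketch, assuming the exponential model; the function names `mtbf_from_reliability` and `time_to_reliability` are hypothetical helpers, and the second one (solving for the time at which a target reliability is reached) is included because it uses the same logarithm step.

```python
import math


def mtbf_from_reliability(t, r):
    """Solve r = e^(-t / MTBF) for MTBF, given reliability r at time t."""
    if not 0 < r < 1:
        raise ValueError("reliability must be strictly between 0 and 1")
    return -t / math.log(r)


def time_to_reliability(mtbf, r):
    """Solve r = e^(-t / MTBF) for t, given the MTBF and a target reliability r."""
    if not 0 < r <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return -mtbf * math.log(r)


# Example 3: reliability of 45% after 50 weeks
mtbf = mtbf_from_reliability(50, 0.45)
print(f"MTBF = {mtbf:.2f} weeks")

# Consistency check: with that MTBF, reliability drops to 45% at 50 weeks.
print(f"Time to reach 45% reliability = {time_to_reliability(mtbf, 0.45):.1f} weeks")
```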
2.6 Exercises
1. A hard disk drive manufacturer claims that the MTBF for their drives is 500,000 hours. What is the
constant failure rate per hour? What is the reliability of one such drive after 1250 hours of continuous
operation?
2. Suppose a certain transistor has a reliability rating of 97% at 7500 hours. If the reliability has an
exponential distribution, what is the MTBF of these transistors?
3. A manufacturer of switches makes a switch that has a constant failure rate and an MTBF of 500
hours. Determine the reliability of these switches at 300 hours and at 1500 hours.
4. Another manufacturer produces switches claimed to have an MTBF of 1500 hours. What is the
probability of these switches operating without a failure for 300 hours? For 1500 hours?
5. In a certain computer lab at Niagara College, a computer mouse has an MTBF of 7500 hours.
a) What is the probability that such a mouse will last more than one year?
b) How many years will it take until the reliability of such a mouse is 10%?
6. A microwave oven has a useful life that is exponential with an MTBF of 36 months.
a) Determine the probability that a given oven will operate for at least 45 months.
b) Determine the probability that a given unit will fail sooner than 27 months.
c) After what length of service is it likely that 80% of the units will have failed?
7. A scientific calculator has an expected life that is exponentially distributed with an MTBF of 3 years.
a) Determine the probability that a calculator will last at least 5 years.
b) Determine the likelihood of a calculator lasting no longer than two years.
c) Determine the probability of a calculator failing between the first and third year.
8. What is the probability that a device will fail before it reaches its MTBF?
2.7 Reliability of Series and Parallel Systems
Most products or systems are composed of several components connected in series or parallel. To determine
the overall system reliability, the individual component reliabilities must be known. The configuration of the
components must also be established. The configuration may have the components linked in series, in parallel
or a combination of the two.
2.7.1 Series Systems
In a series configuration, each individual component must operate. If any one component fails, the entire
system fails. This configuration is observed in a string of light bulbs that fails to light if a single bulb is burnt
out.
The system reliability for a series configuration is less than any individual component reliability.
\[ R_S = R_1 \times R_2 \times R_3 \times \cdots \times R_n \]
[Diagram: components R1, R2, R3, ..., Rn connected in series.]
Example 1
Suppose we have a system composed of 4 devices in series. The devices have reliabilities of 90%, 95%, 75%
and 85%.
\[ R_S = (0.90)(0.95)(0.75)(0.85) \approx 0.545 \text{ or } 54.5\% \]
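The series formula is a single product, so it is easy to check numerically. The sketch below applies it to the four reliabilities listed in Example 1, treating them as a series chain (an assumption made for this illustration); the function name `series_reliability` is likewise invented here.

```python
import math


def series_reliability(reliabilities):
    """Reliability of components in series: the product of their reliabilities."""
    return math.prod(reliabilities)


components = [0.90, 0.95, 0.75, 0.85]
r_series = series_reliability(components)

print(f"Series system reliability: {r_series:.4f}")
print(f"Weakest component:         {min(components):.4f}")
```

The result is lower than the weakest component (0.75), which illustrates why long series chains are fragile.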
2.7.2 Parallel Systems
A parallel system consists of redundant components, which provide backup. The system will continue to
function as long as any one component is operational. Although such a system may be more expensive to
install, the benefit is that it is less likely to fail.
In a parallel configuration, the overall reliability of the system is greater than all the individual component
reliabilities.
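Two Parallel Components
[Diagram: components R1 and R2 connected in parallel.]
\[ R_S = 1 - (1 - R_1)(1 - R_2) \]
Example 1
[Diagram: two components with reliabilities .96 and .94 connected in parallel.]
\[ R_S = 1 - (1 - 0.96)(1 - 0.94) = 1 - (0.04)(0.06) = 0.9976 \]
More generally, for n components in parallel:
[Diagram: components R1, R2, R3, ..., Rn connected in parallel.]
\[ R_S = 1 - (1 - R_1)(1 - R_2)(1 - R_3) \cdots (1 - R_n) \]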
Example 2
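Suppose we have four devices of reliability 85%, 70%, 90% and 87% connected in parallel as shown below.
[Diagram: four components with reliabilities .85, .70, .90 and .87 connected in parallel.]
\[ R_S = 1 - (0.15)(0.30)(0.10)(0.13) = 1 - 0.000585 = 0.999415 \]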
When all the components have the same reliability, the formula for the system reliability simplifies to:
\[ R_S = 1 - (1 - R)^n \]
where R is the common reliability of each component and n is the number of components.
Example 3
Suppose a system has six redundant components in parallel, each with reliability 70%.
[Diagram: six components, each with reliability .70, connected in parallel.]
\[ R_S = 1 - (1 - 0.7)^6 = 1 - 0.3^6 = 1 - 0.000729 = 0.999271 \]
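The parallel formula can be checked the same way: the system fails only if every component fails. This is a small sketch with an illustrative function name, applied to the component values given in the three parallel examples.

```python
import math


def parallel_reliability(reliabilities):
    """Reliability of redundant components in parallel: 1 - product of (1 - R_i)."""
    return 1 - math.prod(1 - r for r in reliabilities)


two_components = parallel_reliability([0.96, 0.94])
four_components = parallel_reliability([0.85, 0.70, 0.90, 0.87])
six_identical = parallel_reliability([0.70] * 6)

print(f"Two components (.96, .94):             {two_components:.4f}")
print(f"Four components (.85, .70, .90, .87):  {four_components:.6f}")
print(f"Six identical components (.70 each):   {six_identical:.6f}")
```

For identical components the product collapses to (1 - R)^n, which is the simplified formula used in Example 3.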
2.7.3 Series-Parallel Systems
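[Diagram: five stages A, B, C, D and E connected in series. Stage A has components .65, .75 and .85 in
parallel. Stage B has five .80 components in parallel. Stage C is a single .90 component. Stage D has a .70
component in parallel with a series pair of .90 and .80. Stage E is a single .85 component.]
\[ R_A = 1 - (0.35)(0.25)(0.15) = 0.986875 \]
\[ R_B = 1 - (0.20)^5 = 0.99968 \]
\[ R_C = 0.90 \]
\[ R_D = 1 - (1 - 0.70)(1 - (0.90)(0.80)) = 1 - (0.30)(0.28) = 0.916 \]
\[ R_E = 0.85 \]
\[ R_S = R_A \times R_B \times R_C \times R_D \times R_E = (0.986875)(0.99968)(0.90)(0.916)(0.85) \approx 0.691 \]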
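A series-parallel system can be evaluated by nesting the two basic formulas. The sketch below is an illustration under a stated assumption: the stage layout (A: .65, .75, .85 in parallel; B: five .80 in parallel; C: .90; D: .70 in parallel with .90 and .80 in series; E: .85) is inferred from the stage reliabilities given in the worked solution, since the diagram itself is not reproduced here. The helper names `series` and `parallel` are hypothetical.

```python
import math


def series(*reliabilities):
    """All components must work: multiply the reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one component must work: 1 - product of failure probabilities."""
    return 1 - math.prod(1 - r for r in reliabilities)


# Stage layout is an assumption inferred from the stage values in the example.
r_a = parallel(0.65, 0.75, 0.85)
r_b = parallel(0.80, 0.80, 0.80, 0.80, 0.80)
r_c = 0.90
r_d = parallel(0.70, series(0.90, 0.80))
r_e = 0.85

r_system = series(r_a, r_b, r_c, r_d, r_e)

for name, value in [("A", r_a), ("B", r_b), ("C", r_c), ("D", r_d), ("E", r_e)]:
    print(f"R_{name} = {value:.6f}")
print(f"R_S = {r_system:.3f}")
```

Working from the innermost group outward, as stage D does here, is the general approach: reduce each group to a single equivalent reliability, then combine the groups.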
2.8 Exercises
[Diagram: two components with reliabilities .85 and .79.]
3. A production line consists of four machines with reliabilities of .90, .80, .70, and .60.
a) The machines are arranged such that if one fails to operate, the entire line shuts down.
Determine the reliability of this system.
b) An alternate design would provide an identical backup line.
Determine the reliability of this design.
c) A final possibility would provide an identical backup for each machine.
Determine the reliability of this design.
[Diagram: three components with reliabilities .89, .74 and .91.]
[Diagram: four components, each with reliability .85.]
6. Determine the reliability of this system:
[Diagram: series-parallel system with component reliabilities .60, .60, .65, .80, .70, .80, .60, .95, .50,
.90, .85, .90, .60, .85, .70 and .60.]
[Diagram: fourteen components, each with reliability .80.]
3 Answers to Exercises
1.3 Exercises
1. They should adopt the PM to save $183.33 per month.
2. Total costs:
Without PM: $684 per week
With PM: Alpha - $615 /wk, Beta - $730 /wk, Gamma - $870 /wk.
You should recommend Alpha to save $69 per week.
3. $1375 per month.
4. Yes. They will save $130 per month.
5. No. It would be $187.50 more per month than the current costs.
2.2 Exercises
1. a) 0.18898%; 0.18892%; 0.18882%; 0.18868%; 0.18844%
b) The failure rates are quite similar, suggesting a constant failure rate. We note that the number of
failures per shift must gradually decrease in time to maintain this constancy.
2.
Time   Survivors at        Failures in   Failure Rate   Reliability
       start of interval   next 100h     (per hour)
   0        2000              100           0.05%          100%
 100        1900               55           0.03%           95%
 200        1845               54           0.03%           92%
 300        1791               48           0.03%           90%
 400        1743               50           0.03%           87%
 500        1693               47           0.03%           85%
 600        1646               43           0.03%           82%
 700        1603               44           0.03%           80%
 800        1559               42           0.03%           78%
 900        1517               40           0.03%           76%
1000        1477               38           0.03%           74%
1100        1439               38           0.03%           72%
1200        1401                                            70%
[Graphs: Failure Rate (0.00% to 0.05% per hour) versus Time (0 to 1200 hours), and Reliability (0% to 100%)
versus Time (0 to 1200 hours).]
2.4 Exercises
1. Examples could include:
Human life. Infant mortality is generally higher than later on in life. The death rate is then somewhat
constant during the average lifetime. It increases as people get older.
Cars. During the first few months problems due to assembly flaws, failing parts, etc. might occur more
often than later on in the car's life. With regular maintenance, the failure rate should remain constant
during the car's average lifetime. As the car gets much older, many parts tend to fail due to wear and
corrosion.
2. a) 95%; b) 61%; c) 8% d) Perhaps after 100 hours, but certainly not after 5000 hours.
2.6 Exercises
1. a) 0.0002% per hour. b) 99.75%
2. 246231 hours.
3. a) 54.9% b) 5.0%
4. a) 0.819 or 81.9% b) 0.368 or 36.8%
5. a) 0.311 or 31.1% b) 1.97 years.
6. a) 0.287 or 28.7% b) 0.528 or 52.8% c) 57.9 months.
7. a) 0.189 or 18.9% b) 0.487 or 48.7% c) 0.349 or 34.9%
8. 0.632 or 63.2%
2.8 Exercises
1. 42.6%
2. 96.9%
3. a) 30.2% b) 51.3% c) 72.6%
4. 99.7%
5. 99.95%
6. 84.3%
7. 94.1%