Questions of Reliability Centered Maintenance
Plucknette.
Abstract
Reliability-Centered Maintenance (RCM) is a phrase coined thirty years ago to describe
a cost effective way of maintaining complex systems. The RCM method uses the
answers to seven very basic questions to help determine the best maintenance tasks to
implement in an Equipment Maintenance Plan (EMP). This paper focuses on those
seven questions and how they help determine the EMP.
Introduction
On December 29th, 1978 F. Stanley Nowlan and Howard F. Heap published report
number A066-579, "Reliability-Centered Maintenance". The report was the culmination
of several years of work aimed at determining a new, more cost effective way of
maintaining complex systems. They called it Reliability-Centered Maintenance (RCM)
because programs developed through RCM "are centered on achieving the inherent
safety and reliability capabilities of equipment at a minimum cost". RCM is a
time-consuming, resource-intensive process. Many practitioners have tried to reduce the
amount of time and resources required to accomplish RCM projects with varying
degrees of success. The most successful ones have focused on understanding the basic
goals of RCM, and on the seven basic questions that need to be asked about each asset.
In this paper we will concentrate on understanding each of the seven questions and how
the answers to those questions help determine a Reliability-Centered approach to asset
management.
The Definition of Reliability
In the book Maintainability, Availability, and Operational Readiness Engineering
Dimitri Kececioglu defines reliability as:
"The probability that a system will perform satisfactorily for given period of time
under stated conditions."
Nowlan and Heap define Inherent Reliability as:
"the level of reliability achieved with an effective maintenance program. This level
is established by the design of each item and the manufacturing processes that
produced it."
In The Fault Tree Analysis Guide a system is defined as:
"A composite of equipment, skills, and techniques capable of performing or
supporting an operational role, or both. A complete system includes all equipment,
related facilities, material, software, services, and personnel required for its operation
and support to the degree that it can be considered self-sufficient in its intended
operational environment."
When we look at these definitions together, it becomes evident that any asset
management program must address system development through all phases of a system's
life. There is no maintenance program that can improve the reliability of a poorly
designed system. Additionally, whatever maintenance program is developed is
determined by the design of the system and the goals of the organization.
The Goal of Reliability-Centered Maintenance (RCM)
The primary goal of Reliability-Centered Maintenance (RCM) should therefore be to
ensure that the right maintenance activity is performed at the right time with the right
people, and that the equipment is operated in a way that maximizes its opportunity to
achieve a reliability level that is consistent with the safety, environmental, operational,
and profit goals of the organization. This is achieved by addressing the basic causes of
system failures and ensuring that there are organizational activities designed to prevent
them, predict them, or mitigate the business impact of the functional failures associated
with them.
The Seven Questions of RCM
There are seven basic questions used to help practitioners determine the causes of
system failures and develop activities targeted to prevent them. The questions are
designed to focus on maintaining the required functions of the system.
1. What are the functions of the asset?
2. In what way can the asset fail to fulfill its functions?
3. What causes each functional failure?
4. What happens when each failure occurs?
5. What are the consequences of each failure?
6. What should be done to prevent or predict the failure?
7. What should be done if a suitable proactive task cannot be found?
What Are The Functions of the Asset?
Every facility is uniquely designed to produce some desired output. Whether it is tires,
gold, gasoline, or paper the equipment is put together into systems that will produce the
end product. Each facility may have some unique equipment items, but in many cases
common types of equipment are just put together in different ways. Within every RCM
analysis we have two types of functions. The first is the Main or Primary function; its
function statement describes the reason we acquired the asset and the
performance standard we expect it to maintain. The second type is the Support Functions,
which list the function of each component or maintainable item that makes up the
system. The Support Functions are provided by the bottom level of equipment in most
facilities such as pumps, electric motors, valves, rollers, etc. Each of those maintainable
items has one or more easily identifiable functions that enable the system to produce its
required output. It is the loss of these functions that leads to variation in the Main or
Primary function of the system and the safety, environmental, operational, and profit
output of the facility.
The key thing to remember when describing equipment functions is that we are
interested in what the equipment does in relation to its operating context, not what it is
capable of doing. For example, a cooling tower pump may be capable of pumping 100
gpm at 275 ft of head, but may only need to pump 75 ± 10 gpm at that same pressure. It
is necessary to focus on the required and secondary functions within the system
operating context in order to analyze asset functions. Our main function statement for
this system would address the functionality within the operating context: "Be able to
pump cooling tower water at a rate of 75 ± 10 gpm at 275 ± 15 ft of head while
maintaining all quality, health, safety and environmental standards."
The rate, the head requirement, quality, health, safety and environmental standards are
all performance standards for the pump.
Functions need to be well defined. Statements such as "pump water from the pond"
don't lend themselves well to understanding what a functional failure would look like. A
statement such as "pump 1000 ± 100 gpm at 275 ± 15 ft of head from the pond" makes it
easy to understand what a functional failure might look like. If we can only pump 800
gpm then we obviously have an unacceptable variation in output.
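The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the RCM method itself: it assumes the function statement reads 1000 ± 100 gpm at 275 ± 15 ft of head, it assumes the tolerance band is symmetric, and the names PerformanceStandard, functional_failures, and POND_PUMP are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceStandard:
    """A target value with a symmetric tolerance, e.g. 1000 +/- 100 gpm."""
    name: str
    target: float
    tolerance: float
    unit: str

    def is_met(self, measured: float) -> bool:
        # Anything inside [target - tolerance, target + tolerance] is acceptable.
        return abs(measured - self.target) <= self.tolerance


def functional_failures(standards, measurements):
    """Return the names of the standards that the measurements violate."""
    return [s.name for s in standards if not s.is_met(measurements[s.name])]


# Assumed reading of the function statement in the text.
POND_PUMP = [
    PerformanceStandard("flow", 1000.0, 100.0, "gpm"),
    PerformanceStandard("head", 275.0, 15.0, "ft"),
]

# 800 gpm is outside the 900-1100 gpm band, so flow is a functional failure.
print(functional_failures(POND_PUMP, {"flow": 800.0, "head": 275.0}))  # ['flow']
```

A vague statement such as "pump water from the pond" gives no target or tolerance to compare against, which is why it cannot be checked this way.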
yet be considered failed. Many condition monitoring programs don't achieve their
desired output because those running the program do not recognize that a failure has
occurred as soon as an unsatisfactory condition is detected. They often try to run the
equipment as long as possible or until they get closer to the F of the P-F curve. At
Allied Reliability we call this "managing to the F". More mature programs "manage to
the P", meaning that they take action as soon as the unsatisfactory condition is
recognized. Remember, the further we go along the P-F curve the higher the level of
business risk we are accepting.
It is equally important to recognize that there is significant value in ensuring that
equipment is installed and commissioned properly.
The I-P-F curve shown above is the standard P-F curve with an I-P portion added. Point
I is defined as the point of installation of the component. The I-P portion of the I-P-F
curve is the failure free period. This is the time during which the operation is defect
free. The I-P interval for machines that were installed improperly may be just a few
seconds. The I-P interval for machines installed by well trained crafts people using well
designed procedures, precision techniques, and precise measuring equipment, and
commissioned by operators using well designed operating procedures may be years.
The graphic above shows what the I-P-F curve for two differently installed identical
machines might look like. The machine with the longer I-P interval was installed by
well trained crafts personnel using a properly designed procedure and precision
measuring devices, and commissioned by operators using a well designed operating
procedure. The machine with the shorter I-P interval was installed by inadequately
trained personnel using either no procedure or a poorly designed procedure without
precision measuring devices and techniques, and commissioned by operators using
either no procedure or a poorly designed procedure. The difference in lengths of the I-P
portions of the curve for the two pieces of equipment may represent large sums of
money. The dollars represent the additional cost of parts and labor and also the amount
of additional foregone production as a result of the extra maintenance work that had to
be performed.
Looking at an organization's shift in focus from F toward I is a more effective way to
determine its maturity than looking at the age of its maintenance program. Many
organizations reactively maintain equipment for a long time. An organization that is
constantly focused on Point F and staying clear of it will undoubtedly have a reactive
culture. Typical things heard around this organization might be "How long can we run
it before it fails?" and "Just how bad is it?"
An organization's first step toward maturity will be to shift its focus from Point F to
Point P. The organization then focuses its efforts on understanding how things fail and
on its ability to detect these failures early. Typical things overheard in this organization
may be something like "Is this the best way to detect these defects early?" or "I
appreciate you letting me know about this problem, even though it's very early."
It is very important to describe these causes or failure modes in a way that allows us to
create a living program for improving asset management. Easy to use codes in the
Enterprise Asset Management (EAM) system will allow us to capture data about what
types of failures are occurring and to react to that data by reengineering the maintenance
plan, training plan, or equipment design associated with the equipment. A well
designed Failure Reporting, Analysis, and Corrective Action System (FRACAS) is a
must for continuously improving system performance.
For part failures we may want to use a simple three-part code that consists of the part
name, part defect, and defect cause.
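As a rough sketch of how such a code could be captured and summarized, the Python below models a code made of part name, part defect, and defect cause, then counts how often each code appears in a failure history. The class name FailureCode, the helper top_codes, the hyphen-separated format, and the sample records are all hypothetical; they are not taken from any particular EAM system.

```python
from collections import Counter
from typing import NamedTuple


class FailureCode(NamedTuple):
    """Three-part code: part name, part defect, defect cause."""
    part: str
    defect: str
    cause: str

    def __str__(self) -> str:
        # Hypothetical display format, e.g. BEARING-WORN-MISALIGNMENT.
        return f"{self.part}-{self.defect}-{self.cause}"


def top_codes(history, n=3):
    """Return the n most frequent failure codes with their counts."""
    return Counter(history).most_common(n)


# Illustrative records only; a real history would come from closed work orders.
history = [
    FailureCode("BEARING", "WORN", "MISALIGNMENT"),
    FailureCode("SEAL", "LEAKING", "INSTALLATION"),
    FailureCode("BEARING", "WORN", "MISALIGNMENT"),
]

for code, count in top_codes(history):
    print(f"{code}: {count}")
```

Keeping the three parts as separate fields rather than one free-text string makes it possible to count by part, by defect, or by cause when deciding whether the maintenance plan, the training plan, or the design needs to change.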
How would your company handle creating severity rankings for failures?
In most cases each failure will be ranked according to what is known as criticality. The
criticality is the result of combining probability and consequence rankings to
yield a single number. The criticality will be biased toward the business's
philosophy of safety, environmental, and operational risk. The tasks in the Equipment
Maintenance Plan (EMP) generated from the RCM analysis are designed to lower the
criticality of the significant failures in the system. Tasks can be rank ordered for
implementation by implementing first those that yield the greatest reduction in
criticality.
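The ranking logic can be sketched as follows. The text does not say how the probability and consequence rankings are combined, so multiplying them is an assumption here, as are the 1 to 5 scales, the task names, and the rankings themselves.

```python
from dataclasses import dataclass


def criticality(probability: int, consequence: int) -> int:
    """Combine two rankings into one number (the product is an assumed choice)."""
    return probability * consequence


@dataclass(frozen=True)
class Task:
    name: str
    before: tuple  # (probability, consequence) rankings without the task
    after: tuple   # (probability, consequence) rankings with the task in place

    @property
    def reduction(self) -> int:
        return criticality(*self.before) - criticality(*self.after)


def implementation_order(tasks):
    """Order tasks so the largest reduction in criticality comes first."""
    return sorted(tasks, key=lambda t: t.reduction, reverse=True)


# Illustrative rankings on an assumed 1-5 scale.
tasks = [
    Task("operator seal inspection", before=(3, 2), after=(2, 2)),         # 6 -> 4
    Task("vibration route on pump bearing", before=(4, 3), after=(2, 3)),  # 12 -> 6
    Task("functional test of relief valve", before=(2, 5), after=(1, 5)),  # 10 -> 5
]

for task in implementation_order(tasks):
    print(task.reduction, task.name)
# 6 vibration route on pump bearing
# 5 functional test of relief valve
# 2 operator seal inspection
```

An organization that weights safety more heavily than production could reflect that bias in how it assigns the consequence ranking, without changing the ordering logic.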
What Should be Done to Predict or Prevent the Failure?
Each failure mode must be examined to determine what type of maintenance task, if
any, should be used to prevent or predict it. Nowlan and Heap recognized four basic
types of PM tasks.
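Scheduled inspection of an item at regular intervals to find any potential failures
Scheduled rework of an item at or before some specified age limit
Scheduled discard of an item (or one of its parts) at or before some specified life limit
Scheduled inspection of a hidden-function item to find any functional failures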
When and how these tasks are performed depends on the failure mechanism that is
present. In the original report six failure shapes were investigated. The team
determined that only 11% of the failure modes present in their study of aircraft part
failures would lend themselves to scheduled rework or replacement. In this instance
89% of the failure modes present would require some sort of inspection. The majority
of the failure modes, 63%, could actually be made worse by time based overhaul or
replacement. Clearly, some good non-invasive method of inspecting for potential
failures would be very beneficial.
Table 6, reproduced from the Nowlan and Heap report, presents a comparison of the four
types of tasks and their applicability. For non-critical failures the order of preference
will generally be inspection, rework, and lastly discard or replacement of the item.
When Nowlan and Heap published their report in 1978, condition monitoring methods
such as vibration analysis (VA), ultrasonic inspection (UE), ultra-violet inspection
(UV), and other non-invasive technology based inspection methods were in their
infancy and were very expensive to deploy. Now, nearly thirty years later, technology
based inspection methods are relatively inexpensive and easy to deploy. These methods
are really nothing more than inspection methods that can be used on a periodic basis to
determine the condition of equipment. We can be almost certain that Nowlan and Heap
would have recommended extensive use of these technologies had they been readily
available.
In any case, the task chosen must either lower safety, environmental, or operational risk
to an acceptable level, or for non-critical failures be economically effective. Risk is
always the top driver in the decision making process. We may have to spend more
money to ensure that we meet our risk goals.
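The acceptance rule in the paragraph above can be written as a small decision function. This is only a sketch: the function name, the parameter names, the numeric risk scale, and the use of annual costs for the economic test are assumptions, not part of the Nowlan and Heap decision logic.

```python
def task_is_acceptable(*, critical: bool,
                       residual_risk: float = 0.0,
                       acceptable_risk: float = 0.0,
                       annual_task_cost: float = 0.0,
                       annual_failure_cost_avoided: float = 0.0) -> bool:
    """Apply the risk test to critical failures and the cost test to the rest."""
    if critical:
        # Risk drives the decision; task cost is deliberately not considered.
        return residual_risk <= acceptable_risk
    # Non-critical failures: the task must cost less than the failures it avoids.
    return annual_task_cost < annual_failure_cost_avoided


# A costly task on a critical failure is still accepted if it meets the risk goal.
print(task_is_acceptable(critical=True, residual_risk=2, acceptable_risk=3,
                         annual_task_cost=50_000))            # True

# A task on a non-critical failure is rejected if it costs more than it saves.
print(task_is_acceptable(critical=False, annual_task_cost=5_000,
                         annual_failure_cost_avoided=3_000))  # False
```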
have written procedures in place to deal with the failure mode, and that proper spares
levels are maintained.
Conclusion
Answering the seven questions of RCM properly will yield a cost effective EMP that
achieves the business goals for safety, environmental, and operational risk. Answering
the questions properly requires a cross-functional team of maintenance, operations, and
engineering personnel who have a thorough understanding of how the asset works, and
what the organization's risk and profit goals are.