Managing Human Error in Maintenance: Sandy Dunn
Sandy Dunn
Director, Assetivity Pty Ltd
Booragoon, Western Australia
Abstract
Numerous research studies have [...] was intended to prevent the very failures that occurred. Building on the latest academic research, and based on practical experience, this paper outlines the key things that maintenance managers can do to reduce or eliminate the impact of human error in maintenance. The key points that will be covered include:
- The inevitability of human error - we ignore it at our peril;
- The role of an optimum PM programme in minimising the impact of human error;
- The management of maintenance quality - an essential element in the control of maintenance error;
- Writing effective maintenance task instructions that minimise the possibility of human error.

In their ground-breaking work that led to the establishment of the technique that we now know as Reliability Centred Maintenance, Nowlan and Heap found, when analysing the failures of hundreds of mechanical, structural and electrical components in aircraft, that these [...]

[Figure 1 Aircraft components: Patterns of failure. Conditional probability of failure by pattern: B 2%, C 5%, E 14%, F 68%.]

In the context of this paper the interesting finding was that more than two-thirds of all components exhibited early-life failure. It was estimated that, between 1982 and 1991, maintenance errors ranked second only to ‘controlled flight into terrain’ accidents in causing onboard aircraft fatalities (despite the application of RCM techniques in the airline industry during that period)2. Furthermore, a study of coal-fired power stations indicated that 56% of forced outages occurred less than a week after a planned or maintenance shutdown3.

Other studies have been conducted which confirm these findings, but until recently there has been little research into the reasons for this. Several plausible theories have been proposed; possible explanations that I have heard include –
- ‘Human Error’ – the repair or replace task was not successfully completed due to a lack of knowledge or skill on the part of the person performing the repair;
- ‘System Error’ – the equipment was returned to service after a high-risk maintenance task without the repair having been properly inspected and tested;
- ‘Design Error’ – the capability of the component being replaced was too close to the performance expected of it, and lower capability (lower quality) parts therefore failed during periods of high performance demand. The remaining higher capability (higher quality) parts were capable of withstanding all performance demands placed on them;
- ‘Parts Error’ – the incorrect part or an inferior quality part was supplied.

More recently, Reason has compiled a table summarising the results of three surveys conducted at nuclear power stations – two surveys performed by the Institute of Nuclear Power Operations (INPO) in the USA, and one by the Central Research Institute of Electric Power Industry (CRIEPI) in Japan4. In all three studies more than half of all identified performance problems were associated with maintenance, calibration and testing activities. In comparison, on average only 16% of problems occurred during operation under normal conditions.

Reason also quoted the results of a Boeing Study5 which indicated that the top seven causes of in-flight engine shutdowns (IFSDs) in Boeing aircraft were as follows –
- incomplete installation (33%)
- damaged on installation (14.5%)
- improper installation (11%)
- equipment not installed or missing (11%)
- foreign object damage (6.5%)
- improper fault isolation, inspection, test (6%)
- equipment not activated or deactivated (4%)

We can see from this that only one of these causes was unrelated to maintenance activities, and that such activities contributed to at least 80% of all IFSDs.

If poor quality maintenance causes so many incidents in highly regulated and hazardous industries such as nuclear power generation and civil aviation, what proportion of failures is being caused by maintenance within your organisation?

What are the outcomes of maintenance-induced failures? Clearly, depending on the industry in which you operate, there are potentially significant safety and environmental risks. There is a long list of catastrophic failures in which the inadequate performance of a maintenance task played a significant role. Some of these include –
- Flixborough
- Three Mile Island
- Piper Alpha
- American Airlines Flight 191
- Bhopal
- Japan Airlines Flight 123
- Clapham Junction
- etc.

But besides the obvious safety risks, perhaps the bigger consequences are economic. General Electric has estimated that each in-flight engine shutdown costs airlines in the region of US$500,000. What could maintenance-induced failures be costing your organisation?

Clearly, we need to do something to reduce the number of equipment failures that are being caused, not prevented, by maintenance. This paper suggests that the most appropriate approach is to –
- admit that human error is inevitable (even in maintenance!) and design our systems and processes around this inevitability;
- use appropriate tools to ensure that we are [...]

Enforce good housekeeping standards. Housekeeping practices are a good indicator of attitudes and culture relating to quality. The correct standards are those that avoid dangerous slovenliness, without resorting to anally-retentive cleanliness.

Ensure that spare parts and tools are managed well. Maintenance cannot perform high quality work if the parts and tools that are needed are not available when required. Their absence leads to potentially dangerous short-cuts and workarounds. An important aspect of maintenance QM is ensuring that tool and spare parts management processes and practices support the achievement of high quality work.

Write, and use, effective maintenance work instructions. Omission of necessary steps is the most common form of maintenance error. Some estimates suggest that omissions account for more than half of all human factors problems in maintenance. The development, and use, of effective maintenance work instructions is an important tool in managing these types of errors, and will be discussed in more detail in a later section of this paper.

Organisational measures
Put in place effective processes for analysing, and learning from, past failures. It is vitally important that any significant failures should be investigated using an effective Root Cause Analysis (RCA). RCA, to be effective, should fully investigate all the causes contributing to the failure, whether these be physical, human, or organisational. The most effective solutions for preventing these failures from happening again will be those that deal effectively with the organisational causes of failures.

However, in order to analyse effectively those failures that result from human error, it is also necessary to engender a ‘reporting culture’ within the organisation – where all failures, no matter how seemingly insignificant, are reported. This, in turn, particularly when we are dealing with human errors, requires the development of a high level of trust between management and those at lower levels in the organisation. People must not feel that reporting human failures is likely to lead to adverse personal consequences. Those who have researched so-called ‘High Reliability Organisations’ (HROs) have noted that high levels of failure reporting are a significant feature of those organisations8.

Put in place proactive processes for assessing the risk of future maintenance errors. Avoiding the recurrence of past failures is an admirable, but insufficient, goal for those seeking to achieve high quality maintenance outcomes. One method that could be employed to manage maintenance quality proactively is to perform a risk assessment of maintenance activities, in order to assess whether the likelihood of human error is high. Areas that could be assessed include –
- the knowledge, skills and experience of maintenance personnel at all levels,
- employee morale,
- the availability of tools, equipment and parts,
- workforce fatigue, stress and time pressures,
- shift rosters,
- the adequacy of maintenance procedures and work instructions.

One example of a risk assessment process that is used in the aviation industry is Managing Engineering Safety Health (MESH), which was developed initially by British Airways in the early 1990s, and has been further developed and adapted by Singapore Airlines4.

In addition, more specific review and assessment of error detection and containment defences can be
performed. This could ask questions such as –
- Are there adequate processes in place for independent inspection of high-risk tasks?
- Are functional tests and checks ever omitted or abbreviated, for any reason?
- Have tasks ever been signed off as completed, when this was subsequently found not to be the case?
- After maintenance, is equipment adequately tested before being returned to service?

Ultimately, even putting both proactive and reactive measures in place will not guarantee the absence of human error, but together, these strengthen the organisation’s intrinsic resistance to human error.

WRITING EFFECTIVE WORK INSTRUCTIONS
As previously mentioned, some estimates suggest that omissions account for more than half of all human factors problems in maintenance. The development, and use, of effective maintenance work instructions is an important tool in managing these types of error.

What are the characteristics of a good Maintenance Work Instruction? Briefly, good work instructions –
- Are written with the person who is going to read the instruction in mind. This sounds obvious, but in practice is not always so easy. We know that the person who is going to be performing the task is a qualified tradesperson, but we generally do not know the specific individual who will be doing the work, or their familiarity with the task. Do we write the work instruction assuming that this tradesperson has never performed the task before? Do we assume that they are very familiar with the task? Or something in between? To a certain extent, we need to know more about the nature of the task. If it is rarely performed, then it is probably safe to assume that the tradesperson will be unfamiliar with it. On the other hand, if it is frequently and regularly performed (such as a lubrication PM) then it is more likely that the person who will be performing the work will be familiar with the task (or be guided by someone who is).
- Group complex work instructions into phases, with each phase consisting of many related tasks. Remember that losing one’s place in a sequence is a frequent human error. We can reduce the likelihood of this happening by grouping logically related tasks into phases. It is much easier to remember that you are at Step 8 in Phase 4, rather than try to remember whether you were at Step 48 or 49 in the entire sequence.
- Are written clearly, and use simple and consistent language. Once again, remember the type of person that is going to read the instruction. Use language that you are sure they will be familiar with. Be consistent in the use of terms. Is inspecting something the same as checking it? If not, then be sure that the reader of the instruction understands the difference. If it is, then use only one term, and not the other, in order to avoid confusion.
- Focus on the key risks that may prevent the job from being performed safely and to the required quality standard. For example, if there are certain dimensions that must be checked and are critical to the subsequent operation of the machine, make sure that these are highlighted so that the reader is aware of this – and always make sure that the required dimension is specified in the work instruction, and easily readable. If certain steps MUST be performed in a specific order, and there is a risk that they could be performed in a different order, then make sure that this is communicated clearly and strongly to the reader.

Some other aspects of tasks that may represent high risk are those that –
- have been omitted or performed incorrectly in the past,
- are associated with assembly or installation (these tasks represent a much greater risk than disassembly or removal tasks),