The Failure Analysis Process-An Overview
https://doi.org/10.1007/s11668-021-01328-y
TECHNICAL ARTICLE—PEER-REVIEWED
D. Aliya (corresponding author)
Aliya Analytical, Fenwick, Michigan, USA
e-mail: daaliya@itothen.com

© 2021 ASM International. This article is reprinted with permission from Failure Analysis and Prevention, Vol 11, 2021 ed., ASM Handbook, Brett A. Miller, Roch J. Shipley, Ronald J. Parrington, and Daniel P. Dennies, editors, ASM International, 2021, p 27–35, https://doi.org/10.31399/asm.hb.v11.a0006754.

Abstract Failure analysis is a process that is performed in order to determine the causes or factors that have led to an undesired loss of functionality. This article is intended to demonstrate proper approaches to failure analysis work. The goal of the proper approach is to allow the most useful and relevant information to be obtained. The discussion covers the principles and approaches in failure analysis work, objectives and scopes of failure analysis, the planning stages for failure analysis, the preparation of a protocol for a failure analysis, practices used by failure analysts, and procedures of failure analysis.

Keywords Failure analysis · Planning stages · Protocol preparation

Failure analysis is a process that is performed in order to determine the causes or factors that have led to an undesired loss of functionality. This Volume primarily addresses failures of components, assemblies, or structures, and its approach is consistent with the knowledge base of a person trained in materials engineering. The contribution of the materials engineer to the advancement of the scientific foundation of failure analysis has been great in the last few decades. This is evidenced by the fact that many people define the causes of failure in a rather binary manner: was the part defective or was it abused? Obviously, there are many types of defects, including those that come from a deficient design, poor material, or mistakes in manufacturing. Whether those “defects” exist in a given component that is being subjected to the failure analysis process can often only be determined by someone with a materials background. This is because many of the “defects” that people are looking for are visible only in a microscope of some sort. While microscopes may be widely available, the knowledge required to interpret the images is less widely available. The other major type of defects, those related to design issues, may also require the assessment of a materials engineer. This is because many design engineers are not very familiar with the natural variations within a material grade. Evaluation of the adequacy of a material or process specification is often best performed by a materials engineer.

Thus, materials experts have been in an excellent position to gain experience in the failure analysis process. The advent of more powerful and widely available scanning electron microscopes has helped provide a more fact-based foundation for opinions that may have been heavily speculative in the past. Some materials engineers have become very experienced in failure analysis. As materials engineers have worked on some very spectacular failures, or on failures that have caused great pain and loss, they have been led to ask deeper and broader questions about the causes that lead to failures. In many cases it becomes clear that there is no single cause or single train of events that lead to a failure. Rather, there are factors that combine at a particular time and place to allow a failure to occur. Sometimes the absence of any one factor may have been enough to prevent the failure. Sometimes, though, it is
123
J Fail. Anal. and Preven.
impossible to determine, at least within the resources allotted for the analysis, whether any single factor was key. If failure analysts are to perform their jobs in a professional manner, they must look beyond a simplistic list of causes of failure. They must keep an open mind and always be willing to get help when beyond their own experience.

Many beginning practitioners of failure analysis may have their projects defined for them when they are handed a small component to evaluate and thus may be able to follow an established procedure for the evaluation. This is especially true for someone working within an original equipment manufacturer. If there is someone who has significant experience and knowledge of the physical factors that tend to go wrong with the object and an established procedure exists, then a particular analysis may not require extensive pretesting work. However, for the practitioner who works in an independent laboratory or who is looking at a wide variety of components, following a predefined set of instructions for a failure analysis will generally prove to be an inadequate guideline for the investigation. Established “recipe-type” procedures are generally inadequate for the more advanced and broad-minded practitioner as well.

Although the failure that we are investigating is that of a physical component, assembly, or structure, the failures that lead to such physical failure happen on many levels. In other words, a failure should not be viewed as a single event. It is more useful to view both the failure and the failure analysis as multilevel processes that can be explored in many useful ways. The physical failure (a fracture, an explosion, or a component damaged by heat or corrosion) is the most obvious. However, there are always other levels of failures that allow the physical event to happen. For example, even a simple failure whose direct physical cause was an improper hardness value has human factors that allowed the improperly hardened component to be manufactured and used. These human factors are generally difficult to investigate within a manufacturing organization, because cultures that allow a particular type of failure to occur will generally not have systems in place that allow simple remedies to be enacted for the deeper-level causes. For example, if someone in an organization wants to investigate causes beyond the simple fact of improper hardness, it may be discovered that the incoming (receiving) inspection clerk was not properly trained to take note of reported hardness values. Changing a corporate culture to include better training and education is generally difficult; many corporations are structured so that the people responsible for training do not have an open line of communication to those doing the investigation. This only increases the difficulty of implementing change to prevent failures.

Professionally performed failure analysis is a multilevel process that includes the physical investigation itself and much more. This Section of the Volume shows the latest thinking on how the different “layers” of the failure analysis process should work together, so that when the analysis or larger investigation is complete, those involved will have useful knowledge about how to avoid future occurrences of similar problems.

Failure analysis of the physical object is often defined as a part of a larger investigation whose intent is to prevent recurrences. If we are to take the broadest view of what is required to prevent failures, one answer stands out: education. Education must happen at multiple levels and on multiple subjects within an organization, within larger cultural groups, and within society in general if we are to reduce the frequency of failures of physical objects. Education, of which job training is a single component, allows people at all levels of an organization to make better decisions in time frames stretching from momentary to career-long. There are many books available that contain exercises to help restructure knowledge into a more useful and accessible form (see, for example, titles in the Selected References list in this article). Other books, such as [1], can help one learn to recognize incorrect lines of reasoning.

The specific levels of failure causes that have been defined by the Failsafe Network include physical, human, latent, and root. Clearly, many involved with failure analysis today (2020) call something a root cause when what they are referring to is a simple physical cause. If failure analyses are performed adequately and with luck, at the end the analyst should be able to take the causes found, show that the failure would have happened the way it did, and also show that if something different had happened along the way, the failure would not have occurred or would have occurred differently. The fact that is often revealed at the end of an investigation is that this is not possible. Even a long and involved investigation leaves unknowns; the honest analyst is left to make a statement of the factors involved in allowing conditions that promoted the likelihood of failure. This is still a useful task, perhaps more useful than something that pins “blame” on a particular individual or group. Understanding the factors that promoted a failure can lead to an understanding of what is required to realize improvement in durability and safety of products, equipment, or structures. Sometimes this analysis will lead to an improvement in economics as well. Understanding goes beyond knowledge of facts; it requires integration of facts into the knowledge base of an individual so that the facts may be transformed into knowledge and then into product and/or process improvement.

Failure analysis is a task that requires input from people with many areas of expertise. A simple physical failure of a small object may be analyzed by a single individual with
basic training in visual evaluation of engineered objects. Even someone only participating in the “straightforward” portions of the investigation of physical failure needs to know how his or her contribution fits into a bigger picture. Going to the level of using the failure analysis to improve products and processes requires expertise in the various aspects of human relations and education, at the least. Failure analysis of a complex or catastrophic failure requires much more.

People who perform failure analysis as part of their job function must have an awareness of how their legal obligations are defined. People who perform destructive testing on a component that has failed may sometimes be held accountable for the destruction of evidence on a personal level. Thus, company employees must learn to protect themselves. Investigators who were “just doing the job” have been successfully sued by parties that the judicial system determined had a legitimate interest in the outcome of the failure analysis project.

The days when anyone unquestioningly agrees to destructively test a component that they know or can see “has failed” should be over. This places the destructive testing technician or engineer in a difficult position, because it is sometimes difficult to see that something has failed. Corporate cultures that are highly structured and hierarchical can be particularly difficult environments for the failure analysis practitioner, because it may be difficult to even find out if the component has failed. Even if that information is given, relevant background details are often difficult to obtain. Pressure to finish the analysis in a shorter time frame than is desirable for a quality investigation is common.

Failure analysts must realize that many are still unaware of what they have to offer in terms of allowing clients or fellow employees to replace speculation with facts. The people who request failure analysis work may not be aware that rushing ahead into the destructive portion of an investigation may well destroy much information.

The remainder of this article and the following articles in this Section of the Volume are intended to demonstrate proper approaches to failure analysis work. The goal of the proper approach is to allow the most useful and relevant information to be obtained. Readers of the various articles will see many points of view demonstrated. All the valid approaches require planning, defining of objectives, and organization prior to any destructive testing. Simultaneous preservation of evidence is also required.

The competent failure analyst needs to know more than the failure analysis process and the tools used to support it and must understand the function of the object being analyzed and be familiar with the characteristics of the materials and processes used to fabricate it. The failure analyst also needs to understand how the product was used and the culture in which it was used. Communication skills are a must. For example, when you ask a question, do you know for certain what “yes” means? In some cultures, “yes” means “I heard the question” and does not imply that the answer is affirmative. Thus, failure analysts must be well versed in multiple disciplines.

The failure analysis process can be approached in many different ways. Most people who do failure analysis of structural components or larger-scale structures and assemblies have likely run into someone who wanted to do a failure analysis without considering a contribution from an experienced materials engineer. While the analyst may reach a conclusion in this manner, its value should be questioned. A reliable understanding of what happened and why it happened requires the input of a competent materials engineering practitioner. Every “failed” object is made of some material, and some common materials can lose more than 90% of their usual strength if they are not processed properly. Clearly, prior to reaching a conclusion as to the most significant causes of the failure, someone should determine if the correct material was used and if it was processed properly. This often requires both an investigation of documentation and a series of physical tests.

This Volume focuses on the definition of and requirements for a professionally performed failure analysis of a physical object or structure. However, many of the concepts for investigation that are described in this Section have much greater utility than for investigations of failures of physical objects. The concepts in learning how to define objectives, negotiate the scope of the investigation, look at the physical evidence, structure both the investigation and the data that it reveals, and perform general problem solving have broad applicability in other areas of business, manufacturing, and life in general. Examples of how competent materials engineers can use these concepts in a failure analysis or failure investigation are also emphasized here.

Principles and Approaches in Failure Analysis Work

A key principle of failure analysis is, first and foremost, to preserve evidence. The analyst must make sure that any necessary information from the subject part or assembly in the as-received condition is captured before anything is done to alter its condition. This principle can be summarized by several guidelines:

• First, preserve evidence. This includes appropriate photodocumentation or sketches.
• Perform tests in the order of less destructive to more destructive in nature.
• Document the positions and orientations of every cut made.
• Know the limitations of one’s personal knowledge.
• Know how to ask for help.
• Do not attempt a failure analysis if the basics of specimen preservation, collection, and selection have not been studied.
• Know when to say no to performing a destructive test.

Destructive testing includes anything that requires cutting the part. However, even moving fragments of an explosion may cause loss of information that may have been determined from the position of the fragments. Cleaning components can also be problematic; not cleaning can lead to damage by corrosion in the case of many common materials, while cleaning may remove the substances that caused or contributed to the failure or that shed light on the nature of any physical degradation of the components. Sometimes, cleaning of dangerous or toxic substances from the debris of a failure is necessary for the safety of the investigator. The practitioner should also keep in mind that many tests described as nondestructive are only relatively nondestructive. There are numerous investigations during which the analysts representing different parties have spent considerable time trying to figure out whether the dye penetrant residue detected is a result of the test done after the failure or before the last service period. Again, nondestructive is relative.

It should also be recognized that the problem of failure analysis can be approached in different ways, depending on the required depth and scope of analysis. Another key principle in failure analysis work is knowing how to define the scope of the investigation at the proper time, so that the investigation has the highest chance of allowing the answers to the questions posed to become known. The circumstances of failure problems can be diverse, and even the “simple” principles of failure analysis may be subject to interpretation and examination. Even the principle of preserving evidence may involve judgments, as noted in the preceding paragraph (even the experienced analyst can make mistakes). For example, an analyst received a bearing from a regular client. The bearing had worn out prematurely in a durability test. The analyst was requested to photograph the wear marks and check the hardness and quality of heat treating of the races and balls. The bearing was covered with a black greasy substance. (When analyzing wear failures, it is often the lubricant properties and the wear particles that offer the most information about the wear process.) The analyst informed the client of this, and the client made the decision to sacrifice the “dirty grease” in the interest of completing the investigation in a shorter time. The client just wanted to know if the part met the specification and whether there was evidence of gross misalignment or dimensional problems. After the investigation was finished, and the grease had been dissolved in solvent and discarded, another individual at that company requested detailed information on the actual wear mechanism. At that point, it was too late to analyze the lubricant. If either the client or the analyst had taken more time, a sample of this material could have been set aside and preserved.

Another key principle of failure analysis is that for all but the simplest routine investigations, there may be multiple, legitimate approaches. Selecting the most appropriate of these approaches to problem solving in failure investigations is an important skill. The classical approach is to follow a list of steps, which generally include planning the investigation, performing background research, and writing the report, as well as the actual physical tests and evaluations to which the component in question is subjected.

A “recipe-style” list of steps is not adequate on its own to guide a proper failure analysis. Each step of such a list has its own set of considerations relating to how it will be performed. Knowing where to cut the part, knowing which features will reveal useful data by examination with a scanning electron microscope, knowing where to do the hardness test, and so on are all critical aspects of performing a failure analysis. Individual procedures associated with each step on the list may provide guidance on where and how to cut the parts. Experienced failure analysts who embrace the concept of using a methodology, rather than a recipe, will perform higher-quality, more useful work. A methodology is more than a list of steps accompanied by a series of procedures; it implies that the practitioner has the knowledge to avoid the common pitfalls that may entrap the beginner. A methodology for failure analysis will also provide the mental structure to accommodate a range of investigation types, from that required for a simple worn-out component that caused a mere inconvenience, to a massive catastrophic event. Methodologies are designed to match the complexity of the investigation to the costs of the failure, whether human or material. This means choosing multiple test locations or exemplar specimens, as appropriate.

Failure analysis is an iterative and creative process, much like the design process, but with reversed roles of synthesis and analysis. Knowing which approach to use is at least as important as knowing how to use it. This article describes some of the factors and conditions that may be considered when approaching a failure analysis problem. In any case, whichever approach is taken, it is always important to cultivate an open mind and to minimize the temptation to reach a conclusion about the cause(s) of the failure before performing the analysis and evaluation. The science of critical thinking has a principle called the
confirmation bias, which refers to the tendency to look only for what one expects to find, that is, “Ye shall find only what ye shall seek” [1]. Humans have a general tendency to see what they expect to see or to perceive things according to preconceived expectations. As Mark Twain wrote, “To the man who wants to use a hammer badly, a lot of things look like nails that need hammering.” If observation is limited to an expected outcome, helpful data may be overlooked.

It is also important to appreciate the value of intuition and instinct. While the importance of observation and analysis can hardly be overemphasized, sometimes intuition and imagination can provide insights and a better appreciation of the “big picture.” For years, the famed early 20th-century Indian mathematician Ramanujan would, immediately upon awakening, write down theorems that had come to him in his dreams. Many of these theorems remain unproven yet useful to mathematicians and physicists today (2020) [2]. Another example is the discovery of the structure of the benzene ring by F.A. Kekulé. Many high school science books report that during the period of time when Kekulé was trying to figure out how the carbon and hydrogen atoms were arranged within the molecule, he had a vision in a dream of intertwined serpents, each biting the tail of the adjacent snake. Obviously, he and others went on to use more scientific methods to demonstrate the correctness of his theory. Likewise, while any failure investigation must stand or fall on the merits of the analytical work done, historically, too little credit has been given to the intuitive function in engineering and scientific work in general.

The Objectives of Failure Analysis

The purpose of a failure analysis project is often to prevent a recurrence of the failure. However, there are many different types of failure analysis projects. Where an injury lawsuit is involved, for example, it may be important to assign responsibility for an undesired event. There are other cases when there may never be a chance for a recurrence. For example, if the item that fails is unique, there may never be a repeat incident.

Another case in which the objective of the investigation may not be the prevention of recurrences is one involving a very minor failure of a low-value component. If there is no other damage, it may be difficult to justify a prevention-oriented project. It may be more economical to live with a certain level of failure than to devote resources to prevention. The work is still worthwhile, because if certain economic situations change, there is background information available to support a broader investigation, in a more efficient manner, at a later time. Also, the understanding gained may lead to an improved product that may be appropriate for a particular market niche, for example, long-life lightbulbs.

It may be that there is no true product failure. In fact, one important question is: “Did a failure really occur?” It is possible to have an undesirable event that involves fracture, wear, deformation, or corrosion but that is not really a component failure. Fracture, wear, deformation, and corrosion are all types of damage. Damage is a fact, while the term failure involves a complex series of interpretations. For example, discovering a fatigue crack in a 40-year-old structural component, in many cases, may be less of a surprise than finding one that is free from such cracks. For these and similar reasons, it is good practice to avoid the use of the term failed part. The terms subject part, damaged structure, subject component, or physical evidence are preferred.

The objectives of failure analyses can vary, as suggested by some of the different types of objectives listed in Table 1. Early in every investigation, those with an interest should determine exactly what their objectives are. If the parties have a genuine desire to prevent recurrences, even if they have no legal obligation to do so, they still must decide how far they want to go toward this goal. Economic and timing considerations usually determine the scope of the investigation. Aside from the cost of the investigation itself, a sound answer that comes after repeat failures may be less valuable than a reasonably sound answer before repeat failures. Problems cannot be solved from the same level of understanding in which they are created. Thus, failure analysis requires a keen and inquisitive outlook and can offer a satisfying way to keep learning on a technical, professional, and personal level for an entire career.

Scope and Planning

The scope of a failure analysis depends on the depth and complexity of the project. Many failure analysts have been told to find the root cause of a particular failure in 30 min. Usually, this is impossible or leads to superficial results. The scope of the investigation must also be targeted toward finding the real root (physical or human) causes of the failure. The term root-cause analysis is frequently used but does not have a globally accepted meaning. Sometimes the term root-cause analysis is misconstrued to mean figuring out whether a component meets a specification. If it does not meet specification, then the lack of conformity to the specification becomes a convenient root cause. This use of the term is not only superficial but also invalid. Any approach that does not attempt to link the particular physical effect with the particular lack of conformity and
its direct expected consequences is not a valid failure analysis approach.

W.R. Corcoran’s “The Rootician’s Dictionary” [3] has 51 different definitions of root cause. Had each not been considered useful by someone, they would not exist. Dr. Corcoran, who worked in the nuclear industry, offered the following commentary for the first of the definitions listed: “A cause that is root. See ‘Cause.’ See ‘Root.’ A basic, underlying, fundamental harmful factor.” He noted that this definition is plain English, not misleading or arcane, but also not very informative. In fact, this definition is what critical-thinking experts call a tautology, a term that uses itself to define itself.

During the planning stages on the possible scope of an investigation, it is helpful to focus attention on the potential complexity of the problem and on when the physical cause may have occurred. Various categories of complexity are listed in Table 2. These categories of complexity are not exclusive. All failure analysis work must be founded on the physical causes. Sometimes one is asked to determine a cause based on the verbal description of a component. In some limited cases, this may be all that is possible if the part is gone. However, the value of the failure analysis is limited in those cases. Also note that a failure problem is defined in a broader context going from top to bottom of Table 2. This is particularly important for failure analysis within a manufacturing organization. Today (2020), many companies manufacture components or assemblies for other companies. Individuals at contract manufacturing companies may not learn the importance of specified or unspecified requirements of the components they are manufacturing until they can learn as a result of a failure investigation. Progressive companies have been taking advantage of their human resources for years by providing more training and education, because they know that it generally has a positive effect on profitability.

It is also important to keep in mind when root cause(s) could have been introduced. Table 3 lists some examples. It is often difficult to assign a failure to a single phase of problem creation. Many times, there are complex interactions. Only an integrated design approach will create robust processes and thus robust parts. The designer must communicate with personnel from the sales department, as well as customers, maintenance personnel, manufacturing personnel, and material suppliers. Input from failure analysts who evaluate broken, worn, corroded, or deformed parts from durability testing is an important source of information that is often neglected by designers. Many companies scrap their successful durability test specimens. This is a waste of a valuable resource, especially during prototyping and preproduction, when little is known about the population of a particular part number. A “test pass” component is a perfect reference specimen to compare and contrast to a “test failure” component. The results of a mechanical and microstructure characterization of a “test pass” component may pay great dividends in any future failure analysis work.

Avoiding Errors

The failure analyst needs to be aware that sorting out the causes of failures can cause economic and noneconomic (e.g., psychological) consequences to particular individuals or companies who are implicated for carelessness, negligence, ignorance, or other errors or omissions. Thus, it is important to avoid mistakes during the failure investigation, because they could cause as much harm as, or more harm than, the original failure.

Analytical mistakes may be technical in nature, such as an incorrect measurement of a mechanical property. Analytical mistakes may also be subtle. An example may be not questioning a suspicious hardness or composition data point. Another example is an error in judgment of the significance of something that is normally a minor detail. If this causes one to overlook things that bear close scrutiny, an incorrect conclusion may be drawn. Making sure that all relevant details are examined can help point to a clear conclusion and is crucial to competent failure analysis work. It is especially important to pay attention to any test result that is not consistent with the rest of the data.

In situations that involve loss of life, human injury, or large economic damage, professional analysts should be very careful to work only within their areas of competence. It is important to know the limits of one’s knowledge and to know when to ask for help. In fact, input from people from different areas will probably be involved in all but the most basic physical-cause investigations. If the failure involves complex interactions of latent factors, an
[Table fragments; the rows are incomplete in the source:]
Marketing people are given additional training so that they do not give
Design and supervisory personnel are provided training on the subject
question is, where does one draw the line as to what is reasonable
Heat treatment department employees are sent to a basic metallurgy
testing. It does not matter that she fabricated test results, until the
• Thinking it is easy to do
pyrometer fails, causing incorrect temperature exposure during
Nobody tells furnace operator about the importance of hardness
Individual people causes

cause. There may be some other factor that was the true
cause, and that factor varied back out of its problem range
[1]. Thus, process troubleshooting to eliminate a manu-
facturing problem is not the same as failure analysis. Both
are useful, but only failure analysis provides the hard data
that allow the link between process and properties to be
documented.
Table 3 Physical causes and time of occurrence
Physical causes Example Likely effects of findings Alternative effect(s) of findings
Planning and Preparation
It should be clear that the objectives and scope should be defined and understood by the stakeholders early in every investigation. If resources for a complete and detailed investigation leading to a high degree of technical certainty are not available, the investigator is encouraged to clarify for himself or herself, as well as the others involved, what is hoped to be determined after following a particular protocol. This clarification process is best done before any destructive testing.
Even a very limited investigation is by no means useless. Often, the stakeholders have two or three failure scenarios in mind. It is often possible for the trained analyst to rule out some of these scenarios with a small amount of work. A case in point is when an automotive repair shop owner wanted to know if employee negligence had caused a premature fracture in an externally threaded fastener. The fracture occurred two months after the repair job. The fracture was found by a different repair shop. There were no indications of progressive cracking. The people from the second repair shop accused the first shop of gross negligence. In this situation, it may be reasonable to suggest that something other than the first mechanic’s negligence caused the fracture. Even without revealing the whole story, the information provided with a simple fractographic evaluation was useful to those involved.
Guidelines on the preparation of a protocol for a failure analysis may vary. For a part investigation, it may be a simple checklist (Fig. 1) that is included in a client report. In larger investigations, other methods may be considered to help plan and identify priorities (Table 4). Each has advantages, drawbacks, and limitations in any given situation. When planning the actual step-by-step activities of the investigation, one should keep in mind that the degree of comprehensiveness necessary will be determined to a large degree not only by what the involved parties want to know but also by how strong their desire is to know it. In practice, the strength of the desire is measured in practical terms by the budget and timing considerations.
In many cases, particularly when litigation is or may be involved, it is preferable to define and document how the decision about the scope of the investigation is reached. Sometimes in litigation projects, there is a reluctance to create documentation due to a fear that it may be used to create a negative perception regarding the conduct of the investigation. Such fears must be balanced against the potential negative perception created if the investigator cannot later describe exactly what was done, why it was done, and what findings were determined. At the very least, for situations involving loss of life, bodily injury, or significant economic damage, the professional analyst takes action to document the test plan and what type of information is expected from each step.
If the investigation is relevant only to those within a particular organization, such as an in-process failure, sending a simple memo summarizing the test plan to those responsible for the product may be all that is necessary. In many investigations, especially during those of parts in process at a manufacturing plant, analysis of a similar part may be useful. Often the term control part is used, which implies that the similar part is free from any of the imperfections that may have contributed to the “failure.” Often, however, very little is known about the reliability of the particular part supplied as a control. It may never have been in service. For this reason, it is often preferable to use a term such as exemplar component, comparison specimen, or reference part.
As previously noted, a list of possible tests can be useful to ensure that nothing is unintentionally left out. Beginners looking at the finished reports of experienced analysts may think that the experienced practitioners followed a list of steps. Some of them may have followed a list, and analytical service groups may try to sell their clients a package that includes all the tests they can perform. However, checklists only summarize, and, similar to an executive summary, do not address all the detailed considerations of an investigation. One example of a relatively simple report summary is in Fig. 1. An example of a relatively limited test plan for a minor failure of a machine component due to a fracture of a shaft is given in Table 5. This same table could serve for a much larger investigation. Instead of spending 1 or 2 h on each activity, the analyst may spend 3 or 10 or more hours on some of the activities, making sure that the locations evaluated are representative of the whole or characterizing the degree of variation from location to location.
“Open-Mind/Open-Toolbox” Approach
To cover a broad perspective on legitimate approaches to failure analysis, it should also be pointed out that some people obtain useful results in an efficient manner without referring to a list of test and evaluation procedures. Rather, keeping an open and inquisitive mind may allow the analyst to formulate a series of questions. To answer the questions, tests must be performed in an appropriate order (from least destructive to more destructive). Formulating a series of questions based on the personal knowledge and ignorance of the analyst is a good way to begin to try to understand how the subject of the investigation failed. One step that should never be skipped without a good reason is the written list of activities and their associated specific purposes.
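The two requirements just described (a written list of activities with their purposes, and an ordering from least to most destructive) can be illustrated with a minimal sketch. The `Activity` record, the `build_test_plan` helper, the numeric destructiveness scale, and the sample activities are all hypothetical and chosen only for illustration; they are not taken from any standard or from the tables in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    purpose: str          # the specific question this activity is meant to answer
    destructiveness: int  # hypothetical scale: 0 = nondestructive, higher = more destructive


def build_test_plan(activities):
    """Return the activities ordered from least to most destructive.

    Raises ValueError if any activity lacks a written purpose.
    """
    for act in activities:
        if not act.purpose.strip():
            raise ValueError(f"Activity {act.name!r} has no documented purpose")
    # sorted() is stable, so activities with equal destructiveness keep their listed order.
    return sorted(activities, key=lambda act: act.destructiveness)


# Illustrative activities and rankings only; a real plan depends on the part and the questions asked.
plan = build_test_plan([
    Activity("Metallographic section through origin", "Check microstructure at the crack origin", 3),
    Activity("Visual examination and photography", "Record the as-received condition", 0),
    Activity("Hardness test", "Compare strength level with the specification", 2),
    Activity("Macrofractography", "Locate the origin and judge the loading direction", 1),
])

for step, act in enumerate(plan, start=1):
    print(f"{step}. {act.name}: {act.purpose}")
```

Rejecting an activity that has no stated purpose mirrors the point that the written list should rarely be skipped; the ranking values themselves remain a judgment call for the analyst.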
Fig. 1 Example of a simple checklist and report for a simple part (cracking of a brace)
This approach is referred to as the “open-mind/open-toolbox” approach. It can be flexible and, when used by a competent analyst, is often the most efficient approach. Legal requirements for preservation of evidence may preclude use of this flexible approach in cases where physical or financial injury results from a failure. However, it can be the most useful and appropriate approach for failures that occur during routine quality-control or prototype or durability tests (where no third parties are injured), or when multiple exemplars of apparently similar failures are available. Even if the analyst is involved in a formal investigation with a protocol that is not very open to negotiation, he or she may still find this approach to be a useful exercise before finalizing the investigation protocol.
Table 4 Structured decision-making and problem-solving methods for larger-scale investigations
Method: Most useful in this type of analysis
Failure mode and effects analysis: Used during design of component and the processes that are used to manufacture it. Allows a structured approach to figuring out the consequences of failures of single features on a component or single steps in a processing sequence
Kepner-Tregoe: Decision analysis method for problem solving in complex systems and prioritizing
Fault-tree analysis: An analysis method that provides a systematic description of the combinations of possible occurrences in a system that can result in failure. It is a graphical representation of the Boolean logic that relates to the output (top) event.
Failure wheels: Helpful in evaluating failures from a combination of factors or damage mechanism (see the article “Determination and Classification of Damage” in this Volume)
Root-cause analysis: There are many approaches. Most of this work has been done in the nuclear and chemical industries, in response to disastrous failures. It is also being used by medical administrators and many other fields.
Commercial software tools: These exist for a variety of functions, including tracking the course of an investigation and suggesting a course of action for the background data collection.
A key element to understanding the open-mind/open-toolbox approach is realizing that a conceptual framework is, in an important sense, an investigative tool (Table 6). For example, understanding that the orientation of a fracture surface can indicate how the part was loaded allows one to insist that the protocol include the resources for a proper macrofractographic evaluation in a failure due to fracture. Fractography offers the conceptual basis for performing the visual examination. Someone familiar with the power of macrofractography would not leave this step out in any fracture-related failure. In the open-mind/open-toolbox approach, an investigative or analytical activity is selected because its cost and time is likely to be favorably balanced by the information it provides (where information here is loosely defined as “useful data,” as distinct from just data). Information is not provided by the test alone; it comes from the interpretation or evaluation of data. The tools of failure analysis are not just test machines and analytical instruments. They also include conceptual tools that are essential in determining the cause of any given failure, for example, pattern-recognition skills (in the interpretation of macrofractographs, microfractographs, and metallographic images) and engineering and scientific knowledge based on physical metallurgy, polymer physics, solid-state physics, stress analysis, chemistry, and many other fields.
Understanding the Cause
Understanding how the failure happened is also a large part of understanding why the failure happened. It can be problematic to wait until the conclusion of the investigation (when one has a pile of test results obtained from a predetermined set of activities) to try and formulate opinions about how or why the failure has occurred. If it is not suspected that something could have happened, one is less likely to do the tests that could show whether it did or did not happen. Something as subtle as making the measurements 0.1 mm (0.004 in.) away from the preferred site may conceal a fact. An investigation performed in ignorance of how the object is supposed to function (and how it can malfunction) is not a quality investigation. Knowledge of high-stress locations and operating conditions is critical to being able to draw useful and credible conclusions. While it is clearly important not to jump to conclusions before the testing is completed, it is also highly impractical in most cases to perform a competent analysis in the total absence of speculations about the manner and cause(s) of the failure. This is an important point because many believe that the failure analyst should operate “blindly” in order to avoid clouding his or her judgment. However, data may be interpreted in various ways by different people, or even by the same person at different times or in different circumstances. This must be recognized in the overall context of how the failure happened and how the various participants (including the analyst) are acting. Clearly, the likely and alternative findings columns of Table 3 may be viewed differently by various participants, depending on the overall situation of how a failure may have happened.
Practices and Procedures
The practices (and tools previously discussed in Table 6) that may be used by failure analysts are of many types and represent continuously developing knowledge of how to work in many diverse areas, including:
Table 5 Example of a test protocol for a failure of minor economic importance
Activity Purpose/objective Time frame
and careful technician work in failure analysis can hardly be overestimated. A sloppy technician who leaves scratches in micromounts can be excused if all that is needed is a grain size or a total case depth. If the desired datum is the crack path at the origin of the crack or the depth of a corrosion pit, careful attention to detail and pride of craftsmanship are necessities. General procedures for each of the physical analysis steps in failure analysis are given in articles in the Section “Tools and Techniques in Failure Analysis” in this Volume. Detailed procedures regarding equipment operation may be found in equipment manuals.
The technician in a failure analysis must also have procedures that are specific to the components and alloys in question. It is crucial to document specimen location, position, and orientation, in the absence of detailed pre-existing procedures. For example, when testing an externally threaded fastener, the standard test position for a hardness test is a distance equivalent to one diameter in from the small end, at midradius. A common location for effective case depth on small gears is at the center of the gear thickness, at midheight of the tooth. The new technician may find it difficult to find terms to describe the position easily, but it is good practice to develop an analytical mind. In some cases, a sketch in the lab notebook may be an alternative.
Importance of Communication
The importance of developing communication skills also cannot be overestimated. The technician who does routine certification work can be taught how to do a task and go on with his or her work. With most failure analysis work, new decisions must be made each time a project is started. Some of these decisions may be as simple as figuring out how to clamp the part in the saw (without damaging the fracture surfaces) during microspecimen extraction. In this case, if the inexperienced technician knows how to listen carefully to a senior technician, things will likely work out satisfactorily.
The lead investigator in a complex investigation obviously requires much more developed communication skills, including specialized knowledge of how not to be misunderstood during a deposition or trial. Attorneys can make opposing experts look less knowledgeable than they are by asking if a certain reference is “authoritative.” If the expert answers “yes,” the attorney may proceed to give a “pop quiz” on any page in the reference. Obviously, most experts have not memorized all of the useful reference books that exist. In addition, if something in the reference is in apparent conflict with something the expert has said, the attorney will point it out. Now the expert is in a rather weak position. A better description of any reference may be “useful reference” or “widely used reference.” It may also be useful to point out that the issue is not just whether a reference, even the present Volume, is generally authoritative but whether it is relevant and correct when applied to a specific situation. Failures that appear to be similar may have significantly different causes. Communication skills, like any other technical skill, must be learned. The specific list of issues affecting word use, including avoiding common traps set by opposing attorneys, is not very long. The interested reader is advised to do a search on the internet for “communications training for expert witnesses.” Many of the commercial companies that exist for the purpose of matching expert witnesses to attorneys offer such training. The expert witness, like any successful public figure, needs to reflect carefully on every potential shade of meaning that a particular phrase might carry. If failure analysis is about evidence preservation, litigation communication is about preparation. It is the attorney’s job to make sure that the expert is prepared for deposition or trial, and the expert should expect to get paid for the time the preparation requires.
To communicate effectively with people who request work, especially if the analyst is new to the field, certain knowledge should be acquired, including basics of organizational structures and the perspectives of the management who will use and evaluate the information provided by the analyst. It is worth being aware of current legal trends, specifically issues regarding what the company wants employees to document and to refrain from documenting.
These issues are particularly important to the employee who seeks to prevent recurrences of failures. People who request the work may skim and file the report and may not take the action anticipated. Even errors that seem well documented to the analyst may not be convincing enough to change a mind that has decided no changes need to be made, the failure will not recur, and that in any case, it is too expensive to prevent any further failures. For example, just because the failure analyst performs an outstanding job on the latest investigation and shows that a simple change in procedure could reduce scrap by 50%, there is no guarantee that the simple change will be made. The International Organization for Standardization (ISO) 9000 registration procedures require companies to adhere to a policy of decision-making for the common good of humanity, so if a direct safety issue is involved, it may be easier to make changes than otherwise. However, these guidelines are not exactly specific. Scrap reduction, while clearly beneficial to both the bottom line and the environment (and thus humanity at large), may not appear to be mandated by these ISO procedures.
Many constraints make procedural changes difficult within an organization. One of these constraints may be the ISO 9000 procedures themselves. Before changes are
made, extensive testing, evaluation, and multidepartment, multilevel approvals may be required. Because this often takes quite a long time, other priorities may push the change to a very low priority level. Additionally, there are many companies that have budgeted amounts for the cost of scrap but cannot shift money to a capital expenditure if a new piece of monitoring equipment is needed.
Another important concept that affects implementation of findings is that people learn in different ways. Some people absorb information easily while reading, while some require a verbal presentation or flashy graphics. Some need to tour the production line and watch the operation. While a written report of some nature is usually required of the failure analyst, some sort of verbal presentation of the findings is usually in order as well. Even if there is no opportunity to do a formal presentation, speaking with the people who requested the work in person or on the phone can be helpful. Especially if one’s company prefers a formal written format, the reader may misinterpret your findings or conclusions. A short summary of the salient points in spoken form, especially useful before the written report is delivered, can be very helpful.
Knowledge Requirements for Failure Analysts
Some companies hire people with technician backgrounds to perform failure analysis work. While an engineering degree is not necessarily required to perform this type of work in a competent manner, a wide range of skills and knowledge is required. Someone without an engineering background will probably take some years to develop most of the necessary skills to an adequate proficiency level to work on a variety of components.
For noncorrosion failures, the basic skills required include understanding fundamental concepts in stress analysis and mechanical property theory. While the analyst does not have to know how to perform complex stress analysis in a quantitative manner, he or she should be able to determine where the high-stress areas are by looking at a part and having someone describe the function. This may be called a qualitative or heuristic stress-analysis skill, to recognize unexpected failure locations or unexpected features on a subject component. Since the 1950s, engineers have been gathering data from mysterious failures that “should never have happened,” because the operating stresses were much lower than the known strength of the materials used to make the components. The competent analyst of structural failures must be familiar with the basic concepts of fracture mechanics, fatigue crack propagation, and the causes of residual stresses.
Failure analysts who work with fractures must be familiar with macrofractography. Without skill in fractography, the wrong test location may be selected for microfractography and metallography. Evaluating a location unrelated to the crack initiation may be worse than not evaluating the material at all with these methods, because the key evidence may be destroyed during the destructive portion of the incorrect testing. Metallography and interpretation of microstructures are also key skills. The analyst must be able to look at a microstructure and determine whether the material is typical of its supposed composition and specified processing. This implies knowledge of interpreting phase diagrams and isothermal and continuous cooling curves. Basic understanding of crystallography and micro- and macroscale composition effects is also required.
It is difficult to independently perform failure analysis work without basic literacy. People skills and understanding of human nature are also important for the failure analyst. Failures can bring out the worst in people; thus, during the background-information collection process of a major failure, the analyst likely needs to ask questions that make people uncomfortable and defensive. It may be difficult for the analyst to determine whether correct information is being provided. The analyst may need advice on how to encourage the people involved to tell what they know. Technical skill can help the analyst weed out some incorrect information, whether intended to mislead or confuse or given in ignorance.
Finally, some consideration of ethics is required of the failure analyst. Ethical issues involve decisions that may be difficult to make. Most people would probably not envy the whistle blowers who decided to inform government authorities that their companies were breaking environmental regulations and who caused fellow workers to lose their jobs. Many ethical issues are not clear. Confidentiality promised to a client by an outside failure analysis service provider may conflict with the engineer’s duty to protect the public, if the attitude of the client is not in line with ethical principles. (In such a case, it may be helpful to remind the client of the consequences of a repeat failure.) Anyone doing failure analysis work may wish to study some codes of ethics specifically written for engineers. There are also some interesting works on ethical systems that are conveyed by other means, including literature and the support of trusted colleagues and mentors.
References
1. D. Levy, Tools of Critical Thinking: Metathoughts for Psychology. (Allyn and Bacon, Boston, 1997)
2. M. Kaku, Hyperspace. (Oxford University Press, Oxford, 1994)
3. W.R. Corcoran, “The Rootician’s Dictionary 2015.06.15,” https://app.box.com/s/3abshz3xhsm917v1q8ns
4. D.P. Dennies, Boeing Co., private communication
Selected References
R. von Oech, A Whack on the Side of the Head: How You Can Be More Creative, 2nd ed. (Warner Books, 1990)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.