RELIABILITY AND
MAINTAINABILITY
ENGINEERING
GUIDEBOOK
JUNE 2022
Early and upfront consideration and integration of R&ME by design is the only path to
success. It cannot be added in at the end. Throughout the Government, its field activities,
and contractors, there are reliability engineers. Seek them out and bring them into the
acquisition process early and often.
Familiarize yourself with this guidebook and properly appreciate and implement R&ME,
especially the Design Phase essentials.
This guidebook is structured to provide life cycle information on how to conduct an R&ME
program. A balance among capability, availability, reliability, and maintainability provides
systems to the warfighter at an optimized O&S cost, ensuring our Fleet’s readiness to
support its mission and promote national security. More information on these activities can
be found later in this guidebook. All Navy and Marine Corps acquisition and sustainment
programs should implement this guidebook.
Key References:
GAO report, “Defense Acquisitions: Senior Leaders Should Emphasize Key Practices to
Improve Weapon System Reliability” (GAO-20-151) [Ref 1].
Section 2443 of Title 10, United States Code (U.S.C.) [Ref 2].
GAO report, "Navy Shipbuilding: Increasing Focus on Sustainment Early in the Acquisition
Process Could Save Billions" (GAO-20-2) [Ref 3].
EXECUTIVE SUMMARY
This guidebook discusses a wide range of Reliability and Maintainability Engineering
(R&ME) roles, tasks, and opportunities in support of the Secretary of the Navy Instruction
(SECNAVINST) 5000.2 series. Initially, the R&ME role was to validate requirements (ensure
they are based in physics) and to translate user requirements (i.e., Sustainment Key
Performance Parameters (KPPs) and Key System Attributes (KSAs)) into well-defined
contractual requirements through analysis, block diagrams, modeling, and predictions. R&ME tasks and
opportunities will evolve, as part of the systems engineering and logistics team, to include
supporting analysis of alternatives, Failure Reporting, Analysis, and Corrective Action
System (FRACAS)-based measurement, assessment, and improvement of system attributes.
Reliability demonstration testing can be time consuming and resource intensive and needs
to be planned from the outset of the program.
R&ME includes identifying, analyzing, and affecting design to improve life cycle
performance. The range of effort includes requirements analysis and allocation, developing
appropriate contract language, Fault Tree Analyses (FTAs), Failure Modes and Effects
Criticality Analysis (FMECA), parts selection, stress analysis, de-rating, physics of failure
analysis, Test and Evaluation (T&E), and FRACAS to realistically achieve desired fielded
system R&M attributes. Recognize that piece-part predictions can provide for a sound
relative assessment across differing contractor designs; however, they cannot be expected
to accurately depict field performance. It is the engineering design features that will control
or enable achievement of reliable, sustainable, and affordable capabilities.
All of this can be summed up in a simple statement: Reliability and maintainability are
operational capabilities and, hence, design criteria.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
1 | BACKGROUND AND IMPORTANCE
The Future of R&M Engineering
Implement Digital Engineering into Reliability and Maintainability
Deliver Reliable Software
Reliability Estimates versus Observed Performance
Design Factors Versus Support Factors
Product Support Strategy Impact to Operational Availability
How Do Companies Do R&ME Well?
Leverage Reliability Engineers Early and Often
Establish Realistic Reliability Requirements Based on Proven Technology
Emphasize Reliability with Suppliers
Employ Reliability Engineering Activities to Improve a System’s Design Throughout Development
2 | GENERAL
Standard Metrics
3 | R&ME IN THE ACQUISITION PROCESS
Policy
10 USC 2443
DoDI 5000.88
DoDI 5000.91
SECNAVINST 5000.2G
DON Gate 7 Sustainment Reviews Policy Memo
Guidance
Naval SYSCOM R&ME Guidance
Acquisition Life Cycle
A. Materiel Solution Analysis
B. Technology Maturation and Risk Reduction
C. Engineering Manufacturing and Development
4 | REQUIREMENTS DEVELOPMENT AND MANAGEMENT
Sustainment KPP
Sustainment KPP Requirements
Mandatory Attribute (KSA or APA) Requirements
Translating and Allocating KPP and KSA/APA Requirements into Contract Specifications
Allocating the Ao Requirement into Contract Specifications
Reliability Attribute
Translating Reliability Attribute into Contract Specifications
Managing Data Sources
Reliability Allocations
Commercial Off-The-Shelf Hardware Selection
Table 1: Selected Laws and DOD Reliability-related efforts over time
Table 2: Typical Reliability, Maintainability, and Built-in-Test (BIT) Metrics
Table 3: MCA R&M Engineering Activities
Table 4: Scorecard Disciplines and Sub-Areas
Table 5: Compliance Value Scoring
Table 6: Maturity Index Scale
[Timeline excerpt (Table 1): 1963 – USS Thresher and her crew lost in a deep dive; the Submarine Safety Program (SUBSAFE) started.]
The Government Accountability Office (GAO) reported numerous DOD reliability problems,
most recently in its study entitled “Defense Acquisitions: Senior Leaders Should Emphasize
Key Practices to Improve Weapon System Reliability” (GAO-20-151) [Ref 1]. The DON
suffers from this lack of emphasis on reliability and maintainability engineering during the
operation and sustainment phases of the life cycle. A separate GAO report (GAO-20-2) [Ref
3] found that over the past 10 years Navy ships have required more effort to sustain than
planned, in part because the sustainment requirements do not provide key information on
how reliable and maintainable mission–critical systems should be. DOD guidance advises
acquisition programs to plan for and design reliability and maintainability into the weapon
system early in the acquisition effort.
Guidance on model and data exchange between R&M engineering and other
engineering models (data centricity)
From a program perspective, R&ME should be considered an effort to expose hidden risks.
Each R&ME analysis deepens the program’s understanding of technical risk and of the
technical issues that must be addressed to meet the
warfighter’s needs. The reliability and maintainability of a system are established during
the design process, either actively or passively.
A passive R&ME approach allows designs to take form that may or may not be
sufficient to meet R&ME requirements during testing. Passive approaches often
culminate in systems not meeting requirements during late developmental tests or
operational tests. Unfortunately, by this point in development, only extreme cost and
schedule mitigation actions can improve the R&ME factors in the design. The standard
and undesired (but expected) strategy at this point is to implement product support
mitigations such as increasing spare parts, developing special tools, increasing
manpower, developing special training, increasing maintenance periodicity, etc. Or
worse, the program may simply find relief by relaxing the R&ME requirement to the
level achieved during testing.
Actively addressing R&M is accomplished by evaluating the system using specialized
R&ME analyses. Each R&ME analysis forms a vignette of the overall picture of the
system’s ability to achieve its mission in an operational environment at a specific
operational tempo. These analyses build on one another and inform subsequent
R&ME and product support analyses. This iterative approach is the basis for
establishing a comprehensive R&M program tailored to the appropriate size and
complexity that will achieve the system requirements needed for mission readiness.
Design analyses implemented in accordance with approved R&ME program plans ensure
system designs are capable of acceptable R&M performance. The Government must actively
monitor the activities during in-process reviews and at established formal systems
engineering design reviews. Results of these activities are also used as a basis for review of
R&ME requirements in specifications and drawings and system support factors.
1 GAO-20-151, “Defense Acquisitions: Senior Leaders Should Emphasize Key Practices to Improve Weapon System
Reliability,” Report to the Committee on Armed Services, U.S. Senate, January 2020. [Ref 1]
Consequently, early action is key, as indicated in the JCIDS, with a directed focus on R&ME
to improve readiness of future Joint Forces.
Close coordination between engineering and product support during the design phase will
maximize system Reliability and Maintainability. The Systems Engineering Plan (SEP)
needs to ensure, through a disciplined approach, that R&ME metrics are achievable and
that the product support strategy is executable. The product support strategy depends for
its success on the design teams’ approach to meeting R&ME metrics. Therefore, the product
support activities must align with and support the design if the system is to achieve its full
reliability and maintainability potential during operation. Sustainment planning relies on
R&ME data and system design information to fully address the support planning elements.
However, benefits of coordinating the efforts of engineering and product support are not
limited to increased product support capability. The engineering effort also benefits from
close coordination by having a more complete understanding of how the system will be
supported. Having a better understanding of the support approach and its limitations
during the design process provides an opportunity to apply engineering solutions to
address supportability issues. The opportunity to address these supportability issues
during design is easily overlooked because engineering efforts are focused on achieving the
technical requirements derived from the Capability Development Document (CDD). It is
vital that design engineers understand the product support strategy and how their design
choices will impact the future maintenance burden of the system. The need for
maintenance is the prime driver of the logistics footprint and has a substantial impact
on Ao.
Ao represents the percentage of time the system is operationally mission ready. Ao is driven
by the reliability and maintainability of the system, combined with its product support
structure. Achieving the required levels of Ao is a matter of establishing the logistics
elements needed to address the R&M factors of the system. The logistics footprint is the
overall size, complexity, and cost of the logistics solution that is needed to achieve the
required Ao given the system’s reliability and maintainability. A smaller logistics footprint
is most favorable and is best achieved when engineers make design decisions that
reduce this footprint.
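To make this relationship concrete, the minimal sketch below uses the common steady-state approximation Ao ≈ MTBF / (MTBF + MTTR + ALDT); the function name and numeric values are illustrative assumptions, not values from this guidebook.

```python
# Minimal sketch (illustrative values): steady-state operational availability using
# the common approximation Ao = MTBF / (MTBF + MTTR + ALDT).

def operational_availability(mtbf_hours: float, mttr_hours: float, aldt_hours: float) -> float:
    """Return Ao given mean time between failures, mean time to repair, and
    administrative/logistics delay time, all expressed in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours + aldt_hours)

# The same design (MTBF = 500 h, MTTR = 4 h) under two product support postures:
print(operational_availability(500, 4, 48))  # lean sparing, long delays: Ao ~ 0.91
print(operational_availability(500, 4, 8))   # robust support: Ao ~ 0.98
```

The contrast between the two cases shows how the support structure, not just the design, determines the achieved Ao.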
Commercial companies understand that reliability engineering activities can add value to
decision-making by providing direction and feedback that helps development teams refine
designs that lead to more reliable and cost-effective systems. They believe reliability
engineers should be empowered to influence decisions, such as delaying overall project
schedules or negotiating for more resources when necessary. In addition, management
should provide sufficient resources and time dedicated specifically to improving reliability
by discovering failures, implementing corrective actions, and verifying their effectiveness
on outcomes. They understand that cost and schedule constraints can negatively influence
reliability testing, which can limit development teams’ ability to discover potential failures
and improve designs through corrective actions.
These companies rely on developing experienced reliability engineers. Some of the top
companies have a dedicated reliability engineering community that coaches members of
the company’s various product development teams. They focus on teaching development
team members to ask the right questions at the right point in time with the right people in
the room.
If the reliability requirements are not technically feasible, they can have broad implications
for the intended mission, life cycle costs, and other aspects of the system. These companies
understand the importance of making informed trade-offs when considering requirements
to reduce program risk or total ownership costs. Making trade-offs involving capability,
reliability, and cost requirements means having the right people involved in the trade-off
decisions and having them work with user representatives and reliability engineers to
define their systems’ reliability requirements.
Engaging the supplier early in the process, often during concept development, and asking
the supplier to demonstrate that it can meet the requirements is critical. This ensures that
the supplier can meet quality standards and there is enough lead time and testing of
components. Engineers at commercial companies work directly with their suppliers and hold
them responsible for meeting reliability requirements. This includes visiting their
suppliers’ testing facilities and evaluating their testing programs, focusing specifically on
their failure analysis and reliability activities. Leading commercial companies use
disciplined quality management practices to hold suppliers accountable for high quality
parts through activities such as regular supplier audits and performance evaluations.
2 | GENERAL
This document is intended to complement SECNAVINST 5000.2 [Ref 4] by providing
guidance on the use of technical measures to produce Naval systems with desired
Reliability and Maintainability characteristics that support the warfighter mission at a
cost that allows the DON to maintain a fighting force into the future. It implements Naval
policy detailing the need to address reliability as a performance parameter and, hence,
design criteria. It provides a synopsis and timeline to implement a successful and effective
Reliability and Maintainability program for Program Managers and R&ME practitioners.
DOD policy and guidance generally requires program managers to develop a Reliability,
Availability, Maintainability – Cost (RAM-C) analysis that optimizes reliability, availability,
and maintainability (RAM) within cost constraints. R&ME includes all activities that
prescribe the designing, testing, and manufacturing processes that impact the system’s
RAM. The GAO has reported that O&S cost is driven by the system’s RAM qualities and
makes up approximately 80% of a system’s LCC. More importantly, R&M must be an integral
part of the upfront design process. System stress and ease of maintenance are controlled
through the design. A system’s R&M factors significantly affect the performance and
sustainment of the deployed system. This is the basis for SECNAVINST 5000.2, emphasizing
and prioritizing rigorous and disciplined R&ME efforts early in the acquisition process.
Figure 4 shows three pillars of system effectiveness. Note that reliability and
maintainability directly affect two of them. Generally, a design’s R&M is measured by
reliability metrics such as Mean Time Between Failure (MTBF) or maintainability metrics
such as Mean Time To Repair (MTTR); however, to affect these measures, R&ME
must start early to ensure design rules and practices are adhered to throughout the
development process.
It is important that the Program Manager (PM) understands the importance and influences
of these design factors early in a program or as a part of any block upgrade, tech refresh, or
investment in supportability. The DON’s direction is to address reliability as a performance
parameter and, hence, a design criteria. To facilitate this, the PM is responsible for:
Decomposing the Sustainment KPP and the Reliability, Maintainability, and O&S Cost
KSAs or additional performance attributes (APAs) into affordable and testable
design requirements; and
Developing sustainment requirements and resources for the design that will enable
systems effectiveness (reliability, dependability, and capability).
This approach places the emphasis on design practices that correlate to fielded system
performance. Figure 5 highlights the importance of proper reliability by design criteria. Of
the 14 programs listed, none met even half of their predicted reliability, proving that good
design practices are the key. R&ME includes calculating, assessing, and improving the
design to avoid deficiencies. The range of required effort includes stress analysis, de-rating,
physics of failure analysis, T&E, and FRACAS to realistically achieve desired fielded system
R&M attributes.
2 Adapted from “Operational Availability Handbook: A Practical Guide for Military Systems, Sub-Systems and Equipment,”
Published by the Office of the Assistant Secretary of the Navy (Research, Development and Acquisition), NAVSO P-7001,
May 2018 [Ref 7].
The National Academy of Sciences observed that 75% of programs do worse during
Operational Testing (OT) than during Developmental Testing (DT).4 This lack of correlation
explains why reliability growth testing needs to be planned well into OT and fielding. Much
of this is due to the relatively benign DT environment versus actual OT conditions. While
issues and correlation vary by program, from a DON standpoint it is clearly more cost
effective to mandate reliability by design rules and, as necessary, growth testing
throughout the life cycle.
Ao is a critical measure of mission readiness of fielded systems, however, its use as a metric
is inappropriate early in the design process. This is because Ao is a combination of
reliability (design controllable), maintainability (design controllable), and product support
factors (not design controllable). Including an Ao requirement in the contract allows
logistics planners to adjust spares quantities, in an attempt to decrease logistics delay
times, to compensate for design deficiencies.
3 National Research Council 2015. Reliability Growth: Enhancing Defense System Reliability. Washington, DC: The National
Academies Press, page 112. https://doi.org/10.17226/18987 [Ref 8].
4 Reliability Growth: Enhancing Defense System Reliability [Ref 8].
For the design of ships, NAVSEA places Ao into the contract specification requirements to
define the ships readiness requirements to support each mission that the ship is designed
to perform. This allows the ship design agent and the Government to define mission critical
systems to support each mission area. The ship design agent must predict the Mean Time
Between Failure (MTBF) and Mean Time To Repair (MTTR) for mission critical equipment
and identify space and weight for spares necessary to ensure those systems can remain
available throughout each deployment. With limited space onboard each ship for spares,
the proper balance between reliability and maintainability is critical to ensure the ship’s
readiness. The level of reliability needed for each system on a ship depends on the
ability to repair or replace the item at sea and its criticality should the item fail during a
mission. Designing for maintainability and provisioning of onboard spares are applied to
those items that are mission critical and can be easily repaired or replaced at sea, while
mission-critical items that are non-repairable at sea must be sufficiently reliable to ensure
the ship’s readiness throughout each deployment. Optimization of reliability and maintainability
design-controllable features is balanced with provisioning of spares to ensure the ship’s
readiness. This optimization is documented in the RAM-C Rationale Report.
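As a simple, hypothetical illustration of the sparing side of that balance (not a NAVSEA procedure), the sketch below sizes onboard spares for a replaceable mission-critical item using a Poisson model of expected failures over a deployment; the function name and values are assumptions.

```python
# Hypothetical sketch: size onboard spares so the probability of having enough
# spares over a deployment meets a target, using a Poisson demand model.
from math import exp, factorial

def spares_needed(failure_rate_per_hour: float, operating_hours: float,
                  target_confidence: float) -> int:
    """Smallest spare count s with P(failures during deployment <= s) >= target."""
    expected_failures = failure_rate_per_hour * operating_hours
    cumulative, s = 0.0, 0
    while True:
        cumulative += exp(-expected_failures) * expected_failures ** s / factorial(s)
        if cumulative >= target_confidence:
            return s
        s += 1

# Item with MTBF = 2,000 h, operated 4,000 h per deployment, 95% confidence:
print(spares_needed(1 / 2000, 4000, 0.95))  # -> 5 spares carried onboard
```

Raising the item's MTBF, a design-controllable feature, directly reduces the spares, space, and weight the ship must carry.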
Figure 7 shows that life cycle cost is comprised of system acquisition cost and operation
and support costs. Of note, many fielded systems exist 30 or more years before disposal.
The system acquisition cost ranges between 20-40% of the life cycle cost, while operation
and support cost ranges between 60-80% of the life cycle cost.
In addition, Figure 8 shows the relationship of cost committed to cost expended across the
defense acquisition phases. It is important to point out that by MS B, about 70% of the life
cycle cost is committed while less than 10% of the life cycle cost is expended. By Full Rate
Production Decision, about 90% of the life cycle cost is committed, while about 20% of the
life cycle cost is expended. So, acquisition decisions (and the associated decision artifacts,
including requirements and contracts) commit a large portion of the life cycle cost long
before it is expended. Thus, it is important that the requirements and
contract deliverables are well thought out in support of these major program decisions.
5 Adapted from Dallosta, Patrick M and Simcik, Thomas A. “Designing for Supportability: Driving Reliability, Availability,
and Maintainability In...While Driving Costs Out.” Defense AT&L: Product Support Issue, March-April 2012, page 35. [Ref 9]
[Figure 8: Cost committed versus cost expended across the acquisition life cycle (Technology Maturation & Risk Reduction, Engineering & Manufacturing Development, Production & Deployment, Operations & Support), with OT&E, IOC, and the FRP Decision Review marked.]
Optimizing system reliability and maintainability, through the RAM-C Rationale Report,
will minimize the program’s O&S cost through the reduction in spares and sound
maintenance activities required to restore lost functions.
STANDARD METRICS
This section presents typical reliability, maintainability, and Built-in-Test (BIT) metrics, as
shown in Table 2 on the following pages. These are examples of metrics that are typically
used on programs. Not every metric will apply. The R&M engineer will need to determine
the appropriate metrics for the program based on the goals and intent of the respective
metrics. For the reliability metrics, it should be noted that “time” must be expressed in
mission-relevant units of measure (e.g., hours, rounds, cycles, miles, events, etc.). It does
not need to tie exclusively to “clock time.”
6 Adapted from Dallosta, Patrick M and Simcik, Thomas A., page 37. [Ref 9]
RELIABILITY
RM (Note 1) – Mission Reliability: The measure of the ability of an item to perform its
required function for the duration of a specified mission profile, defined as the probability
that the system will not fail to complete the mission, considering all possible redundant
modes of operation. (Per JCIDS 2021, Figure B-23 – Recommended Sustainment Metrics)

R_M = Operating Hours / Mission Failures
RL (Note 1) – Logistics Reliability: The measure of the ability of an item to operate without
placing a demand on the logistics support structure for repair or adjustment, including all
failures to the system and maintenance demand as a result of system operations. Logistics
Reliability is a fundamental component of O&S cost as well as Materiel Availability. (Per
JCIDS 2021, Figure B-23 – Recommended Sustainment Metrics) The JCIDS definition for
Logistics Reliability is a demand-based definition, not a failure-based definition. From an
engineering perspective, logistics reliability measures the ability of an item to operate
within its specified limits, for a particular measurement period under stated conditions.
The failure of a redundant component that does not affect mission completion is a logistics
failure, but not a mission failure.

R_L = Operating Hours / Logistics Demands
Demand versus failure: If an end item (aircraft, ship, etc.) has multiple
instances of the same component, the demand is calculated at the end-
item by taking the end item’s operating hours and dividing it by the
number of demands. The component’s reliability would be calculated
by multiplying the end item’s operating hours by the number of
instances of the component and then dividing it by the number of
failures. For example, if an aircraft has four engines, 100 flight hours,
and five failures:
Aircraft Mean Flight Hours Between Demand = 100 flight hours / five
failures = 20 hours
Engine Mean Flight Hours Between Failures = (100 flight Hours x 4
engines) / five failures = 80 hours
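The arithmetic in this example can be captured in a few lines; the helper names below are ours, and the numbers simply repeat the aircraft-engine case.

```python
# Sketch of the demand-versus-failure arithmetic above (hypothetical helper names).

def end_item_mean_hours_between_demand(end_item_hours: float, demands: int) -> float:
    """Demand is counted at the end item: end-item operating hours / demands."""
    return end_item_hours / demands

def component_mean_hours_between_failure(end_item_hours: float,
                                         instances: int, failures: int) -> float:
    """Component reliability uses total component operating hours."""
    return (end_item_hours * instances) / failures

# Four-engine aircraft, 100 flight hours, five engine failures:
print(end_item_mean_hours_between_demand(100, 5))       # 20.0 flight hours
print(component_mean_hours_between_failure(100, 4, 5))  # 80.0 engine hours
```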
MAINTAINABILITY
MTTR / MCMT (Notes 3, 4, and 6) – Mean Time To Repair (Hardware), Mean Time to
Recover or Restore (Software) / Mean Corrective Maintenance Time: Mean Time To Repair,
also referred to as Mean Corrective Maintenance Time, is a basic measure of maintainability.
MTTR / MCMT measures the average time required to bring a system from a failed state to
an operational state. It is strictly design dependent, as it does not include logistics or
administrative delay times. It is the sum of the corrective maintenance times (clock hours)
divided by the total number of corrective maintenance actions. The corrective maintenance
time includes fault isolation, access, removal, replacement, and checkout. This alone is not a
good measure of maintenance burden as it does not consider the frequency of corrective
maintenance, nor the man-hours expended.
Each “Mean Time Between” reliability parameter will have an associated MTTR / MCMT.
For example, MCMT_OMF is the mean time required to perform corrective maintenance for
operational mission failures associated with the MTBOMF reliability metric.

MTTR (HW/SW) = Σ Corrective Maintenance Times / Total Number of Maintenance Actions
MAXTTR##% (Note 2) – Maximum Time to Repair: The maximum repair time associated
with some percentage of all possible system repair actions. For example,
MAXTTR90% requires 90% of all maintenance actions are completed
within the required time. Creates a limitation on the overall time
required for performing on-equipment maintenance. Combining this
with MTTR further defines the maintenance burden. MAXTTR is useful
in special cases where the system has a tolerable Down Time. An
absolute maximum would be ideal but is impractical because some
failures will inevitably require exceptionally long repair times.
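A brief, hypothetical sketch of how a MAXTTR90% value relates to a set of observed corrective maintenance times (nearest-rank percentile; the data and helper name are invented for illustration):

```python
# Hypothetical sketch: compare observed repair times against a MAXTTR90% value.

def max_time_to_repair(repair_times_hours: list[float], fraction: float) -> float:
    """Repair time within which the given fraction of actions complete
    (nearest-rank percentile)."""
    ordered = sorted(repair_times_hours)
    rank = max(1, round(fraction * len(ordered)))
    return ordered[rank - 1]

observed = [0.5, 0.8, 1.0, 1.2, 1.5, 1.5, 2.0, 2.5, 3.0, 6.0]
print(sum(observed) / len(observed))       # MTTR = 2.0 h
print(max_time_to_repair(observed, 0.90))  # MAXTTR90% = 3.0 h
```

The 6.0-hour outlier falls outside the 90% limit, illustrating why an absolute maximum is impractical.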
ALDT (Note 4) – Administrative and Logistics Delay Time: The time spent waiting for
parts, administrative processing, maintenance personnel, or
transportation per specified period. During ALDT, active maintenance
is not being performed on the downed piece of equipment.
BUILT-IN-TEST/HEALTH MONITORING
PCFD (Note 3) – Probability of Correct Fault Detection: A maintainability measure for the
effectiveness of Built-in-Test (BIT). The measure of BIT’s capability to detect failures/faults
correctly.

PCFD = Number of Failures/Faults Correctly Detected by BIT / Total Number of Actual System Failures/Faults
BFAh (Note 3) – BIT False Alarms per hour: A maintainability measure for the
effectiveness of Built-in-Test (BIT). A BFA is a BIT indication of a failure that, upon
investigation, cannot be confirmed. BFAh is the number of incorrect BIT failure/fault
indications per hour of operating time. Calculated as:

BFAh = Number of Incorrect BIT Failure/Fault Indications / Total System Operating Time
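A minimal sketch of both BIT measures using the definitions above; the counts are invented for illustration.

```python
# Hypothetical sketch of the BIT effectiveness metrics defined above.

def probability_correct_fault_detection(correct_bit_detections: int,
                                        actual_system_faults: int) -> float:
    return correct_bit_detections / actual_system_faults

def bit_false_alarms_per_hour(incorrect_bit_indications: int,
                              operating_hours: float) -> float:
    return incorrect_bit_indications / operating_hours

print(probability_correct_fault_detection(47, 50))  # PCFD = 0.94
print(bit_false_alarms_per_hour(3, 1200))           # BFAh = 0.0025 false alarms/hour
```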
POLICY
The reliability and maintainability engineering policy for the Department of Defense has been
established in Title 10 US Code (USC) 2443, “Sustainment Factors in Weapon System
Design” (26 August 2021) [Ref 2]; DoDI 5000.88, “Engineering of Defense Systems” (18
November 2020) [Ref 16]; DoDI 5000.91, “Product Support Management for the Adaptive
Acquisition Framework” (4 November 2021) [Ref 17]; SECNAVINST 5000.2G, “Department
of the Navy Implementation of the Defense Acquisition System and the Adaptive
Acquisition Framework” (8 April 2022) [Ref 4], and the DON “Gate 7 Sustainment Reviews”
Policy Memo (27 September 2021) [Ref 18].
10 USC 2443
Title 10 USC 2443 “Sustainment Factors in Weapon System Design” states in part:
The Secretary of Defense shall ensure that the defense acquisition system gives ample
emphasis to sustainment factors.
The requirements process shall ensure that R&M attributes are included in the
Sustainment KPP.
Solicitation and Award of Contracts shall:
– Include clearly defined and measurable R&M requirements for engineering
activities in solicitations of a covered contract.
– Document the justification for exceptions if R&M requirements or activities are
not included in solicitations.
– Ensure that sustainment factors are emphasized in the process for source
selection and encourage use of objective R&M criteria in the evaluation.
Contract Performance shall:
– Ensure the use of best practices for responding to positive or negative
performance of a contractor in meeting sustainment requirements.
– Be authorized to include provisions for incentive fees and penalties.
– Base determinations on data collection and measurement methods in the
covered contract.
– Notify the congressional defense committees upon entering contracts that
include incentive fees or penalties.
DoDI 5000.88
DoDI 5000.88 “Engineering of Defense Systems” includes in part:
For all defense acquisition programs, the Lead Systems Engineer (LSE)*, working for
the PM, will integrate R&M engineering as an integral part of the overall engineering
process and the digital representation of the system being developed.
The LSE will plan and execute a comprehensive R&M program using an appropriate
strategy consisting of engineering activities, products, and digital artifacts.
* Note: The LSE equivalent in SYSCOMs includes SDM/SIM (NAVSEA) and APM-E (MARCOR,
NAVWAR and NAVAIR).
DoDI 5000.91
DoDI 5000.91 “Product Support Management for the Adaptive Acquisition Framework”
contains references to the JCIDS Sustainment KPPs, KSAs, and Additional Performance
Attributes. It also states the following regarding the RAM-C Rationale Report:
The product support manager (PSM) will work with systems engineers and users to
develop the RAM-C Rationale Report to ensure supportability, maintenance, and
training are incorporated into the design through early user assessments and to
incorporate user feedback into supportability planning. This collaboration will ensure
sustainment thresholds are valid and feasible. More detail on the RAM-C Rationale
Report may be found within relevant engineering instructions (e.g., DoDI 5000.88 [Ref
16]) and in the JCIDS Manual [Ref 10] (Annex D, Appendix G, Enclosure B, paragraph
2.5.1).
SECNAVINST 5000.2G
SECNAVINST 5000.2G provides additional Navy guidance regarding R&ME implementation
in the acquisition process and requires:
For all Adaptive Acquisition Framework (AAF) programs other than provision of Services,
the PM will implement a comprehensive R&ME program. The R&ME program will include
Government and contractor efforts that address reliability, maintainability, diagnostics,
Health Management (HM) specifications, and other engineering tasks and activities
necessary to resolve operational requirements, design requirements, and Government and
contractor R&ME activities. For acquisition category (ACAT) I and II programs, the PM shall
ensure that solicitations and resulting contracts include R&ME factors and requirements.
The Government R&ME program shall be documented in an R&ME Program Plan that shall
be approved by the Systems Command (SYSCOM) R&ME Tech Authority or subject matter
expert (SME).
c. Each SYSCOM CHENG or designee will designate an R&ME manager responsible for
SYSCOM R&ME policy, standards, guidance, oversight and implementation for their
designated platforms, environments, and Command structure.
e. Programs will maintain a list of R&M-associated risks and risk mitigations, including
deviations from the R&M Program Plan. Future impacts, such as cost, availability,
and mission effectiveness, should be primary factors considered in risk acceptance.
Internal control oversight of R&M risk acceptance will be conducted during Systems
Engineering Technical Reviews (SETRs), Technical Review Boards (TRBs),
independent logistics assessments (ILAs), independent technical review
assessments (ITRAs), and Gate Reviews as appropriate.
The DON intends to utilize the appropriate Systems Command’s cost estimating
organizations, working with the program offices, to conduct the required O&S ICE in
coordination with Director, Cost Analysis and Program Evaluation and in accordance with
DOD and DON cost policies and procedures. The ICE will include all costs for the remainder
of the program’s life cycle. Results of the ICE, including any critical cost growth, will be
reported in the SR. The DON will provide mitigation plans, or certification, for critical cost
growth annually to Congress.
5. Comparison of actual costs to funds budgeted and appropriated in the previous five
years. If funding shortfalls exist, provide implications on weapon system availability.
6. Comparison between assumed and achieved system reliabilities.
7. Performance to approved SPB requirements (if applicable).
8. Analysis of the most cost-effective source of repairs and maintenance.
9. Evaluation of costs of consumables and depot-level repairables.
10. Evaluation of costs of information technology, networks, computer hardware, and
software maintenance and upgrades.
11. Assessment of actual fuel efficiencies compared to projected fuel efficiencies, if
applicable.
12. Comparison of actual manpower requirements to previous estimates.
13. Analysis of whether accurate and complete data is reported in cost systems of the
military department concerned. If deficiencies exist, a plan to update the data and
ensure accurate and complete data will be submitted in the future.
14. Information regarding any decision to restructure the LCSP for a covered system, or
any other action that will lead to critical O&S cost growth, if applicable.
GUIDANCE
Related guidance documents with specific reference to R&ME include: DOD R&ME
Management Body of Knowledge (DOD RM BoK) [Ref 19]; USD (R&E) Systems Engineering
Guidebook (Feb 2022) [Ref 20]; Reliability, Availability, Maintainability, and Cost (RAM-C)
Rationale Report Outline Guidance (Feb 2017) [Ref 21]; Engineering of Defense Systems
Guidebook (Feb 2022) [Ref 22]; Systems Engineering Plan (SEP) Outline (v. 4.0, Sep 2021)
[Ref 23]; Life Cycle Sustainment Plan (LCSP) v2.0 (Jan 2017) [Ref 24]; and Test and
Evaluation Master Plan (TEMP) Guidebook v. 3.1 (Jan 2017) [Ref 25].
The DOD RM BoK presents the procedures that program managers, project engineers, and
R&M engineers should use for implementing and executing R&M programs. It provides
very detailed descriptions and guidance for each associated task and life cycle phase. The
USD (R&E) Systems Engineering Guidebook provides systems engineering guidance and
recommended best practices for defense acquisition programs. The RAM-C Rationale
Report, the SEP Outline, LCSP Annotated Outline, and the TEMP assist in the preparation of
the respective documents. These guidance documents are examples of planning
documents that span the life cycle of the program and therefore appear as activities during
different acquisition phases.
Results of R&M engineering activities are essential for programmatic decision and control
functions. The R&ME design methods and procedures are not new, but the challenge occurs
in the management of these methods and procedures to achieve reliable and maintainable
systems. Effective management control of the R&ME program, using the policies and
guidance set forth by DOD, DON, and the Naval SYSCOMs will ensure timely performance of
the necessary activities to achieve the requirements and the development of adequate data
to judge the acceptability of R&ME achievement at major milestones.
NAVSEA:
– T9070-BS-DPC-010_076-1 Reliability and Maintainability Engineering Manual,
21 Feb 2017 [Ref 26]
NAVAIR:
– Most of the NAVAIR R&ME guidance comes in the form of Standard Work
Packages (SWP) including:
• Validate and Translate R, M, and BIT Requirements for Joint Capabilities
Documents
• Develop and Implement a Reliability, Maintainability, and Integrated
Diagnostics Program
• Perform Reliability, Maintainability & BIT Design Analyses
• Perform R&M Pre-installation Design Verification Tests
• R&M/IHMS Test and Evaluation Management
• Reliability Control Board: Reliability and Maintainability Analysis Process
• Reliability Growth Planning, Tracking and Projection During
Developmental and Operational Testing
• Reliability, Availability, Maintainability, and Cost (RAM-C) Analysis and
Report Development Cross Domain SWP
• Systems Engineering Plan: Reliability and Maintainability Inputs
• SETR Event Process: R&M Preparation and Attendance
• SETR Event R&M Risk Assessment Process
Table 3 outlines MCA tailoring guidelines based on the program phase and type of
equipment being acquired. This table identifies the engineering activities identified in DoDI
5000.88 [Ref 16], DoDI 5000.91 [Ref 17], and SECNAVINST 5000.2G [Ref 4], as well as
specific tasks and activities that support the overall R&ME program. Checkmarks indicate
tailoring is required to address the equipment type and unique requirements of the system.
The table identifies when an update is recommended and should be tailored to the
program’s needs. The tasks and activities presented in Table 3 are in concert with the DOD
RM BoK [Ref 19], which provides very detailed descriptions and guidance for each
associated task and life cycle phase. For more details of the procedures, criteria, and data,
refer to DOD RM BoK [Ref 19].
Table 3 columns: R&ME Tasks and Activities; applicable policy (DoDI 5000.88, DoDI 5000.91, SECNAVINST 5000.2G); equipment type (New Design/Modified, Major Change, NDI/COTS/GOTS); and acquisition phase (MS A, TMRR, EMD, P&D, O&S). Each activity below is listed with its tailoring marks (●) and its phase entries.
Reliability and Maintainability Program Plan: ● ● Initial, Update, Update, Update, Update
Mission Profile Definition: Review and Summarize the OMS/MP: ● Initial, Update, Update, Update, Update
Perform R&M Requirements Validation: ● Initial, Update, Update, Update, Update
Subcontractor Requirements: Translate JCIDS R&M values into design and contract requirements: ● Initial, Update, Update, Update, Update
Review the Acquisition Strategy: Initial, Update, Update, Update
Provide or Update R&M Input to SEP: ● Initial, Update, Update, Update
Prepare or Update RAM-C Report: ● ● Initial, Update, Update, Update
Provide or Update R&M Input to Test and Evaluation Master Plan (TEMP): Initial, Update, Update, Update
Provide or Update the Performance Specification: Initial, Update, Update, Update
Provide or Update R&M Inputs into the Statement of Work (SOW): Initial (phase based) in every phase
Parts Derating Guideline and Stress Analysis: Prelim, Initial, Update
Evaluate GFE/COTS: Initial, Update, Update, Update
Prepare or Update allocations of R&M requirements: ● ● Prelim, Initial, Update, Update
Prepare or Update R&M Block Diagrams: ● Prelim, Initial, Update, Update, Update
Predict R&M to estimate feasibility: ● Initial, Update, Update, Update, Update
Prepare or Update failure definitions and scoring criteria (FD/SC): ● ● Initial, Update, Update, Update, Update
Perform or Update FMECA: ● ● Initial, Update, Update, Update
Reliability Critical Items: Initial, Update, Update, Update
FRACAS: ● ● Plan, Implement, Execute, Execute
Provide R&ME Design Support: Execute, Execute
Perform Design Trade-off Studies: Execute, Execute, Execute, Execute, Execute
Conduct Growth and Design Verification Tests: Plan, Execute, Execute, Execute
Perform Subsystem Tests: ● Plan, Execute, Execute
Perform System Tests: ● Plan, Execute, Execute
Production Planning: Initial, Update
Fleet R&M Data Analysis: Execute
Engineering Change Proposals: Execute, Execute
Life Cycle Sustainment Plan: ● ● Initial, Update, Update, Update
Integrated Logistics Assessment: ● ● Initial, Update, Update, Update
Prelim – Preliminary draft of the artifact; may not be needed for the phase. Initial – Artifact required in support of a specific decision point; potentially requires an update at a later date. Update – Maintenance of the document to account for design maturation, strategy changes, contractual updates, design modifications, and lessons learned. Plan – Plan the test or activity. Execute – Conduct the task or activity.
The Government R&ME Program Plan describes the Reliability (R), Maintainability (M), and
Health Management (HM) engineering effort for the full life cycle of the program. Planning
activities will typically commence with Materiel Solution Analysis (MSA) or TMRR and run
through O&S. This plan establishes a properly constructed and tailored R&ME management
approach to ensure that all elements of the R, M, and HM engineering efforts are uniformly
implemented, properly conducted, evaluated, documented, reported, and integrated. This
Government plan will serve as the master planning and control documentation for the R, M,
and HM program.
The prime contractor’s R&ME Program Plan describes how the program will be conducted,
and the requirements, controls, monitoring and flow down provisions levied on
subcontractors and vendors. It describes the R&ME, including HM, procedures, and tasks to
be performed and their interrelationship with other system related tasks. The principal use
is to provide a basis for review and evaluation of the contractor’s R&ME program and for
determining compliance to specified R&M requirements.
The R&M engineer should summarize the OMS/MP and environment for the program. An
accurate and thorough OMS/MP, based on the CONOPS or combat scenario deemed to be
the most representative, is critical to ensuring the equipment meets the user’s needs. Any
special conditions of use that would affect the sustainment of the system should be
identified.
This analysis process is described in the following section of the DOD RM BoK [Ref 19]:
“MSA Activity #4, R&M Requirement Analysis, System Engineering.” As part of the
validation, the R&M engineer does the following:
STEP 1.1: The R&M engineer reviews the desired capabilities established in the draft
CDD to refine (if necessary) the OMS/MP, operational sequence and maintenance
concept. The R&M engineer should ensure the system boundaries, FD/SC, and mission
time are defined and consistent with the program acquisition concepts of operation.
STEP 1.2: The R&M engineer performs preliminary R&M analysis, feasibility and
trade-off studies of the design concepts. This includes the development of a composite
model for early planning and determining feasibility of the reliability, maintainability,
and availability metrics.
STEP 1.3: Based on results of the R&M analysis, the R&M engineer:
– Recommends adjustment (if necessary) of the R&M thresholds.
– Summarizes whether the sustainment parameters are valid and feasible.
– Identifies any significant issues in OMS/MP, CONOPS, failure definitions or
maintenance approaches.
– Provides issues and recommendations to the requirements developers and other
stakeholders.
– Repeats the above steps as necessary until requirements are determined to be
feasible.
STEP 1.4: Once the operational requirements are considered valid, the R&M engineer
ensures the appropriate documents are updated.
As the design matures, the R&M engineer should continue to update the requirements
analysis and assess the risk associated with the R&ME performance. The DOD RM BoK
contains detailed procedures for requirements analysis in each life cycle phase.
D. CONTRACTOR REQUIREMENTS
Once JCIDS warfighter requirements have been validated and assessed for feasibility, the
R&M engineer should translate thresholds and objectives into contractual R&M design
requirements. The translation accounts for differences between operational environments
and acquisition developmental environments. These differences are not statistical
variations or confidence intervals but are, in part, attributed to the fact that operational
systems include more elements and potential failures in the operating environment than
systems under contract evaluated in a developmental environment. This task should be
completed regardless of the acquisition pathway.
Further info is located in Chapter 4, “Translating and Allocating KPP and KSA/APA
Requirements into Contract Specifications.”
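As a purely illustrative sketch of such a translation (the 0.8 field-degradation factor below is an assumed placeholder, not a DON-prescribed value; programs derive their own factors from comparable fielded systems, as discussed in Chapter 4):

```python
# Illustrative translation of an operational (JCIDS) reliability threshold into a
# contractual specification.  The 0.8 factor is an assumed placeholder; actual
# translation factors come from program-specific comparisons of field versus
# developmental performance.

def contract_mtbf_spec(operational_mtbf_hours: float,
                       field_translation_factor: float = 0.8) -> float:
    """Fielded performance is typically a fraction of what is demonstrated under
    contract, so the contractual value must exceed the operational threshold."""
    return operational_mtbf_hours / field_translation_factor

print(contract_mtbf_spec(400))  # operational threshold 400 h -> contract spec of 500 h
```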
A SEP outline is provided in the Systems Engineering Plan (SEP) Outline [Ref 23].
sustainment requirements and concepts for each alternative. The RAM-C Rationale Report
should provide a quantitative basis for reliability, availability, and maintainability
requirements, as well as improve cost estimates and program planning. The tasks in Table
3 (“Perform R&M Requirements Validation” and “Translate JCIDS R&M Values into Design
and Contract Requirements”) will support the RAM-C analysis. RAM-C Rationale Reports
are to be developed and attached to the SEP at Milestone A, RFP Release Decision Point,
Milestone B, and Milestone C. The RAM-C analysis and the RAM-C Rationale Report are
required for all urgent capability acquisition (UCA), MCA, or MTA programs. However, it is
beneficial to create a RAM-C-like report for all acquisition programs to document the
analyses behind the requirements for future reference.
The RAM-C Rationale Report Outline [Ref 21], as well as additional training and other
resources, may be found at DAU’s R&M Engineering Community of Practice [Ref 29].
The TEMP should specify how R, M, and HM will be tested and evaluated during the
associated acquisition and test phases. Beginning at MS B, the Reliability Growth Strategy
and associated Reliability Growth Curves should be included in the TEMP. The TEMP
should provide the picture of how all testing fits together and how testing verifies
not only the system’s effectiveness in meeting the performance objectives
for the capability, but also the required R, M, and HM. The TEMP should identify R, M,
and HM testing and data requirements. Test limitations should be discussed, including
impacts of limitations and potential mitigation.
interface and interchangeability characteristics. The requirements should not specify how
the product should be designed or manufactured.
The verification method for each requirement is stated in “Section 4” of the Specification.
isolation, equipment access (open doors and panels, etc.), equipment removal and
replacement, and system closeout (close doors and panels, etc.). In addition, list tasks
and activities that are not included, such as tool gathering and software loading.
Clear definitions and equations for BIT and testability requirements.
Qualitative design for maintainer requirements.
Description of verification methodologies.
Reliability Information Analysis Center (RIAC)’s “Maintainability Toolkit” [Ref 32] provides
further guidance and approaches for maintainability specification.
The SOW tasks should be defined and scheduled so they are deemed as proactive tasks and
analyses positively impacting the design vice reactive tasks and analyses that just
document the design.
limiting values, define electrical, mechanical, thermal, environmental and special sensitive
criteria beyond which either initial performance or operations are impaired. All critical
parameters must be addressed for each part or material subclass. Stress derating practice
ranks with mission profiles as one of the most critical design factors associated with high
reliability, low risk products.
L. EVALUATE GFE/COTS
The R&M engineer should review contractor’s analysis of Government-Furnished
Equipment (GFE)/ Commercial-off-the-Shelf (COTS) components’ R&M attributes. Using
GFE/COTS can enhance operational effectiveness and reduce costs as the development and
supply system for these items are already established.
To fully investigate GFE/COTS options and make informed decisions, the acquiring activity
should acquire design data, test results, and information on field performance and interface
compatibility for specific GFE/COTS items identified in the contract.
The allocation process is approximate and usually results from a trade-off between the
R&M of individual items. If the R&M of a specific item cannot be achieved at the current
state of technology, then system design must be modified, and allocation reassigned. This
procedure is repeated until one allocation is achieved that satisfies the system level
requirement and results in items that can be designed.
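One common first pass at such an allocation apportions the system failure-rate budget to subsystems in proportion to their estimated (historical) failure rates; the sketch below assumes a series system, and the subsystem names and numbers are hypothetical.

```python
# Illustrative first-pass reliability allocation (series system assumed): apportion
# the system failure-rate budget in proportion to estimated subsystem failure rates.

def allocate_mtbf(system_mtbf_req: float,
                  estimated_rates: dict[str, float]) -> dict[str, float]:
    """Return an MTBF allocation for each subsystem."""
    system_rate_budget = 1.0 / system_mtbf_req
    total_estimated = sum(estimated_rates.values())
    return {name: 1.0 / (system_rate_budget * rate / total_estimated)
            for name, rate in estimated_rates.items()}

# Hypothetical subsystems with historically estimated failure rates (per hour):
estimates = {"sensor": 1 / 800, "processor": 1 / 2000, "display": 1 / 5000}
for name, mtbf in allocate_mtbf(500, estimates).items():
    print(f"{name}: allocated MTBF >= {mtbf:,.0f} h")
```

If a subsystem cannot meet its allocation with current technology, the budget is re-apportioned (or the design changed) and the pass repeated, as described above.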
Caution must be exercised in allocating system requirements when GFE or COTS items are
part of the system. Often, the source data originally specified for such GFE or COTS items
are used in lieu of the actual field data experienced in the Fleet. Use of original source data
(i.e., specification or lab demonstrated values) can impact achievement of system
requirements, development time and cost. If actual GFE or COTS source data is significantly
worse than the original specification values, then allocation for Contractor items will be
inadequate to satisfy system requirements. On the other hand, if GFE or COTS source data
is significantly better than the specified value, then allocations for Contractor items will be
higher than required and could cause an increase in development time and cost necessary
to satisfy system requirements.
Regardless of the type of acquisition, R&M allocations must be constructed for all procured
systems.
It is imperative to implement life cycle R&M block diagrams which can be updated as more
accurate data becomes available. The R&M block diagrams are used to identify potential
areas of poor R&M and where improvements can be made. This method can be used in both
design and operational phases to identify poor reliability and provide targeted
improvements.
MIL-HDBK-338B [Ref 31], Section 6.4, “Reliability Modeling and Prediction,” provides
extensive coverage of R&M block diagrams and math models.
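As a small sketch of the series/parallel math that underlies such block diagrams (consistent with the modeling relationships covered in MIL-HDBK-338B), assuming exponential item reliabilities; the architecture and numbers are hypothetical.

```python
# Hypothetical reliability block diagram math for a 1,000-hour mission:
# exponential items, with a two-unit redundant (parallel) pair in a series string.
from math import exp

def item_reliability(mtbf_hours: float, mission_hours: float) -> float:
    return exp(-mission_hours / mtbf_hours)

def series(*reliabilities: float) -> float:
    product = 1.0
    for r in reliabilities:
        product *= r
    return product

def parallel(*reliabilities: float) -> float:
    unreliability = 1.0
    for r in reliabilities:
        unreliability *= (1.0 - r)
    return 1.0 - unreliability

mission = 1000.0
pump = item_reliability(4000, mission)         # one pump, MTBF 4,000 h
controller = item_reliability(10000, mission)  # controller, MTBF 10,000 h
system = series(parallel(pump, pump), controller)  # redundant pumps feeding one controller
print(round(system, 3))  # mission reliability ~ 0.86
```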
REMLs are used to describe the level of confidence in reliability predictions. REMLs
describe the level of knowledge we have of the accuracy and completeness of the
failure rate data on specific equipment in the environment in which it is intended to be used. A
prediction may be followed by the percentage REML in each category (I–IV) as
described below. REMLs are assigned during the design and development process to
understand how the new design compares to what is known today regarding the
equipment’s reliability and should be included in the prediction analysis.
The following categories are used to assign REMLs when designing new systems and may
be tailored to support the system under design:
reliability application and prediction risk due to the lack of relevant data. Further
reliability analysis and testing are recommended to mitigate the risk.
II – Existing technologies used in different applications: Equipment or technologies
that have limited to no relevant DOD/DON field data or data that does exist is from a
different industry with remotely related use or environments (such as the auto
industry). This may be equipment where only manufacturer’s data is available, but it
is not relevant to the Naval applications. These items could also be existing equipment
that does not meet their reliability requirements and are considered candidates for a
reliability improvement program. These items or systems represent a moderate-to-
high reliability application and prediction risk due to the lack of relevant data. Further
reliability analysis and testing are recommended to mitigate the risk.
III – Existing technologies used in similar applications: Existing equipment that has
been in use previously in similar applications (such as commercial marine
applications, but not on Naval systems / commercial aircraft but not Naval aircraft),
and there are abundant reliable sources of reliability and maintainability data
available to support R&ME estimates. These may be commercial items or items that
have been tested by the Government for which test results are available. These items
or systems represent a low-to-moderate reliability application and prediction risk
because they have been demonstrated in a similar application. Further reliability
analysis and testing may be necessary to mitigate the risk.
IV – Existing technologies used in identical applications: Existing equipment that are
already fielded in similar Naval applications and have relevant field data that
demonstrate a proven failure rate. The equipment may be standard DON issue items
or COTS items with a proven failure rate in the same application that it is intended to
be used. These items or systems represent a low reliability risk. Further reliability
analysis and testing may not be necessary to mitigate the risk.
Empirical (handbook-based) prediction methods:
Predict failure frequency caused by randomly occurring failures during any period of
a system’s useful life; and
Consider failures caused by manufacturing defects, component variabilities, and
customer use variations.
The underlying assumption with use of empirical methods is that all life-limiting failure
mechanisms far exceed the useful operating life of the system, leaving only latent
manufacturing defects, component variability, and misapplication to cause field failures.
Examples include MIL-HDBK-217, MIL-HDBK-217Plus, and Telcordia.
The R&ME practitioner needs to recognize that these statistically based “piece-part”
predictions (especially such as MIL-HDBK-217 [Ref 34]) can provide for good relative
assessment across differing contractor designs, but will not accurately depict field
performance.
Physics of failure prediction methods:
Predict when a single, specific failure mechanism will occur for an individual component
due to wearout; and
Analyze numerous potential failure mechanisms (e.g., electromigration, solder joint
cracking, die bond adhesion, etc.) to evaluate the possibility of device wear out within
useful life of the system.
The physics of failure process requires detailed knowledge of all device material
characteristics, geometries, and applications which may be unavailable to system
designers, or which may be proprietary.
Software reliability is defined as:
The probability that software will not cause the failure of a system for a specified time
under specified conditions.
The ability of a program to perform a required function under stated conditions for a
stated period of time.
There are different models and methods for software reliability predictions. The IEEE
1633-2016 [Ref 36] defines the software reliability engineering (SRE) processes, prediction
models, growth models, tools, and practices. The document identifies methods, equations,
and criteria for quantitatively assessing the reliability of a software or firmware subsystem
or product.
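As one illustration of the growth-model family such references describe, the sketch below evaluates the classical Musa basic execution time model; the parameter values are hypothetical, and the model actually applied should be chosen and fitted per the program's software reliability plan.

```python
import math

def failure_intensity(lambda0: float, nu0: float, t: float) -> float:
    """Musa basic execution time model: lambda(t) = lambda0 * exp(-lambda0 * t / nu0),
    where lambda0 is the initial failure intensity (failures/hour) and nu0 is the
    total expected number of failures over infinite testing."""
    return lambda0 * math.exp(-lambda0 * t / nu0)

def expected_failures(lambda0: float, nu0: float, t: float) -> float:
    """Expected cumulative failures observed by execution time t."""
    return nu0 * (1.0 - math.exp(-lambda0 * t / nu0))

# Hypothetical parameters: 0.05 failures/hour initially, 120 latent faults expected.
lambda0, nu0 = 0.05, 120.0
for tau in (0.0, 500.0, 1000.0, 2000.0):
    print(f"t={tau:>6.0f} h  intensity={failure_intensity(lambda0, nu0, tau):.4f}  "
          f"cumulative failures={expected_failures(lambda0, nu0, tau):.1f}")
```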
The FD/SC is considered a living document, in that failure definitions may be refined
as system design is matured. Changes to FD/SCs may result from an increased
understanding of how the system executes mission functions and should not be used
to change the requirement, severity, or timelines for meeting those functions.
Instability in failure definitions leads to drastically varying reliability and
maintainability measurements.
The FD/SC should be agreed to by all parties involved. Disagreements must be
elevated and resolved within the DON. The cognizant Operational Test Agency
(OTA) action officer (from the Operational T&E Force (COTF) or the Marine Corps
Operational T&E Agency (MCOTEA)) who chairs the Reliability and Maintainability Scoring Board
for the Operational Test and Evaluation, together with the program chief engineer, ship
design manager (SDM), or systems integration manager (SIM), should ensure that only one
FD/SC is used.
All time or cycle parameters used should be clearly defined. For example, time
parameters must clarify or differentiate between flight hours versus operating hours,
or operating hours versus power on or standby hours. Any terms specifically defined
in the contract that are inconsistent with the FD/SC should be noted.
Mission essential, mission critical, mission specific, system critical, safety critical and
self-protection/defense functions are all critical parameters to be addressed in the
FD/SC. The system operations necessary to maintain those functions should be
identified, so failures and severity can be tied back to mission function.
There are several different types of FMECAs, including design, process, and software.
Design FMECAs evaluate the system design to identify failure modes. Process FMECAs evaluate
the manufacturing process to identify potential issues. Software FMECAs evaluate failure
modes in the software design and the hardware-software interface.
Many different Government and industry standards and guidelines address the FMECA
process, elements, and typical ground rules and assumptions. MIL-STD-1629 [Ref 37],
although cancelled, is one of the most used guides. The FMECA is not a one-time analysis
but should be updated throughout the life of the system. It should be updated during test
and sustainment to incorporate failure modes that were not foreseen, to update failure
rates for each failure mode, and to ensure detection methodologies are accurate. These
updates should be coordinated with Reliability Centered Maintenance (RCM), System
Safety, and Logistics to ensure their planning efforts are updated as necessary.
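To illustrate the quantitative criticality roll-up traditionally associated with MIL-STD-1629 (failure mode criticality Cm = beta x alpha x lambda x t, summed to an item criticality Cr), the following minimal sketch uses hypothetical failure-mode data:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    alpha: float   # failure mode ratio: fraction of item failures occurring in this mode
    beta: float    # conditional probability the mode produces the end effect of concern
    lam: float     # item failure rate, failures per hour
    t: float       # operating time per mission, hours

    def criticality(self) -> float:
        """Failure mode criticality number Cm = beta * alpha * lambda * t."""
        return self.beta * self.alpha * self.lam * self.t

# Hypothetical pump assembly failure modes
modes = [
    FailureMode("seal leak (loss of output)", alpha=0.6, beta=1.0, lam=2e-5, t=24.0),
    FailureMode("bearing wear (degraded output)", alpha=0.3, beta=0.5, lam=2e-5, t=24.0),
    FailureMode("sensor fault (false indication)", alpha=0.1, beta=0.1, lam=2e-5, t=24.0),
]

for m in modes:
    print(f"{m.name}: Cm = {m.criticality():.2e}")
print(f"item criticality Cr = {sum(m.criticality() for m in modes):.2e}")
```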
For more information, refer to DI-SESS-81495B, “Failure Modes, Effects, and Criticality
Analysis” [Ref 38] and DI-SESS-82495, “Model-Based Engineering Failure Modes, Effects,
and Criticality Analysis Profile (SYSML Version)” [Ref 39].
For more information, refer to DI-SESS-80685A, “Reliability Critical Items List” [Ref 40].
FRACAS
A disciplined and aggressive closed loop FRACAS is an essential element in the early and
sustained achievement of the R&ME required in military systems. It is the key requirement
for a Reliability Growth Program. The essence of a closed loop FRACAS is that failures and
faults of both hardware and software are formally reported, analysis is performed to the
extent that the failure cause is understood, and positive corrective actions are identified,
implemented, and verified to prevent further recurrence of the failure. The basis of FRACAS
is further discussed and defined in MIL-HDBK-2155, “Failure Reporting, Analysis, and
Corrective Action Taken” [Ref 41].
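The closed-loop discipline can be sketched as a simple data structure; the record fields, status names, and closure rule below are illustrative assumptions, not a prescribed FRACAS schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    REPORTED = "reported"
    ANALYZED = "root cause understood"
    CORRECTED = "corrective action implemented"
    VERIFIED = "corrective action verified"
    CLOSED = "closed"

@dataclass
class FailureReport:
    report_id: str
    item: str
    description: str
    status: Status = Status.REPORTED
    history: list = field(default_factory=list)

    def advance(self, new_status: Status, note: str = "") -> None:
        """Move the report through the closed loop; a report cannot be closed
        until the corrective action has been verified."""
        if new_status is Status.CLOSED and self.status is not Status.VERIFIED:
            raise ValueError("cannot close a report before verification")
        self.history.append((self.status, new_status, note))
        self.status = new_status

fr = FailureReport("FR-0001", "power supply unit", "no output after cold start")
fr.advance(Status.ANALYZED, "cracked solder joint on output filter")
fr.advance(Status.CORRECTED, "reflow profile changed; rework instruction issued")
fr.advance(Status.VERIFIED, "20 temperature-cycled units passed retest")
fr.advance(Status.CLOSED)
print(fr.status.value, len(fr.history))
```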
Additionally, DoDI 5000.88 [Ref 16] requires that each program implement a FRACAS,
maintained through design, development, test, production, and sustainment.
For more information, refer to DI-SESS-81927, “Failure Analysis and Corrective Action
Report (FACAR) (Navy)” [Ref 42].
beginning of the system development and demonstration phases. The R&M engineer
should make sure that all trade studies assess each design concept for its producibility. The
contractor should have a corporate design policy and process to ensure that design trade-off
studies continue throughout the system development and demonstration phases. The contractor
should also have procedures that establish a specific schedule, identify responsible
individuals, and define appropriate levels for reporting trade study results; all trade
studies should identify the relative risks of all options associated with the use of new
technology.
for as a component of technology studies and other technology demonstrations during the
TMRR phase.
Design verification and/or risk reduction tests should be performed whenever there is
reasonable doubt as to the adequacy or validity of analytical results related to a critical
(high-risk) area of design.
All failures during contractor subsystem tests, and later during production and
deployment, should be recorded in the FRACAS. The contractor should flow FRACAS
requirements to subcontractors and vendors to ensure failures are recorded, analyzed, and
corrected. A regular failure review board should be held jointly with the contractor to
review contractor failure analysis reports and evaluate the depth to which failure diagnosis
has been probed for cause-and-effect relationships, and failure modes and mechanisms.
It is important to note that many times, these R&ME specific tests are cancelled due to test
asset shortages, schedule constraints, or financial issues. It is imperative that these tests
be conducted. All these subsystem level tests allow for early identification of design issues,
which are much less expensive to repair during EMD than in production and sustainment.
Additionally, if these tests are cancelled, the equipment’s R&ME design will be matured in
the Fleet, placing an additional burden on maintainers, increasing costs, and decreasing
system availability. These risks should be captured by the contractor risk process and
rolled up into the program risk assessment.
products are compliant with contractual and technical requirements, prepare for OT&E,
and inform decision-makers throughout the program life cycle. DT&E results verify exit
criteria to ensure adequate progress before investment commitments or initiation of
phases of the program, and as the basis for contract incentives. During DT&E, the R&ME
team reports on the program’s progress to plan for reliability growth and assess R&M
performance to the JCIDS and contractual requirements for use during milestone decisions.
It is imperative that the R&ME team collects all appropriate data to conduct analyses.
During system testing, all maintenance tasks should be monitored to ensure technical
publication adequacy and maintenance documentation accuracy. All data related to each
maintenance action should be recorded for analysis against JCIDS and contractual
requirements. This data will be recorded in the FRACAS/maintenance data collection
system and reviewed and scored as part of the R&M Review Board (RMRB) or Joint
Reliability and Maintainability Evaluation Team (JRMET). The FD/SC will be used to score
the data and calculate metric values against appropriate specification requirements and
CDD thresholds. The R&ME team should coordinate with OTAs to ensure that data
collection, R&M monitoring, and FD/SC processes are compatible with processes of both
OTAs and program offices to evaluate contractual and operational R&M performance and
suitability characteristics.
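A minimal sketch of this scoring-and-metrics step is shown below; the record layout, scoring categories, and hours are hypothetical, and actual scoring is governed by the FD/SC through the RMRB or JRMET:

```python
# Hypothetical maintenance records after FD/SC scoring: (operating hours at event, score)
records = [
    (112.0, "operational mission failure"),
    (233.5, "non-chargeable (operator induced)"),
    (310.0, "logistics failure"),
    (455.0, "operational mission failure"),
]
total_operating_hours = 600.0

omf_count = sum(1 for _, score in records if score == "operational mission failure")
chargeable_count = sum(1 for _, score in records
                       if score != "non-chargeable (operator induced)")

mtbomf = total_operating_hours / omf_count if omf_count else float("inf")
mtbf = total_operating_hours / chargeable_count if chargeable_count else float("inf")
print(f"MTBOMF = {mtbomf:.0f} hours, MTBF = {mtbf:.0f} hours")
```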
System tests to demonstrate R&M and BIT include the maintainability demonstration, the
system BIT demonstration, and the system R&M assessment:
discovered during laboratory testing and development work and to establish effective
corrective actions to eliminate these problems. During all system tests, maintenance
tasks should be conducted by maintenance personnel of the same type, number, and
skill level as those who will perform maintenance on the system during the operational
phase in the field.
The Initial Operational Test and Evaluation (IOT&E) is conducted on production, or
production representative articles, to determine whether systems are operationally
effective and suitable for intended use by representative users to support the decision to
proceed beyond Low Rate Initial Production (LRIP). OT&E is a fielded test, under realistic
combat conditions, for an MDAP of any item or component of a weapons system,
equipment, or munitions for the purposes of determining its operational effectiveness and
operational suitability for combat. OT&E is conducted by independent operational testers.
Operational testing of an MDAP may not be conducted until the Director of Operational
Test and Evaluation approves the adequacy of test plans for OT&E to be conducted in
connection with that program. Additionally, the director analyzes results of the OT&E
conducted for each MDAP. At the conclusion of such testing, the Director should prepare a
report for the Secretary of Defense stating completeness or incompleteness of the test.
OT&E activities continue after the FRP decision in the form of FOT&E. FOT&E verifies the
operational effectiveness and suitability of the production system, determines whether
deficiencies identified during IOT&E have been corrected, and evaluates areas not tested
during IOT&E due to system limitations. Additional FOT&E may be conducted over the life
of the system to refine doctrine, tactics, techniques, and training programs and to evaluate
future increments, modifications, and upgrades.
U. PRODUCTION PLANNING
The R&M engineer / analyst needs to ensure the systems continue to meet operational
thresholds but also ensure there is no unacceptable degradation of design characteristics
that would present a risk to meeting operational thresholds due to Fleet environment or
manufacturing changes.
MIL-STD-785B (cancelled), Task 304, “Production Reliability Acceptance Test (PRAT) Program”
[Ref 45] provides more information regarding PRAT.
In order to accomplish this task, the R&M engineer should ensure the proper processes and
procedures are in place to obtain the data necessary to assess the system R&M
performance, identify poor performing systems, sub-systems, or components, and conduct
root cause analyses. These tasks require access to organic usage, failure, maintenance, and
health management data. Additionally, supplier and original equipment manufacturer
(OEM) maintenance and repair data is needed. When issues are identified, the R&M
engineer along with the systems and design engineers will coordinate on determining the
root cause and the corrective actions needed to eliminate or minimize the failure mode
occurrence. The R&M engineer will then contribute to the Business Case Analysis (BCA) by
determining the R&M improvement benefits to the product reliability and maintainability
performance. Once the corrective action is identified, the R&M engineer will continue to
monitor the system performance to ensure the corrective action was effective. A funded
FRACAS is required for a fully effective sustainment R&M program. At a minimum, OEM
and organic I-level and depot-level repair data are needed.
All data and analyses are coordinated with logistics and engineering. The identification of new
failure modes or BIT design deficiencies may result in maintenance planning changes. In a
future iteration of this guidebook, R&ME interactions with Condition Based Management
Plus (CBM+) efforts will be included.
R&M engineers assist the PSM to ensure that the LCSP evolves in tandem with the SEP, to
ensure that JCIDS sustainment capabilities are designed into the system and integral to systems
performance. Specifically, the R&M engineer contributes to the Design Interface and Sustaining
Engineering portions of the LCSP.
Figure 11 is an overview of the DON’s Two-Pass Seven-Gate Review process. The goal of
the Two-Pass Seven-Gate Governance procedures is to ensure alignment between Service-
generated capability requirements and systems acquisition, while improving senior
leadership decision-making through better understanding of risks and costs throughout a
program’s entire development cycle. The following paragraphs discuss the R&ME
objectives throughout the phases of acquisition life cycle.
MILESTONE A REVIEW
The Milestone A (MS A) review should look for inconsistencies that may be visible with the
proposed solution in an integrated, system oriented, program wide view. The following
documents should be evaluated for adequacy of R&ME requirements and provisions:
Develop and verify adequacy of the allocated design for the system with respect to
operational effectiveness and suitability, logistics supportability, and life cycle costs.
Develop the allocated baseline (if the program completes a successful Preliminary
Design Review (PDR) in this phase) and contract for the EMD phase, by which the
preliminary design can be transformed into engineering hardware and software for
test and evaluation. If the contract overlaps the EMD and subsequent phases, the data
and contract should also satisfy those subsequent phases.
The following data should be available for the in-process review of R&ME analyses results
during the TMRR phase:
By SRR:
– Preliminary environmental studies.
– R&M block diagrams, allocations, and predictions for major system and
subsystems.
– A reliability growth-planning curve is developed and included in the SEP.
By SFR:
– R&ME Specification – Approved specification R&ME requirements reflecting
functional baseline.
– The OMS/MP definition (provided by the Government) is used by the contractor
to provide the following:
• Mission objectives, including what, when, and where a function is to be
accomplished.
• Constraints that affect the way objectives are to be accomplished (e.g.,
launch platform, design ground-rules for various flight conditions).
• Time scale of system-level functions to accomplish the mission objectives.
– BIT functional requirements allocated for operations and maintenance to the
functional baselines and are supported by maintainer use-case analysis.
– System architecture contains required BIT functionality.
By PDR:
– Design derating guide and criteria.
– Final environmental studies.
– R&M block diagrams, allocations, and predictions to subsystem and unit levels.
– Current, approved version of allocated baseline R&M requirements.
– Preliminary functional FMECA with supporting software FMEAs to the
subsystem and unit that addresses 100 percent of functions and preliminary
Critical Items list.
MILESTONE B REVIEW
The MS B review at the conclusion of the TMRR phase requires an R&ME assessment to
provide the data necessary for an evaluation of R&M conformance to requirements in
system specification. The PDR, the final systems engineering design review before entering
EMD, signifies completion of all assigned activities in the TMRR phase. It verifies the
acceptability of activity results as a basis for a decision to proceed into EMD.
The contractor’s prediction analyses, test results, problem evaluations, and root failure
cause/categorization (by which the detail design has been guided) are verified analytically.
The Government review team evaluates the program’s progress and effectiveness in
correcting deficiencies noted in the earlier assessments, and evaluates the status of any
remaining R&ME problems. The team evaluates the seriousness of problems to determine
whether correction should be required before release of the design for development and
manufacture. R&M requirements and provisions defined by the contractor in the proposed
follow-on contract data package are critically reviewed to determine compliance with
contract requirements (e.g., R&ME plans, specifications, reliability growth plans, test and
evaluation plans, demonstration acceptance criteria and procedures, data requirements,
and contract work statement).
Describe in the SEP the R&ME program for monitoring and evaluating contractor,
subcontractor, and supplier conformance to contractual R&M requirements.
Conduct design reviews, R&ME assessments, and problem evaluations at scheduled
milestones. Assign and follow up on action items to correct noted deficiencies and
discrepancies.
Conduct a CDR to ensure that the product baseline design and required testing can
meet R&M requirements, the final FMECA identifies any failure modes that could
result in personnel injury and/or mission loss, and detailed prediction to assess
system potential to meet design requirements is complete.
Perform specified development, qualification, demonstration, and acceptance tests to
show conformance to contractual R&M requirements and assess the readiness to
enter system-level reliability growth testing at or above the initial reliability
established in the reliability growth curve in the TEMP. Verify the adequacy of
corrective action taken to correct design deficiencies.
Ensure the Software Development Plan (SDP) and TEMP include software test
methods to identify and correct software failures and that there is a high degree of
confidence the system can be recovered from any software failures that may occur
after fielding.
Implement a FRACAS to ensure feedback of failure data during test to design for
corrective actions. Provide a data collection system for data storage and retrieval
suitable for R&M tracking analysis and assessment.
Coordinate with OTAs to ensure that data collection, R&M monitoring, and failure
definition and scoring processes are compatible with the processes of both the OTA
and the program office to evaluate contractual and operational R&M performance and
suitability characteristics.
Ensure the configuration control program includes the total life cycle impact
(including R&M) of proposed changes, deviations, and waivers. Ensure the systematic
evaluation, coordination, timely approval or disapproval, and implementation of
approved changes.
Apply and evaluate allocation and prediction analyses using latest test data to identify
potential R&M problem areas.
Prepare initial production release documentation to ensure adequate R&M
engineering activities in production test plans, detailed drawings, procurement
specifications, and contract SOW. Ensure that documentation provides adequate
consideration of R&ME in re-procurements, spares, and repair parts.
When the program has accomplished the objectives of the EMD phase and the system has
demonstrated adequate progress toward achieving the contractual requirements, the MDA
convenes a milestone review or its equivalent to consider approval for commitment of
resources for initial production and deployment. Although system-level R&M requirements
may have been achieved, subsystems and components that fail their individual R&M
requirements can still affect logistics, support equipment, and manpower.
MILESTONE C REVIEW
Milestone C is the point at which a program is reviewed for entrance into the Production
and Deployment Phase.
The final review of R&M achievements in the EMD phase (performed just prior to the
scheduled milestone) is intended to verify fulfillment of specified requirements and to
ensure that the production release data package is adequate for proceeding to production.
R&M Recommendation
On the basis of the review, make recommendations (with justification) for disposition of
the program by one of the following alternatives:
Typically, a system begins its introductory period of service use under the surveillance and
with the augmented support of the production contractor. During this period, the
production contractor is required, by reference to appropriate contract tasks, to identify
and investigate inherent design and manufacturing process-related problems and to
Sustainment Reviews
Sustainment Reviews (SRs) are required for all active and in-service covered weapon systems.
SRs begin at five years after initial operational capability and repeat every five years
thereafter. SRs end five years before a covered system’s planned end of service date. The
SRs will focus on statutory sustainment elements and track O&S cost growth. In support of
the SR, the R&ME team will provide assessments of the systems fielded performance to the
Sustainment KPP and KSAs.
4 | REQUIREMENTS DEVELOPMENT
AND MANAGEMENT
SUSTAINMENT KPP
The Sustainment KPP and associated KSAs are translated into systems design and
supportability requirements. They are used to influence system design, improve mission
capability and availability, and decrease the logistics burden over a system’s life cycle.
Metrics ensure operational readiness, performance of assigned functions, and optimized
operation and maintenance.
The sustainment KPP metric is used to determine if the system can be operated and
maintained within the O&S cost goals. The sustainment KPP includes key supportability
metrics used to develop the program’s logistics footprint such that the system is
sustainable during its operating life. If sustainment requirements are not adopted, especially
during the design phase, the logistics footprint will be insufficient to support the system,
and operational availability will not meet the warfighter’s needs. Every program
must consider sustainment during acquisition planning and develop requirements in
accordance with Annex D to Appendix G to Enclosure B of the JCIDS Manual, Sustainment
KPP Guide [Ref 10].
SECNAVINST 5000.2G [Ref 4] requires a Sustainment KPP for all CDDs (with inherent
flexibility to allow a resource sponsor (user) to justify not including one). The JCIDS Manual
instructs that the Sustainment KPP (in addition to System Survivability, Force Protection,
and Energy) must be addressed. The resource sponsor can address this by stating that the
user requirement is not applicable; however, all systems have some attributes that are
relevant to the Sustainment KPP.
Operational Availability (Ao) KPP - Measure of the percentage of time that a system
or group of systems within a unit are operationally capable of performing an assigned
mission and can be expressed as:
Operational Availability (Ao) = Uptime / (Uptime + Downtime)
Reliability Attribute - Measure of the probability that the system will perform
without failure over a specific interval, under specified conditions. Reliability shall be
sufficient to support the warfighting capability requirements, within expected
operating environments. Examples include a probability of completing a mission, a
Mean Time Between Operational Mission Failures (MTBOMF), or a Mean Time
Between Failures (MTBF). For AIS, a reliability attribute should not use traditional
reliability metrics (e.g., MTBF, MTBCF). Subordinate attributes are:
– Mission Reliability – Measure of the ability of an item to perform its required
function for the duration of a specified mission profile, defined as the probability
that the system will not fail to complete the mission, considering all possible
redundant modes of operation.
– Logistics Reliability – Measure of the ability of an item to operate without
placing a demand on the logistics support structure for repair or adjustment,
including all failures to the system and maintenance demand as a result of
system operations.
Maintainability Attribute – Measure of the ability of the system to be brought back
to a readiness status and state of normal function. Subordinate attributes are:
– Corrective Maintenance – Ability of the system to be brought back to a state of
normal function or utility, at any level of repair, when using prescribed
procedures and resources.
– Maintenance Burden – Measure of the maintainability parameter related to
item demand for maintenance manpower.
– Built-in-Test (BIT) – An integral capability of the mission system or equipment
which provides an automated test capability to detect, diagnose, or isolate
failures.
Operations and Support Cost Attribute - Provides balance to sustainment solution
by ensuring that total O&S cost across the projected life cycle associated with
availability and reliability are considered in making decisions. Note: Logistics
Reliability is a fundamental component of O&S cost as well as Materiel Availability.
Note: For complex systems and System of Systems (SoS), the Sustainment KPP and supporting
Reliability attribute are to be applied to each major end item or configuration item, and
whenever practical, to the system/SoS as a whole.
The Government R&M engineer should assist the Resource Officer in establishing basic
sustainment KPP and KSA/APA requirements for the AoA and ICD and numeric user
sustainment KPP and KSA/APA requirements in the CDD. Per the JCIDS, the Resource
Officer provides OMS/MP and architectural views to better define operational capability. If
the Resource Officer decides not to include the Sustainment KPPs, the JCIDS should, at a
minimum, provide sufficient readiness and mission capability information to enable
acquisition R&M engineers to derive values for R&M metrics. These derived R&M metrics
values will be documented in a Government performance specification and interface
control documents and included in contract specifications.
The user (resource officer and operational tester), Systems Engineering, R&ME, and
logistics managers must develop failure definitions and scoring criteria (FD/SC). The
FD/SC provides a clear, unambiguous definition of what constitutes a failure (FD) and how
each failure counts against the R&M metrics (SC). The FD/SC provides a means for
problems to be identified as failures when they occur and identified as critical/non-
critical/operator induced, or other necessary categories such that they can be scored
properly against requirements.
FD/SC is placed into the TEMP to ensure that failures are properly identified during testing
to score and report sustainment metrics. The definition of all categories of failures is
important to reduce ambiguity in determining the performance of systems during all
phases of testing. Finally, the FD/SC must be placed into the reliability and maintainability
review board charter to ensure that the sustainment KPP is recorded and reported
properly during systems engineering technical reviews, and that the most critical corrective
actions are prioritized for the PM.
Only one set of operational FD/SC should be developed and maintained in accordance with
the SECNAVINST 5000.2G [Ref 4]. FD/SCs should be consistent for all systems installed on
a platform or integrated together as a SoS. The operational FD/SC may be supplemented
for evaluating contract compliance with performance and interface specifications.
JCIDS requirements are to be developed by the Government during the AoA and validated
against the warfighter’s mission capability needs. The warfighter’s Sustainment KPP
requirements should be validated by a Government R&M engineer, logistics support
manager and cost engineer for each program by performing and developing a Reliability,
Availability and Maintainability – Cost (RAM-C) rationale study and report in accordance
with the most recent DOD RAM-C Rationale Report guidance [Ref 21]. The R&M engineers
should work with the PSM and cost engineers to balance the optimum sustainment cost
with feasible and affordable reliability and maintainability requirements.
R&ME activities and technical requirements should be a part of all contracts, including
performance-based contracts for design, development, and production of defense materiel.
Materiel Availability (Am): The Availability KPPs are unique for each program and
describe the total end items that are required to support the warfighter’s needs. Materiel
Availability must be translated into a total quantity of end items needed including any
spares that will be needed given that some items will not be available for operational
tasking due to training and research needs, as well as items that will be out of service for
repair. Translating the Materiel Availability KPP into a total quantity requires the
If an operating profile is not contained within the OMS/MP, then reliability engineers must
extract and document this information from other sources such as the CONOPS and the
LCSP. A system level OMS/MP is prepared by the Government and included in the system
performance specification to allow the developer to understand expected usage rate of all
of the functions to design for R&M. The acquisition R&M engineers must ensure a
composite OMS/MP covering anticipated mission and environmental profiles is prepared
to enable the derivation and evaluation of the design specifications. Failure to clearly define
an OMS/MP will result in assumptions about the warfighter’s usage requirements and may
result in a system that is down for maintenance (even during an operational mission) more
than originally anticipated and that fails to meet the Operational Availability component of
the Sustainment KPP.
Once the OMS/MP is complete, and the system’s operating profile is defined in support of
all mission areas, engineers must then document all functions which are mission
critical/essential and which functions are not. Failure of any function can result in the
system becoming non-mission capable, becoming partially mission capable, or remaining fully mission
capable. From these definitions, mission critical/essential functions can be defined, and
placed into contract specifications, to allow developers to identify mission critical/essential
items and deliver a critical items list. The critical items list will be used to ensure logistics
support is properly planned for those components in terms of organizational, intermediate
or depot level spares, and to properly plan organizational maintenance tasking.
The Government must then translate the user Ao requirement from Uptime and Downtime
to something measurable for design and development and prior to fleet operations. In
general, the interval of interest is calendar time, but this can be broken down into other
intervals of active time and inactive time. Active time contains Uptime and Downtime,
while inactive time can normally be considered neutral time or when the item is in storage
or the supply pipeline. Uptime and Downtime in the Ao equation are intended to describe
system operating and non-operating periods once deployed. Uptime is that element of
active time during which an item is in condition to perform its required functions. Uptime
may include time that the equipment is operating, in standby, or off; downtime
generally does not include time for preventive maintenance. Downtime is that element of
active time during which an item is in an operational inventory but is not in condition to
perform its required function.
Figures 13 and 14 show examples of Uptime and Downtime for a ship to provide guidance
of how they must be tailored for continuously operated systems or intermittently operated
systems. It is important to understand how Ao will be measured so that a translation to a
procurement specification can be made.
Figure 14: Operational Availability for Intermittently Operated System / One-shot System
Figure 15 shows examples of various operating and non-operating conditions that may be
included in Uptime and Downtime definitions. Some programs may use neutral time to
define periods of time that will not be included in either Uptime or Downtime and thus
exclude these periods of time from the Ao definition. Inactive time may be considered
neutral time when an item is in reserve and not in an operating state. Neutral time is used
to eliminate specific periods of time over calendar time that will be excluded from the Ao
equation and will not be counted as either Uptime or Downtime. Neutral time may be a
weekend, or the time periods when repairs are halted due to holidays. Neutral time can
account for time between operating periods when a system is intermittently operated or
used only occasionally and thus availability does not apply over the entire calendar year.
Neutral time can also be used during test events when testing is halted or stopped. Using
neutral time makes a test event look more like an operational event because those periods
of time when testing is halted are excluded from calendar time in the Ao equation. More on
how Uptime and Downtime are used and affect the Ao equation can be found in MIL-HDBK-
338B [Ref 31].
Figure 15: Relationship of time elements. All Time is divided into Inactive Time (InT) and
Active Time (AcT). Active Time comprises Maintenance Down Time (MT(Dn)), Administrative and
Logistics Down Time (ALDT), Maintenance Time while up (MT(Up)), Maintenance Time during the
mission (MT(Msn)), Mission Time, and Corrective Maintenance Time. Maintenance actions are
categorized as failure of a mission essential function (non-deferrable), failure of a
non-mission essential function (deferrable), Essential Maintenance Action (EMA), and Crew
Correctible Maintenance Action (CCMA)*.
*Fixed by the crew using onboard tools, equipment, and spares within the specified time limit.
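A minimal sketch of the neutral-time treatment described above (the event categories and durations are hypothetical) shows how excluded periods drop out of the Ao calculation for a test event:

```python
# Test-event timeline, scored after the fact (hypothetical data).
events = [
    ("uptime", 160.0),    # hours the system was in condition to perform its functions
    ("downtime", 12.0),   # corrective maintenance after a chargeable failure
    ("neutral", 48.0),    # weekend: test halted, repairs not being worked
    ("uptime", 140.0),
    ("downtime", 8.0),
]

uptime = sum(hours for kind, hours in events if kind == "uptime")
downtime = sum(hours for kind, hours in events if kind == "downtime")
# Neutral time is excluded: it is counted as neither uptime nor downtime.
ao = uptime / (uptime + downtime)
print(f"Ao = {ao:.3f}")
```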
Absent careful attention to the Requirements process discipline, reliability may not be
treated as a performance parameter and hence a design criterion. Consequently, the developer
must use logistics-based metrics to demonstrate the ability to achieve Ao. As shown in
Figure 16, the solution is to focus on design-controllable MTBF and MTTR (Ai) in the
requirement generation, decomposition, and design process:

Ai = MTBF / (MTBF + MTTR)

Ao = MTBF / (MTBF + MTTR + MLDT)

Thus, Mean Logistics Delay Time (MLDT) remains an integrated logistic support (ILS) item,
not a design topic.
Figure 16: MLDT is Not a Design Criterion (Ai reflects the hardware/software design
considerations shared by user and developer; the MLDT term in Ao reflects logistics system
design considerations owned by the logistician).
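A short numerical sketch (the MTBF, MTTR, and MLDT values are hypothetical) illustrates why the distinction matters:

```python
mtbf, mttr, mldt = 500.0, 4.0, 36.0   # hours (hypothetical)

ai = mtbf / (mtbf + mttr)             # inherent availability: design-controllable
ao = mtbf / (mtbf + mttr + mldt)      # operational availability: adds logistics delay
print(f"Ai = {ai:.3f}, Ao = {ao:.3f}")
# Ai is about 0.992 while Ao is about 0.926; the gap is due entirely to MLDT,
# which must be closed through logistics system design, not hardware design.
```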
ASN (RDA), Component DASNs, SYSCOM technical authorities, and reliability SMEs are well
versed in this process and available to support the PM as needed to ensure reliability and
maintainability are treated as design requirements.
When used as R&M measures, time-based R&M metrics provide the contractor with objective,
quantifiable criteria to guide the system design and the engineering and manufacturing
process. By requiring that all R&M metrics are allocated to, and included in, all system
subcontracts (flowed down), the PM will assure that any trade analysis will be supported in
a consistent manner, without surprises, and that testable provisions exist at all levels.
Deficiencies will be promptly identified at the source, not subsequently at integrated
system levels.
the system, when multiplied together, the resulting Ao for the system would be much lower
than the required Ao for the system. As a result, when multiple subsystems are being
integrated, each subsystem’s Ao will need to be much higher than the warfighter’s required
system Ao simply because all subsystem Ao’s must be multiplied together to properly
measure and achieve the warfighter’s Ao requirement shown in Figure 17.
Figure 17: Ao(system) = Ao(subsystem 1) × Ao(subsystem 2) × … × Ao(subsystem n)
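A brief worked example (hypothetical numbers) of this allocation effect:

```python
n = 5                       # subsystems in series (hypothetical)
system_requirement = 0.95   # warfighter Ao requirement for the integrated system

# Equal allocation: each subsystem must achieve the n-th root of the system Ao.
per_subsystem = system_requirement ** (1.0 / n)
print(round(per_subsystem, 4))      # about 0.9898

# Conversely, five subsystems that each only meet 0.95 would yield:
print(round(0.95 ** n, 4))          # about 0.7738 at the system level
```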
Reliability Attribute
Reliability and maintainability requirements in performance and contract design
specifications should be identified as critical technical requirements (CTRs) for all
contracts. Reliability performance and contract specifications should be testable and
verifiable and in a form that the developer (Government or contractor) can demonstrate
prior to delivery of the equipment to the Government acquisition office. Requesting that the
developer demonstrate a Mean Time Between Operational Mission Failures (MTBOMF), for
example, may not be practical, especially when the developer will not be testing or
demonstrating mission operations or success. Placing an operational mission requirement
into a Government RFP may require that the developer demonstrate the requirement by
analysis only. Translating a MTBOMF into a simple failure rate (failures/hour) or MTBF
(time/failure) is typically the most practical method of specifying a reliability specification
when the developer (Government or contractor) is not being asked to analyze or
demonstrate mission capabilities. Suppliers can deliver parts (electronic, mechanical, or
other COTS components) that meet MTBF requirements, but those parts cannot be
guaranteed to meet MTBOMF because MTBOMF is a system level measure. When
translating user reliability requirements into Government performance specifications,
interface control documents, and contract specifications, R&M engineers must consider
two types of failures: 1) Predictable component and subcomponent failures, and 2)
Unpredictable operationally-induced failures.
Component and subcomponent failures are typically predictable because they generally fall
within their design expected failure rate. While failed subcomponents (“piece parts”) are
not repairable when they occur, their failure rates are directly translatable to their failure
rate requirements and ultimately the failure rate requirement of the component.
To anticipate the effect operationally-induced failures may have on the overall mission
profile, R&M and systems engineers should conduct a function level FMECA to assess the
level of risk expected from new technologies, untested environmental effects, and
integration and interoperability of the equipment used in the design. Based on this
analysis, user reliability KSA/APAs can be more accurately defined for optimum
mission success.
Parts (22%): Failures resulting from a part failing to perform its intended function
before its expected “end-of-life” (or wearout) limit is reached (random failures,
typically based on part quality variability issues).
Wearout (9%): Failures resulting from “end-of-life” or “age related” failure
mechanisms due to basic device physics (e.g., mechanisms associated with
electrolytic capacitors, solder joints, microwave tubes, switch/relay contacts, etc.).
System Management (4%): Failures traceable to incorrect interpretation or
implementation of requirements, processes or procedures; imposition of “bad”
8Nicholls, David and Lein, Paul, “When Good Requirements Turn Bad," 2013 Proceedings Annual Reliability and
Maintainability Symposium (RAMS), 2013, pp. 1-6, DOI: 10.1109/RAMS.2013.6517616 [Ref 49].
reliability and human factor reliability models that do exist, but software reliability
requirements are not always adequately specified in contracts, and human factor
requirements are rarely, if ever, called out). For these reasons, predicted component failure
rates alone are insufficient; the prediction must show performance approximately 70 percent
better than what is expected in the fielded product.
Some contracts will require that the developer demonstrate reliability performance during
factory acceptance testing in the engineering and manufacturing development phase, long
before operational test and evaluation. Factory acceptance testing will not account for
induced failures due to the operational environment and from interoperability with the
entire system. For this reason, specifying reliability requirements in contracts must take
into account which of the eight failure-cause categories will not be accounted for in the
developer’s predictions.
In this example, if the contract will be specifying a reliability requirement that will require
the developer to demonstrate reliability using MIL-HDBK-217 predictions or similar
reliability handbooks:
The Government performance specification should require a failure rate that is 70%
lower than what is required in the field (equivalently, a correspondingly higher MTBF,
since MTBF is the inverse of the failure rate).
If the system design is evolutionary where there are years of data to predict performance of
the existing hardware, there are minor changes in the design, and the contract will require
the developer to use a combination of field data and MIL-HDBK-217 predictions:
The Government performance specification should require a failure rate that is 41%
lower than the expected fielded performance. This is because failures due to
manufacturing, software, design, and system management will be accounted for in the
factory acceptance test, but failures due to wearout, induced failures, and those
identified as “no defect” will not.
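The arithmetic of these adjustments can be illustrated as follows; the fielded requirement used here is hypothetical:

```python
field_failure_rate = 100e-6   # failures/hour needed in the field (MTBF of 10,000 hours)

spec_prediction_only = field_failure_rate * (1 - 0.70)   # handbook-prediction-only case
spec_field_plus_217 = field_failure_rate * (1 - 0.41)    # field data plus prediction case

print(f"spec (prediction only): {spec_prediction_only:.1e} failures/hour, "
      f"MTBF {1 / spec_prediction_only:,.0f} hours")
print(f"spec (field data + prediction): {spec_field_plus_217:.1e} failures/hour, "
      f"MTBF {1 / spec_field_plus_217:,.0f} hours")
```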
Methods of controlling the developer sources for reliability data can be used such as
defining a priority list of data sources based on risk. The best data sources will come
directly from the Fleet when there are years of evidence of performance of the equipment.
Higher risk data sources, such as MIL-HDBK-217, will require adjustments to be made in
the predicted failure rates. A priority list of data sources is described below to assist with
requiring adjustments to the predicted failure rates.
perform actual life cycle testing or reliability testing. An attempt should be made
to understand what the manufacturer is advertising as the failure rate and apply
conservatism in using the failure rate. An adjustment to the failure rate from
40% to 60% is recommended.
As the fidelity of the source of data diminishes as described above, the developer’s
reliability predictions should contain methods to translate the data to accommodate the
level of risk being assumed with the source of data.
Reliability Allocations
Reliability requirements, once translated into contract specifications, must be allocated by
the Government into several contract specifications or between GFE and CFE. From the
OMS/MP, engineers can determine the operating duration of each function during a mission and
develop reliability block diagrams to assist with calculating the appropriate system-level
failure rate from subsystem and component-level failure rates. Failure rate allocations can be
determined by the amount of time that a system must operate during a mission, and from
those allocations reliability block diagram complexity can be determined. More information
on calculating failure rates can be found in many standard reliability textbooks and in the
2005 DOD Guide for Achieving Reliability, Availability, and Maintainability (RAM). Once
allocations are completed, subsystem failure rates (failures/hour) can be added directly to
meet end-item or system-level failure rate requirements; MTBF values must first be inverted
into failure rates (failures/hour) before addition.
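A minimal sketch of this roll-up, using hypothetical subsystem MTBF values:

```python
subsystem_mtbf_hours = {"radar": 800.0, "processor": 2000.0, "display": 1500.0}

# Convert each MTBF to a failure rate before summing.
failure_rates = {name: 1.0 / mtbf for name, mtbf in subsystem_mtbf_hours.items()}
system_failure_rate = sum(failure_rates.values())
system_mtbf = 1.0 / system_failure_rate

print(f"system failure rate = {system_failure_rate:.5f} failures/hour")
print(f"system MTBF = {system_mtbf:.0f} hours")
```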
MIL-HDBK-217 [Ref 34] provides common metrics that apply to a manufacturer’s failure
rate based on its expected operating environment. However, MIL-HDBK-217 predicted
failure rates are solely based on “piece part” failure rates predicted from bench testing in a
pristine environment and will not represent all suppliers and sources of material. The use
of COTS requires extensive testing in the expected operating environment to gain
confidence that the equipment is compatible and reliable for military needs.
Durability and material properties should be specifically considered in the mandatory early
FMECA required by PDR and in the root cause analysis phase of the mandatory FRACAS
that is done throughout the life of the system.
Maintainability Data
Maintainability predictions can be managed similarly to reliability data when attempting to
determine the MTTR. Maintainability predictions must be made even when no data exists
or when no testing is planned. In these extreme conditions, engineers will need to
qualitatively assess the level of effort required by maintenance personnel when making
maintenance predictions. An effort should be made to understand the difficulty in
performing repairs and maintenance. When maintenance data is determined from
analytical 3D models demonstrating the repair, a risk assessment similar to that used for
translating reliability data sources can be applied. When maintenance data will be obtained
from a maintenance
demonstration, engineers should attempt to understand the effects of performing the
demonstration in the actual location where it will be used by the operator, or if the
demonstration will be performed on a bench or at a factory where access to the equipment
may be unrestricted. In the latter case, the demonstrated MTTR should be adjusted upward to
reflect fielded access conditions.
Ao = MTBF / (MTBF + MTTR + MLDT)
Or, reorganized to solve for MTTR:
MTTR = MTBF × (1/Ao − 1) − MLDT
This equation assumes that the program has allocated Ao and MTBF, and can determine the
appropriate MLDT to assume for each subsystem.
MTTR(S1) = (MTTR × MTBF(S1)) / (n × MTBF)
Where:
MTTR(S1) is the mean time to repair for subsystem 1
MTTR is the mean time to repair for the entire system
MTBF(S1) is the mean time between failures for subsystem 1
MTBF is the mean time between failures for the entire system
n is the total number of subsystems
This method is independent of operational availability since it is known that the system-
level MTTR will support the Ao requirement.
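A minimal sketch applying the allocation above, with hypothetical system MTTR and subsystem MTBF values; weighting each allocation by subsystem MTBF keeps the failure-rate-weighted average repair time equal to the system MTTR:

```python
system_mttr = 4.0                                             # hours, system requirement
subsystem_mtbf = {"S1": 800.0, "S2": 2000.0, "S3": 1500.0}    # hours
n = len(subsystem_mtbf)

# System MTBF from the subsystem failure rates (series roll-up).
system_mtbf = 1.0 / sum(1.0 / mtbf for mtbf in subsystem_mtbf.values())

allocated_mttr = {name: system_mttr * mtbf / (n * system_mtbf)
                  for name, mtbf in subsystem_mtbf.items()}
for name, mttr in allocated_mttr.items():
    print(f"{name}: allocated MTTR = {mttr:.2f} hours")
```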
Figure 20: Example allocation of a top-level MTTR requirement of 15 hours to Subsystem 1 and
Subsystem 2 MTTR requirements of 14 hours each.
In Figure 20, a one-hour margin is used as an example for placing the top-level MTTR
requirement on several contracts or between subsystem 1 and subsystem 2.
System boundaries should be defined with any excluded (legacy and/or GFE) equipment
specifically identified.
The terms and parameters above should be explicitly defined to clarify seemingly common
terms that create recurring problems due to unclear meanings, such as time or cycle
parameters. For example, time parameters must clarify or differentiate between flight hour
versus operating hour, or operating hours versus power on or standby hours. Aviation
operating days (12 hours) versus 24-hour days must be reconciled, and requirements or
technical parameters adjusted accordingly.
6 | RELIABLE SOFTWARE
ORIGIN
Hardware reliability engineering was first applied in military applications during World
War II to determine the probability of success of ballistic rockets. Throughout the 1950s, life
estimation methods for mechanical, electrical, and electronic components were created and
used in the development of military products. By the 1960s, the practice of life estimation of
products had proven integral to developing successful military and commercial systems.
These new methods were grouped under the name of Reliability Engineering. Reliability
Engineering evolved from an understanding of physical components, their arrangement in
the system, and how their interaction supports the functions of the system. At that time,
software, although present and critical in some systems, was not part of
reliability engineering.
Early software was utilized in systems to execute basic programs quickly and accurately,
often numerical calculations too numerous or complex to be done manually. Once
developed and tested, the software was simple enough to be depended on to perform
100% consistently. This meant it was 100% reliable and therefore not a consideration in
the system reliability analysis. The term software reliability was first coined in the 1970s as
an evolution of software quality efforts of software engineers wanting to improve the
reliability of their software. Software and software development has evolved at an ever-
increasing pace since then, and the need for reliable software has and will continue
to increase.
PRESENT
Today, system reliability is not only affected by the hardware in the system, but also by the
software. Software is installed in the hardware of nearly all military systems. This software
includes executable programs, operating systems, virtual environments, and firmware.
Increasingly system functions are dependent on the interaction of hardware and software.
It is rare, and becoming rarer, to find a system that contains no software. Any time software
supports or performs a system function, the reliability of that software’s impact to the
system should be considered as part of system reliability analysis.
FUTURE
Increased use and reliance on digital engineering technologies will make it possible to
evaluate the reliability of the system more quickly and more accurately. Models used for
system development and realization will be evolved into operational models (digital
twins). In the future, system reliability will be evaluated in the digital model which will
include all relevant interactions between the hardware and software. Such a complete
digital model will display the impact of proposed changes to system reliability in real-time.
Operational reliability models will be perfected from the design models and will enable
prognostic capabilities that optimize system availability and maximize mission readiness.
Software reliability is defined as:
The probability that software will not cause the failure of a system for a specified time
under specified conditions.
The ability of a program to perform a required function under stated conditions for a
stated period of time.
Notice that the spirit of both the software and hardware reliability definitions is the same;
however, some of the language has been adjusted to account for the fundamental
differences between them. Also, notice the use of “time” as a relevant factor in all the
equations. Time refers to time elapsed in the physical environment (hardware) and not the
virtual environment (software). One may ask why the software reliability definitions
include time (physical world). The answer: only physical time is relevant to evaluation of
system function in the operating environment. More simply, system users live in the
physical world so both software and hardware reliability must be represented in a way that
shows the impact of a loss of functionality to the user in the physical world. Since hardware
exists in the physical world, the conversions are based on usage (mission) profiles (e.g.,
converting miles requirement to a time requirement). On the other hand, software does not
change or degrade over time, so quantifying the functional time of software in the physical
world is a matter of determining how often existing errors, defects, or bugs present
themselves and cause the system to lose functionality.
The RBD above is meant to draw attention to the various engineering disciplines that a
system relies upon to provide a function. The reliability engineer cannot be a specialist in
all engineering disciplines, so to develop a meaningful reliability block diagram the
reliability engineer must rely on engineering analyses performed by engineers of those
respective specialties. Each of the elements could be further decomposed into sub-
elements as necessary to support the needs of the analysis.
failure modes, root causes from software viewpoint: requirements, design, code or other
artifacts. Below are some compelling purposes of conducting a SFMEA. 10
Identifying serious problems before they impact safety: The complexity of modern
software means testing cannot be depended on to exhaust all paths and combinations of
inputs that result in system failure
Uncovering multiple instances of one failure mode: The bottom-up approach provides
the ability for entire types of failures to be eliminated if a corrective action is applied at the
failure mode level since one failure mode could cause several instances of failures.
Finding software failure modes that are difficult to find in testing: Hidden or latent
failure modes are those failure modes that aren’t observed during development or testing
but can become known once the software is operational. Some failure modes are simply
more visible when looking at the requirements, design, code, etc., than by testing.
Finding single point failures: Particularly single point failures that cannot be mitigated by
restarting, workarounds, hardware redundancy or other hardware controls.
SFMEA combined with design or code review can improve the focus of the reviews:
During design and code reviews, it is typical for the reviewers to focus on what the
software should do. A SFMEA focuses on what the design or code should not do, so combining
the SFMEA with a design or code review increases the cost-effectiveness of both.
Providing a greater understanding of both the software and the system: Executing a
SFMEA may be tedious, but when done properly there is an improvement in the overall
understanding of the system and software. It is often an eye-opening experience for
software engineers.
TELEMETRY
The nature of software requires fundamentally different techniques to analyze, detect, and
test failures. Software offers one major advantage over hardware; it can be tested to failure
repeatedly without requiring additional test artifacts or repairs. Because of this, a key tool
for ensuring reliable software is telemetry (sometimes referred to as “software
instrumentation.”) Instrumenting software with the ability to detect and report on failures
10 Excerpted from Neufelder, Ann Marie, “Effective Application of Software Failure Modes Effects Analysis,” [Ref 51].
allows software reliability to be measured and managed. Telemetry allows the “virtual
environment” of the software to be monitored and measured. Telemetry provides the
scaffolding to do fault insertion testing (also called “chaos testing”), allowing the
reliability and software engineers to understand failure conditions and impact. Since
software can be tested repeatedly without need to procure more components or perform
repairs, running multiple tests with and without fault injection will allow for software
reliability to be characterized.
Software telemetry is conceptually similar to built-in-test (BIT) for hardware and can
provide many of the same advantages. Instrumenting software early in the design provides
insight into failures that occur both in test and operational environments. Also similar to
hardware BIT, software telemetry can increase system maintainability in the areas of
troubleshooting paths and start points, automated readiness testing, mission readiness
status, and identification of failed or failing hardware. Telemetry is best when designed
into the software from the start and evolved throughout the system life cycle. A SFMEA
conducted early in the design provides valuable information in selecting the software
components that will be instrumented. SFMEAs help decide how to utilize the scarce
system resources to create optimal instrumentation coverage approaches by identifying
the most critical or troublesome failures (or potential failure conditions).
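For illustration only, the sketch below shows how SFMEA results might be used to prioritize scarce instrumentation resources toward the failure modes that matter most; the component names, severities, and budget are hypothetical.

```python
# Hypothetical sketch: using SFMEA results to prioritize telemetry coverage.
# Component names, severities, and the instrumentation budget are assumptions.

sfmea_findings = [
    # (software component, worst-case severity 1-4, detectable in test?)
    ("navigation_filter", 4, False),
    ("message_router", 3, False),
    ("display_manager", 2, True),
    ("logging_service", 1, True),
]

INSTRUMENTATION_BUDGET = 2  # number of components that can be fully instrumented

# Prioritize the most severe failure modes, favoring those testing is unlikely to expose.
ranked = sorted(sfmea_findings,
                key=lambda f: (f[1], not f[2]),
                reverse=True)

selected = [name for name, _, _ in ranked[:INSTRUMENTATION_BUDGET]]
print("Instrument first:", selected)
```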
OBJECTIVE
Develop R&ME guidance, along with associated contract language, for defining, estimating,
analyzing, testing, and identifying occurrences of software failures in an
operational (field) environment. The approach is to use DevSecOps (Development, Security,
and Operations), Iterative, and Agile Practices to deliver reliable software. All types of
software are within the scope of this effort (e.g., application, cloud computing, fog
computing, edge computing, embedded, and firmware in certain instances). It includes
software acquired through all acquisition pathways (e.g., DoDI 5000.75 [Ref 53], DoDI
5000.85 [Ref 54], and DoDI 5000.87 [Ref 55]).
GOALS
Define acceptable system metrics supported by R&ME to measure and evaluate
(define how software-related failures impact current R&ME system metrics and
establish guidance for failure definition and scoring criteria (FD/SC) development).
Effectively implement R&ME into software development programs by emphasizing
the use of DevSecOps as a key for reliable software. This includes development and
methods of gathering operational software performance metrics to identify,
characterize, and address or correct software failures through CI/CD (continuous
integration/continuous delivery) updates.
Enhance programs’ ability to contract for reliable software and effectively evaluate
the risks of contractors’ proposals to achieve reliable software.
Differentiate roles and responsibilities for reliability, software, development, safety,
certification, security, and operations. Describe the interfaces between these roles.
Explore the concept of architecting software using design patterns that incorporate
reliability concepts to build software that is more failure resistant and fault tolerant.
Reduce the occurrence or impact of software failures during operations.
DELIVERABLES
Guidance for specifying, developing, and assessing reliable software.
Contract language and guidance on implementation (including tailoring) for
delivering reliable software.
Guidance for evaluating proposals for reliable software (Government only).
7 | SCORECARD/CHECKLIST
INTRODUCTION
Evaluation of the R&ME Program is an important step to understanding its health. A
detailed evaluation of the maturity of the R&ME Program provides valuable information
that should be used to determine where effort should be placed to bring the reliability
program to a state where it supports the overall program goals. Utilizing a standardized
scorecard ensures a repeatable, methodical approach to the evaluation.
Standardization and repeatability enable comparison between past and present states of
health, thereby providing important decision information to shape the program to meet
future state goals.
The DON is developing an R&ME scorecard that provides such a standardized, repeatable
method to evaluate the maturity of the Reliability and Maintainability Program for SETR
events or periodic reviews over the acquisition life cycle. The Naval R&ME scorecard will
guide the user in the evaluation of the R&ME Program across four phases of the program
life cycle. It will enable reliability engineers and program managers to perform a
reliability and maintainability program self-evaluation by providing scores to a question
set for each sub-area and phase. The scores are then combined to provide an overall
maturity index and grade percentage for each sub-area. The scores for each sub-area are
used to calculate the combined score for the phase, and the scores for each phase are
further combined to determine an overall R&ME Program score. The phases and respective
sub-areas that will be included in the R&ME scorecard are listed in Table 4.
Table 4. R&ME Scorecard Phases and Sub-Areas

PHASE: Design
SUB-AREAS: Operational Mode Summary/Mission Profile; Design Reviews; Spec Development;
Allocation/Design Requirements Validation; Trade Studies; Prototype Development and
Review; Design Process for Reliability; Design Analysis; Prepare Design Requirements
Documents; Parts and Materials Selection; Quality Assurance (QA); Software Design;
Built-in-Test

PHASE: Test
SUB-AREAS: Integrated Test Plan; Failure Definition Scoring (and FMEA/FMECA); Design
Limit; Life; Test, Analyze, and Fix (TAAF); Software Test; TEMP Development/Execution

PHASE: Production
SUB-AREAS: Piece Part Control; Defect Control; Requirements Flow Down - Subcontractor
Control; Manufacturing Screening

PHASE: Supportability-Logistics
SUB-AREAS: Sustainment/Provisioning Analysis; Spares; Maintenance/Manpower Ratio;
Technical Manuals; Support and Test Equipment; Logistics Analysis/Documentation;
Training Materials and Equipment
SCORING
The basis for the effectiveness of the scorecard is the consistent and accurate responses
to the probing questions for each sub-area. The questions included in the scorecard
template will be based on existing policy and guidance and the best practices of other
referenced materials; however, the template will provide options for tailoring the question
set to meet the needs of the user. Just as a FMEA should not be performed as the effort of
a single individual, the scoring of the R&ME program should not be done as the effort of
one person. The best practice is to organize a group that will evaluate and present
objective quality evidence to support the recommended score for each question. This
approach will ensure that when completed the final scores will represent the consensus of
the group and provide an accurate estimation of the efficacy of the R&ME program.
The evaluation process requires that each question be scored from 1 to 3. The score
provided represents the group’s opinion of how well the program is complying with the
detailed criteria of the question. The group will determine the Compliance Value (CV) for
each question using scoring values in Table 5.
The Sub-area Maturity Index (SMI) is the calculation of the maturity of each sub-area for
each phase. The SMI is calculated by averaging the Compliance Values provided by the
group for all questions within a specific sub-area using the equation below:
$$SMI = \frac{\sum_{i=1}^{n} CV_i}{n}$$
Where:
CV_i = the Compliance Value assigned to question i
n = the number of questions in the sub-area
The Phase Maturity Index (QMI) is calculated by averaging the values of the SMIs within the
respective phase (Design, Test, Production, or Supportability-Logistics). It is calculated
using the equation below:
$$QMI = \frac{\sum_{i=1}^{n} SMI_i}{n}$$
Where:
SMI_i = the Sub-area Maturity Index for sub-area i
n = the number of sub-areas in the phase
The Program Maturity Index (PMI) is calculated by averaging the values of the four QMIs
(Design, Test, Production, and Supportability-Logistics). It is calculated using the
equation below:
$$PMI = \frac{\sum_{i=1}^{4} QMI_i}{4}$$
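For illustration only, the sketch below mirrors the SMI, QMI, and PMI averaging described above; the sub-areas and Compliance Values shown are hypothetical.

```python
# Maturity index sketch mirroring the SMI / QMI / PMI averaging described above.
# The Compliance Values (1-3) below are hypothetical examples.

def mean(values):
    return sum(values) / len(values)

# Compliance Values grouped by phase and sub-area (illustrative only).
scorecard = {
    "Design": {
        "Design Analysis": [3, 2, 3, 2],
        "Software Design": [2, 2, 1],
    },
    "Test": {
        "Integrated Test Plan": [3, 3, 2],
        "Software Test": [1, 2],
    },
    "Production": {
        "Piece Part Control": [2, 3],
    },
    "Supportability-Logistics": {
        "Spares": [2, 2, 3],
    },
}

smi = {phase: {sub: mean(cvs) for sub, cvs in subs.items()}
       for phase, subs in scorecard.items()}                        # Sub-area Maturity Index
qmi = {phase: mean(subs.values()) for phase, subs in smi.items()}   # Phase Maturity Index
pmi = mean(qmi.values())                                            # Program Maturity Index

print(qmi)
print(f"PMI = {pmi:.2f}")
```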
A common maturity scale, applied across all three evaluation levels, allows for a universal
comparison of the R&ME maturity at all three levels (program, phase, sub-area). The
maturity index scale is shown in Table 6.
An accompanying Excel template will perform the scoring automatically, eliminating the
need for manual calculations. The Excel template will also be able to store the results for
up to three user-defined milestones to establish a historical record of the progress or
regression of the R&ME program. The Excel template will have conspicuously marked
user-definable fields to enable tailoring as needed to meet the needs of different Naval
organizations, programs, or system types.
APPENDIX A | REFERENCES
1. GAO 20-151, “Defense Acquisitions: Senior Leaders Should Emphasize Key Practices
to Improve Weapon System Reliability,” Report to the Committee on Armed Services,
U.S. Senate, January 2020.
2. Title 10, United States Code, Section 2443, “Sustainment Factors in Weapon System
Design,” 31 January 2019.
3. GAO-20-2, “Navy Shipbuilding: Increasing Focus on Sustainment Early in the
Acquisition Process Could Save Billions,” Report to the Committee on Armed
Services, U.S. Senate, March 2020.
4. SECNAVINST 5000.2G, “Department of the Navy Implementation of the Defense
Acquisition System and the Adaptive Acquisition Framework,” 08 April 2022.
5. Office of the Deputy Assistant Secretary of Defense for Systems Engineering,
“Department of Defense: Digital Engineering Strategy,” June 2018.
6. Deputy Assistant Secretary of the Navy Research, Development, Test and Evaluation,
“U.S. Navy and Marine Corps Digital Systems Engineering Transformation
Strategy,” 2020.
7. “Operational Availability Handbook: A Practical Guide for Military Systems, Sub-
Systems and Equipment,” Published by the Office of the Assistant Secretary of the
Navy (Research, Development and Acquisition), NAVSO P-7001, May 2018.
8. National Research Council 2015. “Reliability Growth: Enhancing Defense System
Reliability.” Washington, DC: The National Academies Press, page 112.
https://doi.org/10.17226/18987.
9. Dallosta, Patrick M and Simcik, Thomas A. “Designing for Supportability: Driving
Reliability, Availability, and Maintainability In...While Driving Costs Out.” Defense
AT&L: Product Support Issue, March-April 2012, page 35.
10. CJCSI 5123.01I, “Charter of the Joint Requirements Oversight Council and
Implementation of the Joint Capabilities Integration and Development System
(JCIDS),” 30 October 2021.
11. Reliability Information Analysis Center (RIAC), “System Reliability Toolkit: A
Practical Guide for Understanding and Implementing a Program for System
Reliability,” 15 December 2005.
12. Commander Operational Test and Evaluation Force, “Operational Suitability
Evaluation Handbook,” 26 March 2019.
13. Marine Corps Operational Test and Evaluation Activity (MCOTEA), “Operational Test
& Evaluation Manual,” Third Edition, 22 February 2013.
14. MIL-STD-721C, “Definitions of Terms for Reliability and Maintainability,” 12
June 1981.
15. ISO/IEC 25023:2016, “Systems and software engineering – Systems and software
Quality Requirements and Evaluation (SQuaRE) – Measurement of system and
software product quality,” 15 June 2016.
16. DoDI 5000.88, “Engineering of Defense Systems,” Office of the Under Secretary of
Defense for Research and Engineering, 18 November 2020.
17. DoDI 5000.91, “Product Support Management for the Adaptive Acquisition
Framework,” Office of the Under Secretary of Defense for Acquisition and
Sustainment, 4 November 2021.
18. The Assistant Secretary of the Navy (Research, Development and Acquisition).
Memorandum For Distribution, Subject: “Gate 7 Sustainment Reviews,” 27
September 2021.
19. “Department of Defense Reliability and Maintainability Engineering Management
Body of Knowledge,” (DOD RM BoK) Office of the Deputy Assistant Secretary of
Defense Systems Engineering, August 2018.
20. “Systems Engineering Guidebook,” Office of the Under Secretary of Defense for
Research and Engineering, Washington, D.C., February 2022.
21. “Reliability, Availability, Maintainability, and Cost (RAM-C) Rationale Report Outline
Guidance, Version 1.0,” Office of the Deputy Assistant Secretary of Defense for
Systems Engineering, 28 February 2017.
22. “Engineering of Defense Systems Guidebook,” Office of the Under Secretary of
Defense for Research and Engineering, Washington, D.C., February 2022.
23. “Department of Defense Systems Engineering Plan (SEP) Outline, Version 4.0,” Office
of the Under Secretary of Defense for Research and Engineering, Washington, D.C.,
September 2021.
24. Office of the Assistant Secretary of Defense. Memorandum for Assistant Secretaries of
the Military Departments Directors of the Defense Agencies, Subject: “Life-Cycle
Sustainment Plan (LCSP) Outline Version 2.0,” 19 January 2017.
25. “Director, Operational Test and Evaluation (DOT&E) Test and Evaluation Master Plan
(TEMP) Guidebook,” Version 3.1, 19 January 2017.
https://www.dote.osd.mil/Guidance/DOT-E-TEMP-Guidebook
26. T9070-BS-DPC-010_076-1, “Reliability and Maintainability Engineering Manual,”
21 February 2017.
APPENDIX B | GLOSSARY & REFERENCE GUIDE
HM Health Management
HW Hardware
ICE Independent Cost Estimate
ICD Initial Capabilities Document
ILA Independent Logistics Assessment
ILS Integrated Logistic Support
IOC Initial Operational Capability
ITRA Independent Technical Review Assessment
JCIDS Joint Capability Integration and Development System
JRMET Joint Reliability and Maintainability Evaluation Team
JTTI Joint Training Technical Interoperability
KPP Key Performance Parameter
KSA Key System Attribute
LCC Life Cycle Cost
LCSP Life Cycle Sustainment Plan
LFT&E Live Fire Test and Evaluation
Logistics Reliability (RL): Logistics Reliability is the measure of the ability of an item to
operate without placing a demand on the logistics support structure for repair or
adjustment, including all failures to the system and maintenance demand as a result of
system operations. [Note: Logistics Reliability is a fundamental component of O&S cost as
well as Materiel Availability.] (JCIDS 2021)
LRFS Logistics Requirements and Funding Summary
LRIP Low Rate Initial Production
LRU Line Replaceable Unit
LSE Lead Systems Engineer
M Maintainability
Maintainability Attribute [KSA or APA]: Maintainability is the measure of the ability of the
system to be brought back to a readiness status and state of normal function. [Note:
Subordinate attributes which may be considered as KSAs or APAs: 1) Corrective
Maintenance, 2) Maintenance Burden, and 3) Built in Test.] (JCIDS 2021)
Maintenance Burden: Maintenance Burden is a measure of the maintainability parameter
related to item demand for maintenance manpower. It is the sum of directed maintenance
man-hours (corrective and preventive), divided by the total number of operating hours.
(JCIDS 2021)
OH Operating Hour
OMF Operational Mission Failure
OMS Operational Mode Summary
Operational Availability (Ao) [KPP]: Operational Availability is the measure of the
percentage of time that a system or group of systems within a unit are operationally
capable of performing an assigned mission and can be expressed as
(uptime / (uptime + downtime)). (JCIDS 2021)
OT Operational Testing
OTA Operational Test Agency
OT&E Operational Test and Evaluation
P&D Production and Deployment
PBL Performance Based Logistics
PDR Preliminary Design Review
PHM Prognostic and Health Management
PM Program Manager or Preventive Maintenance
PMI Program Maturity Index
PRAT Production Reliability Acceptance Testing
PSM Product Support Manager
QA Quality Assurance
R Reliability
R&D Research and Development
R&ME Reliability and Maintainability Engineering
RAM Reliability, Availability and Maintainability
RAM-C Reliability, Availability, Maintainability – Cost
RBD Reliability Block Diagram
RCM Reliability Centered Maintenance