Astm E2862-23
for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E2862 − 23
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.3.3.1 Discussion: According to the formula in MIL-HDBK-1823A, ap/c is a one-sided upper confidence bound on ap. ap/c represents how large the true ap could be given the statistical uncertainty associated with limited sample data. Hence ap/c > ap. Note that POD is equal to p for both ap/c and ap. ap is based solely on the hit/miss data resulting from the examination and represents a snapshot in time, whereas ap/c accounts for the uncertainty associated with limited sample data.
4. Summary of Practice
4.1 In general, the POD examination process is comprised of specimen set design, study design, examination administration, statistical analysis of examination data, documentation of analysis results, and specimen set maintenance. This practice focuses only on, and describes step by step, the process for analyzing nondestructive testing hit/miss data resulting from a POD examination and includes minimum requirements for validating the resulting POD curve and documenting the results.
4.2 This practice also includes definitions and discussions for results of interest (for example, a90/95) to provide for correct interpretation of results.
4.3 Definitions of statistical terminology used in the body of this practice can be found in Annex A1.
4.4 A more general discussion of the POD analysis process can be found in Appendix X1.
4.5 An example POD analysis using simulated data can be found in Appendix X2.
4.6 A mathematical overview of the underlying model commonly used with hit/miss data resulting from a POD examination can be found in Appendix X3.
5. Significance and Use
5.1 The POD analysis method described herein is based on a well-known and well-established statistical regression method. It shall be used to quantify the demonstrated POD for a specific set of examination parameters and known range of discontinuity sizes under the following conditions.
5.1.1 The initial response from a nondestructive evaluation inspection system is ultimately binary in nature (that is, hit or miss).
5.1.2 Discontinuity size is the predictor variable and can be accurately quantified.
5.1.3 A relationship between discontinuity size and POD exists and is best described by a generalized linear model with the appropriate link function for binary outcomes.
5.2 This practice does not limit the use of a generalized linear model with more than one predictor variable or other types of statistical models if justified as more appropriate for the hit/miss data.
5.3 If the initial response from a nondestructive evaluation inspection system is measurable and can be classified as a continuous variable (for example, data collected from an Eddy Current inspection system), then Practice E3023 may be more appropriate.
5.4 Prior to performing the analysis it is assumed that the discontinuity of interest is clearly defined; the number and distribution of induced discontinuity sizes in the POD specimen set is known and well-documented; discontinuities in the POD specimen set are unobstructed; and the POD examination administration procedure (including data collection method) is well-designed, well-defined, under control, and unbiased. The analysis results are only valid if convergence is achieved and the model adequately represents the data.
5.5 The POD analysis method described herein is consistent with the analysis method for binary data described in MIL-HDBK-1823A, and is included in several widely utilized POD software packages to perform a POD analysis on hit/miss data. It is also found in statistical software packages that have generalized linear modeling capability. This practice requires that the analyst has access to either POD software or other software with generalized linear modeling capability.
5.6 This practice does not apply to hit/miss data resulting from a POD examination based on the Point Estimate Method (PEM), also referred to as the “29 out of 29” method. (See X1.2.4.5 for more detail.)
6. Procedure
6.1 The POD analysis objective shall be clearly defined by the responsible engineer or by the customer.
6.2 The analyst shall obtain the hit/miss data resulting from the POD examination, which shall include at a minimum the documented known induced discontinuity sizes, whether or not the discontinuity was found, and any false calls.
6.3 The analyst shall also obtain specific information about the POD examination, which shall include at a minimum the specimen standard geometry (for example, flat panels), specimen standard material (for example, Nickel), examination date, number of inspectors, type of inspection method (for example, line-of-sight Level 3 Sensitivity Fluorescent Penetrant Inspection), and pertinent comments from the inspector(s) and test administrator.
6.3.1 In general, the results of an experiment apply to the conditions under which the experiment was conducted. Hence, the POD analysis results apply to the conditions under which the POD examination was conducted.
6.4 Prior to performing the analysis, the analyst shall conduct a preliminary review of the POD examination procedure and resulting hit/miss data to identify any examination administration or data issues. The analyst shall identify and attempt to resolve any issues prior to conducting the POD analysis. Identified issues and their resolution shall be documented in the report. Examples of issues that could arise and possible resolutions are outlined in the following subsections:
6.4.1 If the examination procedure was poorly designed or executed, or both, the validity of the resulting data is questionable. In this case, the examination procedure design and execution should be reevaluated. For design guidelines see MIL-HDBK-1823A.
6.4.2 If the examination procedure was properly designed but problems or interruptions occurred during the POD examination that may bias the results, the POD examination should be re-administered.
6.4.3 Data that appear to be outlying (for example, an early hit in the small size range or a late miss in the large size range) should be identified and investigated.
6.4.3.1 If a discontinuity was missed because it was obstructed (such as a clogged discontinuity), the discontinuity shall be removed from the POD analysis since there was not an opportunity for the discontinuity to be found.
6.4.3.2 If a discontinuity is removed from the analysis, the specific discontinuity and rationale for removal shall be documented in the final report.
6.4.4 POD cannot be modeled as a continuous function of discontinuity size if there is a complete separation of misses and hits as crack size increases. If a complete separation of misses and hits is present in the data, the POD examination may be re-administered. If this occurs, it shall be documented in the report. If a complete separation of misses and hits occurs on a regular basis, the specimen set should be examined for suitability as a POD examination specimen set.
6.4.5 POD cannot be modeled as a continuous function of discontinuity size if all the discontinuities are found or if all the discontinuities are missed. If this occurs, the specimen set is inadequate for the POD examination.
6.5 The analyst shall use a generalized linear model with the appropriate link function to establish the relationship between POD and discontinuity size. For application to POD, the generalized linear model with discontinuity size as the single predictor variable is typically expressed as g(p) = b0 + b1•a or g(p) = b0 + b1•ln(a), where a or ln(a) is the continuous predictor variable, b0 is the intercept, b1 is the slope, p is the probability of a response (that is, p=POD), and g is a function (commonly referred to as the “link” function) that maps [0, 1] onto the real number line. If predictor variables other than discontinuity size are quantifiable factors, a generalized linear model with more than one predictor may be used. (For more detail on GLMs, see Appendix X3.)
6.6 The analyst shall choose the appropriate link function based on how well the model fits the observed data. MIL-HDBK-1823A discusses four different link functions (Logit, Probit, Log-Log, Complementary-LogLog) and describes methods for selecting the appropriate one. In general, the logit and probit link functions have worked well in practice for modeling hit/miss data. (For more detail on the logit and probit link functions, see Appendix X3.)
6.6.1 In general, the appropriateness of a selected model is determined by the significance of the predictor variable(s), how well the model fits the observed data, and how well the underlying assumptions are met. Hence, model selection may be an iterative process as the appropriateness of the link function, the significance of the predictor variable(s), goodness-of-fit, and other underlying assumptions are typically assessed after the model has been developed.
6.7 Only hit/miss data for induced discontinuities shall be used in the development of the generalized linear model. False call data shall not be included in the development of the generalized linear model.
6.8 The analyst shall conduct the analysis using software that has generalized linear modeling capabilities.
6.9 After running the analysis, the analyst shall verify that convergence has been achieved. The resulting POD curve shall not be used if convergence has not been achieved.
6.10 If included in the analysis software output, the analyst shall also assess the significance of the predictor variable in the model. In general, only significant variables are included in a regression model. (See X1.2.7.1 for details on assessing significance.)
6.11 After verifying convergence and assessing the significance of the predictor variable, the analyst shall use at a minimum the informal model diagnostic methods listed below to assess the reliability of the model and verify that the model adequately fits the data.
6.11.1 If included in the analysis output, the analyst shall check the number of iterations it took to meet the convergence criterion. If more than twenty iterations were needed to reach convergence, the model may not be reliable. A statement indicating that convergence was achieved and the number of iterations needed to achieve convergence shall be included in the report.
6.11.2 The analyst shall visually assess the shape of the POD curve. (POD curves tend to be s-shaped.)
6.11.3 The analyst shall visually assess how well the POD curve fits the data by comparing how well the range over which the POD curve is rising matches the range over which misses begin to overlap with and transition to hits as discontinuity size increases.
6.11.4 The analyst should also compare an empirical POD curve to the POD curve based on the generalized linear model. The empirical POD curve shall be used for validation purposes only. It shall not be used as a substitute for a POD curve resulting from a hit/miss analysis.
6.11.4.1 To create an empirical POD curve, divide the discontinuity sizes into bins. For example, (0.010 in., 0.020 in.), (0.020 in., 0.030 in.), …, (0.100 in., 0.110 in.), etc. ((0.0254 cm, 0.0508 cm), (0.0508 cm, 0.0762 cm), …, (0.2540 cm, 0.2794 cm), etc.). For each bin, calculate the total number of discontinuities contained in the bin and how many were detected. Calculate the empirical POD in each bin by dividing the number detected in the bin by the total number of discontinuities in the bin. Plot the empirical POD versus the midpoint of the bin to obtain the empirical POD curve. Overlay the POD curve based on the generalized linear model on the empirical POD curve to assess how well the generalized linear model fits the data by how well it matches the empirical POD curve. For an example, see Table X2.2 and Fig. X2.4 in Appendix X2.
6.11.5 The analyst should assess the impact of data that appear to be outlying observations (for example, an early hit in the small size range or a late miss in the large size range) by removing the outlying value from the data and re-running the analysis to assess its influence on the shape of the POD curve. Both analysis results (with and without the outlying data) shall
be included in the report along with a discussion of the impact to the POD curve. (See X2.1.7.5 for an example.) This assessment does not apply to outlying observations resulting from an obstructed discontinuity which are removed from the analysis per 6.4.3.1.
6.12 If a c % level of confidence is specified by the responsible engineer or the customer, the analyst shall put a c % lower confidence bound on the POD curve. Methods for constructing a confidence bound can be found in MIL-HDBK-1823A as well as statistics text books on generalized linear regression.
6.12.1 The analyst shall visually assess the shape of the confidence bound on the POD curve. The confidence bound should roughly follow the same shape as the POD curve. If the confidence bound flares out significantly on either or both ends or intersects the x-axis, the confidence bound should be viewed as suspect and may not be reliable.
6.12.2 The analyst should assess the impact of data that appear to be outlying observations by removing the outlying value from the data and re-running the analysis to assess its influence on the shape of the confidence bound (if applicable). Both analysis results (with and without the outlying data) shall be included in the report along with a discussion of the impact to the confidence bound (if applicable). This assessment may be done in conjunction with the assessment done on the POD curve as described in 6.11.5. This assessment does not apply to outlying observations resulting from an obstructed discontinuity which are removed from the analysis per 6.4.3.1.
6.13 The analyst shall analyze any false call data and shall report the false call rate at the 50 %, 90 %, and 95 % level of statistical confidence. Acceptable false call rates shall be determined by the responsible engineer or by the customer.
6.13.1 The false call rate shall be defined as the number of false calls divided by the number of opportunities in the specimen set that do not contain a discontinuity.
6.13.2 What constitutes a false call shall be clearly defined by the responsible engineer or by the customer.
6.13.3 What constitutes an opportunity in the specimen set that does not contain a discontinuity shall be clearly defined by the responsible engineer or by the customer.
6.13.4 The Clopper-Pearson binomial method for constructing confidence intervals for proportions should be used to calculate the false call rate at the 50 %, 90 % and 95 % level of statistical confidence. The Clopper-Pearson upper 100•(1-α)% confidence bound for p is:
$$P_U = \left\{ 1 + \frac{n - x}{(x + 1) \cdot F(1-\alpha,\ 2x+2,\ 2n-2x)} \right\}^{-1}$$
where F(1-α, 2x+2, 2n-2x) is the F-statistic with degrees of freedom (2x+2, 2n-2x) and P[F < F(1-α, 2x+2, 2n-2x)] = 1-α. This method is consistent with that used in MIL-HDBK-1823A.
7. Report
7.1 At a minimum the following information about the POD analysis shall be included in the report.
NOTE 1: Failure to document pertinent information about the specimen set, examination design, examination execution, raw data, and analysis method may be considered grounds for disputing the validity of the results.
7.1.1 The specimen standard geometry (for example, flat panels).
7.1.2 The specimen standard material (for example, Nickel).
7.1.3 Examination date.
7.1.4 Number of inspectors.
7.1.5 Type of inspection method (for example, line-of-sight Level 3 Fluorescent Penetrant Inspection).
7.1.6 Any comments from the inspector(s) or test administrator.
7.1.7 The documented known induced discontinuity sizes.
7.1.8 Which discontinuities were found and which were missed.
7.1.9 Any false calls.
7.1.10 The selected link function.
7.1.11 The generalized linear model coefficients.
7.1.12 The variance-covariance matrix (if included in the software output).
7.1.13 A statement indicating that convergence was achieved.
7.1.14 The number of iterations needed to achieve convergence (if included in the software output).
7.1.15 A plot of the resulting POD curve and confidence bound (if applicable).
7.1.16 Specific results of interest as required by the analysis objective (for example, a90/95).
7.1.17 A statement about the model diagnostic methods used and conclusions.
7.1.18 Any deviations from the POD examination procedure or standard POD analysis.
7.1.18.1 If the POD examination was re-administered, the original results and rationale for re-administration shall be documented in the report.
7.1.18.2 If a discontinuity is removed from the analysis, the specific discontinuity and rationale for removal shall be documented in the final report.
7.1.18.3 If the impact of outlying data was assessed, the results shall be included in the report along with an explanation.
7.1.19 Summary of false call analysis, including the following.
7.1.19.1 Definition of what constitutes a false call.
7.1.19.2 Definition of what constitutes an opportunity in the specimen set that does not contain a discontinuity.
7.1.19.3 False call rate at the 50 %, 90 %, and 95 % level of confidence.
7.1.20 Name of analyst and company responsible for the POD calculation.
8. Keywords
8.1 hit/miss analysis; penetrant POD; POD; POD analysis; Probability of Detection
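The false call calculation in 6.13.4 can be sketched as follows. This is an illustrative sketch only, not part of the practice. It assumes that x is the number of false calls and n is the number of opportunities that do not contain a discontinuity, which is a reading of 6.13.1 rather than a definition stated with the formula. Because the Python standard library has no F-distribution quantile function, the sketch does not evaluate the F-based formula directly. It solves the equivalent condition that the binomial probability of observing x or fewer false calls equals α, using bisection. The function names are hypothetical.

```python
import math


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x, n, confidence):
    """One-sided Clopper-Pearson upper bound on a proportion.

    x: number of false calls (assumed meaning of x)
    n: number of opportunities without a discontinuity (assumed meaning of n)
    confidence: for example 0.50, 0.90, or 0.95
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    if x == n:
        return 1.0  # every opportunity produced a false call
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # The binomial CDF decreases as p increases, so bisection finds the
    # p at which P(X <= x) drops to alpha.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binomial_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: 2 false calls in 50 opportunities.
for level in (0.50, 0.90, 0.95):
    bound = clopper_pearson_upper(2, 50, level)
    print(f"{level:.0%} upper bound on false call rate: {bound:.4f}")
```

For x = 0 the bound reduces to 1 - α^(1/n), which gives a quick hand check of any implementation.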
ANNEX
(Mandatory Information)
A1. TERMINOLOGY
A1.1 Definitions:
A1.1.1 a90: the discontinuity size that can be detected with 90 % probability.
A1.1.1.1 Discussion: The value for a90 resulting from a POD analysis is a single point estimate of the true value based on the outcome of the POD examination. It represents the typical value and does not account for variability due to sampling or inherent variability in the inspection system, which is always present.
A1.1.2 a90/95: the discontinuity size that can be detected with 90 % probability with a statistical confidence level of 95 %.
A1.1.2.1 Discussion: The value for a90 resulting from a POD analysis is an estimate of the true a90 based on the outcome of the POD examination. If the examination were repeated, the outcome is not expected to be exactly the same. Hence the estimate of a90 will not be the same. To account for variability due to sampling, a statistical confidence bound with a 95 % level of confidence is applied to the estimated value for a90 resulting in an a90/95 value. POD is still 90 %. The 95 % refers to the ability of the statistical method to capture (or bound) the true a90. That is, if the examination were repeated over and over under the same conditions, the value for a90/95 will be larger than the true a90 95 % of the time. In practice the POD examination will be conducted once. Using a 95 % confidence level implies a 95 % chance that the a90/95 value bounds the true a90 and a 5 % risk that the true a90 is actually larger than the a90/95 value.
A1.1.3 a90/50: the discontinuity size that can be detected with 90 % probability with a statistical confidence level of 50 %.
A1.1.3.1 Discussion: Using a one-sided 50 % confidence bound implies a 50 % chance that the a90/50 value bounds the true a90 and a 50 % risk that the true a90 is actually larger than the a90/50 value. Given this, a90/50 is really the same as a90.
A1.1.4 binary response, n: a response variable with only two possible outcomes.
A1.1.4.1 Discussion: The response from a POD examination on a manual fluorescent penetrant inspection system, for example, is binary. The discontinuity is either found or it is missed.
A1.1.5 dependent variable, n: a variable to be predicted using an equation. Terminology E456, Practice E3080
A1.1.6 generalized linear model (GLM), n: a model for a response variable whose distribution is a member of an exponential family where the mean response is predicted by a function of a linear combination of independent variables.
A1.1.6.1 Discussion: The exponential family of distributions includes, for example, normal, binomial, gamma, and Poisson. The function relating the mean to the linear combination of independent variables is called the link function.
A1.1.6.2 Discussion: Generalized linear models are the basis for the hit/miss POD analysis method described in MIL-HDBK-1823A. See Appendix X3 for an overview of GLMs.
A1.1.7 independent variable, n: a variable used to predict another using an equation. Terminology E456, Practice E3080
A1.1.8 outlying observation, n: an extreme observation in either direction that appears to deviate markedly in value from other members of the sample in which it appears. Practice E178, Terminology E456
A1.1.9 regression, n: the process of estimating parameter(s) of an equation using a set of data. Terminology E456, Practice E3080
A1.1.10 sample, n: a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection. Terminology E456, Practice E2586
A1.1.11 sample size, n: number of observed values in the sample. Terminology E456, Practice E2586
A1.1.12 standard error, n: standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it. Terminology E456, Practice E2586
A1.1.12.1 Discussion: If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.
A1.1.13 statistical confidence, n: the long run frequency associated with the ability of the statistical method to capture the true value of the parameter of interest.
A1.1.13.1 Discussion: Statistical confidence is a probability statement about the statistical method used to estimate a parameter of interest; for example, the probability that the statistical method has captured the true capability of the inspection system. The opposite of statistical confidence can be equated to risk. For example, a statistical confidence level of 95 % implies a willingness to accept a 5 % risk of the statistical method yielding incorrect results; for example, there is a 5 % risk that the wrong conclusion has been drawn about the capability of the inspection system.
A1.1.14 statistical confidence bound: a one-sided or two-sided bound around a single point estimate representing the variability due to sampling.
A1.1.14.1 Discussion: According to the formula in MIL-HDBK-1823A, ap/c is a one-sided upper confidence bound on ap. ap/c represents how large the true ap could be given the statistical uncertainty associated with limited sample data. In general, a confidence bound is a function of the amount of data, the scatter in the data, and the specified level of confidence. When the sample size increases, statistical uncertainty decreases (all else held constant). That is, given an infinite amount of data (for example, an infinite number of flaw sizes adequately distributed across a POD specimen set), ap/c will approach ap because the statistical uncertainty goes away. It is important to note that a statistical confidence bound on ap only accounts for variability due to sampling. It does not account for inherent process variability. In order to capture inherent process variability, a tolerance bound should be used. As opposed to a confidence bound, a tolerance bound will always differ from the point estimate because process variability cannot be eliminated by increasing the sample size.
A1.1.14.2 Discussion: The term “statistical confidence bound” in this practice is equivalent to the term “confidence interval” in Terminology E456 and Practice E2586.
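To make the terms a90 (A1.1.1) and generalized linear model (A1.1.6) concrete, the following minimal sketch fits a hit/miss model and reads off the a90 point estimate. It is illustrative only and is not the implementation used by any POD software. It assumes a logit link with ln(a) as the predictor, that is, logit(p) = b0 + b1•ln(a), and uses a plain Newton-Raphson iteration. The cap of twenty iterations mirrors the criterion in 6.11.1. The function names, the tolerance, and the data are hypothetical. No confidence bound is computed, so the result corresponds to a90 and not to a90/95.

```python
import math


def fit_logit_pod(sizes, hits, max_iter=20, tol=1e-8):
    """Fit logit(POD) = b0 + b1*ln(a) by Newton-Raphson.

    Returns (b0, b1, iterations). Raises RuntimeError if the convergence
    criterion is not met within max_iter iterations.
    """
    x = [math.log(a) for a in sizes]
    b0 = b1 = 0.0
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, hits):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise RuntimeError("singular information matrix; review the data")
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("convergence not achieved; results shall not be used")


def pod(b0, b1, size):
    """Modeled POD at a given discontinuity size."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(size))))


def size_at_pod(b0, b1, p):
    """Invert the fitted model: the size detected with probability p."""
    return math.exp((math.log(p / (1.0 - p)) - b0) / b1)


# Hypothetical hit/miss data (sizes in inches; 1 = hit, 0 = miss).
SIZES = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035, 0.040, 0.045,
         0.050, 0.055, 0.060, 0.070, 0.080, 0.090, 0.100]
HITS = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

b0, b1, iterations = fit_logit_pod(SIZES, HITS)
a90 = size_at_pod(b0, b1, 0.90)
print(f"b0={b0:.3f}, b1={b1:.3f}, iterations={iterations}, a90={a90:.4f} in.")
```

With completely separated data, or with all hits or all misses, the coefficients have no finite solution and a fit of this kind is expected to fail rather than return usable estimates.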
APPENDIXES
(Nonmandatory Information)
X1.1 Fig. X1.1 shows a flowchart of POD Analysis for hit/miss data.
X1.2 Additional commentary on the POD analysis process as illustrated in Fig. X1.1 and its significance.
X1.2.1 Define POD Analysis Objective: In general, the objective of a POD analysis is to determine the relationship between discontinuity size and POD. Based on the established relationship, the objective may be to determine the discontinuity size that can be detected with a given probability p and specified statistical confidence level c, denoted ap/c. It is important for the analyst to have a clear understanding of the specific analysis objective prior to performing the analysis.
X1.2.2 Obtain POD Demonstration Test Data and Examination Specifics: In general, the results of an experiment apply to the conditions under which the experiment was conducted. If the examination procedure was poorly designed or executed, or both, the validity of the resulting data is questionable.
X1.2.3 Conduct Preliminary Review of Examination Procedure and Data:
X1.2.3.1 If an experiment is not properly designed and executed, the data collected are subject to question and likely invalid. Invalid data cannot be corrected through a statistical analysis. Hence, any results from a statistical analysis of invalid data will be invalid as well.
X1.2.3.2 POD cannot be modeled as a continuous function of discontinuity size if there is a complete separation of misses and hits as crack size increases or if the responses are all misses or all hits. The model coefficients do not have a closed form solution. As such, an iterative numerical procedure is required to solve the system of equations from which the estimates of the model coefficients are derived. The procedure iterates until a convergence criterion is met, at which point estimates of the model coefficients are obtained from the last iteration. The analysis results are not valid unless the convergence criterion is met. Even if the analysis software outputs model information, the results shall not be used if the convergence criterion has not been met. Prior to performing the analysis, a preliminary review of the hit/miss data resulting from the POD examination can reveal whether or not failure to meet the convergence criteria may be an issue. If there is no overlap between misses and hits when the discontinuity sizes are sorted in ascending order, then the convergence criteria will not be met. If the responses are all misses or all hits, then the convergence criteria will not be met.
X1.2.3.3 Examples of examination procedure or data issues, or both, and possible resolutions can be found in 6.4.
X1.2.4 Select Model:
X1.2.4.1 Generalized linear models (GLMs) are the traditional statistical models used to describe the relationship between continuous variables (such as discontinuity size) and binary outcomes (such as hit or miss). For binary outcomes, the form of a generalized linear model with a single predictor variable is g(p) = b0 + b1•x, where x is the continuous predictor variable, b0 is the intercept, b1 is the slope, p is the probability of a response (that is, p=POD), and g is a function (commonly referred to as the “link” function) that maps [0, 1] onto the real number line. This model is the basis for the hit/miss analysis method as described in MIL-HDBK-1823A. In general, a generalized linear model is the appropriate statistical model for relating hit/miss data and flaw size since it restricts POD predictions to be between 0 and 1. (For more detail on GLMs, see Appendix X3.)
X1.2.4.2 In general, the appropriateness of a selected model is determined by the significance of the predictor variable(s), how well the model fits the observed data, and how well the underlying assumptions are met. Hence, model selection may be an iterative process as the appropriateness of the link function, the significance of the predictor variable(s), goodness-of-fit, and other underlying assumptions are typically assessed after the model has been developed.
X1.2.4.3 Note that there can be one or more predictor variables in a generalized linear model. However, for POD applications there is often only a single predictor variable, namely discontinuity size or a function of discontinuity size (such as the natural log), since that is typically the only known physical characteristic of the discontinuity. This practice does not limit the use of a generalized linear model with more than one predictor variable or other types of statistical models if justified as more appropriate for the hit/miss data.
X1.2.4.4 In general, only uncorrelated and significant predictor variables are included in a regression model. If more than one continuous predictor variable is being considered for
inclusion in the model, a preliminary graphical analysis of all possible pairings of continuous predictor variables shall be performed to verify independence of the predictor variables. When plotted against each other, there should be no apparent relationship between any two continuous predictor variables. After the analysis is performed, all predictor variables should be assessed for significance. (See X1.2.7.1 for details on assessing significance.)
FIG. X1.1 Flowchart of POD Analysis for Hit/Miss Data
X1.2.4.5 Other methods exist for determining the demonstrated POD for hit/miss data. One example is The Point Estimate Method (PEM), also referred to as the “29 out of 29” method, which is used in practice to quantify the demonstrated POD for a specific set of examination parameters and a single target discontinuity size. Because the PEM is focused on hit/miss data generated from specimens with multiple discontinuities representing a single target size versus a range of sizes, the analysis is based on an entirely different statistical method and does not result in a functional relationship between POD and discontinuity size. The PEM is used to quantify the minimum probability p with a statistical confidence level c of detecting the target discontinuity size. In contrast, the method described in this practice is used to estimate the relationship between POD and discontinuity size for the purpose of quantifying the discontinuity size that can be detected with a given probability p with a statistical confidence level of c. Given the specific analysis objective and an appropriately designed POD study, it is ultimately the analyst’s responsibility to (1) select the appropriate statistical method for the data and (2) verify that all underlying assumptions associated with the selected method hold.
X1.2.5 Perform Analysis using Appropriate Software:
X1.2.5.1 POD-specific software or statistical software is commonly used to perform an analysis on hit/miss data in order to establish a functional relationship between POD and discontinuity size. Though the software performs the complex calculations, it does not check the validity of analysis inputs or outputs. The analyst is responsible for ensuring that the analysis inputs (for example, data, model formulation) are correctly specified and that the underlying model assumptions hold. Treating the software as a “black box” can lead to seriously misleading conclusions about the inspection capability of the system. Hence, it is critical that the analyst have a basic understanding of the complete analysis process, including the underlying statistical methods and techniques for validating the results.
X1.2.5.2 Prior to performing the POD analysis, the analyst shall format the data as required by the software used to conduct the analysis. For example, a hit is typically coded as a 1 and a miss is typically coded as a 0. For some software the analyst may also be required to perform a transformation of the predictor variable prior to running the analysis. For example, the natural log of discontinuity size is often used as the predictor variable since it forces the POD curve to pass through the origin, which is interpreted as zero POD for a discontinuity of size 0. If the natural log of discontinuity size is used as the predictor variable, then the analyst may need to create a new variable column for the natural log of discontinuity size prior to running the analysis.
X1.2.6 Verify that Convergence has been Achieved: The procedure states that if more than twenty iterations were needed to reach convergence, the model may not be reliable. This criterion was selected to be consistent with several well known software packages. The criterion of twenty is used in
7
E2862 − 23
Minitab® statistical software and PODv3.4,5mh1823 (the com- X1.2.8 Construct a c% Lower Confidence Bound on the
panion software to MIL-HDBK-1823A) uses a criterion of POD Curve (if applicable)—See definitions and discussions of
twenty-five. statistical confidence and statistical confidence bound in An-
X1.2.7 Assess Model Reliability and Adequacy using Infor- nex A1. Methods for constructing a confidence bound can be
mal Methods (use formal methods as well if means for doing so found in MIL-HDBK-1823A as well as statistics text books on
are available): generalized linear regression.
X1.2.7.1 In general, only significant variables are included X1.2.8.1 Calculating ap/c becomes more complicated when
in a regression model. The significance of each predictor more than one predictor variable is included in the model.
variable is assessed after a model is selected and the analysis is Hence, it is recommended to consult with a statistician if other
performed. The results of a test for significance are typically predictor variables in addition to discontinuity size are in-
reported in the analysis output. One common criterion used to cluded in the model.
judge significance is the p-value. A p-value less than or equal X1.2.9 Perform False Call Analysis at 50 %, 90 %, and
to 0.05 implies evidence that the predictor variable is a 95 % Confidence—The method for constructing a confidence
statically significant contributor to the model. More interval for a binomial proportion proposed by Clopper and
specifically, the hypothesis being tested to assess the signifi- Pearson (1934) is an exact method for estimating confidence
cance of discontinuity size on POD is Ho: β1=0 versus Ha: bounds for a proportion p based on x “failures” in a sample size
β1≠0. When the p-value is greater than or equal to 0.05, there of n where p is estimated as x/n. (This method is also discussed
is no evidence to suggest that β1 is any different from 0, in Footnote 10 under the name “The Conservative Method.”10)
meaning POD is independent of discontinuity size. The upper 100•(1–α)% confidence bound for p is:
X1.2.7.2 A visual assessment of the POD curve is a useful 21
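As a hedged illustration of X1.2.9, the Clopper-Pearson upper bound can also be computed without F-distribution tables by solving the equivalent binomial tail equation P(X ≤ x | n, p) = α for p, where α is one minus the confidence level. The sketch below uses only the Python standard library; the function names and the bisection iteration cap are assumptions made for illustration and are not part of this practice.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x, n, confidence):
    """One-sided upper Clopper-Pearson bound for a proportion.

    Solves binom_cdf(x, n, p) = 1 - confidence for p by bisection.
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if x == n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(200):  # hypothetical cap; far more halvings than needed
        mid = (lo + hi) / 2.0
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # tail probability still too large, so the bound is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# False call example from this practice: 0 false calls in 150 opportunities.
for conf in (0.50, 0.90, 0.95):
    print(f"{conf:.0%} confidence: {clopper_pearson_upper(0, 150, conf):.4f}")
```

For 0 false calls in 150 opportunities at 95 % confidence, this gives approximately 0.0198, in agreement with X1.2.9.3. The direct summation in `binom_cdf` is adequate for sample sizes of this order; for very large n a library routine for the beta or F quantile would be a safer choice.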
X1.2.9.3 95 % confidence:
$$P_U = \left\{ 1 + \frac{150 - 0}{(0+1) \cdot F\left(0.95,\ 2\cdot(0)+2,\ 2\cdot(150)-2\cdot(0)\right)} \right\}^{-1} = \left\{ 1 + \frac{150}{3.026} \right\}^{-1} = 0.0198$$
X1.2.10 Document the Results in a Report—The report
should contain enough information such that the analysis
results may be reproduced.
4 Minitab® is a registered trademark of Minitab, LLC, 1829 Pine Hall Rd, State
College, PA 16801, United States.
5 The sole source of supply of the apparatus known to the committee at this time
is Minitab, LLC. If you are aware of alternative suppliers, please provide this
information to ASTM International Headquarters. Your comments will receive
careful consideration at a meeting of the responsible technical committee,1 which
you may attend.
6 McCullagh, P. and Nelder, J. A., Generalized Linear Models, 2nd Ed., Chapman
& Hall/CRC, 1989.
7 Agresti, A., Categorical Data Analysis, 2nd Ed., John Wiley & Sons, Inc., 2002.
8 Lloyd, C. J., Statistical Analysis of Categorical Data, John Wiley & Sons, Inc.,
1999.
9 Neter, Kutner, Nachtsheim, and Wasserman, Applied Linear Statistical Models,
4th Ed., The McGraw-Hill Companies, Inc., 1996.
10 Meeker, Hahn, and Escobar, Statistical Intervals: A Guide for Practitioners
and Researchers, 2nd Ed., John Wiley & Sons, Inc., 2017, p. 103–104.
X2.1 The following is an example POD analysis using
simulated data. The layout of the example follows the
flowchart in Fig. X1.1. Some suggested verification techniques
(Section 6) and most of the information required in the report
(Section 7) are identified by section number in parentheses.
X2.1.1 Define POD Analysis Objective—For this example,
the objective is to estimate the discontinuity size that can be
detected with 90 % probability and 95 % confidence, that is,
a90/95.
X2.1.2 Obtain POD Demonstration Test Data and
Examination Specifics—The following conditions are assumed for
this example.
X2.1.2.1 Specimen standard geometry (7.1.1): flat panels
X2.1.2.2 Specimen standard material (7.1.2): Nickel
X2.1.2.3 Examination date (7.1.3): 9/25/2016
X2.1.2.4 Number of inspectors (7.1.4): 1
X2.1.2.5 Type of inspection method (7.1.5): line-of-sight
Level 3 Fluorescent Penetrant Inspection
X2.1.2.6 The known induced discontinuity sizes, which
were found, and which were missed are shown in Table X2.1
(7.1.7, 7.1.8).
X2.1.2.7 In this example, there were no false calls out of
150 opportunities (7.1.9).
X2.1.3 Conduct Preliminary Review of Examination
Procedure and Data:
X2.1.3.1 According to the inspector and test administrator,
no issues occurred during the examination (7.1.6).
X2.1.3.2 The data was plotted in Excel to verify overlap
between misses and hits as discontinuity size increases (6.4.4,
6.4.5). The plot is shown in Fig. X2.1. This plot was also used
to identify possible outlying observations (6.4.3).
Discontinuity ID 26 (annotated in Fig. X2.1) may be outlying. (Note:
Formal statistical methods for assessing the statistical
significance of an outlying observation are also available.) An
investigation yielded no rationale for removal. It will be retained
in the analysis, but its influence on the POD curve will be
evaluated.
X2.1.4 Format Data as Required—mh1823 POD software
was used to perform the analysis. Data was formatted per
software requirements.
X2.1.5 Establish Relationship between POD and
Discontinuity Size—A generalized linear model with a logit link
function was fit to the data (7.1.10). The estimated model is
$$\ln\left(\frac{p}{1-p}\right) = 19.270 + 7.255 \cdot \ln(a)$$
where p=POD, $\ln\left(\frac{p}{1-p}\right)$ is the
logit link function, a is discontinuity size, and 19.270 and
7.255 are the model coefficients (7.1.11). Written in terms of
POD,
$$p = \frac{\exp\left(19.270 + 7.255 \cdot \ln(a)\right)}{1 + \exp\left(19.270 + 7.255 \cdot \ln(a)\right)}.$$
X2.1.6 Verify that Convergence has been Achieved—The
software output is shown in Fig. X2.2. There is no warning
message stating that convergence was not achieved (7.1.13).
The software output also includes the variance-covariance
matrix (7.1.12). Output from other software used to fit the
generalized linear model may differ, but estimates of model
coefficients, standard errors, and model diagnostic measures
(assuming the same methods are used) should be the same.
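The estimated model reported in X2.1.5 can be evaluated directly. The following minimal Python sketch, with hypothetical function names, computes POD at a given discontinuity size and inverts the logit model to obtain the size at a target POD. For a target of 0.90 this is only the point estimate on the fitted curve; it is not a90/95, which additionally requires the lower confidence bound described in X2.1.8.

```python
import math

B0 = 19.270  # intercept reported in X2.1.5
B1 = 7.255   # coefficient on ln(a) reported in X2.1.5


def pod(a, b0=B0, b1=B1):
    """POD at discontinuity size a (a > 0) for the logit model."""
    eta = b0 + b1 * math.log(a)
    # Numerically stable evaluation of exp(eta) / (1 + exp(eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def size_at_pod(p, b0=B0, b1=B1):
    """Invert the logit model: the size a at which POD equals p (0 < p < 1)."""
    logit_p = math.log(p / (1.0 - p))
    return math.exp((logit_p - b0) / b1)


a90 = size_at_pod(0.90)
print(f"size at 90 % POD (point estimate only): {a90:.4f}")
print(f"POD evaluated at that size: {pod(a90):.3f}")
```

The size is returned in the same units as the discontinuity sizes used to fit the model.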
FIG. X2.1 Plot of Example Hit/Miss Data Showing Overlap Between Misses and Hits
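The overlap check described in X2.1.3.2 was done graphically, but it can also be sketched numerically. In the Python example below, the data values and the function name are hypothetical, and "overlap" is taken to mean that the largest missed discontinuity is larger than the smallest detected one. That working definition is an assumption of this sketch, not wording from 6.4.4 or 6.4.5.

```python
def overlap_region(sizes, hits):
    """Return (smallest hit size, largest miss size) if they overlap, else None."""
    hit_sizes = [s for s, h in zip(sizes, hits) if h == 1]
    miss_sizes = [s for s, h in zip(sizes, hits) if h == 0]
    if not hit_sizes or not miss_sizes:
        return None  # all hits or all misses: no overlap is possible
    smallest_hit = min(hit_sizes)
    largest_miss = max(miss_sizes)
    if largest_miss > smallest_hit:
        return (smallest_hit, largest_miss)
    return None


# Hypothetical data, not taken from Table X2.1 (1 = hit, 0 = miss).
sizes = [0.02, 0.04, 0.05, 0.06, 0.08, 0.10]
overlapping = [0, 0, 1, 0, 1, 1]
separated = [0, 0, 0, 1, 1, 1]

print(overlap_region(sizes, overlapping))
print(overlap_region(sizes, separated))
```

When the hits and misses are completely separated, the maximum likelihood estimates of a logistic model do not exist, which typically appears in software as a convergence failure.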
X2.1.7 Assess Model Reliability and Adequacy using Informal
Methods (use formal methods as well if means for doing so
are available)—The following informal and formal
assessments were done (7.1.17).
X2.1.7.1 (Formal) The statistical significance of
discontinuity size as a predictor variable was verified by evaluating the
p-value (6.10). Since the p-value is less than 0.05 (see Fig.
X2.2), discontinuity size is considered a significant predictor
variable.
X2.1.7.2 (Informal) The software output shown in Fig. X2.2
indicates that convergence was achieved after eight iterations
(7.1.14).
X2.1.7.3 (Informal) The POD curve (with a 95 % lower
confidence bound) produced by the software is shown in Fig.
X2.3 (7.1.15). The POD curve is S-shaped (6.11.2). The
discontinuity size range over which the POD curve rises
appears to correspond with the overlap between misses and hits
(6.11.3).
X2.1.7.4 (Informal) The empirical POD was calculated in
Excel and is shown in Table X2.2 (6.11.4). A plot of the
empirical POD against the POD curve was done in Excel and
is shown in Fig. X2.4 (6.11.4). The POD curve appears to agree
with the empirical POD.
X2.1.7.5 (Informal) The impact of Discontinuity ID 26 on
the analysis results was assessed by excluding that data point
and re-running the analysis (7.1.18.3). Fig. X2.5 shows the
POD curve (with 95 % lower confidence bound) when
Discontinuity ID 26 is excluded from the analysis. The influence of
this data point on the POD curve, confidence bound, and
corresponding results of interest (that is, a90/95) was considered
to be practically insignificant when compared to Fig. X2.3.
NOTE X2.1—Formal statistical methods for assessing influential
observations are also available.
X2.1.8 Construct a c % Lower Confidence Bound around
the Model Fit—Since the objective is to estimate the
discontinuity size that can be detected with 90 % probability and 95 %
confidence, a 95 % lower confidence bound was constructed
and is shown in Fig. X2.3 (7.1.15).
X2.1.8.1 The confidence bound roughly follows the shape
of the POD curve (6.12.1).
X2.1.8.2 The impact of the possible outlying observation on
the confidence bound was assessed in conjunction with the
assessment on the POD curve (7.1.18.3).
X2.1.9 Perform False Call Analysis—There were no false
calls out of 150 opportunities. The Clopper-Pearson binomial
method for constructing confidence intervals for proportions
was used to calculate the false call rate at the 50 %, 90 %, and
95 % level of statistical confidence. See calculations in
X1.2.9.1, X1.2.9.2, and X1.2.9.3 respectively (7.1.19.3).
FIG. X2.5 Resulting POD Curve with 95 % Lower Confidence Bound when Discontinuity ID 26 is Removed
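The influence assessment in X2.1.7.5 (refit with the suspect observation excluded, then compare results) can be sketched as follows. This is a minimal illustration under several assumptions: the hit/miss data are hypothetical and are not Table X2.1, the fit uses a hand-written Newton-Raphson iteration rather than any of the software packages named in this practice, the convergence tolerance is arbitrary, and the comparison is made on the point estimate of the size at 90 % POD rather than on a90/95. The iteration limit of twenty follows the criterion cited in X1.2.6.

```python
import math

MAX_ITERATIONS = 20  # convergence criterion cited in X1.2.6


def fit_logit(x, y, max_iter=MAX_ITERATIONS, tol=1e-8):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, iterations used, converged flag).
    """
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for b0
            g1 += (yi - p) * xi       # score for b1
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, iteration, True
    return b0, b1, max_iter, False


def size_at_pod(p, b0, b1):
    """Size at which the fitted POD equals p, with ln(size) as predictor."""
    return math.exp((math.log(p / (1.0 - p)) - b0) / b1)


def summarize(label, sizes, hits):
    x = [math.log(a) for a in sizes]
    b0, b1, n_iter, converged = fit_logit(x, hits)
    a90 = size_at_pod(0.90, b0, b1)
    print(f"{label}: b0={b0:.3f} b1={b1:.3f} iterations={n_iter} "
          f"converged={converged} a90={a90:.4f}")
    return b0, b1, n_iter, converged, a90


# Hypothetical data (1 = hit, 0 = miss); index 12 is a miss at a large size.
sizes = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
         0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.18]
hits = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
suspect = 12  # hypothetical index of the possibly outlying observation

full = summarize("all data", sizes, hits)
reduced = summarize("suspect excluded",
                    sizes[:suspect] + sizes[suspect + 1:],
                    hits[:suspect] + hits[suspect + 1:])
```

Whether the difference between the two fits is practically significant remains an engineering judgment, as in X2.1.7.5; the sketch only produces the numbers to compare.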
X3.1 The purpose of Appendix X3 is to serve as an
overview of generalized linear models (GLMs) when the
response variable is binary (that is, 0 or 1). The scope is limited
to the logit and probit models with a single continuous
predictor variable, which fall in the class of binomial
regression models. Appendix X3 relies heavily on well-known and
well-respected sources for GLMs and binomial regression
models. When a reference is cited, the reader is encouraged to
reference the source directly for more detailed explanations.
X3.1.1 Introduction to GLMs and Binomial Regression
Models:
X3.1.1.1 GLMs are a broad class of models that include the
classical linear regression models as well as some non-linear
regression models. The term is attributed to John Nelder and
Robert Wedderburn, who showed how seemingly diverse
statistical models could be unified (McCullagh et al., 1989).6 In
general, the theoretical unifying form of a GLM can be broken
down into three components: link function, linear component,
and random component. The link function is a monotonic
differentiable function, g, of the response variable that relates
the linear component to the expected value of the response. The
linear component is a linear combination of parameters similar
to a regression equation. The random component specifies the
distribution which is a member of the exponential family (for
example, normal, binomial, gamma, Poisson). GLMs
encompass the most important and widely used regression models,
including classical linear regression models (Agresti, 2002).7
Consider the theoretical form of a simple linear model for
example: yi = β0+β1•xi+ɛi. The link function is the identity
function g(yi)=yi, the linear component is β0+β1•xi, and the
random component denoted by the error term ɛi follows a
normal distribution. Since the link function g can be any
monotonic differentiable function (making a non-linear
function possible as illustrated, for example, by the logit and probit
link functions) and the distribution of the error term can be
extended to any member of the exponential family (not just the
normal distribution), GLMs are considered to be an extension
of classical linear regression models (McCullagh et al., 1989).6
As with classical linear regression models, the assessment of
model adequacy includes verifying the assumptions about the
error term.
X3.1.1.2 Binomial regression models, which belong to the
family of GLMs, are used to describe the relationship between
continuous (or categorical) variables and binary outcomes
(Lloyd, 1999).8 Suppose Y is a binomial random variable. That
is, Y represents the number of “successful” responses out of n
independent trials where p is the probability of a “successful”
response on a single trial. p is estimated as Y/n, the number of
observed “successes” over the total number of trials. Now
suppose we have a set of k independent binomial random
variables Y1, ..., Yk from n1, ..., nk possible trials. A simple
binomial regression model, for example, relates the expected
value of Yi/ni, denoted by pi, to a continuous independent
variable xi. For example, pi = G(β0+β1•xi) where G is
non-linear since probabilities are restricted between 0 and 1. If g =
G–1, then g(pi)=β0+β1•xi where g is the link function that maps
[0, 1] onto the real number line and β0+β1•xi is the linear
component. For binomial regression models the random
component ɛi follows a binomial distribution.
X3.1.1.3 Two common link functions for binomial
regression models, which have been shown to work well for
known as the normit link function.) Hence, the theoretical form
of the probit model with a single predictor variable is
Φ–1(pi)=β0+β1•xi. When expressed in terms of pi, the result is
pi=Φ(β0+β1•xi).
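The logit and probit link functions named in X3.1, together with their inverses, can be sketched in Python using only the standard library. The coefficient values below are hypothetical and serve only to show that both inverse links map the linear component β0+β1•xi to a probability between 0 and 1; they are not estimates from this practice.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def logit(p):
    """Logit link: g(p) = ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: p = exp(eta) / (1 + exp(eta)), evaluated stably."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def probit(p):
    """Probit link: g(p) = inverse standard normal CDF of p, for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Inverse probit: p = standard normal CDF evaluated at eta."""
    return _STD_NORMAL.cdf(eta)


# Hypothetical coefficients for illustration only.
b0, b1 = -1.0, 2.0
for x in (0.0, 0.5, 1.0):
    eta = b0 + b1 * x  # linear component
    print(f"x={x:.1f}  logit model p={inv_logit(eta):.4f}  "
          f"probit model p={inv_probit(eta):.4f}")
```

The same coefficients give different probabilities under the two links, so coefficients estimated with one link are not interchangeable with the other.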