Khan
Understanding Statistics
for Quality by Design
A TECHNICAL DOCUMENT SERIES
Part I
SPC is primarily a method for monitoring process performance. Many engineers believe that Cpk can be used to quantify product quality. This is simply untrue. While Cpk can be used to calculate process fallout (Table 1), the decision to accept or reject a production lot of items must be made by acceptance sampling. Sampling plans can be derived using a variety of statistical techniques but are commonly chosen by consulting tables outlined in ANSI/ASQ Z1.4 (for attribute data) or ANSI/ASQ Z1.9 (for variables data).

Table 1: Relationship between Cpk and non-conforming items (measured in PPM).

  Cpk    Sigma level   Yield          Fallout (PPM)
  0.33   1             68.27%         317311
  0.67   2             95.45%         45500
  1.00   3             99.73%         2700
  1.33   4             99.99%         63
  1.67   5             99.9999%       1
  2.00   6             99.9999998%    0.002
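The fallout figures in Table 1 follow directly from the normal distribution when the process is centered, so that the sigma level is 3 Cpk and the two-sided fallout is 2Φ(−3 Cpk). A minimal sketch reproducing the table, assuming SciPy is available (the variable names are illustrative):

    # Sketch: reproduce the fallout values in Table 1 for a centered, normal process.
    # Two-sided fallout = 2 * Phi(-3 * Cpk), expressed in parts per million (PPM).
    from scipy.stats import norm

    for cpk in [0.33, 0.67, 1.00, 1.33, 1.67, 2.00]:
        fallout_ppm = 2 * norm.cdf(-3 * cpk) * 1e6
        yield_pct = 100 - fallout_ppm / 1e4
        print(f"Cpk = {cpk:.2f}  yield = {yield_pct:.7f}%  fallout = {fallout_ppm:.3f} PPM")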
As mentioned above, the main goal of capability analysis is to help reduce variability in the manufacturing process. Higher capability indices generally correspond to higher profits as they imply fewer non-conforming parts and better customer satisfaction. Table 2 contains commonly used minimum values for a variety of processes.

Table 2: Recommended capability values for two-sided specifications.

  Situation            Minimum capability
  Existing Process
    Regular            1.33
    Critical           1.50
  New Process
    Regular            1.50
    Critical           1.67
  Six Sigma Process    2.00

Finally, it is important to understand that Cpk does not give us the whole picture. One of the disadvantages of Cpk is that it does not take into consideration the target or nominal specification. Figure 1 illustrates how the same Cpk value can describe two very different processes. For this reason, it is good practice not to base decisions solely on the numerical value of a statistic, but also to graphically visualize the data. Another way to address this difficulty is to use a process capability index that is a better indicator of centering, such as Cpm or Cpkm.
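For reference, the commonly used definition of Cpm (not given in this excerpt) incorporates the target value T directly in the denominator:

$$C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^2 + (\mu - T)^2}}$$

Any departure of the process mean from the target inflates the denominator and lowers the index, which is exactly the centering information that Cpk ignores.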
While there are numerous process capability indices, the two that are most commonly used in industry are Cp and Cpk. These random variables are estimated with the following equations (note the use of the hat to denote the estimate):

$$\hat{C}_p = \frac{USL - LSL}{6\hat{\sigma}}$$

$$\hat{C}_{pk} = \min\left[\frac{USL - \hat{\mu}}{3\hat{\sigma}}, \frac{\hat{\mu} - LSL}{3\hat{\sigma}}\right]$$
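A minimal sketch of how these point estimates are computed in practice; the data, specification limits, and function name below are hypothetical, and NumPy is assumed:

    # Sketch: point estimates of Cp and Cpk from a sample assumed normal and in control.
    import numpy as np

    def capability(data, lsl, usl):
        mu_hat = np.mean(data)
        sigma_hat = np.std(data, ddof=1)      # sample standard deviation
        cp = (usl - lsl) / (6 * sigma_hat)
        cpk = min((usl - mu_hat) / (3 * sigma_hat),
                  (mu_hat - lsl) / (3 * sigma_hat))
        return cp, cpk

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=10.02, scale=0.05, size=100)   # hypothetical measurements
    print(capability(sample, lsl=9.85, usl=10.15))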
Verifying assumptions

1. The individual data must be normally distributed. Normality can be verified by visually inspecting a Q-Q plot or by using the Anderson-Darling or Shapiro-Wilk tests (see the sketch after this list). Data that deviates from normality can sometimes be transformed to behave better. In practice, one should instead determine the cause for non-normality if the data is expected to be normal (e.g. dimensional data).

2. The individual data must be independent (a particular observation $X_t$ cannot depend on a previous observation $X_{t-1}$). Independence can be assumed if a plot of the data against the order it was collected displays no obvious pattern. One can also use the Durbin-Watson test for autocorrelation.

3. The process must be under statistical control, which is verified using Shewhart control charts. All data points (or subgroup averages) must fall in between the calculated control limits (not to be confused with customer-determined specification limits).
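A minimal sketch of the first two checks, assuming SciPy and statsmodels are available and using hypothetical data:

    # Sketch: quick checks of the normality and independence assumptions.
    import numpy as np
    from scipy.stats import shapiro
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(7)
    x = rng.normal(loc=10.0, scale=0.05, size=100)   # hypothetical in-control data

    stat, p = shapiro(x)                             # Shapiro-Wilk normality test
    print(f"Shapiro-Wilk p-value: {p:.3f}")          # a small p-value suggests non-normality

    dw = durbin_watson(x - x.mean())                 # values near 2 indicate no autocorrelation
    print(f"Durbin-Watson statistic: {dw:.2f}")

Control charting (the third assumption) is usually handled with dedicated SPC software rather than a few lines of code.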
These values assume that the data were collected individually. When
rational subgrouping is employed, the required minimum value
of Ĉpk will be less than what is tabulated. The exact calculation is
beyond the scope of this document, and the reader is referred to
Scholz and Vangel (1998) for more details.
Cp and Cpk can be extremely useful when used as part of a more comprehensive capability plan. However, these process capability

(This section is largely adapted from Kotz and Lovelace (1998).)
Interval estimation

$$\hat{C}_p = \frac{USL - LSL}{6\hat{\sigma}}$$

and

$$\hat{C}_{pk} = \min\left[\frac{USL - \hat{\mu}}{3\hat{\sigma}}, \frac{\hat{\mu} - LSL}{3\hat{\sigma}}\right].$$

Due to the variability involved in sampling, this is not the most accurate method for quantifying capability. When a process is exactly capable, for example, there is a 50% chance that the estimate will be below the minimum value.²

² Khan OA (June 19, 2015). A Practical Guide to Utilizing Cp and Cpk. West Pharmaceutical Services Technical Document.

A better estimate can be obtained by calculating the 100(1 − α)% confidence interval of the process capability index. Here, α is the probability of type I error one is willing to tolerate. The most common choice for α is 0.05, resulting in a 95% confidence interval. This is interpreted as follows: if repeated samples are taken and the 95% confidence interval is computed for each sample, 95% of the intervals will contain the true population parameter. Higher confidence levels correspond to wider intervals. (Note that it is not entirely correct to say that there is a 95% chance that the population parameter lies within the interval.)
Heavlin (1988)

$$C_{pk} = \hat{C}_{pk} \pm Z_{1-\alpha/2}\sqrt{\frac{n-1}{9n(n-3)} + \frac{\hat{C}_{pk}^2}{2(n-3)}\left(1 + \frac{6}{n-1}\right)} \tag{2}$$
Bissell (1990)

$$C_{pk} = \hat{C}_{pk} \pm Z_{1-\alpha/2}\sqrt{\frac{1}{9n} + \frac{\hat{C}_{pk}^2}{2(n-1)}} \tag{3}$$
Kushler-Hurley (1992)

$$C_{pk} = \hat{C}_{pk}\left(1 \pm \frac{Z_{1-\alpha/2}}{\sqrt{2(n-1)}}\right) \tag{4}$$
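A minimal sketch of these three approximations; the point estimate and sample size below are placeholders, and SciPy is assumed for the normal quantile:

    # Sketch: approximate 100(1 - alpha)% confidence intervals for Cpk using the
    # Heavlin, Bissell, and Kushler-Hurley formulas (Equations 2-4).
    import math
    from scipy.stats import norm

    def cpk_intervals(cpk_hat, n, alpha=0.05):
        z = norm.ppf(1 - alpha / 2)
        heavlin = z * math.sqrt((n - 1) / (9 * n * (n - 3))
                                + cpk_hat**2 / (2 * (n - 3)) * (1 + 6 / (n - 1)))
        bissell = z * math.sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))
        kushler = cpk_hat * z / math.sqrt(2 * (n - 1))
        return {"Heavlin": (cpk_hat - heavlin, cpk_hat + heavlin),
                "Bissell": (cpk_hat - bissell, cpk_hat + bissell),
                "Kushler-Hurley": (cpk_hat - kushler, cpk_hat + kushler)}

    print(cpk_intervals(cpk_hat=1.42, n=50))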
Minitab

This formula, used in Minitab 16 and 17, is unique in that it takes batch effects into consideration:

$$C_{pk} = \hat{C}_{pk} \pm Z_{1-\alpha/2}\sqrt{\frac{1}{N + (m/2)^2} + \frac{\hat{C}_{pk}^2}{2n}} \tag{5}$$

(The source for this equation is not clear. It seems to be a modification of the formula proposed by Bissell (Equation 3). Minitab's technical support is trying to find a proper citation. This document will be updated when more information is available.)
capable 100(1 − α)% of the time). The second option also produces a plot of the inverse function and a searchable table of values.
With this model, the variance of any observation is

$$V(y_{ijk}) = \sigma_P^2 + \sigma_O^2 + \sigma_{PO}^2 + \sigma^2 = \sigma_P^2 + \sigma_{Gauge}^2.$$

The gauge variability can be decomposed into the repeatability variance component ($\sigma^2$) and the gauge reproducibility ($\sigma_O^2 + \sigma_{PO}^2$). (The experiment can easily be extended to study different measurement systems by adding an $M_\ell$ term and its two-way and three-way interactions with $P_i$ and $O_j$.) It is common to compare the estimate of gauge capability to the width of the specifications or the tolerance band for the part that is being measured. This is called the precision-to-tolerance (P/T) ratio:

$$P/T = \frac{k\hat{\sigma}_{Gauge}}{USL - LSL} \tag{2}$$
In Equation 2, popular choices for the constant k are k = 5.15 and
k = 6. The value k = 5.15 corresponds to the limiting value of the
number of standard deviations between bounds of a 95% tolerance
interval that contains at least 99% of a normal population, and k =
6 corresponds to the number of standard deviations between the
usual natural tolerance limits of a normal population. Values of the
estimated ratio P/T of 0.1 or less often are taken to imply adequate
gauge capability (Montgomery, 2009).
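A minimal sketch of the P/T calculation, using hypothetical variance-component estimates and specification limits:

    # Sketch: precision-to-tolerance (P/T) ratio from estimated variance components (Equation 2).
    import math

    var_repeatability = 0.0004        # sigma^2 (repeatability)
    var_operator = 0.0001             # sigma_O^2
    var_part_by_operator = 0.00005    # sigma_PO^2
    sigma_gauge = math.sqrt(var_repeatability + var_operator + var_part_by_operator)

    usl, lsl = 10.15, 9.85
    k = 6                             # popular alternative: k = 5.15
    pt_ratio = k * sigma_gauge / (usl - lsl)
    print(f"P/T = {pt_ratio:.3f}")    # 0.1 or less is often taken to imply adequate capability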
Statistical procedures for inter-rater reliability
Cohen’s kappa
where $f_{i+}$ is the total for the ith row and $f_{+i}$ is the total for the ith column. The kappa statistic is:

$$\kappa = \frac{p(a) - p(e)}{1 - p(e)} \tag{7}$$

Cohen's kappa is generally between 0 and 1; however, negative values are possible when there is less than chance agreement. For ordinal data and partial scoring, it is possible to use a weighted form of kappa (Cohen, 1968). When more than two raters are being compared, one can use Fleiss's kappa.

Viera & Garrett (2005) suggest the following descriptive scale for values of Cohen's kappa statistic:

  Value of κ    Strength of agreement
  < 0           Less than chance
  0.01 - 0.20   Slight
  0.21 - 0.40   Fair
  0.41 - 0.60   Moderate
  0.61 - 0.80   Substantial
  0.81 - 0.99   Almost perfect
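A minimal sketch of Equation 7 for two raters, using a hypothetical agreement table (rows are rater A's categories, columns are rater B's):

    # Sketch: Cohen's kappa from a two-rater agreement table (Equation 7).
    import numpy as np

    f = np.array([[20,  5],
                  [ 3, 22]])             # hypothetical counts
    n = f.sum()
    p_a = np.trace(f) / n                # observed agreement p(a)
    p_e = np.sum(f.sum(axis=1) * f.sum(axis=0)) / n**2   # chance agreement p(e)
    kappa = (p_a - p_e) / (1 - p_e)
    print(f"kappa = {kappa:.3f}")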
Bland-Altman plot
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{3}$$

where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{(n_1 - 1)s_{X_1}^2 + (n_2 - 1)s_{X_2}^2}{n_1 + n_2 - 2}}$ and the t-statistic has $\nu = n_1 + n_2 - 2$ degrees of freedom.
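A minimal sketch using SciPy's two-sample t-test on hypothetical data; the equal_var=False variant (Welch's test) is also shown for the case where equal variances are doubtful:

    # Sketch: independent two-sample t-test (Equation 3) on hypothetical data.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)
    x1 = rng.normal(10.00, 0.05, 25)
    x2 = rng.normal(10.03, 0.05, 25)

    print(ttest_ind(x1, x2))                    # pooled-variance t-test
    print(ttest_ind(x1, x2, equal_var=False))   # Welch's t-test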
Paired samples
When the same set of samples is used in both groups, we can do the
paired t-test to get more power. The t-statistic is calculated as:
$$t = \frac{\bar{X}_D}{s_D / \sqrt{n}}. \tag{5}$$

For this equation, the differences between all pairs must be calculated. The average ($\bar{X}_D$) and standard deviation ($s_D$) of those differences are used in the equation. The degrees of freedom for the hypothesis test are calculated as $\nu = n - 1$.
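A minimal sketch with SciPy, using hypothetical before/after measurements on the same items:

    # Sketch: paired t-test (Equation 5); equivalent to a one-sample t-test on the differences.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(4)
    before = rng.normal(10.0, 0.1, 20)
    after = before + rng.normal(0.02, 0.05, 20)   # same items measured twice

    print(ttest_rel(after, before))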
In some cases, it is acceptable to conclude equivalence if the difference of the two means falls between an upper and lower bound. (Minitab 17 has the functionality to do TOST under the menu heading "Equivalence Tests.") The null hypotheses of non-equivalence are tested against the alternative of equivalence:

$$H_0: \mu_1 - \mu_2 \leq \delta_L \ \text{or}\ \mu_1 - \mu_2 \geq \delta_U \quad \text{vs.} \quad H_1: \delta_L < \mu_1 - \mu_2 < \delta_U$$

The corresponding test statistics are

$$t_L = \frac{(\bar{X}_2 - \bar{X}_1) - \delta_L}{SE} \tag{6}$$

$$t_U = \frac{(\bar{X}_2 - \bar{X}_1) - \delta_U}{SE} \tag{7}$$

where the standard error is:

$$SE = \sqrt{\frac{\sum_{i=1}^{n_1}\left(X_{1i} - \bar{X}_1\right)^2 + \sum_{j=1}^{n_2}\left(X_{2j} - \bar{X}_2\right)^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{8}$$
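A minimal sketch of the two one-sided tests; the data, equivalence margins, and function name below are hypothetical:

    # Sketch: two one-sided tests (TOST) for mean equivalence (Equations 6-8).
    import numpy as np
    from scipy.stats import t as t_dist

    def tost(x1, x2, delta_l, delta_u):
        n1, n2 = len(x1), len(x2)
        diff = np.mean(x2) - np.mean(x1)
        pooled_ss = np.sum((x1 - np.mean(x1))**2) + np.sum((x2 - np.mean(x2))**2)
        se = np.sqrt(pooled_ss / (n1 + n2 - 2) * (1/n1 + 1/n2))
        df = n1 + n2 - 2
        t_l = (diff - delta_l) / se           # Equation 6
        t_u = (diff - delta_u) / se           # Equation 7
        p_l = 1 - t_dist.cdf(t_l, df)         # one-sided test against the lower bound
        p_u = t_dist.cdf(t_u, df)             # one-sided test against the upper bound
        return max(p_l, p_u)                  # equivalence is concluded if this is below alpha

    rng = np.random.default_rng(3)
    a = rng.normal(10.00, 0.05, 30)
    b = rng.normal(10.01, 0.05, 30)
    print(tost(a, b, delta_l=-0.05, delta_u=0.05))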
The theory and procedure of the ANOVA are beyond the scope of this document. The reader is encouraged to look at any introductory statistics book for a discussion of this versatile test. It is relatively robust to small deviations from normality; however, the assumption of homoscedasticity (equal variance) must be satisfied. The ANOVA cannot be used to determine which means are different if the null hypothesis is rejected. Post-hoc testing procedures are therefore necessary and are described in the next section.

(The Kruskal-Wallis one-way analysis of variance is the analogous nonparametric test for testing whether three or more samples come from the same population. It does not require that the samples be normally distributed.)
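A minimal sketch comparing three hypothetical groups with the one-way ANOVA and its nonparametric analogue (SciPy assumed):

    # Sketch: one-way ANOVA and the Kruskal-Wallis test on hypothetical groups.
    import numpy as np
    from scipy.stats import f_oneway, kruskal

    rng = np.random.default_rng(5)
    g1 = rng.normal(10.0, 0.1, 12)
    g2 = rng.normal(10.1, 0.1, 12)
    g3 = rng.normal(10.0, 0.1, 12)

    print(f_oneway(g1, g2, g3))   # assumes normality and equal variances
    print(kruskal(g1, g2, g3))    # no normality assumption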
Multiple comparisons
Bonferroni correction
$$H_0: \mu_i = \mu_j \quad \text{vs.} \quad H_1: \mu_i \neq \mu_j$$

$$q_{obs} = \frac{\bar{X}_i - \bar{X}_j}{SE} \tag{9}$$

where $SE = \sqrt{\left(\dfrac{MSE}{2}\right)\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$ is the standard error. The critical value $q_{critical} = q_{\alpha,k,N-k}$ can be found in a table of values and is used to make a decision for each pairwise comparison:

$$\begin{cases} \text{if } |q_{obs}| < q_{critical} & \text{then do not reject } H_0 \\ \text{if } |q_{obs}| \geq q_{critical} & \text{then reject } H_0 \end{cases}$$
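The critical value $q_{\alpha,k,N-k}$ is a quantile of the studentized range distribution. A minimal sketch of the comparison for one pair of hypothetical groups, taking the critical value from SciPy (version 1.7 or later assumed) rather than a printed table:

    # Sketch: pairwise comparison of two group means using Equation 9.
    import numpy as np
    from scipy.stats import studentized_range

    groups = [np.array([9.9, 10.1, 10.0, 10.2]),
              np.array([10.3, 10.4, 10.2, 10.5]),
              np.array([10.0, 9.8, 10.1, 9.9])]
    k = len(groups)
    N = sum(len(g) for g in groups)
    mse = sum(np.sum((g - g.mean())**2) for g in groups) / (N - k)   # ANOVA mean square error

    q_crit = studentized_range.ppf(0.95, k, N - k)

    x_i, x_j = groups[0], groups[1]
    se = np.sqrt(mse / 2 * (1 / len(x_i) + 1 / len(x_j)))
    q_obs = abs(x_i.mean() - x_j.mean()) / se
    print("reject H0" if q_obs >= q_crit else "do not reject H0")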
Dunnett’s test
p
and ncontrol = n k for the control group. We are interested in testing For example, if we choose a total
the hypothesis: sample size of N = 60 with k = 4
treatments, then each
p treatment should
p
have n = N/(k + k ) = 60/(4 + 4) =
H0 : µi = µcontrol vs. H1 : µi 6= µcontrol 10 observations and
p the control should
p
have ncontrol = n k = 10 4 = 20
The test statistic is calculated as: observations.
X control X i
qobs = (10)
SE
where $SE = \sqrt{MSE\left(\dfrac{1}{n_{control}} + \dfrac{1}{n_i}\right)}$ is the standard error. The critical value $q_{critical} = q_{\alpha,k+1,N-k-1}$ can be found in a table of values and is used to make a decision for each pairwise comparison with the control. (This is not the same $q_{critical}$ as the one for Tukey's HSD test.)

$$\begin{cases} \text{if } |q_{obs}| < q_{critical} & \text{then do not reject } H_0 \\ \text{if } |q_{obs}| \geq q_{critical} & \text{then reject } H_0 \end{cases}$$
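A minimal sketch of Equation 10 for hypothetical data; the critical value itself would still come from a Dunnett table (recent SciPy releases also provide scipy.stats.dunnett):

    # Sketch: Dunnett-style comparisons of each treatment mean with the control (Equation 10).
    import numpy as np

    control = np.array([10.0, 9.9, 10.1, 10.0, 10.2, 9.8])
    treatments = [np.array([10.3, 10.2, 10.4, 10.1]),
                  np.array([9.9, 10.0, 10.1, 9.8])]
    groups = [control] + treatments
    N, k = sum(len(g) for g in groups), len(treatments)
    mse = sum(np.sum((g - g.mean())**2) for g in groups) / (N - (k + 1))   # error mean square

    for i, trt in enumerate(treatments, start=1):
        se = np.sqrt(mse * (1 / len(control) + 1 / len(trt)))
        q_obs = abs(control.mean() - trt.mean()) / se
        print(f"treatment {i}: |q_obs| = {q_obs:.2f}")   # compare with the tabulated q_critical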
Scheffé’s test
This is the most flexible multiple testing procedure as it allows for comparing any number of possible contrasts. If only pairwise com-

(Some of the possible null hypotheses for Scheffé's test include:)

$$S_{obs} = \frac{\sum c_i \bar{X}_i}{SE} \tag{11}$$
where $SE = \sqrt{MSE \sum \left(\dfrac{c_i^2}{n_i}\right)}$ is the standard error. The critical value $S_{critical} = \sqrt{(k-1)\,F_{\alpha,k-1,N-k}}$ is calculated and used to make a decision:

$$\begin{cases} \text{if } |S_{obs}| < S_{critical} & \text{then do not reject } H_0 \\ \text{if } |S_{obs}| \geq S_{critical} & \text{then reject } H_0 \end{cases}$$
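A minimal sketch of Equation 11 for a single hypothetical contrast; the groups and coefficients are illustrative, and SciPy supplies the F quantile:

    # Sketch: Scheffe's test for one contrast (Equation 11); the c_i must sum to zero.
    import numpy as np
    from scipy.stats import f as f_dist

    groups = [np.array([9.9, 10.1, 10.0, 10.2]),
              np.array([10.3, 10.4, 10.2, 10.5]),
              np.array([10.0, 9.8, 10.1, 9.9])]
    c = np.array([1.0, -0.5, -0.5])     # group 1 vs the average of groups 2 and 3
    k = len(groups)
    N = sum(len(g) for g in groups)
    mse = sum(np.sum((g - g.mean())**2) for g in groups) / (N - k)

    se = np.sqrt(mse * np.sum(c**2 / np.array([len(g) for g in groups])))
    s_obs = np.sum(c * np.array([g.mean() for g in groups])) / se
    s_crit = np.sqrt((k - 1) * f_dist.ppf(0.95, k - 1, N - k))
    print("reject H0" if abs(s_obs) >= s_crit else "do not reject H0")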