UOP 999-97 Precision Statements in UOP Methods
SCOPE
This method is for developing precision statements as reported in UOP methods and relative bias of the
same method between different laboratories. The calculation of precision, in terms of repeatability (within a
laboratory), reproducibility (between laboratories) and relative bias (between laboratories) is described.
Precision statements in methods having a 98 or later suffix were developed by the procedure described;
methods having an 88 through 97 suffix used UOP 888-88, while methods with suffixes earlier than 88 used
UOP 666-82.
UOP 999 is the minimum guideline for estimating the initial precision for new or revised methods. Once
a new or revised method is in use, a reference sample, if available, should be run on a regular basis. When
15 analyses have been completed, individual and range control charts are to be used to estimate the long-
term UOP repeatability and are to be maintained on an ongoing basis. The example calculations in UOP
999-97 are made using Minitab®, a statistical software package. The details of the calculations are in UOP
999-97 Supplement, available by request from the UOP method coordinator.
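The ongoing control charting described above can be sketched as follows. This is a minimal illustration, not the Supplement's procedure: the function name `individuals_chart`, the reference data, and the use of the standard Shewhart constants for a moving range of span 2 (2.66, 3.267 and 1.128) are assumptions made for the example.

```python
from statistics import mean

def individuals_chart(results):
    """Individuals (X) and moving-range (MR) chart limits for a reference sample.

    Uses the conventional constants for a moving range of two consecutive
    results: 2.66 (= 3 / 1.128) for the X limits and 3.267 for the MR limit.
    """
    if len(results) < 15:
        raise ValueError("at least 15 analyses are expected before charting")
    moving_ranges = [abs(b - a) for a, b in zip(results, results[1:])]
    x_bar = mean(results)
    mr_bar = mean(moving_ranges)
    return {
        "center": x_bar,
        "lcl": x_bar - 2.66 * mr_bar,
        "ucl": x_bar + 2.66 * mr_bar,
        "mr_ucl": 3.267 * mr_bar,
        "sigma_est": mr_bar / 1.128,  # short-term sigma from the moving range
    }

# Hypothetical reference-sample results (mass-%), for illustration only.
reference = [0.392, 0.391, 0.393, 0.392, 0.394, 0.391, 0.392, 0.393,
             0.390, 0.392, 0.393, 0.391, 0.392, 0.394, 0.392]
limits = individuals_chart(reference)
```

How the charted variability is converted into a long-term UOP repeatability value is covered by the Supplement and is not reproduced in this sketch.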
Using the specified UOP test method, at least 16 tests are performed at a given laboratory or laboratories
on the same representative sample, or on samples at multiple concentrations, using one of the nested
sampling designs shown in the PROCEDURE section. Exceptions to nested sampling design can be made at
the discretion of the method coordinator. In each laboratory, the analysis is performed by at least two
different analysts on each of two separate days, each analyst performing two tests per day. The estimated
within-laboratory standard deviation (esd) and the estimated between-laboratory standard deviation (esd)
are calculated using a stepwise nested analysis of variance (ANOVA) procedure (see Appendix 2 of the
SUPPLEMENT to this document).
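The balanced nested layout described above (analysts within a laboratory, days within analyst, tests within day) can be enumerated with a short sketch. The function name `nested_design` and the row format are hypothetical; the counts follow the text (two tests per day, two days per analyst, at least 16 tests in total).

```python
from itertools import product

def nested_design(n_analysts=4, n_days=2, n_reps=2, lab="Lab 1"):
    """Return one row per test for a balanced nested design in one laboratory.

    Day numbers are nested within analyst: day 1 of analyst 1 need not be
    the same calendar day as day 1 of analyst 2.
    """
    return [
        {"lab": lab, "analyst": a, "day": d, "rep": r}
        for a, d, r in product(range(1, n_analysts + 1),
                               range(1, n_days + 1),
                               range(1, n_reps + 1))
    ]

# One laboratory, four analysts: 4 x 2 x 2 = 16 tests.
single_lab = nested_design()

# Two laboratories, two analysts each: 2 x (2 x 2 x 2) = 16 tests.
two_labs = (nested_design(n_analysts=2, lab="Lab 1")
            + nested_design(n_analysts=2, lab="Lab 2"))
```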
UOP Methods are available through ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken PA 19428-2959,
United States. The Methods may be obtained through the ASTM website, www.astm.org, or by contacting Customer Service at
service@astm.org, 610.832.9555 FAX, or 610.832.9585 PHONE.
The stepwise ANOVA procedure for the nested design is used to estimate the within-day test esd, day-to-
day esd, analyst-to-analyst esd and lab-to-lab esd. The estimated components of variability from the
stepwise ANOVA are used to calculate UOP repeatability, the ASTM repeatability, UOP reproducibility,
and relative bias as outlined in Table 1.
ASTM Repeatability is the expected maximum difference between two tests by the same analyst on the
same day in one laboratory at the 95% confidence level. ASTM Repeatability is calculated using the
Within-Day esd. UOP Repeatability is the expected maximum difference between two tests by different
analysts on different days in one laboratory at the 95% confidence level and is calculated using the Within-
Laboratory esd. Reproducibility is the expected maximum difference between two tests by different analysts
on different days in different laboratories at the 95% confidence level and is calculated by combining the
Within-Laboratory esd and the Between-Laboratory esd as the square root of the sum of their squares (Table 1).
When all the tests are conducted in only one laboratory, then the UOP and the ASTM repeatability may
be calculated, but reproducibility cannot. For this case applicable nested designs are shown in the
PROCEDURE section. When the testing is conducted across two or more laboratories, then the UOP
repeatability, the ASTM repeatability, reproducibility and relative bias can be calculated. For two or three
laboratories, the applicable nested designs are found in the PROCEDURE section. Table 2 in the
PROCEDURE section summarizes standard nested designs and their attributes with cross-reference to the
design layout. If precision data is received from one or two laboratories, the repeatability is the only
precision information reported.
Table 1
Precision Statement Definitions
(The description of the terms is in the CALCULATIONS)

Repeatability
  UOP Definition: Allowable difference between two tests performed by different analysts in one lab on different days at 95% confidence level.
  ASTM Definition: Allowable difference between two tests performed by same analyst in one lab on same day at 95% confidence level.

Repeatability Equation
  UOP: $t_{DF}\,\sqrt{2}\;s_{\text{Within-Lab}}$
  ASTM: $t_{DF}\,\sqrt{2}\;s_{\text{Within-Day}}$

Within Lab Standard Deviation
  UOP: $s_{\text{Within-Lab}} = \sqrt{s^2_{\text{Within-Day}} + s^2_{\text{Day-to-Day}} + s^2_{\text{Analyst-to-Analyst}}}$
  ASTM: $s_{\text{Within-Day}}$

Reproducibility
  UOP Definition: Allowable difference between two tests performed by different analysts in different labs on different days at 95% confidence.
  ASTM Definition: Same as UOP definition but estimated differently.

Reproducibility Equation
  UOP: $t_{DF}\,\sqrt{2}\,\sqrt{s^2_{\text{Within-Lab}} + s^2_{\text{Between-Lab}}}$
  ASTM: Similar to UOP Reproducibility, but refer to ASTM E 691-92, section 30.

Relative Bias
  UOP Definition: For specified sample, the average test difference between labs.
  ASTM Definition: Same as ASTM E177-90a, section 23.4.

Within Lab Mean Standard Deviation
  UOP: $s_{\bar{X}\,\text{Within-Lab}} = \sqrt{\dfrac{s^2_{\text{Within-Day}}}{n_R\,n_{\text{Days}}\,n_{\text{Analysts}}} + \dfrac{s^2_{\text{Day-to-Day}}}{n_{\text{Days}}\,n_{\text{Analysts}}} + \dfrac{s^2_{\text{Analyst-to-Analyst}}}{n_{\text{Analysts}}}}$
  ASTM: None
  $DF^* = n_{\text{Labs}}\,(n_{\text{Analysts}} - 1)$, where
    $n_{\text{Days}}$ = No. days testing per analyst
    $n_{\text{Labs}}$ = No. labs
    $n_{\text{Analysts}}$ = No. analysts
    $n_R$ = No. reps per day
  Note: See SUPPLEMENT Appendix 2 for other than full model.

Relative Bias Confidence Interval
  UOP: $\bar{X}_{\text{Lab 1}} - \bar{X}_{\text{Lab 2}} \pm t_{DF^*}\,\sqrt{2}\;s_{\bar{X}\,\text{Within-Lab}}$
  ASTM: None
DEFINITIONS
95% Confidence Interval for Mean Difference, the interval about the average difference value that is
believed to include the “true” population difference for approximately 95 out of 100 such estimates.
Analysis of Variance (ANOVA), a statistical procedure that divides the total variability for a set of data
into meaningful component parts associated with specific sources of variation. The technique, in
conjunction with the F-ratio, is used to provide a test of significance for component sources of
variation and to obtain estimates of the standard deviations attributable to those sources.
ASTM Repeatability, the allowable difference between two tests performed by the same analyst in one
laboratory on the same day. The two tests should not differ by more than the stated allowable
difference more than five percent of the time, thus giving 95% confidence in the repeatability. This
definition more closely mirrors that of ASTM, and should allow for closer comparison to
independently reported ASTM repeatability statistics. Note that the ASTM repeatability is not
restricted to duplicates that may be run over a very short period.
Day, refers to the span of time over which the analyses are performed on the same day by the same
analyst in a laboratory and from which the within-day standard deviation is estimated.
Degrees of Freedom, the number of independent observations available to estimate the standard
deviation. In general, when r constants for a model have been estimated from n data values, only (n - r)
degrees of freedom remain to estimate the model’s variability.
Duplicate, the false replication of an analysis (i.e. one test immediately followed by another) in which all
the sources of inherent variability are not operational.
Mean Square (MS), an unbiased estimate of the population variance calculated by dividing the sum of
squares (SS) by its degrees of freedom (DF).
Pooling, two or more variances (standard deviations squared) may be pooled by adding their variances
weighted by ratio of the individual variance degrees of freedom divided by the sum of the degrees of
freedom for all the variances.
Randomization, the operation of assigning a testing sequence in a purely chance manner by using a list of
random numbers. Randomization increases the chance of obtaining a representative sample from the
population and thereby helps ensure a valid estimate of experimental error and valid associated
significance tests.
Relative Bias, the average test difference between laboratories for the same sample analyses.
Replicate, the random repetition of an analysis over a specified period of time, such as a day, under
identical conditions subject only to (but to all of) the random inherent variability of that time interval.
Reproducibility, the allowable difference between two tests performed by different analysts in different
laboratories on different days. Two such tests should not differ by more than the stated allowable
difference more than five percent of the time (for 95% confidence). This definition is the same as the
ASTM convention; therefore, independently reported reproducibility statistics can be directly
compared.
Test, the result of a single analysis performed in a laboratory by a specified UOP method. Determinations
performed in duplicate (i.e. one test immediately followed by another) are discouraged, since little
information is gained. However, when duplicates are routinely performed, a test is defined as the
average of the two determinations.
UOP Repeatability, the allowable difference between two tests performed by different analysts in one
laboratory on different days. Two tests should not differ by more than the stated allowable difference
more than five percent of the time (for 95% confidence). This definition is the same as used in past
UOP methods describing precision, and should be used when comparing data generated on other than
the same day.
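The Pooling definition above can be illustrated with a short sketch. The function name `pool_variances` and the numeric inputs are hypothetical; the weighting follows the definition, with each variance weighted by its degrees of freedom divided by the total degrees of freedom.

```python
import math

def pool_variances(variances, dfs):
    """Pool variances, weighting each by its share of the total degrees of freedom."""
    total_df = sum(dfs)
    pooled = sum(v * df / total_df for v, df in zip(variances, dfs))
    return pooled, total_df

# Two hypothetical within-day variances, each estimated with 8 degrees of freedom.
pooled_var, pooled_df = pool_variances([4.0e-7, 9.0e-7], [8, 8])
pooled_esd = math.sqrt(pooled_var)  # pooled estimated standard deviation
```

Note that the variances are pooled, not the standard deviations; the pooled esd is the square root of the pooled variance.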
PROCEDURE
The laboratory supervisor under whose jurisdiction the method is performed is responsible for collecting
the necessary nested data, using the proper sampling procedure to assure uniformity between the samples
analyzed, following the UOP method exactly as written, and reporting those data. The origin of each result
must be accurately recorded on the form provided, noting the analyst, test number, day and time of the
test.
All the data collected must be reported and no effort should be made to eliminate data points by rejecting
individual tests. Nor should the data be truncated by rounding. It is better to record too many digits than not
enough. See Appendix 2 for deciding how many digits to record based on the pooled Within-Day esd. The
resultant data are referred to as a “balanced nested” data set.
Precision studies requiring no less than the minimum 16 separate analyses are characterized in Table 2
and shown in the Figures, Within-Laboratory and Within-plus-Between Laboratory Nested Sampling
Designs. To estimate only the within-laboratory repeatability for a single sample concentration in one
laboratory using four analysts, use Form 1A-1 in Appendix 1. When two sample concentrations in a single
laboratory are to be used to estimate only the within-laboratory repeatability, use Form 1A-2 in Appendix 1.
To estimate repeatability, reproducibility and relative bias between two laboratories for a single sample
concentration using two analysts per laboratory, use Form 1B-1 in Appendix 1.
To estimate repeatability, reproducibility and relative bias between two laboratories at two sample
concentrations using two analysts per laboratory, use Form 1B-1 in Appendix 1, at each laboratory. This
requires 16 separate analyses at each laboratory for a total of 32 analyses.
Table 2
Nested Designs for Different Precision Objectives
Figure 1A-3 shows a nested design requiring 24 analyses. This plan is for estimating only the within-
laboratory repeatability using three different concentration samples in a single laboratory with two analysts.
Another nested design with 24 analyses is shown in Figure 1B-3, and is useful for estimating repeatability,
reproducibility and relative bias between three laboratories for a single concentration sample using two
analysts per laboratory. No corresponding data form was prepared for the nested designs requiring 24
analyses.
When a method claims applicability to a broad concentration range or to different sample types, the
samples analyzed must cover the entire concentration range and/or matrices of interest.
To obtain a representative sampling for components of variance estimation, the tests run in each
laboratory should be carried out over a period spanning three weeks. Furthermore, the testing intervals for
the analysts should overlap each other. In addition, the two analyses carried out each day or shift by the
analyst should be performed, if possible, at different times during the day or shift.
Note : Data forms corresponding to the Figure 1A-1 and Figure 1A-2 designs above are found in Appendix
1.
Note : The data form corresponding to Figure 1B-1 design above is found in Appendix 1.
The replicate analyses carried out during the course of a day, to provide a realistic estimate of the within-
day precision, must be capable of being considered to be a random sample of the day for the statistical
estimates to work as intended. For the standard nested design, some replicates should be run one after
another in the AM and/or PM. Other replicates should be run with the maximum time span between the
within-day tests made by the same analyst. See EXAMPLE CALCULATION NO. 1 Form 1B-1 in the
SUPPLEMENT to this document, which shows the more-or-less random times at which the within-day
replicate analyses were run by the same analyst.
Data Examination
It is conceivable that some of the data may be unusable, which then unbalances the nested design. When
the ANOVA is complicated by missing data, it is best to consult a statistician.
The within-day replicate data should be plotted as a function of the Lab or Concentration, Analyst and
Day identification, to see the reasonableness of the data prior to any statistical analysis. The within-day
replicate plot should be examined for unusual analysis results due possibly to a recording error or error in
carrying out the test. Figure 2 in the EXAMPLE CALCULATION NO. 1 section of the SUPPLEMENT
shows an example of this type of plot. The decision to reject test data should be made in consultation with a
statistician.
The within-day replicate data should be statistically analyzed using a simple range chart to detect
abnormalities (see EXAMPLE CALCULATION NO. 1 section, Figure 3 of the SUPPLEMENT).
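A simple range chart for within-day duplicate tests might look like the sketch below. It is not the Supplement's Figure 3: the function name `range_chart`, the data, and the choice of the conventional Shewhart factor 3.267 (D4 for subgroups of two) for the upper control limit are assumptions made for illustration.

```python
def range_chart(pairs):
    """Range chart for within-day duplicate tests (subgroup size 2).

    Returns the mean range, the upper control limit, and the indices of
    any analyst/day pairs whose range exceeds the limit.
    """
    ranges = [abs(a - b) for a, b in pairs]
    r_bar = sum(ranges) / len(ranges)
    ucl = 3.267 * r_bar  # D4 factor for subgroups of size 2
    flagged = [i for i, r in enumerate(ranges) if r > ucl]
    return r_bar, ucl, flagged

# Hypothetical within-day pairs, one per analyst/day combination (mass-%).
pairs = [(0.392, 0.391), (0.393, 0.392), (0.391, 0.392), (0.392, 0.393),
         (0.394, 0.393), (0.390, 0.400), (0.392, 0.391), (0.393, 0.392)]
r_bar, ucl, flagged = range_chart(pairs)
```

A flagged pair is only a prompt to check for a recording or procedural error; as stated above, the decision to reject test data should be made in consultation with a statistician.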
The total analytical variability may be separated into specific causes or sources of variability by
statistically analyzing the chemical and/or physical analysis results using the stepwise ANOVA procedure.
The components of variance that must be estimated from the data are as follows:
1. Within-day variance component measuring variation between tests performed on a single day, by one
analyst, in one laboratory. This estimated within-day component ($s^2_{\text{Within-Day}}$) is used to calculate the
ASTM Repeatability.
2. Day-to-day variance component measuring variation among single tests performed on different days,
by one analyst, in one laboratory. This estimated component ($s^2_{\text{Day-to-Day}}$) could be zero, under ideal
conditions.
3. Analyst-to-analyst variance component measuring variation between single tests performed on one
day, by different analysts, in one laboratory. This estimated component ($s^2_{\text{Analyst-to-Analyst}}$) could be
zero, under ideal conditions.
4. A laboratory-to-laboratory variance component measuring the variation among single tests performed
on one day, by one analyst, in different laboratories. Combined with the three other components of
variation, this is used to develop the Reproducibility statement. Similarly, the laboratory-to-laboratory
estimated component ($s^2_{\text{Lab-to-Lab}}$) could be zero, under ideal conditions.
The first three components are added to estimate the total variation in any of the testing laboratories, and
are used to develop the UOP Repeatability statement.
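The first three components can be estimated from a balanced single-laboratory data set with a conventional nested ANOVA, sketched below. This is not the stepwise procedure of the Supplement, which is not reproduced in this document; the function name `nested_components`, the data layout, the example data, and the choice to set negative component estimates to zero are assumptions.

```python
from statistics import mean

def nested_components(data):
    """Variance components for a balanced nested design in one laboratory.

    data[analyst][day] is the list of replicate results for that analyst
    and day. Components come from the expected mean squares of a random
    nested model; negative estimates are truncated to zero (an assumption).
    """
    n_a, n_d, n_r = len(data), len(data[0]), len(data[0][0])
    day_means = [[mean(day) for day in analyst] for analyst in data]
    analyst_means = [mean(dm) for dm in day_means]
    grand = mean(analyst_means)  # equals the grand mean for balanced data

    ss_within = sum((x - day_means[i][j]) ** 2
                    for i, analyst in enumerate(data)
                    for j, day in enumerate(analyst)
                    for x in day)
    ss_day = n_r * sum((day_means[i][j] - analyst_means[i]) ** 2
                       for i in range(n_a) for j in range(n_d))
    ss_analyst = n_d * n_r * sum((m - grand) ** 2 for m in analyst_means)

    ms_within = ss_within / (n_a * n_d * (n_r - 1))
    ms_day = ss_day / (n_a * (n_d - 1))
    ms_analyst = ss_analyst / (n_a - 1)

    var_within = ms_within
    var_day = max(0.0, (ms_day - ms_within) / n_r)
    var_analyst = max(0.0, (ms_analyst - ms_day) / (n_d * n_r))
    return {
        "var_within_day": var_within,
        "var_day_to_day": var_day,
        "var_analyst_to_analyst": var_analyst,
        "var_within_lab": var_within + var_day + var_analyst,
        "mean_squares": (ms_within, ms_day, ms_analyst),
    }

# Hypothetical results: 4 analysts x 2 days x 2 tests per day (mass-%).
example = [
    [[0.3920, 0.3912], [0.3931, 0.3925]],
    [[0.3908, 0.3915], [0.3919, 0.3911]],
    [[0.3934, 0.3927], [0.3922, 0.3930]],
    [[0.3916, 0.3921], [0.3909, 0.3914]],
]
components = nested_components(example)
```

The square root of `var_within_lab` corresponds to the Within-Lab esd used for the UOP Repeatability statement.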
The nested design stepwise ANOVA procedure is described in Appendix 2 of the SUPPLEMENT to this
document. In the SUPPLEMENT the nested design stepwise ANOVA procedure is applied to copper
analysis data and tin analysis data in two separate examples.
Once the components of variance are estimated, they are used to calculate ASTM Repeatability, UOP
Repeatability, Reproducibility and Relative Bias with its confidence interval. When a method claims
applicability to a broad concentration range or to different sample types, the samples analyzed must cover
the entire concentration range and/or matrices of interest. Furthermore, a separate precision statement is
developed for each target concentration or matrix, unless statistical tests indicate that the components of
variance by concentration and/or matrices are not significantly different and hence, may be combined.
The ANOVA and other computations discussed in this document can be carried out using the Minitab®
software. The use of this software is recommended since it is relatively easy to use.
CALCULATIONS
The details of the precision calculations are in the SUPPLEMENT to UOP 999-97.
Repeatability Calculation
ASTM Repeatability is defined as the allowable difference between two tests performed by the same
analyst in one lab on the same day at the 95% confidence level. It is calculated as follows:
$$\text{ASTM Repeatability} = t_{DF}\,\sqrt{2}\;s_{\text{Within-Day}}$$
where:
$t_{DF}$ = t distribution value with DF degrees of freedom for two-tail 95% confidence, Table 3
$s_{\text{Within-Day}}$ = Within-day estimated standard deviation with DF degrees of freedom
DF = (Total number of tests / No. tests by analyst per day) x (No. tests by analyst per day – 1)
Usually DF equals 8 where the total number of tests is 16 with each analyst performing
2 tests during the course of a day or shift.
$\sqrt{2}$ = Value when multiplied by $s_{\text{Within-Day}}$ estimates the standard deviation for the difference
between two tests made anytime during the day by the same analyst.
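As a worked check of the ASTM Repeatability calculation, the sketch below applies the Table 1 expression (t value times the square root of 2 times the Within-Day esd) to the copper figures quoted in the REPORT section: a Within-Day esd of 0.0009 from 16 tests with two tests per analyst per day. The function names are hypothetical, and only the first eight rows of Table 3 are copied in.

```python
import math

# Two-tail 95% t-values for DF 1 through 8, copied from Table 3.
T_95 = {1: 12.7062, 2: 4.3027, 3: 3.1824, 4: 2.7765,
        5: 2.5706, 6: 2.4469, 7: 2.3646, 8: 2.3060}

def within_day_df(total_tests, tests_per_day):
    """DF = (total tests / tests per analyst per day) x (tests per day - 1)."""
    return (total_tests // tests_per_day) * (tests_per_day - 1)

def astm_repeatability(s_within_day, df):
    """Allowable difference between two same-day tests by the same analyst."""
    return T_95[df] * math.sqrt(2) * s_within_day

df = within_day_df(16, 2)                       # 8 degrees of freedom
repeatability = astm_repeatability(0.0009, df)  # rounds to 0.003
```

The result rounds to 0.003, which agrees with the copper ASTM Repeatability statement in the REPORT section.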
UOP Repeatability is defined as the allowable difference between two tests performed by different
analysts in one lab on different days at the 95% confidence level. It is calculated as follows:
$$\text{UOP Repeatability} = t_{DF}\,\sqrt{2}\;s_{\text{Within-Lab}}$$
where:
$t_{DF}$ = t distribution value with DF degrees of freedom for two-tail 95% confidence, Table 3
$s_{\text{Within-Lab}}$ = Within-Lab estimated standard deviation with DF degrees of freedom containing the
within-day variability and, if present, the day-to-day and analyst-to-analyst variability;
or,
$s_{\text{Within-Lab}} = \sqrt{s^2_{\text{Within-Day}} + s^2_{\text{Day-to-Day}} + s^2_{\text{Analyst-to-Analyst}}}$ with DF degrees of freedom
DF = Estimated by the Satterthwaite equation as discussed in Appendix 2 of the
SUPPLEMENT. The Satterthwaite equation weights the degrees of freedom of the
within-day, day-to-day and analyst-to-analyst components of variance. For standard
nested design with 16 tests, the Satterthwaite calculated DF for $s_{\text{Within-Lab}}$ is between 2
and 8 depending on the ANOVA MS and degree of freedom values.
$\sqrt{2}$ = Value when multiplied by $s_{\text{Within-Lab}}$ estimates the standard deviation for the difference
between two tests made by different analysts in one lab on different days.
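The Supplement's form of the Satterthwaite equation is not reproduced in this document. The sketch below uses the general Satterthwaite approximation for a linear combination of mean squares, which may differ in detail from the Supplement. The function name, the mean-square values, and the coefficients (which assume a balanced design with two tests per day and two days per analyst) are illustrative assumptions.

```python
def satterthwaite_df(coeffs, mean_squares, dfs):
    """Approximate DF for the variance estimate sum(c_i * MS_i).

    DF = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / df_i)
    """
    terms = [c * ms for c, ms in zip(coeffs, mean_squares)]
    total = sum(terms)
    return total ** 2 / sum(t ** 2 / df for t, df in zip(terms, dfs))

# With n_R = 2 tests per day and n_Days = 2 days per analyst, the within-lab
# variance s2_WD + s2_DD + s2_AA can be written in terms of the ANOVA mean
# squares as 0.5*MS_within + 0.25*MS_day + 0.25*MS_analyst.
coeffs = [0.5, 0.25, 0.25]
mean_squares = [6.0e-7, 1.0e-6, 4.0e-6]  # hypothetical MS values
dfs = [8, 4, 3]                          # 4 analysts x 2 days x 2 tests
df_within_lab = satterthwaite_df(coeffs, mean_squares, dfs)
```

The resulting DF is generally not an integer; how it is rounded before a t-value is read from Table 3 is left to the Supplement.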
Reproducibility Calculation
Reproducibility is defined as the allowable difference between two tests performed by different analysts
in different labs on different days at the 95% confidence level. It is calculated as follows:
$$\text{Reproducibility} = t_{DF}\,\sqrt{2}\,\sqrt{s^2_{\text{Within-Lab}} + s^2_{\text{Lab-to-Lab}}}$$
where:
$t_{DF}$ = t distribution value with DF degrees of freedom for two-tail 95% confidence, Table 3
$s_{\text{Within-Lab}}$ = Within-Lab estimated standard deviation with DF degrees of freedom containing the
within-day variability and, if present, the day-to-day and analyst-to-analyst variability;
or,
$s_{\text{Within-Lab}} = \sqrt{s^2_{\text{Within-Day}} + s^2_{\text{Day-to-Day}} + s^2_{\text{Analyst-to-Analyst}}}$ with DF degrees of freedom
$s_{\text{Lab-to-Lab}}$ = Lab-to-lab estimated standard deviation (written $s_{\text{Between-Lab}}$ in Table 1) accounting for
average differences between the laboratories in the study
DF = Estimated by the Satterthwaite equation as discussed in Appendix 2 of the
SUPPLEMENT. The Satterthwaite equation weights the degrees of freedom of the
within-day, day-to-day, analyst-to-analyst and lab-to-lab components of variance. For
standard nested design with 8 tests at each of 2 labs, the Satterthwaite calculated DF for
$\sqrt{s^2_{\text{Within-Lab}} + s^2_{\text{Lab-to-Lab}}}$ is between 1 and 8 depending on the ANOVA MS and degree
of freedom values.
$\sqrt{2}$ = Value when multiplied by $\sqrt{s^2_{\text{Within-Lab}} + s^2_{\text{Lab-to-Lab}}}$ estimates the standard deviation
for the difference between two tests made by different analysts in different laboratories
on different days.
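The Reproducibility expression in Table 1 combines the within-lab and between-lab standard deviations in quadrature before applying the t-value and the square root of 2. A minimal sketch, with a hypothetical function name and hypothetical inputs (the t-value is the Table 3 entry for 8 degrees of freedom, chosen only for illustration):

```python
import math

def reproducibility(s_within_lab, s_lab_to_lab, t_value):
    """Allowable difference between two tests run in different laboratories."""
    s_total = math.sqrt(s_within_lab ** 2 + s_lab_to_lab ** 2)
    return t_value * math.sqrt(2) * s_total

# Hypothetical esd values; 2.3060 is the Table 3 t-value for DF = 8.
example_reproducibility = reproducibility(0.0010, 0.0010, 2.3060)
```

In practice the t-value must be taken at the Satterthwaite DF for the combined standard deviation, which is usually smaller than the within-day DF.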
Table 3
Degrees of Freedom  t-Value    Degrees of Freedom  t-Value    Degrees of Freedom  t-Value
1 12.7062 12 2.1788 23 2.0687
2 4.3027 13 2.1604 24 2.0639
3 3.1824 14 2.1448 25 2.0595
4 2.7765 15 2.1315 26 2.0555
5 2.5706 16 2.1199 27 2.0518
6 2.4469 17 2.1098 28 2.0484
7 2.3646 18 2.1009 29 2.0452
8 2.3060 19 2.0930 30 2.0423
9 2.2622 20 2.0860 60 2.0003
10 2.2281 21 2.0796 120 1.9799
11 2.2010 22 2.0739 Infinity 1.9600
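Relative bias is defined as the average test difference between laboratories for the same sample analyses.
For three laboratories the relative biases are calculated as follows:
$$\text{Relative Bias}_{1,2} = \bar{X}_{\text{Lab 1}} - \bar{X}_{\text{Lab 2}}$$
$$\text{Relative Bias}_{1,3} = \bar{X}_{\text{Lab 1}} - \bar{X}_{\text{Lab 3}}$$
$$\text{Relative Bias}_{2,3} = \bar{X}_{\text{Lab 2}} - \bar{X}_{\text{Lab 3}}$$
where:
$\bar{X}_{\text{Lab 1}}$ = Lab 1 Average of n tests
$\bar{X}_{\text{Lab 2}}$ = Lab 2 Average of n tests
$\bar{X}_{\text{Lab 3}}$ = Lab 3 Average of n tests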
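The uncertainty of the laboratory biases is given by the 95% confidence interval for those differences
(shown here for Laboratories 1 and 2):
$$\bar{X}_{\text{Lab 1}} - \bar{X}_{\text{Lab 2}} \pm t_{DF^*}\,\sqrt{2}\;s_{\bar{X}\,\text{Within-Lab}}$$
where:
$s_{\bar{X}\,\text{Within-Lab}} = \sqrt{\dfrac{s^2_{\text{Within-Day}}}{n_R\,n_{\text{Days}}\,n_{\text{Analysts}}} + \dfrac{s^2_{\text{Day-to-Day}}}{n_{\text{Days}}\,n_{\text{Analysts}}} + \dfrac{s^2_{\text{Analyst-to-Analyst}}}{n_{\text{Analysts}}}}$
$DF^* = n_{\text{Labs}}\,(n_{\text{Analysts}} - 1)$ degrees of freedom for full model (see Model 1, Appendix 2 in the
SUPPLEMENT)
$t_{DF^*}$ = t distribution value with DF* degrees of freedom for two-tail 95% confidence, Table 3
$n_{\text{Labs}}$ = Number of laboratories
$n_{\text{Analysts}}$ = Number of analysts per lab
$n_{\text{Days}}$ = Number of days testing per analyst
$n_R$ = Number of replicates per day per analyst
$\sqrt{2}$ = Value when multiplied by $s_{\bar{X}\,\text{Within-Lab}}$ estimates the standard deviation for the average
difference of $n_{\text{Analysts}}\,n_{\text{Days}}\,n_R$ tests made at each of two laboratories.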
In the event that bias exists in the data, efforts are expected to be made to find the source of the bias and
then to eliminate the bias, if possible.
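The relative bias and its 95% confidence interval, as defined in Table 1, can be sketched as follows. The function names and all numeric inputs are hypothetical; the t-value 4.3027 is the Table 3 entry for DF* = 2, which corresponds to two laboratories with two analysts each under the full model.

```python
import math

def mean_sd_within_lab(var_wd, var_dd, var_aa, n_r, n_days, n_analysts):
    """Standard deviation of a laboratory mean from the variance components."""
    return math.sqrt(var_wd / (n_r * n_days * n_analysts)
                     + var_dd / (n_days * n_analysts)
                     + var_aa / n_analysts)

def relative_bias_ci(mean_1, mean_2, s_mean, t_value):
    """Return (bias, lower, upper) for the difference of two laboratory means."""
    bias = mean_1 - mean_2
    half_width = t_value * math.sqrt(2) * s_mean
    return bias, bias - half_width, bias + half_width

# Hypothetical variance components and laboratory averages (mass-%).
s_mean = mean_sd_within_lab(6.0e-7, 2.0e-7, 7.0e-7,
                            n_r=2, n_days=2, n_analysts=2)
n_labs, n_analysts = 2, 2
df_star = n_labs * (n_analysts - 1)   # DF* = 2 for the full model
bias, lower, upper = relative_bias_ci(0.3925, 0.3910, s_mean, 4.3027)
bias_detected = not (lower <= 0.0 <= upper)
```

If the interval excludes zero, the data indicate a bias between the two laboratories at the 95% confidence level.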
REPORT
The statements included in the PRECISION section of a UOP method depend upon whether data were
collected from only one or two laboratories, or from three or more laboratories.
When data were collected from only one or two laboratories, the within-day and within-laboratory esd and
the repeatability are clearly stated. A reproducibility statement is then added to show that there is
insufficient data for reporting the reproducibility. For example:
ASTM Repeatability
A nested design was carried out for copper analysis with four analysts. Each analyst carried out tests on
two separate days and performed two tests each day. The total number of tests performed was 16. Using a
stepwise analysis of variance procedure the within-day estimated standard deviation (esd) was 0.0009. The
average copper concentration of all 16 analyses was 0.392 mass-%. Two tests performed in one laboratory
by the same analyst on the same day should not differ by more than 0.003 with 95% confidence.
UOP Repeatability
A nested design was carried out for copper analysis with four analysts, with each analyst carrying out
tests on two separate days and performing two tests each day for a total of 16 analyses. Using a stepwise
analysis of variance procedure the within-lab estimated standard deviation (esd) was 0.0014. The average
copper concentration of all 16 analyses was 0.392 mass-%. Two tests performed in one laboratory by
different analysts on different days should not differ by more than 0.005 with 95% confidence.
Reproducibility
There is insufficient data to report the reproducibility of the test at this time.
When data were collected from three or more laboratories, the within-day esd, within-laboratory esd and
between-laboratory esd are clearly stated. Then the repeatability and reproducibility are stated. For example:
ASTM Repeatability
A nested design was carried out for tin analysis. Forty eight analyses were performed in three laboratories at
two concentrations with two analysts per laboratory, with each analyst carrying out tests on two separate
days and performing two tests each day. Using a stepwise analysis of variance procedure the within-day
estimated standard deviation (esd) was 0.0007. The average tin concentration of 24 analyses at each
concentration was 0.377 mass-% and 0.596 mass-%, respectively. Two tests performed in one laboratory by
the same analyst on the same day should not differ by more than 0.002 with 95% confidence.
UOP Repeatability
A nested design was carried out for tin analysis. Forty eight analyses were performed in three laboratories
at two concentrations with two analysts per laboratory, with each analyst carrying out tests on two separate
days and performing two tests each day. Using a stepwise analysis of variance procedure the within-lab
estimated standard deviation (esd) was 0.0018. The average tin concentration of 24 analyses was 0.377
mass-% and 0.596 mass-%, respectively. Two tests performed in one laboratory by different analysts, on
different days, should not differ by more than 0.005 with 95% confidence.
Reproducibility
A nested design was carried out for tin analysis. Forty eight analyses were performed in three laboratories
at two concentrations with two analysts per laboratory, with each analyst carrying out tests on two separate
days and performing two tests each day. Using a stepwise analysis of variance procedure the between-lab
estimated standard deviation (esd) was 0.0019. The average tin concentration of 24 analyses was 0.377
mass-% and 0.596 mass-%. Two tests performed in different laboratories by different analysts, on different
days, should not differ by more than 0.008 with 95% confidence.
REFERENCES
SUGGESTED SUPPLIERS
APPENDIX 1
[Data Forms 1A-1, 1A-2 and 1B-1: blank data-entry forms, not reproduced here.]
APPENDIX 2
Given the pooled Within-Day estimated standard deviation, the minimum number of digits to record the
analytical data is shown below:
When the Within-Day esd is not known from similar procedures, it is better to record the data using too many
digits than to find out later that more digits are needed.
Copper data from EXAMPLE CALCULATION NO. 1 section of the SUPPLEMENT is shown below. The
data in the rightmost columns has been intentionally truncated and rounded to the nearest thousandth. The
estimated Within-Day standard deviation is 0.000765 for the data rounded to the nearest ten-thousandth and
0.000866 for the data rounded to the nearest thousandth. When the data is truncated and rounded to the
nearest thousandth the resulting Within-Day standard deviation is 13% higher. The table above shows the
data should be recorded to the nearest ten-thousandth.
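The effect of rounding on the Within-Day esd can be explored with a short sketch. The copper data themselves are in the Supplement and are not reproduced here, so the pairs below are hypothetical; the function names are also hypothetical. For duplicate tests, the within-day esd is computed from the pair differences with one degree of freedom per pair. The last line checks the 13% figure quoted above from the two reported esd values.

```python
import math

def within_day_esd(pairs):
    """Within-day esd from duplicate tests: sqrt(sum(d^2) / (2 * n_pairs))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs)))

def rounded(pairs, digits):
    """Round every result to the given number of decimal places."""
    return [(round(a, digits), round(b, digits)) for a, b in pairs]

# Hypothetical within-day pairs recorded to the nearest ten-thousandth.
pairs = [(0.3920, 0.3912), (0.3931, 0.3925), (0.3908, 0.3915), (0.3919, 0.3911),
         (0.3934, 0.3927), (0.3922, 0.3930), (0.3916, 0.3921), (0.3909, 0.3914)]
esd_full = within_day_esd(pairs)
esd_rounded = within_day_esd(rounded(pairs, 3))  # nearest thousandth

# Check of the reported copper values: 0.000866 versus 0.000765.
percent_increase = (0.000866 / 0.000765 - 1) * 100  # about 13 percent
```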