Standards 02 00014 v2
R&D Department, Portuguese Institute of Blood and Transplantation, Avenida Miguel Bombarda, 6,
1000-208 Lisbon, Portugal; paulo.pereira@ipst.min-saude.pt
Abstract: Background: The performance assessment of tests that express qualitative results in the
medical laboratory is of primary importance in characterization, diagnosis, follow-up, and screening.
An important contribution to this type of assessment may be the publication of the Eurachem AQA
2021 guide. The text intends to principally discuss the consistency of the subclauses of this guide
with ISO 15189 and CLSI EP12-A2. Methods: The study involves a literature review within the
scope of qualitative tests. Results: Tables are used for crossing AQA 2021 with ISO 15189 and CLSI
EP12-A2. Conclusions: Consistency with ISO 15189 and CLSI EP12-A2 is demonstrated in the study.
Introducing “uncertainty of proportion” reflects the necessity of assessing uncertainties when dealing
with qualitative results.
Citation: Pereira, P. Eurachem/CITAC Guide “Assessment of Performance and Uncertainty in
Qualitative Chemical Analysis”—A Medical Laboratory Perspective. Standards 2022, 2, 194–201.
https://doi.org/10.3390/standards2020014
Academic Editor: Mihajlo (Michael) Jakovljevic
Received: 23 December 2021
Accepted: 6 April 2022
Published: 17 May 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
1.1. The Performance Assessment of Qualitative Results in the Medical Laboratory
The quality control principles in the medical laboratory became more systematized
from the 1970s onward. This was due to a vast number of publications that transposed
and reviewed approaches to chemistry from a clinical perspective. This development
mainly focused on tests that express quantitative results, such as 20 IU/L of alanine
aminotransferase (ALT). The methodologies initially focused on validation and later on
internal quality control (IQC) and external quality assessment (EQA). The availability of
guides for evaluating the performance of quantitative tests is vast, with the majority, within
the medical laboratory, published by the Clinical and Laboratory Standards Institute (CLSI) [1],
a global agency. CLSI is an international not-for-profit group that develops laboratory
standards for medical laboratory stakeholders, drawing on volunteers’ expertise. Its
standards are recognized by medical laboratories, accreditors, government agencies, and
manufacturers of in vitro diagnostic medical devices (IVDDs) as reference techniques for
improving medical laboratory testing.
Aside from measurement uncertainty [2], we can argue that the medical laboratory’s
specific performance assessment methods comply with the general principles embodied
in the relevant standards, such as ISO 15189 [3]. Medical laboratories can use this global
standard to develop quality management systems and assess their own competence by
accrediting laboratory tests and methods [4]. As a result of current performance assessment
methodologies, laboratories can prove that they operate effectively, even complying with
ISO/IEC 17025 [5]. Similar considerations can be made of the various national legal regimes
and the manner in which IVDDs are regulated [6,7].
We emphasize the role of the International Organization for Standardization (ISO),
which has developed over 24,222 international standards. A total of 167 national
standard-setting bodies are members of ISO, an independent, non-governmental international
organization. We can confidently say that no other global organization has contributed as
much to harmonizing practices in industry and services, with ISO 15189 being one of the
examples in the medical laboratory.
When it comes to tests that express qualitative results, the number of publications is
much smaller. For example, compared to quantitative assessments, CLSI has published
a small number of documents for qualitative performance assessment, with the central
guide being EP12-A2 [8], which is under review. The evaluation methodology in this
type of study focuses on Bayesian probability [9], often crossing with epidemiological
principles. Concepts such as uncertainty are not usually associated with these assessments.
Performance is generally assessed through clinical sensitivity, the proportion of positive
results in the presence of a given condition (5.3 of [8]), e.g., in SARS-CoV-2-infected
individuals, and clinical specificity, the proportion of negative results in the absence of
that condition, i.e., in healthy individuals. Both are immediately
identified as estimates of diagnostic accuracy. Even when the 95% confidence interval (CI)
for these proportions is calculated, the evaluation usually focuses solely on the point
estimates of clinical sensitivity and clinical specificity. A review of this guide can be found
elsewhere [10].
The COVID-19 pandemic highlighted the importance of diagnostic accuracy assessment
in screening tests. Evaluations such as the one published by the Foundation for Innovative
New Diagnostics (FIND), a global health nonprofit organization based in Geneva,
Switzerland, and a World Health Organization (WHO) Collaborating Center for Laboratory
Strengthening and Diagnostic Technology Evaluation, allowed us to assess which test is
recommended for screening, in addition to differentiating the performance of both real-time
polymerase chain reaction (RT-PCR) tests for ribonucleic acid (RNA) detection and tests for
IgG/IgM antibodies against SARS-CoV-2 [11].
also consulted the “Medical Devices Sector” websites of this Commission [6] and that of
the US Food and Drug Administration (FDA) “Medical Devices” [7].
3. Results
In the tables, crossings between clauses or subclauses are noted with a cross. We
can understand that the crossed clauses or subclauses comprise similar and replicable
methodologies in medical laboratory practice. The clauses or subclauses were identified on
the basis of their fit for the purpose of the AQA 2021 in the medical laboratory. Crossing
with related clauses or subclauses in whole or in part was allowed.
In the ISO standard, only subclauses related to intra-laboratory performance assessment
were considered, so that the comparison stays within the same scope. For example,
subclauses 5.6.3 “interlaboratory comparisons” and 5.6.4 “comparability of examination results”
are outside the focus of this study, so they are not discussed.
Considering the application of the AQA 2021 guide in the medical laboratory, we
first crossed it with the technical requirements of ISO 15189. Table 1 shows this
crossing. The second crossing was between the AQA 2021 and CLSI EP12-A2 technical guides.
Table 2 presents this crossing.
Table 1 (AQA 2021 crossed with ISO 15189).
AQA 2021 clauses and subclauses (columns, left to right):
2 Types of Qualitative Analysis
3.2 Quantification of Qualitative Analysis Performance
3.3 Evaluating False Positive and False Negative Rates
3.4 Limit of Detection and Selectivity
4.2 Likelihood Ratio
4.3 Posterior Probability
4.4 Reliability of Metrics
4.5 Uncertainty of Proportions
5 Reporting the Qualitative Analytical Result
ISO 15189 subclauses (rows) and their crosses:
5.5.1.2 Verification of examination procedures: X X X X X X X X X
5.5.1.3 Validation of examination procedures: X X X X X X X X X
5.5.1.4 Measurement uncertainty of measured quantity values: X 1
1 Measurement uncertainty is not an ISO 15189 requirement for tests that do not express a quantitative value.
Table 2 (AQA 2021 crossed with CLSI EP12-A2).
AQA 2021 clauses and subclauses (columns, left to right):
2 Types of Qualitative Analysis
3.2 Quantification of Qualitative Analysis Performance
3.3 Evaluating False Positive and False Negative Rates
3.4 Limit of Detection and Selectivity
4.2 Likelihood Ratio
4.3 Posterior Probability
4.4 Reliability of Metrics
4.5 Uncertainty of Proportions
5 Reporting the Qualitative Analytical Result
CLSI EP12-A2 clauses (rows) and their crosses:
6 Device familiarization and training: X
7 Evaluation materials: X X X X X
8 Bias and imprecision studies: no crosses
9 Comparison methods: X X X X X X
10 Data analysis: X X X X X X X
Standards 2022, 2 198
4. Discussion
Table 1 shows a significant crossover between AQA 2021 and ISO 15189. “Verification”,
in the ISO standard, comprises the performance assessment of validated examination
procedures used without modification before being introduced into routine use. It refers
to commercialized tests that have already undergone exhaustive validation by the manufacturers
of in vitro diagnostic medical devices and have been approved by a notified body or regulatory
agency, as is the case in the European Union or the US. On the other hand, “validation”
applies to examination procedures derived from the following sources: (a) nonstandard methods;
(b) laboratory designed or developed methods; (c) standard methods used outside their intended
scope; (d) validated methods subsequently modified. Compared to verification, validation
involves more complex models, for example, determining the cutoff in an “in-house” test.
Verification fundamentally aims to determine whether the performance claimed by the
manufacturer is replicable in the laboratory. The different clauses can be operationalized
through the AQA.
The crossing with CLSI EP12-A2 clauses is shown in Table 2. We can understand that
both the CLSI guide and the AQA 2021 aim to operationalize the technical requirements
stipulated mainly in ISO 15189 subclauses 5.5.1.2 and 5.5.1.3. The mathematical models
in the AQA 2021 for determining clinical/diagnostic accuracy, i.e., clinical sensitivity and
clinical specificity, are the same as those of the CLSI EP12-A2. As expected, they are based on
Bayesian probability [9]. The same is true for other probabilities computed from a 2 × 2
contingency table, such as positive and negative predictive values. While clinical accuracy is
more relevant to the performance assessment of a given qualitative test, predictive values are
more important to the physician. The former comprises the proportions of true results in
samples with a particular condition and without that condition, as mentioned above. The
predictive values are the proportion of individuals with a specific condition among positive
results and the proportion without that condition among negative results, respectively.
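The four proportions described above can be sketched from a 2 × 2 contingency table as follows. This is a minimal illustration, not code taken from either guide; the class name `ContingencyTable` and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """Counts of a 2 x 2 table against the diagnosis (hypothetical helper)."""
    tp: int  # positive results in samples with the condition
    fn: int  # negative results in samples with the condition
    fp: int  # positive results in samples without the condition
    tn: int  # negative results in samples without the condition

    def sensitivity(self) -> float:
        # Proportion of positive results among samples with the condition.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of negative results among samples without the condition.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Proportion of samples with the condition among positive results.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Proportion of samples without the condition among negative results.
        return self.tn / (self.tn + self.fn)


# Hypothetical study: 20 samples with the condition, 80 without it.
table = ContingencyTable(tp=19, fn=1, fp=4, tn=76)
print(f"sensitivity={table.sensitivity():.3f} specificity={table.specificity():.3f}")
print(f"PPV={table.ppv():.3f} NPV={table.npv():.3f}")
```

Sensitivity and specificity are computed within the condition groups, whereas the predictive values are computed within the result groups, so the latter depend on the prevalence of the condition in the samples studied.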
Whenever the diagnosis is unknown, it is possible to calculate, alternatively, the
agreement of positive and negative results. Similarly to diagnostic accuracy, both are
calculated from the results of the contingency table. The mathematical models are similar to
those for sensitivity and specificity. In this case, the ratios are influenced by
non-concordant results.
Compliance assessment is one of the most important and least harmonized topics,
and it is not clear to all medical laboratories how to apply it when validating clinical
sensitivity and specificity results. The European Commission has published some guides on the
performance evaluation of in vitro diagnostic medical devices, such as those for
SARS-CoV-2 [18]. This document includes specimen type, number of samples, and acceptance
criteria for the different performance assessment parameters. As a rule of thumb, the medical
laboratory must set clinical sensitivity and specificity targets depending on the intended use
of its results. Let us consider the blood bank case versus the clinical pathology laboratory.
The sensitivity/specificity tradeoff in blood banks favors sensitivity to minimize the risk of
false-negative results, which imply a high risk of post-transfusion infection. In contrast, in
a pathology laboratory, the sensitivity may be lower than 100% because we can retest a
patient without causing harm to third parties. Compliance assessment is poorly discussed
in AQA 2021 and EP12-A2.
The most significant difference between the diagnostic accuracy approaches is probably
the view introduced by the AQA, which is based on publications by Pereira et al. [19]
(4.4.6 of [20]): uncertainty of proportions. The calculation model is the same as that published
in the CLSI guide EP12-A2 for the 95% CI of clinical sensitivity and clinical specificity. A 95%
score confidence interval, attributed to Wilson [21], is calculated in both guides. The lower
limit of the two-sided 95% CI for sensitivity or specificity must meet or exceed the lower bound
criterion. The criteria are easy to compute; either a fixed n is considered for each sample type,
or the requirements are recalculated according to the n. For example, for n = 10 infected
samples, the lower bound criterion is 72%, which is the lower limit obtained when the observed
sensitivity is 100%. On the other hand, if a specificity of 95% is acceptable, the lower bound
criterion is 88.8%.
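A minimal sketch of the Wilson score interval may help to reproduce these lower bound figures. The function name `wilson_interval` is hypothetical, and the second call assumes n = 100 samples without the condition, a sample size the text does not state; under that assumption the lower limit for 95 true negatives is about 88.8%.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Two-sided Wilson score interval for a proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Standard normal quantile for the two-sided confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 10 of 10 infected samples positive: lower limit of about 0.72.
low_sens, high_sens = wilson_interval(10, 10)
# Assumed n = 100 for illustration: 95 of 100 negatives, lower limit of about 0.888.
low_spec, high_spec = wilson_interval(95, 100)
print(f"sensitivity lower limit: {low_sens:.3f}")
print(f"specificity lower limit: {low_spec:.3f}")
```

The interval narrows as n grows, which is why the lower bound criterion has to be recalculated whenever the number of samples changes.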
The introduction of the term “uncertainty of proportions” can be understood as a
milestone; as far as we know, it is the first global guide to address the uncertainty of binary
positive/negative data. This model is easily replicable for other qualitative outcomes
such as blood groups and karyotypes. In fact, the concept is similar to the expanded
measurement uncertainty, which is also associated with a 95% CI. Thus, a wider interval
indicates greater uncertainty about the sensitivity or specificity value
in the population with the epidemiological characteristics of the samples studied,
with 95% confidence and beta and alpha risks of 5%. We believe that this introduction
will demystify the principle of “impossibility of calculation” in qualitative expressions.
This myth most likely arises because measurement uncertainty is defined solely for
quantitative expressions. Note that, for example, subclause 5.5.1.4 of ISO 15189 does not apply
to qualitative test results.
For a clearer understanding of the uncertainty of proportions, let us present an example:
a screening test for antibodies against the hepatitis C virus (HCV) by immunoassay. The
performance assessment study involves 20 samples from patients diagnosed with HCV
infection and 80 healthy subjects. The claimed clinical sensitivity and specificity results
are 100% and 90%, respectively. For the target uncertainty in these proportions for a 95%
CI, lower bound criteria of 84% for sensitivity and 82% for specificity are claimed. We can
interpret this as meaning that false-negative results are not allowable, while up to eight
false-positive results are admitted. The observed values are 100% and 99% for sensitivity and
specificity, respectively. Therefore, the test is valid according to the first criterion. The
lower limits of the 95% CI were 84% for sensitivity and 93% for specificity, which meet the
lower bound criteria of 84% and 82%; hence, the probability of true results is not
lower than what was claimed. Note the importance of the consistency of claimed results
with the number of false results allowed and the number of samples tested.
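The HCV example can be checked numerically with a short sketch. It assumes that the observed specificity of 99% corresponds to 79 true negatives out of 80 (98.75%), and that comparisons are made on percentages rounded to whole numbers, as the figures are quoted in the text; the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_lower_limit(successes: int, n: int, confidence: float = 0.95) -> float:
    """Lower limit of the two-sided Wilson score interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width


def complies(successes: int, n: int, claimed_pct: int, lower_bound_pct: int) -> dict:
    """Compare observed and lower-limit percentages with the claimed targets.

    Percentages are rounded to whole numbers (an assumption that mirrors
    how the criteria are quoted in the example).
    """
    observed_pct = round(100 * successes / n)
    lower_pct = round(100 * wilson_lower_limit(successes, n))
    return {
        "observed_pct": observed_pct,
        "lower_limit_pct": lower_pct,
        "meets_claim": observed_pct >= claimed_pct,
        "meets_lower_bound": lower_pct >= lower_bound_pct,
    }


# 20 of 20 infected samples positive; claimed 100%, lower bound criterion 84%.
sensitivity_check = complies(20, 20, claimed_pct=100, lower_bound_pct=84)
# Assumed 79 of 80 healthy subjects negative; claimed 90%, lower bound criterion 82%.
specificity_check = complies(79, 80, claimed_pct=90, lower_bound_pct=82)
print(sensitivity_check)
print(specificity_check)
```

Keeping the point estimate and the lower limit as separate checks makes it visible when a test meets the claimed value but was evaluated with too few samples to support it.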
CLSI EP12-A2 presents a qualitative method-precision experiment for measurand
concentrations near the cutoff (C50) (8.3 of [8]), recognized as the “C5–C95 interval”, which,
as we can understand, is harmonized with IVDD manufacturers. This model is rarely
used in the medical laboratory. Nevertheless, it could be important for “in-house” or modified
tests since it shows the consistency of “high negatives” and “low positives” with true
results at 95% confidence (95% trueness). Low positives should yield positive results
95% of the time. This model is not part of the AQA content.
The receiver operating characteristic (ROC) curve and the area under it (AUC) [22] are
fundamentally used to determine the cutoff point in tests during development, based on the
clinical sensitivity/specificity tradeoff. This model shows the clinical sensitivity and
specificity for each hypothetical cutoff value. The “best” cutoff is chosen according
to the intended use of the reported results; that is, the cutoff is selected on the basis of the
performance assessment of each candidate point. For example, in a blood bank, sensitivity
is favored over specificity. Neither guide provides a sufficient discussion of its use
in the medical laboratory.
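The selection of a cutoff from candidate points can be illustrated with a short sketch. The signal values, the function names, and the rule "highest specificity among cutoffs that reach a minimum sensitivity" are assumptions made for illustration; neither guide prescribes this rule.

```python
def roc_points(positives, negatives, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    A result is called positive when its value is >= the cutoff.
    """
    points = []
    for cutoff in cutoffs:
        sensitivity = sum(v >= cutoff for v in positives) / len(positives)
        specificity = sum(v < cutoff for v in negatives) / len(negatives)
        points.append((cutoff, sensitivity, specificity))
    return points


def auc(positives, negatives):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def choose_cutoff(points, min_sensitivity):
    """Pick the most specific cutoff that still meets the sensitivity target."""
    eligible = [pt for pt in points if pt[1] >= min_sensitivity]
    if not eligible:
        raise ValueError("no candidate cutoff reaches the sensitivity target")
    return max(eligible, key=lambda pt: (pt[2], pt[1]))


# Hypothetical signal values for samples with and without the condition.
positives = [0.9, 1.4, 2.1, 3.5, 4.2, 5.0, 6.3, 8.8]
negatives = [0.1, 0.2, 0.3, 0.3, 0.5, 0.6, 0.8, 1.1]
candidates = sorted(set(positives + negatives))
points = roc_points(positives, negatives, candidates)

blood_bank = choose_cutoff(points, min_sensitivity=1.0)    # sensitivity favored
pathology = choose_cutoff(points, min_sensitivity=0.85)    # some misses tolerated
print("AUC:", auc(positives, negatives))
print("blood bank cutoff:", blood_bank)
print("pathology cutoff:", pathology)
```

With these illustrative data, the blood bank target forces a lower cutoff and accepts one false positive, while the more tolerant target selects a higher cutoff with no false positives and one missed case.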
Even though measurement uncertainty can be important in calculating the “gray
zone” [23] in ordinal qualitative tests, binary results are classified by comparing a numerical
result against a clinical decision point (cutoff). Depending on its position relative to
the cutoff, the result is classified as positive or negative. For example, if we use the
signal-to-cutoff ratio (S/CO), where the cutoff is one, a result equal to or higher than one is
positive, and a lower result is negative. Pereira et al. [24] demonstrated the calculation of
measurement uncertainty in ratios close to this decision point. From this uncertainty, the
“guard band”/“gray zone” was calculated, in which the results were classified as indeterminate.
The importance of a ternary classification depends on the fitness for purpose/intended use of
the reported results. Again, it will be more significant in a blood bank than in a clinical
pathology laboratory. The empirical determination of the “gray zone” is not referred to in
either guide. Previously, Dimech et al. [25] published a measurement uncertainty study of
screening immunoassays based on EQA data.
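The binary and ternary classifications described above can be sketched as follows. The symmetric gray zone (cutoff minus and plus the expanded uncertainty) and the value of 0.10 S/CO units are assumptions made for illustration, not the model published in [24].

```python
def classify_sco(s_co: float, cutoff: float = 1.0,
                 expanded_uncertainty: float = 0.10) -> str:
    """Classify a signal-to-cutoff ratio with a symmetric gray zone.

    Assumption: the gray zone spans cutoff - U to cutoff + U, where U is an
    expanded uncertainty expressed in S/CO units. With U = 0 the rule
    reduces to the binary one (positive when S/CO >= cutoff).
    """
    lower = cutoff - expanded_uncertainty
    upper = cutoff + expanded_uncertainty
    if s_co >= upper:
        return "positive"
    if s_co < lower:
        return "negative"
    return "indeterminate"


for ratio in (0.50, 0.95, 1.05, 1.25):
    binary = classify_sco(ratio, expanded_uncertainty=0.0)
    ternary = classify_sco(ratio)
    print(f"S/CO={ratio:.2f} binary={binary} ternary={ternary}")
```

Results close to the cutoff change from a definite positive or negative call to indeterminate, which is the behavior a blood bank would typically want before releasing a component.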
Furthermore, another important performance assessment tool in this type of test is
the detection limit. This limit is typically determined in molecular biology tests, such as
RT-PCR. This value is also recognized as “analytical sensitivity”, i.e., the concentration from
which we have 95% true positives, identified as a “95% hit rate”, e.g., 102 target RNA copies
per reaction. Its determination employs probit regression (5.5 of [26]), also called the probit
model, which is used to model dichotomous or binary outcome variables. The inverse standard
normal distribution of the probability is modeled as a linear combination of the predictors. It
is closely related to the logit function and logit model. This model is not covered in EP12-A2.
Despite being referred to in the AQA, it is not presented in detail.
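A minimal sketch of a probit fit for the 95% hit rate is given below. It assumes a dilution series with a fixed number of replicates per level, uses the base-10 logarithm of the concentration as the single predictor, and fits the model by Fisher scoring (iteratively reweighted least squares); the data and function names are hypothetical, and the sketch is not the procedure of [26].

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fit_probit(log_conc, hits, replicates, iterations=100, tol=1e-10):
    """Fit P(hit) = Phi(a + b * log_conc) to grouped data by Fisher scoring."""
    a, b = 0.0, 1.0  # starting values (assumption)
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for x, y, n in zip(log_conc, hits, replicates):
            eta = a + b * x
            # Clamp to avoid division by zero at extreme predictions.
            p = min(max(STD_NORMAL.cdf(eta), 1e-9), 1 - 1e-9)
            dens = max(STD_NORMAL.pdf(eta), 1e-12)
            weight = n * dens * dens / (p * (1 - p))
            working = eta + (y / n - p) / dens  # working response
            sw += weight
            swx += weight * x
            swxx += weight * x * x
            swz += weight * working
            swxz += weight * x * working
        # Weighted least squares solution for intercept and slope.
        det = sw * swxx - swx * swx
        b_new = (sw * swxz - swx * swz) / det
        a_new = (swz - b_new * swx) / sw
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def detection_limit(a, b, hit_rate=0.95):
    """Concentration at which the fitted hit rate equals hit_rate."""
    return 10 ** ((STD_NORMAL.inv_cdf(hit_rate) - a) / b)


# Hypothetical dilution series: copies per reaction, 20 replicates per level.
copies = [3, 10, 30, 100, 300]
hits = [3, 9, 15, 19, 20]
replicates = [20] * len(copies)

a, b = fit_probit([math.log10(c) for c in copies], hits, replicates)
lod95 = detection_limit(a, b)
print(f"intercept={a:.3f} slope={b:.3f} LoD(95% hit rate)={lod95:.0f} copies/reaction")
```

In routine work, a statistics package would normally be used for the fit and for the confidence interval of the detection limit, which this sketch does not compute.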
Lastly, let us discuss the importance of the delta value in test performance (4.5 of [20,27]).
This tool is important when at least two tests have identical performance assessments, e.g.,
when clinical sensitivity is equal. The delta value answers the following question: “Which of
these tests is most likely to report false or indeterminate results?”. This question is important,
mainly in validating blood components in blood banks or human organs, cells, and tissues.
What is intended to be mitigated goes beyond the risk of false results. It also includes the
risk of a negative impact on budget and stock. For example, a false result implies retesting,
elimination of blood components, and suspension of blood
donors. Delta values are determined separately for individuals with positive and negative
conditions, abbreviated as δ+ and δ−, respectively. The results are interpreted as follows: a
higher delta value indicates a lower tendency for a test to produce false or indeterminate
results in human samples with the same epidemiological prevalence as the study samples.
This approach is not covered in either guide.
5. Conclusions
The Eurachem/CITAC technical guide “Assessment of performance and uncertainty
in qualitative chemical analysis” is suggested to operationalize the ISO subclauses on the
verification and validation of examination procedures. At this point, AQA 2021, which is openly
available, is an alternative to EP12-A2, being consistent with its methodologies. The AQA
examples of clinical sensitivity and specificity are easily replicable in the medical
laboratory. It would be important for the compliance assessment to be discussed in depth in
future reviews of both guides. The introduction of the uncertainty of proportions allows the
medical laboratory to assess the uncertainty in this type of test, which goes beyond the
requirements of ISO 15189. Compared to EP12-A2, no limitations emerged when applying the AQA
in the medical laboratory. However, both guides have limitations, especially in validation or
even developmental studies. They do not include complementary performance assessment
models, such as the cutoff determination, even though the EP12-A2 allows us to assess their
consistency with the “C5–C95” principle. Other missing points are the computation of the
“gray zone”, which is important in human product banks for transfusion or transplantation,
and the delta value.
Considering the similarities with EP12-A2, it is likely that AQA 2021 could serve as
another essential guide for medical laboratory stakeholders, such as medical laboratories,
regulatory agencies, and IVDD manufacturers.
References
1. CLSI. Who We Are. 2021. Available online: https://clsi.org/about/ (accessed on 23 March 2022).
2. Pereira, P.; Westgard, J. Letter to the Editor: Balance of the unsuccessful systematization of measurement uncertainty in medical
laboratories. Transfus. Apher. Sci. 2017, 56, 103–104. [CrossRef] [PubMed]
3. ISO 15189; Medical Laboratories-Requirements for Quality and Competence. ISO: Geneva, Switzerland, 2012.
4. Sciacovelli, L.; Aita, A.; Padoan, A.; Antonelli, G.; Plebani, M. ISO 15189 accreditation and competence: A new opportunity for
laboratory medicine (Perspective). J. Lab. Precis. Med. 2017, 2, 79. [CrossRef]
5. ISO/IEC 17025; Testing and Calibration Laboratories. ISO: Geneva, Switzerland, 2017.