Guide to Interpreting ROC Analysis
Abstract:
This review article provides a concise guide to interpreting receiver operating characteristic (ROC)
curves and area under the curve (AUC) values in diagnostic accuracy studies. ROC analysis is
a powerful tool for assessing the diagnostic performance of index tests, which are tests that are
used to diagnose a disease or condition. The AUC value is a summary metric of the ROC curve
that reflects the test’s ability to distinguish between diseased and nondiseased individuals. AUC
values typically range from 0.5 to 1.0, with a value of 0.5 indicating that the test is no better than chance
at distinguishing between diseased and nondiseased individuals. A value of 1.0 indicates perfect
discrimination. AUC values above 0.80 are generally considered clinically useful, while values below
0.80 are considered of limited clinical utility. When interpreting AUC values, it is important to consider
the 95% confidence interval. The confidence interval reflects the uncertainty around the AUC value.
A narrow confidence interval indicates that the AUC value is estimated precisely, while a wide confidence
interval indicates that the AUC value is less reliable. ROC analysis can also be used to identify the
optimal cutoff value for an index test. The optimal cutoff value is the value that maximizes the test’s
sensitivity and specificity. The Youden index can be used to identify the optimal cutoff value. This
review article provides a concise guide to interpreting ROC curves and AUC values in diagnostic
accuracy studies. By understanding these metrics, clinicians can make informed decisions about
the use of index tests in clinical practice.
Keywords:
Area under the curve, diagnostic study, receiver operating characteristic analysis, receiver operating
characteristic curve
Submitted: 15-08-2023
Revised: 24-08-2023
Accepted: 12-09-2023
Published: 03-10-2023
© 2023 Turkish Journal of Emergency Medicine | Published by Wolters Kluwer - Medknow
Çorbacıoğlu and Aksel: ROC Analysis
to the gold standard reference test. Diagnostic ability encompasses not only the index test’s diagnostic
prowess (specificity, PPV, and PLR) but also its ability to distinguish healthy individuals from those
with the targeted condition (sensitivity, NPV, and NLR).[1‑4]

Two Types of Diagnostic Studies

There are two main types of diagnostic studies in medicine: two‑by‑two tables and receiver operating
characteristic (ROC) analysis. The choice between these depends on whether the index test yields
dichotomous or continuous results.

Diagnostic Accuracy Studies with Dichotomous Index Test Results

The two‑by‑two table is used when both the index test and reference test results are dichotomous. As
shown in Table 1, sensitivity, specificity, PPV, NPV, PLR, and NLR are calculated based on the data in
the table’s four cells. True positive fraction (TPF) and false positive fraction (FPF) are two other
important parameters that have a diagnostic character in cases where the index test is positive. TPF
reflects the index test’s accuracy in detecting disease (and is equivalent to sensitivity), while FPF
gauges the index test’s positivity in nondiseased individuals (and is equivalent to 1 – specificity).[5]
In cases where the reference test is also dichotomous, but the index test yields continuous numerical
results, the diagnostic study method used is the ROC analysis.[6‑8] While the ROC curve and the
resultant area under the curve (AUC) offer a concise summary of the index test’s diagnostic utility,
clinicians may encounter challenges in interpreting these values. This concise review aims to guide
clinicians through the interpretation of ROC curves and AUC values when presenting findings from
their diagnostic accuracy studies.

Diagnostic Accuracy Studies with Numerical Index Test Results

In cases where the index test yields a dichotomous outcome (a single cutoff value), the two‑by‑two
table is sufficient, as discussed earlier. However, when the index test generates continuous (or
occasionally ordinal) outcomes, multiple potential cutoff values emerge. Selecting the optimal cutoff
value, especially for novel diagnostic tests, poses challenges. With continuous numerical outcomes,
diagnostic accuracy studies yield distinct distributions of test results for both diseased and
nondiseased groups.[9] For example, a diagnostic accuracy study evaluating B‑type natriuretic
peptide (BNP) blood levels in diagnosing heart failure could yield the following distributions:
• An ideal diagnostic test would yield sensitivity and specificity of 100%, resulting in nonoverlapping
BNP distribution graphs for individuals with and without heart failure [Figure 1a]
• However, real‑world scenarios tend to involve overlapping distributions [Figure 1b].

Receiver Operating Characteristic Analysis and Receiver Operating Characteristic Curve

ROC analysis involves dichotomizing all index test outcomes into positive (indicative of disease) and
negative (nondisease) based on each measured index test value. For instance, if a measured BNP
result is 235 pg/ml, ROC analysis would classify all values exceeding 235 as positive and the rest as
negative. Relevant diagnostic performance metrics (sensitivity, specificity, PPV, NPV, PLR, and NLR)
are then calculated, mirroring the two‑by‑two table methodology. This process is repeated for all
measured values within the ROC analysis. This approach enables the presentation and examination
of these metrics as a table, followed by the graphical depiction of this table, termed the ROC curve
[Figure 2].[10‑12] The ROC curve plots TPF (sensitivity) and FPF (1 – specificity) values for each
index test outcome on an x‑y coordinate graph. This curve results from combining coordinate points
from each outcome. The diagonal reference line at a 45° angle signifies the diagnostic test’s
discriminative power akin to random chance. The upper left corner corresponds to perfect
discriminatory power, represented by a TPF of 1 and an FPF of 0 (where sensitivity and specificity
both attain 100%).
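
The ROC procedure described in this section (dichotomize at every measured value, compute TPF and FPF at each cutoff, then join the points) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the BNP values are invented toy data, the function names `roc_points` and `auc_trapezoid` are hypothetical, values strictly greater than the cutoff are treated as positive (mirroring the 235 pg/ml example), the AUC is approximated with the trapezoidal rule, and the Youden index mentioned in the abstract is taken in its usual form J = sensitivity + specificity - 1.

```python
def roc_points(values, diseased):
    """Return (cutoff, FPF, TPF) for every distinct measured value used as a cutoff.

    A result is called positive when it is strictly greater than the cutoff.
    """
    n_pos = sum(diseased)               # subjects with the condition
    n_neg = len(diseased) - n_pos       # subjects without the condition
    points = []
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, diseased) if d and v > cutoff)
        fp = sum(1 for v, d in zip(values, diseased) if not d and v > cutoff)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Approximate the area under the ROC curve with the trapezoidal rule."""
    # Add the two corners: (0, 0) = nobody positive, (1, 1) = everybody positive.
    xy = sorted({(fpf, tpf) for _, fpf, tpf in points} | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented BNP values (pg/ml); 1 = heart failure, 0 = no heart failure.
bnp = [520, 410, 380, 300, 270, 150, 260, 180, 120, 90, 60, 40]
has_hf = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

points = roc_points(bnp, has_hf)
auc = auc_trapezoid(points)

# Youden index J = sensitivity + specificity - 1 = TPF - FPF; pick the cutoff maximizing it.
best_cutoff, best_fpf, best_tpf = max(points, key=lambda p: p[2] - p[1])
best_j = best_tpf - best_fpf

print(f"AUC = {auc:.3f}")
print(f"Youden-optimal cutoff: > {best_cutoff} pg/ml (J = {best_j:.3f})")
```

With real study data, ties between diseased and nondiseased values and the choice of ">" versus ">=" at the cutoff both affect the individual points, so the convention should be stated alongside any reported cutoff.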
Figure 1: Two BNP distribution graphs for the subject groups with and without heart failure. (a) An ideal diagnostic test would yield sensitivity and specificity of 100%, resulting in non-overlapping BNP distribution graphs for individuals with and without heart failure. (b) Real-world scenarios tend to involve overlapping distributions; sensitivity and specificity values are not 100%. TN: true negative, TP: true positive, FN: false negative, FP: false positive
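
The two-by-two table metrics named in this article follow directly from the four cell counts labeled in Figure 1 (TP, FP, FN, TN). The sketch below shows the standard formulas for a single cutoff; the counts are hypothetical and chosen only for illustration (they are not data from this article), and the function name `two_by_two_metrics` is an assumption.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Compute diagnostic accuracy metrics from the four cells of a two-by-two table."""
    sensitivity = tp / (tp + fn)        # true positive fraction (TPF)
    specificity = tn / (tn + fp)
    fpf = fp / (fp + tn)                # false positive fraction = 1 - specificity
    ppv = tp / (tp + fp)                # positive predictive value
    npv = tn / (tn + fn)                # negative predictive value
    # Likelihood ratios; guard against division by zero at perfect values.
    plr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    nlr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpf": fpf,
        "ppv": ppv,
        "npv": npv,
        "plr": plr,
        "nlr": nlr,
    }


# Hypothetical counts: 100 diseased and 200 nondiseased subjects.
metrics = two_by_two_metrics(tp=80, fp=30, fn=20, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV computed this way depend on the disease prevalence in the study sample, so they may not transfer to populations with a different prevalence.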
Turkish Journal of Emergency Medicine - Volume 23, Issue 4, October-December 2023