
Invited Review Article

Receiver operating characteristic curve analysis in diagnostic accuracy studies: A guide to interpreting the area under the curve value

Şeref Kerem Çorbacıoğlu1*, Gökhan Aksel2

1Department of Emergency Medicine, Atatürk Sanatoryum Training and Research Hospital, Ankara, Turkey
2Department of Emergency Medicine, Ümraniye Training and Research Hospital, Istanbul, Turkey
*Corresponding author

Website: www.turkjemerged.com
DOI: 10.4103/tjem.tjem_182_23

Abstract:
This review article provides a concise guide to interpreting receiver operating characteristic (ROC)
curves and area under the curve (AUC) values in diagnostic accuracy studies. ROC analysis is
a powerful tool for assessing the diagnostic performance of index tests, which are tests that are
used to diagnose a disease or condition. The AUC value is a summary metric of the ROC curve
that reflects the test’s ability to distinguish between diseased and nondiseased individuals. AUC
values range from 0.5 to 1.0, with a value of 0.5 indicating that the test is no better than chance
at distinguishing between diseased and nondiseased individuals. A value of 1.0 indicates perfect
discrimination. AUC values above 0.80 are generally considered clinically useful, while values below
0.80 are considered of limited clinical utility. When interpreting AUC values, it is important to consider
the 95% confidence interval. The confidence interval reflects the uncertainty around the AUC value.
A narrow confidence interval indicates that the AUC value is likely accurate, while a wide confidence
interval indicates that the AUC value is less reliable. ROC analysis can also be used to identify the
optimal cutoff value for an index test. The optimal cutoff value is the value that maximizes the test’s
sensitivity and specificity. The Youden index can be used to identify the optimal cutoff value. This
review article provides a concise guide to interpreting ROC curves and AUC values in diagnostic
accuracy studies. By understanding these metrics, clinicians can make informed decisions about
the use of index tests in clinical practice.
Keywords:
Area under the curve, diagnostic study, receiver operating characteristic analysis, receiver operating characteristic curve

Submitted: 15-08-2023
Revised: 24-08-2023
Accepted: 12-09-2023
Published: 03-10-2023

ORCID:
SKC: 0000-0001-7802-8087
GA: 0000-0002-5580-3201

Address for correspondence:
Dr. Şeref Kerem Çorbacıoğlu, Department of Emergency Medicine, Atatürk Sanatoryum Training and Research Hospital, Ankara, Turkey.
E-mail: serefkeremcorbacioglu@gmail.com

How to cite this article: Çorbacıoğlu ŞK, Aksel G. Receiver operating characteristic curve analysis in diagnostic accuracy studies: A guide to interpreting the area under the curve value. Turk J Emerg Med 2023;23:195-8.

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

For reprints contact: WKHLRPMedknow_reprints@wolterskluwer.com

Introduction

Diagnostic accuracy studies are a cornerstone of medical research. When evaluating novel diagnostic tests or repurposing existing ones for different clinical scenarios, physicians assess test efficacy; the tests under evaluation are referred to as index tests in diagnostic accuracy analyses. Index tests can encompass a variety of elements, such as serum markers derived from blood samples, radiological imaging, specific clinical findings, or clinical decision rules. Diagnostic studies assess the index test's diagnostic performance by reporting specific metrics, such as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and accuracy. These metrics are compared to the gold standard reference test.

Diagnostic ability encompasses not only the index test's diagnostic prowess (specificity, PPV, and PLR) but also its ability to distinguish healthy individuals from those with the targeted condition (sensitivity, NPV, and NLR).[1-4]

Two Types of Diagnostic Studies

There are two main types of diagnostic studies in medicine: two-by-two tables and receiver operating characteristic (ROC) analysis. The choice between these depends on whether the index test yields dichotomous or continuous results.

Diagnostic Accuracy Studies with Dichotomous Index Test Results

The two-by-two table is used when both the index test and reference test results are dichotomous. As shown in Table 1, sensitivity, specificity, PPV, NPV, PLR, and NLR are calculated based on the data in the table's four cells. True positive fraction (TPF) and false positive fraction (FPF) are two other important parameters that describe the test's behavior when the result is positive. TPF reflects the index test's accuracy in detecting disease (and is equivalent to sensitivity), while FPF gauges the index test's positivity in nondiseased individuals (and is equivalent to 1 – specificity).[5] In cases where the reference test is also dichotomous but the index test yields continuous numerical results, the diagnostic study method used is ROC analysis.[6-8] While the ROC curve and the resultant area under the curve (AUC) offer a concise summary of the index test's diagnostic utility, clinicians may encounter challenges in interpreting these values. This concise review aims to guide clinicians through the interpretation of ROC curves and AUC values when presenting findings from their diagnostic accuracy studies.
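
As a minimal illustration of how the Table 1 summary parameters follow from the four cells of a two-by-two table, the sketch below (not part of the original article) computes them in Python; the cell counts are hypothetical.

```python
def two_by_two_metrics(a: int, b: int, c: int, d: int) -> dict:
    """Summary parameters from a two-by-two table:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)   # TPF: positive results among the diseased
    specificity = d / (b + d)   # negative results among the nondiseased
    return {
        "sensitivity (TPF)": sensitivity,
        "specificity": specificity,
        "FPF (1 - specificity)": b / (b + d),
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "PLR": sensitivity / (1 - specificity),
        "NLR": (1 - sensitivity) / specificity,
        "accuracy": (a + d) / (a + b + c + d),
    }

# Hypothetical counts: 80 true positives, 30 false positives,
# 20 false negatives, 170 true negatives.
print(two_by_two_metrics(a=80, b=30, c=20, d=170))
```
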
Diagnostic Accuracy Studies with Numerical Index Test Results

In cases where the index test yields a dichotomous outcome (a single cutoff value), the two-by-two table is sufficient, as discussed earlier. However, when the index test generates continuous (or occasionally ordinal) outcomes, multiple potential cutoff values emerge. Selecting the optimal cutoff value, especially for novel diagnostic tests, poses challenges. With continuous numerical outcomes, diagnostic accuracy studies yield distinct distributions of test results for both diseased and nondiseased groups.[9] For example, a diagnostic accuracy study evaluating B-type natriuretic peptide (BNP) blood levels in diagnosing heart failure could yield the following distributions:
• An ideal diagnostic test would yield sensitivity and specificity of 100%, resulting in nonoverlapping BNP distribution graphs for individuals with and without heart failure [Figure 1a]
• However, real-world scenarios tend to involve overlapping distributions [Figure 1b].

Receiver Operating Characteristic Analysis and Receiver Operating Characteristic Curve

ROC analysis involves dichotomizing all index test outcomes into positive (indicative of disease) and negative (nondisease) based on each measured index test value. For instance, if a measured BNP result is 235 pg/ml, ROC analysis would classify all values exceeding 235 as positive and the rest as negative. Relevant diagnostic performance metrics (sensitivity, specificity, PPV, NPV, PLR, and NLR) are then calculated, mirroring the two-by-two table methodology. This process is repeated for all measured values within the ROC analysis. This approach enables the presentation and examination of these metrics as a table, followed by the graphical depiction of this table, termed the ROC curve [Figure 2].[10-12] The ROC curve plots TPF (sensitivity) and FPF (1 – specificity) values for each index test outcome on an x-y coordinate graph. This curve results from combining coordinate points from each outcome. The diagonal reference line at a 45° angle signifies discriminative power akin to random chance. The upper left corner corresponds to perfect discriminatory power, represented by a TPF of 1 and an FPF of 0 (where sensitivity and specificity both attain 100%).
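
The threshold-by-threshold procedure described above can be sketched in a few lines of Python. The BNP values and reference-standard labels below are invented for illustration; in practice, scikit-learn's roc_curve performs the same computation.

```python
import numpy as np

# Hypothetical BNP measurements (pg/ml) and reference-standard labels
# (1 = heart failure, 0 = no heart failure).
bnp = np.array([90, 120, 235, 310, 420, 95, 150, 260, 380, 510], dtype=float)
has_hf = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])

# Each measured value in turn is used as a cutoff: results exceeding the
# cutoff are called positive, and the resulting (FPF, TPF) pair is one
# point on the ROC curve.
for cutoff in np.sort(np.unique(bnp)):
    positive = bnp > cutoff
    tp = np.sum(positive & (has_hf == 1))
    fp = np.sum(positive & (has_hf == 0))
    fn = np.sum(~positive & (has_hf == 1))
    tn = np.sum(~positive & (has_hf == 0))
    tpf = tp / (tp + fn)        # sensitivity
    fpf = fp / (fp + tn)        # 1 - specificity
    print(f"cutoff > {cutoff:5.0f}: FPF = {fpf:.2f}, TPF = {tpf:.2f}")
```
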

Table 1: Two-by-two table and calculating parameters of diagnostic test performance

Index test    Reference test                             Total
              Diseased              Nondiseased
Positive      a (true positive)     b (false positive)   a+b
Negative      c (false negative)    d (true negative)    c+d
Total         a+c                   b+d

Summary parameters of diagnostic test performance: Sensitivity = a/(a + c), specificity = d/(b + d), PPV = a/(a + b), NPV = d/(c + d), TPF = sensitivity = a/(a + c), FPF = 1 − specificity = b/(b + d). PPV: Positive predictive value, NPV: Negative predictive value, TPF: True positive fraction, FPF: False-positive fraction

Area under the Curve Value and Interpretation

The AUC value is a widely used metric in clinical studies, succinctly summarizing index test diagnostic performance. The AUC value signifies the likelihood that the index test will rank a randomly selected diseased subject higher than a randomly selected nondiseased subject. AUC values range from 0.5 (equivalent to chance) to 1 (indicating perfect discrimination).[13]
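
This rank-based meaning of the AUC can be checked numerically. The sketch below, using invented BNP-like data, compares scikit-learn's AUC with the brute-force proportion of diseased/nondiseased pairs in which the diseased subject scores higher (ties counted as one half); the two quantities coincide.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
diseased = rng.normal(400, 120, size=200)      # hypothetical heart-failure group
nondiseased = rng.normal(150, 80, size=300)    # hypothetical control group

values = np.concatenate([diseased, nondiseased])
labels = np.concatenate([np.ones(200), np.zeros(300)])

auc = roc_auc_score(labels, values)

# Probability that a random diseased subject outranks a random
# nondiseased subject (ties counted as 1/2).
pairwise = (diseased[:, None] > nondiseased[None, :]).mean() \
           + 0.5 * (diseased[:, None] == nondiseased[None, :]).mean()

print(f"AUC = {auc:.3f}, pairwise probability = {pairwise:.3f}")
```
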

Figure 1: Two different BNP distribution graphs of the subject groups with and without heart failure. TN: true negative, TP: true positive, FN: false negative, FP: false positive. (a) An ideal diagnostic test would yield sensitivity and specificity of 100%, resulting in non-overlapping BNP distribution graphs for individuals with and without heart failure. (b) Real-world scenarios tend to involve overlapping distributions; sensitivity and specificity values are not 100%

AUC values serve as a gauge for the index test's ability to distinguish disease. An AUC value of 1 signifies flawless discernment, while an AUC of 0.5 indicates performance akin to random chance. New researchers often make errors when interpreting the AUC value in diagnostic accuracy studies, usually because the clinical meaning of the AUC value is overestimated. For example, an AUC value of 0.65, calculated in a study of the diagnostic performance of an index test, means that the test is not clinically adequate; however, some researchers infer that the test is clinically useful by looking only at statistical significance. In diagnostic accuracy studies, AUC values above 0.90 are interpreted as indicating very good diagnostic performance, while AUC values below 0.80, even if statistically significant, are interpreted as indicating very limited clinical usability. The classification of AUC values and their clinical usability is presented in Table 2.

Notably, attention to the 95% confidence interval and its width, alongside the AUC value, is pivotal in comprehending diagnostic performance.[13,14] For instance, a BNP marker's AUC value of 0.81 might be tempered by a confidence interval spanning 0.65–0.95. In this scenario, reliance solely on an AUC value above 0.80 may be unwise, given the potential for the true value to lie below 0.70. Thus, calculating sample size and mitigating type-2 error risk are vital prerequisites before undertaking diagnostic studies.[15]
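
Statistical packages usually report the 95% confidence interval alongside the AUC; when one is not provided, a nonparametric bootstrap is one common way to obtain it. The sketch below illustrates that approach (it is not prescribed by the article); values and labels are assumed to hold the index-test results and reference-standard outcomes (1 = diseased, 0 = nondiseased).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(values, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and bootstrap percentile CI for the AUC."""
    rng = np.random.default_rng(seed)
    n = len(values)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample subjects with replacement
        if len(np.unique(labels[idx])) < 2:     # both groups must be present
            continue
        aucs.append(roc_auc_score(labels[idx], values[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(labels, values), lower, upper

# Example call (values/labels as defined for the study data):
# auc, lo, hi = bootstrap_auc_ci(values, labels)
# A wide interval, such as 0.65-0.95 around an AUC of 0.81, signals the
# uncertainty discussed above.
```
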

Figure 2: Receiver Operating Characteristic (ROC) Curve


A common mistake at this point arises when two different index tests are compared: the comparison is often based only on the numerical difference between their AUC values. The decision should rest not only on the size of this difference but also on whether the difference is statistically significant. The most common statistical method used to compare the AUC values of different index tests is the DeLong test.
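
The DeLong test itself is not implemented in scikit-learn or SciPy, so the sketch below uses a paired bootstrap as a stand-in to illustrate the same point: whether the observed AUC difference between two index tests measured on the same subjects is larger than chance. Here test_a, test_b, and labels are hypothetical paired measurements.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(test_a, test_b, labels, n_boot=2000, seed=0):
    """Observed AUC difference and its 95% bootstrap CI (paired design)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # same resample applied to both tests
        if len(np.unique(labels[idx])) < 2:
            continue
        diffs.append(roc_auc_score(labels[idx], test_a[idx])
                     - roc_auc_score(labels[idx], test_b[idx]))
    observed = roc_auc_score(labels, test_a) - roc_auc_score(labels, test_b)
    ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
    return observed, ci_low, ci_high

# If the 95% CI for the difference excludes 0, the difference is unlikely
# to be a chance finding; dedicated packages implement the DeLong test
# for the same purpose.
```
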


Table 2: Area under the curve values and their interpretation

AUC value          Interpretation suggestion
0.9 ≤ AUC          Excellent
0.8 ≤ AUC < 0.9    Considerable
0.7 ≤ AUC < 0.8    Fair
0.6 ≤ AUC < 0.7    Poor
0.5 ≤ AUC < 0.6    Fail

AUC: Area under the curve
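
For convenience, the Table 2 categories can be encoded in a small helper (a sketch added here, not part of the article), which is handy when reporting many AUC values at once.

```python
def interpret_auc(auc: float) -> str:
    """Map an AUC value to the Table 2 interpretation suggestion."""
    if auc >= 0.9:
        return "Excellent"
    if auc >= 0.8:
        return "Considerable"
    if auc >= 0.7:
        return "Fair"
    if auc >= 0.6:
        return "Poor"
    if auc >= 0.5:
        return "Fail"
    return "Below 0.5 (outside the Table 2 range)"

print(interpret_auc(0.81))  # Considerable
```
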
Determination of Optimal Cutoff Value

ROC analysis also facilitates the identification of an optimal cutoff value, particularly when the AUC value surpasses 0.80. The Youden index, often employed, determines the threshold value that maximizes both sensitivity and specificity. This index, calculated as sensitivity + specificity – 1, aids in selecting a threshold where both metrics achieve their peak. Nonetheless, alternative thresholds might be chosen based on cost-effectiveness or varying clinical contexts, prioritizing either sensitivity or specificity.
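
A minimal sketch of this step, assuming values and labels hold the index-test results and reference-standard outcomes (1 = diseased, 0 = nondiseased): the cutoff maximizing the Youden index is taken from scikit-learn's ROC output, and the full set of two-by-two metrics is then reported at that cutoff, as the Conclusion below recommends.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_cutoff_report(values, labels):
    """Choose the cutoff maximizing sensitivity + specificity - 1 and
    report the two-by-two metrics at that cutoff."""
    fpr, tpr, thresholds = roc_curve(labels, values)
    youden = tpr - fpr                       # equals sensitivity + specificity - 1
    cutoff = thresholds[np.argmax(youden)]

    positive = values >= cutoff              # roc_curve treats score >= threshold as positive
    tp = np.sum(positive & (labels == 1)); fp = np.sum(positive & (labels == 0))
    fn = np.sum(~positive & (labels == 1)); tn = np.sum(~positive & (labels == 0))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "cutoff": cutoff,
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "PLR": sens / (1 - spec),
        "NLR": (1 - sens) / spec,
    }

# Example call: print(youden_cutoff_report(values, labels))
```
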
Conclusion

Studies employing ROC analysis follow reporting guidelines, such as the Standards for Reporting Diagnostic Accuracy Studies (STARD) guideline. The STARD guideline also states that when reporting the diagnostic performance of an index test, not only sensitivity and specificity but also NLR and PLR values should be reported.[16] However, certain statistical programs might report only sensitivity and specificity parameters in ROC analysis. Therefore, when an AUC value above 0.80 is attained, generating a two-by-two table based on the chosen optimal threshold and reporting all relevant metrics becomes imperative.

Author contributions
Conceptualization: ŞKÇ and GA. Literature search: ŞKÇ and GA. Writing - original draft: ŞKÇ. Review and editing: ŞKÇ and GA.

Conflicts of interest
None declared.

Funding
None.

References

1. Knottnerus JA, Buntinx F. The Evidence Base of Clinical Diagnosis: Theory and Methods of Diagnostic Research. 2nd ed. Singapore: Wiley-Blackwell BMJ Books; 2009.
2. Guyatt G. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 3rd ed. New York: McGraw-Hill Education; 2015.
3. Akobeng AK. Understanding diagnostic tests 1: Sensitivity, specificity and predictive values. Acta Paediatr 2007;96:338-41.
4. Akobeng AK. Understanding diagnostic tests 2: Likelihood ratios, pre- and post-test probabilities and their use in clinical practice. Acta Paediatr 2007;96:487-91.
5. Nahm FS. Receiver operating characteristic curve: Overview and practical use for clinicians. Korean J Anesthesiol 2022;75:25-36.
6. Akobeng AK. Understanding diagnostic tests 3: Receiver operating characteristic curves. Acta Paediatr 2007;96:644-7.
7. Altman DG, Bland JM. Diagnostic tests 3: Receiver operating characteristic plots. BMJ 1994;309:188.
8. Kumar R, Indrayan A. Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatr 2011;48:277-87.
9. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013;4:627-35.
10. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
11. Zou KH, O'Malley AJ, Mauri L. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 2007;115:654-7.
12. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol 2010;5:1315-6.
13. Fischer JE, Bachmann LM, Jaeschke R. A readers' guide to the interpretation of diagnostic test properties: Clinical example of sepsis. Intensive Care Med 2003;29:1043-51.
14. Tosteson TD, Buonaccorsi JP, Demidenko E, Wells WA. Measurement error and confidence intervals for ROC curves. Biom J 2005;47:409-16.
15. Akoglu H. User's guide to sample size estimation in diagnostic accuracy studies. Turk J Emerg Med 2022;22:177-85.
16. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527.

