Critical review of reporting of the data analysis step in metabolomics

  • Review Article
Metabolomics

Abstract

Introduction

We present the first study to critically appraise the quality of reporting of the data analysis step in metabolomics studies since the publication of minimum reporting guidelines in 2007.

Objectives

The aim of this study was to assess the standard of reporting of the data analysis step in metabolomics biomarker discovery studies and to investigate whether the level of detail supplied allows basic understanding of the steps employed and/or reuse of the protocol. For the purposes of this review we define the data analysis step to include the data pretreatment step and the actual data analysis step, which covers algorithm selection, univariate analysis and multivariate analysis.

Method

We reviewed the literature to identify metabolomic studies of biomarker discovery that were published between January 2008 and December 2014. Studies were examined for completeness in reporting the various steps of the data pretreatment phase and data analysis phase and also for clarity of the workflow of these sections.

Results

We analysed 27 papers, published between the start of 2008 and the end of 2014, in the area of biomarker discovery in serum metabolomics. The results of this review showed that the data analysis step in metabolomics biomarker discovery studies is plagued by unclear and incomplete reporting. Major omissions and a lack of logical flow render the data analysis workflows in these studies impossible to follow and therefore impossible to replicate or even imitate.

Conclusions

While we wait for the holy grail of computational reproducibility in data analysis to become standard, we propose that, at a minimum, the data analysis section of metabolomics studies should be readable and interpretable without omissions, such that a data analysis workflow diagram could be extrapolated from the study and the data analysis protocol could therefore be reused by the reader. That inconsistent and patchy reporting obfuscates reproducibility is a given. However, even basic understanding and reuse of protocols are hampered by the low level of detail supplied in the data analysis sections of the studies that we reviewed.
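As a rough illustration of what "readable without omissions" could mean in practice, the sketch below records the reported details of each data analysis stage and flags the stages for which nothing was reported. It is not the method used in this review. The stage names follow the definition of the data analysis step given in the Objectives, while the class name, the method names and the example entries are hypothetical and do not come from any of the reviewed studies.

```python
from dataclasses import dataclass, field

# Hypothetical checklist of stages. The names mirror the abstract's definition of
# the data analysis step (pretreatment, algorithm selection, univariate and
# multivariate analysis); everything else here is an assumption for illustration.
REQUIRED_STAGES = ("pretreatment", "algorithm_selection", "univariate", "multivariate")


@dataclass
class WorkflowReport:
    """Record of what a study reports for each data analysis stage."""

    steps: dict = field(default_factory=dict)  # stage name -> list of reported details

    def report(self, stage, detail):
        """Add one reported detail to a stage."""
        self.steps.setdefault(stage, []).append(detail)

    def missing_stages(self):
        """Return required stages with no reported detail, in checklist order."""
        return [s for s in REQUIRED_STAGES if not self.steps.get(s)]

    def as_diagram(self):
        """Render the stages as a one-line text workflow, marking omissions."""
        parts = []
        for stage in REQUIRED_STAGES:
            details = self.steps.get(stage)
            label = ", ".join(details) if details else "NOT REPORTED"
            parts.append(f"{stage}[{label}]")
        return " -> ".join(parts)


# Made-up entries for illustration only; not taken from any reviewed study.
example = WorkflowReport()
example.report("pretreatment", "log transformation")
example.report("pretreatment", "Pareto scaling")
example.report("multivariate", "PLS-DA with 7-fold cross-validation")

print(example.missing_stages())
print(example.as_diagram())
```

A record of this kind makes gaps visible at a glance: any stage rendered as NOT REPORTED is a point at which a reader could not reconstruct the workflow diagram from the text alone.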

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI).

Author information

Corresponding author

Correspondence to E. C. Considine.

Ethics declarations

Conflict of interest

We declare no competing financial interests.

Appendix

See Figs. 1, 2, 3 and 4.

Fig. 1

Flow of included studies on disease prediction from serum metabolomics

Fig. 2

Overview of reproducibility, readability and the clarity of the workflow pipeline of the overall data analysis in the studies reviewed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used. *All code and data are available upon request but the definition of complete reproducibility is the availability of linked and executable code, so this study is not fully reproducible

Fig. 3

Reporting of pretreatment steps employed in the studies reviewed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used

Fig. 4

Completeness of reporting of supervised analysis steps and counts of the algorithms, performance metrics and validation methods employed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used

See Tables 1, 2, 3, 4, 5, 6, 7 and 8.

Table 1 General descriptive characteristics of studies reviewed
Table 2 Expected outcome of studies under review
Table 3 Overview of reproducibility or readability of data analysis steps of studies in this review
Table 4 Packages reported used or code used for analysis
Table 5 Pre-treatment phase details reported
Table 6 Univariate analysis details reported
Table 7 Unsupervised analysis details reported
Table 8 Supervised analysis details reported

About this article

Cite this article

Considine, E.C., Thomas, G., Boulesteix, A.L. et al. Critical review of reporting of the data analysis step in metabolomics. Metabolomics 14, 7 (2018). https://doi.org/10.1007/s11306-017-1299-3

  • DOI: https://doi.org/10.1007/s11306-017-1299-3
