Abstract
Introduction
We present the first study to critically appraise the quality of reporting of the data analysis step in metabolomics studies since the publication of minimum reporting guidelines in 2007.
Objectives
The aim of this study was to assess the standard of reporting of the data analysis step in metabolomics biomarker discovery studies and to investigate whether the level of detail supplied allows basic understanding of the steps employed and/or reuse of the protocol. For the purposes of this review we define the data analysis step to include the data pretreatment step and the actual data analysis step, which covers algorithm selection, univariate analysis and multivariate analysis.
Method
We reviewed the literature to identify metabolomic studies of biomarker discovery that were published between January 2008 and December 2014. Studies were examined for completeness in reporting the various steps of the data pretreatment phase and data analysis phase and also for clarity of the workflow of these sections.
Results
We analysed 27 papers published between 2008 and the end of 2014 in the area of biomarker discovery in serum metabolomics. The results of this review showed that the data analysis step in metabolomics biomarker discovery studies is plagued by unclear and incomplete reporting. Major omissions and a lack of logical flow render the data analysis workflows in these studies impossible to follow and therefore impossible to replicate or even imitate.
Conclusions
While we wait for computational reproducibility, the holy grail of data analysis, to become standard, we propose that, at a minimum, the data analysis section of a metabolomics study should be readable and interpretable without omissions, such that a data analysis workflow diagram could be extrapolated from the study and the data analysis protocol could therefore be reused by the reader. That inconsistent and patchy reporting obfuscates reproducibility is a given. However, even basic understanding and reuse of protocols are hampered by the low level of detail supplied in the data analysis sections of the studies that we reviewed.
Acknowledgements
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI).
Ethics declarations
Conflict of interest
We declare no competing financial interests.
Appendix
Overview of reproducibility, readability and the clarity of the workflow pipeline of the overall data analysis in the studies reviewed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used. *All code and data are available upon request, but complete reproducibility is defined as the availability of linked and executable code, so this study is not fully reproducible.
See Tables 1, 2, 3, 4, 5, 6, 7 and 8.
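The reported/not reported coding and the method counts described in the overview above could be tabulated along the following lines. This is only a minimal sketch: the checklist items, the study names and the recorded details are hypothetical placeholders, not the criteria or data used in this review.

```python
from collections import Counter

# Hypothetical checklist of data analysis details (illustrative only;
# not the criteria used in the review).
CHECKLIST = (
    "scaling",
    "transformation",
    "missing_values",
    "univariate_test",
    "multivariate_model",
    "validation",
)

# Hypothetical extraction records: for each study, the details it reports.
# An absent key means the detail was not reported.
studies = {
    "study_A": {
        "scaling": "Pareto",
        "multivariate_model": "PLS-DA",
        "validation": "cross-validation",
    },
    "study_B": {
        "scaling": "unit variance",
        "transformation": "log",
        "multivariate_model": "PLS-DA",
    },
}


def reporting_matrix(records, checklist):
    """Return True (reported) or False (not reported) per study and item."""
    return {
        name: {item: item in details for item in checklist}
        for name, details in records.items()
    }


def method_counts(records, item):
    """Count how often each method was named for one checklist item."""
    return Counter(
        details[item] for details in records.values() if item in details
    )


def completeness(matrix):
    """Fraction of checklist items reported by each study."""
    return {name: sum(row.values()) / len(row) for name, row in matrix.items()}


matrix = reporting_matrix(studies, CHECKLIST)
scores = completeness(matrix)
model_counts = method_counts(studies, "multivariate_model")
```

Here `matrix` corresponds to the green/red coding, `model_counts` to the blue count columns, and `scores` is an additional summary that is not part of the original tables.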
Cite this article
Considine, E.C., Thomas, G., Boulesteix, A.L. et al. Critical review of reporting of the data analysis step in metabolomics. Metabolomics 14, 7 (2018). https://doi.org/10.1007/s11306-017-1299-3
DOI: https://doi.org/10.1007/s11306-017-1299-3