Abstract
This chapter describes the main steps of data processing on mass spectrometry-based metabolomics platforms, focusing on the objectives and important considerations of each step. First, an overview of metabolomics and of the pivotal techniques applied in the field is presented. Important features of data acquisition and preprocessing, such as data compression, noise filtering, and baseline correction, are reviewed with a focus on practical aspects. Peak detection, deconvolution, and alignment, as well as the handling of missing values, are also discussed. Special attention is given to chemical and mathematical normalization approaches and to the role of quality control (QC) samples. Methods for uni- and multivariate statistical analysis, and the data pretreatment steps that can affect them, are reviewed, emphasizing the most widely used multivariate methods, i.e., principal components analysis (PCA), partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), and hierarchical cluster analysis (HCA). Criteria for model validation and the software used in data processing are also addressed. The chapter ends with some considerations on the minimum requirements for reporting metadata in metabolomics.
Abbreviations
- ANN: Artificial Neural Network
- CE-MS: Capillary Electrophoresis-Mass Spectrometry
- COW: Correlation-Optimized Warping
- CV: Coefficient of Variation
- DI FT-ICR MS: Direct-Infusion Fourier-Transform Ion-Cyclotron-Resonance Mass Spectrometry
- DTW: Dynamic Time Warping
- GA: Genetic Algorithm
- GC-MS: Gas Chromatography-Mass Spectrometry
- HCA: Hierarchical Cluster Analysis
- HILIC: Hydrophilic Interaction Chromatography
- IC: Intensity Count
- kNN: k-Nearest Neighbors
- LC-MS: Liquid Chromatography-Mass Spectrometry
- LDA: Linear Discriminant Analysis
- LOESS: Locally Estimated Scatterplot Smoothing
- MS: Mass Spectrometry
- NMR: Nuclear Magnetic Resonance
- NOMIS: Normalization Using the Optimal Selection of Multiple Internal Standards
- OPLS-DA: Orthogonal Partial Least Squares-Discriminant Analysis
- PARAFAC: Parallel Factor Analysis
- PC: Principal Component
- PCA: Principal Components Analysis
- PLS-DA: Partial Least Squares-Discriminant Analysis
- PQN: Probabilistic Quotient Normalization
- PTW: Parametric Time Warping
- QCs: Quality Control Samples
- RAFFT: Recursive Alignment by Fast Fourier Transform
- RF: Random Forest
- ROC: Receiver Operating Characteristic Curve
- ROI: Region of Interest
- S/N: Signal-to-Noise Ratio
- SIMCA: Soft Independent Modeling of Class Analogy
- SOM: Self-Organizing Map
- SVM: Support Vector Machine
- TOF-MS: Time-of-Flight Mass Spectrometry
- XIC: Extracted Ion Chromatogram
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Zamora Obando, H.R., Duarte, G.H.B., Simionato, A.V.C. (2021). Metabolomics Data Treatment: Basic Directions of the Full Process. In: Colnaghi Simionato, A.V. (ed) Separation Techniques Applied to Omics Sciences. Advances in Experimental Medicine and Biology, vol 1336. Springer, Cham. https://doi.org/10.1007/978-3-030-77252-9_12
DOI: https://doi.org/10.1007/978-3-030-77252-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77251-2
Online ISBN: 978-3-030-77252-9
eBook Packages: Biomedical and Life Sciences (R0)