Statistical Analysis of Metabolomics Data

Xiuxia Du
Department of Bioinformatics & Genomics
University of North Carolina at Charlotte
Outline

• Introduction
• Data pre-treatment
1. Normalization
2. Centering, scaling, transformation
• Univariate analysis
1. Student’s t-test
2. Volcano plot
• Multivariate analysis
1. PCA
2. PLS-DA
• Machine learning
• Software packages
2  
Results from data processing

Next, analysis of the quantitative metabolite information …


3  
Metabolomics data analysis

• Goals
– Biomarker discovery by identifying significant features associated
with certain conditions
– Disease diagnosis via classification

• Challenges
– Limited sample size
– Many metabolites / variables

• Workflow
– pre-treatment → univariate analysis → multivariate analysis → machine learning
4  
Data Pre-treatment

5  
Normalization (I)

• Goals
– to reduce systematic variation
– to separate biological variation from variations introduced in the
experimental process
– to improve the performance of downstream statistical analysis

• Sources of experimental variation


– sample inhomogeneity
– differences in sample preparation
– ion suppression

6  
Normalization (II)

• Approaches
– Sample-wise normalization: to make samples comparable to each
other
– Feature/variable-wise normalization: to make features more
comparable in magnitude to each other.
• Sample-wise normalization
– normalize to a constant sum
– normalize to a reference sample
– normalize to a reference feature (an internal standard)
– sample-specific normalization (dry weight or tissue volume)
• Feature-wise normalization (i.e., centering, scaling, and
transformation)
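
The sample-wise approaches listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the method used in the slides: the intensity table, the choice of column 2 as the internal standard, and the per-sample weights are all hypothetical.

```python
import numpy as np

# Hypothetical intensity table: rows are samples, columns are features.
X = np.array([
    [100.0, 200.0,  50.0],
    [ 50.0, 120.0,  20.0],
    [200.0, 380.0, 110.0],
])

def normalize_constant_sum(X, total=1.0):
    """Scale each sample (row) so that its intensities add up to `total`."""
    return X / X.sum(axis=1, keepdims=True) * total

def normalize_to_reference_feature(X, ref_col):
    """Divide each sample by its intensity for one reference feature,
    for example an internal standard."""
    return X / X[:, [ref_col]]

def normalize_sample_specific(X, factors):
    """Divide each sample by a per-sample factor such as dry weight."""
    return X / np.asarray(factors, dtype=float)[:, None]

X_sum = normalize_constant_sum(X)
X_ref = normalize_to_reference_feature(X, ref_col=2)   # column 2 is assumed to be the standard
X_wt = normalize_sample_specific(X, [1.0, 0.5, 2.0])   # assumed dry weights
```

After constant-sum normalization every row adds up to the same total, so samples that differ only in overall amount become directly comparable.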

7  
Centering, scaling, transformation

8  
Centering

• Converts all the concentrations to fluctuations around zero
instead of around the mean of the metabolite
concentrations
• Focuses on the fluctuating part of the data
• Is applied in combination with data scaling and
transformation

9  
Scaling

• Divides each variable by a factor
• Different variables have different scaling factors
• Aims to adjust for the fold differences between the
different metabolites
• Results in the inflation of small values
• Two subclasses
– Uses a measure of the data dispersion
– Uses a size measure

10  
Scaling: subclass 1

• Use data dispersion as a scaling factor


– auto: use the standard deviation as the scaling factor. All the
metabolites have a standard deviation of one and therefore the data
is analyzed on the basis of correlations instead of covariance.
– pareto: use the square root of the standard deviation as the scaling
factor. Large fold changes are decreased more than small fold
changes and thus large fold changes are less dominant compared to
clean data.
– vast: use standard deviation and the coefficient of variation as
scaling factors. This results in a higher importance for metabolites
with a small relative sd.
– range: use (max-min) as scaling factors. Sensitive to outliers.
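
As a rough sketch, the four dispersion-based scalings can be written as follows. The data matrix is hypothetical, each method is applied after centering, and vast scaling is taken to be auto scaling multiplied by mean/sd (the inverse of the coefficient of variation), which is an assumption about the exact definition intended here.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are metabolites.
X = np.array([
    [10.0, 1000.0, 0.5],
    [12.0, 1500.0, 0.7],
    [ 9.0,  800.0, 0.4],
    [11.0, 1300.0, 0.6],
])

mean = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)        # sample standard deviation per metabolite
centered = X - mean               # centering: fluctuations around zero

auto = centered / sd                                        # sd as scaling factor
pareto = centered / np.sqrt(sd)                             # square root of the sd
vast = (centered / sd) * (mean / sd)                        # sd and coefficient of variation
range_scaled = centered / (X.max(axis=0) - X.min(axis=0))   # (max - min)
```

After auto scaling every column has a standard deviation of one, while after pareto scaling a column keeps a standard deviation equal to the square root of its original one, so part of the original magnitude structure is retained.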

11  
Scaling: subclass 2

• Use average as scaling factors


– The resulting values are changes in percentages compared to the
mean concentration.
– The median can be used as a more robust alternative.
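
A small sketch of this size-based (level) scaling, using made-up numbers. Centering on the median in the robust variant is an assumption; the slide only says that the median can replace the mean.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are metabolites.
X = np.array([[10.0, 1000.0], [12.0, 1500.0], [8.0, 500.0]])

mean = X.mean(axis=0)
level = (X - mean) / mean                  # 0.2 means 20% above the mean

median = np.median(X, axis=0)
level_robust = (X - median) / median       # median-based variant (assumed form)
```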

12  
Transformation

• Log and power transformation


• Both reduce large values relatively more than the small
values.
• Log transformation
– pros: removal of heteroscedasticity
– cons: unable to deal with the value zero.
• Power transformation
– pros: similar to log transformation
– cons: not able to make multiplicative effects additive
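
The two transformations and their listed drawbacks can be illustrated as below. The values are hypothetical, the square root stands in for the power transformation, and the offset of 1.0 added before taking the log is an arbitrary choice for this sketch, not a recommendation from the slides.

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

power = np.sqrt(x)               # power transformation; defined at zero

# log10(0) is undefined, so this sketch adds a small offset first.
offset = 1.0                     # arbitrary illustrative value
logged = np.log10(x + offset)

# Multiplicative effect: every non-zero value is doubled.
log_shift = np.log10(2 * x[1:]) - np.log10(x[1:])   # constant additive shift
sqrt_ratio = np.sqrt(2 * x[1:]) / np.sqrt(x[1:])    # still a multiplicative factor

print(log_shift)
print(sqrt_ratio)
```

The log turns the doubling into the same additive shift for every value, whereas after the square root the effect remains multiplicative.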

13  
Centering, scaling, transformation

A. original data
B. centering
C. auto
D. pareto
E. range
F. vast
G. level
H. log
I. power

14  
Log transformation, again

• Hard to do useful statistical tests with a skewed
distribution.

• A right-skewed or exponentially decaying distribution can
often be brought much closer to a Gaussian distribution by
applying a log transformation.
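
The effect on skewness can be checked numerically. The sketch below draws a hypothetical log-normal sample and compares a moment-based skewness estimate before and after the log; the `skewness` helper is written here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)   # hypothetical right-skewed data

def skewness(v):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

print(skewness(x))            # clearly positive
print(skewness(np.log(x)))    # close to zero
```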
15  
http://www.medcalc.org/manual/transforming_data_to_normality.php
Univariate vs. multivariate analysis

• Univariate analysis examines each variable separately.
– t-tests
– volcano plot

• Multivariate analysis considers two or more variables
simultaneously and takes into account relationships
between variables.
– PCA: Principal Component Analysis
– PLS-DA: Partial Least Squares-Discriminant Analysis

• Univariate analyses are often first used to obtain an
overview or rough ranking of potentially important features
before applying more sophisticated multivariate analyses.
16  
Univariate Statistics
t-test
volcano plot

17  
Univariate statistics

• A basic way of presenting univariate data is to create a
frequency distribution of the individual cases.

Due to the Central Limit Theorem, many of these
frequency distributions can be modeled as a normal/Gaussian
distribution.

18  
Gaussian distribution

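• The total area underneath each density curve is equal to 1.

$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

mean = µ
variance = σ²
standard deviation = σ

[Figure: Gaussian density curves φ(x) for several combinations of µ and σ², including µ = 0, σ² = 0.2 and µ = 0, σ² = 1.0]
[Figure: Gaussian density centered at µ with the area in each standard-deviation band: 34.1% within one σ on each side of µ, 13.6% between one and two σ, 2.1% between two and three σ, 0.1% beyond three σ]
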
19  
https://en.wikipedia.org/wiki/Normal_distribution
Sample statistics

Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Sample standard deviation: $S = \sqrt{S^2}$

20  
https://en.wikipedia.org/wiki/Normal_distribution
t-test (I)

• One-sample t-test: is the sample drawn from a known
population?

Null hypothesis $H_0: \mu = \mu_0$
Alternative hypothesis $H_1: \mu < \mu_0$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Under the null hypothesis, the test statistic t follows a
Student’s t distribution with n-1 degrees of freedom.
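
A minimal sketch of the one-sample test, computing the statistic by hand and comparing it with SciPy. The measurements and the population mean `mu0` are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements and a hypothetical known population mean.
x = np.array([4.8, 5.1, 4.6, 4.9, 5.0, 4.7, 4.5, 4.9])
mu0 = 5.0

n = x.size
s = x.std(ddof=1)                          # sample standard deviation
t = (x.mean() - mu0) / (s / np.sqrt(n))    # test statistic
df = n - 1                                 # degrees of freedom

p_less = stats.t.cdf(t, df)                # one-sided p-value for H1: mu < mu0
p_two_sided = 2 * stats.t.sf(abs(t), df)   # two-sided p-value, for comparison

# Cross-check against SciPy's built-in (two-sided) test.
res = stats.ttest_1samp(x, mu0)
print(t, p_less, res.pvalue)
```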
21  
t-test (II): p-value

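The p-value is the probability, under the null hypothesis, of
obtaining a test statistic at least as extreme as the one
observed.

When the null hypothesis is rejected, the result is said to be
statistically significant.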

22  
t-test (III)

• Two-sample t-test: are the two populations different?

Null hypothesis $H_0: \mu_1 - \mu_2 = 0$
Alternative hypothesis $H_1: \mu_1 - \mu_2 \neq 0$

Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

• The two samples should be independent.
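
The two-sample statistic above uses separate variances for the two groups, which corresponds to the unequal-variance (Welch) form of the test. The sketch below computes it by hand for two made-up groups and compares it with `scipy.stats.ttest_ind` called with `equal_var=False`.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements of one metabolite in two independent groups.
a = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5])
b = np.array([4.2, 4.8, 4.4, 4.9, 4.1, 4.6])

# Standard error of the difference in means, with per-group variances.
se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t = (a.mean() - b.mean()) / se          # mu1 - mu2 = 0 under H0

res = stats.ttest_ind(a, b, equal_var=False)
print(t, res.pvalue)
```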

23  
t-test (IV)

Equivalent statements:
• The p-value is small.
• The difference between the two populations is unlikely to
have occurred by chance, i.e. is statistically significant.

24  
t-test (V)

• The p-value is big.
• The difference between the two populations is said NOT
to be statistically significant.

25  
t-test (VI)

• Paired t-test: what is the effect of a treatment?
• Measurements made on the same individuals before and
after the treatment.
Example: Subjects participated in a study on the effectiveness of a
certain diet on serum cholesterol levels.

Subject   Before   After   Difference
1         201      200     -1
2         231      236     +5
3         221      216     -5
5         260      243     -17
6         228      224     -4
7         245      235     -10

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
Test statistic: $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$
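
Using only the six rows listed in the table, the paired test can be sketched as follows. The statistic is computed from the differences and then compared with `scipy.stats.ttest_rel`; this is an illustration, not a result reported in the slides.

```python
import numpy as np
from scipy import stats

# Before/after values for the six subjects listed in the table.
before = np.array([201, 231, 221, 260, 228, 245], dtype=float)
after = np.array([200, 236, 216, 243, 224, 235], dtype=float)

d = after - before                             # per-subject differences
n = d.size
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))    # mu_d = 0 under H0
p = 2 * stats.t.sf(abs(t), df=n - 1)           # two-sided p-value

res = stats.ttest_rel(after, before)
print(t, p)
```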
7 245 235 -10
26  
Volcano plot (I)

• Plot fold change vs. significance
• y-axis: negative log of the p-value
• x-axis: log of the fold change so that changes in both
directions (up and down) appear equidistant from the
center
• Two regions of interest: those points that are found towards
the top of the plot that are far to either the left- or the
right-hand side.
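
The coordinates of a volcano plot can be computed as in the sketch below. The data are simulated, the base-2 and base-10 logs are common conventions rather than choices stated in the slides, and the two cut-offs are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical intensities: 8 samples per group, 50 features.
control = rng.lognormal(mean=5.0, sigma=0.3, size=(8, 50))
treated = rng.lognormal(mean=5.0, sigma=0.3, size=(8, 50))
treated[:, :5] *= 4.0      # make the first five features clearly up-regulated

# x-axis: log2 fold change, so doubling (+1) and halving (-1) are equidistant from 0.
log2_fc = np.log2(treated.mean(axis=0) / control.mean(axis=0))

# y-axis: -log10 of the per-feature p-value (t-test on log intensities).
p = stats.ttest_ind(np.log(treated), np.log(control), axis=0, equal_var=False).pvalue
neg_log10_p = -np.log10(p)

# Illustrative cut-offs, not taken from the slides.
selected = (np.abs(log2_fc) > 1.0) & (p < 0.05)
print(np.flatnonzero(selected))
```

Features flagged by `selected` are the ones that would appear in the upper-left and upper-right regions of the plot.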

27  
Volcano plot (II)

28  
Multivariate statistics
PCA
PLS-DA

29  
PCA (I)

• PCA is a statistical procedure to transform a set of
correlated variables into a set of linearly uncorrelated
variables.

30  
PCA (II)

• The uncorrelated variables are ordered in such a way that
the first one accounts for as much of the variability in the
data as possible and each succeeding one accounts for as
much of the remaining variability as possible.

• These ordered uncorrelated variables are called principal
components.

• By discarding low-variance principal components, PCA helps
us reduce data dimension and visualize the data.
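
One common way to obtain principal components is a singular value decomposition of the centered data matrix. The sketch below assumes that route and uses simulated data; the slides do not say which algorithm is used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 12 samples, 6 correlated variables.
latent = rng.normal(size=(12, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(12, 6))

Xc = X - X.mean(axis=0)                        # centering
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T                                # columns are the principal components
scores = Xc @ loadings                         # samples in the new coordinates
explained = S**2 / np.sum(S**2)                # fraction of variance per component

scores_2d = scores[:, :2]                      # keep 2 components for visualization
print(explained)
```

The columns of `scores` are mutually uncorrelated and `explained` is in decreasing order, which matches the ordering property described above.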

31  
PCA (III)

• The transformation matrix

• The transformation

• p1, p2 ,, pn are called the 1st, 2nd, and nth principle
components, respectively.
32  
PCA (IV)

• Each original sample is represented by an n-dimensional
vector:
$S_{\text{before transformation}} = [x_1, x_2, \ldots, x_n]$
• After the transformation
– If all of the principal components are kept, then each sample is still
represented by an n-dimensional vector:
$S_{\text{after transformation}} = [y_1, y_2, \ldots, y_n]$
– If only m < n principal components are kept, then each sample will
be represented by an m-dimensional vector:
$S_{\text{after transformation}} = [y_1, y_2, \ldots, y_m]$


33  
PCA (V)

• The values y are called scores.
• For visualization purposes, m is usually chosen to be 2 or 3.
• As a result, each sample will be represented by a 2- or
3-dimensional point in the score plot.

[Figure: Principal Component Analysis (Scores), PC1 vs. PC2; plotted samples are wt15, wt16, wt18, wt19, wt21, wt22 and ko15, ko16, ko18, ko19, ko21, ko22]

34  

PCA (VI)

• Loadings

• [p11, p12], [p21, p22], …, [pn1, pn2] are denoted as points in the
loadings plot
35  
PCA (VII)

Loadings plot for all variables

[Figure: Principal Component Analysis (Loadings), PC1 vs. PC2; one point per variable, each labeled in the form 548.1/3425]

36  

PCA (VIII)

Loadings plot for the top 25 variables

[Figure: Principal Component Analysis (Loadings) Top 25, PC1 vs. PC2. Labeled variables: 548.1/3425, 324/3268, 472/2954, 533.2/2870, 556.2/3444, 359/3507, 392.1/3631, 532.2/3458, 466.2/3658, 460.1/3452, 358.2/3832, 362.1/3165, 523.2/3459, 357.2/3833, 396.2/3862, 323.1/3391, 356.2/3830, 382.1/3246, 322.1/3394, 412.2/3944, 349.1/3290, 410.2/3938, 301.2/3391, 302.2/3389, 300.2/3390]

37  

PCA (IX)
Scree plot: variance vs. principal component number

38  
PLS-DA (I)

• A supervised method to find a predictive model that
describes the direction of maximum covariance between a
dataset (X) and the class membership (Y)
• Similar to PCA, the original variables are summarized into
much fewer new variables using their weighted averages.
• The new variables are called scores.
• The weighting profiles are called loadings.
• PLS-DA can perform both classification and feature
selection.
• Feature importance measure: VIP (Variable Importance in
Projection)

39  
PLS-DA (II)

• Interpretation of the model


– R2X and R2Y
• fraction of the variance that the model explains in the
independent (X) and dependent variables (Y)
• Range: 0-1
– Q2Y
• measure of the predictive accuracy of the model
• usually estimated by cross validation or permutation testing
• Range: 0-1
• > 0.5 is considered good while > 0.9 is outstanding
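
For reference, these quantities are commonly written as below. This is a sketch in standard notation that the slides do not spell out: the symbols $y_i$ (observed response), $\hat{y}_i$ (fitted value) and $\hat{y}_{(i)}$ (prediction for sample $i$ from a model fitted without it, as in cross validation) are introduced here for illustration.

```latex
R^2Y = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
Q^2Y = 1 - \frac{\sum_i \bigl(y_i - \hat{y}_{(i)}\bigr)^2}{\sum_i (y_i - \bar{y})^2}
```

Under this definition Q2Y cannot exceed 1, and it can fall below 0 when the held-out predictions are worse than simply predicting the mean.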

40  
PLS-DA (III)

• Note of caution
– Supervised classification methods are powerful.
– BUT, they can overfit your data, severely.

41  
Machine Learning
Clustering
Classification

42  
Clustering

• Group similar objects together


• Any clustering method requires
– A method to measure similarity/dissimilarity between objects
– A threshold to decide whether an object belongs to a cluster
– A way to measure the distance between two clusters
• Common clustering algorithms
– K-means
– Hierarchical
– Self-organizing map
• Unsupervised machine learning techniques

43  
Hierarchical clustering (I)

1. Find the two closest objects and merge them into a cluster
2. Find and merge the next two closest objects (or an object
and a cluster, or two clusters)
3. Repeat step 2 until all objects have been clustered
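
The three steps above can be sketched as a naive agglomerative loop. This is an illustrative, unoptimized implementation that assumes Euclidean distance; the function name `agglomerative`, its `linkage` parameter, and the example points are hypothetical.

```python
import numpy as np

def agglomerative(points, n_clusters=1, linkage=min):
    """Naive agglomerative clustering.

    `linkage` reduces the pairwise distances between two clusters to one
    number: min gives single linkage, max gives complete linkage.
    """
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]   # every object starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Distance between clusters a and b under the chosen linkage.
                d = linkage(
                    np.linalg.norm(points[i] - points[j])
                    for i in clusters[a] for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]    # merge the closest pair
        del clusters[b]
    return clusters, merges

pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
clusters, merges = agglomerative(pts, n_clusters=2)
print(clusters)   # [[0, 1, 2, 3], [4]]
```

The recorded `merges` list is the information a dendrogram is drawn from; stopping at `n_clusters=1` reproduces step 3 exactly.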

44  
Hierarchical clustering (II)

• Methods to measure similarity between objects
– Euclidean, Manhattan
– Pearson correlation
– Cosine similarity

• Linkage: ways to measure the distance between two clusters
– single: minimum distance between members of the two clusters
$d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y)$
– complete: maximum distance between members of the two clusters
$d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y)$
– centroid: distance between the centroids of the two clusters
$d(C_i, C_j) = d\left(\frac{1}{N_{C_i}} \sum_{x \in C_i} x,\ \frac{1}{N_{C_j}} \sum_{y \in C_j} y\right)$
– average: average distance between each point in the first cluster and
all points in the second cluster
$d(C_i, C_j) = \frac{1}{N_{C_i} N_{C_j}} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y)$

[Figure: Variants of Clustering (single, complete, centroid, and average linkage). Source: Dr. Bert Arnrich, Wearable Computing Lab.]

45  
Hierarchical clustering (III)

46  
Classification

• Use a training set of correctly-identified observations to
build a predictive model
• Predict to which of a set of categories a new observation
belongs
• Supervised machine learning
• Methods
– Linear discriminant analysis
– Support vector machine (SVM)
– Artificial neural network (ANN)
– k-nearest neighbor
– Random forest
– PLS-DA
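
As one concrete instance of the train-then-predict pattern, here is a minimal k-nearest neighbor sketch. The function `knn_predict`, the two-feature training data, and the "wt"/"ko" class labels are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Assign each new observation the majority label of its k nearest
    training observations (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_new, dtype=float):
        dist = np.linalg.norm(X_train - x, axis=1)     # distance to every training point
        nearest = y_train[np.argsort(dist)[:k]]        # labels of the k closest points
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])        # majority vote
    return np.array(preds)

# Training set of correctly identified observations (hypothetical).
X_train = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
y_train = ["wt", "wt", "wt", "ko", "ko", "ko"]

print(knn_predict(X_train, y_train, [[1.0, 1.0], [3.0, 3.0]]))
```

In practice the model would be evaluated on observations that were not used for training, which is what guards against the overfitting noted for supervised methods.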
47  
Software Packages
MetaboAnalyst
XCMS

48  
For in-depth statistical analysis and data interpretation,
please make an appointment with a biostatistician.

49  
Thank you!

50  
