Statistical Analysis of Metabolomics Data
Xiuxia Du
Department of Bioinformatics & Genomics
University of North Carolina at Charlotte
Outline
• Introduction
• Data pre-treatment
1. Normalization
2. Centering, scaling, transformation
• Univariate analysis
2. Student’s t-test
2. Volcano plot
• Multivariate analysis
1. PCA
2. PLS-DA
• Machine learning
• Software packages
Results from data processing
• Goals
– Biomarker discovery: identifying significant features associated with certain conditions
– Disease diagnosis via classification
• Challenges
– Limited sample size
– Many metabolites / variables
• Workflow
– pre-treatment → univariate analysis → multivariate analysis → machine learning
Data Pre-treatment
Normalization (I)
• Goals
– to reduce systematic variation
– to separate biological variation from variations introduced in the
experimental process
– to improve the performance of downstream statistical analysis
Normalization (II)
• Approaches
– Sample-wise normalization: to make samples comparable to each
other
– Feature/variable-wise normalization: to make features more
comparable in magnitude to each other.
• Sample-wise normalization
– normalize to a constant sum
– normalize to a reference sample
– normalize to a reference feature (an internal standard)
– sample-specific normalization (dry weight or tissue volume)
• Feature-wise normalization (i.e., centering, scaling, and
transformation)
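The four sample-wise approaches above can be sketched in NumPy as follows. This is a minimal illustration, not the method used in the slides: it assumes a matrix with samples in rows and features in columns, strictly positive intensities, and hypothetical function names. For "normalize to a reference sample" it uses probabilistic quotient normalization (PQN, in a simplified form), which is only one common choice.

```python
import numpy as np

# Assumed layout: rows = samples, columns = features (metabolite intensities > 0).

def normalize_constant_sum(X, total=1.0):
    """Scale each sample so that its intensities sum to `total`."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True) * total

def normalize_reference_feature(X, ref_col):
    """Divide each sample by its internal-standard intensity (column `ref_col`)."""
    X = np.asarray(X, dtype=float)
    return X / X[:, [ref_col]]

def normalize_reference_sample(X, reference=None):
    """Simplified probabilistic quotient normalization: divide each sample by the
    median of its feature-wise ratios to a reference sample (default: the
    feature-wise median over all samples)."""
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.median(X, axis=0)
    dilution = np.median(X / reference, axis=1, keepdims=True)
    return X / dilution

def normalize_sample_specific(X, amounts):
    """Divide each sample by a sample-specific amount, e.g. dry weight or tissue volume."""
    X = np.asarray(X, dtype=float)
    return X / np.asarray(amounts, dtype=float)[:, None]

# The second sample is the first one at twice the concentration.
X = [[2.0, 4.0, 4.0], [4.0, 8.0, 8.0]]
print(normalize_constant_sum(X))
print(normalize_reference_sample(X))
```

Every method makes the two rows identical here, because the only difference between them is an overall concentration factor.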
Centering, scaling, transformation
Centering
Scaling
Scaling: subclass 1
Scaling: subclass 2
Transformation
Centering, scaling, transformation (figure panels)
A. original data
B. centering
C. auto scaling
D. Pareto scaling
E. range scaling
F. vast scaling
G. level scaling
H. log transformation
I. power transformation
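Centering and the scaling methods named in panels B to G can be written as one feature-wise function. This is a sketch assuming samples in rows and features in columns; the definitions follow the standard formulas (each feature is centered on its mean, then divided by its standard deviation, square root of the standard deviation, range, or mean, with vast scaling additionally weighted by mean/SD). The function name and `method` strings are hypothetical.

```python
import numpy as np

def scale(X, method="auto"):
    """Center each feature (column), then apply the chosen scaling."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)            # per-feature mean
    sd = X.std(axis=0, ddof=1)       # per-feature sample standard deviation
    Xc = X - mean                    # centering
    if method == "center":
        return Xc
    if method == "auto":             # unit variance for every feature
        return Xc / sd
    if method == "pareto":           # divide by the square root of the SD
        return Xc / np.sqrt(sd)
    if method == "range":            # divide by (max - min)
        return Xc / (X.max(axis=0) - X.min(axis=0))
    if method == "vast":             # auto scaling weighted by mean / SD
        return Xc / sd * (mean / sd)
    if method == "level":            # divide by the mean (relative changes)
        return Xc / mean
    raise ValueError(f"unknown method: {method}")

# Two features that differ only in magnitude (second = 10 x first).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(scale(X, "center"))   # magnitudes still differ
print(scale(X, "auto"))     # both features now identical
```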
Log transformation, again
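A log transformation turns multiplicative (fold) differences into additive ones, and a power (square-root) transformation has a milder effect of the same kind. The sketch below applies each transformation and then centers every feature. The `offset` pseudo-count is an assumption added to avoid log(0) for zero intensities; it is not taken from the slides.

```python
import numpy as np

def log_transform(X, offset=0.0):
    """log10-transform (after adding an optional pseudo-count), then center each feature."""
    L = np.log10(np.asarray(X, dtype=float) + offset)
    return L - L.mean(axis=0)

def power_transform(X):
    """Square-root transform, then center each feature."""
    S = np.sqrt(np.asarray(X, dtype=float))
    return S - S.mean(axis=0)

# Intensities 1, 10, 100 (ten-fold steps) become equally spaced: -1, 0, 1.
print(log_transform([[1.0], [10.0], [100.0]]))
print(power_transform([[1.0], [4.0], [9.0]]))
```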
Univariate statistics
Gaussian distribution
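$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
mean = µ
variance = σ²
standard deviation = σ
Figure: Gaussian density ϕ(x) plotted against x, with the probability in successive bands on each side of µ: 34.1% (µ to µ ± σ), 13.6% (σ to 2σ), 2.1% (2σ to 3σ), and 0.1% beyond 3σ.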
https://en.wikipedia.org/wiki/Normal_distribution
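The band percentages in the figure can be checked numerically. This stdlib-only sketch evaluates ϕ(x) directly and obtains the cumulative distribution function from the error function `math.erf`; the helper names are illustrative.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density phi(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Probability in each 1-sigma band on one side of the mean (standard normal).
bands = {
    "0 to 1 sd": normal_cdf(1) - normal_cdf(0),
    "1 to 2 sd": normal_cdf(2) - normal_cdf(1),
    "2 to 3 sd": normal_cdf(3) - normal_cdf(2),
    "beyond 3 sd": 1 - normal_cdf(3),
}
for name, p in bands.items():
    print(f"{name}: {100 * p:.1f}%")   # 34.1%, 13.6%, 2.1%, 0.1%
```

The peak height `normal_pdf(0)` is 1/√(2π) ≈ 0.399, which matches the maximum of the plotted curve.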
Sample statistics
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Sample standard deviation: $S = \sqrt{S^2}$
https://en.wikipedia.org/wiki/Normal_distribution
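A direct translation of the three sample statistics, cross-checked against Python's `statistics` module (whose `variance` and `stdev` also divide by n − 1). The data values are made up for illustration.

```python
import math
import statistics

def sample_stats(xs):
    """Return (sample mean, sample variance, sample standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # divide by n - 1, not n
    return mean, var, math.sqrt(var)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var, sd = sample_stats(data)
print(mean, var, sd)                                   # 5.0, 32/7, sqrt(32/7)
print(statistics.variance(data), statistics.stdev(data))
```

Dividing by n instead of n − 1 would give 4.0 here; the n − 1 divisor makes S² an unbiased estimate of σ².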
t-test (I)
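Null hypothesis $H_0: \mu = \mu_0$
Alternative hypothesis $H_1: \mu < \mu_0$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, which follows a t distribution with $n - 1$ degrees of freedom under $H_0$
Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$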
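The one-sample test statistic can be computed with the standard library alone. This sketch returns t and its degrees of freedom; the sample values and µ0 = 5.0 are invented for illustration.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)          # sample SD (n - 1 divisor)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

t, df = one_sample_t([4.8, 5.1, 4.6, 4.9, 4.7], mu0=5.0)
print(t, df)   # t is negative because the sample mean (4.82) is below mu0
```

For a p-value matching the one-sided alternative H1: µ < µ0, SciPy (1.6 or later, assumed installed) offers `scipy.stats.ttest_1samp(xs, 5.0, alternative="less")`.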
t-test (III)
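Null hypothesis $H_0: \mu_1 - \mu_2 = 0$
Alternative hypothesis $H_1: \mu_1 - \mu_2 \neq 0$
Test statistic (unequal variances): $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, where $s_k^2$ and $n_k$ are the sample variance and sample size of group $k$; under $H_0$, $\mu_1 - \mu_2 = 0$.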
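The two-sample statistic above does not assume equal variances (Welch's t-test). The sketch below computes it together with the Welch–Satterthwaite approximation to the degrees of freedom, which is a standard companion formula not shown on the slide. `delta0` is the hypothesized µ1 − µ2 (0 under the slide's H0).

```python
import math
import statistics

def welch_t(x1, x2, delta0=0.0):
    """Unequal-variance two-sample t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1        # s1^2 / n1
    v2 = statistics.variance(x2) / n2        # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)   # t = -3 / sqrt(2.5), df ≈ 5.88
```

If SciPy is available, `scipy.stats.ttest_ind(x1, x2, equal_var=False)` returns the same statistic along with a two-sided p-value.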
t-test (IV)
Equivalent statements:
• The p-value is small (below the chosen significance level α, e.g., 0.05).
• The observed difference between the two groups would be unlikely to arise by chance if H0 were true, i.e., it is statistically significant.
t-test (V)
t-test (VI)
Volcano plot (II)
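A volcano plot places each feature at x = log2 fold change and y = −log10 p-value, so features that are both strongly changed and statistically significant land in the upper corners. The sketch below computes those coordinates with NumPy and SciPy (assumed installed). Assumptions: fold change as the ratio of group means (other conventions exist), Welch's t-test per feature, and the cutoffs of 2-fold and p < 0.05, which are hypothetical defaults rather than values from the slides.

```python
import numpy as np
from scipy import stats

def volcano_coordinates(group1, group2):
    """Per feature: x = log2(mean2 / mean1), y = -log10(Welch t-test p-value).
    Rows = samples, columns = features; intensities must be positive."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    log2_fc = np.log2(g2.mean(axis=0) / g1.mean(axis=0))
    p = stats.ttest_ind(g2, g1, axis=0, equal_var=False).pvalue
    return log2_fc, -np.log10(p)

def flag_features(log2_fc, neg_log10_p, fc_cut=1.0, p_cut=0.05):
    """Hypothetical thresholds: |log2 FC| >= fc_cut and p < p_cut."""
    return (np.abs(log2_fc) >= fc_cut) & (neg_log10_p > -np.log10(p_cut))

rng = np.random.default_rng(0)
g1 = rng.normal(100, 5, size=(6, 3))
g2 = rng.normal(100, 5, size=(6, 3))
g2[:, 0] *= 4                       # feature 0 is four-fold higher in group 2
log2_fc, neg_log10_p = volcano_coordinates(g1, g2)
flags = flag_features(log2_fc, neg_log10_p)
print(log2_fc, neg_log10_p, flags)
```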
Multivariate statistics
PCA
PLS-DA
PCA (I)
PCA (II)
PCA (III)
• The transformation
• p1, p2, …, pn are called the 1st, 2nd, …, and nth principal components, respectively.
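One standard way to compute principal components is the singular value decomposition of the column-centered data matrix. This NumPy sketch (samples in rows, features in columns, hypothetical function name) returns the scores used in a scores plot, the loadings used in a loadings plot, and the fraction of variance each component explains. In practice the data would be pre-treated (e.g., log-transformed and scaled) first.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates (scores plot)
    loadings = Vt[:n_components].T                    # feature weights (loadings plot)
    explained = S ** 2 / np.sum(S ** 2)               # variance fraction per component
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))       # e.g. 12 samples x 50 features
scores, loadings, explained = pca(X)
print(scores.shape, loadings.shape, explained)
```

Each score is the projection of a centered sample onto a loading vector, so `scores` equals `Xc @ loadings`.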
PCA (IV)
Figure: PCA scores plot, PC2 versus PC1, for samples wt15, wt16, wt18, wt19, wt21, wt22 and ko15, ko16, ko18, ko19, ko21, ko22.
PCA (VI)
• Loadings
• $[p_{11}, p_{12}], [p_{21}, p_{22}], \dots, [p_{n1}, p_{n2}]$ are plotted as points in the loadings plot
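To read a loadings plot programmatically, one can take each feature's loadings on the first two components as its point and rank features by distance from the origin, since features far from the origin contribute most to PC1 and PC2. This is a sketch of that idea only; the ranking rule, function name, and feature names are assumptions for illustration, and the PCA is recomputed with NumPy's SVD so the block stands alone.

```python
import numpy as np

def loading_points(X, feature_names, top=5):
    """Return (name, PC1 loading, PC2 loading) for the `top` features
    farthest from the origin of the loadings plot."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pts = Vt[:2].T                        # row i = [p_i1, p_i2]
    dist = np.hypot(pts[:, 0], pts[:, 1])
    order = np.argsort(dist)[::-1][:top]  # largest distance first
    return [(feature_names[i], pts[i, 0], pts[i, 1]) for i in order]

rng = np.random.default_rng(0)
X = rng.normal(10, 1, size=(10, 6))
X[:, 2] *= 20                             # feature f2 varies far more than the rest
names = [f"f{j}" for j in range(6)]
top = loading_points(X, names, top=3)
print(top)                                # f2 should be listed first
```

Without scaling, high-variance features dominate the loadings in this way, which is one reason the pre-treatment step matters.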
PCA (VII)
Loadings plot for all variables
Figure: Principal Component Analysis (Loadings). One point per feature on the first two principal components (PC2 on the vertical axis), each labelled in the form 548.1/3425. A second panel shows only a subset of labelled features with extreme loadings, such as 300.2/3390, 301.2/3391, 302.2/3389, 349.1/3290, 396.2/3862, 548.1/3425, and 324/3268.
PLS-DA (I)
PLS-DA (II)
PLS-DA (III)
• Note of caution
– Supervised classification methods are powerful.
– BUT they can severely overfit your data.
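The overfitting warning is easy to demonstrate. The sketch below implements a basic PLS-DA (PLS1 regression by NIPALS on a 0/1 class label, predicting class 1 when the fitted value exceeds 0.5) and applies it to pure noise with many more features than samples. This is an illustrative implementation, not the one used on the slides; data sizes, seed, and the number of components are arbitrary choices. Training accuracy comes out near perfect, while leave-one-out cross-validation stays near chance.

```python
import numpy as np

def pls1_fit(X, y, n_comp=2):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)            # direction of maximal covariance with y
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        q = (yk @ t) / tt                 # y loading
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - q * t                   # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)   # coefficients in the original X space
    return x_mean, y_mean, B

def plsda_predict(model, X):
    x_mean, y_mean, B = model
    return ((X - x_mean) @ B + y_mean > 0.5).astype(int)   # class 1 if y_hat > 0.5

rng = np.random.default_rng(1)
n, p = 40, 1000
X = rng.normal(size=(n, p))                    # pure noise: no real group difference
y = np.repeat([0, 1], n // 2).astype(float)    # arbitrary class labels

train_acc = np.mean(plsda_predict(pls1_fit(X, y), X) == y)

# Leave-one-out: each sample is predicted by a model that never saw it.
hits = 0
for i in range(n):
    keep = np.arange(n) != i
    m = pls1_fit(X[keep], y[keep])
    hits += int(plsda_predict(m, X[i:i + 1])[0] == y[i])
cv_acc = hits / n
print(f"training accuracy: {train_acc:.2f}, leave-one-out accuracy: {cv_acc:.2f}")
```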
Machine Learning
Clustering
Classification
Clustering
Hierarchical clustering (I)
1. Find the two closest objects and merge them into a cluster
2. Find and merge the next two closest objects (or an object
and a cluster, or two clusters)
3. Repeat step 2 until all objects have been clustered
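The three steps above map directly onto a naive agglomerative loop. This sketch is written for clarity rather than speed (it recomputes all cluster distances at every step) and uses complete linkage with Euclidean distance by default; both choices and the function names are assumptions for illustration.

```python
import math
from itertools import combinations

def complete_linkage(a, b):
    """Cluster distance = largest pairwise distance between their members."""
    return max(math.dist(x, y) for x in a for y in b)

def agglomerate(points, linkage=complete_linkage):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]          # every object starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Steps 1-2: find the closest pair (object-object, object-cluster, or cluster-cluster)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        d = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]                       # j > i, so index i is still valid
        clusters[i] = merged
    return merges                             # Step 3: loop ends when all are clustered

points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
merges = agglomerate(points)
heights = [d for _, _, d in merges]
print(heights)   # merge heights, as drawn in a dendrogram
```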
Hierarchical clustering (II)
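Complete linkage: $d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y)$
Average linkage: $d(C_i, C_j) = \frac{1}{N_{C_i} N_{C_j}} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y)$
Centroid linkage: $d(C_i, C_j) = d\left(\frac{1}{N_{C_i}} \sum_{x \in C_i} x,\ \frac{1}{N_{C_j}} \sum_{y \in C_j} y\right)$
where $N_{C_i}$ is the number of objects in cluster $C_i$.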
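The three cluster-distance definitions can be compared on a small worked example. This sketch assumes Euclidean distance for d(x, y) and uses two made-up 2-D clusters; the function names are illustrative.

```python
import math

def complete_linkage(Ci, Cj):
    """Maximum pairwise distance."""
    return max(math.dist(x, y) for x in Ci for y in Cj)

def average_linkage(Ci, Cj):
    """Mean of all N_Ci * N_Cj pairwise distances."""
    return sum(math.dist(x, y) for x in Ci for y in Cj) / (len(Ci) * len(Cj))

def centroid_linkage(Ci, Cj):
    """Distance between the two cluster centroids."""
    def centroid(C):
        return tuple(sum(coord) / len(C) for coord in zip(*C))
    return math.dist(centroid(Ci), centroid(Cj))

Ci = [(0.0, 0.0), (0.0, 1.0)]
Cj = [(3.0, 0.0), (4.0, 0.0)]
print(complete_linkage(Ci, Cj))   # sqrt(17) ≈ 4.123
print(average_linkage(Ci, Cj))    # (3 + 4 + sqrt(10) + sqrt(17)) / 4 ≈ 3.571
print(centroid_linkage(Ci, Cj))   # sqrt(12.5) ≈ 3.536
```

The same two clusters are "farther apart" under complete linkage than under average or centroid linkage, which is why the choice of linkage can change the shape of the dendrogram.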
Hierarchical clustering (III)
Classification
For in-depth statistical analysis and data interpretation,
please make an appointment with a biostatistician.
Thank you!