
Lec9 Manova

Uploaded by

Enzo Garabatos
Copyright
© © All Rights Reserved

Lecture 9.

MANOVA
We considered testing the mean difference for two multivariate normal samples in Lecture 3. Let $X_{11}, \dots, X_{1n_1}$ be i.i.d. $N_p(\mu_1, \Sigma)$ and $X_{21}, \dots, X_{2n_2}$ be i.i.d. $N_p(\mu_2, \Sigma)$. Consider testing

$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2, \qquad \Sigma \text{ is unknown}.$$
Since

$$\bar X_1 - \bar X_2 \sim N_p\!\left(\mu_1 - \mu_2,\ \frac{n_1 + n_2}{n_1 n_2}\,\Sigma\right),$$
$$(n_1 + n_2 - 2)\,\mathbf{S}_P \sim W_p(n_1 + n_2 - 2,\ \Sigma),$$

we have Hotelling's $T^2$ statistic for the two-sample problem

$$T^2(n_1 + n_2 - 2) = \frac{n_1 n_2}{n_1 + n_2}\,(\bar X_1 - \bar X_2)'\,\mathbf{S}_P^{-1}\,(\bar X_1 - \bar X_2),$$

and by Theorem 5 in Lecture 2

$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p}\; T^2(n_1 + n_2 - 2) \sim F_{p,\ n_1 + n_2 - p - 1}$$

under the null hypothesis. The likelihood ratio statistic is a monotone function of $T^2(n_1 + n_2 - 2)$.
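The two-sample test above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `hotelling_two_sample` and the simulated data are assumptions, and the code assumes $p \ge 2$ and a common covariance matrix in both groups.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(x1, x2):
    """Two-sample Hotelling T^2 test with a pooled covariance matrix.

    x1, x2: arrays of shape (n1, p) and (n2, p), with p >= 2.
    Returns (T2, F, p_value).
    """
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled covariance S_P = [(n1-1) S_1 + (n2-1) S_2] / (n1 + n2 - 2)
    s_p = ((n1 - 1) * np.cov(x1, rowvar=False)
           + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    # T^2 = n1 n2 / (n1 + n2) * diff' S_P^{-1} diff
    t2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(s_p, diff)
    # Scale T^2 so that it follows F(p, n1 + n2 - p - 1) under H0
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, f_stat, p_value

# Hypothetical data: the second group's mean is shifted in the first coordinate
rng = np.random.default_rng(0)
x1 = rng.normal(size=(30, 3))
x2 = rng.normal(size=(40, 3)) + np.array([2.0, 0.0, 0.0])
t2, f_stat, p_value = hotelling_two_sample(x1, x2)
print(t2, f_stat, p_value)
```

Using `np.linalg.solve` instead of forming $\mathbf{S}_P^{-1}$ explicitly is a numerical convenience only; the statistic is the same.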
This extends the two-sample t-test to multivariate observations. When we have several (pre-determined) groups of samples, one wishes to test whether there is any difference among the group means. If the observations are univariate, ANOVA (Analysis of Variance) is the statistical method for this situation. MANOVA (Multivariate ANOVA) is the multivariate analogue of ANOVA.
Suppose we have $K$ groups of observations and $X_{ki} \sim N_p(\mu_k, \Sigma)$. Here $X_{ki}$ is the $i$th observation from the $k$th group. We assume there are $n_k$ observations in the $k$th group and $n = n_1 + n_2 + \cdots + n_K$ observations altogether.
The hypotheses are

$$H_0: \mu_1 = \cdots = \mu_K, \qquad H_1: \text{not } H_0 \ (\mu_1, \dots, \mu_K \text{ are not all equal}).$$

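Similar to one-way ANOVA, the total variation of the $X_{ki}$ (a matrix of sums of squares and cross-products) is decomposed into treatment and error. Let $\bar X_{k\cdot} = \frac{1}{n_k}\sum_{i=1}^{n_k} X_{ki}$ denote the sample mean vector of the $k$th group and $\bar X_{\cdot\cdot} = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n_k} X_{ki}$ the overall mean of the observations. We have the decomposition

$$\sum_{k}\sum_{i} (X_{ki} - \bar X_{\cdot\cdot})(X_{ki} - \bar X_{\cdot\cdot})' = \sum_{k} n_k (\bar X_{k\cdot} - \bar X_{\cdot\cdot})(\bar X_{k\cdot} - \bar X_{\cdot\cdot})' + \sum_{k}\sum_{i} (X_{ki} - \bar X_{k\cdot})(X_{ki} - \bar X_{k\cdot})',$$
$$\mathbf{T} = \mathbf{B} + \mathbf{W},$$

with degrees of freedom $(n - 1) = (K - 1) + (n - K)$.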
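The decomposition can be checked numerically. The sketch below is an illustration under assumed inputs: the helper name `manova_sscp` and the simulated groups are hypothetical and do not come from the notes.

```python
import numpy as np

def manova_sscp(groups):
    """Return (T, B, W) for a list of (n_k, p) arrays, one array per group."""
    all_x = np.vstack(groups)
    grand_mean = all_x.mean(axis=0)
    p = all_x.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for x in groups:
        group_mean = x.mean(axis=0)
        d = group_mean - grand_mean
        B += x.shape[0] * np.outer(d, d)   # between-group (treatment) part
        r = x - group_mean
        W += r.T @ r                       # within-group (error) part
    c = all_x - grand_mean
    T = c.T @ c                            # total sums of squares and cross-products
    return T, B, W

# Hypothetical data: K = 3 groups, p = 2 variables, unequal group sizes
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, size=(n_k, 2))
          for m, n_k in [(0.0, 8), (0.5, 10), (1.0, 12)]]
T, B, W = manova_sscp(groups)
print(np.allclose(T, B + W))  # True: T = B + W holds up to rounding error
```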


One can then build a one-way MANOVA table; see Table 1.

Source                        d.f.     Sum of Squares   Mean Squares
Treatment (between groups)    K − 1    B                B/(K − 1)
Error (within groups)         n − K    W                S_P = W/(n − K)
Total                         n − 1    T                S = T/(n − 1)
Table 1: One-way MANOVA table

If this were a usual ANOVA table, we would have two more columns for

• the F value (obtained by dividing MS(Treatment) by MS(Error)), and

• the p-value,

which make the table useful. Unlike the ANOVA table, the one-way MANOVA table consists of matrix-valued sums of squares ($\mathbf{T}$, $\mathbf{B}$, $\mathbf{W}$ are $p \times p$ matrices). MANOVA uses the $p \times p$ matrix $\mathbf{W}^{-1}\mathbf{B}$ as an analogue of the F value.
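Two special cases make this analogy concrete. The following is a short worked check that uses only the definitions above; the shorthand $d = \bar X_1 - \bar X_2$ is introduced here for convenience and is not notation from the notes.

```latex
% Case p = 1: B and W are scalars, and the ANOVA F value is
% F = [B/(K-1)] / [W/(n-K)], so
\[
  \mathbf{W}^{-1}\mathbf{B} \;=\; \frac{B}{W} \;=\; \frac{K-1}{n-K}\,F .
\]
% Case K = 2: with d = \bar X_1 - \bar X_2 and n = n_1 + n_2,
\[
  \mathbf{B} = \frac{n_1 n_2}{n}\, d\, d' , \qquad
  \mathbf{W} = (n-2)\,\mathbf{S}_P ,
\]
% so W^{-1}B has rank one, and its only nonzero eigenvalue is its trace:
\[
  \lambda_1 = \operatorname{tr}\!\left(\mathbf{W}^{-1}\mathbf{B}\right)
            = \frac{1}{n-2}\cdot\frac{n_1 n_2}{n_1+n_2}\; d'\,\mathbf{S}_P^{-1}\, d
            = \frac{T^2(n-2)}{n-2} .
\]
```

So for a single variable $\mathbf{W}^{-1}\mathbf{B}$ is the F value up to a constant, and for two groups it carries exactly the information in the two-sample Hotelling's $T^2$.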
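The matrix-valued statistic $\mathbf{W}^{-1}\mathbf{B}$ is in fact closely related to the likelihood ratio statistic. One can show that the m.l.e. of $\Sigma$ is $\mathbf{S}_0 = \mathbf{T}/n$ under $H_0$ and $\mathbf{S}_1 = \mathbf{W}/n = \frac{n-K}{n}\,\mathbf{S}_P$ under $H_1$. The likelihood ratio statistic is

$$W = n \log\frac{|\mathbf{S}_0|}{|\mathbf{S}_1|} = n \log\frac{|\mathbf{T}|}{|\mathbf{W}|} = n \log\frac{|\mathbf{W} + \mathbf{B}|}{|\mathbf{W}|},$$

(here the italic $W$ is the scalar test statistic and the bold $\mathbf{W}$ is the within-group matrix)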

which under $H_0$ has approximately the $\chi^2_{p(K-1)}$ distribution for large $n$. Bartlett's modification of the likelihood ratio statistic is

$$W^* = \left[(n - 1) - \tfrac{1}{2}(p + K)\right] \log\frac{|\mathbf{W} + \mathbf{B}|}{|\mathbf{W}|}.$$

Hypothesis testing based on $W^*$ (or on $W$) is called Wilks' test. Note that $W$ is an increasing function of

$$U = \frac{|\mathbf{W} + \mathbf{B}|}{|\mathbf{W}|} = |I_p + \mathbf{W}^{-1}\mathbf{B}|.$$

Moreover, the statistic $U$ is a function of the eigenvalues $\lambda_i$ of $\mathbf{W}^{-1}\mathbf{B}$, since

$$U = |I_p + \mathbf{W}^{-1}\mathbf{B}| = \prod_{i=1}^{p} (1 + \lambda_i).$$

A test based on the test statistic $U$ is also called Wilks' test.
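As a rough illustration of Wilks' test, the sketch below computes $U$ from the eigenvalues of $\mathbf{W}^{-1}\mathbf{B}$, then the likelihood ratio statistic $W$ and Bartlett's $W^*$, and refers $W^*$ to the $\chi^2_{p(K-1)}$ approximation. The simulated data (drawn with equal group means, so $H_0$ holds) and all variable names are assumptions made for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
K, p = 3, 2
sizes = [15, 20, 25]
# Hypothetical data: every group has the same mean, so H0 is true here
groups = [rng.normal(size=(n_k, p)) for n_k in sizes]
n = sum(sizes)

grand_mean = np.vstack(groups).mean(axis=0)
B = sum(len(x) * np.outer(x.mean(axis=0) - grand_mean,
                          x.mean(axis=0) - grand_mean) for x in groups)
W = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in groups)

# Eigenvalues of W^{-1}B are real and nonnegative in theory;
# .real drops any tiny imaginary parts caused by rounding
lam = np.linalg.eigvals(np.linalg.solve(W, B)).real
U = np.prod(1.0 + lam)                       # U = |I_p + W^{-1}B|

lrt = n * np.log(U)                          # likelihood ratio statistic W
bartlett = ((n - 1) - 0.5 * (p + K)) * np.log(U)   # Bartlett's W*
df = p * (K - 1)
p_value = stats.chi2.sf(bartlett, df)        # large-n chi-square approximation
print(U, lrt, bartlett, p_value)
```

Computing $U$ through the eigenvalues rather than the two determinants mirrors the product formula above; both routes give the same number up to rounding.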


There are, however, several other test statistics that may be defined in terms of the eigenvalues of $\mathbf{W}^{-1}\mathbf{B}$. Widely used statistics are:

$\prod_i (1 + \lambda_i)$ (Wilks)
$\max_i \lambda_i$ (Roy)
$\sum_i \lambda_i$ (Lawley-Hotelling)
$\sum_i \lambda_i / (1 + \lambda_i)$ (Pillai)
$\prod_i \lambda_i / (1 + \lambda_i)$ (Roy-Gnanadesikan-Srivastava)
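All five statistics are simple functions of the same eigenvalues, which the following sketch makes explicit. The function name `manova_statistics` and the dictionary keys are hypothetical; the small example uses $\mathbf{W} = I_2$ and $\mathbf{B} = \mathrm{diag}(1, 3)$ so that the eigenvalues are 1 and 3 and the results can be checked by hand.

```python
import numpy as np

def manova_statistics(B, W):
    """Compute the five eigenvalue-based statistics from B and W."""
    lam = np.linalg.eigvals(np.linalg.solve(W, B)).real
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative rounding errors
    return {
        "wilks": np.prod(1.0 + lam),
        "roy": lam.max(),
        "lawley_hotelling": lam.sum(),
        "pillai": np.sum(lam / (1.0 + lam)),
        "roy_gnanadesikan_srivastava": np.prod(lam / (1.0 + lam)),
    }

# Hand-checkable example: eigenvalues of W^{-1}B are 1 and 3
result = manova_statistics(np.diag([1.0, 3.0]), np.eye(2))
for name, value in result.items():
    print(name, value)
# wilks = 2 * 4 = 8, roy = 3, lawley_hotelling = 1 + 3 = 4,
# pillai = 1/2 + 3/4 = 1.25, roy_gnanadesikan_srivastava = 1/2 * 3/4 = 0.375
```

Note that many textbooks and software packages report Wilks' Lambda, which is the reciprocal $1/U = \prod_i (1 + \lambda_i)^{-1}$ of the statistic used in these notes, so small values of Lambda correspond to large values of $U$.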

Two bad ideas

Situation I A physician has a multivariate dataset consisting of patient information (all quantitative) related to a disease. He strongly believes that there are two or three subtypes of the disease. Accordingly, a statistician helps him find clusters in the dataset (cluster analysis). Three clusters are found, each nicely interpreted with distinct characteristics. He goes further by testing whether the mean vectors of these clusters are different or not. The p-value is extremely small. Does it mean that there are actually three clusters?
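A small simulation suggests why the tiny p-value in Situation I is weak evidence. The sketch below is an assumption-laden illustration rather than the notes' own argument: it draws a single homogeneous Gaussian sample (no subtypes at all), splits it with a plain 2-means algorithm written inline, and then applies the two-sample Hotelling's $T^2$ test to the resulting clusters. Two clusters are used instead of three only to keep the code short.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 200, 2
x = rng.normal(size=(n, p))  # one homogeneous population: no real subtypes

# Plain 2-means (Lloyd's algorithm), started from the two points that are
# most extreme in the first coordinate so that the run is deterministic
centers = x[[np.argmin(x[:, 0]), np.argmax(x[:, 0])]]
for _ in range(50):
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    centers = np.array([x[labels == k].mean(axis=0) for k in range(2)])

x1, x2 = x[labels == 0], x[labels == 1]
n1, n2 = len(x1), len(x2)

# Two-sample Hotelling's T^2 on the clusters that were just constructed
diff = x1.mean(axis=0) - x2.mean(axis=0)
s_p = ((n1 - 1) * np.cov(x1, rowvar=False)
       + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
t2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(s_p, diff)
f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
print(n1, n2, p_value)  # the p-value is extremely small despite a single population
```

The groups here were chosen by looking at the data so as to be as separated as possible, whereas the test assumes pre-determined groups. The small p-value therefore reflects the clustering step, not the existence of subtypes.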

Situation II The effects of two drug compounds are compared with baseline (or placebo), measured by differentially expressed genes of lab mice. Suppose we have a sample of size n = 120 in total. The number of genes is p > 20,000, so the usual MANOVA is not applicable. (Why?) Initial dimension reduction leads to d = 10 principal components. MANOVA is applied to this dataset of size 10 × 120. The p-value suggests that the difference in effects of the drug compounds is statistically significant. A biologist is further interested in a list of genes that are responsible for the difference. For more than 20,000 genes, applying ANOVA to each of the genes (variables) yields more than 1,000 genes with p-value less than 0.01. Does it mean that all 1,000 genes are important?
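One way to approach the "(Why?)" in Situation II is to look at the rank of the within-group matrix. The sketch below uses a scaled-down hypothetical setting ($K = 3$ groups of 4 observations each, $p = 20$ variables) instead of the 120 × 20,000 dataset; the names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
K, n_k, p = 3, 4, 20   # n = 12 observations but p = 20 variables, so p > n - K
groups = [rng.normal(size=(n_k, p)) for _ in range(K)]
n = K * n_k

# Within-group sums of squares and cross-products matrix (p x p)
W = sum((x - x.mean(axis=0)).T @ (x - x.mean(axis=0)) for x in groups)

rank_W = np.linalg.matrix_rank(W)
print(W.shape, rank_W, n - K)  # the rank of W cannot exceed n - K
```

Each group contributes at most $n_k - 1$ to the rank of $\mathbf{W}$, so the rank is at most $n - K$. When $p > n - K$ the matrix is singular, $\mathbf{W}^{-1}\mathbf{B}$ is undefined, and $|\mathbf{W}| = 0$ makes $U$ meaningless. On the final question, a back-of-the-envelope count may also help: if no gene had any effect and the tests were valid, about 20,000 × 0.01 = 200 genes would still be expected to show a p-value below 0.01 by chance.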
