Mixed Analysis of Variance Models with SPSS

This document provides an outline for a discussion of mixed analysis of variance models with SPSS. It begins by classifying effects as fixed or random and gives examples of each. It then explains the difference between between-subject and within-subject effects. The general linear model is introduced and its assumptions are listed, along with diagnostic tests for validating those assumptions.

Mixed Analysis of Variance Models with SPSS

Robert A. Yaffee, Ph.D.
Statistics, Social Science, and Mapping Group
Information Technology Services/Academic Computing Services
Office location: 75 Third Avenue, Level C-3
Phone: 212-998-3402

1
Outline

1. Classification of Effects
2. Random Effects
   1. Two-Way Random Layout
   2. Solutions and Estimates
3. General Linear Model
   1. Fixed Effects Models
      1. The One-Way Layout
4. Mixed Model Theory
   1. Proper Error Terms
5. Two-Way Layout
6. Full-Factorial Model
   1. Contrasts with Interaction Terms
   2. Graphing Interactions
2
Outline-Cont'd

• Repeated Measures ANOVA
• Advantages of Mixed Models over GLM

3
Definition of Mixed Models by Their Component Effects

1. Mixed Models contain both fixed and random effects.
2. Fixed Effects: factors for which the only levels under consideration are contained in the coding of those effects.
3. Random Effects: factors for which the levels contained in the coding of those factors are a random sample of the total number of levels in the population for that factor.

4
Examples of Fixed and Random Effects

1. Fixed effects:
   1. Sex: both male and female genders are included in the factor, sex.
   2. Agegroup: Minor and Adult are both included in the factor, agegroup.
2. Random effect:
   1. Subject: the sample is a random sample of the target population.

5
Classification of Effects

1. There are main effects: linear explanatory factors.
2. There are interaction effects: joint effects over and above the component main effects.

6
Interactions are Crossed Effects

All of the cells are filled: each level of X is crossed with each level of Y.

                    Variable Y
              Level 1  Level 2  Level 3  Level 4
Variable X
   Level 1      X11      X12      X13      X14
   Level 2      X21      X22      X23      X24
   Level 3      X31      X32      X33      X34
7
Classification of Effects-cont'd

Hierarchical designs have nested effects. Nested effects are those in which the subjects are grouped within levels of another factor. An example would be patients nested within doctors, and doctors nested within hospitals. This could be expressed as:
patients(doctors)
doctors(hospitals)

8
Nesting of Patients within Doctors and Doctors within Hospitals

[Diagram: Hospital 1 and Hospital 2 each contain their own doctors (Doctor 1 through Doctor 5), and Patients 1 through 8 are nested within those doctors.]

9
Between and Within-Subject Effects

• Such effects may be fixed or random; their classification depends on the experimental design.

Between-subject effects are those for which a subject is in one group or another, but not in both. Experimental group is a fixed effect when the manager considers only the groups in his experiment: one group is the experimental group and the other is the control group. This grouping factor is therefore a between-subject effect.

Within-subject effects are experienced by subjects repeatedly over time. Trial is a random effect when there are several trials in a repeated measures design and all subjects experience all of the trials; trial is therefore a within-subject effect.

Operator may be a fixed or random effect, depending upon whether one is generalizing beyond the sample. If operator is a random effect, then the machine*operator interaction is a random effect.

There are also contrasts: these compare the values of one level with those of other levels of the same effect.

10
Between-Subject Effects

• Gender: one is either male or female, but not both.
• Group: one is in either the control, experimental, or comparison group, but not more than one.

11
Within-Subject Effects

• These are repeated effects.
• Observations 1, 2, and 3 might be the pre, post, and follow-up observations on each person.
• Each person experiences all of these levels or categories.
• These are found in repeated measures analysis of variance.

12
Repeated Observations are Within-Subject Effects

Repeated Measures Design

                       Trial 1     Trial 2     Trial 3
Experimental Group     Pre-test    Post-test   Follow-up
Control Group          Pre-test    Post-test   Follow-up

Group is a between-subjects effect, whereas Trial is a within-subjects effect.
13
The General Linear Model

1. The main effects general linear model can be parameterized as

   Y_{ij} = \mu + a_i + b_j + e_{ij}

   where
   Y_{ij} = observation for the ith level of a and the jth level of b
   \mu = grand mean (an unknown fixed parameter)
   a_i = effect of the ith level of a (a_i - \mu)
   b_j = effect of the jth level of b (b_j - \mu)
   e_{ij} = experimental error ~ N(0, \sigma^2)
14
A Factorial Model

If an interaction term were included, the formula would be

   y_{ij} = \mu + a_i + b_j + (ab)_{ij} + e_{ij}

The interaction or crossed effect is the joint effect over and above the individual main effects. Therefore, the main effects must be in the model for the interaction to be properly specified:

   (ab)_{ij} = (\bar{y}_{ij} - \mu) - (a_i - \mu) - (b_j - \mu)
             = \bar{y}_{ij} - a_i - b_j + \mu
15
Higher-Order Interactions

If 3-way interactions are in the model, then the main effects and all lower-order interactions must be in the model for the 3-way interaction to be properly specified. For example, a 3-way interaction model would be:

   y_{ijk} = \mu + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + (abc)_{ijk} + e_{ijk}
16
The General Linear Model

• In matrix terminology, the general linear model may be expressed as

   Y = X\beta + \varepsilon

   where
   Y = the observed data vector
   X = the design matrix
   \beta = the vector of unknown fixed-effect parameters
   \varepsilon = the vector of errors
17
Assumptions of the General Linear Model

   E(\varepsilon) = 0
   var(\varepsilon) = \sigma^2 I
   var(Y) = \sigma^2 I
   E(Y) = X\beta
18
General Linear Model Assumptions-cont'd

1. Residual normality
2. Homogeneity of error variance
3. Functional form of model: linearity of model
4. No multicollinearity
5. Independence of observations
6. No autocorrelation of errors
7. No influential outliers

We have to test for these to be sure that the model is valid. We will discuss the robustness of the model in the face of violations of these assumptions, and the recourses available when these assumptions are violated.
Explanation of These Assumptions

1. Functional form of model (linearity of model): these models analyze only the linear relationship.
2. Independence of observations.
3. Representativeness of sample.
4. Residual normality: so the alpha regions of the significance tests are properly defined.
5. Homogeneity of error variance: so the confidence limits may be easily found.
6. No multicollinearity: multicollinearity prevents efficient estimation of the parameters.
7. No autocorrelation of errors: autocorrelation inflates the R2, F, and t tests.
8. No influential outliers: they bias the parameter estimation.

20
Diagnostic Tests for These Assumptions

1. Functional form of model (linearity): pair plot
2. Independence of observations: runs test
3. Representativeness of sample: inquire about the sample design
4. Residual normality: SK or SW test
5. Homogeneity of error variance: graph of Zresid * Zpred
6. No multicollinearity: correlations among the X variables
7. No autocorrelation of errors: ACF
8. No influential outliers: leverage and Cook's D

21
Testing for Outliers

• Run a frequencies analysis of stdres and cksd (the standardized residuals and Cook's D).
• Look for standardized residuals greater than 3.5 or less than -3.5.
• And look for large values of Cook's D.

22
Studentized Residuals

   e_i^s = \frac{e_i}{s_{(i)} \sqrt{1 - h_i}}

   where
   e_i^s = studentized residual
   s_{(i)} = standard deviation with the ith observation deleted
   h_i = leverage statistic

Belsley et al. (1980) recommend the use of studentized residuals to determine whether there is an outlier.

23
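The deletion formula above can be computed without refitting the model n times, using a standard shortcut for the leave-one-out error variance. The sketch below is a minimal illustration with numpy on hypothetical data (the variable names and the injected outlier are invented for the example, not taken from the slides).

```python
import numpy as np

def studentized_residuals(X, y):
    """Externally studentized residuals: each residual e_i is scaled by
    s_(i) * sqrt(1 - h_i), where s_(i) deletes the i-th observation."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                          # leverage values h_i
    e = y - H @ y                           # ordinary residuals
    sse = e @ e
    # leave-one-out error variance via the standard shortcut formula
    s2_i = (sse - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_i * (1 - h))

# hypothetical toy data with one gross outlier injected at index 15
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2 + 3 * x + rng.normal(0, 1, 30)
y[15] += 15
X = np.column_stack([np.ones_like(x), x])
t = studentized_residuals(X, y)
print(int(np.argmax(np.abs(t))))   # flags observation 15
```

The |t| > 3.5 screening rule from the previous slide can then be applied directly to the returned values.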
Influence of Outliers

1. Leverage is measured by the diagonal components of the hat matrix.
2. The hat matrix comes from the formula for the regression of Y:

   \hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y

   where X(X'X)^{-1}X' = the hat matrix, H. Therefore,

   \hat{Y} = HY

24
Leverage and the Hat Matrix

1. The hat matrix transforms Y into the predicted scores.
2. The diagonals of the hat matrix indicate which values will be outliers.
3. The diagonals are therefore measures of leverage.
4. Leverage is bounded by two limits: 1/n and 1. The closer the leverage is to unity, the more leverage the value has.
5. The trace of the hat matrix equals p, the number of parameters in the model.
6. When the leverage > 2p/n, there is high leverage, according to Belsley et al. (1980), cited in Long, J.F., Modern Methods of Data Analysis (p. 262). For smaller samples, Velleman and Welsch (1981) suggested 3p/n as the criterion.

25
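The properties listed above (trace of H equals p; the 2p/n rule of thumb) can be checked numerically. This is a minimal sketch on invented data: the extreme predictor value planted in row 0 is an assumption of the example, chosen to produce a high-leverage point.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1] = 8.0                       # plant one extreme predictor value

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                      # leverage values

print(round(h.sum(), 6))            # trace of H = p = 3.0
high = np.where(h > 2 * p / n)[0]   # Belsley et al. cutoff 2p/n
print(high)                         # row 0 is among those flagged
```

With the full 3p/n small-sample criterion, the same array comparison applies with the larger threshold.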
Cook's D

1. Another measure of influence.
2. This is a popular one. The formula for it is:

   D_i = \left(\frac{1}{p}\right) \left(\frac{h_i}{1 - h_i}\right) \left(\frac{e_i^2}{s^2 (1 - h_i)}\right)

Cook and Weisberg (1982) suggested that values of D that exceed the 50th percentile of the F distribution (df = p, n - p) are large.
26
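The formula above combines the leverage and residual of each observation into a single influence measure. Below is a minimal numpy sketch on hypothetical data; the injected influential point at the end of the x-range is an assumption of the example.

```python
import numpy as np

def cooks_d(X, y):
    """Cook's D_i = (1/p) * (h_i/(1-h_i)) * e_i^2 / (s^2 * (1-h_i))."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                     # leverage
    e = y - H @ y                      # residuals
    s2 = e @ e / (n - p)               # mean squared error
    return (e**2 / (p * s2)) * (h / (1 - h)**2)

# hypothetical data with an influential point at the edge of the design
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 25)
y = 1 + 0.5 * x + rng.normal(0, 0.5, 25)
y[-1] += 6
X = np.column_stack([np.ones_like(x), x])
D = cooks_d(X, y)
print(int(np.argmax(D)))               # flags the last observation
```

The 4p/n screening cutoff from the next slide can be applied to `D` with a simple comparison such as `D > 4 * p / n`.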
Cook's D in SPSS

Finding the influential outliers:
• Select those observations for which cksd > 4p/n.
• An alternative cutoff, attributed to Belsley, is 4/(n-p-1): flag the observation if cksd > 4/(n-p-1).
27
What to Do with Outliers

1. Check coding to spot typos.
2. Correct typos.
3. If an observational outlier is correct, examine the dffits option to see its influence on the fitting statistics.
4. This will show the standardized influence of the observation on the fit. If the influence of the outlier is bad, then consider removing it or replacing it with an imputed value.

28
Decomposition of the Sums of Squares

1. Mean deviations are computed by subtracting means from individual scores.
   1. This is done for the total, the group mean, and the error terms.
2. Mean deviations are squared and summed; these are called sums of squares.
3. Variances are computed by dividing the sums of squares by their degrees of freedom.
4. The total variance = model variance + error variance.
29
Formula for Decomposition of Sums of Squares

   y_{ij} - \bar{y}_{..} = (y_{ij} - \bar{y}_{.j}) + (\bar{y}_{.j} - \bar{y}_{..})
   total effect = error within + model effect

We square the terms and sum them over the data set (the cross-product term sums to zero):

   \sum_{ij} (y_{ij} - \bar{y}_{..})^2 = \sum_{ij} (y_{ij} - \bar{y}_{.j})^2 + \sum_{ij} (\bar{y}_{.j} - \bar{y}_{..})^2

   SS_total = SS_error + SS_model

where SS = sums of squares.
30
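The identity SS_total = SS_error + SS_model can be verified numerically. This is a minimal sketch with invented group scores (the three groups and their values are hypothetical, not from the slides).

```python
import numpy as np

# three hypothetical groups of scores
groups = [np.array([10., 12., 11., 13.]),
          np.array([15., 14., 16., 17.]),
          np.array([20., 19., 21., 18.])]

y = np.concatenate(groups)
grand = y.mean()                                   # grand mean

# total deviations, within-group deviations, and between-group deviations
ss_total = ((y - grand) ** 2).sum()
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_model = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)

print(ss_total, ss_error + ss_model)               # the two quantities match
```

Dividing these sums of squares by their degrees of freedom gives the variances used in the F ratio on the next slide.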
Variance Decomposition

Dividing each of the sums of squares by its respective degrees of freedom yields the variances:

   SS_total / (n - 1),   SS_error / (n - k),   SS_model / (k - 1)

Total variance = error variance + model variance.

   F (in fixed effects models) = model variance / error variance
31
Proportion of Variance Explained

R^2 = proportion of variance explained.

   SS_total = SS_model + SS_error

Divide both sides by SS_total:

   1 = SS_model/SS_total + SS_error/SS_total

   R^2 = SS_model/SS_total = 1 - SS_error/SS_total
32
The Omnibus F Test

The omnibus F test tests the null hypothesis that the means of all of the levels of the main effects (and of any interactions specified) are equal.

Suppose the model is a one-way ANOVA on the breaking pressure of bonds made from different metals, and suppose there are three metals: nickel, iron, and copper.

   H0: Mean(Nickel) = Mean(Iron) = Mean(Copper)
   Ha: Mean(Nickel) ≠ Mean(Iron), or
       Mean(Nickel) ≠ Mean(Copper), or
       Mean(Iron) ≠ Mean(Copper)

33
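The three-metal example above can be run as a one-way ANOVA with scipy's `f_oneway`. The breaking-pressure values below are invented for illustration; only the structure of the test follows the slide.

```python
import numpy as np
from scipy.stats import f_oneway

# hypothetical breaking pressures for bonds made from three metals
nickel = np.array([72., 75., 71., 74., 73.])
iron   = np.array([68., 70., 69., 71., 67.])
copper = np.array([64., 66., 65., 63., 67.])

# omnibus F test of H0: all three group means are equal
F, p = f_oneway(nickel, iron, copper)
print(p < 0.05)   # True here: reject H0, the mean pressures differ
```

A significant omnibus F says only that at least one inequality in Ha holds; the contrasts discussed on the following slides identify which levels differ.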
Testing Different Levels of a Factor against One Another

• Contrasts are tests of the mean of one level of a factor against the means of other levels.

   H0: \mu_1 = \mu_2 = \mu_3

   Ha: \mu_1 ≠ \mu_2, or
       \mu_2 ≠ \mu_3, or
       \mu_1 ≠ \mu_3
34
Contrasts-cont'd

• A contrast statement computes

   F = \frac{\hat{b}' L' (L \hat{V}^- L')^{-1} L \hat{b}}{rank(L)}

The estimated V^- is the generalized inverse of the coefficient matrix of the mixed model. The L vector is the k'b vector.

The numerator df is rank(L), and the denominator df is taken from the fixed effects table unless otherwise specified.
35
Construction of the F Tests in Different Models

The F test is a ratio of two variances (mean squares). It is constructed by dividing the MS of the effect to be tested by the MS of the appropriate denominator term. The division should leave only the effect to be tested as a remainder.

• A fixed effects model F test for a: MSa/MSerror.
• A random effects model F test for a: MSa/MSab.
• A mixed effects model F test for the fixed effect a: MSa/MSab.
• A mixed effects model F test for ab: MSab/MSerror.
36
Data Format

• The data format for a GLM is that of wide data.
37
Data Format for Mixed
Models is Long

38
Conversion of Wide to
Long Data Format
• Click on Data in the header bar
• Then click on Restructure in the
pop-down menu

39
A restructure wizard
appears
Select restructure selected variables into cases
and click on Next

40
A Variables to Cases: Number of Variable Groups dialog box appears.
We select one and click on Next.
41
We select the repeated
variables and move them
to the target variable box

42
After moving the repeated variables into the target variable
box, we move the fixed variables into the Fixed variable
box, and select a variable for case id—in this case, subject.
Then we click on Next

43
A Create Index Variables dialog box appears. We leave the number of
index variables to be created at one and click on Next at the bottom
of the box.
44
When the following box
appears we just type in
time and select Next.

45
When the options dialog box appears, we select the
option for dropping variables not selected.
We then click on Finish.

46
We thus obtain our data
in long format

47
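Outside SPSS, the same wide-to-long restructuring can be sketched with pandas `melt`. The subject, anxiety, and trial columns below are hypothetical stand-ins for the variables chosen in the wizard steps above.

```python
import pandas as pd

# wide format: one row per subject, one column per repeated trial
wide = pd.DataFrame({
    "subject": [1, 2, 3],
    "anxiety": [1, 1, 2],           # a fixed (between-subjects) variable
    "trial1": [18, 19, 14],
    "trial2": [14, 12, 10],
    "trial3": [12, 8, 6],
})

# long format: one row per subject-by-trial observation,
# with an index variable ("time") identifying the repeated measure
long = wide.melt(id_vars=["subject", "anxiety"],
                 value_vars=["trial1", "trial2", "trial3"],
                 var_name="time", value_name="response")
long = long.sort_values(["subject", "time"]).reset_index(drop=True)
print(len(long))   # 3 subjects x 3 trials = 9 rows
```

The `id_vars` play the role of the case id and fixed variables in the SPSS wizard, and `var_name` plays the role of the index variable named "time".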
The Mixed Model

The Mixed Model uses the long data format. It includes fixed and random effects. It can be used to model merely fixed or random effects, by zeroing out the other parameter vector. The F tests for the fixed, random, and mixed models differ.

Because the Mixed Model has the parameter vector for both of these and can estimate the error covariance matrix for each, it can provide the correct standard errors for either the fixed or random effects.
48
The Mixed Model

   y = X\beta + Zu + \varepsilon

   where
   \beta = fixed-effects parameter vector
   X = fixed-effects design matrix
   Z = random-effects design matrix
   u = random effects
   \varepsilon = errors

   Variance of y:  V = ZGZ' + R

G and R require covariance structure fitting.
49
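The marginal covariance V = ZGZ' + R can be built explicitly for a tiny design. This sketch assumes a random-intercept model with 2 subjects and 3 repeated measures each; the variance values are invented for illustration.

```python
import numpy as np

# 2 subjects x 3 repeated measures; Z maps each observation to its subject
Z = np.kron(np.eye(2), np.ones((3, 1)))   # shape (6, 2)
G = 1.5 * np.eye(2)                        # random-intercept variance (scaled identity)
R = 0.5 * np.eye(6)                        # residual covariance (scaled identity)

V = Z @ G @ Z.T + R                        # marginal covariance of y
print(V[0, 0])   # 1.5 + 0.5 = 2.0: variance of a single observation
print(V[0, 1])   # 1.5: covariance between measures on the same subject
print(V[0, 3])   # 0.0: different subjects are independent
```

Choosing other structures for G and R (AR(1), compound symmetry, etc., as on the following slides) changes V, and with it the standard errors of the fixed effects.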
Mixed Model Theory-cont'd

Littell et al. (p. 139) note that u and e are uncorrelated random variables with zero means and covariances G and R, respectively. Because the covariance matrix

   V = ZGZ' + R,

the solutions are

   \hat{\beta} = (X'\hat{V}^{-1}X)^- X'\hat{V}^{-1} y
   \hat{u} = \hat{G} Z' \hat{V}^{-1} (y - X\hat{\beta})

V^- is a generalized inverse. Because V is usually singular and noninvertible, an invertible augmented matrix is used in its place; it can later be transformed back to V.

The G and R matrices must be positive definite. In the Mixed procedure, the covariance type of the random (generalized) effects defines the structure of G, and a repeated covariance type defines the structure of R.
Mixed Model Assumptions

A linear relationship between dependent and independent variables.

   E\begin{bmatrix} u \\ e \end{bmatrix} = 0

   Var\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}
51
Random Effects
Covariance Structure
• This defines the structure of
the G matrix, the random
effects, in the mixed model.
• Possible structures permitted
by current version of SPSS:
– Scaled Identity
– Compound Symmetry
– AR(1)
– Huynh-Feldt

52
Structures of Repeated Effects (R matrix)

Variance Components (diagonal):

   \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}

AR(1):

   \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}

Compound Symmetry:

   \begin{bmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 \end{bmatrix}
53
Structures of Repeated Effects (R matrix)-cont'd

Huynh-Feldt (off-diagonal elements are \frac{\sigma_i^2 + \sigma_j^2}{2} - \lambda, where \lambda is the Huynh-Feldt constant):

   \begin{bmatrix} \sigma_1^2 & \frac{\sigma_1^2 + \sigma_2^2}{2} - \lambda & \frac{\sigma_1^2 + \sigma_3^2}{2} - \lambda \\ \frac{\sigma_2^2 + \sigma_1^2}{2} - \lambda & \sigma_2^2 & \frac{\sigma_2^2 + \sigma_3^2}{2} - \lambda \\ \frac{\sigma_3^2 + \sigma_1^2}{2} - \lambda & \frac{\sigma_3^2 + \sigma_2^2}{2} - \lambda & \sigma_3^2 \end{bmatrix}
54
Structures of Repeated Effects (R matrix)-cont'd

Unstructured:

   \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{12} & \sigma_1\sigma_3\rho_{13} \\ \sigma_2\sigma_1\rho_{21} & \sigma_2^2 & \sigma_2\sigma_3\rho_{23} \\ \sigma_3\sigma_1\rho_{31} & \sigma_3\sigma_2\rho_{32} & \sigma_3^2 \end{bmatrix}
55
The R Matrix Defines the Correlation among Repeated Random Effects

   R = \begin{bmatrix} 1+\sigma^2 & \sigma_1 & \cdots & \sigma_1 \\ \sigma_1 & 1+\sigma^2 & \cdots & \sigma_1 \\ \vdots & & \ddots & \vdots \\ \sigma_1 & \sigma_1 & \cdots & 1+\sigma^2 \end{bmatrix}

One can specify the nature of the correlation among the repeated random effects.
56
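The diagonal, AR(1), and compound-symmetry structures shown above can be constructed directly, which makes their differing parameter counts concrete. This is a minimal numpy sketch; the variance and correlation values are invented for illustration.

```python
import numpy as np

def ar1(sigma2, rho, n):
    """AR(1) covariance: element (i, j) is sigma^2 * rho^|i-j|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def compound_symmetry(sigma2, sigma1_2, n):
    """Compound symmetry: common off-diagonal covariance sigma1^2,
    diagonal elements sigma^2 + sigma1^2."""
    return np.full((n, n), sigma1_2) + sigma2 * np.eye(n)

R_ar1 = ar1(2.0, 0.5, 4)
R_cs = compound_symmetry(2.0, 0.8, 4)
print(R_ar1[0, 3])            # 2.0 * 0.5**3 = 0.25: correlation decays with lag
print(R_cs[0, 0], R_cs[0, 1]) # 2.8 on the diagonal, a constant 0.8 off it
```

AR(1) uses two parameters and lets the covariance decay with the spacing between trials, whereas compound symmetry forces one common covariance for all pairs of trials.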
GLM as a Special Case of the Mixed Model

The General Linear Model is a special case of the Mixed Model with Z = 0 (which means that Zu disappears from the model) and R = \sigma^2 I.
57
Mixed Analysis of a Fixed Effects Model

SPSS tests these fixed effects just as it does with the GLM procedure with Type III sums of squares.

We analyze the breaking pressure of bonds made from three metals. We assume that we do not generalize beyond our sample and that our effects are all fixed.

The test of fixed effects is performed with the help of the L matrix by constructing the following F test:

   F = \frac{\hat{\beta}' L' [L (X'\hat{V}^{-1}X)^- L']^{-1} L \hat{\beta}}{rank(L)}

   Numerator df = rank(L)
   Denominator df = residual df (n - rank(X)), or the Satterthwaite approximation.
Estimation: Newton Scoring

   \theta_{i+1} = \theta_i - s H^{-1} g

   where
   g = gradient vector of first derivatives
   H = Hessian matrix of second derivatives
   s = step-size parameter
59
Estimation: Minimization of the Objective Functions

Using Newton scoring, the following functions are minimized:

   ML(G,R):   \frac{1}{2}\log|V| + \frac{n}{2}\log(r'V^{-1}r) + \frac{n}{2}\left(1 + \log\frac{2\pi}{n}\right)

   REML(G,R): \frac{1}{2}\log|V| + \frac{1}{2}\log|X'V^{-1}X| + \frac{n-p}{2}\log(r'V^{-1}r) + \frac{n-p}{2}\left(1 + \log\frac{2\pi}{n-p}\right)

   where r = y - X(X'V^{-1}X)^- X'V^{-1} y
         p = rank of X

so that

   \hat{\beta} = (X'V^{-1}X)^- X'V^{-1} y   and
   \hat{u} = GZ'V^{-1}(y - X\hat{\beta})

maximize the likelihood.
60
Significance of Parameters

   L\begin{bmatrix} \beta \\ u \end{bmatrix} is a linear combination.

   H_0: L\begin{bmatrix} \beta \\ u \end{bmatrix} = 0

   t = \frac{L\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}}{\sqrt{L C L'}}

   where
   C = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-1}
61
Test One Covariance Structure against Another with the Information Criteria

• The rule of thumb is: smaller is better.
• -2LL
• AIC (Akaike)
• AICC (Hurvich and Tsai)
• BIC (Bayesian Information Criterion)
• Bozdogan's CAIC
62
Measures of Lack of Fit: The Information Criteria

-2LL is called the deviance. It is a measure of the sum of squared errors.

   AIC  = -2LL + 2p                      (p = # of parameters)
   BIC  = Schwarz Bayesian Information Criterion = -2LL + p log(n)
   AICC = Hurvich and Tsai's small-sample correction of AIC: -2LL + 2p(n/(n-p-1))
   CAIC = -2LL + p(log(n) + 1)

63
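The four formulas above differ only in how heavily they penalize the parameter count. A minimal sketch computing them from a given deviance (the -2LL, p, and n values below are invented for illustration):

```python
import numpy as np

def information_criteria(neg2ll, p, n):
    """Fit indices built from the deviance (-2LL), the number of
    parameters p, and the sample size n."""
    return {
        "AIC":  neg2ll + 2 * p,
        "AICC": neg2ll + 2 * p * (n / (n - p - 1)),
        "BIC":  neg2ll + p * np.log(n),
        "CAIC": neg2ll + p * (np.log(n) + 1),
    }

ic = information_criteria(neg2ll=250.0, p=3, n=40)
print(ic["AIC"])   # 256.0
```

For any n > e (so log(n) > 1), BIC penalizes parameters more heavily than AIC, and CAIC more heavily than BIC, which is why the criteria can disagree about the best covariance structure.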
Procedures for Fitting the Mixed Model

• One can use the LR test or the information criteria. The smaller the information criterion, the better the model.
• We try to go from a larger to a smaller information criterion as we fit the model.
64
LR Test

1. To test whether one model is significantly better than the other.
2. To test a random effect for statistical significance.
3. To test a covariance structure improvement.
4. To test both.
5. Distributed as a \chi^2
6. with df = p2 - p1, where pi = # of parameters in model i.
65
Applying the LR Test

• We obtain the -2LL from the unrestricted model.
• We obtain the -2LL from the restricted model.
• We subtract the unrestricted -2LL from the larger restricted -2LL.
• The difference is a chi-square with df = the difference in the number of parameters.
• We can look this up and determine whether or not it is statistically significant.
66
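The steps above can be sketched with scipy's chi-square survival function. The two deviances below are hypothetical values for two nested covariance structures differing by two parameters.

```python
from scipy.stats import chi2

def lr_test(neg2ll_restricted, neg2ll_full, df):
    """Likelihood-ratio test: the difference in -2LL between nested
    models is chi-square distributed with df = difference in the
    number of parameters."""
    stat = neg2ll_restricted - neg2ll_full
    return stat, chi2.sf(stat, df)

# hypothetical deviances from two nested covariance structures
stat, p = lr_test(neg2ll_restricted=312.4, neg2ll_full=301.9, df=2)
print(round(stat, 1), p < 0.05)   # 10.5 True: a significant improvement
```

When the tested parameter is a variance constrained to be non-negative (e.g., a random effect), the naive chi-square p-value is known to be conservative, so a significant result here is still trustworthy.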
Advantages of the Mixed Model

1. It can allow random effects to be properly specified and computed, unlike the GLM.
2. It can allow correlation of errors, unlike the GLM. It therefore has more flexibility in modeling the error covariance structure.
3. It can allow the error terms to exhibit nonconstant variability, unlike the GLM, allowing more flexibility in modeling the dependent variable.
4. It can handle missing data, whereas the repeated measures GLM cannot.
67
Programming a Repeated Measures ANOVA with the Mixed Procedure

Select the Mixed Linear option in the Analyze menu.
68
Move subject ID into the
subjects box and the
repeated variable into the
repeated box.

69
Click on Continue. We specify subjects and repeated effects with the next dialog box. We set the repeated covariance type to "Diagonal" and click on Continue.

70
Defining the Fixed
Effects
When the next dialog box appears,
we insert the dependent Response
variable and the fixed effects of
anxiety and tension

71
Click on continue
We select the Fixed
effects to be tested

72
Move them into the model box,
selecting main effects, and type III
sum of squares

Click on continue
73
When the Linear Mixed
Models dialog box
appears, select random

74
Under random effects, select
scaled identity as covariance
type and move subjects over into
combinations

Click on continue
75
Select Statistics and check the following in the dialog box that
appears.

Then click continue


76
When the Linear Mixed
Models box appears, click
ok

77
You will get your tests

78
Estimates of Fixed effects
and covariance
parameters

79
R matrix

80
Rerun the Model with Different Nested Covariance Structures and Compare the Information Criteria

The lower the information criterion, the better the fit of the nested model. Caveat: if the models are not nested, they cannot be compared with the information criteria.

81
GLM vs. Mixed

GLM has:
• means
• lsmeans
• SS types 1, 2, 3, and 4
• estimates using OLS or WLS
• one has to program the correct F tests for random effects
• loses cases with missing values

Mixed has:
• lsmeans
• SS types 1 and 3
• estimates using maximum likelihood (ML), minimum variance quadratic unbiased estimation (MIVQUE0), or restricted maximum likelihood (REML)
• gives correct standard errors and confidence intervals for random effects
• automatically provides correct standard errors for the analysis
• can handle missing values
82
