
Repeated Categorical Data Analysis

Geert Molenberghs
Ariel Alonso Abad
Fabián Santiago Tibaldi

CenStat, Limburgs Universitair Centrum

Cuba, December 2001

In collaboration with: Didier Renard, Geert Verbeke


Contents

1 Reading 1

1.1 Basic References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Generalized Linear Models 1

2.1 GLM for Independent Responses: Review . . . . . . . . . . . . . . . . . . . . . 2

2.2 Example: Normal Linear Model . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3 Example: Bernoulli Logistic Model . . . . . . . . . . . . . . . . . . . . . . . . 5

2.4 Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Case Study: Analgesic Trial 9

3.1 Observed Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.2 Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.3 Missingness Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.3.1 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13


3.4 Summary Table: Logit Link . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Linear (Mixed) Models for Longitudinal Data 22

4.1 Correlated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.2 Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.2.1 EXAMPLE: Reading ability and age . . . . . . . . . . . . . . . . . . . . 25

4.2.2 EXAMPLE: CD4+ cell counts . . . . . . . . . . . . . . . . . . . . . . . 26

4.2.3 EXAMPLE: Weight of Pigs . . . . . . . . . . . . . . . . . . . . . . . . 27

4.3 Scientific Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.4 Merits of Longitudinal Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.5 Advantages of Longitudinal Studies . . . . . . . . . . . . . . . . . . . . . . . . 30

4.6 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.7 Types of Longitudinal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.7.1 Balanced Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.7.2 Unbalanced Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.8 Types of Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.9 Components of Variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.10 Full Multivariate Model For Balanced Data . . . . . . . . . . . . . . . . . . . . 35

4.10.1 Case Study: Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.10.2 Model formulation and estimation . . . . . . . . . . . . . . . . . . . . . 39



4.10.3 Example: Growth data . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.11 Linear Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.12 Inference in the marginal linear mixed model . . . . . . . . . . . . . . . . . . . 64

4.12.1 The hierarchical versus marginal model . . . . . . . . . . . . . . . . . . 64

4.12.2 Notation and terminology . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.12.3 The Autocorrelation Function . . . . . . . . . . . . . . . . . . . . . . . 67

4.12.4 The Variogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.13 Empirical Bayes Methods for the Random Effects . . . . . . . . . . . . . . . . . 70

4.13.1 Estimation of the Random Effects bi . . . . . . . . . . . . . . . . . . . 70

4.13.2 Empirical Bayes estimates b̂i . . . . . . . . . . . . . . . . . . . . . . . 71

4.13.3 Shrinkage estimators b̂i . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5 Case Study: Vaccination Trial 75

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.2 Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.3 Selection of a Covariance Structure . . . . . . . . . . . . . . . . . . . . . . . . 80

5.3.1 Note: Log-Linear Variance Model . . . . . . . . . . . . . . . . . . . . . 81

5.4 Simplification of the Mean Structure . . . . . . . . . . . . . . . . . . . . . . . 83

5.4.1 Note: Fractional Polynomials . . . . . . . . . . . . . . . . . . . . . . . 84

5.5 How Does the Model Fit? . . . . . . . . . . . . . . . . . . . . . . . . . . . 87



5.6 Prediction From the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

6 Case Study: Surrogate Markers 91

6.1 Age-Related Macular Degeneration . . . . . . . . . . . . . . . . . . . . . . . . 92

6.2 Advanced Ovarian Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

6.3 Advanced Colorectal Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

6.4 Definition of Surrogate Endpoint . . . . . . . . . . . . . . . . . . . . . . . . . 100

6.5 ARMD: Prentice’s Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.6 Proportion Explained . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.7 Criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6.8 Relative Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

6.9 Adjusted Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6.10 Use of RE and Adjusted Association . . . . . . . . . . . . . . . . . . . . . . . 106

6.11 Analysis Based on Several Trials . . . . . . . . . . . . . . . . . . . . . . . . . . 107

6.12 Statistical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6.13 Methods of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6.14 ARMD: Trial-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6.15 ARMD: Individual-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . . 114

6.16 Ovarian: Trial-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6.17 Ovarian: Individual-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . . 117



6.18 Ovarian: Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6.19 Colorectal: Trial-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.20 Colorectal: Individual-Level Surrogacy . . . . . . . . . . . . . . . . . . . . . . . 120

7 Case Study: The Prostate Cancer Data 121

7.1 The use of PSA to detect prostate cancer . . . . . . . . . . . . . . . . . . . . . 121

7.1.1 The Baltimore Longitudinal Study of Aging (BLSA) . . . . . . . . . . . 123

7.1.2 The prostate data from the BLSA . . . . . . . . . . . . . . . . . . . . . 124

7.1.3 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

7.2 A two-stage model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

7.2.1 General idea of two-stage models . . . . . . . . . . . . . . . . . . . . . 126

7.2.2 Applied to the prostate data . . . . . . . . . . . . . . . . . . . . . . . . 127

7.2.3 Matrix notation for two-stage models . . . . . . . . . . . . . . . . . . . 131

7.3 Linear mixed models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

7.3.1 Stage 1 + Stage 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

7.4 Fitting linear mixed models in SAS . . . . . . . . . . . . . . . . . . . . . . . . 133

7.5 Inference for contrasts of fixed effects . . . . . . . . . . . . . . . . . . . . . . . 147

7.5.1 The CONTRAST statement . . . . . . . . . . . . . . . . . . . . . . . . 147

7.5.2 The ESTIMATE statement . . . . . . . . . . . . . . . . . . . . . . . . 157

7.6 Inference for variance components . . . . . . . . . . . . . . . . . . . . . . . . . 160



7.6.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.6.2 Wald tests for variance components . . . . . . . . . . . . . . . . . . . . 161

7.6.3 Likelihood ratio test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

8 Parametric Modeling Families 165

8.1 Continuous Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

8.1.1 Marginal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

8.1.2 Random-Effects Models . . . . . . . . . . . . . . . . . . . . . . . . . . 166

8.1.3 Transition Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

8.2 Longitudinal Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . 168

8.2.1 Marginal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

8.2.2 Random-effects Models . . . . . . . . . . . . . . . . . . . . . . . . . . 171

8.2.3 Conditional Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8.2.4 Marginal Versus Conditional Models . . . . . . . . . . . . . . . . . . . . 173

8.3 Main Focus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

9 Modelling Repeated Categorical Data 175

9.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

9.1.1 The Standard (Regression) Notation . . . . . . . . . . . . . . . . . . . 176

9.1.2 The Table Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177



9.2 A Conditional Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.2.1 Interpretation of Parameters . . . . . . . . . . . . . . . . . . . . . . . . 180

9.3 Marginal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

9.3.1 Link Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

10 Case Study: NTP Data 188

10.1 Data Structure of Developmental Toxicity Studies . . . . . . . . . . . . . . . . 189

10.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

10.3 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

10.4 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

10.5 Quadratic Log-linear Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

10.6 The quadratic exponential model . . . . . . . . . . . . . . . . . . . . . . . . . 193

10.7 The linear exponential model . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

10.8 Specialized to Clustered Binary Data . . . . . . . . . . . . . . . . . . . . . . . 194

10.9 Quadratic Clustered Loglinear Model . . . . . . . . . . . . . . . . . . . . . . . 195

10.10 The Bahadur Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

11 Generalized Estimating Equations 197

11.1 Large Sample Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

11.2 Unknown Covariance Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 202



11.3 The Sandwich Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

11.4 The Working Correlation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 205

11.4.1 Estimation of Working Correlation . . . . . . . . . . . . . . . . . . . . . 206

11.5 Fitting GEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

11.6 The NTP Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

11.6.1 PROC GENMOD Code . . . . . . . . . . . . . . . . . . . . . . . . . . 210

11.6.2 Discussion of Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

11.6.3 PROC GENMOD Output . . . . . . . . . . . . . . . . . . . . . . . . . 213

11.6.4 Discussion of Output . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

11.7 GEE: Alternative 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

11.7.1 gee1corr.mac Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

11.7.2 gee1corr.mac Output . . . . . . . . . . . . . . . . . . . . . . . . . . 224

11.8 GEE: Alternative 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

11.8.1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

11.8.2 The Variance Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 228

11.8.3 GLIMMIX Macro Code . . . . . . . . . . . . . . . . . . . . . . . . . . 229

11.8.4 Discussion of Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

11.8.5 GLIMMIX Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

11.8.6 Discussion of Output . . . . . . . . . . . . . . . . . . . . . . . . . . . 239



11.9 Comparison of GEE Estimates (Standard Errors) . . . . . . . . . . . . . . . . . 241

11.10 GEE2: Odds Ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

11.11 GEE2: Correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

11.11.1 The NTP Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

11.11.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

11.12 Alternating Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . 248

12 Case Study: Analgesic Trial 249

12.1 Comparison of GEE Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

12.2 Comparison of GEE Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

12.2.1 Fitted Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

12.3 Use of GLIMMIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

12.3.1 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

12.4 Alternating Logistic Regressions . . . . . . . . . . . . . . . . . . . . . . . . . . 261

12.4.1 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

13 Random-Effects Models 266

13.1 The Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

13.2 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

13.2.1 Adaptive Gaussian Quadrature . . . . . . . . . . . . . . . . . . . . . . . 268



13.2.2 First Order Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

13.3 Estimation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

13.4 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

13.5 The Beta-binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

13.5.1 The NTP Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

13.5.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

13.6 Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . 276

13.6.1 Quasi-likelihood Function . . . . . . . . . . . . . . . . . . . . . . . . . 277

13.6.2 Quasi-likelihood For Generalized Linear Mixed Models . . . . . . . . . . 279

13.6.3 Estimation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

13.7 Linear Mixed Model Using GLIMMIX . . . . . . . . . . . . . . . . . . . . . . . 282

13.7.1 Selected Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

13.7.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

13.7.3 GLIMMIX Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

13.7.4 Output From GLIMMIX . . . . . . . . . . . . . . . . . . . . . . . . . . 300

13.7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

13.8 The NTP Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

13.9 Transition Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

13.10 Differences Between Families of Models . . . . . . . . . . . . . . . . . . . 312



14 Case Study: Analgesic Trial 313

14.1 PROC NLMIXED Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

14.1.1 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

14.1.2 Population Averaged Profiles . . . . . . . . . . . . . . . . . . . . . . . 317

14.1.3 Fitted Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

14.2 Comparison of Different Approaches/Programs . . . . . . . . . . . . . . . . . . 322

14.2.1 PROC NLMIXED (Gaussian quadrature and N-R) . . . . . . . . . . . . 322

14.2.2 MIXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

14.2.3 PQL2 (MLwiN) without and with extra-dispersion parameter . . . . . . 327

14.2.4 PQL (MLwiN) without and with extra-dispersion parameter . . . . . . . 328

14.2.5 PQL (GLIMMIX) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

15 Analgesic Trial: Ordinal Data 332

15.1 Proportional odds model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

15.2 GEE Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

15.3 Random-effects Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

16 Missing Data 343

16.1 Missing Data Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

16.2 The Name of the Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345



16.3 Factorizing the Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

16.4 Selection Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

16.5 Missing Data Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

16.6 Ignorability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

16.7 Ignorability ⇐ Separability . . . . . . . . . . . . . . . . . . . . . . . . . 350

16.8 Simple Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

16.9 Three Likelihood Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

16.10 An Ignorable Likelihood Analysis . . . . . . . . . . . . . . . . . . . . . . 353

16.11 A Selection Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

16.12 Dropout Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

16.13 Contributions Combined . . . . . . . . . . . . . . . . . . . . . . . . . . 356

16.14 A Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

16.15 Pattern-Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . 358

16.16 Pattern-Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . 359

16.17 Pattern-Mixture Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 360

16.18 Estimating Marginal Effects From PMM . . . . . . . . . . . . . . . . . . 361

16.19 Random-Coefficient Models . . . . . . . . . . . . . . . . . . . . . . . . . 362

16.20 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

16.21 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

16.22 Non-Normal Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

16.23 Pros and Cons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

16.24 Less Parametric Approaches . . . . . . . . . . . . . . . . . . . . . . . . 367

17 Case Study: Analgesic Trial 368

17.1 Weighted GEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

17.2 Analgesic Trial Example . . . . . . . . . . . . . . . . . . . . . . . . . . 370

17.2.1 Estimated working correlation structures . . . . . . . . . . . . . . . . . 373

18 PROC NLMIXED 370

18.1 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

18.2 Particularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

18.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

18.4 MIXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

19 Introduction to Multilevel Modeling 374

19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

19.2 Multilevel Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

19.3 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

19.3.1 IGLS Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379

19.3.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380



19.4 Illustration of the IGLS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 381

19.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

19.6 Multilevel Models for Discrete Response Data . . . . . . . . . . . . . . . . . . . 386

19.7 MQL/PQL Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387

20 The Use of SPlus 390

20.1 Fitting Mixed Models Using SPlus . . . . . . . . . . . . . . . . . . . . . . . . . 390


Chapter 1

Reading

1.1 Basic References

• Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models in Practice: A SAS-Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.

• Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag.

• Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.

• Diggle, P.J., Liang, K.Y., and Zeger, S.L. (1994). Analysis of Longitudinal Data. Oxford: Oxford University Press.

1.2 Further Reading


• Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. Chichester: John Wiley.

• Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.

• Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.

• Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.

• Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.

• Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.

• Jones, B. and Kenward, M.G. (1989). Design and Analysis of Cross-Over Trials. London: Chapman and Hall.

• Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.

• Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.

• Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.

• McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd edition). London: Chapman and Hall.

• Pinheiro, J.C. and Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.

• Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: John Wiley & Sons.

• Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.

• Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Nonlinear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.
Chapter 2

Generalized Linear Models

Generalized linear models (GLMs) provide a unifying theory for a wide range of settings:

• normal: linear models: multiple regression, ANOVA

• binary: probit and logit (logistic) regression

• categorical data: log-linear modelling

• counts: Poisson regression

• non-negative continuous data: survival analysis (possibly censored)

(McCullagh and Nelder 1989)


2.1 GLM for Independent Responses: Review

1. E(Yi) = µi

2. η(µi) = xiᵀβ, with η(·) the link function

3. Var(Yi) = φ v(µi), where

• v(·) is a known variance function

• φ is a scale parameter (sometimes called the overdispersion parameter)

4. exponential family p.d.f.

f(y | θi, φ) = exp{ φ⁻¹ [y θi − ψ(θi)] + c(y, φ) }

with θi the natural parameter and ψ(·) a function satisfying

– µi = ψ′(θi)

– v(µi) = ψ″(θi)

Summary

1.–2. specify the mean response as a known function of explanatory variables (covariates, regression parameters)

3. specify the variance of the response as a known function of the mean, multiplied by a scale parameter

4. specify the distribution of the response as a member of the exponential family (for large samples, this assumption can be relaxed)

2.2 Example: Normal Linear Model

Yi ∼ N(xiᵀβ, σ²)

with

• η(µ) = µ (identity link)

• v(µ) = 1

• φ = σ²

• θ = µ

• ψ(θ) = θ²/2

• c(y, φ) = −y²/(2φ) − ½ ln(2πφ)
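As a quick sanity check, the exponential-family form with these choices of θ, ψ, φ, and c reproduces the familiar N(µ, σ²) density. The sketch below is in Python rather than the SAS used elsewhere in these notes, and the numeric values are arbitrary:

```python
import math

def expfam_normal(y, mu, phi):
    """Normal density written in exponential-family form:
    theta = mu, psi(theta) = theta**2 / 2, phi = sigma**2."""
    theta = mu
    psi = theta**2 / 2
    c = -y**2 / (2 * phi) - 0.5 * math.log(2 * math.pi * phi)
    return math.exp((y * theta - psi) / phi + c)

def normal_pdf(y, mu, sigma2):
    """The usual N(mu, sigma^2) density, for comparison."""
    return math.exp(-(y - mu)**2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

y, mu, sigma2 = 1.3, 0.5, 2.0
print(expfam_normal(y, mu, sigma2), normal_pdf(y, mu, sigma2))
```

The two expressions agree because y θ − ψ(θ) − y²/2 = −(y − µ)²/2 when θ = µ.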

2.3 Example: Bernoulli Logistic Model

P(Yi = 1) = exp(xiᵀβ) / {1 + exp(xiᵀβ)}

with

• η(µ) = ln{µ/(1 − µ)}

• v(µ) = µ(1 − µ)

• φ = 1

• θ = ln{µ/(1 − µ)}

• ψ(θ) = ln{1 + exp(θ)}

• Verify the conditions on ψ(θ)
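The two conditions, ψ′(θ) = µ and ψ″(θ) = v(µ) = µ(1 − µ), can be checked numerically by finite differences. A small Python sketch (Python is used here purely for illustration; µ = 0.7 is an arbitrary choice):

```python
import math

# Cumulant function of the Bernoulli model
psi = lambda t: math.log(1 + math.exp(t))

mu = 0.7
theta = math.log(mu / (1 - mu))   # natural parameter theta = logit(mu)
h = 1e-5

# Central finite-difference approximations to psi'(theta) and psi''(theta)
d1 = (psi(theta + h) - psi(theta - h)) / (2 * h)
d2 = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h**2

print(d1, mu)             # psi'(theta) should equal mu
print(d2, mu * (1 - mu))  # psi''(theta) should equal v(mu) = mu(1 - mu)
```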



2.4 Likelihood Estimation

ℓ(θ1, . . . , θN, φ) = φ⁻¹ Σ_{i=1}^{N} {yi θi − ψ(θi)} + Σ_{i=1}^{N} c(yi, φ)

Since θi is modelled in terms of β: θi = θi(β), we find

∂ℓ/∂βj = φ⁻¹ Σ_{i=1}^{N} {yi − ψ′(θi)} ∂θi/∂βj

which can be rewritten in the classical score-equations form

S(βj) = Σ_{i=1}^{N} (∂θi/∂βj) ψ″(θi) [φ ψ″(θi)]⁻¹ {yi − ψ′(θi)},   (j = 1, . . . , p)

The MLE is found by solving the score equations S(β) = 0.

The score equations can be rewritten in a useful form, using two facts:

1. ψ′(θi) = µi and thus

∂µi/∂βj = ψ″(θi) ∂θi/∂βj

2. vi = Var(Yi) = φ ψ″(θi)

The score equations become

S(βj) = Σ_{i=1}^{N} (∂µi/∂βj) vi⁻¹ (yi − µi) = 0

or, in vector notation,

S(β) = Σ_{i=1}^{N} (∂µi/∂β)ᵀ vi⁻¹ (yi − µi) = 0.
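Solving S(β) = 0 for the Bernoulli logistic model by Fisher scoring can be sketched in a few lines of pure Python. The data below are made up for illustration (the course software is SAS); only the update step, score = Xᵀ(y − µ) and information = XᵀWX with W = diag{vi}, matters:

```python
import math

# Toy data: one covariate x plus an intercept; y is binary.
x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(25):
    mu = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]  # E(Y_i)
    v = [m * (1 - m) for m in mu]                           # v(mu_i), phi = 1
    # Score S(beta) = sum_i (dmu_i/dbeta)^T v_i^{-1} (y_i - mu_i)
    s0 = sum(yi - mi for yi, mi in zip(y, mu))
    s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # Fisher information X' W X, written out for the 2x2 case
    i00 = sum(v)
    i01 = sum(vi * xi for vi, xi in zip(v, x))
    i11 = sum(vi * xi * xi for vi, xi in zip(v, x))
    det = i00 * i11 - i01 * i01
    d0 = (i11 * s0 - i01 * s1) / det
    d1 = (i00 * s1 - i01 * s0) / det
    b0, b1 = b0 + d0, b1 + d1
    if max(abs(d0), abs(d1)) < 1e-10:
        break

print(b0, b1)  # MLE: the betas at which S(beta) = 0
```

Each iteration is a weighted least-squares step, which is why the scheme is known as iteratively reweighted least squares.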

Remarks

• Estimation of β depends on the p.d.f. f(·) only through the means and variances of the responses Yi.

• A unified estimation scheme is obtainable in terms of the specified explanatory variables, link function, and variance function: iteratively (re-)weighted least squares. Newton-Raphson and Fisher scoring can be used equally well.

• The theory of quasi-likelihood shows that the usual asymptotic properties of β̂ hold when the means and variances of Yi are specified correctly, even if the distribution f(·) is not.

• In some applications φ is a known constant. If not, φ needs to be estimated in order to construct standard errors for β̂.
Chapter 3

Case Study: Analgesic Trial

• single-arm trial with 530 patients recruited (491 selected for analysis)

• analgesic treatment for pain caused by chronic nonmalignant disease

• treatment was to be administered for 12 months

• we will focus on the Global Satisfaction Assessment (GSA)

• the GSA scale goes from 1 = very good to 5 = very bad

• GSA was rated by each subject 4 times during the trial, at months 3, 6, 9, and 12


3.1 Observed Frequencies

|---------|------------------------------------------------------|----------|
| | GSA | |
| |----------|----------|----------|----------|----------| |
| |Very Good | Good | Moderate | Bad | Very Bad | All |
| |----|-----+----|-----+----|-----+----|-----+----|-----+----|-----|
| | N | % | N | % | N | % | N | % | N | % | N | % |
|---------+----+-----+----+-----+----+-----+----+-----+----+-----+----+-----|
|Time | | | | | | | | | | | | |
|---------| | | | | | | | | | | | |
|MONTH 3 | 55| 14.3| 112| 29.1| 151| 39.2| 52| 13.5| 15| 3.9| 385|100.0|
|---------+----+-----+----+-----+----+-----+----+-----+----+-----+----+-----|
|MONTH 6 | 38| 12.6| 84| 27.8| 115| 38.1| 51| 16.9| 14| 4.6| 302|100.0|
|---------+----+-----+----+-----+----+-----+----+-----+----+-----+----+-----|
|MONTH 9 | 40| 17.6| 67| 29.5| 76| 33.5| 33| 14.5| 11| 4.8| 227|100.0|
|---------+----+-----+----+-----+----+-----+----+-----+----+-----+----+-----|
|MONTH 12 | 30| 13.5| 66| 29.6| 97| 43.5| 27| 12.1| 3| 1.3| 223|100.0|
|---------|----|-----|----|-----|----|-----|----|-----|----|-----|----|-----|

3.2 Questions

• Evolution over time

• Relation with baseline covariates: age, sex, duration of the pain, type of pain, disease progression, Pain Control Assessment (PCA), . . .

• Investigation of dropout

3.3 Missingness Patterns


Missingness Cumulative Cumulative
Pattern Frequency Percent Frequency Percent
-------------------------------------------------------
**** 96 19.6 96 19.6
***- 2 0.4 98 20.0
**-* 1 0.2 99 20.2
*-** 3 0.6 102 20.8
*-*- 1 0.2 103 21.0
*--* 1 0.2 104 21.2
*--- 2 0.4 106 21.6
-*** 63 12.8 169 34.4
-**- 18 3.7 187 38.1
-*-* 2 0.4 189 38.5
-*-- 7 1.4 196 39.9
--** 51 10.4 247 50.3
--*- 30 6.1 277 56.4
---* 51 10.4 328 66.8
---- 163 33.2 491 100.0

3.3.1 Dropout
Dropout Time

Frequency|
Col Pct |MONTH 3 |MONTH 6 |MONTH 9 |MONTH 12| Total
---------+--------+--------+--------+--------+
No | 385 | 302 | 227 | 223 | 1137
| 78.41 | 61.51 | 46.23 | 45.42 |
---------+--------+--------+--------+--------+
Yes | 106 | 189 | 264 | 268 | 827
| 21.59 | 38.49 | 53.77 | 54.58 |
---------+--------+--------+--------+--------+
Total 491 491 491 491 1964

Dropout
Pattern Cumulative Cumulative
(redefined) Frequency Percent Frequency Percent
-------------------------------------------------------------
**** 96 19.55 96 19.55
-*** 63 12.83 159 32.38
--** 54 11.00 213 43.38
---* 55 11.20 268 54.58
---- 223 45.42 491 100.00
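The redefined (monotone) dropout patterns above can be reproduced from the full missingness-pattern table by classifying each subject according to the last observed visit. In the pattern codes, "-" marks an observed visit and "*" a missed one; this reading is consistent with the totals (the 106 subjects unobserved at month 3 are exactly the patterns beginning with "*"). A Python sketch of the bookkeeping (the course software is SAS; Python is used here only for illustration):

```python
from collections import Counter

# Full missingness patterns (months 3, 6, 9, 12) and their frequencies,
# copied from the table above. '-' = observed, '*' = missing.
patterns = {
    "****": 96, "***-": 2, "**-*": 1, "*-**": 3, "*-*-": 1, "*--*": 1,
    "*---": 2, "-***": 63, "-**-": 18, "-*-*": 2, "-*--": 7, "--**": 51,
    "--*-": 30, "---*": 51, "----": 163,
}

def monotonized(pattern):
    """Dropout pattern implied by the last observed visit: observed up to
    that visit, missing afterwards (never observed -> '****')."""
    last = max((i for i, c in enumerate(pattern) if c == "-"), default=-1)
    return "-" * (last + 1) + "*" * (3 - last)

dropout = Counter()
for pat, n in patterns.items():
    dropout[monotonized(pat)] += n

for pat in ["****", "-***", "--**", "---*", "----"]:
    print(pat, dropout[pat])
```

Running this reproduces the redefined table: 96, 63, 54, 55, and 223 subjects, summing to 491.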

Generalized Linear Models

• Early dropout (did the subject drop out after the first or the second visit)?
• Binary response
• PROC GENMOD can fit GLMs in general
• PROC LOGISTIC can fit models for binary (and
ordered) responses
• SAS code:

/*** Logit link ***/


proc genmod data=earlydrp;
model earlydrp = pca0 weight psychiat physfct / dist=b;
run;

proc logistic data=earlydrp descending;


model earlydrp = pca0 weight psychiat physfct;
run;

/*** Probit link ***/


proc genmod data=earlydrp;
model earlydrp = pca0 weight psychiat physfct / dist=b link=probit;
run;

proc logistic data=earlydrp descending;


model earlydrp = pca0 weight psychiat physfct / link=probit;
run;

The GENMOD Procedure

Model Information

Data Set WORK.EARLYDRP


Distribution Binomial
Link Function Logit
Dependent Variable earlydrp
Observations Used 386
Probability Modeled Pr( earlydrp = 1 )
Missing Values 9

Response Profile

Ordered Ordered
Level Value Count

1 0 271
2 1 115

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 381 437.9967 1.1496


Scaled Deviance 381 437.9967 1.1496
Pearson Chi-Square 381 384.1586 1.0083
Scaled Pearson X2 381 384.1586 1.0083
Log Likelihood -218.9984

Algorithm converged.

Analysis Of Parameter Estimates

Standard Wald 95% Chi-


Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq

Intercept 1 -1.0673 0.7328 -2.5037 0.3690 2.12 0.1453


PCA0 1 0.3981 0.1343 0.1349 0.6614 8.79 0.0030
WEIGHT 1 -0.0211 0.0072 -0.0353 -0.0070 8.55 0.0034
PSYCHIAT 1 0.7169 0.2871 0.1541 1.2796 6.23 0.0125
PHYSFCT 1 0.0121 0.0050 0.0024 0.0219 5.97 0.0145
Scale 0 1.0000 0.0000 1.0000 1.0000

NOTE: The scale parameter was held fixed.



The LOGISTIC Procedure

Model Information

Data Set WORK.EARLYDRP


Response Variable earlydrp
Number of Response Levels 2
Number of Observations 386
Link Function Logit
Optimization Technique Fisher’s scoring

Response Profile

Ordered Total
Value earlydrp Frequency

1 1 115
2 0 271

NOTE: 9 observations were deleted due to missing values for the response or
explanatory variables.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Intercept
Intercept and
Criterion Only Covariates

AIC 472.224 447.997


SC 476.179 467.776
-2 Log L 470.224 437.997

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 32.2269 4 <.0001


Score 31.6004 4 <.0001
Wald 28.3625 4 <.0001

Analysis of Maximum Likelihood Estimates

Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -1.0674 0.7328 2.1214 0.1453


PCA0 1 0.3981 0.1343 8.7885 0.0030
WEIGHT 1 -0.0211 0.00723 8.5546 0.0034
PSYCHIAT 1 0.7169 0.2871 6.2338 0.0125
PHYSFCT 1 0.0121 0.00496 5.9706 0.0145

Odds Ratio Estimates

Point 95% Wald


Effect Estimate Confidence Limits

PCA0 1.489 1.144 1.937


WEIGHT 0.979 0.965 0.993
PSYCHIAT 2.048 1.167 3.595
PHYSFCT 1.012 1.002 1.022

Association of Predicted Probabilities and Observed Responses

Percent Concordant 67.1 Somers’ D 0.346


Percent Discordant 32.5 Gamma 0.347
Percent Tied 0.4 Tau-a 0.145
Pairs 31165 c 0.673

The GENMOD Procedure

Model Information

Data Set WORK.EARLYDRP


Distribution Binomial
Link Function Probit
Dependent Variable earlydrp
Observations Used 386
Probability Modeled Pr( earlydrp = 1 )
Missing Values 9

Response Profile

Ordered Ordered
Level Value Count

1 0 271
2 1 115

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 381 437.9255 1.1494


Scaled Deviance 381 437.9255 1.1494
Pearson Chi-Square 381 384.2600 1.0086
Scaled Pearson X2 381 384.2600 1.0086
Log Likelihood -218.9628

Algorithm converged.

Analysis Of Parameter Estimates

Standard Wald 95% Chi-


Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq

Intercept 1 -0.6485 0.4371 -1.5052 0.2082 2.20 0.1379


PCA0 1 0.2370 0.0791 0.0821 0.3920 8.99 0.0027
WEIGHT 1 -0.0126 0.0043 -0.0210 -0.0042 8.72 0.0031
PSYCHIAT 1 0.4300 0.1731 0.0908 0.7692 6.17 0.0130
PHYSFCT 1 0.0073 0.0030 0.0015 0.0132 6.06 0.0139
Scale 0 1.0000 0.0000 1.0000 1.0000

NOTE: The scale parameter was held fixed.



The LOGISTIC Procedure

Model Information

Data Set WORK.EARLYDRP


Response Variable earlydrp
Number of Response Levels 2
Number of Observations 386
Link Function Normit
Optimization Technique Fisher’s scoring

Response Profile

Ordered Total
Value earlydrp Frequency

1 1 115
2 0 271

NOTE: 9 observations were deleted due to missing values for the response or
explanatory variables.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Intercept
Intercept and
Criterion Only Covariates

AIC 472.224 447.926


SC 476.179 467.705
-2 Log L 470.224 437.926

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 32.2981 4 <.0001


Score 31.6004 4 <.0001
Wald 30.0157 4 <.0001

Analysis of Maximum Likelihood Estimates

Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -0.6486 0.4354 2.2198 0.1363


PCA0 1 0.2371 0.0796 8.8714 0.0029
WEIGHT 1 -0.0126 0.00424 8.8833 0.0029
PSYCHIAT 1 0.4300 0.1741 6.0995 0.0135
PHYSFCT 1 0.00732 0.00297 6.0698 0.0138

Association of Predicted Probabilities and Observed Responses

Percent Concordant 67.1 Somers’ D 0.346


Percent Discordant 32.5 Gamma 0.347
Percent Tied 0.4 Tau-a 0.145
Pairs 31165 c 0.673
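The logit and probit fits above can be compared numerically. A well-known rule of thumb is that logit coefficients are roughly 1.6 to 1.8 times the corresponding probit coefficients; a quick check on the reported estimates (values copied from the SAS output):

```python
# Side-by-side slopes from the logit and probit fits above.
logit  = {"PCA0": 0.3981, "WEIGHT": -0.0211, "PSYCHIAT": 0.7169, "PHYSFCT": 0.0121}
probit = {"PCA0": 0.2370, "WEIGHT": -0.0126, "PSYCHIAT": 0.4300, "PHYSFCT": 0.00732}

# Classical rule of thumb: logit coefficients are roughly 1.6-1.8 times
# the corresponding probit coefficients (the two link functions differ
# mainly by a scale factor over the central range of probabilities).
ratios = {k: logit[k] / probit[k] for k in logit}
```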

3.4 Summary Table: Logit Link

Variable† Estimate S.E. p

Intercept -1.0674 0.7328 0.1453


PCA0 (PCA) 0.3981 0.1343 0.0030
WEIGHT (weight) -0.0211 0.0072 0.0034
PSYCHIAT (psychiatric disorder Yes/No) 0.7169 0.2871 0.0125
PHYSFCT (physical functioning) 0.0121 0.0050 0.0145
† All variables as measured at baseline.
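The odds ratios and Wald limits reported by PROC LOGISTIC follow directly from this summary table. A sketch reproducing them (estimates and standard errors copied from the table above):

```python
import math

# Logit-link estimates and standard errors from the summary table above.
est = {"PCA0": 0.3981, "WEIGHT": -0.0211, "PSYCHIAT": 0.7169, "PHYSFCT": 0.0121}
se  = {"PCA0": 0.1343, "WEIGHT": 0.0072, "PSYCHIAT": 0.2871, "PHYSFCT": 0.0050}

# Odds ratio = exp(estimate); 95% Wald CI = exp(estimate -/+ 1.96 * s.e.)
odds_ratio = {k: round(math.exp(b), 3) for k, b in est.items()}
wald_ci = {k: (round(math.exp(b - 1.96 * s), 3), round(math.exp(b + 1.96 * s), 3))
           for (k, b), s in zip(est.items(), se.values())}
```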
Chapter 4

Linear (Mixed) Models for Longitudinal


Data

4.1 Correlated Data


• One outcome, one sample
– height of human subjects
– mean, median, standard error, interquartile ranges

• One outcome, two samples (binary covariate)


– treated and untreated patients, two species,. . .
– Are means different across populations ?

CHAPTER 4. LINEAR (MIXED) MODELS FOR LONGITUDINAL DATA 23

• One outcome, more complex covariate


– several dose levels
– species of plants
– weight

⇒ Linear, logistic, Poisson,. . . regression


• One outcome, multiple covariates
– Most techniques extend easily
– Multi-way ANOVA
– Multiple regression
– Collinearity, confounding,. . .

4.2 Taxonomy

Classical design: (cross-sectional) A single outcome is


measured on each subject.

Multivariate design: A single outcome on several


variables is measured on each subject (HDL, LDL,
CHOL, APOA1, APOB).

Repeated measures design: Multiple outcomes of a


single quantity are measured on each subject (e.g.
the malformation index for all fetuses of the same
dam: clustered data).

Longitudinal design: Repeated measures over time.



4.2.1 EXAMPLE: Reading ability and age

• Panel (a) surprising ?


• First longitudinal interpretation: panel (b).
• Second longitudinal interpretation: panel (c).

4.2.2 EXAMPLE: CD4+ cell counts

• A cohort of 369 male HIV seroconverters.


• CD4+ cell-count is measured at approximately 6
month intervals.
• A variable number of measurements per subject.
• 2 distinct objectives:
– estimate the population average time course of
CD4+ cell depletion.
– estimate/predict time course for individual men.
• Substantial measurement error in CD4+ cell
determinations.
• CD4+ is highly variable due to sudden changes:
a mild infection, such as an ordinary cold, causes CD4+ to “shoot up”.

4.2.3 EXAMPLE: Weight of Pigs

• Plot for the weights of pigs


• “raw” and a “standardized residual” plot
• Raw plot:
– overall trend
– unclear picture of the random variation about the
trend
– variance is increasing over time
• The residuals have a tendency to remain “high” or
“low” for a given animal: tracking.
• Although still manageable, the plot is quite busy.

4.3 Scientific Questions

• Making inferences about the set of mean response


profiles (population averaged, marginal).
Does diet affect milk protein content ?
• Predicting individual response trajectories (individual
based, conditional, empirical Bayes).
Person-specific future course of CD4 cell depletion.
• Inferences about the variability between subjects
(sum of squares flavour).
Milk protein data: effect of diets.
• Inferences about the nature of the dependence
between measurements within subject (components
in the variability, the autocorrelation structure).

4.4 Merits of Longitudinal Studies

longitudinal designs ∼ study of change

cf. reading example

• Cross-sectional:
Yi1 = βC xi1 + εi1 (4.1)
βC : the average difference across two sub-populations
that differ by one unit in x.
• Repeated observations:
Yij = βC xi1 + βL(xij − xi1) + εij (4.2)
– j = 1: cross-sectional
– ⇒ βC retains interpretation
– In addition, βL can be studied
Subtract:
(Yij − Yi1) = βL(xij − xi1) + (εij − εi1)
βL: expected change in Y over time per unit
change in x.
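The identification argument above can be illustrated numerically. A minimal noise-free sketch with hypothetical covariate values: data are generated from model (4.2), and βL is recovered from within-subject differences alone:

```python
# A noise-free illustration of model (4.2), assuming hypothetical covariate
# values: each subject i has baseline x_i1 and later x_ij, and
#   Y_ij = beta_C * x_i1 + beta_L * (x_ij - x_i1).
beta_C, beta_L = 2.0, 0.5

subjects = {
    "i1": [1.0, 2.0, 3.0],   # x_i1, x_i2, x_i3 for subject 1
    "i2": [4.0, 5.0, 7.0],   # for subject 2
}

y = {i: [beta_C * x[0] + beta_L * (xj - x[0]) for xj in x]
     for i, x in subjects.items()}

# Subtracting the baseline removes beta_C entirely:
#   Y_ij - Y_i1 = beta_L * (x_ij - x_i1),
# so beta_L is identified from within-subject change only.
num = den = 0.0
for i, x in subjects.items():
    for j in range(1, len(x)):
        dx = x[j] - x[0]
        dy = y[i][j] - y[i][0]
        num += dx * dy
        den += dx * dx
beta_L_hat = num / den
```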

4.5 Advantages of Longitudinal Studies

• To study the longitudinal parameter with a


cross-sectional study: βL ≡ βC .
Very strong assumption.
• Longitudinal studies are more powerful:
The estimation of βL is based on changes of x
within subjects.
The subject serves as her own control.
• The distinction of between subjects variability and
within subjects variability is important:
– The between variability among CD4+ measures is
very high
⇒ less useful to predict an individual’s future
values.
– The past measurements of an individual may
contribute more information to predict an
individual’s future values than the measurements
of other subjects.

4.6 Notation

• Yij , yij : jth response on ith subject


• tij : measurement time at which Yij is taken
• xij : vector of explanatory variables
• Individual i: Y i = (Yi1, . . . , Yini )
• The entire dataset: Y = (Y 1, . . . , Y N )
• E(Y i) = µi
• Var(Y i) = V i
• E(Y ) = µ
• Var(Y ) = V

V has a block-diagonal structure:

        | V1  0   ...  0  |
        | 0   V2  ...  0  |
    V = | ..  ..  ...  .. |
        | 0   0   ...  VN |
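The block-diagonal structure of V can be sketched in code; the helper below and the 2 × 2 blocks are purely illustrative:

```python
import numpy as np

# Sketch: assemble the block-diagonal V = diag(V_1, ..., V_N) from
# per-subject covariance matrices (toy 2x2 blocks here).
def block_diag(blocks):
    n = sum(b.shape[0] for b in blocks)
    V = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        V[pos:pos + k, pos:pos + k] = b  # place V_i on the diagonal
        pos += k
    return V

V1 = np.array([[2.0, 0.5], [0.5, 2.0]])
V2 = np.array([[3.0, 1.0], [1.0, 3.0]])
V = block_diag([V1, V2])
```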

4.7 Types of Longitudinal Data

4.7.1 Balanced Data

• A fixed number of measurements per subject.


• Measurements taken at ± the same time.
• “holes”≡ missing data.
• A saturated model can be considered.

4.7.2 Unbalanced Data

• A variable, possibly random, number of measurements


per subject is taken.
• Measurement times: (random) variable
• It is hard to distinguish unbalancedness from genuine
missing data.

4.8 Types of Outcomes

• Linear (Mixed) Models:


– A continuous outcome (or transformation) drawn
from a ± normal distribution.
→ linear (mixed) model (Laird and Ware, 1982)
→ SAS PROC MIXED, BMDP5V, GENSTAT, SPlus
(lme and nlme, OSWALD), MLwiN
• Generalized Linear Models:
– McCullagh and Nelder (1989)
– Multivariate and longitudinal counterparts
– Remains a field of active research.
– Marginal, conditional, random effects models
– Breslow and Clayton (1993)
– Wolfinger and O’Connell (1995)
– Lee and Nelder (1996)
– Fahrmeir and Tutz (1994)
– Less software.
SAS (GENMOD:GEE, GLIMMIX), SPlus
(OSWALD, GEE).
• Survival Outcomes

4.9 Components of Variability

• Random effects:
– These are effects which arise from the
characteristics of individual subjects.
– Some subjects may be intrinsically high
responders, others intrinsically low responders.
– The influence of a random effect extends over all
measurements of the same subject.
• Serial correlation:
– Measurements taken close together in time are
typically more strongly correlated than those taken
further apart in time.
– On a sufficiently small time-scale, this kind of
structure is almost inevitable.
• Measurement error:
– When measurements involve delicate
determinations, the results may show substantial
variation even when two measurements are taken
at the same time from the same subject.
– e.g. bio-assay of blood samples

4.10 Full Multivariate Model For


Balanced Data

4.10.1 Case Study: Growth Data

• Potthoff and Roy (1964)


• Jennrich and Schluchter (1986)
• Little and Rubin (1987)
• Growth measurements for 11 girls and 16 boys.
• For each subject the distance from the center of the
pituitary to the pterygomaxillary fissure was recorded.
• At ages 8, 10, 12, and 14.
• Little and Rubin (1987) deleted 9 of the
[(11 + 16) × 4] measurements.
• 9 subjects are incomplete.
• Deletion is confined to the age 10 measurements.
• subjects with a low value at age 8 are more likely to
have a missing value at age 10.
• Balanced:
– common measurement times
– equally spaced measurements

• Later: missing data treatment, based on these data.



Growth data for 11 girls and 16 boys.


Age (in years) Age (in years)

Girl 8 10 12 14 Boy 8 10 12 14
1 21.0 20.0 21.5 23.0 1 26.0 25.0 29.0 31.0
2 21.0 21.5 24.0 25.5 2 21.5 22.5∗ 23.0 26.5
3 20.5 24.0∗ 24.5 26.0 3 23.0 22.5 24.0 27.5
4 23.5 24.5 25.0 26.5 4 25.5 27.5 26.5 27.0
5 21.5 23.0 22.5 23.5 5 20.0 23.5∗ 22.5 26.0
6 20.0 21.0∗ 21.0 22.5 6 24.5 25.5 27.0 28.5
7 21.5 22.5 23.0 25.0 7 22.0 22.0 24.5 26.5
8 23.0 23.0 23.5 24.0 8 24.0 21.5 24.5 25.5
9 20.0 21.0∗ 22.0 21.5 9 23.0 20.5 31.0 26.0
10 16.5 19.0∗ 19.0 19.5 10 27.5 28.0 31.0 31.5
11 24.5 25.0 28.0 28.0 11 23.0 23.0 23.5 25.0
12 21.5 23.5∗ 24.0 28.0
13 17.0 24.5∗ 26.0 29.5
14 22.5 25.5 25.5 26.0
15 23.0 24.5 26.0 30.0
16 22.0 21.5∗ 23.5 25.0

4.10.2 Model formulation and estimation

• Let yi be the vector of n repeated measurements for
the ith subject:

    yi = (yi1, yi2, . . . , yin)'

• The general multivariate model assumes that yi
satisfies a regression model

    yi = Xi β + εi

with

    Xi: matrix of covariates
    β: vector of regression parameters
    εi: vector of error components, εi ∼ N (0, Σ)

• We then have the following distribution for yi:

    yi ∼ N (Xi β, V )

• The mean structure Xi β is modelled as in classical
regression (ANOVA) models

• Usually, V is just a general (n × n) covariance matrix.
However, special structures for V can be assumed
(see later).

• Assuming independence across individuals, β and the
parameters in V can be estimated by maximizing

    LML = Π_{i=1}^{N} (2π)^{-n/2} |V|^{-1/2}
          × exp{ -(1/2) (yi − Xi β)' V^{-1} (yi − Xi β) }

• The MLE for β equals

    β̂ = ( Σ_{i=1}^{N} Xi' V^{-1} Xi )^{-1} Σ_{i=1}^{N} Xi' V^{-1} yi,

which has mean β and covariance matrix

    Var(β̂) = ( Σ_{i=1}^{N} Xi' V^{-1} Xi )^{-1}

• Inferences for β are obtained from replacing V by V̂
in the above equations, and from assuming normality
for β̂.

• If V is a general (n × n) covariance matrix, the MLE
equals

    V̂ = (1/N) Σ_i (yi − Xi β̂)(yi − Xi β̂)'

Otherwise, there is usually no analytic expression for
the MLE of the parameters in V .
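The closed-form estimator above is straightforward to code. A sketch with toy data (design and outcomes are illustrative); it also shows that with V = σ²I the GLS estimator reduces to ordinary least squares and is invariant to rescaling V:

```python
import numpy as np

# Sketch of the closed-form MLE above: the generalized least squares estimator
#   beta_hat = (sum_i X_i' V^{-1} X_i)^{-1} sum_i X_i' V^{-1} y_i
def gls(X_list, y_list, V):
    Vinv = np.linalg.inv(V)
    A = sum(X.T @ Vinv @ X for X in X_list)
    b = sum(X.T @ Vinv @ y for X, y in zip(X_list, y_list))
    return np.linalg.solve(A, b)

# Toy data: two subjects, intercept + time design, n = 3 occasions.
t = np.array([0.0, 1.0, 2.0])
X = np.column_stack([np.ones(3), t])
X_list = [X, X]
y_list = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0])]

# With V = sigma^2 I, GLS reduces to ordinary least squares,
# and rescaling V leaves beta_hat unchanged.
beta_I = gls(X_list, y_list, np.eye(3))
beta_s = gls(X_list, y_list, 4.0 * np.eye(3))
```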

4.10.3 Example: Growth data

Model 1: Unstructured mean and covariance

• 8 parameters for mean structure

• Unstructured 4 × 4 covariance matrix

• We define xi = 0 for boys and xi = 1 for girls

• Age: ti = 8, 10, 12, 14

• Model 1 is given by:

Yi1 = β0 + β1xi + β0,8(1 − xi) + β1,8xi + εi1,

Yi2 = β0 + β1xi + β0,10(1 − xi) + β1,10xi + εi2,

Yi3 = β0 + β1xi + β0,12(1 − xi) + β1,12xi + εi3,

Yi4 = β0 + β1xi + εi4,



• In matrix notation, this equals

    Yi = Xi β + εi,

with

         | 1  xi  1−xi   0      0     xi   0    0  |
    Xi = | 1  xi   0    1−xi    0     0    xi   0  |
         | 1  xi   0     0     1−xi   0    0    xi |
         | 1  xi   0     0      0     0    0    0  |

and with

    β = (β0, β1, β0,8, β0,10, β0,12, β1,8, β1,10, β1,12)'

• Parameterization:
– Means for boys: β0 + β1 + β1,8
β0 + β1 + β1,10
β0 + β1 + β1,12
β0 + β1
– Means for girls: β0 + β0,8
β0 + β0,10
β0 + β0,12
β0

• SAS program:
proc mixed data = growth method = ml covtest;
title ’Growth Data, Model 1’;
class idnr sex age;
model measure = sex age*sex / s;
repeated / type = un subject = idnr r rcorr;
run;

• MLE’s and estimated standard errors for β

Parameter MLE (s.e.)


β0 24.0909 (0.6478)
β1 3.3778 (0.8415)
β0,8 -4.5938 (0.5369)
β0,10 -3.6563 (0.3831)
β0,12 -1.7500 (0.4290)
β1,8 -2.9091 (0.6475)
β1,10 -1.8636 (0.4620)
β1,12 -1.0000 (0.5174)

• Estimated covariance matrix V̂ :

    | 5.0143  2.5156  3.6206  2.5095 |
    | 2.5156  3.8748  2.7103  3.0714 |
    | 3.6206  2.7103  5.9775  3.8248 |
    | 2.5095  3.0714  3.8248  4.6164 |

• The corresponding correlation matrix is

    | 1.0000  0.5707  0.6613  0.5216 |
    | 0.5707  1.0000  0.5632  0.7262 |
    | 0.6613  0.5632  1.0000  0.7281 |
    | 0.5216  0.7262  0.7281  1.0000 |

Model 2: Linear average trends

• Linear average trend within each sex group

• Unstructured 4 × 4 covariance matrix

• Model 2 is given by:

Yij = β0 + β01xi + β10tj (1 − xi) + β11tj xi + εij ,

• In matrix notation, this equals

    Yi = Xi β + εi,

where the design matrix is

         | 1  xi   8(1−xi)   8xi  |
    Xi = | 1  xi  10(1−xi)  10xi  |
         | 1  xi  12(1−xi)  12xi  |
         | 1  xi  14(1−xi)  14xi  |

and

    β = (β0, β01, β10, β11)'.

• Parameterization:
– β0: intercept for boys
– β0 + β01: intercept for girls
– β10: slope for boys
– β11: slope for girls

• SAS program: delete age from the CLASS statement

• LR test Model 2 versus Model 1


Mean Covar par −2 Ref G2 df p
1 unstr. unstr. 18 416.509
2 = slopes unstr. 14 419.477 1 2.968 4 0.5632

• Predicted trends:
girls : Ŷj = 17.43 + 0.4764tj

boys : Ŷj = 15.84 + 0.8268tj



A View On the Data



Model 3: Parallel average profiles

• Linear average trend within each sex group

• The same slope for both groups

• Unstructured 4 × 4 covariance matrix

• Model 3 is given by:


Yij = β0 + β01xi + β1tj + εij .

• In matrix notation, this equals

    Yi = Xi β + εi,

where the design matrix is

         | 1  xi   8 |
    Xi = | 1  xi  10 |
         | 1  xi  12 |
         | 1  xi  14 |

and

    β = (β0, β01, β1)'

• The two slopes in Model 2 have been replaced by β1

• SAS program:

model measure = sex age / s;

• LR test: Model 3 versus Model 2


Mean Covar par −2 Ref G2 df p
1 unstr. unstr. 18 416.509
2 = slopes unstr. 14 419.477 1 2.968 4 0.5632
3 = slopes unstr. 13 426.153 2 6.676 1 0.0098

• Predicted trends:
girls : Ŷj = 15.37 + 0.6747tj

boys : Ŷj = 17.42 + 0.6747tj



Model 4: Toeplitz covariance structure

• Linear average trend within each sex group

• Elements of V of the form Vij = α_{|i−j|}:

        | α0  α1  α2  α3 |
    V = | α1  α0  α1  α2 |
        | α2  α1  α0  α1 |
        | α3  α2  α1  α0 |

• SAS program:
proc mixed data = growth method = ml covtest;
title ’Growth Data, Model 4’;
class sex idnr;
model measure = sex age*sex / s;
repeated / type = toep subject = idnr r rcorr;
run;

• LR test Model 4 versus Model 2:


Mean Covar par −2 Ref G2 df p
1 unstr. unstr. 18 416.509
2 = slopes unstr. 14 419.477 1 2.968 4 0.5632
4 = slopes banded 8 424.643 2 5.166 6 0.5227

• Estimated covariance matrix:

    | 4.9439  3.0507  3.4054  2.3421 |
    | 3.0507  4.9439  3.0507  3.4054 |
    | 3.4054  3.0507  4.9439  3.0507 |
    | 2.3421  3.4054  3.0507  4.9439 |

• Corresponding correlation matrix:

    | 1.0000  0.6171  0.6888  0.4737 |
    | 0.6171  1.0000  0.6171  0.6888 |
    | 0.6888  0.6171  1.0000  0.6171 |
    | 0.4737  0.6888  0.6171  1.0000 |

• Standard errors of the variance components estimates:


Covariance Parameter Estimates (MLE)

Cov Parm Subject Estimate Std Error

TOEP(2) IDNR 3.05070312 0.97907984


TOEP(3) IDNR 3.40540527 0.98115569
TOEP(4) IDNR 2.34212396 1.03583358
Residual IDNR 4.94388956 0.98687143

Model 5: AR(1) covariance structure

• Linear average trend within each sex group

• Elements of V of the form Vij = σ² ρ^{|i−j|}:

            | 1    ρ    ρ²   ρ³ |
    V = σ²  | ρ    1    ρ    ρ² |
            | ρ²   ρ    1    ρ  |
            | ρ³   ρ²   ρ    1  |

• SAS program:

repeated / type = AR(1) subject = idnr r rcorr;
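The AR(1) structure can be reconstructed from the fitted values reported below (σ̂² and the lag-1 element are read off the estimated covariance matrix); a sketch:

```python
import numpy as np

# Build the AR(1) covariance matrix V_ij = sigma^2 * rho^|i-j| and check it
# against the fitted values reported below for this model.
def ar1_cov(sigma2, rho, n):
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

sigma2 = 4.8903                  # estimated residual variance
rho = 2.9687 / 4.8903            # lag-1 covariance / variance, about 0.607
V = ar1_cov(sigma2, rho, 4)
```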



• LR test Model 5 versus Models 2 and 4:


Mean Covar par −2 Ref G2 df p
1 unstr. unstr. 18 416.509
2 = slopes unstr. 14 419.477 1 2.968 4 0.5632
4 = slopes banded 8 424.643 2 5.166 6 0.5227
5 = slopes AR(1) 6 440.681 2 21.204 8 0.0066
4 16.038 2 0.0003

• The estimated covariance matrix is

    | 4.8903  2.9687  1.8021  1.0940 |
    | 2.9687  4.8903  2.9687  1.8021 |
    | 1.8021  2.9687  4.8903  2.9687 |
    | 1.0940  1.8021  2.9687  4.8903 |

• The corresponding correlation matrix is

    | 1.0000  0.6070  0.3685  0.2237 |
    | 0.6070  1.0000  0.6070  0.3685 |
    | 0.3685  0.6070  1.0000  0.6070 |
    | 0.2237  0.3685  0.6070  1.0000 |

Model 6: Random intercepts and slopes

• Linear trend for each subject

• Linear average trend for each sex group

• V is assumed of the form

    V = Z D Z' + σ² I

where

        | 1   8 |
    Z = | 1  10 |
        | 1  12 |
        | 1  14 |

and where D is unstructured.

• SAS program:
proc mixed data = growth method = ml covtest;
title ’Growth Data, Model 6’;
class sex idnr;
model measure = sex age*sex / s;
random intercept age / type = un subject = idnr g;
run;

• Estimates for D and σ²:

    D̂ = |  4.5569  −0.1983 |        σ̂² = 1.7162
         | −0.1983   0.0238 |,

• Estimate for V :

                     | 4.6216  2.8891  2.8727  2.8563 |
    Z D̂ Z' + σ̂² I =  | 2.8891  4.6839  3.0464  3.1251 |
                     | 2.8727  3.0464  4.9363  3.3938 |
                     | 2.8563  3.1251  3.3938  5.3788 |

• The corresponding estimated correlation matrix is

    | 1.0000  0.6209  0.6014  0.5729 |
    | 0.6209  1.0000  0.6335  0.6226 |
    | 0.6014  0.6335  1.0000  0.6586 |
    | 0.5729  0.6226  0.6586  1.0000 |
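The reported estimate of V can be reconstructed from D̂ and σ̂². A sketch (small discrepancies with the printed matrix are due to rounding of the reported estimates):

```python
import numpy as np

# Reconstruct the implied marginal covariance V = Z D Z' + sigma^2 I from the
# reported (rounded) estimates for Model 6.
Z = np.column_stack([np.ones(4), [8.0, 10.0, 12.0, 14.0]])
D = np.array([[4.5569, -0.1983],
              [-0.1983, 0.0238]])
sigma2 = 1.7162

V = Z @ D @ Z.T + sigma2 * np.eye(4)

# Values printed in the text; agreement is only up to rounding of D, sigma^2.
V_reported = np.array([
    [4.6216, 2.8891, 2.8727, 2.8563],
    [2.8891, 4.6839, 3.0464, 3.1251],
    [2.8727, 3.0464, 4.9363, 3.3938],
    [2.8563, 3.1251, 3.3938, 5.3788],
])
```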

Random Intercept/Compound Symmetry: Model 7

• Linear average trend for each sex group

• Subject-specific intercepts

• V is assumed of the form

    Z D Z' + σ² I4 = d J4 + σ² I4

where J4 is a (4 × 4) matrix of ones.

• This covariance structure is called exchangeable or


compound symmetry.

• All correlations are equal to d/(d + σ²)
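A quick numerical check, using the Model 7 estimates reported below (off-diagonal element d̂ and total variance d̂ + σ̂²):

```python
# Under compound symmetry, Var(Y_ij) = d + sigma^2 and Cov(Y_ij, Y_ik) = d,
# so every within-subject correlation equals d / (d + sigma^2).
# Values read off the Model 7 fit reported below.
d = 3.0306
total_var = 4.9052          # = d + sigma^2
sigma2 = total_var - d      # 1.8746

corr = d / (d + sigma2)
```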



• Possible SAS programs:


proc mixed data = growth method = ml covtest;
title ’Growth Data, Model 7’;
class sex idnr;
model measure = sex age*sex / s;
random intercept / type = un subject = idnr g;
run;

proc mixed data = growth method = ml covtest;


title ’Growth Data, Model 7’;
class sex idnr;
model measure = sex age*sex / s;
repeated / type = cs subject = idnr r rcorr;
run;

• LR test: Model 7 versus Models 1 and 6 (+random


slopes)

Mean Covar par −2 Ref G2 df p


1 unstr. unstr. 18 416.509
6 = slopes random 8 427.806 2 8.329 6 0.2150
7 = slopes CS 6 428.639 6 0.833 2 0.6594
6 0.833 1:2 0.5104

• The estimated covariance matrix is

    | 4.9052  3.0306  3.0306  3.0306 |
    | 3.0306  4.9052  3.0306  3.0306 |
    | 3.0306  3.0306  4.9052  3.0306 |
    | 3.0306  3.0306  3.0306  4.9052 |

• The corresponding correlation matrix equals

    | 1.0000  0.6178  0.6178  0.6178 |
    | 0.6178  1.0000  0.6178  0.6178 |
    | 0.6178  0.6178  1.0000  0.6178 |
    | 0.6178  0.6178  0.6178  1.0000 |

• The profiles, predicted by Model 7, are


girls : Ŷj = 17.37 + 0.4795tj ,

boys : Ŷj = 16.34 + 0.7844tj .

• While the average profiles are not exactly the same as


those from Model 2, they are extremely similar.

Independence: Model 8

• Linear average trend for each sex group

• Independence of all repeated measurements

• V is assumed of the form σ 2I

• SAS program:

repeated / type = simple subject = idnr r rcorr;

• LR test Model 8 versus Model 7


Mean Covar par −2 Ref G2 df p
1 unstr. unstr. 18 416.509
8 = slopes simple 5 478.242 7 49.603 1 <0.0001
7 49.603 0:1 <0.0001

Overview

Mean Covar par −2 Ref G2 df p


1 unstr. unstr. 18 416.509
2 = slopes unstr. 14 419.477 1 2.968 4 0.5632
3 = slopes unstr. 13 426.153 2 6.676 1 0.0098
4 = slopes Toepl. 8 424.643 2 5.166 6 0.5227
5 = slopes AR(1) 6 440.681 2 21.204 8 0.0066
4 16.038 2 0.0003
6 = slopes random 8 427.806 2 8.329 6 0.2150
7 = slopes CS 6 428.639 2 9.162 8 0.3288
4 3.996 2 0.1356
6 0.833 2 0.6594
6 0.833 1:2 0.5104
8 = slopes simple 5 478.242 7 49.603 1 <0.0001
7 49.603 0:1 <0.0001
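Several p-values in the overview table can be verified by hand. For an even number of degrees of freedom the chi-squared survival function has a closed form; a sketch:

```python
import math

# Recompute some p-values from the overview table. For an even number of
# degrees of freedom the chi-squared survival function has the closed form
#   P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
def chi2_sf_even(x, df):
    assert df % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

p_model2 = chi2_sf_even(2.968, 4)   # Model 2 vs Model 1
p_model4 = chi2_sf_even(5.166, 6)   # Model 4 vs Model 2
p_model7 = chi2_sf_even(0.833, 2)   # Model 7 vs Model 6
```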

4.11 Linear Mixed Models

    Yi = Xi β + Zi bi + εi

    bi ∼ N (0, D),
    εi ∼ N (0, Σi),
    b1, . . . , bN , ε1, . . . , εN independent,

• Inference for the marginal model: ML and REML

• Inference for random effects: (Empirical) Bayes

4.12 Inference in the marginal linear


mixed model

4.12.1 The hierarchical versus marginal model

• Distribution of Yi given bi:


Yi|bi ∼ N (Xiβ + Zibi, Σi)
with density function f (yi|bi)

• Distribution of bi:
bi ∼ N (0, D)
with density function f (bi)

• The marginal density of Yi is then given by

    f (yi) = ∫ f (yi|bi) f (bi) dbi

which is the density function of an ni-dimensional
normal distribution with mean vector Xi β and with
covariance matrix Vi = Zi D Zi' + Σi

• The marginal model equals

    Yi ∼ N (Xi β, Zi D Zi' + Σi)

• E(Yi) = Xi β
  Var(Yi) = Vi = Zi D Zi' + Σi

• Hence, the random-effects structure implies a


covariance structure of a very specific form.
e.g. random intercepts and slopes for time
=⇒ variance is a quadratic function over time.

• Note that the hierarchical model implies the marginal


model, but not vice versa.

• Therefore, inferences based on the marginal model do


not explicitly assume the presence of random effects
representing the natural heterogeneity between
subjects.

4.12.2 Notation and terminology

• The only parameters in the marginal model are β, D


and the parameters in the Σi

• Let α contain all q(q + 1)/2 parameters in D, and all


parameters in the Σi.

• The elements of β are called fixed effects

• The elements of α are called variance components

• We denote θ = (β′, α′)′

4.12.3 The Autocorrelation Function

• Often, the correlation between two measurements


within a subject only depends on the time lag
|tij − tik | between these two measurements.

• Autocorrelation function:
ρ(u) = corr(e(t), e(t − u)), u≥0

• For data sets where all subjects are measured at the


same, equally spaced time points, ρ(u) can be studied
by calculating the correlation between all
measurements with a specific time lag u

• For unbalanced data, with constant variance function,


ρ(u) can be studied by means of the so-called
variogram.

4.12.4 The Variogram

• We assume e(t) to be (weakly) stationary:


– Constant mean (zero)

– Constant variance: σ 2(t) = σ 2

– Corr(e(t), e(t − u)) only depends on u

• One can then show that the variogram, defined by

    γ(u) = (1/2) E[ (ei(t) − ei(t − u))² ]

is equal to

    γ(u) = σ² [1 − ρ(u)]

• Hence, if γ(u) and σ² can be estimated from the
data, the autocorrelation function ρ(u) can be
studied.

• γ(u) is estimated by smoothing the scatterplot of

    vijk = (1/2) (rij − rik)²

versus the time lags

    uijk = |tij − tik|

The resulting estimate γ̂(u) is called the empirical or
sample variogram.

• It follows from

    (1/2) E[ (ei(t) − ej(t − u))² ] = σ²,   for i ≠ j,

that σ² can be estimated as the average of
(rik − rjl)²/2 over all pairs of residuals from
different subjects (i ≠ j).
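The building blocks of the sample variogram are easy to compute. A sketch on hypothetical residuals (the helper name and toy data are illustrative):

```python
import numpy as np

# Sketch of the empirical variogram ingredients: half-squared differences
# v_ijk between residuals of the same subject, paired with time lags u_ijk.
def variogram_points(times, residuals):
    """times, residuals: lists (one entry per subject) of 1-d arrays."""
    u, v = [], []
    for t, r in zip(times, residuals):
        n = len(t)
        for j in range(n):
            for k in range(j + 1, n):
                u.append(abs(t[j] - t[k]))
                v.append(0.5 * (r[j] - r[k]) ** 2)
    return np.array(u), np.array(v)

# Toy example: two subjects observed at three time points each.
times = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0, 4.0])]
resid = [np.array([0.1, -0.2, 0.3]), np.array([0.0, 0.4, -0.1])]
u, v = variogram_points(times, resid)
```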

4.13 Empirical Bayes Methods for the


Random Effects

4.13.1 Estimation of the Random Effects bi

How can we estimate the bi,

which are unobservable random variables ?

Bayesian methods

1. We first assume θ = (β, α) known


=⇒ bi(θ) (= Bayes estimation)

2. Afterwards, we replace θ by its ML or REML


estimator
=⇒ bi = bi(θ) 
(= empirical Bayes estimation)

4.13.2 Empirical Bayes estimates bi

• The model for the data, conditional on bi:


yi | bi ∼ N (Xiβ + Zibi, Σi)

• The prior distribution for bi:


bi ∼ N (0, D)

• The posterior density then equals

    f (bi|yi) = f (yi|bi) f (bi) / f (yi)

              ∝ f (yi|bi) f (bi)

              ∝ . . .

              ∝ exp{ −(1/2) [bi − D Zi' Vi^{-1}(yi − Xi β)]'
                     Λi^{-1} [bi − D Zi' Vi^{-1}(yi − Xi β)] }

for some positive definite matrix Λi.



• Hence, the posterior distribution is given by

    bi | yi ∼ N ( D Zi' Vi^{-1}(yi − Xi β), Λi )

• The EB estimate for bi then equals

    b̂i = E(bi | yi) = D Zi' Vi^{-1}(yi − Xi β),

in which all parameters are replaced by their ML or
REML estimates.

• Histograms and scatterplots of certain components of
b̂i are frequently used to detect model deviations or
subjects with ‘exceptional’ evolutions over time.

• Required SAS code:

random intercept time time2


/ type = un subject = id g solution;
make ’solutionR’ out=randeff noprint;
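The EB formula can be illustrated on a toy random-intercept model with known parameters (all numbers below are hypothetical); it also exhibits the shrinkage discussed next, since the estimate pulls the subject's mean deviation towards zero:

```python
import numpy as np

# Toy random-intercept sketch of the EB formula
#   b_hat_i = D Z_i' V_i^{-1} (y_i - X_i beta)
# with known parameters, to show the shrinkage effect.
n = 4
Z = np.ones((n, 1))          # random intercept only
d, sigma2 = 2.0, 1.0         # D = [[d]], Sigma_i = sigma2 * I
D = np.array([[d]])
V = Z @ D @ Z.T + sigma2 * np.eye(n)

beta_mean = 5.0              # X_i beta, constant over time here
y = np.array([7.0, 6.5, 7.5, 7.0])

b_hat = (D @ Z.T @ np.linalg.inv(V) @ (y - beta_mean))[0]

# For this model the EB estimate is a shrunken subject mean:
#   b_hat = (n*d / (n*d + sigma2)) * (ybar - beta_mean)
ybar = y.mean()
b_closed = n * d / (n * d + sigma2) * (ybar - beta_mean)
```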

4.13.3 Shrinkage estimators b̂i

• Consider the prediction of the evolution of the ith
subject:

    Ŷi ≡ Xi β̂ + Zi b̂i

       = Xi β̂ + Zi D Zi' Vi^{-1}(yi − Xi β̂)

       = (Ini − Zi D Zi' Vi^{-1}) Xi β̂ + Zi D Zi' Vi^{-1} yi

       = Σi Vi^{-1} Xi β̂ + (Ini − Σi Vi^{-1}) yi,

• Hence, Ŷi is a weighted mean of the
population-averaged profile Xi β̂ and the observed
data yi, with weights Σi Vi^{-1} and Ini − Σi Vi^{-1}
respectively.

• Note that Xi β̂ gets much weight if the residual
variability is ‘large’ in comparison to the total
variability.

• This phenomenon is usually called shrinkage:

The observed data are shrunk towards the


prior average profile which is Xiβ,
since the prior mean of the random effects was zero.

• This is also reflected in the fact that for any linear
combination λ'bi of the random effects,

    var(λ' b̂i) ≤ var(λ' bi).
Chapter 5

Case Study: Vaccination Trial

5.1 Introduction

• Hepatitis A vaccination trial


– 120 patients recruited
– 109 selected for analysis

• Subjects taken from hospitals within the Antwerp


region

• Month 0–6 vaccination schedule

• Trial initiated in 1992 with yearly follow-up

• Response: (log10) antibody titer

• Covariates: lot, age, sex, weight, height, (BMI)


CHAPTER 5. CASE STUDY: VACCINATION TRIAL 76

5.2 Questions

• Difference between lots

• Relation with baseline covariates

• Prediction

For modeling purposes, we restrict the analysis to


post-vaccination data.

visit1 visit2 visit3 visit4 visit5 visit6 visit7


visit1 1.00000 0.70230 0.64406 0.70146 0.63716 0.67282 0.68626
102 77 66 72 63 64 57
visit2 0.70230 1.00000 0.92642 0.92258 0.91542 0.92323 0.90638
77 80 66 71 63 64 54
visit3 0.64406 0.92642 1.00000 0.94630 0.93187 0.93040 0.87529
66 66 70 62 55 59 47
visit4 0.70146 0.92258 0.94630 1.00000 0.97291 0.97037 0.92452
72 71 62 74 62 63 52
visit5 0.63716 0.91542 0.93187 0.97291 1.00000 0.97662 0.93980
63 63 55 62 66 59 51
visit6 0.67282 0.92323 0.93040 0.97037 0.97662 1.00000 0.94029
64 64 59 63 59 68 50
visit7 0.68626 0.90638 0.87529 0.92452 0.93980 0.94029 1.00000
57 54 47 52 51 50 60

5.3 Selection of a Covariance Structure

• Use of an over-specified model for the mean structure:


– INTERCEPT, AGE, SEX, BMI, LOGBASE
– all covariates are allowed to have time-varying
effects

• Initial random effects: intercept, time, time2

• Models with serial correlation (exponential, gaussian)


show convergence difficulties.

• Selection of the covariance structure:

  Model  Covariance structure              # param.  Deviance  Comp. model   G²    d.f.  p-value
  A      Int., time, time²                    7       -384.2
  B      Int., time                           4       -319.8        A        64.4    †   < 0.0001
  C      Int.                                 2       -239.6        B        80.2   ††   < 0.0001
  D      Int., time, time² + EXP(TIME)        9       -432.3        A        48.1    2   < 0.0001
  E      Int., time, time² + EXP(TIMECLS)    14       -440.8        D         8.5    4     0.075

† 50-50 mixture of χ²₂ and χ²₃ distributions.

†† 50-50 mixture of χ²₁ and χ²₂ distributions.
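The mixture null distributions in the footnotes arise because variance components are tested on their boundary; the corresponding p-value is easy to compute directly (a sketch using closed-form chi-square tail probabilities; the test statistic comes from the table above):

```python
import math

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for df = 1, 2 or 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("closed form implemented only for df in {1, 2, 3}")

def mixture_pvalue(g2, dfs, weights=(0.5, 0.5)):
    """P-value under a mixture of chi-square null distributions."""
    return sum(w * chi2_sf(g2, df) for w, df in zip(weights, dfs))

# Model C vs. B: 50-50 mixture of chi-square(1) and chi-square(2), G^2 = 80.2
print(mixture_pvalue(80.2, (1, 2)))   # far below 0.0001, as reported
```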

5.3.1 Note: Log-Linear Variance Model

• Linear mixed model:


Yij = Xij β + Zij γi + εij
with
– γi ∼ N (0, G)

– (εi1, . . . , ini ) ∼ N (0, Ri).

• Additional variance parameters can be incorporated


by adding a diagonal matrix to Ri.

• This could be as simple as σ²I, or more complex:

  σ² diag[exp(Uδ)]

• The latter is a log-linear variance model and produces


exponential local effects (also known as dispersion
effects).
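A numerical sketch of the log-linear variance model (the values of σ², δ and the covariate matrix U below are hypothetical):

```python
import numpy as np

sigma2, delta = 0.045, -0.17          # hypothetical variance parameters
U = np.array([[1.0], [6.0], [12.0]])  # e.g. months since vaccination

# Log-linear variance model: residual variances sigma2 * exp(U delta)
Ri = sigma2 * np.diag(np.exp(U @ np.array([delta])))
print(np.round(np.diag(Ri), 4))
```

With a negative δ the residual variances decrease multiplicatively in the covariate, which is the dispersion effect referred to above.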

5.4 Simplification of the Mean


Structure

• No time-varying effects of the covariates are needed.

• No covariates remain in the model after adjusting for


log antibody level at time of second vaccination.

• Time trend is modeled using fractional polynomials.

• Powers retained: time⁻² and log(time)

• Use time⁻² and log(time) as random effects instead of time and time²

• Code:

proc mixed data=postvacc method=ml covtest scoring=5;


class patid pvmthcls;
model loganti = logpvmth pvmthm2 logbase|logbase / s ;
random int logpvmth pvmthm2 / sub=patid type=un;
repeated pvmthcls / sub=patid local=exp(pvmonth);
run;

5.4.1 Note: Fractional Polynomials

Royston P. & Altman D. (1994). Regression using fractional polynomials of continuous

covariates: parsimonious parametric modeling. Applied Statistics, 43, 429–67.

• A fractional polynomial (FP) of degree m:

  φm(X; p) = β0 + Σ_{j=1}^{m} βj X^(pj),

  where p = (p1, . . . , pm) is a real-valued vector of powers with p1 < . . . < pm, and

  X^(pj) = X^pj   if pj ≠ 0,
         = ln(X)  if pj = 0.

• FP extend the family of traditional polynomials.


FP provide a wide range of functional forms.
FP are a very flexible tool for parametric modeling.

• Usually, m does not need to be large (m = 1, 2).

• Choose a predefined set of fixed powers (e.g. -2 to 2


by 0.5).

• Select the best combination of powers by checking


deviances compared to φ1(X; 1) (straight line).
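The power-selection recipe above can be sketched with a small grid search (simulated data; ordinary least-squares deviances stand in for the mixed-model deviances, and all data-generating values are hypothetical):

```python
import itertools
import math
import numpy as np

def fp_term(x, p):
    """Fractional-polynomial transform: x**p, with ln(x) for p == 0."""
    return np.log(x) if p == 0 else x ** p

def deviance(y, X):
    """-2 log-likelihood (up to a constant) of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return len(y) * math.log(rss / len(y))

rng = np.random.default_rng(0)
t = rng.uniform(0.5, 7, 200)                      # hypothetical follow-up times
y = 3.3 - 0.23 * np.log(t) - 0.13 * t**-2 + rng.normal(0, 0.2, 200)

powers = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]  # predefined grid
best = min(
    itertools.combinations(powers, 2),
    key=lambda pq: deviance(y, np.column_stack(
        [np.ones_like(t), fp_term(t, pq[0]), fp_term(t, pq[1])])),
)
print(best)   # powers selected by deviance
```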

• PROC MIXED output:

The Mixed Procedure

Model Information

Data Set WORK.POSTVACC


Dependent Variable LOGANTI
Covariance Structures Unstructured, Variance
Components, Local
Exponential
Subject Effects PATID, PATID
Estimation Method ML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Containment

Dimensions

Covariance Parameters 9
Columns in X 5
Columns in Z Per Subject 3
Subjects 107
Max Obs Per Subject 7
Observations Used 513
Observations Not Used 250
Total Observations 763

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) PATID 0.1211


UN(2,1) PATID -0.01740
UN(2,2) PATID 0.008859
UN(3,1) PATID -0.06872
UN(3,2) PATID 0.01606
UN(3,3) PATID 0.03872
pvmthcls PATID 0.008318
EXP pvmonth -0.1718
Residual 0.04505

Fit Statistics

Log Likelihood 189.9


Akaike’s Information Criterion 180.9
Schwarz’s Bayesian Criterion 168.9
-2 Log Likelihood -379.8

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

8 681.48 <.0001

Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 3.3133 0.09280 103 35.70 <.0001


logpvmth -0.2266 0.01373 86 -16.50 <.0001
pvmthm2 -0.1276 0.04115 80 -3.10 0.0027
logbase -0.1891 0.09867 239 -1.92 0.0565
logbase*logbase 0.1553 0.03166 239 4.91 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

logpvmth 1 86 272.35 <.0001


pvmthm2 1 80 9.62 0.0027
logbase 1 239 3.67 0.0565
logbase*logbase 1 239 24.08 <.0001

5.5 How Does the Model Fit ?



5.6 Prediction From the Model

• After modeling was completed, new data were


gathered on 74 subjects (Month 84).

• These can be compared with predictions obtained


from the model:
Chapter 6

Case Study: Surrogate Markers

• Primary motivation
– True endpoint is rare and/or distant
– Surrogate endpoint is frequent and/or close in time

• Secondary motivation

True endpoint is
– invasive
– uncomfortable
– costly
– confounded
∗ by secondary treatments
∗ by competing risks

CHAPTER 6. CASE STUDY: SURROGATE MARKERS 92

6.1 Age-Related Macular Degeneration

Pharmacological Therapy for Macular Degeneration Study Group (1997)

Z: Interferon-α
• 0: placebo
• 1: 6MIU

T : Visual acuity at 6 months


• continuous
• binary: discretized version or loss of 2 lines of
vision

S: Visual acuity at 1 year


• continuous
• binary: discretized version or loss of 3 lines of
vision

N : 190
• 36 centers
• # patients per center ∈ [2; 18]

Visual Acuity

Visual Acuity = number of letters correctly read

[Figure: eye chart in which the letters spell “VALIDATION OF SURROGATE MARKERS IN RANDOMIZED EXPERIMENT”]

ARMD Data

6.2 Advanced Ovarian Cancer

4 randomized multicenter trials in advanced ovarian cancer

Ovarian Cancer Meta-Analysis Project (1991)

Z: Two treatment modalities


• 0: cyclophosphamide plus cisplatin (CP)
• 1: cyclophosphamide plus adriamycin plus cisplatin (CAP)

T : (Log of) Survival time


• continuous
• Time in weeks from randomization to death from any cause

S: (Log of) Time to progression


• continuous
• Time in weeks from randomization to clinical progression of the
disease or death due to the disease

N : 1194
• Individual data available on every randomized patient
• 952 (80%) have progression/death
• 50 units
• # patients per unit ∈ [2; 274]

Advanced Ovarian Cancer

6.3 Advanced Colorectal Cancer

CORFU Study

Z: Two treatment modalities


• 0: 5FU
• 1: 5FU plus folinic acid or 5FU plus interferon

T : Survival time

S: Time to progression

N : 736
• Individual data available on every randomized patient
• 694 (94.3%) have progression/death
• 76 units
• # patients per unit ∈ [2; 38]

Advanced Colorectal Cancer



6.4 Definition of Surrogate Endpoint

Prentice (Bcs 1989)

“A test of H0 of no effect of treatment on surrogate is


equivalent to a test of H0 of no effect of treatment on
true endpoint.”

(S | treated) = (S | control)  ⇐⇒  (T | treated) = (T | control)

6.5 ARMD: Prentice’s Criteria

Criterion 1: Treatment Z is prognostic for surrogate S

• Sij |Zij = µS + αZij + εSij

• α = 2.83 (s.e. 1.86, P = 0.13)

Criterion 2: Treatment Z is prognostic for true


endpoint T

• Tij |Zij = µT + βZij + εT ij

• β = 4.12 (s.e. 2.32, P = 0.079)

Criterion 3: Surrogate S is prognostic for true


endpoint T

• Tij |Sij = µ + γSij + εij

• γ = 0.95 (s.e. 0.06, P < 0.0001)



6.6 Proportion Explained

Freedman et al (SiM 1992)

• Description:
4. The full effect of Z on T is explained by S

• Model:
Tij |Zij , Sij = µ̃T + βS Zij + γZ Sij + ε̃T ij ,

• Definition:

  PE(T, S, Z) = (β − βS) / β

• Estimate:
– P E = 0.65 (95% C.I. [−0.22; 1.51])

• But: problems with PE
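The proportion explained can be computed from the two regressions above, of T on Z without and with adjustment for S; a sketch on simulated data (all effect sizes are hypothetical):

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 200
Z = rng.integers(0, 2, n).astype(float)       # randomized treatment
S = 2.8 * Z + rng.normal(0, 2, n)             # surrogate endpoint
T = 1.0 * Z + 0.9 * S + rng.normal(0, 1, n)   # true endpoint

ones = np.ones(n)
beta = ols(np.column_stack([ones, Z]), T)[1]        # effect of Z on T
beta_S = ols(np.column_stack([ones, Z, S]), T)[1]   # effect of Z on T, given S
PE = (beta - beta_S) / beta
print(round(PE, 2))
```

The ratio estimator makes the problems below apparent: nothing constrains beta_S to lie between 0 and beta, so PE can fall outside the unit interval.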



6.7 Criticism

• Prentice criteria neither necessary nor


sufficient
– except in the binary/binary case

• PE suffers from severe problems:

  – PE not restricted to the unit interval
Volberding et al (1990)

Choi et al (1993)

– confidence limits (Fieller or delta) tend to be wide


Lin, Fleming, and DeGruttola (1997)

∗ unless large sample sizes


∗ unless very strong effect of Z on T

• Proposal: two new criteria:


Buyse and Molenberghs (1998)

– Relative Effect
– Adjusted Association

6.8 Relative Effect

• Can we link the effect of Z on S to the effect of Z


on T ?

• Description:
4A. The effect of Z on S predicts a clinically useful
effect of Z on T

• Definition:

  RE(T, S, Z) = β / α

• Estimate:
– RE = 1.45 (95% C.I. [−0.48; 3.39])

6.9 Adjusted Association

• What is the association between S and T , after


correction for Z ?

• Description:
4B. The correlation between S and T after correction
for Z

• Definition:
ρZ = Corr(S, T |Z)

• Estimate:
– ρZ = 0.75 (95% C.I. [0.69; 0.82])
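The relative effect and the adjusted association can be computed analogously; a sketch on simulated data (hypothetical effect sizes, not the ARMD estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
Z = rng.integers(0, 2, n).astype(float)
S = 2.8 * Z + rng.normal(0, 2, n)
T = 4.1 * Z + 0.9 * (S - 2.8 * Z) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), Z])
coef_S = np.linalg.lstsq(X, S, rcond=None)[0]
coef_T = np.linalg.lstsq(X, T, rcond=None)[0]
alpha, beta = coef_S[1], coef_T[1]   # effects of Z on S and on T
RE = beta / alpha                    # relative effect

# Adjusted association: correlation of the residuals after removing Z
eps_S = S - X @ coef_S
eps_T = T - X @ coef_T
rho_Z = np.corrcoef(eps_S, eps_T)[0, 1]
print(round(RE, 2), round(rho_Z, 2))
```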

6.10 Use of RE and Adjusted


Association

• Criticism: PE not useful


Molenberghs and Buyse (1998)

• For normal endpoints:

  PE = δ · ρZ / RE
• The two new quantities have clear meaning
– Relative Effect: trial-level measure of surrogacy
Can we translate the treatment effect on the surrogate to the
treatment effect on the endpoint, in a sufficiently precise way ?

– Adjusted Association: individual-level measure


of surrogacy
After accounting for the treatment effect, is the surrogate endpoint
predictive for a patient’s true endpoint ?

• BUT:
The RE is based on a single trial ⇒ regression
through the origin, based on one point !

6.11 Analysis Based on Several Trials

• Context:
– multicenter trials
– meta analysis
– several meta analyses

• Extensions:

– Relative Effect −→ Trial-Level Surrogacy


How close is the relationship between the
treatment effects on the surrogate and true
endpoints, based on the various trials (units) ?

– Adjusted Association −→ Individual-Level


Surrogacy
How close is the relationship between the
surrogate and true outcome, after accounting for
trial and treatment effects ?

Is Considered a Useful Idea

Albert et al (SiM 1998)

“There has been little work on alternative


statistical approaches. A meta-analysis approach
seems desirable to reduce variability.
Nevertheless, we need to resolve basic problems
in the interpretation of measures of surrogacy
such as PE as well as questions about the
biologic mechanisms of drug action.”

6.12 Statistical Model

• Model:
Sij |Zij = µSi + αiZij + εSij
Tij |Zij = µT i + βiZij + εT ij

• Error structure:

– Individual level:
∗ Deviations εSij and εT ij are correlated

– Trial level:
∗ Treatment effects αi and βi are correlated
∗ (Information from intercepts µSi and µT i can be
used as well)

Statistical Model

• Model:
Sij |Zij = µSi + αiZij + εSij
Tij |Zij = µT i + βiZij + εT ij

• Error structure:

      ( σSS  σST )
  Σ = ( σST  σTT )

• Trial-specific effects:

  ( µSi )   ( µS )   ( mSi )
  ( µTi ) = ( µT ) + ( mTi )
  ( αi  )   ( α  )   ( ai  )
  ( βi  )   ( β  )   ( bi  )

• Error structure of random effects:

      ( dSS  dST  dSa  dSb )
  D = ( dST  dTT  dTa  dTb )
      ( dSa  dTa  daa  dab )
      ( dSb  dTb  dab  dbb )

6.13 Methods of Estimation

Endpoints dimension:
• Both endpoints together
• Each endpoint separately

Center dimension:
• Center as fixed effect
• Center as random effect

Measurement error:
• No adjustment
• Adjustment by sample size per trial
• Full correction using Stijnen’s approach

6.14 ARMD: Trial-Level Surrogacy


[Figure: estimated treatment effects on change in visual acuity at 12 months (vertical axis) versus at 6 months (horizontal axis), per center]

• Prediction:
– What do we expect ?
E(β + b0|mS0, a0)
– How precisely can we estimate it ?
Var(β + b0|mS0, a0)

• Estimate:

  – R²trial = 0.692 (95% C.I. [0.52; 0.86])

ARMD: Trial-Level Surrogacy

• Prediction:

  E(β + b0 | mS0, a0)  = β + (dSb, dab) ( dSS  dSa ; dSa  daa )⁻¹ (µS0 − µS, α0 − α)′

  Var(β + b0 | mS0, a0) = dbb − (dSb, dab) ( dSS  dSa ; dSa  daa )⁻¹ (dSb, dab)′

• Trial-level association:

  R²_{bi|mSi,ai} = (dSb, dab) ( dSS  dSa ; dSa  daa )⁻¹ (dSb, dab)′ / dbb

• Estimate:

  – R²_{bi|mSi,ai} = 0.692 (95% C.I. [0.52; 0.86])
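The conditional-variance formulas above reduce to a few lines of linear algebra; a sketch with a hypothetical random-effects covariance matrix D (not the ARMD estimates):

```python
import numpy as np

# Hypothetical random-effects covariance matrix D, ordered (mS, mT, a, b)
D = np.array([
    [1.00, 0.60, 0.40, 0.50],
    [0.60, 1.20, 0.30, 0.70],
    [0.40, 0.30, 0.90, 0.60],
    [0.50, 0.70, 0.60, 1.10],
])

# Condition b (index 3) on (mS, a) = indices (0, 2)
idx = [0, 2]
cov_b_pred = D[3, idx]              # (d_Sb, d_ab)
V_pred = D[np.ix_(idx, idx)]        # [[d_SS, d_Sa], [d_Sa, d_aa]]

var_cond = D[3, 3] - cov_b_pred @ np.linalg.solve(V_pred, cov_b_pred)
R2_trial = 1 - var_cond / D[3, 3]   # = cov' V^{-1} cov / d_bb
print(round(R2_trial, 3))
```

R²_trial close to 1 means the trial-specific treatment effect on T is almost perfectly predictable from the surrogate information (mS0, a0).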

6.15 ARMD: Individual-Level Surrogacy


[Figure: residuals for change in visual acuity at 12 months (vertical axis) versus at 6 months (horizontal axis), per patient]

• Individual-level association:

  ρZ = Rindiv = Corr(εTi, εSi)

• Estimate:

  – R²indiv = 0.483 (95% C.I. [0.38; 0.59])
  – Rindiv = 0.69 (95% C.I. [0.62; 0.77])
  – Recall ρZ = 0.75 (95% C.I. [0.69; 0.82])



ARMD: Individual-Level Surrogacy

• Conditional density:

  Tij | Zij, Sij ∼ N( µTi − σTS σSS⁻¹ µSi + (βi − σTS σSS⁻¹ αi) Zij + σTS σSS⁻¹ Sij ;  σTT − σ²TS σSS⁻¹ )

• Individual-level association:

  ρ²Z = R²_{εTi|εSi} = σ²ST / (σSS σTT)

• Estimate:

  – R²_{εTi|εSi} = 0.483 (95% C.I. [0.38; 0.59])
  – R_{εTi|εSi} = 0.69 (95% C.I. [0.62; 0.77])
  – Recall ρZ = 0.75 (95% C.I. [0.69; 0.82])
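The individual-level measure follows directly from the residual covariance matrix Σ; a sketch with hypothetical values:

```python
import numpy as np

# Hypothetical residual covariance matrix of (S, T), given trial and treatment
Sigma = np.array([[4.0, 2.6],
                  [2.6, 3.1]])

sigma_SS, sigma_ST, sigma_TT = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
R2_indiv = sigma_ST**2 / (sigma_SS * sigma_TT)   # squared adjusted association
rho_Z = sigma_ST / np.sqrt(sigma_SS * sigma_TT)  # adjusted association itself
print(round(R2_indiv, 3), round(rho_Z, 3))
```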



6.16 Ovarian: Trial-Level Surrogacy

• Prediction:
– What do we expect ?
E(β + b0|mS0, a0)
– How precisely can we estimate it ?
Var(β + b0|mS0, a0)

• Estimate:

  – R²trial = 0.940 (95% C.I. [0.91; 0.97])

6.17 Ovarian: Individual-Level


Surrogacy

• Individual-level association:

  ρZ = Rindiv = Corr(εTi, εSi)

• Estimate:

  – R²indiv = 0.886 (95% C.I. [0.87; 0.90])
  – Rindiv = 0.941 (95% C.I. [0.93; 0.95])
  – ρZ = 0.944 (95% C.I. [0.94; 0.95])



6.18 Ovarian: Prediction

unit   # patients   α0             E(β + b0 | a0)   β + b0

6      17           -0.58 (0.33)   -0.45 (0.29)     -0.56 (0.32)
8      10            0.67 (0.76)    0.49 (0.57)      0.76 (0.39)
55     31            1.08 (0.56)    0.80 (0.44)      0.79 (0.45)
DAC    274           0.25 (0.15)    0.17 (0.13)      0.14 (0.14)
GON    125           0.15 (0.25)    0.10 (0.20)      0.03 (0.22)



6.19 Colorectal: Trial-Level Surrogacy

• Prediction:
– What do we expect ?
E(β + b0|mS0, a0)
– How precisely can we estimate it ?
Var(β + b0|mS0, a0)

• Estimate:

  – R²trial = 0.454 (95% C.I. [0.23; 0.68])

6.20 Colorectal: Individual-Level


Surrogacy

• Individual-level association:

  ρZ = Rindiv = Corr(εTi, εSi)

• Estimate:

  – R²indiv = 0.665 (95% C.I. [0.62; 0.71])
  – Rindiv = 0.815
  – ρZ = 0.805
Chapter 7

Case Study: The Prostate Cancer Data

7.1 The use of PSA to detect prostate


cancer

• U.S.: one of the most common and most costly


medical problems, and the second leading cause of
male cancer deaths.

• Important to look for markers which can detect the


disease in an early stage.

• Prostate-specific antigen (PSA): its level in the blood is proportional to the volume of prostate tissue.

• Still, an elevated PSA level is not necessarily an


indicator of prostate cancer because also patients with

CHAPTER 7. CASE STUDY: THE PROSTATE CANCER DATA 122

benign prostatic hyperplasia (BPH) have an enlarged


volume of prostate tissue and therefore also an
increased PSA level.

• Clinical practice suggests that the rate of change in


PSA level might be a more accurate method of
detecting prostate cancer in the early stages of the
disease.
=⇒ Longitudinal Data

7.1.1 The Baltimore Longitudinal Study of


Aging (BLSA)

• Started in 1958, still ongoing

• Over 1500 males and 900 females enrolled

• Volunteers, predominantly white (95 per cent),


well-educated (over 75 per cent have college degrees),
and financially comfortable (82 per cent)

• Participants return approximately every two years for


three days of biomedical and psychological
examinations.

• Data from repeated clinical examinations and a bank


of frozen blood samples.

• An average of 7 visits and 16 years of follow-up.



7.1.2 The prostate data from the BLSA

• Retrospective case-control study


• 18 prostate cancer patients (14+4)
20 BPH cases
16 controls
• Inclusion criteria:
1. seven or more years of follow-up prior to diagnosis
of prostate cancer, simple prostatectomy for BPH,
or exclusion of prostate disease by a urologist,
2. confirmation of the pathological diagnosis, and
3. no prostate surgery prior to diagnosis.

• To the extent possible, age at diagnosis and years of


follow-up was matched for the control, BPH and
cancer groups. However, due to the high prevalence
of BPH in men over age 50, it was difficult to find
age-matched controls with no evidence of prostate
disease. In fact, the control group remained
significantly younger at first visit and at diagnosis
compared to the BPH group, which makes it
necessary to control for age at diagnosis in all
statistical analyses.

7.1.3 Descriptive statistics

Cancer Cases
Controls BPH cases L/R M
Number of participants 16 20 14 4
Age at diagnosis (years)
median 66 75.9 73.8 72.1
range 56.7-80.5 64.6-86.7 63.6-85.4 62.7-82.8
Years of follow up
median 15.1 14.3 17.2 17.4
range 9.4-16.8 6.9-24.1 10.6-24.9 10-25.3
Time between
measurements (years)
median 2 2 1.7 1.7
range 1.1-11.7 0.9-8.3 0.9-10.8 0.9-4.8
Number of measurements
per individual
median 8 8 11 9.5
range 4-10 5-11 7-15 7-12

Complications

• Unequal number of repeated measurements per


individual: ni
• Measurements taken at arbitrary timepoints: tij

7.2 A two-stage model

7.2.1 General idea of two-stage models

Stage 1:

Assume that the data of each subject separately

can be well described by a linear regression model.

Stage 2:

Use regression techniques to investigate

the effects of age, diagnostic group, . . .

on the subject-specific regression coefficients

defined in the first stage.



7.2.2 Applied to the prostate data

Yij = Yi(tij) = ln(PSAi(tij) + 1)

We assume that each profile can be described

by a quadratic function over time

Stage 1:

Yij = β1i + β2i tij + β3i t²ij + εij ,   j = 1, . . . , ni

Stage 2:

















β1i = β1 Agei + β2 Ci + β3 Bi + β4 Li + β5 Mi + b1i
β2i = β6 Agei + β7 Ci + β8 Bi + β9 Li + β10 Mi + b2i
β3i = β11 Agei + β12 Ci + β13 Bi + β14 Li + β15 Mi + b3i,

where

  Agei = age at time of diagnosis

  Ci = 1 if control, 0 otherwise
  Bi = 1 if BPH case, 0 otherwise
  Li = 1 if L/R cancer case, 0 otherwise
  Mi = 1 if metastatic cancer case, 0 otherwise

• β2, β3, β4, β5 are the average intercepts after


correction for age.

• β7, β8, β9, β10 are the average slopes for time after
correction for age.

• β12, β13, β14, β15 are the average slopes for time2 after
correction for age.

7.2.3 Matrix notation for two-stage models:

Stage 1:

  Yi = Zi βi + εi,

where

       ( Yi1  )        ( εi1  )        ( 1  ti1   t²i1  )        ( β1i )
  Yi = ( Yi2  ),  εi = ( εi2  ),  Zi = ( 1  ti2   t²i2  ),  βi = ( β2i )
       (  ⋮   )        (  ⋮   )        ( ⋮   ⋮     ⋮    )        ( β3i )
       ( Yini )        ( εini )        ( 1  tini  t²ini )

Stage 2:

  βi = Bi β + bi,

where Bi is the appropriate (3 × 15) matrix of covariates,
β is equal to (β1, . . . , β15)′, and bi = (b1i, b2i, b3i)′.
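The two-stage procedure can be sketched numerically: fit a quadratic per subject by least squares, then regress the subject-specific coefficients on covariates (simulated data; a didactic OLS version, not the mixed-model fit, and all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj, n_obs = 30, 8

# Simulate subject-specific quadratic profiles (hypothetical parameters)
age = rng.uniform(55, 85, n_subj)
true_coef = np.column_stack([
    0.02 * age + rng.normal(0, 0.3, n_subj),    # intercepts
    0.5 + rng.normal(0, 0.2, n_subj),           # slopes
    0.1 + rng.normal(0, 0.05, n_subj),          # curvatures
])

# Stage 1: per-subject OLS of Y on (1, t, t^2)
stage1 = np.empty((n_subj, 3))
for i in range(n_subj):
    t = np.sort(rng.uniform(0, 2.5, n_obs))
    Zi = np.column_stack([np.ones(n_obs), t, t**2])
    y = Zi @ true_coef[i] + rng.normal(0, 0.15, n_obs)
    stage1[i] = np.linalg.lstsq(Zi, y, rcond=None)[0]

# Stage 2: regress each estimated coefficient on covariates (here: age)
B = np.column_stack([np.ones(n_subj), age])
stage2 = np.linalg.lstsq(B, stage1, rcond=None)[0]
print(np.round(stage2[1], 3))   # estimated age effects on intercept, slope, curvature
```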

7.3 Linear mixed models

7.3.1 Stage 1 + Stage 2

Yij = Yi(tij )

= β1Agei + β2Ci + β3Bi + β4Li + β5Mi

+ (β6Agei + β7Ci + β8Bi + β9Li + β10Mi) × tij

+ (β11Agei + β12Ci + β13Bi + β14Li + β15Mi) × t2ij

+ b1i + b2i × tij + b3i × t2ij + εij

or equivalently

Yi = Xiβ + Zibi + εi

where Xi = ZiBi is a (ni × 15) matrix of covariates.



7.4 Fitting linear mixed models in SAS

PSA example

• Factor group defined by:


– control: group = 1
– BPH: group = 2
– local cancer: group = 3
– metastatic cancer: group = 4

• time and timeclss are time, expressed in decades


before diagnosis

• age is age at the time of diagnosis

• lnpsa = ln(P SA + 1)

• Model with age-corrected quadratic evolutions within


each diagnostic group.

• Random intercepts and slopes for time and time2.

• We assume Σi = σ²Ini

SAS program

proc mixed data = prostate method = ml;


class id group timeclss;
model lnpsa = group age group*time age*time
group*time2 age*time2 / noint solution;
random intercept time time2 / type = un subject = id;
repeated timeclss / type = simple subject = id;
run;

• PROC MIXED statement:


– calls procedure MIXED

– specifies data-set (records correspond to occasions)

– estimation method: ML, REML (default),


MIVQUE0

• CLASS statement: definition of the factors in the


model

• MODEL statement:
– response variable
– fixed effects
– options similar to SAS regression procedures

• RANDOM statement:
– definition of random effects (including intercepts !)
– identification of the ‘subjects’: independence across subjects
– structure of random-effects covariance matrix D
many structures available within SAS

• REPEATED statement:
– ordering of measurements within subjects
– the effect(s) specified must be of the factor-type
– identification of the ‘subjects’: independence across subjects
– structure of Σi
the same structures available as for the RANDOM
statement

Overview of frequently used covariance structures which can be specified in the RANDOM

and REPEATED statements of the SAS procedure MIXED. The σ-parameters are used to

denote variances and covariances, while the ρ-parameters are used for correlations.

Structure                     Example

Unstructured                  ( σ1²  σ12  σ13 )
type=UN                       ( σ12  σ2²  σ23 )
                              ( σ13  σ23  σ3² )

Simple (1) /                  ( σ²  0   0  )       ( σ1²  0    0   )
Variance components (2)       ( 0   σ²  0  )  or   ( 0    σ2²  0   )
type=SIMPLE, type=VC          ( 0   0   σ² )       ( 0    0    σ3² )

Compound symmetry             ( σ1²+σ²  σ1²      σ1²     )
type=CS                       ( σ1²     σ1²+σ²   σ1²     )
                              ( σ1²     σ1²      σ1²+σ²  )

Banded                        ( σ1²  σ12  0   )
type=UN(2)                    ( σ12  σ2²  σ23 )
                              ( 0    σ23  σ3² )

First-order autoregressive    ( σ²    ρσ²   ρ²σ² )
type=AR(1)                    ( ρσ²   σ²    ρσ²  )
                              ( ρ²σ²  ρσ²   σ²   )

Toeplitz                      ( σ²   σ12  σ13 )
type=TOEP                     ( σ12  σ²   σ12 )
                              ( σ13  σ12  σ²  )

Toeplitz (1)                  ( σ²  0   0  )
type=TOEP(1)                  ( 0   σ²  0  )
                              ( 0   0   σ² )

Heterogeneous compound        ( σ1²     ρσ1σ2  ρσ1σ3 )
symmetry                      ( ρσ1σ2   σ2²    ρσ2σ3 )
type=CSH                      ( ρσ1σ3   ρσ2σ3  σ3²   )

Heterogeneous first-order     ( σ1²      ρσ1σ2  ρ²σ1σ3 )
autoregressive                ( ρσ1σ2    σ2²    ρσ2σ3  )
type=ARH(1)                   ( ρ²σ1σ3   ρσ2σ3  σ3²    )

Heterogeneous Toeplitz        ( σ1²      ρ1σ1σ2  ρ2σ1σ3 )
type=TOEPH                    ( ρ1σ1σ2   σ2²     ρ1σ2σ3 )
                              ( ρ2σ1σ3   ρ1σ2σ3  σ3²    )

(1) Example: repeated timeclss / type = simple subject = id;
(2) Example: random intercept time time2 / type = simple subject = id;
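For intuition, some of the stationary structures in the table can be generated programmatically (a sketch; the parameter values are arbitrary):

```python
import numpy as np

def ar1_cov(n, sigma2, rho):
    """First-order autoregressive covariance: sigma2 * rho**|j-k|."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def cs_cov(n, sigma1_sq, sigma2):
    """Compound symmetry: sigma1_sq everywhere, plus sigma2 on the diagonal."""
    return sigma1_sq * np.ones((n, n)) + sigma2 * np.eye(n)

V_ar1 = ar1_cov(3, sigma2=2.0, rho=0.5)
V_cs = cs_cov(3, sigma1_sq=1.5, sigma2=0.5)
print(V_ar1[0])   # first row: 2.0, 1.0, 0.5
```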

Overview of frequently used (stationary) spatial covariance structures, which can be

specified in the RANDOM and REPEATED statements of the SAS procedure MIXED. The

correlations are positive decreasing functions of the Euclidean distances dij between the

observations. The coordinates of the observations, used to calculate these distances are

given by a set of variables the names of which are specified in the list ‘list’. The variance is

denoted by σ 2 , and ρ defines how fast the correlations decrease as functions of the dij .

Structure                     Example

Power                              ( 1      ρ^d12  ρ^d13 )
type=SP(POW)(list)            σ² · ( ρ^d12  1      ρ^d23 )
                                   ( ρ^d13  ρ^d23  1     )

Exponential                        ( 1             exp(−d12/ρ)  exp(−d13/ρ) )
type=SP(EXP)(list)            σ² · ( exp(−d12/ρ)   1            exp(−d23/ρ) )
                                   ( exp(−d13/ρ)   exp(−d23/ρ)  1           )

Gaussian                           ( 1               exp(−d12²/ρ²)  exp(−d13²/ρ²) )
type=SP(GAU)(list)            σ² · ( exp(−d12²/ρ²)   1              exp(−d23²/ρ²) )
                                   ( exp(−d13²/ρ²)   exp(−d23²/ρ²)  1             )

SAS output: parameter estimates

Maximum likelihood and restricted maximum likelihood estimates (MLE and REMLE) and
standard errors for all fixed effects and all variance components in the PSA model

Effect Parameter MLE (s.e.) REMLE (s.e.)


Age effect β1 0.026 (0.013) 0.027 (0.014)
Intercepts:
control β2 -1.077 (0.919) -1.098 (0.976)
BPH β3 -0.493 (1.026) -0.523 (1.090)
L/R cancer β4 0.314 (0.997) 0.296 (1.059)
Met. cancer β5 1.574 (1.022) 1.549 (1.086)
Age×time effect β6 -0.010 (0.020) -0.011 (0.021)
Time effects:
control β7 0.511 (1.359) 0.568 (1.473)
BPH β8 0.313 (1.511) 0.396 (1.638)
L/R cancer β9 -1.072 (1.469) -1.036 (1.593)
Met. cancer β10 -1.657 (1.499) -1.605 (1.626)
2
Age×time effect β11 0.002 (0.008) 0.002 (0.009)
Time2 effects:
control β12 -0.106 (0.549) -0.130 (0.610)
BPH β13 -0.119 (0.604) -0.158 (0.672)
L/R cancer β14 0.350 (0.590) 0.342 (0.656)
Met. cancer β15 0.411 (0.598) 0.395 (0.666)
Covariance of bi :
var(b1i ) d11 0.398 (0.083) 0.452 (0.098)
var(b2i ) d22 0.768 (0.187) 0.915 (0.230)
var(b3i ) d33 0.103 (0.032) 0.131 (0.041)
cov(b1i , b2i ) d12 = d21 -0.443 (0.113) -0.518 (0.136)
cov(b2i , b3i ) d23 = d32 -0.273 (0.076) -0.336 (0.095)
cov(b3i , b1i ) d13 = d31 0.133 (0.043) 0.163 (0.053)
Residual variance:
var(εij ) σ2 0.028 (0.002) 0.028 (0.002)

• ML and REML estimates different for variance


components as well as for fixed effects

• REML estimates for variance components ‘larger’


than ML estimates

• Fitted average profiles (at median ages)

• Approximate t-tests available for fixed effects



SAS output: Iteration history

REML Estimation Iteration History

Iteration Evaluations Objective Criterion

0 1 -259.0577593
1 2 -753.2423823 0.00962100
2 1 -757.9085275 0.00444385
. . ............ ..........
6 1 -760.8988784 0.00000003
7 1 -760.8988902 0.00000000

Convergence criteria met.

• Objective functions:

  ln(L_ML(θ))   = −½ { n ln(2π) + OF_ML(θ) }

  ln(L_REML(θ)) = −½ { (n − p) ln(2π) + OF_REML(θ) }

• Number of times the objective function has been


evaluated during each iteration

• Criterion: measure of convergence: |gk′ Hk⁻¹ gk| / |fk|, where fk is the objective function, gk is the gradient, and Hk is the Hessian.

SAS output: Information on model fit

Model Fitting Information for LNPSA

Description Value

Observations 463.0000
Variance Estimate 1.0000
Standard Deviation Estimate 1.0000
REML Log Likelihood -31.2350
Akaike’s Information Criterion -38.2350
Schwarz’s Bayesian Criterion -52.6018
-2 REML Log Likelihood 62.4700
Null Model LRT Chi-Square 501.8411
Null Model LRT DF 6.0000
Null Model LRT P-Value 0.0000

• Observations: n = Σ_{i=1}^{N} ni = 463

• Variance and standard deviation depend on


program-specification

• Maximized REML log likelihood

• Information criteria: see later

• Test for the need for covariance modelling



SAS output: F -tests for fixed effects

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

GROUP 4 299 15.90 0.0001


AGE 1 299 3.48 0.0631
TIME*GROUP 4 299 7.85 0.0001
AGE*TIME 1 299 0.27 0.6026
TIME2*GROUP 4 299 4.44 0.0017
AGE*TIME2 1 299 0.07 0.7982

• For continuous covariates: equivalent with t test

• For factors: test whether any of the parameters


assigned to this factor is significantly different from
zero

• Details on F-tests: see later



First parameterization of mean structure

model lnpsa = group age group*time age*time


group*time2 age*time2 / noint solution;

Solution for Fixed Effects

Effect GROUP Estimate Std Error DF t Pr > |t|

GROUP 1 -1.09842483 0.97631046 299 -1.13 0.2615 BETA2


GROUP 2 -0.52284975 1.08953050 299 -0.48 0.6317 BETA3
GROUP 3 0.29640353 1.05870264 299 0.28 0.7797 BETA4
GROUP 4 1.54938619 1.08561192 299 1.43 0.1546 BETA5
AGEDIAG 0.02655078 0.01423433 299 1.87 0.0631 BETA1
TIME*GROUP 1 0.56806200 1.47250947 299 0.39 0.6999 BETA7
TIME*GROUP 2 0.39562209 1.63767403 299 0.24 0.8093 BETA8
TIME*GROUP 3 -1.03590942 1.59277722 299 -0.65 0.5159 BETA9
TIME*GROUP 4 -1.60490411 1.62575741 299 -0.99 0.3244 BETA10
AGEDIAG*TIME -0.01116548 0.02142364 299 -0.52 0.6026 BETA6
TIME2*GROUP 1 -0.12952337 0.61005089 299 -0.21 0.8320 BETA12
TIME2*GROUP 2 -0.15845944 0.67234237 299 -0.24 0.8138 BETA13
TIME2*GROUP 3 0.34191867 0.65628931 299 0.52 0.6028 BETA14
TIME2*GROUP 4 0.39506308 0.66604953 299 0.59 0.5535 BETA15
AGEDIAG*TIME2 0.00225933 0.00882934 299 0.26 0.7982 BETA11

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

GROUP 4 299 15.90 0.0001


AGEDIAG 1 299 3.48 0.0631
TIME*GROUP 4 299 7.85 0.0001
AGEDIAG*TIME 1 299 0.27 0.6026
TIME2*GROUP 4 299 4.44 0.0017
AGEDIAG*TIME2 1 299 0.07 0.7982

Source Null hypothesis


Group H1 : β2 = β3 = β4 = β5 = 0
Age H2 : β1 = 0
Time∗group H3 : β7 = β8 = β9 = β10 = 0
Age∗time H4 : β6 = 0
Time2∗group H5 : β12 = β13 = β14 = β15 = 0
Age∗time2 H6 : β11 = 0

Second parameterization of mean structure

model lnpsa = group age time group*time age*time


time2 group*time2 age*time2 / solution;

Solution for Fixed Effects

Parameter Estimate Std Error DDF T Pr > |T|

INTERCEPT 1.549386 1.085611 49 1.43 0.1599


GROUP 1 -2.647811 0.393111 300 -6.74 0.0001
GROUP 2 -2.072235 0.383595 300 -5.40 0.0001
GROUP 3 -1.252982 0.393223 300 -3.19 0.0016
GROUP 4 0.000000 . . . .
AGE 0.026550 0.014234 300 1.87 0.0631
TIME -1.604904 1.625757 49 -0.99 0.3284
TIME*GROUP 1 2.172966 0.583601 300 3.72 0.0002
TIME*GROUP 2 2.000526 0.567835 300 3.52 0.0005
TIME*GROUP 3 0.568994 0.579436 300 0.98 0.3269
TIME*GROUP 4 0.000000 . . . .
AGE*TIME -0.011165 0.021423 300 -0.52 0.6026
TIME2 0.395063 0.666049 50 0.59 0.5558
TIME2*GROUP 1 -0.524586 0.234146 300 -2.24 0.0258
TIME2*GROUP 2 -0.553522 0.223216 300 -2.48 0.0137
TIME2*GROUP 3 -0.053144 0.226748 300 -0.23 0.8149
TIME2*GROUP 4 0.000000 . . . .
AGE*TIME2 0.002259 0.008829 300 0.26 0.7982

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

GROUP 3 300 20.13 0.0001


AGE 1 300 3.48 0.0631
TIME 1 49 0.07 0.7885
TIME*GROUP 3 300 10.41 0.0001
AGE*TIME 1 300 0.27 0.6026
TIME2 1 50 0.03 0.8616
TIME2*GROUP 3 300 5.93 0.0006
AGE*TIME2 1 300 0.07 0.7982

Source Null hypothesis


Group H7 : β2 = β3 = β4 = β5
Age H8 : β1 = 0
Time H9 : (β7 + β8 + β9 + β10)/4 = 0
Time∗group H10 : β7 = β8 = β9 = β10
Age∗time H11 : β6 = 0
Time2 H12 : (β12 + β13 + β14 + β15)/4 = 0
Time2∗group H13 : β12 = β13 = β14 = β15
Age∗time2 H14 : β11 = 0

7.5 Inference for contrasts of fixed


effects

7.5.1 The CONTRAST statement

• Testing general linear hypotheses of the form

H0 : Lβ = 0,

for some specific known matrix L.

• The set of linear combinations Lβ is sometimes


called a contrast (or a set of contrasts) of the fixed
effects β.
• F-statistic:

      β̂^T L^T [ L ( Σ_{i=1}^N Xi^T Vi(α̂)^{-1} Xi )^{-1} L^T ]^{-1} L β̂
  F = ------------------------------------------------------------------ ,
                                rank(L)

• Null-distribution approximately F with rank(L)


numerator degrees of freedom.

• Several methods available to estimate the


denominator degrees of freedom:
– containment method (default)

– Satterthwaite’s approximation

– ...

• In the context of longitudinal data, the resulting


p-values are often very similar under different
approximation methods.
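For a one-row contrast L, the F-statistic above reduces to a scalar quadratic form that can be computed by hand. A minimal sketch, with invented numbers (these are not taken from the prostate fit):

```python
# Hypothetical estimates purely for illustration (not from the prostate model):
beta_hat = [2.17, 2.00]          # two slope estimates
cov_beta = [[0.34, 0.12],
            [0.12, 0.32]]        # their estimated covariance matrix
L = [1.0, -1.0]                  # contrast: beta1 - beta2 = 0

# L * beta_hat (a scalar here because L has a single row)
l_beta = sum(l * b for l, b in zip(L, beta_hat))

# L * Cov * L'  (the quadratic form, again a scalar for a one-row L)
var_l_beta = sum(L[i] * cov_beta[i][j] * L[j]
                 for i in range(2) for j in range(2))

# F statistic with rank(L) = 1 numerator degrees of freedom
F = l_beta**2 / var_l_beta / 1.0
```

The resulting F would then be referred to an F distribution with 1 numerator degree of freedom and denominator degrees of freedom chosen by, e.g., the containment method.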

Example 1

• Testing whether the local cancer cases evolve differently from the metastatic cancer cases.

• The null hypothesis is specified by

    H0 : β4 = β5, β9 = β10, β14 = β15,

  which is equivalent to testing

         [ 0 0 0 1 −1 0 0 0 0  0 0 0 0 0  0 ]
    H0 : [ 0 0 0 0  0 0 0 0 1 −1 0 0 0 0  0 ] β = 0.
         [ 0 0 0 0  0 0 0 0 0  0 0 0 0 1 −1 ]

• The contrast can be tested within PROC MIXED by


adding the following statement to the original
program:

contrast ’L/R can = Met can’ group 0 0 1 -1,


group*time 0 0 1 -1,
group*time2 0 0 1 -1;

• Several CONTRAST statements (with different


labels) are allowed

• Additional table in the output:

CONTRAST Statement Results

Source NDF DDF F Pr > F

L/R can = Met can 3 299 5.86 0.0007



Example 2

Default F -tests under the first

parameterization of the mean structure.

Null hypothesis Contrast statement


H1 : β2 = β3 = β4 = β5 = 0 contrast ’H1’ group 1 0 0 0,
group 0 1 0 0,
group 0 0 1 0,
group 0 0 0 1;
H2 : β1 = 0 contrast ’H2’ age 1;
H3 : β7 = β8 = β9 = β10 = 0 contrast ’H3’ group*time 1 0 0 0,
group*time 0 1 0 0,
group*time 0 0 1 0,
group*time 0 0 0 1;
H4 : β6 = 0 contrast ’H4’ age*time 1;
H5 : β12 = β13 = β14 = β15 = 0 contrast ’H5’ group*time2 1 0 0 0,
group*time2 0 1 0 0,
group*time2 0 0 1 0,
group*time2 0 0 0 1;
H6 : β11 = 0 contrast ’H6’ age*time2 1;

• The results are:

CONTRAST Statement Results

Source NDF DDF F Pr > F

H1 4 299 15.90 0.0001


H2 1 299 3.48 0.0631
H3 4 299 7.85 0.0001
H4 1 299 0.27 0.6026
H5 4 299 4.44 0.0017
H6 1 299 0.07 0.7982

• Exactly the same as the results obtained from the


default table ‘Tests of Fixed Effects’

Example 3

Testing group-differences under first

parameterization of the mean structure.

Null hypothesis Contrast statement


H7 : β2 = β3 = β4 = β5 contrast ’H7’ group 1 -1 0 0,
group 1 0 -1 0,
group 1 0 0 -1;
H8 : β1 = 0 contrast ’H8’ agediag 1;
H9 : β7 + β8 + β9 + β10 = 0 contrast ’H9’ group*time 1 1 1 1;
H10 : β7 = β8 = β9 = β10 contrast ’H10’ group*time 1 -1 0 0,
group*time 1 0 -1 0,
group*time 1 0 0 -1;
H11 : β6 = 0 contrast ’H11’ agediag*time 1;
H12 : β12 + β13 + β14 + β15 = 0 contrast ’H12’ group*time2 1 1 1 1;
H13 : β12 = β13 = β14 = β15 contrast ’H13’ group*time2 1 -1 0 0,
group*time2 1 0 -1 0,
group*time2 1 0 0 -1;
H14 : β11 = 0 contrast ’H14’ agediag*time2 1;

• The results are:

CONTRAST Statement Results

Source NDF DDF F Pr > F

H7 3 299 20.13 0.0001


H8 1 299 3.48 0.0631
H9 1 299 0.07 0.7876
H10 3 299 10.41 0.0001
H11 1 299 0.27 0.6026
H12 1 299 0.03 0.8610
H13 3 299 5.93 0.0006
H14 1 299 0.07 0.7982

• The same F -statistics as the ones obtained from the


default F -tests under the reparameterized mean
structure

• But different denominator degrees of freedom (due to


containment method)

• However, only small changes in p-values



Example 4

Model reduction

Hierarchical CONTRAST statements lead to the


following simplifications:

• no interaction age × time2

• no interaction age × time

• quadratic time effect the same for both cancer groups

• the quadratic time effect is not significant for the


non-cancer groups

• the linear time effect is not significant for the controls



• Simultaneous testing of all these hypotheses can be


done by adding the following CONTRAST statement
to the original program:

contrast ’Final model’ age*time2 1,


age*time 1,
group*time2 1 -1 0 0,
group*time2 0 0 1 -1,
group*time2 1 0 0 0,
group*time2 0 1 0 0,
group*time 1 0 0 0;

• This results in the following table in the output:

CONTRAST Statement Results

Source NDF DDF F Pr > F

Final model 6 299 0.56 0.7583

• Note that the matrix L is not of full rank:


the third row equals the fifth row minus the sixth row

7.5.2 The ESTIMATE statement

• Can cancer patients be better discriminated from


BPH cases using the rate of increase of PSA rather
than just one single measurement of PSA ?

• How can we estimate the average difference in ln(1 + PSA), as well as the average difference in the rate of increase of ln(1 + PSA), for example 5 years prior to diagnosis?

• We ignore the metastatic cancer cases

• The average difference, 5 years prior to diagnosis, equals

  DIFF(t = 5 years)
    = (β1 age + β4 + β9 t + β14 t²)|_{t=0.5} − (β1 age + β3 + β8 t)|_{t=0.5}
    = −β3 + β4 − 0.5 β8 + 0.5 β9 + 0.25 β14


• The average difference in rate of change, 5 years prior to diagnosis, equals

  DIFFRATE(t = 5 years)
    = ∂/∂t (β1 age + β4 + β9 t + β14 t²)|_{t=0.5} − ∂/∂t (β1 age + β3 + β8 t)|_{t=0.5}
    = −β8 + β9 + β14

• Both quantities are of the form Lβ, for specific


(1 × 15) matrices L.
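The two linear combinations can be checked numerically: evaluate the two group mean functions directly and compare with the Lβ expressions. The β values below are arbitrary placeholders, not the fitted estimates:

```python
# Arbitrary placeholder values for the beta's involved (illustration only)
b = {1: 0.03, 3: -1.25, 4: 0.0, 8: 0.57, 9: 0.0, 14: -0.05}
age, t = 65.0, 0.5

cancer = lambda s: b[1]*age + b[4] + b[9]*s + b[14]*s**2   # local cancer mean
bph    = lambda s: b[1]*age + b[3] + b[8]*s                # BPH mean

# Average difference at t = 0.5, directly and via the linear combination
diff_direct = cancer(t) - bph(t)
diff_formula = -b[3] + b[4] - 0.5*b[8] + 0.5*b[9] + 0.25*b[14]

# Average difference in rate of change: central finite difference vs formula
h = 1e-6
rate_direct = ((cancer(t+h) - bph(t+h)) - (cancer(t-h) - bph(t-h))) / (2*h)
rate_formula = -b[8] + b[9] + b[14]
```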

• Estimation of such linear combinations of fixed effects


can be done by adding the following ESTIMATE
statement:
estimate ’DIFF, t = 5yrs’ group 0 -1 1 0
bph*time -0.5
loccanc*time 0.5
cancer*time2 0.25
/ cl alpha = 0.05;

estimate ’DIFFRATE, t = 5yrs’ bph*time -1


loccanc*time 1
cancer*time2 1
/ cl alpha = 0.05;

• The matrices L are specified as in the CONTRAST


statement.

• Only matrices with one row are allowed

• The option ‘cl alpha=0.05’ to request an approximate


t-type 95% confidence interval

• A new table is added to the output:


ESTIMATE Statement Results

Parameter Estimate Std Error DDF T

DIFF, t = 5yrs 0.22081242 0.14573103 301 1.52


DIFFRATE, t = 5yrs -0.95067653 0.16587627 301 -5.73

Pr > |T| Alpha Lower Upper

0.1308 0.05 -0.0660 0.5076


0.0001 0.05 -1.2771 -0.6243

• 5 years prior to diagnosis, there is on average no significant difference, but the average rates of change are significantly different

7.6 Inference for variance components

7.6.1 Example

• Do we need subject-specific quadratic time-effects in


our model ?

• If not, then all b3i equal zero

• In the marginal model, this corresponds to setting


appropriate variance components equal to zero:
H0 : d13 = d23 = d33 = 0

• Note that, due to the fact that the hierarchical and


the marginal model are not equivalent, rejecting
above null-hypothesis does not imply the presence of
subject-specific quadratic time-effects

• The above H0 rather tests whether the covariance


structure Vi = Zi D Zi^T + Σi can be simplified.

7.6.2 Wald tests for variance components

• Adding the ‘covtest’ option to the PROC MIXED


statement, SAS reports standard errors and Wald
tests for all variance components:

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate Std Error Z Pr > |Z|

UN(1,1) XRAY 0.44315715 0.09348532 4.74 0.0001 D11


UN(2,1) XRAY -0.49032753 0.12385751 -3.96 0.0001 D12
UN(2,2) XRAY 0.84160121 0.20326977 4.14 0.0001 D22
UN(3,1) XRAY 0.14795060 0.04701598 3.15 0.0017 D13
UN(3,2) XRAY -0.29997791 0.08195018 -3.66 0.0003 D23
UN(3,3) XRAY 0.11415478 0.03453537 3.31 0.0009 D33
TIMECLSS XRAY 0.02837400 0.00227601 12.47 0.0001 sigma2

• In our context, d33 = 0 does not make sense if d13


and d23 are not zero

• The Wald tests assume asymptotic normality of the parameter estimates. However, this is not satisfied for the variances d11, d22, d33 and σ², due to boundary problems.

• Classical results on MLE’s don’t hold in this context



7.6.3 Likelihood ratio test

• No asymptotic chi-squared null-distribution for the


likelihood ratio test statistic, due to the boundary
problems

• However, some results are available for the case Σi = σ² Ini

• Typically, the asymptotic null-distribution is a mixture


of χ2 distributions, rather than a single χ2

• We denote the LR test statistic by

    −2 ln λN = −2 ln [ L(θ̂0) / L(θ̂1) ],

  where θ̂0 and θ̂1 are the ML or REML estimates under the null hypothesis and under the alternative hypothesis, respectively.

Case 1: No Random Effects versus One Random


Effect

• The hypothesis of interest is

    H0 : D = 0 versus H1 : D = d11,

  where d11 is a non-negative scalar.

• The asymptotic null distribution:

    −2 ln λN →_d (1/2) χ²_0 + (1/2) χ²_1

Case 2: One versus Two Random Effects

• The hypothesis of interest is

         [ d11  0 ]
    H0 : [  0   0 ]

  for some strictly positive d11, versus H1 that D is a (2 × 2) positive semi-definite matrix.

• The asymptotic null distribution:

    −2 ln λN →_d (1/2) χ²_1 + (1/2) χ²_2

Applied to the Prostate Data

• Model 1: random intercepts, time, time2


Model 2: random intercepts, time
Model 3: random intercepts
Model 4: no random effects

• Test statistics:

  Maximum likelihood:

  Hypothesis                −2 ln(λN)   Correct null   Naive null
  Model 2 versus Model 1       94.270   χ²_{2:3}       χ²_3
  Model 3 versus Model 2      161.016   χ²_{1:2}       χ²_2
  Model 4 versus Model 3      240.114   χ²_{0:1}       χ²_1

  Restricted maximum likelihood:

  Hypothesis                −2 ln(λN)   Correct null   Naive null
  Model 2 versus Model 1       92.796   χ²_{2:3}       χ²_3
  Model 3 versus Model 2      165.734   χ²_{1:2}       χ²_2
  Model 4 versus Model 3      245.874   χ²_{0:1}       χ²_1

  where χ²_{k:k+1} denotes the mixture (1/2) χ²_k + (1/2) χ²_{k+1}.

• The covariance structure cannot be simplified by deleting random effects from the model
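The mixture p-values in the table can be computed from closed-form chi-squared survival functions for small degrees of freedom. A minimal stdlib-only sketch (the closed forms for 1, 2 and 3 degrees of freedom are standard; the χ²_0 component is a point mass at zero):

```python
import math

def chi2_sf(x, df):
    """Survival function P(chi2_df > x) for small integer df only."""
    if df == 0:
        return 0.0                              # point mass at zero
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("df not supported in this sketch")

def mixture_p(x, df_low, df_high):
    """p-value under the 50:50 chi-squared mixture null distribution."""
    return 0.5 * chi2_sf(x, df_low) + 0.5 * chi2_sf(x, df_high)

# Model 4 versus Model 3 (ML): correct null is the chi2_{0:1} mixture
p_correct = mixture_p(240.114, 0, 1)
p_naive = chi2_sf(240.114, 1)

# Model 2 versus Model 1 (ML): correct null is the chi2_{2:3} mixture
p_2v1 = mixture_p(94.270, 2, 3)
```

With test statistics this large, both the correct and the naive p-values are essentially zero; the mixture matters near conventional significance thresholds, where it halves (or nearly halves) the naive p-value.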
Chapter 8

Parametric Modeling Families

8.1 Continuous Case

8.1.1 Marginal Models

E(Yij |xij ) = xTij β


cf. cross-sectional study

• CD4+: characterize the average CD4+ level as a


function of time.
• Assumptions about the correlation structure must be
included in the model.

CHAPTER 8. PARAMETRIC MODELING FAMILIES 166

8.1.2 Random-Effects Models

Correlation arises because regression coefficients vary


among individuals (heterogeneity).

E(Yij |β i, xij ) = xTij β i.


The number of parameters increases with the number of
subjects

⇒ inconsistency

⇒ further assumptions:

βi = β + U i
with

• β an unknown but constant regression parameter vector: fixed effects
• U i a random vector with mean zero: random
effects.
• Random effects are useful in describing individual
CD4+ trajectories.

8.1.3 Transition Models

The conditional expectation of Yij , given past outcomes


Yi1, . . . , Yi,j−1 and (past and present) covariates.

E(Yij | Yi,j−1, . . . , Yi1, xij) = xij^T β + α Yi,j−1.

This model could be fitted using regression software.
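Because the model conditions on the lagged outcome, pooled least squares on rows (xij, Yi,j−1, Yij) fits it. A minimal sketch with noise-free synthetic data (all numbers invented), so the true β = 2 and α = 0.5 are recovered exactly:

```python
# Each subject has a fixed covariate x and an initial outcome y0
subjects = [(0.2, 1.0), (0.5, 0.3), (0.9, 0.7)]   # (x, y0), invented
beta, alpha = 2.0, 0.5

rows = []
for x, y_prev in subjects:
    for _ in range(3):                  # three transitions per subject
        y = beta * x + alpha * y_prev   # deterministic transition mechanism
        rows.append((x, y_prev, y))
        y_prev = y

# Normal equations for regressing y on (x, y_prev), no intercept
sxx = sum(x * x for x, z, y in rows)
sxz = sum(x * z for x, z, y in rows)
szz = sum(z * z for x, z, y in rows)
sxy = sum(x * y for x, z, y in rows)
szy = sum(z * y for x, z, y in rows)
det = sxx * szz - sxz * sxz
beta_hat = (szz * sxy - sxz * szy) / det
alpha_hat = (sxx * szy - sxz * sxy) / det
```

With real (noisy) data the same pooled regression would give consistent estimates, provided the first-order Markov assumption holds.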



8.2 Longitudinal Generalized Linear


Models

• Also non-normal data can be measured repeatedly


(over time).
• GLMs need to be extended.
• Basic idea: account for correlation.
• Normal longitudinal data form a fairly recent branch
in statistics.
• Non-normal longitudinal data are an active area of
research.
• β denotes mean response parameters
• α is used for covariance parameters
• There are several ways to express generalized linear
models for longitudinal data.
• Unlike in the normal case, the choice made is crucial.
There is no transition from one family to the other in
terms of simple functions of the model parameters.

8.2.1 Marginal Models

• Specify E(Yj |X).


The probability of each outcome (or set of outcomes)
is directly modelled (integrating or summing the other
outcomes away).
• Sometimes called: population averaged method.
• Non-likelihood models:
– Empirical generalized least squares (EGLS)
∗ (Koch et al, 1977) (SAS CATMOD)
– Generalized estimating equations (GEE)
∗ Liang and Zeger (1986)
∗ Lipsitz, Laird and Harrington (1991)
∗ Liang, Zeger and Qaqish (1992)
∗ Zhao and Prentice (1992)
∗ Lipsitz and Zhao (1995)
∗ Rotnitzky, Robins, and Zhao (1995)
∗ ...
∗ SAS procedure GENMOD

• Likelihood methods:
– Multivariate Probit Model
Ashford and Sowden (1970)

– Bahadur Model
Bahadur (1962)

– Odds ratio model for bivariate data


Dale (1986)

– Odds ratio models for multivariate data


∗ constraint equations approach
Lang and Agresti (1994)
∗ multivariate Dale model
Molenberghs and Lesaffre (1994)
∗ multivariate logit type models
Glonek and McCullagh (1995)

8.2.2 Random-effects Models

• Specify E(Y |b, X).


The outcome(s), conditional on an unobserved
(latent) random effect or set of random effects.
• Beta-binomial model
– Skellam (1948)
– Kleinman (1973)
– Molenberghs, Declerck, and Aerts (1997)

• Generalized linear mixed models


– Engel and Keen (1992)
– Breslow and Clayton (1993)
– Wolfinger and O’Connell (1995)

• Hierarchical generalized linear models


– Lee and Nelder (1996)

• GLIMMIX macro in SAS


• NLMIXED procedure in SAS
• MIXOR
• MLwiN
• ...

8.2.3 Conditional Models

• Specify E(Yj |{Yk }, X).


An outcome or set of outcomes is modelled,
conditional on the other outcomes or at least a set of
other outcomes.
• In a longitudinal setting, a very relevant family is the
so-called set of transition models:
E(Yj |{Y1, . . . , Yj−1}, X),
(e.g., Markov models.)
• Rather than integrating over other outcomes, they
have to be conditioned upon.
• multivariate exponential family: Cox (1972)
this model forms the basis for the well known
loglinear model.

8.2.4 Marginal Versus Conditional Models

• Fitting marginal models is fairly involved.


• Marginal association parameters highly constrained.
• Marginal models are reproducible
(upward compatible).
This property is particularly relevant when sequences
are of unequal length.
• Some models combine “the best of both worlds”:
– mixed marginal-conditional model
Fitzmaurice and Laird (1993)

– Alternating logistic regressions


Carey, Diggle, Zeger (1994)

– 2nd order mixed parametrization/ GEE2


Molenberghs and Ritter (1996)
Molenberghs and Danielson (1999)

8.3 Main Focus

• Marginal: Generalized Estimating Equations


• Random Effects: Generalized Linear Mixed Models
Chapter 9

Modelling Repeated Categorical Data

9.1 Notation

It is useful to have a double notation for multivariate


categorical data:

• The standard (regression) notation


• The contingency table notation

CHAPTER 9. MODELLING REPEATED CATEGORICAL DATA 176

9.1.1 The Standard (Regression) Notation

This notation is in agreement with previous notation.

• Let the outcomes for subject i = 1, . . . , N be


denoted as (Yi1, . . . , Yini ).
– Binary data: each component is either 0 or 1.
– (Binary data: each component is either 1 or 2.)
– (Binary data: each component is either −1 or +1.)
– (Categorical data: Yij ∈ {1, . . . , c}.)
• The corresponding covariate vector is xij .

9.1.2 The Table Notation

It often happens that a lot of individuals have the same


covariate

(e.g. a trial with treatment or control as the only


covariate).

A contingency table approach is useful.

• Let Zi(k1, . . . , kn) be the cells in an n-way contingency table.
  Here, kj = 0, 1 for j = 1, . . . , n.
• Here, i denotes the covariate level or design level, grouping all individuals with covariate vector xi or covariate vectors xi1, . . . , xin.
  For ease of notation, no new index has been introduced.
• The corresponding cell probabilities are
µi(k1, . . . , kn).
• The tables and probabilities are summarized as Z i
and µi.

• Lower dimensional tables are found by summing over


an appropriate set of indices.
• The univariate probabilities are:
µij = µi(+, . . . , +, kj = 1, +, . . . , +)
with corresponding count Zij .
The shorthand notation µij and Zij is easier than the
+ notation.
• Note that
E(Yij ) = Pr(Yij = 1) = µij .

• We also use bivariate counts and probabilities:


– two-way counts Zijk
– two-way probabilities
µijk = E(Yij Yik ) = Pr(Yij = 1, Yik = 1).

This notation extends easily to categorical data.

Then, each component Yij assumes values 1, . . . , c for a


c category variable.

9.2 A Conditional Model

The Log-linear Model

Specifies the joint distribution of Y i in terms of a


multivariate exponential family:

  f(y_i; θ_i) = exp( Σ_{j=1}^n θij yij + Σ_{j1<j2} θij1j2 yij1 yij2 + · · · + θi1...n yi1 · · · yin − A(θ_i) )

             = c(θ_i) exp( Σ_{j=1}^n θij yij + Σ_{j1<j2} θij1j2 yij1 yij2 + · · · + θi1...n yi1 · · · yin ),

where A(θ i) and c(θ i) represent the same normalizing


constant, written in additive and multiplicative way,
respectively.

• θ i is the canonical parameter, consisting of first,


second, up to nth order components.
• The model was proposed by Cox (1972).

9.2.1 Interpretation of Parameters

The parameters have a conditional interpretation:

  θij = ln [ Pr(Yij = 1 | Yik = 0, k ≠ j) / Pr(Yij = 0 | Yik = 0, k ≠ j) ]

⇒ the first order parameters (main effects) are interpreted as conditional logits.

Similarly,

  θijk = ln [ Pr(Yij = 1, Yik = 1 | Yiℓ = 0, ℓ ≠ j, k) Pr(Yij = 0, Yik = 0 | Yiℓ = 0, ℓ ≠ j, k) /
              Pr(Yij = 1, Yik = 0 | Yiℓ = 0, ℓ ≠ j, k) Pr(Yij = 0, Yik = 1 | Yiℓ = 0, ℓ ≠ j, k) ]

These are conditional log odds ratios.



• Advantages:
– The parameter vector is not constrained. All
values of θ ∈ IR yield nonnegative probabilities.
– Calculation of the joint probabilities is fairly
straightforward:
∗ ignore the normalizing constant
∗ evaluate the density for all possible sequences y
∗ sum all terms to yield c(θ)−1
• Drawbacks:
– Due to above conditional interpretation, the
models are less useful for regression.
The dependence of E(Yij ) on covariates involves all
parameters, not only the main effects.
– The interpretation of the parameters depends on
the length ni of a sequence.
Shorter sequences imply that one conditions on less
outcomes, changing interpretation with length of sequence.

These drawbacks make marginal models or models


that combine marginal and conditional features better
suited.
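The enumeration recipe in the advantages bullet (evaluate the unnormalized density for every possible sequence, then sum to recover the normalizing constant) can be sketched for n = 3 binary outcomes; the θ values below are arbitrary, chosen only to show the computation:

```python
import math
from itertools import product

# Arbitrary quadratic log-linear parameters for n = 3 binary outcomes
theta_main = [-0.5, 0.2, 0.1]
theta_pair = {(0, 1): 0.3, (0, 2): 0.0, (1, 2): 0.4}

def kernel(y):
    """Unnormalized log-probability of a 0/1 sequence y."""
    s = sum(t * yj for t, yj in zip(theta_main, y))
    s += sum(t * y[j] * y[k] for (j, k), t in theta_pair.items())
    return s

seqs = list(product([0, 1], repeat=3))            # all 2^3 sequences
weights = [math.exp(kernel(y)) for y in seqs]
c_inv = sum(weights)                              # c(theta)^{-1}
probs = [w / c_inv for w in weights]              # joint probabilities
```

Every θ vector yields a valid (positive, summing-to-one) distribution, illustrating the unconstrained-parameter advantage; the cost is that every marginal probability involves all the θ's through the normalizing constant.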

9.3 Marginal Models

Whereas the probability mass function of the loglinear


model follows at once, a set of choices have to be made
in a marginal model:

• Description of mean profiles (univariate parameters)


and of association (bivariate and higher order
parameters)
• Degree of modelling:
– joint distribution fully specified ⇒ likelihood
procedures
– or only a limited number of moments ⇒ e.g.,
generalized estimating equations

Minimally, one specifies:

• η i(µi) = {ηi1(µi1), . . . , ηin(µin)}


• E(Y i) = µi and η i(µi) = X iβ
• var(Y i) = φv(µi) where v(.) is a known variance
function
• corr(Y i) = R(α)

Remarks

• Choosing logit links ⇒ (extensions of) logistic


regression.
• Although this is seemingly a straightforward
generalization of univariate GLMs, there are problems:
– the probability model is not fully specified:
∗ only the first moment is specified; the variances
usually follow immediately from the first
moment (cf. binary, counts);
∗ the covariances involve the correlations, they are
specified separately, using a different parameter
vector α
∗ estimation of α is still left blank
∗ still, the third and higher moments have not
been specified
– some of these models are severely constrained
(e.g., Bahadur)
– estimation can proceed through generalized
estimating equations

9.3.1 Link Functions

An important sub-family of
η i = η i(µi)
is the log-contrast family:
η i(µi) = C ln(Aµi),
with

• A a matrix defining an appropriate set of probabilities,


• C a matrix defining contrasts of log probabilities.
• Advantages:
– Facilitate computations
– Encompass popular links such as logit link

We will consider special cases of link functions.



Univariate Link Functions

(Sometimes called marginal links, although this term is


slightly misleading).

• The marginal logit link:


ηij = ln(µij ) − ln(1 − µij ) = logit(µij ).

• Some links, such as the probit link:


ηij = Φ1^{-1}(µij),

and the complementary log-log link are excluded from


this particular family.
However, this is not a major obstacle.

Pairwise Association

• Success probability approach (Ekholm 1991):
  logit link for the two-way probabilities,

    ηijk = ln(µijk) − ln(1 − µijk) = logit(µijk).

• Marginal correlation coefficient (Bahadur model):

    ρijk = (µijk − µij µik) / √( µij (1 − µij) µik (1 − µik) ),
    ηijk = ln(1 + ρijk) − ln(1 − ρijk).

• Marginal odds ratio (Dale model):

    ψijk = µijk (1 − µij − µik + µijk) / [ (µik − µijk)(µij − µijk) ]
         = [ Pr(Yij = 1, Yik = 1) Pr(Yij = 0, Yik = 0) ] / [ Pr(Yij = 0, Yik = 1) Pr(Yij = 1, Yik = 0) ],
    ηijk = ln(ψijk).

Observe that this odds ratio has the same structure as


the one in the log-linear model. However, we do not need
to condition on the other outcomes. Hence, the name
marginal odds ratio.
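For a single pair of binary outcomes, the three association links can be computed side by side from the marginal and joint success probabilities. The probabilities below are hypothetical, chosen only for illustration:

```python
import math

# Hypothetical pairwise probabilities for two binary outcomes
mu_j, mu_k = 0.3, 0.4          # marginal success probabilities
mu_jk = 0.18                   # joint success probability P(Yj = 1, Yk = 1)

# Success-probability (Ekholm) link: logit of the joint success probability
eta_ekholm = math.log(mu_jk / (1.0 - mu_jk))

# Bahadur link: Fisher-type transform of the marginal correlation
rho = (mu_jk - mu_j * mu_k) / math.sqrt(
    mu_j * (1 - mu_j) * mu_k * (1 - mu_k))
eta_bahadur = math.log((1 + rho) / (1 - rho))

# Dale link: log marginal odds ratio from the 2x2 table of probabilities
psi = (mu_jk * (1 - mu_j - mu_k + mu_jk)
       / ((mu_k - mu_jk) * (mu_j - mu_jk)))
eta_dale = math.log(psi)
```

All three links agree in sign here: the joint probability 0.18 exceeds the independence value 0.3 × 0.4 = 0.12, so each measure indicates positive association.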

Higher Order Association

• All three extend naturally to higher orders


• Working in terms of correlations leads to the Bahadur
model
• Using odds ratios all the way through yields the
multivariate Dale model (multivariate odds ratio
model)

For instance, let us consider the three-way odds ratio:

  ψijkℓ = [ Pr(Yij = 1, Yik = 1, Yiℓ = 1) Pr(Yij = 1, Yik = 0, Yiℓ = 0)
            × Pr(Yij = 0, Yik = 1, Yiℓ = 0) Pr(Yij = 0, Yik = 0, Yiℓ = 1) ]
        / [ Pr(Yij = 1, Yik = 1, Yiℓ = 0) Pr(Yij = 1, Yik = 0, Yiℓ = 1)
            × Pr(Yij = 0, Yik = 1, Yiℓ = 1) Pr(Yij = 0, Yik = 0, Yiℓ = 0) ]

        = ψijk(Yiℓ = 1) / ψijk(Yiℓ = 0).

Higher order odds ratios are defined similarly.


Chapter 10

Case Study: NTP Data

Developmental Toxicity Studies

• Research Triangle Institute US National Toxicology Program

• The effect in mice of 3 chemicals:


– DEHP: di(2-ethylhexyl)phthalate
– EG: ethylene glycol
– DYME: diethylene glycol dimethyl ether
• Implanted fetuses:
– death/resorbed
– viable:
∗ weight
∗ malformations: visceral, skeletal, external

CHAPTER 10. CASE STUDY: NTP DATA 189

10.1 Data Structure of Developmental


Toxicity Studies

[Schematic of the data structure: each dam carries mi implants; implants are either viable (ni) or non-viable (ri: death or resorption); viable fetuses are scored for weight and for malformation (zi), with malformations classified into types 1, . . . , K.]

10.2 Design

• Restrict attention to malformations:


– visceral
– skeletal
– external
– collapsed: any
• control group and 3 or 4 dose groups
  dose is a cluster-level covariate

• each group: 20 to 30 dams


• offspring per litter: 2 to 17 fetuses

10.3 Goals

• Describe dose-response relationship


• Test for dose effect
• Estimation of dose effect
• Determine a benchmark dose: BD
effective dose, virtually safe dose
• Use a fully quantitative approach

10.4 Issues

• Account for clustering (litter effect)


• Univariate versus multivariate approach:
– individual malformation types
– trivariate model
– collapsed outcome
• Fetus versus dam to mimic processes in humans
• Method of estimation ?
• Family of models ?

10.5 Quadratic Log-linear Model

Cox (1972) and others suggest that often the higher


order interactions can be neglected. This claim is
supported by empirical evidence.

10.6 The quadratic exponential model

  f(y_i; θ_i) = exp( Σ_{j=1}^n θij yij + Σ_{j1<j2} θij1j2 yij1 yij2 − A(θ_i) )

             = c(θ_i) exp( Σ_{j=1}^n θij yij + Σ_{j1<j2} θij1j2 yij1 yij2 ).

10.7 The linear exponential model

  f(y_i; θ_i) = exp( Σ_{j=1}^n θij yij − A(θ_i) ),

so that this model reflects the assumption of independence.

• Equals logistic regression.


• Both conditional and marginal.

10.8 Specialized to Clustered Binary


Data

• NTP data
• Yij is malformation indicator for fetus j in litter i
• Code Yij as −1 or 1
• di is dose level at which litter i is exposed
• Simplification:
θij = θi = β0 + βddi,
θij1j2 = βa.

• Using

    Zi = Σ_{j=1}^{ni} Yij

  we obtain

    f(zi | θi, βa) = (ni choose zi) exp{ θi zi + βa zi (ni − zi) − A(θi) },
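As a sanity check, the clustered density can be normalized by direct enumeration over zi = 0, . . . , ni; the parameter values below are arbitrary, not fitted:

```python
import math

# Arbitrary illustrative parameters for one litter of size n_i
n_i, theta, beta_a = 10, -1.2, 0.15

def log_kernel(z):
    """Log of the unnormalized clustered-binary density at count z."""
    return (theta * z + beta_a * z * (n_i - z)
            + math.log(math.comb(n_i, z)))

# exp(A(theta)): sum of the unnormalized density over all counts
norm = sum(math.exp(log_kernel(z)) for z in range(n_i + 1))
probs = [math.exp(log_kernel(z)) / norm for z in range(n_i + 1)]
```

Because the normalizing constant is a sum of only ni + 1 terms, likelihood evaluation for this clustered model is cheap, unlike the general log-linear model whose constant involves 2^ni sequences.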

10.9 Quadratic Clustered Loglinear


Model

Maximum Likelihood Estimates (Standard Errors) for the Conditional Model.

Outcome Parameter DEHP EG DYME


External β0 -2.81(0.58) -3.01(0.79) -5.78(1.13)
βd 3.07(0.65) 2.25(0.68) 6.25(1.25)
βa 0.18(0.04) 0.25(0.05) 0.09(0.06)
Visceral β0 -2.39(0.50) -5.09(1.55) -3.32(0.98)
βd 2.45(0.55) 3.76(1.34) 2.88(0.93)
βa 0.18(0.04) 0.23(0.09) 0.29(0.05)
Skeletal β0 -2.79(0.58) -0.84(0.17) -1.62(0.35)
βd 2.91(0.63) 0.98(0.20) 2.45(0.51)
βa 0.17(0.04) 0.20(0.02) 0.25(0.03)
Collapsed β0 -2.04(0.35) -0.81(0.16) -2.90(0.43)
βd 2.98(0.51) 0.97(0.20) 5.08(0.74)
βa 0.16(0.03) 0.20(0.02) 0.19(0.03)

10.10 The Bahadur Model

Maximum Likelihood Estimates (Standard Errors) for the Bahadur Model.

Outcome Parameter DEHP EG DYME


External β0 -4.93(0.39) -5.25(0.66) -7.25(0.71)
βd 5.15(0.56) 2.63(0.76) 7.94(0.77)
βa 0.11(0.03) 0.12(0.03) 0.11(0.04)
Visceral β0 -4.42(0.33) -7.38(1.30) -6.89(0.81)
βd 4.38(0.49) 4.25(1.39) 5.49(0.87)
βa 0.11(0.02) 0.05(0.08) 0.08(0.04)
Skeletal β0 -4.67(0.39) -2.49(0.11) -4.27(0.61)
βd 4.68(0.56) 2.96(0.18) 5.79(0.80)
βa 0.13(0.03) 0.27(0.02) 0.22(0.05)
Collapsed β0 -3.83(0.27) -2.51(0.09) -5.31(0.40)
βd 5.38(0.47) 3.05(0.17) 8.18(0.69)
βa 0.12(0.03) 0.28(0.02) 0.12(0.03)
Chapter 11

Generalized Estimating Equations

We have seen that the score equations for estimating β in a classical (univariate) GLM take the form

  S(βj) = Σ_{i=1}^N (∂µi/∂βj) vi^{-1} (yi − µi) = 0

with vi = Var(Yi).

(Here, Yi is scalar.)

CHAPTER 11. GENERALIZED ESTIMATING EQUATIONS 198

The corresponding vector notation


S(β) = D^T V^{-1} (y − µ) = 0
where

• D is an N × p matrix with (i, j)th element ∂µi/∂βj
• V is an N × N diagonal matrix with non-zero
elements proportional to Var(Yi)
• y and µ are N -vectors with elements yi and µi

In the longitudinal setting, all of this carries over, except


that

• the single element scalars yi and µi are replaced by ni


element vectors y i and µi associated with the
sequences of ni observations on the ith subject
• the corresponding matrices V i = Var(Y i) involve a
set of nuisance parameters, α say, which determine
the covariance structure of Y i.

The score equations for the complete set of data, Y = (Y1, . . . , YN), from the N subjects can now be written as

  S(β) = Σ_{i=1}^N Di^T [Vi(α)]^{-1} (yi − µi) = 0.

• These equations have the same form as the score


equations of the full likelihood procedure of, e.g., the
odds ratio model.
• We restrict specification to the first moment only
(hence only Y i).
The second moment is only specified in the variances,
not in the correlations.
• Solution of the score equations uses a multivariate
version of the iteratively weighted least squares
algorithm, provided that the variance matrices Vi(α)
are known (including the numerical values of α).
• Alternatively, a Fisher scoring algorithm can be used
• Liang and Zeger (1986)

11.1 Large Sample Properties

As N → ∞,

  √N (β̂ − β) ∼ N(0, I0^{-1}),

where

  I0 = Σ_{i=1}^N Di^T [Vi(α)]^{-1} Di

• (Unrealistic) Conditions:
– α is known
– the parametric form for V i(α) is known
• Solution: working correlation matrix

11.2 Unknown Covariance Structure

Keep the score equations

  S(β) = Σ_{i=1}^N Di^T [Vi(α)]^{-1} (yi − µi) = 0
BUT

• suppose V i(.) is not the true variance of Y i but only


a plausible guess, a so-called working correlation
matrix
• specify correlations and not covariances, because the
variances follow from the mean structure
• the score equations are solved as before

The asymptotic normality results change to

  √N (β̂ − β) ∼ N(0, I0^{-1} I1 I0^{-1}),

  I0 = Σ_{i=1}^N Di^T [Vi(α)]^{-1} Di,
  I1 = Σ_{i=1}^N Di^T [Vi(α)]^{-1} Var(Yi) [Vi(α)]^{-1} Di.

11.3 The Sandwich Estimator

• This is the so-called sandwich estimator:


– I0 is the bread
– I1 is the filling (ham or cheese)
• the known-variance result is recovered when the guess
is actually equal to the true model
• the estimators β̂ are consistent even if the working
correlation matrix is incorrect

• An estimate is found by replacing the unknown


variance matrix Var(Y i) by
(Y i − µ̂i)(Y i − µ̂i)T
where µ̂i = µi(β̂)
Even if this estimator is bad for Var(Y i) it leads to a
good estimate of I1, provided that:
– the replication in the data is sufficiently large
– the same model for µi is fitted to groups of
subjects
– observation times do not vary too much between
subjects
• a bad choice of working correlation matrix can affect
the efficiency of β̂
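The sandwich construction can be sketched in the simplest possible case: a common mean for clustered binary data with identity link and working independence, so that I0 and I1 become scalars. The data are made up for illustration; because the clusters here are strongly homogeneous, the robust variance exceeds the model-based one:

```python
# Toy clustered binary data (invented); strong within-cluster homogeneity
clusters = [[1, 1, 1], [0, 0], [1, 0, 1], [0, 0, 0]]

n_tot = sum(len(c) for c in clusters)
mu = sum(sum(c) for c in clusters) / n_tot       # moment estimate of the mean
v = mu * (1 - mu)                                # binary variance function

# Bread: I0 = sum_i D_i' V_i^{-1} D_i ; D_i is a vector of ones (identity link)
I0 = sum(len(c) / v for c in clusters)

# Filling: I1 with Var(Y_i) replaced by the outer product of residuals,
# which collapses to the squared residual sum per cluster in the scalar case
I1 = sum((sum(y - mu for y in c) / v) ** 2 for c in clusters)

var_model = 1.0 / I0          # model-based ("naive") variance of mu-hat
var_sandwich = I1 / I0**2     # robust sandwich variance of mu-hat
```

Under working independence with truly independent observations the two variances would roughly agree; positive within-cluster correlation inflates the robust estimate, which is exactly the correction the sandwich provides.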

11.4 The Working Correlation Matrix

Write

  Vi(β, α) = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β).

Variance function: Ai is (ni × ni) diagonal with


elements v(µij ), the known GLM variance function.

Working correlation: Ri(α), possibly dependent on a


different set of parameters α.

Overdispersion parameter: φ, assumed 1 or estimated


from the data.

The unknown quantities are expressed in terms of the


Pearson residuals

  eij = (yij − µij) / √v(µij).
Note that eij depends on β.

11.4.1 Estimation of Working Correlation

Liang and Zeger (1986) proposed moment-based


estimates for the working correlation.

Some of the more popular ones:

• Independence:
    Corr(Yij, Yik) = 0 (j ≠ k).
  There are no parameters to be estimated.

• Exchangeable:
    Corr(Yij, Yik) = α (j ≠ k),
    α̂ = (1/N) Σ_{i=1}^N [ 1/(ni(ni − 1)) ] Σ_{j≠k} eij eik.

• AR(1):
    Corr(Yij, Yi,j+t) = α^t (t = 0, 1, . . . , ni − j),
    α̂ = (1/N) Σ_{i=1}^N [ 1/(ni − 1) ] Σ_{j≤ni−1} eij ei,j+1.

• Unstructured:
    Corr(Yij, Yik) = αjk (j ≠ k),
    α̂jk = (1/N) Σ_{i=1}^N eij eik.

The dispersion parameter is estimated by

  φ̂ = (1/N) Σ_{i=1}^N (1/ni) Σ_{j=1}^{ni} e²ij.
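The exchangeable and dispersion moment estimators can be sketched for two small clusters of Pearson residuals (toy numbers, as if obtained from a fitted marginal mean model):

```python
# Toy Pearson residuals for N = 2 clusters of sizes 3 and 2 (invented)
residuals = [[0.5, -0.2, 0.1], [1.0, 0.8]]
N = len(residuals)

# Exchangeable alpha-hat: average within-cluster cross-product of residuals
alpha_hat = 0.0
for e in residuals:
    n_i = len(e)
    cross = sum(e[j] * e[k]
                for j in range(n_i) for k in range(n_i) if j != k)
    alpha_hat += cross / (n_i * (n_i - 1))
alpha_hat /= N

# Dispersion phi-hat: squared residuals averaged within, then across, clusters
phi_hat = sum(sum(ej**2 for ej in e) / len(e) for e in residuals) / N
```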

11.5 Fitting GEE

The standard procedure, implemented in the SAS


procedure GENMOD.

1. Compute initial estimates for β, using a univariate


GLM (i.e., assuming independence).
2. • Compute Pearson residuals eij .
• Compute estimates for α.
• Compute Ri(α).
• Compute estimate for φ.
1/2 1/2
• Compute Vi(β, α) = φAi (β)Ri(α)Ai (β).
3. Update estimate for β:

   β^(t+1) = β^(t) − [ Σ_{i=1}^N Di^T Vi^{-1} Di ]^{-1} [ Σ_{i=1}^N Di^T Vi^{-1} (yi − µi) ].

4. Iterate 2.–3. until convergence.

Estimates of precision by means of I0^{-1} and I0^{-1} I1 I0^{-1}.
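Under working independence with binary outcomes, step 3 reduces to Fisher scoring for ordinary logistic regression, so the fitting loop can be sketched in a few lines. The data are toy values (a real GEE fit would in addition compute the sandwich standard errors from the clusters):

```python
import math

# Toy (dose x, outcome y) pairs; clustering would only affect standard errors
data = [(0.0, 0), (0.0, 0), (0.0, 1), (0.5, 0),
        (0.5, 1), (1.0, 1), (1.0, 1), (1.0, 0)]

b0, b1 = 0.0, 0.0                     # step 1: initial estimates
for _ in range(25):                   # steps 2-4: iterate to convergence
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in data:
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = mu * (1.0 - mu)           # binary variance function v(mu)
        g0 += (y - mu)                # score components D'V^{-1}(y - mu)
        g1 += x * (y - mu)
        h00 += w;  h01 += x * w;  h11 += x * x * w   # D'V^{-1}D
    det = h00 * h11 - h01 * h01
    b0 += ( h11 * g0 - h01 * g1) / det   # 2x2 solve of the update step
    b1 += (-h01 * g0 + h00 * g1) / det
```

At convergence the score equations are satisfied (both score components vanish), which is the defining property of the GEE estimate regardless of whether the working correlation is correct.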



11.6 The NTP Data

• Marginal model: interest in effect of exposure to dose.


• Correlation structure:
– AR(1) and unstructured meaningless.
– Exchangeable or independence are sensible choices.
• Variables in data set:
– dose: between 0 and 1
– litter: indicator of cluster
– visceral (0=normal, 1=malformed)
– skeletal
– external
– collapsed outcome

11.6.1 PROC GENMOD Code

proc genmod data=m.dehp3;


title ’Visceral, Exchangeable Working Assumptions’;
class litter;
model visceral=dose / dist=bin;
repeated subject=litter / type=exch covb corrw modelse;
run;

proc genmod data=m.dehp3;


title ’Visceral, Independence Working Assumptions’;
class litter;
model visceral=dose / dist=bin;
repeated subject=litter / type=ind covb modelse;
run;

11.6.2 Discussion of Program

• GEE estimation is invoked by the REPEATED


statement.

• It has been available since Version 6.12.

• The MODEL statement is classical (univariate GLM,


MIXED).
– The ‘dist=’ option specifies distribution and
default link.
– The ‘link=’ option specifies the link function.

• Useful options to the REPEATED statement:


– ‘type=’: specifies the correlation structure: exch,
ar(1), ind, un,. . .
– ‘covb’: requests covariance matrix of parameter
estimates
– ‘corrw’: requests working correlation matrix
– ‘modelse’: prints model based standard errors, in
addition to robust standard errors
– ‘obstats’: prints table containing response values,
predicted values, linear predictor, residuals for each
observation.

11.6.3 PROC GENMOD Output

Visceral, Exchangeable Working Assumptions

The GENMOD Procedure

Model Information

Description Value

Data Set M.DEHP3


Distribution BINOMIAL
Link Function LOGIT
Dependent Variable VISCERAL
Observations Used 1082
Number Of Events 72
Number Of Trials 1082

Parameter Information

Parameter Effect

PRM1 INTERCEPT
PRM2 DOSE

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1080 407.5135 0.3773


Scaled Deviance 1080 407.5135 0.3773
Pearson Chi-Square 1080 1076.8015 0.9970
Scaled Pearson X2 1080 1076.8015 0.9970
Log Likelihood . -203.7567 .

Analysis Of Initial Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 -4.4692 0.2759 262.4482 0.0001


DOSE 1 4.4014 0.4280 105.7533 0.0001
SCALE 0 1.0000 0.0000 . .

NOTE: The scale parameter was held fixed.

GEE Model Information

Description Value

Correlation Structure Exchangeable


Subject Effect LITTER (108 levels)
Number of Clusters 108
Correlation Matrix Dimension 16
Maximum Cluster Size 16
Minimum Cluster Size 2

Covariance Matrix (Model-Based)


Covariances are Above the Diagonal and Correlations are Below

Parameter
Number PRM1 PRM2

PRM1 0.13427 -0.17695


PRM2 -0.88490 0.29780

Covariance Matrix (Empirical)


Covariances are Above the Diagonal and Correlations are Below

Parameter
Number PRM1 PRM2

PRM1 0.13445 -0.18714


PRM2 -0.86347 0.34936

Working Correlation Matrix

COL1 COL2 COL3 ...

ROW1 1.0000 0.0800 0.0800 ...


ROW2 0.0800 1.0000 0.0800 ...
ROW3 0.0800 0.0800 1.0000 ...

. . . . .
. . . . .
. . . . .

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Empirical 95% Confidence Limits


Parameter Estimate Std Err Lower Upper Z Pr>|Z|

INTERCEPT -4.4977 0.3667 -5.2163 -3.7790 -12.27 0.0000


DOSE 4.5506 0.5911 3.3922 5.7091 7.6991 0.0000
Scale 0.9974 . . . . .

NOTE: The scale parameter for GEE estimation was computed as the
square root of the normalized Pearson’s chi-square

Analysis Of GEE Parameter Estimates


Model-Based Standard Error Estimates

Model 95% Confidence Limits


Parameter Estimate Std Err Lower Upper Z Pr>|Z|

INTERCEPT -4.4977 0.3664 -5.2158 -3.7795 -12.27 0.0000


DOSE 4.5506 0.5457 3.4811 5.6202 8.3389 0.0000
Scale 0.9974 . . . . .

Visceral, Independence Working Assumptions

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1080 407.5135 0.3773


Scaled Deviance 1080 407.5135 0.3773
Pearson Chi-Square 1080 1076.8015 0.9970
Scaled Pearson X2 1080 1076.8015 0.9970
Log Likelihood . -203.7567 .

Analysis Of Initial Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 -4.4692 0.2759 262.4482 0.0001


DOSE 1 4.4014 0.4280 105.7533 0.0001
SCALE 0 1.0000 0.0000 . .

NOTE: The scale parameter was held fixed.

GEE Model Information

Description Value

Correlation Structure Independent


Subject Effect LITTER (108 levels)
Number of Clusters 108
Correlation Matrix Dimension 16
Maximum Cluster Size 16
Minimum Cluster Size 2

Covariance Matrix (Model-Based)


Covariances are Above the Diagonal and Correlations are Below

Parameter
Number PRM1 PRM2

PRM1 0.07616 -0.10315


PRM2 -0.87292 0.18333

Covariance Matrix (Empirical)


Covariances are Above the Diagonal and Correlations are Below

Parameter
Number PRM1 PRM2

PRM1 0.12993 -0.18001


PRM2 -0.85509 0.34107

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Empirical 95% Confidence Limits


Parameter Estimate Std Err Lower Upper Z Pr>|Z|

INTERCEPT -4.4692 0.3605 -5.1757 -3.7627 -12.40 0.0000


DOSE 4.4014 0.5840 3.2568 5.5461 7.5365 0.0000
Scale 1.0004 . . . . .

NOTE: The scale parameter for GEE estimation was


computed as the square root of the normalized Pearson’s chi-square.

Analysis Of GEE Parameter Estimates


Model-Based Standard Error Estimates

Model 95% Confidence Limits


Parameter Estimate Std Err Lower Upper Z Pr>|Z|

INTERCEPT -4.4692 0.2760 -5.0101 -3.9283 -16.19 0.0000


DOSE 4.4014 0.4282 3.5622 5.2406 10.280 0.0000
Scale 1.0004 . . . . .

11.6.4 Discussion of Output

• Initial estimates are based on univariate GLM: logistic


regression.
• Initial and ‘independence’ estimates are the same.
• The ‘independence’ standard errors correct for
overdispersion.
• Compare:
– Exchangeable and Independence estimates and
standard errors.
– Model based and robust standard errors.

11.7 GEE: Alternative 1

• Classical approach:
– Estimating equation for β
– Moment-based estimation for α
– Liang and Zeger (1986)
– SAS PROC GENMOD
• Alternative approach GEE 1.5:
– Estimating equation for β
– Estimating equation for α
– Prentice (1988)
– SAS macro gee1corr.mac by Stuart Lipsitz

Form of Equations

Σ_{i=1}^{N} Di^T Vi^{-1} (Yi − µi) = 0,

Σ_{i=1}^{N} Ei^T Wi^{-1} (Zi − δi) = 0,

where

Zijk = (Yij − µij)(Yik − µik) / √[ µij(1 − µij) µik(1 − µik) ],

δijk = E(Zijk).

The joint asymptotic distribution of √N(β̂ − β) and
√N(α̂ − α) is normal, with variance-covariance matrix
consistently estimated by

      [ A  0 ] [ Λ11  Λ12 ] [ A  B^T ]
  N   [ B  C ] [ Λ21  Λ22 ] [ 0   C  ] ,

 −1
N

A =  DiT Vi−1 Di  ,
i=1
 −1   −1
N
 N
 N
∂Z i   
B =  EiT Wi−1 Ei   EiT Wi−1 DiT Vi−1 Di  ,
i=1 i=1 ∂β i=1
 −1
N

C =  EiT Wi−1 Ei  ,
i=1
N

Λ11 = DiT Vi−1 Cov(Y i )Vi−1 Di ,
i=1
N

Λ12 = DiT Vi−1 Cov(Y i , Z i )Wi−1 Ei ,
i=1

Λ21 = Λ12 ,
N

Λ22 = EiT Wi−1 Cov(Z i )Wi−1 Ei ,
i=1

and

Statistic       Estimator
Var(Yi)         (Yi − µi)(Yi − µi)^T
Cov(Yi, Zi)     (Yi − µi)(Zi − δi)^T
Var(Zi)         (Zi − δi)(Zi − δi)^T
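For one cluster, the Zijk statistics above can be computed directly from the outcomes and fitted marginal means; a small sketch (outcomes and means hypothetical):

```python
import numpy as np
from itertools import combinations

y = np.array([1.0, 0.0, 1.0])            # one cluster's binary outcomes (hypothetical)
mu = np.array([0.6, 0.3, 0.5])           # fitted marginal means (hypothetical)

# Z_ijk: standardized cross-products of the centred outcomes, one per pair (j, k)
z = np.array([(y[j] - mu[j]) * (y[k] - mu[k])
              / np.sqrt(mu[j] * (1 - mu[j]) * mu[k] * (1 - mu[k]))
              for j, k in combinations(range(len(y)), 2)])
```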

Special Case: Exchangeability

• Then, δijk = ρi, the correlation between any two


outcomes of the same cluster i.
• Fisher’s z transform: α = ln(1 + ρ) − ln(1 − ρ).
• Define

Zi = ( Yi1 Yi2, Yi1 Yi3, . . . , Yi,ni−1 Yini )^T .
Hence,

E(Zijk) = µijk = ρ √[ µij(1 − µij) µik(1 − µik) ] + µij µik,

Var(Zijk) = µijk(1 − µijk),

∂E(Zijk)/∂α = [ 2 exp(α)/(exp(α) + 1)² ] √[ µij(1 − µij) µik(1 − µik) ],

C = { Σ_{i} Σ_{j<k} [ 2 exp(α)/(exp(α) + 1)² ]² µij(1 − µij) µik(1 − µik) / [ µijk(1 − µijk) ] }^{-1}.

• Standard error for ρ under exchangeability is obtained


by multiplying the standard error for α by
2 exp(α)/(exp(α) + 1)² (the delta method).
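The back-transformation and the delta-method standard error above are a one-liner each. A small numeric sketch (the values of α̂ and its standard error are hypothetical):

```python
import math

alpha_hat, se_alpha = 0.5, 0.1           # hypothetical estimate and standard error

# Inverse Fisher z transform: rho = (exp(alpha) - 1) / (exp(alpha) + 1)
rho_hat = (math.exp(alpha_hat) - 1) / (math.exp(alpha_hat) + 1)

# d rho / d alpha = 2 exp(alpha) / (exp(alpha) + 1)^2
deriv = 2 * math.exp(alpha_hat) / (math.exp(alpha_hat) + 1) ** 2

# Delta method: se(rho) = se(alpha) * |d rho / d alpha|
se_rho = se_alpha * deriv
```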

11.7.1 gee1corr.mac Code

%include ’c:\sas\stat\sample\gee1corr.mac’;

%gee(data=m.dehp3,y=visceral,x=dose,id=litter,corr=exc);

%gee(data=m.dehp3,y=visceral,x=dose,id=litter,corr=ind);

It is essential to code the outcome as 0 and 1:

• for gee1corr.mac
• for GLIMMIX macro

11.7.2 gee1corr.mac Output

Visceral, exchangeable assumption

Correlation Structure: Exchangeable

PARAMETER ESTIMATES with naive variance

VARIABLE ESTIMATE SE_EST Z P


INTERCEP -4.507824 0.3958657 -11.38726 0
DOSE 4.5890696 0.5845582 7.8504921 4.108E-15

PARAMETER ESTIMATES with robust variance

VARIABLE ESTIMATE SE_EST Z P


INTERCEP -4.507824 0.3685713 -12.23053 0
DOSE 4.5890696 0.5932811 7.735068 1.033E-14

CORR SECORR Z P
0.1100235 0.0455011 2.4180411 0.0156043

Visceral, independence assumption

Correlation Structure: Independence

PARAMETER ESTIMATES with naive variance

VARIABLE ESTIMATE SE_EST Z P


INTERCEP -4.469209 0.2758728 -16.20025 0
DOSE 4.401443 0.4280043 10.283642 0

PARAMETER ESTIMATES with robust variance

VARIABLE ESTIMATE SE_EST Z P


INTERCEP -4.469209 0.3604581 -12.39869 0
DOSE 4.401443 0.5840154 7.5365191 4.829E-14

11.8 GEE: Alternative 2

• The previous approaches are formulated directly in terms
of the binary outcomes
• This approach is based on a linearization

Write
y i = µi + εi
with
η i = g(µi),
η i = Xiβ,
Var(y i) = Var(εi) = Σi.
Here,

• η i is a vector of linear predictors,


• g(.) is the (vector) link function.

11.8.1 Estimation

Nelder and Wedderburn (1972)

Solve iteratively:
Σ_{i=1}^{N} Xi^T Wi Xi β = Σ_{i=1}^{N} Xi^T Wi y*i,

where

Wi = Di Σi^{-1} Di,

y*i = η̂i + Di^{-1}(yi − µ̂i),

Di = ∂µi/∂ηi,

Σi = Var(εi),

µi = E(yi).
Remarks:

• y ∗i is called “working variable” or “pseudo data”.


• Given the pseudo data, β̂ can be determined using
PROC MIXED.
• For linear models, Di = Ini and standard generalized
least squares follows.
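The pseudo-data scheme can be sketched for an independent logistic model (simulated data, hypothetical names); each pass builds the working variable and solves the weighted normal equations. For the logit link, Di = diag(µi(1 − µi)) and Σi = diag(µi(1 − µi)) with φ = 1, so Wi reduces to diag(µi(1 − µi)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # hypothetical design
y = rng.binomial(1, 0.4, n).astype(float)

beta = np.zeros(p)
for _ in range(30):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    d = mu * (1.0 - mu)                  # dmu/deta; also v(mu) for the logit link
    z = eta + (y - mu) / d               # working variable ("pseudo data")
    w = d                                # W = D Sigma^{-1} D = diag(d) here (phi = 1)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
```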

11.8.2 The Variance Structure

The variance can be written as


Σi = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β)
where

• φ is a scale (overdispersion) parameter,


• Ai = v(µi), expressing the mean-variance relation
(this is a function of β),
• Ri(α) describes the correlation structure:
– If independence is assumed then Ri(α) = Ini .
Equivalently, the scale parameter can be placed
along the diagonal.
– Other structures, such as compound symmetry,
AR(1),. . . can be assumed as well.

11.8.3 GLIMMIX Macro Code

%include ’c:\sas\stat\sample\glimmix.sas’;

%glimmix(
data=m.dehp3,
procopt=method=reml,
stmts=%str(
class litter;
model visceral=dose / solution;
repeated / subject=litter type=cs r;),
error=binomial,
link=logit,
title=’visceral, CS, Model Based’,
options=mixprintlast
);

%glimmix(
data=m.dehp3,
procopt=method=reml,
stmts=%str(
class litter;
model visceral=dose / solution;
repeated / subject=litter type=simple r;),
error=binomial,
link=logit,
title=’visceral, Independence, Model Based’,
options=mixprintlast
);

%glimmix(
data=m.dehp3,
procopt=method=reml empirical,
stmts=%str(
class litter;
model visceral=dose / solution;
repeated / subject=litter type=cs r;),
error=binomial,
link=logit,
title=’visceral, CS, Empirically Corrected’,
options=mixprintlast
);

%glimmix(
data=m.dehp3,
procopt=method=reml empirical,
stmts=%str(
class litter;
model visceral=dose / solution;
repeated / subject=litter type=simple r;),
error=binomial,
link=logit,
title=’visceral, Independence, Empirically Corrected’,
options=mixprintlast
);

11.8.4 Discussion of Macro

• Described in
– Littell et al (1996)
– glimmix.sas

• Intended for generalized linear mixed models

• Random-effects applications are postponed

• “PROC MIXED” part of the macro:


– PROCOPT defines the PROC MIXED statement
options
– STMTS is a string that contains the body of the
MIXED procedure

• “GLM” part of the macro:


– GLM specified by means of ERROR and LINK
arguments
– OPTIONS=MIXPRINTLAST produces the output
of the underlying PROC MIXED call, produced at
convergence

• Double iteration scheme ⇒ slow

• PROC NLMIXED provides an alternative



11.8.5 GLIMMIX Output

The first output is for

• Visceral malformation
• Exchangeable correlation (CS)
• Model based standard errors (the standard in PROC
MIXED)

First, the MIXED output is produced.


We present a selection.

Visceral, CS, Model Based

The MIXED Procedure

REML Estimation Iteration History

Iteration Evaluations Objective Criterion

0 1 4843.2167338
1 2 4823.1918629 0.00000048
2 1 4823.1907084 0.00000000

Convergence criteria met.

R Matrix for LITTER 38


Weighted by _W

Row COL1 COL2 COL3 ...

1 91.60982268 7.00560953 7.00560953 ...


2 7.00560953 91.60982268 7.00560953 ...

3 7.00560953 7.00560953 91.60982268 ...

. . . . .
. . . . .
. . . . .

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate

CS LITTER 0.07639306
Residual 0.92257140

Model Fitting Information for _Z


Weighted by _W

Description Value

Observations 1082.000
Res Log Likelihood -3404.05
Akaike’s Information Criterion -3406.05
Schwarz’s Bayesian Criterion -3411.03
-2 Res Log Likelihood 6808.098
Null Model LRT Chi-Square 20.0260
Null Model LRT DF 1.0000
Null Model LRT P-Value 0.0000

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.49639983 0.36363994 106 -12.36 0.0001


DOSE 4.54563842 0.54219604 106 8.38 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 70.29 0.0001



Next, the GLIMMIX macro produces its own output.

Visceral, CS, Model Based

Covariance Parameter Estimates

Cov
Parm Estimate

CS 0.07639306

GLIMMIX Model Statistics

Description Value

Deviance 407.7891
Scaled Deviance 442.0136
Pearson Chi-Square 1076.0335
Scaled Pearson Chi-Square 1166.3418
Extra-Dispersion Scale 0.9226

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.4964 0.3636 106 -12.36 0.0001


DOSE 4.5456 0.5422 106 8.38 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 70.29 0.0001



We now present a similar analysis for independence


working assumptions.

Visceral, Independence, Model Based

The MIXED Procedure

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate

DIAG LITTER 0.99703847

Model Fitting Information for _Z


Weighted by _W

Description Value

Observations 1082.000
Res Log Likelihood -3415.38
Akaike’s Information Criterion -3416.38
Schwarz’s Bayesian Criterion -3418.88
-2 Res Log Likelihood 6830.770
Null Model LRT Chi-Square 0.0000
Null Model LRT DF 0.0000
Null Model LRT P-Value 1.0000

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.46920901 0.27546399 106 -16.22 0.0001


DOSE 4.40144298 0.42737005 106 10.30 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 106.07 0.0001



Visceral, Independence, Model Based

Covariance Parameter Estimates

Cov
Parm Estimate

DIAG 0.99703847

GLIMMIX Model Statistics

Description Value

Deviance 407.5135
Scaled Deviance 407.5135
Pearson Chi-Square 1076.8015
Scaled Pearson Chi-Square 1076.8015
Extra-Dispersion Scale 1.0000

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.4692 0.2755 106 -16.22 0.0001


DOSE 4.4014 0.4274 106 10.30 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 106.07 0.0001



Visceral, CS, Empirically Corrected

Covariance Parameter Estimates

Cov
Parm Estimate

CS 0.07639306

GLIMMIX Model Statistics

Description Value

Deviance 407.7891
Scaled Deviance 442.0136
Pearson Chi-Square 1076.0335
Scaled Pearson Chi-Square 1166.3418
Extra-Dispersion Scale 0.9226

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.4964 0.3664 106 -12.27 0.0001


DOSE 4.5456 0.5908 106 7.69 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 59.20 0.0001



Visceral, Independence, Empirically Corrected

Covariance Parameter Estimates

Cov
Parm Estimate

DIAG 0.99703847

GLIMMIX Model Statistics

Description Value

Deviance 407.5135
Scaled Deviance 407.5135
Pearson Chi-Square 1076.8015
Scaled Pearson Chi-Square 1076.8015
Extra-Dispersion Scale 1.0000

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -4.4692 0.3605 106 -12.40 0.0001


DOSE 4.4014 0.5840 106 7.54 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 106 56.80 0.0001



11.8.6 Discussion of Output

• GLIMMIX output copies parts from PROC MIXED


output:
– Parameter estimates, standard errors,. . .
– Covariance parameter and overdispersion scale:
∗ Compound symmetry:
· Cov. Par.: covariance between two
littermates (cf. random intercept variance)
· Extra-dispersion: residual variance
(“measurement error”)
∗ Independence:
· Cov. Par.: residual variance (“measurement
error”)
· Extra-dispersion: 1
– Tests of fixed effects

• GLIMMIX produces in addition model statistics


– deviance: treating all outcomes as if they were
independent
– scaled deviance=deviance/extra-dispersion
parameter
– They should not be trusted/used
• PROC MIXED output that should not be used:
– Model fitting information

11.9 Comparison of GEE Estimates


(Standard Errors)

GEE1 Estimates (Model Based Standard Errors; Robust Standard Errors) for
the DEHP Data. Exchangeable Working Assumptions.

Outcome Parameter GENMOD PRENTICE GLIMMIX (repeated)


External β0 -4.98(0.40;0.37) -4.99(0.46;0.37) -5.00(0.36;0.37)
βd 5.33(0.57;0.55) 5.32(0.65;0.55) 5.32(0.51;0.55)
φ 0.88 0.65
ρ 0.11 0.11(0.04) 0.06
Visceral β0 -4.50(0.37;0.37) -4.51(0.40;0.37) -4.50(0.36;0.37)
βd 4.55(0.55;0.59) 4.59(0.58;0.59) 4.55(0.54;0.59)
φ 1.00 0.92
ρ 0.08 0.11(0.05) 0.08
Skeletal β0 -4.83(0.44;0.45) -4.82(0.47;0.44) -4.82(0.46;0.45)
βd 4.84(0.62;0.63) 4.84(0.67;0.63) 4.84(0.65;0.63)
φ 0.98 0.86
ρ 0.12 0.14(0.06) 0.13
Collapsed β0 -4.05(0.32;0.31) -4.06(0.35;0.31) -4.04(0.33;0.31)
βd 5.84(0.57;0.61) 5.89(0.62;0.61) 5.82(0.58;0.61)
φ 1.00 0.96
ρ 0.11 0.15(0.05) 0.11

GEE1 Estimates (Model Based Standard Errors; Robust Standard Errors) for
the DEHP Data. Independence Working Assumptions.

Outcome Parameter GENMOD PRENTICE GLIMMIX (repeated)


External β0 -5.06(0.30;0.38) -5.06(0.33;0.38) -5.06(0.28;0.38)
βd 5.31(0.44;0.57) 5.31(0.48;0.57) 5.31(0.42;0.57)
φ 0.90 0.74
Visceral β0 -4.47(0.28;0.36) -4.47(0.28;0.36) -4.47(0.28;0.36)
βd 4.40(0.43;0.58) 4.40(0.43;0.58) 4.40(0.43;0.58)
φ 1.00 1.00
Skeletal β0 -4.87(0.31;0.47) -4.87(0.31;0.47) -4.87(0.32;0.47)
βd 4.89(0.46;0.65) 4.90(0.47;0.65) 4.90(0.47;0.65)
φ 0.99 1.02
Collapsed β0 -3.98(0.22;0.30) -3.98(0.22;0.30) -3.98(0.22;0.30)
βd 5.56(0.40;0.61) 5.56(0.40;0.61) 5.56(0.41;0.61)
φ 0.99 1.04

11.10 GEE2: Odds Ratios

Follows as a simplification from the likelihood methods


presented:

• with conditional working assumptions


mixed-marginal conditional model

• with marginal working assumptions


fully marginal odds ratio model

• Conditional working assumptions


Considering the quadratic exponential family
 
f(yi | Ψi) = exp[ Ψi^T vi − A(Ψi) ].

(Zhao and Prentice, 1990) yields a set of GEE2:

S(β) = Σ_{i=1}^{N} (∂µi/∂β)^T Mi^{-1} (vi − µi) = 0.
– likelihood inference: assume correct specification
– GEE inference (+ robust variance estimator)

• Marginal working assumptions


– Set the higher order interactions to zero in the
fully marginal model.
– A similar set of GEE2 arises.

Remarks

• Also close to GEE1–Alternative 1 (Prentice 1988).


• Performance of both approaches is virtually identical.
• Orthogonality property is an advantage for conditional
assumptions.
• Both versions extend to categorical outcomes.
• Marginal outcomes + relevant part of association
modelled.
• Computations are stable and relatively fast.

11.11 GEE2: Correlations

• Start from the Bahadur model:


• Model:
– Marginal logit πij
– Pairwise correlations ρijk
• (Independence) working assumptions:
– Zero third order correlations
– Zero fourth order correlations
• Application to NTP data:
– Clustering and constant correlation:
logit(πi) = β0 + βd di,

ln[ (1 + ρi) / (1 − ρi) ] = βa.

11.11.1 The NTP Data

GEE2 Estimates (Standard Errors) for the Bahadur Model.

Outcome Parameter DEHP EG DYME


External β0 -4.98(0.37) -5.63(0.67) -7.45(0.73)
βd 5.29(0.55) 3.10(0.81) 8.15(0.83)
βa 0.15(0.05) 0.15(0.05) 0.13(0.05)
Visceral β0 -4.49(0.36) -7.50(1.05) -6.89(0.75)
βd 4.52(0.59) 4.37(1.14) 5.51(0.89)
βa 0.15(0.06) 0.02(0.02) 0.11(0.07)
Skeletal β0 -5.23(0.40) -4.05(0.33)
βd 5.35(0.60) 4.77(0.43)
βa 0.18(0.02) 0.30(0.03)
Collapsed β0 -5.23(0.40) -4.07(0.71) -5.75(0.48)
βd 5.35(0.60) 4.89(0.90) 8.82(0.91)
βa 0.18(0.02) 0.26(0.14) 0.18(0.12)

11.11.2 Discussion

• Association parameter is higher than in likelihood


version.

• The likelihood version is highly constrained:


For instance, the allowable range of βa for the external outcome
in the DEHP data is (−0.0164; 0.1610) when β0 and βd are fixed
at their MLE. This range excludes the MLE under a
beta-binomial model. It translates to (−0.0082; 0.0803) on the
correlation scale.

• Dose effect estimate is comparable.

• A GEE2 estimate is valid as soon as the second, third,


and fourth order joint probabilities are nonnegative.
The likelihood analysis requires all joint probabilities
to be nonnegative.

• GEE2: simple likelihood ratio tests are unavailable


Robust Wald and score tests can be used

• GEE2: joint probabilities are unavailable

• GEE2: more robust against misspecification of higher


order association

11.12 Alternating Logistic Regression

When marginal odds ratios are used to model association,


α can be estimated using ALR, which is

• almost as efficient as GEE2


• almost as easy (computationally) as GEE1

(Another GEE1.5.)

Let µijk be as before and let αijk = ln(ψijk ) be the


marginal log odds ratio. Then
 
logit Pr(Yij = 1 | Yik = yik) = αijk yik + ln[ (µij − µijk) / (1 − µij − µik + µijk) ]

• αijk can be modelled in terms of predictors


• the second term is treated as an offset
• the estimating equations for β and α are solved in
turn, and the “alternating” between both sets is
repeated until convergence.
• this is needed because the offset clearly depends on β.

Diggle, Liang, and Zeger (1994)
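As an illustration of the offset term, the joint probability µijk can be obtained from the two marginal means and the odds ratio ψijk via Plackett's formula (a standard result, not derived in these notes); the helper names below are hypothetical:

```python
import math

def joint_prob(mu_j, mu_k, psi):
    """P(Y_j = 1, Y_k = 1) from the two marginal means and the odds
    ratio psi (Plackett's formula)."""
    if abs(psi - 1.0) < 1e-12:
        return mu_j * mu_k                       # independence
    a = 1.0 + (mu_j + mu_k) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * mu_j * mu_k
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))

def alr_offset(mu_j, mu_k, psi):
    """The ALR offset ln[(mu_j - mu_jk) / (1 - mu_j - mu_k + mu_jk)]."""
    mu_jk = joint_prob(mu_j, mu_k, psi)
    return math.log((mu_j - mu_jk) / (1.0 - mu_j - mu_k + mu_jk))
```

With αijk = ln ψijk, the conditional logit of Yij given Yik = 1 equals αijk plus this offset, and given Yik = 0 it equals the offset alone.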


Chapter 12

Case Study: Analgesic Trial

• GSA was dichotomized as follows:


GSABIN = 1 if GSA ≤ 3 (Very Good to Moderate),
         0 otherwise.
• GEE with UNstructured correlation structure:

CHAPTER 12. CASE STUDY: ANALGESIC TRIAL 250

proc genmod data=gsa;


ods listing exclude parameterestimates classlevels parminfo;
class patid timecls;
model gsabin = pca0 time|time / dist=b;
repeated subject=patid / type=un corrw within=timecls modelse;
run;

The GENMOD Procedure

Model Information

Data Set WORK.GSA


Distribution Binomial
Link Function Logit
Dependent Variable gsabin
Observations Used 1137
Probability Modeled Pr( gsabin = 1 )

Response Profile

Ordered Ordered
Level Value Count

1 0 206
2 1 931

Parameter Information

Parameter Effect

Prm1 Intercept
Prm2 pca0
Prm3 TIME
Prm4 TIME*TIME

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1133 1064.1723 0.9393


Scaled Deviance 1133 1064.1723 0.9393
Pearson Chi-Square 1133 1136.8928 1.0034
Scaled Pearson X2 1133 1136.8928 1.0034
Log Likelihood -532.0862

Algorithm converged.

Analysis Of Initial Parameter Estimates

Standard Wald 95% Chi-


Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq

Intercept 1 2.8016 0.4902 1.8408 3.7624 32.66 <.0001


pca0 1 -0.2058 0.0864 -0.3751 -0.0365 5.67 0.0172
TIME 1 -0.7864 0.3874 -1.5456 -0.0271 4.12 0.0424
TIME*TIME 1 0.1774 0.0793 0.0219 0.3329 5.00 0.0254
Scale 0 1.0000 0.0000 1.0000 1.0000

NOTE: The scale parameter was held fixed.

GEE Model Information

Correlation Structure Unstructured


Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4

GEE Model Information

Minimum Cluster Size 1

Algorithm converged.

Working Correlation Matrix

Col1 Col2 Col3 Col4

Row1 1.0000 0.1770 0.2481 0.2021


Row2 0.1770 1.0000 0.1811 0.1177
Row3 0.2481 0.1811 1.0000 0.4594
Row4 0.2021 0.1177 0.4594 1.0000

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 2.8731 0.4592 1.9730 3.7731 6.26 <.0001


pca0 -0.2278 0.0959 -0.4157 -0.0398 -2.37 0.0176
TIME -0.7779 0.3234 -1.4117 -0.1440 -2.41 0.0162
TIME*TIME 0.1670 0.0656 0.0384 0.2955 2.55 0.0109

Analysis Of GEE Parameter Estimates


Model-Based Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 2.8731 0.4836 1.9252 3.8209 5.94 <.0001


pca0 -0.2278 0.1025 -0.4287 -0.0268 -2.22 0.0263
TIME -0.7779 0.3275 -1.4198 -0.1359 -2.37 0.0176
TIME*TIME 0.1670 0.0665 0.0366 0.2973 2.51 0.0121
Scale 1.0000 . . . . .
NOTE: The scale parameter was held fixed.

12.1 Comparison of GEE Estimates

• under different working correlation structures


• models without dropout patterns

Variable INDependence EXCHangeable AutoRegressive UNstructured


Intercept 2.80 (0.469; 0.490) 2.92 (0.463; 0.494) 2.94 (0.469; 0.488) 2.87 (0.459; 0.484)
Time -0.79 (0.341; 0.387) -0.83 (0.328; 0.343) -0.90 (0.334; 0.352) -0.78 (0.323; 0.328)
Time2 0.18 (0.070; 0.079) 0.18 (0.067; 0.070) 0.20 (0.069; 0.072) 0.17 (0.066; 0.067)
Basel. PCA -0.21 (0.095; 0.086) -0.23 (0.095; 0.103) -0.22 (0.095; 0.099) -0.23 (0.096; 0.103)
Parameter estimates and standard errors (empirical; model-based).

Estimated working correlation structures:

IND                           EXCH
1  0      0      0            1  0.219  0.219  0.219
   1      0      0               1      0.219  0.219
          1      0                      1      0.219
                 1                             1

AR                            UN
1  0.247  0.061  0.015        1  0.177  0.248  0.202
   1      0.247  0.061           1      0.181  0.178
          1      0.247                  1      0.459
                 1                             1

12.2 Comparison of GEE Estimates


• under different working correlation structures
• models with dropout patterns

Variable INDependence EXCHangeable AutoRegressive UNstructured


Pattern 1 2.22 (0.584; 0.613) 2.29 (0.577; 0.614) 2.29 (0.577; 0.602) 2.26 (0.582; 0.625)
Pattern 2 3.13 (0.550; 0.574) 3.10 (0.548; 0.564) 3.09 (0.555; 0.562) 3.09 (0.543; 0.549)
Time Patt. 1 -0.62 (0.333; 0.367) -0.67 (0.324; 0.338) -0.67 (0.324; 0.336) -0.65 (0.327; 0.349)
Time Patt. 2 -0.91 (0.416; 0.462) -0.87 (0.413; 0.414) -0.88 (0.416; 0.425) -0.86 (0.409; 0.398)
Time2 Patt. 2 0.18 (0.083; 0.092) 0.17 (0.082; 0.083) 0.17 (0.083; 0.085) 0.17 (0.081; 0.080)
Basel. PCA -0.16 (0.097; 0.090) -0.16 (0.096; 0.108) -0.16 (0.097; 0.103) -0.16 (0.098; 0.107)
Parameter estimates and standard errors (empirical; model-based).


Estimated working correlation structures:

IND                           EXCH
1  0      0      0            1  0.219  0.219  0.219
   1      0      0               1      0.219  0.219
          1      0                      1      0.219
                 1                             1

AR                            UN
1  0.235  0.055  0.013        1  0.143  0.288  0.228
   1      0.235  0.055           1      0.220  0.098
          1      0.235                  1      0.443
                 1                             1

12.2.1 Fitted Model



12.3 Use of GLIMMIX

• Similar models can be fitted using SAS macro


GLIMMIX
• Its use is not recommended in general, though!
• Code (to fit UN structure):

%glimmix(data=gsa, procopt=%str(method=ml noclprint),


stmts=%str(
class patid timecls;
model gsabin = time|time pca0 / s;
repeated timecls / sub=patid type=un rcorr=3;
),
error=binomial);

12.3.1 Output

The Mixed Procedure

Model Information

Data Set WORK._DS


Dependent Variable _z
Weight Variable _w
Covariance Structure Unstructured
Subject Effect PATID
Estimation Method ML
Residual Variance Method None
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Between-Within

Dimensions

Covariance Parameters 10
Columns in X 4
Columns in Z 0
Subjects 395
Max Obs Per Subject 4
Observations Used 1137
Observations Not Used 0
Total Observations 1137

Parameter Search

CovP1 CovP2 CovP3 CovP4 CovP5 CovP6 CovP7 CovP8

1.0059 0.1865 0.9974 0.2923 0.2196 0.9470 0.2752 0.1697

Parameter Search

CovP9 CovP10 Log Like -2 Log Like

0.4186 0.9273 -2635.7524 5271.5049

Iteration History

Iteration Evaluations -2 Log Like Criterion



1 1 5271.50486573 0.00000000

Convergence criteria met.

Estimated R Correlation Matrix


for PATID 3/Weighted by _w

Row Col1 Col2 Col3 Col4

1 1.0000 0.1862 0.2995 0.2849


2 0.1862 1.0000 0.2260 0.1764
3 0.2995 0.2260 1.0000 0.4467
4 0.2849 0.1764 0.4467 1.0000

Covariance Parameter Estimates

Standard Z
Cov Parm Subject Estimate Error Value Pr Z

UN(1,1) PATID 1.0059 0.07251 13.87 <.0001


UN(2,1) PATID 0.1865 0.06167 3.02 0.0025
UN(2,2) PATID 0.9974 0.08152 12.23 <.0001
UN(3,1) PATID 0.2923 0.07012 4.17 <.0001
UN(3,2) PATID 0.2196 0.07256 3.03 0.0025
UN(3,3) PATID 0.9470 0.08969 10.56 <.0001
UN(4,1) PATID 0.2752 0.07490 3.67 0.0002
UN(4,2) PATID 0.1697 0.07264 2.34 0.0195
UN(4,3) PATID 0.4186 0.07169 5.84 <.0001
UN(4,4) PATID 0.9273 0.09065 10.23 <.0001

Fit Statistics

Log Likelihood -2635.8


Akaike’s Information Criterion -2645.8
Schwarz’s Bayesian Criterion -2665.6
-2 Log Likelihood 5271.5

PARMS Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

10 0.00 1.0000

Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 2.8873 0.4848 393 5.96 <.0001


TIME -0.7774 0.3236 393 -2.40 0.0168
TIME*TIME 0.1646 0.06540 393 2.52 0.0123
pca0 -0.2314 0.1038 393 -2.23 0.0263

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

TIME 1 393 5.77 0.0168


TIME*TIME 1 393 6.33 0.0123
pca0 1 393 4.97 0.0263

GLIMMIX Model Statistics

Description Value

Deviance 1065.2602
Scaled Deviance 1065.2602
Pearson Chi-Square 1101.4964
Scaled Pearson Chi-Square 1101.4964
Extra-Dispersion Scale 1.0000

GEE1 Estimates with Standard Errors (Empirical;


Model-Based): Exchangeable Working Assumptions.

GENMOD PRENTICE GLIMMIX


Variable (repeated)

Intercept 2.918 (0.463; 0.494) 2.940 (0.463; 0.494) 2.942 (0.463; 0.488)
Time -0.833 (0.328; 0.343) -0.843 (0.326; 0.334) -0.843 (0.326; 0.330)
Time2 0.177 (0.067; 0.070) 0.178 (0.066; 0.068) 0.178 (0.066; 0.067)
Basel. PCA -0.226 (0.095; 0.103) -0.230 (0.095; 0.105) -0.230 (0.095; 0.104)
ρ 0.219 0.260 (0.048) 0.264 (0.037†)

Parameter estimates and standard errors (empirical; model-based).


† Standard error calculated by the delta method.

12.4 Alternating Logistic Regressions

• Consider the odds ratio instead of the correlation as a
measure of dependence
• Code to fit the EXCHangeable structure (a common log
odds ratio for all pairs):
proc genmod data=gsa;
where weight ne .;
class patid timecls;
model gsabin = time|time pca0 / dist=b;
repeated subject=patid / within=timecls logor=exch;
ods listing exclude classlevels parminfo parameterestimates;
run;
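ALR parameterizes the within-subject association through pairwise log odds ratios (the Alpha parameters in the output below). As a numerical sketch of what such a parameter measures — in Python rather than SAS, with hypothetical counts — the empirical log odds ratio of a 2×2 cross-classification of two measurement occasions is:

```python
from math import log

def log_odds_ratio(n11, n10, n01, n00):
    """Empirical log odds ratio of a 2x2 table of paired binary outcomes.

    n11 = success at both occasions, n00 = failure at both, etc.
    """
    return log((n11 * n00) / (n10 * n01))

# Hypothetical counts for one pair of measurement occasions:
print(round(log_odds_ratio(200, 40, 30, 25), 4))
```

A value of 0 corresponds to independence; LOGOR=EXCH constrains this quantity to be common over all pairs of occasions, while LOGOR=FULLCLUST gives each pair its own Alpha.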

12.4.1 Output

The GENMOD Procedure

Model Information

Data Set WORK.GSA


Distribution Binomial
Link Function Logit
Dependent Variable gsabin
Observations Used 1137
Probability Modeled Pr( gsabin = 1 )

Response Profile

Ordered Ordered
Level Value Count

1 0 206
2 1 931

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1133 1064.1723 0.9393


Scaled Deviance 1133 1064.1723 0.9393
Pearson Chi-Square 1133 1136.8928 1.0034
Scaled Pearson X2 1133 1136.8928 1.0034
Log Likelihood -532.0862

Algorithm converged.

GEE Model Information

Log Odds Ratio Structure Exchangeable


Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 1

Algorithm converged.

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 2.9810 0.4621 2.0753 3.8866 6.45 <.0001


TIME -0.8689 0.3248 -1.5056 -0.2323 -2.67 0.0075
TIME*TIME 0.1830 0.0659 0.0539 0.3122 2.78 0.0055
pca0 -0.2352 0.0950 -0.4213 -0.0491 -2.48 0.0132
Alpha1 1.4307 0.2238 0.9921 1.8693 6.39 <.0001

• Code to fit the fully parameterized cluster structure
(a separate log odds ratio for each pair of occasions):
proc genmod data=gsa;
where weight ne .;
class patid timecls;
model gsabin = time|time pca0 / dist=b;
repeated subject=patid / within=timecls logor=fullclust;
ods listing exclude classlevels parminfo parameterestimates;
run;

• Output:
The GENMOD Procedure

Model Information

Data Set WORK.GSA


Distribution Binomial
Link Function Logit
Dependent Variable gsabin
Observations Used 1137
Probability Modeled Pr( gsabin = 1 )

Response Profile

Ordered Ordered
Level Value Count

1 0 206
2 1 931

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1133 1064.1723 0.9393


Scaled Deviance 1133 1064.1723 0.9393
Pearson Chi-Square 1133 1136.8928 1.0034
Scaled Pearson X2 1133 1136.8928 1.0034
Log Likelihood -532.0862

Algorithm converged.

GEE Model Information

Log Odds Ratio Structure Fully Parameterized Clusters


Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 1

Log Odds Ratio Parameter


Information

Parameter Group

Alpha1 (1, 2)
Alpha2 (1, 3)
Alpha3 (1, 4)
Alpha4 (2, 3)
Alpha5 (2, 4)
Alpha6 (3, 4)

Algorithm converged.

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 2.9219 0.4583 2.0237 3.8201 6.38 <.0001


TIME -0.7980 0.3207 -1.4266 -0.1694 -2.49 0.0128
TIME*TIME 0.1683 0.0648 0.0412 0.2953 2.60 0.0094
pca0 -0.2359 0.0960 -0.4241 -0.0478 -2.46 0.0140
Alpha1 1.1280 0.3278 0.4856 1.7705 3.44 0.0006
Alpha2 1.5631 0.3865 0.8056 2.3206 4.04 <.0001
Alpha3 1.6035 0.4192 0.7819 2.4251 3.83 0.0001
Alpha4 1.1864 0.3680 0.4652 1.9077 3.22 0.0013
Alpha5 0.9265 0.4218 0.0997 1.7533 2.20 0.0281
Alpha6 2.4387 0.4805 1.4970 3.3805 5.08 <.0001
Chapter 13

Random-Effects Models

• In a pure random-effects model, one assumes that the
responses Yi1, . . . , Yini are conditionally independent
given an unobserved vector of random variables bi.
• The realized value of bi represents properties of the
given subject which vary randomly between subjects.
• E(Yij|bi) = µij
• η(µij) = xijᵀβ + zijᵀbi
• Var(Yij|bi) = φ v(µij)

266
CHAPTER 13. RANDOM-EFFECTS MODELS 267

13.1 The Marginal Likelihood

• To derive the marginal likelihood for this model, we
need to integrate over the assumed distribution of the
random-effects vector bi (which may be difficult unless
bi has low dimension).
• Thus, the distribution of the vector Y i of responses
from a single subject is given by

f(y i) = ∫ ∏(j=1..ni) f(yij|bi) g(bi) dbi

where g(.) denotes the joint p.d.f. of bi.


• If both the conditional distribution of Y i given bi and
the marginal distribution of bi are normal, the
integrations can be performed analytically, and the
resulting model is an example of the Laird and Ware
(1982) model.

13.2 Numerical Integration

13.2.1 Adaptive Gaussian Quadrature

• Quadrature:
– Select abscissas
– Construct weighted sum of function over abscissas
• Adaptive Quadrature:
– Typical for the random-effects distribution
– Integral centered at the EB estimate of bi
– Number of quadrature points selected as a function
of the desired accuracy
• Pinheiro and Bates (1995)
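As an illustrative sketch of the weighted-sum idea (plain fixed-grid integration in Python, not adaptive Gauss–Hermite, and with hypothetical β0 = 1, σ = 1), one can approximate the marginal success probability of a logistic random-intercept model:

```python
from math import exp, pi, sqrt

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def norm_pdf(b, sd):
    return exp(-0.5 * (b / sd) ** 2) / (sd * sqrt(2.0 * pi))

def marginal_prob(beta0, sd, n=3201, half_width=8.0):
    """Approximate P(Y = 1) = integral of expit(beta0 + b) N(b; 0, sd^2) db
    by a weighted sum over a fixed grid of abscissas b_k with weights
    norm_pdf(b_k) * db (quadrature proper chooses abscissas and weights
    far more cleverly)."""
    lo = -half_width * sd
    db = 2.0 * half_width * sd / (n - 1)
    return sum(expit(beta0 + lo + k * db) * norm_pdf(lo + k * db, sd) * db
               for k in range(n))

p = marginal_prob(1.0, 1.0)
print(round(p, 3))  # attenuated toward 0.5 relative to expit(1.0) ≈ 0.731
```

Note that the marginal probability is pulled toward 0.5 relative to the conditional probability expit(β0): integrating over the random intercept attenuates the subject-specific effects.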

13.2.2 First Order Method

• For normal outcome density


• Conditional mean function is replaced by a first-order
Taylor series expansion
• For normal random effects, a closed form solution
then results
• Beal and Sheiner (1982, 1988)
• Sheiner and Beal (1985)

13.3 Estimation Methods

• In general, hierarchical generalized linear models are
obtained (Lee and Nelder 1996).
• A useful sub-class of models is obtained by assuming
that bi is normally distributed.
• Modern developments in Monte Carlo inference make
this class relatively tractable for higher dimensional bi.
• A useful tool is the EM algorithm
• Approximate inference methods have been developed:
– Pseudolikelihood: Wolfinger and O’Connell (1995)
– Laplace transform based: Breslow and Clayton
(1993)
• We study two approaches:
– The beta-binomial model: random effect on the
probability scale
– Generalized linear mixed models: random effects in
the linear predictor

13.4 Software

• SAS macro GLIMMIX


• SAS PROC NLMIXED
• MIXOR
• MLwiN
• WinBUGS
• ...

13.5 The Beta-binomial Model

• Useful for clustered data, e.g., NTP data, with
covariates at cluster level only.
Cluster i consists of:
– ni littermates
– zi of which are successes
– the dose level is di
• Assume each cluster has a random success probability
πi.

• Then, the intracluster correlation is assumed to arise
from natural heterogeneity in the success probability
across litters.
• In contrast, marginal models specify marginal means
and association separately.
• Still, the beta-binomial model is often considered to
be a marginal model.

• Building blocks:
The binomial part: conditional on the success
probability πi in cluster i, the responses
Y i1, . . . , Yini are independent with common
probability πi.
The beta part: the πi are drawn from a beta
distribution with mean π and variance δπ(1 − π)
• The marginal distribution of Zi is then beta-binomial
with
f(zi | πi, ρ) = B(πi(ρ⁻¹ − 1) + zi, (1 − πi)(ρ⁻¹ − 1) + (ni − zi))
              / B(πi(ρ⁻¹ − 1), (1 − πi)(ρ⁻¹ − 1))
where B(., .) denotes the beta function.
• The moments are:
– E(Zi) = niπi
– Var(Zi) = niπi(1 − πi)[1 + (ni − 1)δ]
Williams (1975)
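The density and the Williams moments can be checked numerically. The sketch below (Python, with hypothetical n, π, and δ = ρ) writes out the binomial coefficient that is implicit in the displayed density:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(z, n, p, rho):
    """Beta-binomial pmf with mean p and intracluster correlation rho."""
    a = p * (1.0 / rho - 1.0)
    b = (1.0 - p) * (1.0 / rho - 1.0)
    return comb(n, z) * exp(log_beta(a + z, b + n - z) - log_beta(a, b))

n, p, rho = 12, 0.3, 0.2          # hypothetical litter size, mean, correlation
probs = [beta_binomial_pmf(z, n, p, rho) for z in range(n + 1)]
mean = sum(z * pz for z, pz in enumerate(probs))
var = sum((z - mean) ** 2 * pz for z, pz in enumerate(probs))
# Williams moments: E(Z) = n*p and Var(Z) = n*p*(1-p)*[1 + (n-1)*rho]
print(round(mean, 4), round(var, 4))
```

Here δ coincides with ρ: the factor 1 + (n − 1)ρ is the variance inflation relative to the ordinary binomial.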

13.5.1 The NTP Data

Maximum Likelihood Estimates (Standard Errors) for the Beta-Binomial Model.

Outcome Parameter DEHP EG DYME


External β0 -4.91(0.42) -5.32(0.71) -7.27(0.74)
βd 5.20(0.59) 2.78(0.81) 8.01(0.82)
βa 0.21(0.09) 0.28(0.14) 0.21(0.12)
Visceral β0 -4.38(0.36) -7.45(1.17) -6.21(0.83)
βd 4.42(0.54) 4.33(1.26) 4.94(0.90)
βa 0.22(0.09) 0.04(0.09) 0.45(0.21)
Skeletal β0 -4.88(0.44) -2.89(0.27) -5.15(0.47)
βd 4.92(0.63) 3.42(0.40) 6.99(0.71)
βa 0.27(0.11) 0.54(0.09) 0.61(0.14)
Collapsed β0 -3.83(0.31) -2.51(0.09) -5.42(0.45)
βd 5.59(0.56) 3.05(0.17) 8.29(0.79)
βa 0.32(0.10) 0.28(0.02) 0.33(0.10)

13.5.2 Discussion

• Parameters have the same interpretation for:
– Bahadur (MLE)
– Bahadur (GEE2)
– Beta-binomial
(A different interpretation applies for the conditional model.)

• Restrictions on the parameter space are
– most severe for Bahadur (MLE)
– intermediate for GEE
– least severe for the beta-binomial (all positive
correlations allowed)
This is reflected in the estimates for βa
• Fitting the beta-binomial model is relatively easy

13.6 Generalized Linear Mixed Models

• An extension of GEE1–Alternative 2.
• Random effects are included in the model.

Write
y i = µi + εi
with
η i = g(µi),
η i = Xiβ + Zibi,
Var(y i|bi) = Σi.
Here,

• η i is a vector of linear predictors, given random


effects,
• g(.) is the (vector) link function,
• bi satisfies the following moment assumptions:
– E(bi) = 0,
– Cov(bi) = G.
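A minimal numerical sketch of this specification (Python, hypothetical parameter values) for a random-intercept logistic model, i.e. zij = 1 and g⁻¹ the inverse logit:

```python
from math import exp

def inv_logit(eta):
    return 1.0 / (1.0 + exp(-eta))

# Hypothetical random-intercept logit model: eta_ij = beta0 + beta1*x_ij + b_i
beta0, beta1 = -1.0, 2.0
b_i = 0.5                      # realized random intercept for subject i
x = [0.0, 0.25, 0.5, 1.0]      # within-subject covariate values

eta = [beta0 + beta1 * xij + b_i for xij in x]   # eta_i = X_i beta + Z_i b_i
mu = [inv_logit(e) for e in eta]                 # conditional means E(Y_ij | b_i)
print([round(m, 3) for m in mu])
```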

13.6.1 Quasi-likelihood Function

Recall: for the univariate exponential family:


 
f(y|θi, φ) = exp{ φ⁻¹ [yθi − ψ(θi)] + c(y, φ) }

with θi the natural parameter and ψ(.) a function
satisfying

• µi = ψ′(θi)
• v(µi) = ψ″(θi)

The quasi-likelihood function:

Q(µi, yi) = [yiθi − ψ(θi)] / φ

∂Qi/∂µi = (yi − µi) / [φ v(µi)]

This framework extends beyond exponential families.

One only needs:

• the mean µi
• the mean function θ(µi)
• the variance function v(µi)
• the scale parameter φ
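For instance, for a Bernoulli response one only needs µ, v(µ) = µ(1 − µ) and φ = 1; the quasi-score then coincides with the derivative of the true loglikelihood, as this Python check sketches:

```python
from math import log

def bernoulli_loglik(y, mu):
    return y * log(mu) + (1 - y) * log(1 - mu)

def quasi_score(y, mu, phi=1.0):
    """dQ/dmu = (y - mu) / (phi * v(mu)) with v(mu) = mu * (1 - mu)."""
    return (y - mu) / (phi * mu * (1.0 - mu))

# Compare the quasi-score with a numerical derivative of the loglikelihood:
y, mu, h = 1, 0.3, 1e-6
numeric = (bernoulli_loglik(y, mu + h) - bernoulli_loglik(y, mu - h)) / (2 * h)
print(round(quasi_score(y, mu), 6), round(numeric, 6))
```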

13.6.2 Quasi-likelihood For Generalized Linear Mixed Models

• The previous quasi-likelihood, in vectorized form, is
used for y i|bi: the conditional quasi-likelihood.
• The quasi-likelihood for bi follows from the kernel of
the normal loglikelihood:

−(1/2) biᵀ G⁻¹ bi

• The joint quasi-likelihood becomes:

Q(µ, b|y) = (yᵀ Σ⁻¹ θ − 1ᵀ Σ⁻¹ ψ) − (1/2) bᵀ G⁻¹ b

where all quantities have been vectorized or replaced
by block-diagonal matrices.

13.6.3 Estimation Algorithm

1. Obtain an initial estimate µ̂i.

2. Compute the pseudo data
y*i = η̂i + (y i − µ̂i) Di⁻¹.

3. Fit a weighted linear mixed model with
• Data y*i
• Covariate matrices Xi and Zi
• Weight matrix Pi = Ai⁻¹ Di⁻²

This yields Σ̂*i and Ĝ*.

4. Estimate φ by

φ̂ = (1/N*) Σ(i=1..N) r iᵀ V̂i⁻¹ r i,

where
• N* is N for ML and N − p for REML,
• Vi = Pi^(−1/2) Σ*i Pi^(−1/2) + Zi G* Ziᵀ
• r i = y*i − Xi (Σ(i=1..N) Xiᵀ Vi⁻¹ Xi)⁻¹ Xiᵀ Vi⁻¹ y*i

5. Determine Σ̂i = φ̂ Σ̂*i and Ĝ = φ̂ Ĝ*


6. Solve the mixed model equations.
They will be written in vectorized form, by stacking y*i, Xi, and
bi, and constructing block-diagonal matrices W, Z, and G*:

[ XᵀWX   XᵀWZ          ] [ β ]   [ XᵀWy* ]
[ ZᵀWX   ZᵀWZ + G*⁻¹   ] [ b ] = [ ZᵀWy* ]

where
Wi = Di Σi⁻¹ Di,
Di = ∂µi/∂ηi,
Σi = Var(εi),
and W, D, and Σ are block-diagonal matrices built
from Wi, Di, and Σi respectively.
The estimates are:
β̂ = (Xᵀ V̂⁻¹ X)⁻¹ Xᵀ V̂⁻¹ y*,
b̂ = Ĝ* Zᵀ V̂⁻¹ r̂.

7. Compute
µ̂i = g⁻¹(Xiβ̂ + Zib̂i).
8. Iterate until convergence.
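Step 2 above can be sketched numerically for the logit link, where Di = ∂µi/∂ηi = µ(1 − µ), so the working response is y* = η̂ + (y − µ̂)/[µ̂(1 − µ̂)] (Python sketch, hypothetical responses and linear predictors):

```python
from math import exp

def inv_logit(eta):
    return 1.0 / (1.0 + exp(-eta))

def pseudo_data(y, eta_hat):
    """Working response y* = eta_hat + (y - mu_hat) / D for the logit link,
    with D = d mu / d eta = mu * (1 - mu)."""
    mu = inv_logit(eta_hat)
    return eta_hat + (y - mu) / (mu * (1.0 - mu))

# Hypothetical binary responses and current linear predictors:
ys = [1, 0, 1]
etas = [0.2, -0.4, 1.0]
print([round(pseudo_data(y, e), 4) for y, e in zip(ys, etas)])
```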

13.7 Linear Mixed Model Using GLIMMIX

For continuous outcome birthweight in NTP:

• linear mixed model with random intercept (PROC


MIXED);
• equivalent GLIMMIX coding

data help;
set m.dehp2;
dose=dose/292;
collaps = ((visceral-1) or (skeletal-1) or (external-1));
if visceral=. then delete;
skeletal=skeletal-1;
visceral=visceral-1;
external=external-1;
run;

proc mixed data=help method=reml;


title ’PROC MIXED, dehp2, weight, random intercept’;
class litter;
id litter dose;
model weight=dose / solution predmeans predicted;
make ’Predicted’ out=m.predwt noprint;
make ’PredMeans’ out=m.prmwt noprint;
make ’SolutionR’ out=m.solrwt noprint;
random intercept / subject=litter solution;
run;

%include ’c:\sas\stat\sample\glimmix.sas’;

%glimmix(
data=help,
procopt=method=reml,
stmts=%str(
class litter;
id litter dose;
model weight=dose / solution predmeans;
random intercept / subject=litter solution;),
error=normal,
link=identity,
title=’GLIMMIX, dehp2, weight, random intercept’,
options=mixprintlast
);

proc print data=m.prmwt;


title ’Predicted Means’;
var dose litter _pred_;
proc print data=m.predwt;
title ’Predicted Values’;
var dose litter _pred_;
run;

13.7.1 Selected Output

Output from PROC MIXED call

PROC MIXED, dehp2, weight, random intercept

The MIXED Procedure

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate

INTERCEPT LITTER 0.00591700


Residual 0.00714764

Model Fitting Information for WEIGHT

Description Value

Observations 1082.000
Res Log Likelihood 1014.980
Akaike’s Information Criterion 1012.980
Schwarz’s Bayesian Criterion 1007.995
-2 Res Log Likelihood -2029.96

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT 0.96651850 0.01105760 106 87.41 0.0001


DOSE -0.13117707 0.02680668 974 -4.89 0.0001

Solution for Random Effects

Effect LITTER Estimate SE Pred DF t Pr > |t|

INTERCEPT 38 -0.12859215 0.02820009 974 -4.56 0.0001


INTERCEPT 39 -0.02065678 0.02642638 974 -0.78 0.4346
INTERCEPT 40 -0.06864851 0.03027538 974 -2.27 0.0236
INTERCEPT 49 -0.01571926 0.02336063 974 -0.67 0.5012
INTERCEPT 50 0.02420205 0.02947096 974 0.82 0.4117

...
INTERCEPT 203 0.02616233 0.02521825 974 1.04 0.2998

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 974 23.95 0.0001

Predicted Means

LITTER DOSE WEIGHT Predicted SE Pred L95 U95 Residual

38 0 0.8310 0.9665 0.0111 0.9448 0.9882 -0.1355


38 0 0.6800 0.9665 0.0111 0.9448 0.9882 -0.2865
38 0 0.9260 0.9665 0.0111 0.9448 0.9882 -0.0405
38 0 0.8800 0.9665 0.0111 0.9448 0.9882 -0.0865
38 0 0.8840 0.9665 0.0111 0.9448 0.9882 -0.0825
38 0 0.8460 0.9665 0.0111 0.9448 0.9882 -0.1205
38 0 0.8170 0.9665 0.0111 0.9448 0.9882 -0.1495
38 0 0.8400 0.9665 0.0111 0.9448 0.9882 -0.1265
38 0 0.6820 0.9665 0.0111 0.9448 0.9882 -0.2845
49 0 0.9890 0.9665 0.0111 0.9448 0.9882 0.0225
49 0 0.8980 0.9665 0.0111 0.9448 0.9882 -0.0685
49 0 0.9450 0.9665 0.0111 0.9448 0.9882 -0.0215
49 0 0.8990 0.9665 0.0111 0.9448 0.9882 -0.0675
49 0 0.9330 0.9665 0.0111 0.9448 0.9882 -0.0335
49 0 0.8420 0.9665 0.0111 0.9448 0.9882 -0.1245
49 0 0.8960 0.9665 0.0111 0.9448 0.9882 -0.0705
49 0 1.0060 0.9665 0.0111 0.9448 0.9882 0.0395
49 0 1.1150 0.9665 0.0111 0.9448 0.9882 0.1485
49 0 1.0070 0.9665 0.0111 0.9448 0.9882 0.0405
49 0 0.9580 0.9665 0.0111 0.9448 0.9882 -0.0085
49 0 0.9990 0.9665 0.0111 0.9448 0.9882 0.0325
49 0 0.9090 0.9665 0.0111 0.9448 0.9882 -0.0575
49 0 0.8480 0.9665 0.0111 0.9448 0.9882 -0.1185
49 0 0.9990 0.9665 0.0111 0.9448 0.9882 0.0325
...
39 0.15 0.9610 0.9468 0.0087 0.9296 0.9639 0.0142
39 0.15 0.9210 0.9468 0.0087 0.9296 0.9639 -0.0258
...
60 0.15 0.9560 0.9468 0.0087 0.9296 0.9639 0.0092
60 0.15 0.6240 0.9468 0.0087 0.9296 0.9639 -0.3228
...
40 0.31 0.8600 0.9256 0.0079 0.9101 0.9412 -0.0656

40 0.31 0.5590 0.9256 0.0079 0.9101 0.9412 -0.3666


...
52 0.31 1.0350 0.9256 0.0079 0.9101 0.9412 0.1094
52 0.31 0.9490 0.9256 0.0079 0.9101 0.9412 0.0234
...
53 0.65 0.7510 0.8807 0.0126 0.8560 0.9054 -0.1297
53 0.65 0.9020 0.8807 0.0126 0.8560 0.9054 0.0213
...
53 0.65 0.8750 0.8807 0.0126 0.8560 0.9054 -0.0057
53 0.65 0.9640 0.8807 0.0126 0.8560 0.9054 0.0833
...
70 0.65 0.8210 0.8807 0.0126 0.8560 0.9054 -0.0597
70 0.65 0.9870 0.8807 0.0126 0.8560 0.9054 0.1063
...
57 1 0.8890 0.8353 0.0207 0.7948 0.8759 0.0537
57 1 0.8940 0.8353 0.0207 0.7948 0.8759 0.0587
...
66 1 0.7060 0.8353 0.0207 0.7948 0.8759 -0.1293
66 1 0.7980 0.8353 0.0207 0.7948 0.8759 -0.0373
...

Predicted Values

LITTER DOSE WEIGHT Predicted SE Pred L95 U95 Residual

38 0 0.8310 0.8379 0.0265 0.7859 0.8899 -0.0069


38 0 0.6800 0.8379 0.0265 0.7859 0.8899 -0.1579
38 0 0.9260 0.8379 0.0265 0.7859 0.8899 0.0881
38 0 0.8800 0.8379 0.0265 0.7859 0.8899 0.0421
38 0 0.8840 0.8379 0.0265 0.7859 0.8899 0.0461
38 0 0.8460 0.8379 0.0265 0.7859 0.8899 0.0081
38 0 0.8170 0.8379 0.0265 0.7859 0.8899 -0.0209
38 0 0.8400 0.8379 0.0265 0.7859 0.8899 0.0021
38 0 0.6820 0.8379 0.0265 0.7859 0.8899 -0.1559
49 0 0.9890 0.9508 0.0210 0.9096 0.9920 0.0382
49 0 0.8980 0.9508 0.0210 0.9096 0.9920 -0.0528
49 0 0.9450 0.9508 0.0210 0.9096 0.9920 -0.0058
49 0 0.8990 0.9508 0.0210 0.9096 0.9920 -0.0518
49 0 0.9330 0.9508 0.0210 0.9096 0.9920 -0.0178
49 0 0.8420 0.9508 0.0210 0.9096 0.9920 -0.1088
49 0 0.8960 0.9508 0.0210 0.9096 0.9920 -0.0548
49 0 1.0060 0.9508 0.0210 0.9096 0.9920 0.0552
49 0 1.1150 0.9508 0.0210 0.9096 0.9920 0.1642
49 0 1.0070 0.9508 0.0210 0.9096 0.9920 0.0562
49 0 0.9580 0.9508 0.0210 0.9096 0.9920 0.0072
49 0 0.9990 0.9508 0.0210 0.9096 0.9920 0.0482
49 0 0.9090 0.9508 0.0210 0.9096 0.9920 -0.0418
49 0 0.8480 0.9508 0.0210 0.9096 0.9920 -0.1028
49 0 0.9990 0.9508 0.0210 0.9096 0.9920 0.0482
...
39 0.15 0.9610 0.9261 0.0253 0.8765 0.9757 0.0349
39 0.15 0.9210 0.9261 0.0253 0.8765 0.9757 -0.0051
...
60 0.15 0.9560 0.9259 0.0217 0.8834 0.9685 0.0301
60 0.15 0.6240 0.9259 0.0217 0.8834 0.9685 -0.3019
...
40 0.31 0.8600 0.8570 0.0295 0.7990 0.9149 0.0030
40 0.31 0.5590 0.8570 0.0295 0.7990 0.9149 -0.2980
...
52 0.31 1.0350 0.9257 0.0233 0.8800 0.9713 0.1093
52 0.31 0.9490 0.9257 0.0233 0.8800 0.9713 0.0233
...
53 0.65 0.7510 0.9122 0.0233 0.8665 0.9579 -0.1612
53 0.65 0.9020 0.9122 0.0233 0.8665 0.9579 -0.0102
...
70 0.65 0.8210 0.8431 0.0279 0.7883 0.8978 -0.0221

70 0.65 0.9870 0.8431 0.0279 0.7883 0.8978 0.1439


...
57 1 0.8890 0.7983 0.0266 0.7462 0.8505 0.0907
57 1 0.8940 0.7983 0.0266 0.7462 0.8505 0.0957
...
66 1 0.7060 0.7769 0.0342 0.7099 0.8440 -0.0709
66 1 0.7980 0.7769 0.0342 0.7099 0.8440 0.0211
...

Output from GLIMMIX call

GLIMMIX, dehp2, weight, random intercept

The MIXED Procedure

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate

INTERCEPT LITTER 0.00591700


Residual 0.00714764

Model Fitting Information for _Z


Weighted by _W

Description Value

Observations 1082.000
Res Log Likelihood 1014.980
Akaike’s Information Criterion 1012.980
Schwarz’s Bayesian Criterion 1007.995
-2 Res Log Likelihood -2029.96

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT 0.96651850 0.01105760 106 87.41 0.0001


DOSE -0.13117707 0.02680668 974 -4.89 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 974 23.95 0.0001



Covariance Parameter Estimates (MIVQUE0)

Cov Parm Subject Estimate

INTERCEPT LITTER 0.00552982


Residual 0.00726150

Model Fitting Information for _Z


Weighted by _W

Description Value

Observations 1082.000
Res Log Likelihood 1014.834
Akaike’s Information Criterion 1012.834
Schwarz’s Bayesian Criterion 1007.849
-2 Res Log Likelihood -2029.67

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT 0.96648041 0.01074208 106 89.97 0.0001


DOSE -0.13088310 0.02608864 974 -5.02 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 974 25.17 0.0001



GLIMMIX, dehp2, weight, random intercept

Covariance Parameter Estimates

Cov Parm Estimate

INTERCEPT 0.00552982

GLIMMIX Model Statistics

Extra-Dispersion Scale 0.0073

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT 0.9665 0.0107 106 89.97 0.0001


DOSE -0.1309 0.0261 974 -5.02 0.0001

Random Effects Estimates

Effect LITTER Estimate SE Pred DF t Pr > |t|

INTERCEPT 38 -0.1272 0.0281 974 -4.52 0.0001


INTERCEPT 39 -0.0205 0.0264 974 -0.77 0.4386
INTERCEPT 40 -0.0678 0.0303 974 -2.24 0.0252
INTERCEPT 49 -0.0156 0.0233 974 -0.67 0.5037
INTERCEPT 50 0.0240 0.0294 974 0.81 0.4154
...
INTERCEPT 203 0.0258 0.0251 974 1.03 0.3042

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 974 25.17 0.0001



Predicted Means

OBS DOSE LITTER _PRED_

1 0 38 0.9665
10 0 49 0.9665

331 0.15068 39 0.9468


341 0.15068 60 0.9468

619 0.31164 40 0.9257


626 0.31164 52 0.9257

1033 1.00000 57 0.8356


1042 1.00000 66 0.8356

13.7.2 Discussion

Let us combine

• dose level
• predicted mean
• random intercept
• predicted value

LITTER DOSE MEAN R.INT. PRED.

38.0000 0.0000 0.9665 -0.1286 0.8379


49.0000 0.0000 0.9665 -0.0157 0.9508
50.0000 0.0000 0.9665 0.0242 0.9907
61.0000 0.0000 0.9665 -0.0391 0.9274
62.0000 0.0000 0.9665 -0.1608 0.8057
73.0000 0.0000 0.9665 -0.0963 0.8703
85.0000 0.0000 0.9665 -0.0603 0.9062
86.0000 0.0000 0.9665 -0.0083 0.9582
97.0000 0.0000 0.9665 -0.0370 0.9295
98.0000 0.0000 0.9665 0.0518 1.0183
109.0000 0.0000 0.9665 0.0041 0.9706
110.0000 0.0000 0.9665 0.1109 1.0775
119.0000 0.0000 0.9665 -0.0717 0.8948
120.0000 0.0000 0.9665 0.0694 1.0359
129.0000 0.0000 0.9665 -0.0575 0.9090
130.0000 0.0000 0.9665 -0.0003 0.9663
139.0000 0.0000 0.9665 0.0704 1.0369
140.0000 0.0000 0.9665 0.0366 1.0031
149.0000 0.0000 0.9665 0.0001 0.9666
150.0000 0.0000 0.9665 0.0532 1.0197
159.0000 0.0000 0.9665 -0.0094 0.9571
160.0000 0.0000 0.9665 -0.0940 0.8725
169.0000 0.0000 0.9665 0.0010 0.9675
170.0000 0.0000 0.9665 -0.0828 0.8837

179.0000 0.0000 0.9665 -0.1467 0.8199


180.0000 0.0000 0.9665 0.0786 1.0451
189.0000 0.0000 0.9665 -0.0453 0.9212
190.0000 0.0000 0.9665 0.0428 1.0094
199.0000 0.0000 0.9665 -0.0407 0.9258
200.0000 0.0000 0.9665 0.0172 0.9837
39.0000 0.1507 0.9468 -0.0207 0.9261
60.0000 0.1507 0.9468 -0.0208 0.9259
63.0000 0.1507 0.9468 0.0376 0.9843
72.0000 0.1507 0.9468 -0.0589 0.8879
75.0000 0.1507 0.9468 -0.0873 0.8595
84.0000 0.1507 0.9468 0.0570 1.0037
87.0000 0.1507 0.9468 0.0515 0.9982
96.0000 0.1507 0.9468 -0.0296 0.9172
99.0000 0.1507 0.9468 0.0330 0.9797
108.0000 0.1507 0.9468 0.1777 1.1244
118.0000 0.1507 0.9468 -0.0960 0.8507
121.0000 0.1507 0.9468 -0.0287 0.9181
128.0000 0.1507 0.9468 -0.0060 0.9408
131.0000 0.1507 0.9468 0.0903 1.0371
138.0000 0.1507 0.9468 -0.0405 0.9063
141.0000 0.1507 0.9468 0.0231 0.9698
148.0000 0.1507 0.9468 -0.0264 0.9203
151.0000 0.1507 0.9468 0.0170 0.9638
158.0000 0.1507 0.9468 0.0549 1.0016
161.0000 0.1507 0.9468 0.0053 0.9520
171.0000 0.1507 0.9468 0.1582 1.1050
181.0000 0.1507 0.9468 0.0845 1.0313
188.0000 0.1507 0.9468 0.0293 0.9761
191.0000 0.1507 0.9468 0.0837 1.0304
198.0000 0.1507 0.9468 -0.0184 0.9284
201.0000 0.1507 0.9468 0.0104 0.9572
40.0000 0.3116 0.9256 -0.0686 0.8570
52.0000 0.3116 0.9256 0.0000 0.9257
64.0000 0.3116 0.9256 -0.0533 0.8724
71.0000 0.3116 0.9256 -0.0239 0.9018
76.0000 0.3116 0.9256 -0.0002 0.9254
83.0000 0.3116 0.9256 0.0861 1.0117
88.0000 0.3116 0.9256 -0.1606 0.7650
100.0000 0.3116 0.9256 -0.1162 0.8094
107.0000 0.3116 0.9256 -0.0496 0.8760
112.0000 0.3116 0.9256 -0.0135 0.9121
117.0000 0.3116 0.9256 -0.0415 0.8841
122.0000 0.3116 0.9256 -0.0326 0.8930
127.0000 0.3116 0.9256 -0.0242 0.9015

137.0000 0.3116 0.9256 0.0210 0.9467


142.0000 0.3116 0.9256 0.0043 0.9299
147.0000 0.3116 0.9256 0.0126 0.9383
152.0000 0.3116 0.9256 -0.0416 0.8841
157.0000 0.3116 0.9256 -0.0043 0.9213
162.0000 0.3116 0.9256 0.0523 0.9780
167.0000 0.3116 0.9256 0.1476 1.0732
172.0000 0.3116 0.9256 0.0184 0.9441
177.0000 0.3116 0.9256 -0.0386 0.8871
182.0000 0.3116 0.9256 0.0799 1.0056
187.0000 0.3116 0.9256 0.0070 0.9326
192.0000 0.3116 0.9256 0.2110 1.1367
197.0000 0.3116 0.9256 0.0494 0.9750
53.0000 0.6541 0.8807 0.0315 0.9122
70.0000 0.6541 0.8807 -0.0377 0.8431
82.0000 0.6541 0.8807 -0.0335 0.8472
89.0000 0.6541 0.8807 -0.1496 0.7311
94.0000 0.6541 0.8807 -0.0622 0.8185
101.0000 0.6541 0.8807 -0.0763 0.8044
106.0000 0.6541 0.8807 0.0702 0.9509
126.0000 0.6541 0.8807 0.0938 0.9745
136.0000 0.6541 0.8807 0.1445 1.0252
153.0000 0.6541 0.8807 0.0522 0.9330
163.0000 0.6541 0.8807 0.0734 0.9542
166.0000 0.6541 0.8807 0.0453 0.9260
173.0000 0.6541 0.8807 0.1232 1.0039
186.0000 0.6541 0.8807 -0.0381 0.8426
193.0000 0.6541 0.8807 -0.0065 0.8742
196.0000 0.6541 0.8807 0.0668 0.9475
203.0000 0.6541 0.8807 0.0262 0.9069
57.0000 1.0000 0.8353 -0.0370 0.7983
66.0000 1.0000 0.8353 -0.0584 0.7769
78.0000 1.0000 0.8353 -0.0890 0.7464
90.0000 1.0000 0.8353 -0.1149 0.7204
93.0000 1.0000 0.8353 -0.0192 0.8161
105.0000 1.0000 0.8353 -0.0376 0.7977
124.0000 1.0000 0.8353 0.0358 0.8711
154.0000 1.0000 0.8353 0.0907 0.9260
184.0000 1.0000 0.8353 -0.0606 0.7747
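On this (identity-link) linear scale the last column is simply the sum of the two preceding ones: predicted value = predicted mean + EB random intercept. A quick Python check against the litter 38 and litter 57 rows above:

```python
# (predicted mean, EB random intercept, predicted value), from the table above
rows = [
    (0.9665, -0.1286, 0.8379),   # litter 38, dose 0
    (0.8353, -0.0370, 0.7983),   # litter 57, dose 1
]
for mean, b_hat, pred in rows:
    assert round(mean + b_hat, 4) == pred
print("predicted value = predicted mean + random intercept")
```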

Remarks

• mean of random intercepts is zero


• mean of average weights over litters is 0.9275
• mean of predicted weights over litters is 0.9275
• consider histogram of random intercepts

13.7.3 GLIMMIX Code

We will now apply the same procedure to the binary
outcome VISCERAL.

%include ’c:\sas\stat\sample\glimmix.sas’;

%glimmix(
data=m.dehp3,
procopt=method=reml,
stmts=%str(
class litter;
id litter dose;
model visceral=dose / solution predmeans predicted;
random intercept / subject=litter solution;
repeated / subject=litter type=simple;
),
error=binomial,
link=logit,
maxit=100,
options=mixprintlast,
title=’GLIMMIX, dehp3, visceral, random intercept and SIMPLE’
);

The following syntax DOES NOT WORK PROPERLY:

%glimmix(
data=m.dehp3,
procopt=method=reml,
stmts=%str(
class litter;
id litter dose;
model visceral=dose / solution predmeans predicted;
make ’Predicted’ out=m.predvisc noprint; <------------
make ’PredMeans’ out=m.prmvisc noprint;
make ’SolutionR’ out=m.solrvisc noprint;
random intercept / subject=litter solution;
repeated / subject=litter type=simple;
),
error=binomial,
link=logit,
maxit=100,
options=mixprintlast,
title=’GLIMMIX, dehp3, visceral, random intercept and SIMPLE’
);

• Predicted values are used internally by GLIMMIX


• Datasets can be obtained by copying from
SASWORKS

13.7.4 Output From GLIMMIX

GLIMMIX, dehp3, visceral, random intercept and SIMPLE

The MIXED Procedure

Covariance Parameter Estimates (REML)

Cov Parm Subject Estimate

INTERCEPT LITTER 2.73470639


DIAG LITTER 0.34412800

Model Fitting Information for _Z


Weighted by _W

Description Value

Observations 1082.000
Res Log Likelihood -3355.59
Akaike’s Information Criterion -3357.59
Schwarz’s Bayesian Criterion -3362.57
-2 Res Log Likelihood 6711.179

Solution for Fixed Effects

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -5.60186511 0.37338672 106 -15.00 0.0001


DOSE 5.78288569 0.70050956 974 8.26 0.0001

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

DOSE 1 974 68.15 0.0001



Predicted Means

VISCERAL LITTER DOSE _Z Predicted

0 38 0 -6.8176 -5.6019
0 49 0 -6.9235 -5.6019

0 39 0.17 -6.1193 -4.6381


0 60 0.17 -6.2357 -4.6381

0 40 0.33 -5.3822 -3.6742


0 52 0.33 -3.6386 -3.6742

0 57 1 -2.4654 0.1810
1 66 1 2.0691 0.1810

GLIMMIX, dehp3, visceral, random intercept and SIMPLE

Covariance Parameter Estimates

Cov Parm Estimate

INTERCEPT 2.73470639
DIAG 0.34412800

GLIMMIX Model Statistics

Description Value

Deviance 282.3032
Scaled Deviance 282.3032
Pearson Chi-Square 353.0951
Scaled Pearson Chi-Square 353.0951
Extra-Dispersion Scale 1.0000

Parameter Estimates

Effect Estimate Std Error DF t Pr > |t|

INTERCEPT -5.6019 0.3734 106 -15.00 0.0001


DOSE 5.7829 0.7005 974 8.26 0.0001

Random Effects Estimates

Effect LITTER Estimate SE Pred DF t Pr > |t|

INTERCEPT 38 -0.2128 1.5035 974 -0.14 0.8875


INTERCEPT 39 -0.4752 1.3660 974 -0.35 0.7280
INTERCEPT 40 -0.6953 1.2769 974 -0.54 0.5862
INTERCEPT 49 -0.3190 1.4432 974 -0.22 0.8251
INTERCEPT 50 -0.1929 1.5157 974 -0.13 0.8988
INTERCEPT 203 0.0390 0.5066 974 0.08 0.9387

13.7.5 Discussion

Let us combine

• dose level
• predicted mean
• random intercept
• predicted value

In this case, the probability scale is also relevant:

• Conversion of predicted means:

P(Yij = 1|Xi, β̂) = exp(Xiβ̂) / [1 + exp(Xiβ̂)]

• Conversion of predicted values:

P(Yij = 1|Xi, Zi, β̂, b̂i) = exp(Xiβ̂ + Zib̂i) / [1 + exp(Xiβ̂ + Zib̂i)]
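For example (a Python sketch), the litter 38 entries of the table that follows, with linear-scale mean −5.602 and predicted value −5.815, convert as:

```python
from math import exp

def inv_logit(eta):
    """P(Y_ij = 1) for a given value of the linear predictor."""
    return exp(eta) / (1.0 + exp(eta))

print(round(inv_logit(-5.602), 3))   # MEAN(P) for litter 38
print(round(inv_logit(-5.815), 3))   # PRED(P) for litter 38
```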

linear scale probability scale


------------ -----------------
LITTER DOSE MEAN R.INT. PRED MEAN(P) R.E.(P) PRED(P)

38.000 0.000 -5.602 -0.213 -5.815 0.004 -0.001 0.003


49.000 0.000 -5.602 -0.319 -5.921 0.004 -0.001 0.003
50.000 0.000 -5.602 -0.193 -5.795 0.004 -0.001 0.003
61.000 0.000 -5.602 -0.151 -5.753 0.004 -0.001 0.003
62.000 0.000 -5.602 -0.172 -5.774 0.004 -0.001 0.003
73.000 0.000 -5.602 -0.213 -5.815 0.004 -0.001 0.003
85.000 0.000 -5.602 -0.335 -5.937 0.004 -0.001 0.003
86.000 0.000 -5.602 2.749 -2.853 0.004 0.051 0.055
97.000 0.000 -5.602 2.496 -3.106 0.004 0.039 0.043
98.000 0.000 -5.602 -0.286 -5.888 0.004 -0.001 0.003
109.000 0.000 -5.602 -0.172 -5.774 0.004 -0.001 0.003
110.000 0.000 -5.602 -0.286 -5.888 0.004 -0.001 0.003
119.000 0.000 -5.602 -0.319 -5.921 0.004 -0.001 0.003
120.000 0.000 -5.602 -0.213 -5.815 0.004 -0.001 0.003
129.000 0.000 -5.602 -0.286 -5.888 0.004 -0.001 0.003
130.000 0.000 -5.602 -0.286 -5.888 0.004 -0.001 0.003
139.000 0.000 -5.602 3.684 -1.918 0.004 0.124 0.128
140.000 0.000 -5.602 3.106 -2.496 0.004 0.072 0.076
149.000 0.000 -5.602 -0.268 -5.870 0.004 -0.001 0.003
150.000 0.000 -5.602 -0.335 -5.937 0.004 -0.001 0.003
159.000 0.000 -5.602 -0.268 -5.870 0.004 -0.001 0.003
160.000 0.000 -5.602 -0.268 -5.870 0.004 -0.001 0.003
169.000 0.000 -5.602 -0.286 -5.888 0.004 -0.001 0.003
170.000 0.000 -5.602 -0.250 -5.853 0.004 -0.001 0.003
179.000 0.000 -5.602 -0.303 -5.905 0.004 -0.001 0.003
180.000 0.000 -5.602 -0.268 -5.870 0.004 -0.001 0.003
189.000 0.000 -5.602 -0.081 -5.683 0.004 -0.000 0.003
190.000 0.000 -5.602 -0.213 -5.815 0.004 -0.001 0.003
199.000 0.000 -5.602 -0.193 -5.795 0.004 -0.001 0.003
200.000 0.000 -5.602 -0.172 -5.774 0.004 -0.001 0.003
39.000 0.167 -4.638 -0.475 -5.113 0.010 -0.004 0.006
60.000 0.167 -4.638 -0.592 -5.231 0.010 -0.004 0.005
63.000 0.167 -4.638 -0.537 -5.175 0.010 -0.004 0.006
72.000 0.167 -4.638 -0.565 -5.203 0.010 -0.004 0.005
75.000 0.167 -4.638 -0.565 -5.203 0.010 -0.004 0.005
84.000 0.167 -4.638 -0.475 -5.113 0.010 -0.004 0.006
87.000 0.167 -4.638 -0.537 -5.175 0.010 -0.004 0.006
96.000 0.167 -4.638 -0.287 -4.925 0.010 -0.002 0.007
99.000 0.167 -4.638 -0.565 -5.203 0.010 -0.004 0.005
108.000 0.167 -4.638 -0.618 -5.257 0.010 -0.004 0.005
118.000 0.167 -4.638 -0.370 -5.008 0.010 -0.003 0.007

121.000 0.167 -4.638 -0.507 -5.145 0.010 -0.004 0.006


128.000 0.167 -4.638 -0.537 -5.175 0.010 -0.004 0.006
131.000 0.167 -4.638 -0.537 -5.175 0.010 -0.004 0.006
138.000 0.167 -4.638 -0.507 -5.145 0.010 -0.004 0.006
141.000 0.167 -4.638 -0.537 -5.175 0.010 -0.004 0.006
148.000 0.167 -4.638 -0.592 -5.231 0.010 -0.004 0.005
151.000 0.167 -4.638 -0.507 -5.145 0.010 -0.004 0.006
158.000 0.167 -4.638 2.018 -2.620 0.010 0.058 0.068
161.000 0.167 -4.638 -0.592 -5.231 0.010 -0.004 0.005
171.000 0.167 -4.638 -0.565 -5.203 0.010 -0.004 0.005
181.000 0.167 -4.638 -0.507 -5.145 0.010 -0.004 0.006
188.000 0.167 -4.638 -0.442 -5.080 0.010 -0.003 0.006
191.000 0.167 -4.638 -0.507 -5.145 0.010 -0.004 0.006
198.000 0.167 -4.638 -0.189 -4.828 0.010 -0.002 0.008
201.000 0.167 -4.638 -0.442 -5.080 0.010 -0.003 0.006
40.000 0.333 -3.674 -0.695 -4.370 0.025 -0.012 0.012
52.000 0.333 -3.674 1.113 -2.561 0.025 0.047 0.072
64.000 0.333 -3.674 -0.634 -4.308 0.025 -0.011 0.013
71.000 0.333 -3.674 1.039 -2.636 0.025 0.042 0.067
76.000 0.333 -3.674 -0.938 -4.612 0.025 -0.015 0.010
83.000 0.333 -3.674 -0.896 -4.570 0.025 -0.014 0.010
88.000 0.333 -3.674 -1.014 -4.689 0.025 -0.016 0.009
100.000 0.333 -3.674 3.337 -0.338 0.025 0.392 0.416
107.000 0.333 -3.674 -0.938 -4.612 0.025 -0.015 0.010
112.000 0.333 -3.674 -0.896 -4.570 0.025 -0.014 0.010
117.000 0.333 -3.674 1.283 -2.391 0.025 0.059 0.084
122.000 0.333 -3.674 -0.851 -4.526 0.025 -0.014 0.011
127.000 0.333 -3.674 1.194 -2.480 0.025 0.053 0.077
137.000 0.333 -3.674 1.283 -2.391 0.025 0.059 0.084
142.000 0.333 -3.674 2.008 -1.667 0.025 0.134 0.159
147.000 0.333 -3.674 -0.977 -4.652 0.025 -0.015 0.009
152.000 0.333 -3.674 -1.014 -4.689 0.025 -0.016 0.009
157.000 0.333 -3.674 -0.896 -4.570 0.025 -0.014 0.010
162.000 0.333 -3.674 -0.938 -4.612 0.025 -0.015 0.010
167.000 0.333 -3.674 -0.938 -4.612 0.025 -0.015 0.010
172.000 0.333 -3.674 1.748 -1.926 0.025 0.102 0.127
177.000 0.333 -3.674 2.231 -1.443 0.025 0.166 0.191
182.000 0.333 -3.674 -0.752 -4.426 0.025 -0.013 0.012
187.000 0.333 -3.674 1.913 -1.761 0.025 0.122 0.147
192.000 0.333 -3.674 -0.399 -4.073 0.025 -0.008 0.017
197.000 0.333 -3.674 2.366 -1.308 0.025 0.188 0.213
53.000 0.667 -1.747 -2.065 -3.811 0.148 -0.127 0.022
70.000 0.667 -1.747 1.157 -0.589 0.148 0.208 0.357
82.000 0.667 -1.747 1.393 -0.354 0.148 0.264 0.412
89.000 0.667 -1.747 0.311 -1.436 0.148 0.044 0.192
94.000 0.667 -1.747 -1.361 -3.108 0.148 -0.106 0.043
101.000 0.667 -1.747 2.235 0.489 0.148 0.471 0.620
106.000 0.667 -1.747 -2.065 -3.811 0.148 -0.127 0.022
126.000 0.667 -1.747 -0.396 -2.143 0.148 -0.043 0.105
136.000 0.667 -1.747 -2.065 -3.811 0.148 -0.127 0.022
153.000 0.667 -1.747 -0.978 -2.725 0.148 -0.087 0.062
163.000 0.667 -1.747 -1.793 -3.540 0.148 -0.120 0.028
166.000 0.667 -1.747 1.157 -0.589 0.148 0.208 0.357
173.000 0.667 -1.747 -0.292 -2.039 0.148 -0.033 0.115
186.000 0.667 -1.747 0.991 -0.756 0.148 0.171 0.319
193.000 0.667 -1.747 -0.174 -1.921 0.148 -0.021 0.128
196.000 0.667 -1.747 -0.292 -2.039 0.148 -0.033 0.115
203.000 0.667 -1.747 0.039 -1.708 0.148 0.005 0.153
57.000 1.000 0.181 -1.329 -1.148 0.545 -0.304 0.241
66.000 1.000 0.181 -0.531 -0.350 0.545 -0.132 0.413
78.000 1.000 0.181 2.348 2.529 0.545 0.381 0.926
90.000 1.000 0.181 0.789 0.970 0.545 0.180 0.725
93.000 1.000 0.181 1.848 2.029 0.545 0.339 0.884
105.000 1.000 0.181 1.250 1.431 0.545 0.262 0.807
124.000 1.000 0.181 -0.531 -0.350 0.545 -0.132 0.413
154.000 1.000 0.181 -2.004 -1.823 0.545 -0.406 0.139
184.000 1.000 0.181 0.468 0.649 0.545 0.112 0.657

Remarks

• On the linear scale:
  – mean of random intercepts is zero
  – mean of average over litters is −3.8171
  – mean of predicted value over litters is −3.8171

• On the probability scale:
  – mean of random effect is 0.0207
  – mean of average probabilities over litters is 0.0781
  – mean of predicted probabilities over litters is 0.0988

This property is well known:

  g^{−1}(X_i β̂) ≠ E[g^{−1}(X_i β̂ + Z_i b̂_i)]

• It is seen in plots
• It shows through main effects estimates
• Neuhaus and Jewell (1993)
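The gap between the two probability-scale means can be reproduced with a small Monte Carlo sketch (Python). The linear predictor −3.8171 is taken from the slide above; the random-intercept standard deviation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

eta = -3.8171      # mean linear predictor over litters (from the slide)
sigma_b = 2.0      # illustrative random-intercept SD (assumption)

b = rng.normal(0.0, sigma_b, size=200_000)  # simulated random intercepts

plug_in = expit(eta)                # inverse link of the average linear predictor
marginal = expit(eta + b).mean()    # average of litter-specific probabilities

# Averaging after the inverse link gives a larger value than plugging in the
# average linear predictor: exactly the discrepancy noted above.
assert marginal > plug_in
```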

13.8 The NTP Data

GEE1 and GLIMMIX Estimates (Model Based Standard Errors; Robust Standard Errors) for the
DEHP Data. Exchangeable Working Assumptions/Random Intercept Model.

Outcome Parameter GENMOD PRENTICE GLIMMIX (rep) GLIMMIX (rand)


External β0 -4.98(0.40;0.37) -4.99(0.46;0.37) -5.00(0.36;0.37) -6.62(0.46;0.40)
βd 5.33(0.57;0.55) 5.32(0.65;0.55) 5.32(0.51;0.55) 7.25(0.81;0.64)
φ 0.88 0.65
ρ 0.11 0.11(0.04) 0.06
rand.int. 3.63
residual 0.25
Visceral β0 -4.50(0.37;0.37) -4.51(0.40;0.37) -4.50(0.36;0.37) -5.60(0.37;0.40)
βd 4.55(0.55;0.59) 4.59(0.58;0.59) 4.55(0.54;0.59) 5.78(0.70;0.71)
φ 1.00 0.92
ρ 0.08 0.11(0.05) 0.08
rand.int. 2.73
residual 0.34
Skeletal β0 -4.83(0.44;0.45) -4.82(0.47;0.44) -4.82(0.46;0.45) -6.63(0.48;0.53)
βd 4.84(0.62;0.63) 4.84(0.67;0.63) 4.84(0.65;0.63) 6.65(0.84;0.89)
φ 0.98 0.86
ρ 0.12 0.14(0.06) 0.13
rand.int. 3.63
residual 0.25
Collapsed β0 -4.05(0.32;0.31) -4.06(0.35;0.31) -4.04(0.33;0.31) -4.85(0.32;0.34)
βd 5.84(0.57;0.61) 5.89(0.62;0.61) 5.82(0.58;0.61) 7.20(0.67;0.72)
φ 1.00 0.96
ρ 0.11 0.15(0.05) 0.11
rand.int. 2.30
residual 0.48

13.9 Transition Models

• In this approach we model the conditional distribution of each Yij given its predecessors Yi1, . . . , Yi,j−1.

• Typically, this is achieved by including the predecessors as additional covariates in a classical GLM.

• Thus we assume
  – E(Yij | Yi1, . . . , Yi,j−1) = µij
  – η(µij) = x_ij^T β + y_(ij)^T α, where y_(ij) = (yi1, . . . , yi,j−1)
  – Var(Yij | Yi1, . . . , Yi,j−1) = φ v(µij)

• Then, construct a likelihood for β and α via

  f(y_i) = f(yi1) f(yi2 | yi1) f(yi3 | yi1, yi2) · · · f(yi,ni | yi1, . . . , yi,ni−1)

• In practice one will make the “recent past” assumption: only the k most recent measurements are needed:

  f(yij | yi1, . . . , yi,j−1) = f(yij | yi,j−k, . . . , yi,j−1),

  a Markov dependence of order k.

• The joint distribution then simplifies to

  f(y_i) = f(yi1) f(yi2 | yi1) · · · f(yik | yi1, . . . , yi,k−1) × ∏_{j=k+1}^{ni} f(yij | yi,j−k, . . . , yi,j−1)

• The transition model specifies the form of the conditional densities within the product sign. It does not explicitly specify the remaining terms in f(y_i), and these may be impossible to evaluate.

• Pragmatic solution: ignore the unspecified terms. This represents a loss of information which may or may not be serious, depending on the values of k and ni. It is a potentially serious disadvantage for data consisting of short sequences.
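For a binary outcome with Markov dependence of order k = 1, the conditional part of this likelihood can be sketched as follows (Python; the coefficients β0 and α are illustrative assumptions, not fitted values):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical first-order (k = 1) transition model for a binary sequence:
# logit P(Y_j = 1 | Y_{j-1}) = beta0 + alpha * y_{j-1}
beta0, alpha = -1.0, 2.0

def conditional_loglik(y):
    """Log-likelihood of y_2, ..., y_n given y_1 (the unspecified f(y_1) is ignored)."""
    ll = 0.0
    for prev, cur in zip(y[:-1], y[1:]):
        p = expit(beta0 + alpha * prev)
        ll += cur * np.log(p) + (1 - cur) * np.log(1 - p)
    return ll

y = [0, 0, 1, 1, 0]
ll = conditional_loglik(y)
```

Ignoring f(y_1) is precisely the pragmatic solution described above; for short sequences the discarded term carries relatively more information.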

13.10 Differences Between Families of Models

The parameter β does not have the same substantive meaning in the three approaches:

• in marginal modelling, β is unequivocally a population parameter; it determines the effect of explanatory variables on the population mean response

• in conditional, transition, or random-effects modelling, β is still a population parameter, in the sense that it operates on all of the subjects, but it determines the effects of explanatory variables on the conditional mean response of an individual subject
  – given that subject’s measurement history (transition model), OR
  – given that subject’s own random characteristics Ui (random-effects model), OR
  – given that subject’s other outcomes (conditional model)

• the three classes of model are fundamentally different, and no easy conversion is possible
Chapter 14

Case Study: Analgesic Trial

14.1 PROC NLMIXED Code

proc nlmixed data=gsa;


parms beta0=3 beta1=-0.8 beta2=0.2 beta3=-0.2 s2u=1;
eta = beta0 + beta1*time + beta2*time2 + beta3*pca0 + u;
expeta = exp(eta);
p = expeta/(1+expeta);
model gsabin ~ binary(p);
random u ~ normal(0,s2u) subject=patid;
estimate ’ICC’ s2u/(arcos(-1)**2/3 + s2u);
run;


14.1.1 Output

The NLMIXED Procedure

Specifications

Description Value

Data Set WORK.GSA


Dependent Variable gsabin
Distribution for Dependent Variable Binary
Random Effects u
Distribution for Random Effects Normal
Subject Variable PATID
Optimization Technique Dual Quasi-Newton
Integration Method Gaussian Quadrature

Dimensions

Description Value

Observations Used 1137


Observations Not Used 0
Total Observations 1137
Subjects 395
Max Obs Per Subject 4
Parameters 5
Quadrature Points 20

Parameters

beta0 beta1 beta2 beta3 s2u NegLogLike

3 -0.8 0.2 -0.2 1 512.393718

Iteration History

Iter Calls NegLogLike Diff MaxGrad Slope

1 4 511.382562 1.011155 8.728454 -654.408


2 6 511.150086 0.232476 8.37513 -7.92042
3 8 507.712837 3.437249 8.276495 -4.42161
4 10 506.777649 0.935189 1.938972 -0.84523

5 12 506.53802 0.239629 4.142048 -0.11795


6 13 506.380646 0.157374 20.07718 -0.1081
7 15 506.287593 0.093053 2.582222 -0.15011
8 17 506.275845 0.011748 0.605879 -0.01344
9 19 506.274764 0.001081 0.292118 -0.00145
10 21 506.274723 0.000041 0.044318 -0.00007
11 23 506.274723 3.296E-7 0.002903 -5.24E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics

Description Value

-2 Log Likelihood 1012.5


AIC (smaller is better) 1022.5
BIC (smaller is better) 1042.4
Log Likelihood -506.3
AIC (larger is better) -511.3
BIC (larger is better) -521.2

Parameter Estimates

Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower

beta0 4.0474 0.7097 394 5.70 <.0001 0.05 2.6521


beta1 -1.1600 0.4657 394 -2.49 0.0131 0.05 -2.0756
beta2 0.2445 0.09472 394 2.58 0.0102 0.05 0.05826
beta3 -0.2997 0.1428 394 -2.10 0.0365 0.05 -0.5805
s2u 2.5326 0.6764 394 3.74 0.0002 0.05 1.2027

Parameter Estimates

Parameter Upper Gradient

beta0 5.4428 0.000556


beta1 -0.2445 -0.00006
beta2 0.4307 -0.0029
beta3 -0.01893 0.001801
s2u 3.8624 -0.00006

Additional Estimates

Standard
Label Estimate Error DF t Value Pr > |t| Alpha Lower Upper

ICC 0.4350 0.06564 394 6.63 <.0001 0.05 0.3059 0.5640
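The 'ICC' line in the Additional Estimates table can be reproduced directly from the reported random-intercept variance (a quick Python check):

```python
import math

s2u = 2.5326                           # random-intercept variance from the output above
icc = s2u / (s2u + math.pi**2 / 3.0)   # logistic residual variance is pi^2/3
# icc is approximately 0.4350, matching the Additional Estimates table
```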



14.1.2 Population Averaged Profiles

Example code to derive population-averaged fitted (complete) profiles

• Need to calculate

  ∫_{−∞}^{+∞} [exp(x_ij^T β + x) / (1 + exp(x_ij^T β + x))] · (1 / (√(2π) σu)) · exp(−x² / (2σu²)) dx.

• Take the mean value of the covariates to evaluate this expression.

• This gives fitted complete profiles, that is, what would be obtained had all the patients stayed in the study.

*** use the following statement in PROC NLMIXED to get parameter estimates;
ods output parameterestimates=parmest;

proc iml;
*** read in fixed param. estimates;
use parmest;
read all var{estimate} into parmest;
beta=parmest[1:(nrow(parmest)-1)];
sig2=parmest[nrow(parmest)];

*** module that evaluates the function to be integrated;


start integr(x) global(sig2,xbeta);
f=(exp(xbeta+x)/(1+exp(xbeta+x)))*exp(-0.5*(x**2)/sig2)/sqrt(2*arcos(-1)*sig2);
return(f);
finish;

cif=probit(0.975);
do t=1 to 4;
xcov={1} // t // t**2 // {3};*** Note: 3 = median baseline PCA;
xbeta=t(xcov)*beta;
call quad(prc,"integr",{.M .P});
*** approximate confidence intervals (ignoring variability in the estimates);
low_prc=exp(xbeta-cif*sqrt(sig2))/(1+exp(xbeta-cif*sqrt(sig2)));
upp_prc=exp(xbeta+cif*sqrt(sig2))/(1+exp(xbeta+cif*sqrt(sig2)));
pdrespc=pdrespc // (t || prc || low_prc || upp_prc);
end;
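Outside SAS/IML, the same integral can be approximated with Gauss-Hermite quadrature; a sketch in Python with numpy (the parameter estimates are those reported by PROC NLMIXED above, and pca0 = 3 is the median baseline PCA used in the IML code):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

beta = np.array([4.0474, -1.1600, 0.2445, -0.2997])  # intercept, time, time^2, pca0
s2u = 2.5326                                         # random-intercept variance

# 20-point Gauss-Hermite rule: integral of exp(-z^2) f(z) dz ~ sum_i w_i f(z_i)
nodes, weights = np.polynomial.hermite.hermgauss(20)

def marginal_prob(t, pca0=3.0):
    """E_u[expit(x'beta + u)], u ~ N(0, s2u), via the substitution u = sqrt(2*s2u)*z."""
    xb = beta @ np.array([1.0, t, t * t, pca0])
    return float(np.sum(weights * expit(xb + np.sqrt(2.0 * s2u) * nodes)) / np.sqrt(np.pi))

p1 = marginal_prob(1.0)  # population-averaged probability at the first occasion
```

Note the quadrature points are placed relative to the random-effect distribution, which is why the substitution u = √(2·s2u)·z and the 1/√π factor appear.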

14.1.3 Fitted Model

(Figures: fitted population-averaged profiles; graphics not recoverable from the text extraction.)

14.2 Comparison of Different Approaches/Programs

14.2.1 PROC NLMIXED (Gaussian quadrature and N-R)

• Code:
proc nlmixed data=gsa npoints=20 noad noadscale tech=newrap;
parms beta0=3 beta1=-0.8 beta2=0.2 beta3=-0.2 su=1;
eta = beta0 + beta1*time + beta2*time2 + beta3*pca0 + u;
expeta = exp(eta);
p = expeta/(1+expeta);
model gsabin ~ binary(p);
random u ~ normal(0,su**2) subject=patid;
estimate ’ICC’ su**2/(arcos(-1)**2/3 + su**2);
run;

• Output:
The NLMIXED Procedure

Specifications

Description Value

Data Set WORK.GSA


Dependent Variable gsabin
Distribution for Dependent Variable Binary
Random Effects u
Distribution for Random Effects Normal
Subject Variable PATID
Optimization Technique Newton-Raphson
Integration Method Gaussian Quadrature

Dimensions

Description Value

Observations Used 1137


Observations Not Used 0


Total Observations 1137
Subjects 395
Max Obs Per Subject 4
Parameters 5
Quadrature Points 20

Parameters

beta0 beta1 beta2 beta3 su NegLogLike

3 -0.8 0.2 -0.2 1 512.393718

Iteration History

Iter Calls NegLogLike Diff MaxGrad Slope

1 14 506.413768 5.97995 11.45115 -11.1936


2 21 506.275166 0.138602 0.266335 -0.26772
3 28 506.274723 0.000443 0.0004 -0.00088
4 35 506.274723 6.782E-9 4.515E-9 -1.36E-8

NOTE: GCONV convergence criterion satisfied.

Fit Statistics

Description Value

-2 Log Likelihood 1012.5


AIC (smaller is better) 1022.5
BIC (smaller is better) 1042.4
Log Likelihood -506.3
AIC (larger is better) -511.3
BIC (larger is better) -521.2

Parameter Estimates

Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower

beta0 4.0474 0.7097 394 5.70 <.0001 0.05 2.6521


beta1 -1.1600 0.4657 394 -2.49 0.0131 0.05 -2.0755
beta2 0.2445 0.09472 394 2.58 0.0102 0.05 0.05826
beta3 -0.2997 0.1428 394 -2.10 0.0365 0.05 -0.5805
su 1.5914 0.2125 394 7.49 <.0001 0.05 1.1736

Parameter Estimates

Parameter Upper Gradient

beta0 5.4427 -1.56E-9


beta1 -0.2445 -2E-9
beta2 0.4307 1.313E-9
beta3 -0.01893 1.468E-9
su 2.0092 -4.52E-9

Additional Estimates

Standard
Label Estimate Error DF t Value Pr > |t| Alpha Lower Upper

ICC 0.4350 0.06564 394 6.63 <.0001 0.05 0.3059 0.5640



14.2.2 MIXOR

MIXOR - The program for mixed-effects ordinal regression analysis (version 2)

Global Satisfaction Assessment

Response function: logistic

Random-effects distribution: normal

Covariate(s) and random-effect(s) mean subtracted from thresholds


==> positive coefficient = positive association between regressor
and ordinal outcome

Numbers of observations
-----------------------

Level 1 observations = 1137


Level 2 observations = 395

The number of level 1 observations per level 2 unit are:

1 1 4 2 1 4 3 4 3 3 4 4 4 3 1 1 1 3 2
...

Descriptive statistics for all variables


----------------------------------------

Variable Minimum Maximum Mean Stand. Dev.

GSAbin 0.00000 1.00000 0.81882 0.38534


intcpt 1.00000 1.00000 1.00000 0.00000
Time 1.00000 4.00000 2.25330 1.12238
Time2 1.00000 16.00000 6.33597 5.55444
PCA0 1.00000 5.00000 3.02375 0.89519

Categories of the response variable GSAbin


--------------------------------------------

Category Frequency Proportion

0.00 206.00 0.18118


1.00 931.00 0.81882

Starting values
---------------

mean 1.022
covariates 0.295 -0.066 0.079
var. terms 0.574

==> The number of level 2 observations with non-varying responses


= 284 ( 71.90 percent )

---------------------------------------------------------
* Final Results - Maximum Marginal Likelihood Estimates *
---------------------------------------------------------

Total Iterations = 10
Quad Pts per Dim = 20
Log Likelihood = -506.275
Deviance (-2logL) = 1012.549
Ridge = 0.000

Variable Estimate Stand. Error Z p-value


-------- ------------ ------------ ------------ ------------
intcpt 4.04741 0.71278 5.67835 0.00000 (2)
Time -1.16003 0.47453 -2.44457 0.01450 (2)
Time2 0.24449 0.09678 2.52624 0.01153 (2)
PCA0 -0.29971 0.15375 -1.94932 0.05126 (2)

Random effect variance term (standard deviation)


intcpt 1.59139 0.20578 7.73355 0.00000 (1)

note: (1) = 1-tailed p-value


(2) = 2-tailed p-value

Calculation of the intracluster correlation


-------------------------------------------
residual variance = pi*pi / 3 (assumed)
cluster variance = (1.591 * 1.591) = 2.533

intracluster correlation = 2.533 / ( 2.533 + (pi*pi/3)) = 0.435



14.2.3 PQL2 (MLwiN) without and with extra-dispersion parameter

(MLwiN output: shown as screenshots in the original slides; not recoverable from the text extraction.)

14.2.4 PQL (MLwiN) without and with extra-dispersion parameter

(MLwiN output: shown as screenshots in the original slides; not recoverable from the text extraction.)

14.2.5 PQL (GLIMMIX)

• Code:
%glimmix(data=gsa, procopt=%str(method=ml noclprint covtest),
stmts=%str(
class patid timecls;
model gsabin = time|time pca0 / s;
repeated timecls / sub=patid type=un rcorr=3;
),
error=binomial);

• Output:
The Mixed Procedure

Model Information

Data Set WORK._DS


Dependent Variable _z
Weight Variable _w
Covariance Structure Variance Components
Subject Effect PATID
Estimation Method ML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Containment

Dimensions

Covariance Parameters 2
Columns in X 4
Columns in Z Per Subject 1
Subjects 395
Max Obs Per Subject 4
Observations Used 1137
Observations Not Used 0
Total Observations 1137

Parameter Search

CovP1 CovP2 Variance Log Like -2 Log Like

3.1651 0.4954 0.4954 -2833.4382 5666.8764

Iteration History

Iteration Evaluations -2 Log Like Criterion

1 1 5666.87638845 0.00000000

Convergence criteria met.

Covariance Parameter Estimates

Standard Z
Cov Parm Subject Estimate Error Value Pr Z

Intercept PATID 3.1651 0.3911 8.09 <.0001


Residual 0.4954 0.02521 19.65 <.0001

Fit Statistics

Log Likelihood -2833.4


Akaike’s Information Criterion -2835.4
Schwarz’s Bayesian Criterion -2839.4
-2 Log Likelihood 5666.9

PARMS Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 0.00 1.0000

Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 4.0292 0.5476 394 7.36 <.0001


TIME -1.2788 0.3341 739 -3.83 0.0001
TIME*TIME 0.2590 0.06800 739 3.81 0.0002


pca0 -0.2922 0.1300 739 -2.25 0.0249

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

TIME 1 739 14.65 0.0001


TIME*TIME 1 739 14.50 0.0002
pca0 1 739 5.05 0.0249

GLIMMIX Model Statistics

Description Value

Deviance 564.5908
Scaled Deviance 1139.5937
Pearson Chi-Square 451.3494
Scaled Pearson Chi-Square 911.0227
Extra-Dispersion Scale 0.4954
Chapter 15

Analgesic Trial: Ordinal Data

15.1 Proportional odds model

• Multinomial model for a response variable with K ordered categories (1, . . . , K):

  P[Yi ≤ r] = F(µr + x_i^T β) for r = 1, . . . , K − 1,

  where F is a cumulative distribution function.

• F = cdf of the logistic distribution ⇒ proportional odds model.
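The cumulative-logit structure can be illustrated with a short computation (Python; the intercepts and slopes are the PROC LOGISTIC estimates reported in the output of this section). Under proportional odds, the common slope guarantees the cumulative probabilities are monotone in r for every covariate value:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Fitted PROC LOGISTIC values: K - 1 = 4 intercepts, common slopes
mu = np.array([-1.7603, -0.2421, 1.5386, 1.8178])   # mu_r, r = 1..4
beta = np.array([0.1201, -0.0262, -0.0441])         # time, time^2, pca0

def category_probs(x):
    """Cell probabilities from cumulative logits P[Y <= r] = expit(mu_r + x'beta)."""
    cum = expit(mu + x @ beta)                   # P[Y <= r], r = 1..K-1
    cum = np.concatenate(([0.0], cum, [1.0]))    # add P[Y <= 0] = 0, P[Y <= K] = 1
    return np.diff(cum)

p = category_probs(np.array([1.0, 1.0, 3.0]))    # time = 1, time^2 = 1, pca0 = 3
assert abs(p.sum() - 1.0) < 1e-9
```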


• PROC LOGISTIC code (parameter estimates can serve as initial values for PROC NLMIXED):
proc logistic data=gsa;
class patid timecls;
model gsa = time|time pca0;
run;

• Output:

The LOGISTIC Procedure

Model Information

Data Set WORK.GSA


Response Variable GSA
Number of Response Levels 5
Number of Observations 1137
Link Function Logit
Optimization Technique Fisher’s scoring

Response Profile

Ordered Total
Value GSA Frequency

1 Bad 163
2 Good 329
3 Moderate 439
4 Very Bad 43
5 Very Good 163

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption

Chi-Square DF Pr > ChiSq



30.0808 9 0.0004

Model Fit Statistics

Intercept
Intercept and
Criterion Only Covariates

AIC 3207.617 3212.857


SC 3227.761 3248.110
-2 Log L 3199.617 3198.857

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 0.7597 3 0.8591


Score 0.7336 3 0.8653
Wald 0.7820 3 0.8538

Analysis of Maximum Likelihood Estimates

Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -1.7603 0.3411 26.6377 <.0001


Intercept2 1 -0.2421 0.3359 0.5196 0.4710
Intercept3 1 1.5386 0.3395 20.5387 <.0001
Intercept4 1 1.8178 0.3413 28.3632 <.0001
TIME 1 0.1201 0.2697 0.1984 0.6561
TIME*TIME 1 -0.0262 0.0545 0.2305 0.6312
pca0 1 -0.0441 0.0601 0.5379 0.4633

Odds Ratio Estimates

Point 95% Wald


Effect Estimate Confidence Limits

pca0 0.957 0.850 1.077

Association of Predicted Probabilities and Observed Responses

Percent Concordant 44.3 Somers’ D 0.012


Percent Discordant 43.1 Gamma 0.013


Percent Tied 12.5 Tau-a 0.009
Pairs 468410 c 0.506

15.2 GEE Model

• Only the IND structure is available for the multinomial distribution in PROC GENMOD.
• Code:

proc genmod data=gsa;


ods listing exclude classlevels parminfo parameterestimates;
class patid timecls;
model gsa = time|time pca0 / dist=multinomial link=cumlogit;
repeated sub=patid / type=ind within=timecls;
run;

• Output:

The GENMOD Procedure

Model Information

Data Set WORK.GSA


Distribution Multinomial
Link Function Cumulative Logit
Dependent Variable GSA
Observations Used 1137

Response Profile

Ordered Ordered
Level Value Count

1 Bad 163
2 Good 329
3 Moderate 439
4 Very Bad 43
5 Very Good 163

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Log Likelihood -1599.4286

Algorithm converged.

GEE Model Information

Correlation Structure Independent


Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 1

Algorithm converged.

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept1 -1.7598 0.3661 -2.4774 -1.0423 -4.81 <.0001


Intercept2 -0.2416 0.3703 -0.9674 0.4842 -0.65 0.5141
Intercept3 1.5392 0.3727 0.8087 2.2696 4.13 <.0001
Intercept4 1.8184 0.3841 1.0656 2.5711 4.73 <.0001
TIME 0.1201 0.2634 -0.3962 0.6365 0.46 0.6483
TIME*TIME -0.0262 0.0527 -0.1295 0.0772 -0.50 0.6196
pca0 -0.0443 0.0817 -0.2044 0.1158 -0.54 0.5876

15.3 Random-effects Model

Hedeker D. & Gibbons R. (1994). A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933–44.

• Model assuming an underlying latent continuous variable ỹij (e.g. normal or logistic):

  ỹij = x_ij^T β + ui + εij

• Instead of observing ỹij, we observe only whether it falls in one of the intervals

  ]−∞, γ1], ]γ1, γ2], . . . , ]γK−2, γK−1], ]γK−1, +∞[

• The model can then be written as

  P[yij = r | ui] = F(γr − zij) − F(γr−1 − zij),   r = 1, . . . , K,

  with zij = x_ij^T β + ui, γ0 = −∞ and γK = +∞


• This has been implemented in the MIXOR program.
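A sketch of these category probabilities for one subject, using the threshold and slope estimates reported in the NLMIXED output of this section (Python):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Fitted NLMIXED thresholds and coefficients for the ordinal GSA model
gamma = np.array([-1.5585, 1.0292, 3.8916, 6.2144])   # i1..i4
b = np.array([0.5410, -0.1123, 0.3173])               # time, time^2, pca0

def probs_given_u(x, u):
    """P[y = r | u] = F(gamma_r - z) - F(gamma_{r-1} - z), z = x'b + u, F logistic."""
    z = x @ b + u
    cum = np.concatenate(([0.0], expit(gamma - z), [1.0]))  # gamma_0 = -inf, gamma_K = +inf
    return np.diff(cum)

p = probs_given_u(np.array([1.0, 1.0, 3.0]), u=0.0)   # time = 1, pca0 = 3, median subject
assert abs(p.sum() - 1.0) < 1e-9
```

Because the thresholds are increasing, the differences are automatically nonnegative; varying u shifts the whole distribution over the K categories.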

• PROC NLMIXED code:

proc nlmixed data=gsa qpoints=20;


parms i1=-1.8 i2=-0.2 i3=1.5 i4=1.8 b1=0.1 b2=0 b3=0 sd=1;
eta = b1*time + b2*time2 + b3*pca0 + u;
if gsa=1 then z = 1/(1+exp(-(i1-eta)));
else if gsa=2 then z = 1/(1+exp(-(i2-eta))) - 1/(1+exp(-(i1-eta)));
else if gsa=3 then z = 1/(1+exp(-(i3-eta))) - 1/(1+exp(-(i2-eta)));
else if gsa=4 then z = 1/(1+exp(-(i4-eta))) - 1/(1+exp(-(i3-eta)));
else z = 1 - 1/(1+exp(-(i4-eta)));
if z > 1e-8 then ll = log(z);
else ll = -1e100;
model gsa ~ general(ll);
random u ~ normal(0,sd*sd) subject=patid;
estimate ’var_u’ sd*sd;
estimate ’icc’ sd*sd/(sd*sd+arcos(-1)**2/3);
run;

• Output:

The NLMIXED Procedure

Specifications

Description Value

Data Set WORK.GSA


Dependent Variable GSA
Distribution for Dependent Variable General
Random Effects u
Distribution for Random Effects Normal
Subject Variable PATID
Optimization Technique Dual Quasi-Newton
Integration Method Adaptive Gaussian
Quadrature

Dimensions

Description Value

Observations Used 1137


Observations Not Used 0


Total Observations 1137
Subjects 395
Max Obs Per Subject 4
Parameters 8
Quadrature Points 20

Parameters

i1 i2 i3 i4 b1 b2 b3 sd

-1.8 -0.2 1.5 1.8 0.1 0 0 1

Parameters

NegLogLike

1716.69515

Iteration History

Iter Calls NegLogLike Diff MaxGrad Slope

1 3 1694.65105 22.0441 989.5756 -7996.23


2 4 1615.93733 78.71373 919.9434 -4222.5
3 5 1542.58202 73.3553 739.0983 -386.772
4 7 1498.82212 43.7599 200.5629 -245.556
5 8 1470.57325 28.24887 86.6201 -289.557
6 10 1459.63373 10.93952 101.1084 -30.3395
7 12 1454.03215 5.601585 42.71724 -4.98865
8 14 1451.81078 2.221367 39.48043 -1.31487

Iteration History

Iter Calls NegLogLike Diff MaxGrad Slope

9 16 1450.83262 0.978159 15.13727 -0.77169


10 18 1450.35766 0.474961 22.16116 -0.31726
11 19 1449.68318 0.674483 9.663066 -0.195
12 21 1447.25905 2.42413 5.373867 -0.8011
13 22 1445.77923 1.479822 19.19658 -2.05807
14 24 1445.25326 0.525968 6.559206 -0.92561
15 26 1445.24347 0.009789 1.730594 -0.01624
16 28 1445.24043 0.003041 2.766239 -0.00236
17 30 1445.23852 0.001914 0.297433 -0.00122
18 32 1445.23842 0.0001 0.238891 -0.00015


19 34 1445.23839 0.000023 0.041654 -0.00002
20 36 1445.23839 3.868E-7 0.001023 -7.06E-7

NOTE: GCONV convergence criterion satisfied.

Fit Statistics

Description Value

-2 Log Likelihood 2890.5


AIC (smaller is better) 2906.5
BIC (smaller is better) 2938.3
Log Likelihood -1445
AIC (larger is better) -1453
BIC (larger is better) -1469

Parameter Estimates

Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower

i1 -1.5585 0.5481 394 -2.84 0.0047 0.05 -2.6360


i2 1.0292 0.5442 394 1.89 0.0593 0.05 -0.04061
i3 3.8916 0.5624 394 6.92 <.0001 0.05 2.7860
i4 6.2144 0.5990 394 10.37 <.0001 0.05 5.0368
b1 0.5410 0.3078 394 1.76 0.0796 0.05 -0.06420
b2 -0.1123 0.06187 394 -1.82 0.0702 0.05 -0.2340
b3 0.3173 0.1386 394 2.29 0.0226 0.05 0.04476
sd 2.1082 0.1412 394 14.94 <.0001 0.05 1.8307

Parameter Estimates

Parameter Upper Gradient

i1 -0.4810 -0.00002
i2 2.0991 -0.00011
i3 4.9973 0.000138
i4 7.3920 -0.00003
b1 1.1463 0.00026
b2 0.009308 0.001023
b3 0.5898 0.000191
sd 2.3857 -0.0001

Additional Estimates

Standard
Label Estimate Error DF t Value Pr > |t| Alpha Lower Upper

var_u 4.4447 0.5952 394 7.47 <.0001 0.05 3.2746 5.6148


icc 0.5747 0.03273 394 17.56 <.0001 0.05 0.5103 0.6390
Chapter 16

Missing Data

16.1 Missing Data Notation

• Subject i at occasion (time) j = 1, . . . , ni

• Measurement Yij

• Dropout indicator

  Rij = 1 if Yij is observed, and Rij = 0 otherwise.


• Group the Yij into a vector

  Y_i = (Yi1, . . . , Yini) = (Y_i^o, Y_i^m)

  where Y_i^o contains the Yij for which Rij = 1, and Y_i^m contains the Yij for which Rij = 0.

• Group the Rij into a vector R_i = (Ri1, . . . , Rini)

Dropout

• Monotone patterns only

• Di: time of dropout (relevant for monotone processes)

• Possible definition:

  Di = 1 + Σ_{j=1}^{ni} Rij
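For a monotone pattern this definition is simply one plus the number of observed measurements; for example (Python):

```python
def dropout_time(r):
    """D_i = 1 + sum_j R_ij; a value of n_i + 1 identifies a completer."""
    return 1 + sum(r)

assert dropout_time([1, 1, 0, 0]) == 3  # last observation at occasion 2, dropout at 3
assert dropout_time([1, 1, 1, 1]) == 5  # completer: D_i = n_i + 1
```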

16.2 The Name of the Game

Complete data Y_i: the scheduled measurements; the outcome vector that would be recorded if there were no missing data.

Missing data indicators R_i: also called the missing data process.

Full data (Y_i, R_i): the complete data, together with the missing data indicators.

Observed data Y_i^o.

Missing data Y_i^m.

One observes the observed measurements Y_i^o, together with the dropout indicators R_i.

16.3 Factorizing the Distribution

Consider the distribution of the full data:

  f(Yi, Di | θ, ψ)

• θ parametrizes the measurement distribution,

• ψ parametrizes the missingness process.

Several routes are possible:

Selection models:
  f(Yi | θ) f(Di | Yi, ψ)

Pattern-mixture models:
  f(Yi | Di, θ) f(Di | ψ)

Shared-parameter models:
  f(Yi, Di | bi, θ, ψ)

16.4 Selection Models

Most models are based on the following factorization:

  f(Yi, Di | θ, ψ) = f(Yi | θ) f(Di | Yi, ψ)

• the first factor is the marginal density of the measurement process

• the second factor is the density of the missingness process, given the outcomes.

This framework is called selection modeling: the second factor corresponds to the (self-)selection of individuals into the “observed” and “missing” groups.

16.5 Missing Data Processes

f(Di | Y_i, ψ) = f(Di | Y_i^o, Y_i^m, ψ).

Missing Completely At Random (MCAR): missingness is independent of the measurements: f(Di | ψ).

Missing At Random (MAR): missingness is independent of the unobserved (missing) measurements, possibly depending on the observed measurements: f(Di | Y_i^o, ψ).

Missing Not At Random (MNAR): missingness depends on the missing values.

The above terminology is independent of the statistical framework chosen to analyse the data. This is to be contrasted with the terms ignorable and nonignorable missing data.
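A small simulation helps fix ideas about MAR (Python; the data-generating and missingness coefficients are illustrative assumptions). Missingness of Y2 depends only on the observed Y1, which biases the complete-case mean of Y2 but leaves the conditional distribution of Y2 given Y1 intact:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y1 = rng.normal(size=n)
y2 = 0.5 * y1 + rng.normal(size=n)

# MAR dropout: P(Y2 missing) depends on the observed y1 only, never on y2
p_miss = 1.0 / (1.0 + np.exp(1.0 - 1.5 * y1))
observed = rng.uniform(size=n) > p_miss

cc_mean = y2[observed].mean()                          # biased for E[Y2] = 0
slope = np.polyfit(y1[observed], y2[observed], 1)[0]   # still close to 0.5

assert cc_mean < -0.1           # complete-case mean is pulled down
assert abs(slope - 0.5) < 0.05  # conditional relationship preserved under MAR
```

This is why, under MAR, methods that model the conditional (likelihood) structure remain valid while naive complete-case summaries do not.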

16.6 Ignorability

Let us decide to use likelihood-based estimation.

The full data likelihood contribution for subject i:

  L*(θ, ψ | Y_i, Di) ∝ f(Y_i, Di | θ, ψ).

Base inference on the observed data:

  L(θ, ψ | Y_i^o, Di) ∝ f(Y_i^o, Di | θ, ψ)

with

  f(Y_i^o, Di | θ, ψ) = ∫ f(Y_i, Di | θ, ψ) dY_i^m
                      = ∫ f(Y_i^o, Y_i^m | θ) f(Di | Y_i^o, Y_i^m, ψ) dY_i^m.

Under a MAR process:

  f(Y_i^o, Di | θ, ψ) = ∫ f(Y_i^o, Y_i^m | θ) f(Di | Y_i^o, ψ) dY_i^m
                      = f(Y_i^o | θ) f(Di | Y_i^o, ψ).

The likelihood factorizes into two components.



16.7 Ignorability ←− Separability

If, in addition, θ and ψ are disjoint, then inference can be based on the marginal observed data density only.

Within the likelihood framework, ignorability is equivalent to the union of MAR and MCAR (assuming separability).

• Counterexamples:
  – Generalized estimating equations (Liang and Zeger)
  – Least squares

• General account: Rubin (Bka 1976)
  – Sampling distribution (frequentist) theory
  – Likelihood-based estimation
  – Bayesian inference

16.8 Simple Methods

• Rectangular matrix by deletion: complete case analysis

• Rectangular matrix by completion: imputation
  – Vertical: unconditional mean imputation
  – Horizontal: last observation carried forward
  – Diagonal: conditional mean imputation

• Using data as is: available case analysis
  – Frequentist: difficult and not generally valid
  – Likelihood: the thing to do – consistent with an ignorable analysis
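The two “rectangular matrix” devices are easy to state in code (a Python/numpy sketch on a toy monotone data set; neither is recommended over a valid ignorable likelihood analysis):

```python
import numpy as np

# Toy data: 3 subjects, 3 occasions, NaN = missing (monotone dropout)
y = np.array([[4.0, 5.0, 6.0],
              [3.0, 4.0, np.nan],
              [5.0, np.nan, np.nan]])

# Complete case analysis: delete every subject with any missing value
cc = y[~np.isnan(y).any(axis=1)]

# Last observation carried forward: fill horizontally with the last seen value
locf = y.copy()
for i in range(locf.shape[0]):
    for j in range(1, locf.shape[1]):
        if np.isnan(locf[i, j]):
            locf[i, j] = locf[i, j - 1]

assert cc.shape == (1, 3)                      # only the completer survives deletion
assert locf[1, 2] == 4.0 and locf[2, 2] == 5.0
```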

16.9 Three Likelihood Approaches

• Direct likelihood maximization
  – continuous: SAS PROC MIXED, . . .
  – categorical: generalized linear mixed models (SAS PROC NLMIXED, . . . )
  – NOT: generalized estimating equations !!!

• EM algorithm: match the data to the “complete” model

• Multiple imputation: accounts properly for uncertainty due to missingness

16.10 An Ignorable Likelihood Analysis

Likelihood-based inference is valid whenever

• the mechanism is MAR,

• the parameters describing the missingness mechanism are distinct from the measurement model parameters.

PROC MIXED gives valid inference! Almost. . .

Warnings

• When the research question is concerned with missingness parameters, a more complex analysis is needed.

• Precision estimation poses problems (Kenward and Molenberghs, Stat Sci 1997).

• MNAR is hard to rule out: OSWALD (PCMID), . . .

16.11 A Selection Model

Measurements: the linear mixed model

  y_i = X_i β + Z_i b_i + ε_i

with

  b_i ∼ N(0, D),   ε_i ∼ N(0, Σ_i),   b_i and ε_i independent,

so that

  y_i ∼ N_{ni}(X_i β, V_i),   V_i = Z_i D Z_i^T + Σ_i

(Laird and Ware, Bcs 1982)



16.12 Dropout Model

• Monotone dropout

• Dropout probability at occasion j:

  P(Di = j | Di ≥ j, y_i, Wi) = g(hij, yij)

• hij: vector with all responses prior to occasion j, and possibly covariates wij.

• Dropout model:

  logit[g(hij, yij)] = logit[P(Di = j | Di ≥ j, y_i, Wi)] = hij^T ψ + ω yij,   i = 1, . . . , N

• MAR if ω = 0

• Nonrandom (MNAR) if ω ≠ 0

16.13 Contributions Combined

• Dropout probability:

  f(di | y_i, Wi, ψ) =

    ∏_{j=2}^{ni} [1 − g(hij, yij)]                          for Di = ni + 1,

    ∏_{j=2}^{d−1} [1 − g(hij, yij)] · g(hid, yid)           for Di = d ≤ ni.
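These contributions can be sketched as follows (Python; the dropout model is a hypothetical one with logit g = ψ0 + ψ1·y_{i,j−1} + ω·y_ij, so ω = 0 corresponds to MAR). A useful sanity check is that, given the full outcome vector, the probabilities over the dropout times d = 2, . . . , n_i + 1 sum to one:

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical dropout model: logit g(h_ij, y_ij) = psi0 + psi1*y_{i,j-1} + omega*y_ij
psi0, psi1, omega = -2.0, 0.5, 0.0   # omega = 0 <=> MAR

def dropout_contribution(y, d):
    """f(d | y): survive occasions 2..d-1, then drop out at d (d = n+1: completer)."""
    n = len(y)
    contrib = 1.0
    for j in range(2, min(d, n + 1)):   # occasions survived
        contrib *= 1.0 - expit(psi0 + psi1 * y[j - 2] + omega * y[j - 1])
    if d <= n:                          # dropout actually occurred at occasion d
        contrib *= expit(psi0 + psi1 * y[d - 2] + omega * y[d - 1])
    return contrib

y = [1, 0, 1]
total = sum(dropout_contribution(y, d) for d in range(2, len(y) + 2))
assert abs(total - 1.0) < 1e-12   # dropout-time probabilities sum to one
```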

16.14 A Paradox

Glynn, Laird and Rubin (1986)

• Two measurements (Y1, Y2)

• Y1 always observed.

• Y2 observed (R = 1) or missing (R = 0).

• Selection model versus pattern-mixture model:

  f(y1, y2) g(r = 1 | y1, y2) = f1(y1, y2) p(r = 1)
  f(y1, y2) g(r = 0 | y1, y2) = f0(y1, y2) p(r = 0)

  or, writing g(y1, y2) for g(r = 1 | y1, y2) and p for p(r = 1):

  f(y1, y2) g(y1, y2) = f1(y1, y2) p
  f(y1, y2) [1 − g(y1, y2)] = f0(y1, y2) [1 − p]

  of which the ratio yields:

  f0(y1, y2) = [ (1 − g(y1, y2)) / g(y1, y2) ] · [ p / (1 − p) ] · f1(y1, y2)

• The right-hand side is identifiable

• The left-hand side is not. . .

16.15 Pattern-Mixture Models

f(Yi, Di | θ, ψ) = f(Yi | Di, θ) f(Di | ψ).

• Natural parameters of selection models and pattern-mixture models have different meanings.

  + SeM: useful framework for missing data processes.
  − SeM: MNAR ⇒ untestable assumptions.
  + PMM: identifiable parts are unambiguous.

• Little (JASA 1993) suggests the use of identifying relationships.

16.16 Pattern-Mixture Models

• Cohen and Cohen (1983)


• Muthén, Kaplan, and Hollis (Psychometrika 1987)
• Allison (Soc Method 1987)
• McArdle and Hamagani (Experimental Aging
Research 1992)
• Little (JASA 1993, Bka 1994, JASA 1995)
• Little and Wang (Bcs 1996)
• Hedeker and Gibbons (Psychological Methods 1997)
• Hogan and Laird (SiM 1997)
• Ekholm and Skinner (AppStat 1998)
• Molenberghs, Michiels, and Kenward (Biom J 1998)
• Verbeke, Lesaffre, and Spiessens (1998)
• Molenberghs, Michiels, and Lipsitz (CommStat 1999)
• Michiels, Molenberghs, and Lipsitz (Bcs 1999)

16.17 Pattern-Mixture Modeling

• Little (1993, 1994 and 1995):


Defines pattern-mixture models
Clear lack of information ⇒ Fair modeling
Identifying restrictions (CCMV, ...)

• Molenberghs, Michiels, Kenward, and Diggle (Stat Neerl 1998):
Classification possible (cf. selection modeling)
MAR ⇔ ACMV (for monotone patterns)
∗ Selection Models:
f (Di | Yi^o, ψ).

∗ Pattern-mixture Models:
f (Yi1, . . . , Yid|Di = d) = f (Yi1, . . . , Yid|Di > d)

16.18 Estimating Marginal Effects From PMM

• Pattern-membership probabilities:
π1, . . . , πt, . . . , πT .

• The marginal effects:

  βℓ = ∑_{t=1}^{T} βℓt πt ,     ℓ = 1, . . . , g

• Their variance:

  Var(β1, . . . , βg ) = A V Aᵀ

  where

  V = block-diag( Var(β̂t ), Var(π̂t ) )

  and

  A = ∂(β1, . . . , βg ) / ∂(β11, . . . , βgT , π1, . . . , πT )
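This delta-method combination can be sketched numerically for a single effect (g = 1) over three patterns; all numbers below (pattern effects, their variances, sample size) are hypothetical.

```python
import numpy as np

# Delta-method sketch for a pattern-averaged effect:
# beta = sum_t beta_t * pi_t, Var(beta) = A V A', with V block-diagonal
# in Var(beta_t-hat) and Var(pi_t-hat). All numbers are hypothetical.
beta_t = np.array([1.0, 1.5, 2.5])         # pattern-specific effects
pi_t = np.array([0.5, 0.3, 0.2])           # pattern probabilities (sum to 1)

beta = beta_t @ pi_t                       # marginal effect

var_beta_t = np.diag([0.10, 0.20, 0.40])   # assumed Var(beta_t-hat), independent
n = 100                                    # assumed sample size
# Multinomial covariance of the observed pattern proportions:
var_pi = (np.diag(pi_t) - np.outer(pi_t, pi_t)) / n

V = np.block([[var_beta_t, np.zeros((3, 3))],
              [np.zeros((3, 3)), var_pi]])
A = np.concatenate([pi_t, beta_t])         # partials of beta w.r.t. (beta_t, pi_t)
var_beta = A @ V @ A                       # delta-method variance
```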

16.19 Random-Coefficient Models

• So far, selection models and pattern-mixture models did not contain random effects, shared between the measurement and dropout models.

Shared-parameter models

• Often, one can assume a latent variable driving both processes
• Other term: Random-coefficient-based models
(Little 1995)
• versus outcome-based models

16.20 Example

• A latent variable Z (here the random effects αi) drives the individual’s response:

  Yij | α0i, α1i ∼ N (α0i + α1i ti ; σ²)

  and αi = (α0i, α1i)ᵀ satisfies αi ∼ N (α, Φ)

• Corresponding selection model:

  f (y, r | z) = f (y | z) P(r | z)

  and then

  f (y^o, r) = ∫ ∫ f (y | z) P(r | z) f (z) dz dy^m,

  and

  f (y^o, r) = ∫ { ∫ f (y | z) dy^m } P(r | z) f (z) dz
             = ∫ f (y^o | z) P(r | z) f (z) dz.
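The last integral can be approximated by Monte Carlo over the latent variable; the densities below (scalar z ∼ N(0, 1), a normal measurement model, a logistic missingness model) are hypothetical choices for illustration only.

```python
import numpy as np

# Monte Carlo sketch of the shared-parameter factorization
# f(y^o, r) = int f(y^o | z) P(r | z) f(z) dz, with scalar z ~ N(0, 1).
rng = np.random.default_rng(0)

def f_y_given_z(y, z, sigma=1.0):
    """Normal density of y given the latent z (hypothetical model)."""
    return np.exp(-0.5 * ((y - z) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p_r_given_z(z):
    """P(observed | z), logistic in z (hypothetical model)."""
    return 1.0 / (1.0 + np.exp(-z))

z = rng.standard_normal(200_000)          # draws from f(z)
y_obs = 0.8
# Average of the integrand over the latent draws approximates f(y^o, r = 1):
f_joint = np.mean(f_y_given_z(y_obs, z) * p_r_given_z(z))
```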

16.21 Literature

• Continuous outcomes: Wu and Carroll (Bcs 1988):


– Gaussian random effects model
– PH, logistic, or probit for dropout times
• Joint models for time-to-event (dropout) and
measurements:
– Schluchter (SiM 1992)
– DeGruttola and Tu (Bcs 1994)
– Taylor, Cumberland, and Sy (JASA 1994)
– Tsiatis, DeGruttola, and Wulfsohn (JASA 1995)
– Satten and Longini (App Stat 1996)
– Faucett and Thomas (SiM 1996)
– Wulfsohn and Tsiatis (Bcs 1997)
– Bycott and Taylor (SiM 1998)

16.22 Non-Normal Outcomes

• generalized linear model for outcomes


• generalized linear model for dropout process
• shared random effects between them
• Literature:
– Wu and Bailey (SiM 1988)
– Wu and Bailey (Bcs 1989)
– Mori, Woolson, and Woodworth (Bcs 1994)
– Follmann and Wu (Bcs 1995)
– Pulkstenis et al (JASA 1998)
– Ten Have et al (Bcs 1998)
– Albert and Follmann (Bcs 2000)

16.23 Pros and Cons

• Easier to handle intermittent missingness
• May be viewed as a natural framework for the genesis of the data
• Share computational complexity with outcome-based selection models
• High dependence of inferences on modelling assumptions
• Modelling assumptions cannot be verified from the data to full satisfaction (cf. the paradox)

16.24 Less Parametric Approaches

• Generalized estimating equations

• Weights to reflect selection/dropout probability

• Literature
– Robins (SiM 1997)
– Robins and Gill (SiM 1997)
– Rotnitzky and Robins (Scand J Stat 1995)
– Rotnitzky and Robins (SiM 1997)
– Robins, Rotnitzky and Zhao (JASA 1995)
– Robins and Rotnitzky (JASA 1995)
– Robins, Rotnitzky, and Scharfstein (JASA 1998)
Chapter 17

Case Study: Analgesic Trial

17.1 Weighted GEE

• Strictly, GEE inference is correct under the MCAR missing data mechanism.
• A way to reduce bias in the parameter estimates when
the mechanism is MAR is to use Weighted GEE
(WGEE).
• References: Robins, Rotnitzky & Zhao (JASA, 1995) and Fitzmaurice, Molenberghs & Lipsitz (JRSSB, 1995).

CHAPTER 17. CASE STUDY: ANALGESIC TRIAL 369

• The idea is to weight each subject’s contribution in the GEEs by the inverse probability that the subject drops out at the time he dropped out.
• This can be calculated as

  P [Di = di ] = { ∏_{k=2}^{di −1} (1 − P [Rik = 0 | Ri2 = . . . = Ri,k−1 = 1]) } × P [Ri,di = 0 | Ri2 = . . . = Ri,di −1 = 1]^{I{di ≤ T }}
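The weight computation can be sketched directly from fitted conditional dropout probabilities; the hazards h[k] below are hypothetical values, not the fitted analgesic-trial model.

```python
# Sketch of inverse-probability weights for WGEE. h[k] is the fitted
# conditional dropout probability P(R_ik = 0 | still in study at k - 1);
# the values below are hypothetical.
def wgee_weight(h, di, T):
    """w_i = 1 / P(D_i = d_i): product of continuation probabilities up to
    occasion d_i - 1, times the dropout probability at d_i when d_i <= T."""
    p = 1.0
    for k in range(2, di):
        p *= 1.0 - h[k]
    if di <= T:                           # actual dropout, not a completer
        p *= h[di]
    return 1.0 / p

h = {2: 0.1, 3: 0.2, 4: 0.15}             # hazards at occasions 2..T, T = 4
w_dropout_at_3 = wgee_weight(h, di=3, T=4)    # P = 0.9 * 0.2
w_completer = wgee_weight(h, di=5, T=4)       # P = 0.9 * 0.8 * 0.85
```

Subjects with unlikely dropout times receive large weights, compensating for the similar subjects who are missing from the estimating equations.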

17.2 Analgesic Trial Example

• The analgesic data show some evidence of MAR.
• A model for the conditional dropout probability P [Di = j|Di ≥ j] shows dependence on previous GSA measurements.
• This model includes previous GSA, baseline PCA, physical functioning, and genetic/congenital disorder.

The GENMOD Procedure

Model Information

Data Set WORK.GSAC


Distribution Binomial
Link Function Logit
Dependent Variable dropout
Observations Used 963
Probability Modeled Pr( dropout = 1 )
Missing Values 15

Class Level Information

Class Levels Values

prevgsa 5 1 2 3 4 5

Response Profile

Ordered Ordered
Level Value Count

1 0 800
2 1 163

Criteria For Assessing Goodness Of Fit



Criterion DF Value Value/DF

Deviance 955 832.4611 0.8717


Scaled Deviance 955 832.4611 0.8717
Pearson Chi-Square 955 967.4301 1.0130
Scaled Pearson X2 955 967.4301 1.0130
Log Likelihood -416.2306

Algorithm converged.

Analysis Of Parameter Estimates

Standard Wald 95% Chi-


Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq

Intercept 1 -1.8043 0.4856 -2.7562 -0.8525 13.80 0.0002


prevgsa 1 1 -1.0183 0.4131 -1.8278 -0.2087 6.08 0.0137
prevgsa 2 1 -1.0374 0.3772 -1.7767 -0.2980 7.56 0.0060
prevgsa 3 1 -1.3439 0.3716 -2.0721 -0.6156 13.08 0.0003
prevgsa 4 1 -0.2636 0.3828 -1.0140 0.4867 0.47 0.4910
prevgsa 5 0 0.0000 0.0000 0.0000 0.0000 . .
pca0 1 0.2542 0.0986 0.0609 0.4474 6.65 0.0099
PHYSFCT 1 0.0090 0.0038 0.0015 0.0165 5.54 0.0186
gendis 1 0.5863 0.2420 0.1120 1.0607 5.87 0.0154
Scale 0 1.0000 0.0000 1.0000 1.0000

• PROC GENMOD code to fit WGEE (assuming the variable wi is the inverse of the probability that subject i drops out at time di):

proc genmod data=repbin.gsaw;


scwgt wi;
class patid timecls;
model gsabin = time|time pca0 / dist=b;
repeated subject=patid / type=un corrw within=timecls;
run;

  Variable        GEE              WGEE
  Intercept        2.950 (0.465)    2.166 (0.694)
  Time            -0.842 (0.325)   -0.437 (0.443)
  Time²            0.181 (0.066)    0.120 (0.089)
  Baseline PCA    -0.244 (0.097)   -0.159 (0.130)

  Parameter estimates and standard errors (empirical).
  Working correlation structure is UN.

17.2.1 Estimated working correlation structures:

            GEE                              WGEE

  [ 1  0.173  0.246  0.201 ]      [ 1  0.215  0.253  0.167 ]
  [     1     0.177  0.113 ]      [     1     0.196  0.113 ]
  [            1     0.456 ]      [            1     0.409 ]
  [                   1    ]      [                   1    ]
Chapter 18

PROC NLMIXED

18.1 Features

• New SAS (V.7 and later) procedure to fit nonlinear mixed models (i.e., models in which both fixed and random effects can enter nonlinearly).
• Relies on likelihood inference; that is, PROC NLMIXED maximizes an approximation (by numerical integration) to the likelihood integrated over the random effects.

CHAPTER 18. PROC NLMIXED 371

18.2 Particularities

• Different integral approximations are available, the principal one being (adaptive) Gaussian quadrature.
• Different optimization algorithms are available to
carry out the maximization of the likelihood.
• Constraints on parameters are also allowed in the
optimization process.
• The conditional distribution (given the random
effects) can be specified as Normal, Binomial,
Poisson, or as any distribution for which you can
specify the likelihood by programming statements.
• E-B estimates of the random effects can be obtained.

18.3 Limitations

• Only one RANDOM statement can be specified (i.e., it can handle 2-level models only).
• Only normal random effects are allowed (though this
is probably the most commonly used choice).
• Does not calculate automatic initial values.
• Make sure your data set is sorted by cluster ID !!!
• No missing values should be left in the (dependent or
independent) variables.

18.4 MIXOR
• Program in the public domain, specifically designed for mixed-effects ordinal regression analysis. The program can be downloaded at http://www.uic.edu/~hedeker/mixreg.html

• Performs numerical integration (Gaussian quadrature) and uses the Newton-Raphson algorithm to maximize the marginal likelihood.

• Differences/similarities with PROC NLMIXED:


– PROC NLMIXED can perform Gaussian
quadrature by using the options NOAD and
NOADSCALE. The number of quadrature points
can be specified with the option QPOINTS=m.
– PROC NLMIXED can maximize the marginal
likelihood using the Newton-Raphson algorithm by
specifying the option TECHNIQUE=NEWRAP.
– When comparing the output from both programs,
there will be some discrepancies in the standard
errors of the parameters. This is because MIXOR
uses an approximation to the (empirical)
information matrix, whereas PROC NLMIXED
uses numerical derivatives.
Chapter 19

Introduction to Multilevel Modeling

19.1 Introduction

• Data with a hierarchical or clustered structure


• Some examples of hierarchies:
– Longitudinal data
level 1: occasions, level 2: subjects
– Teratologic data
level 1: offspring, level 2: litters
– Education sciences
level 1: students, level 2: classrooms, level 3 : schools

CHAPTER 19. INTRODUCTION TO MULTILEVEL MODELING 375

• Units in a cluster are more alike than units from other clusters (correlation) ⇒ inference is misleading if the hierarchical structure is ignored.
• Multilevel modeling accounts for clustering through the sharing of latent variables (random effects).
• These latent variables can be introduced at any level in the hierarchy.
• Some references:
– Random Coefficient Models, Longford N. (1993)
– Multilevel Statistical Models, Goldstein H. (1995)
– Introducing Multilevel Modeling, Kreft I. & De Leeuw J. (1998)
– Multilevel Analysis, Snijders T. & Bosker R. (2000)

• On the web, see the Multilevel Models Project page at


http://www.ioe.ac.uk/multilevel

19.2 Multilevel Model Formulation

• We illustrate the general structure using a 3-level model:

  Y = Xβ + Z(3)v + Z(2)u + Z(1)e,

  or

  Yijk = Xijk β + ∑_{h=0}^{q3} Z(3)hijk vhi + ∑_{h=0}^{q2} Z(2)hijk uhij + ∑_{h=0}^{q1} Z(1)hijk ehijk ,

• where:
  – Ωe = cov[eijk ],
  – Ωu = cov[uij ],
  – Ωv = cov[vi ].

Assumptions:
  – Level 1 residuals are independent across level 1 units and are N (0, V3(1)), where V3(1) is diagonal with elements σ²e,ijk = Z(1)ᵀijk Ωe Z(1)ijk .
  – Level 2 residuals are independent across level 2 units and are N (0, V3(2)), where V3(2) is block-diagonal with blocks V3(2)ij = Z(2)ᵀijk Ωu Z(2)ijk .
  – Level 3 residuals are independent across level 3 units and are N (0, V3(3)), where V3(3) is block-diagonal with blocks V3(3)i = Z(3)ᵀijk Ωv Z(3)ijk .

• Thus, cov[Y ] is block-diagonal with ith block given by:

  V3i = V3(3)i + ∑_j V3(2)ij + ∑_{j,k} σ²e,ijk .

19.3 Estimation Procedure

• Multilevel data sets are typically big ⇒ need for efficient estimation methods!
• Iterative Generalized Least Squares (IGLS) algorithm:

19.3.1 IGLS Procedure

• Suppose we know the value of all (co)variance parameters. Then the usual GLS estimation procedure can be applied to estimate the fixed coefficients:

  β̂ = (Xᵀ V⁻¹ X)⁻¹ Xᵀ V⁻¹ Y.

• Suppose we know the value of β. We can form the cross-product matrix of residuals Ỹ Ỹ ᵀ, with Ỹ = Y − Xβ.
• If Y∗ denotes Ỹ Ỹ ᵀ, we have E[Y∗] = V.
• If Y∗∗ = Ỹ ⊗ Ỹ = vec(Ỹ Ỹ ᵀ), then E[Y∗∗] can be written as Z∗θ, where θ comprises all (co)variance parameters and Z∗ is a suitable design matrix.
• The vector θ can then be estimated using standard GLS estimation:

  θ̂ = (Z∗ᵀ V∗⁻¹ Z∗)⁻¹ Z∗ᵀ V∗⁻¹ Y∗∗,     with V∗ = V ⊗ V.
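The GLS step for the fixed effects can be sketched with numpy on toy data (all values hypothetical); the equivalence with ordinary least squares on whitened data serves as an internal check.

```python
import numpy as np

# One GLS step: given V, estimate beta by (X' V^{-1} X)^{-1} X' V^{-1} Y.
# The design, true coefficients, and covariance below are all hypothetical.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta_true = np.array([2.0, -1.0])
V = 0.5 * np.eye(n) + 0.2                 # compound-symmetric covariance
L = np.linalg.cholesky(V)                 # V = L L'
Y = X @ beta_true + L @ rng.standard_normal(n)   # correlated errors

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
```

Whitening with L⁻¹ and running OLS on the transformed data gives the same estimate, which is a convenient way to validate a GLS implementation.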

19.3.2 Remarks

• Starting with initial values (OLS estimates for β), the IGLS algorithm alternates between the random and fixed parameter estimation until the procedure converges.
• Note:
  – The IGLS algorithm converges to ML estimates.
  – The IGLS algorithm can be modified to mimic REML estimation (by taking the sampling variation of β̂ into account).

19.4 Illustration of the IGLS Algorithm

Consider the following simple 2-level model:

  yij = β0 + β1 xij + u0i + e0ij .

• β̂ = (Xᵀ V⁻¹ X)⁻¹ Xᵀ V⁻¹ Y, with

  X = [ 1  x11  ]        Y = [ y11  ]
      [ 1  x12  ]            [ y12  ]
      [ ⋮   ⋮   ]            [  ⋮   ]
      [ 1  xmnm ]            [ ymnm ]

• Calculate ỹij = yij − β̂0 − β̂1 xij .
• We can write:

  [ ỹ²11    ]   [ σ²u0 + σ²e0 ]            [ 1 ]         [ 1 ]
  [ ỹ11 ỹ12 ] = [ σ²u0        ] + R = σ²u0 [ 1 ] + σ²e0 [ 0 ] + R
  [ ⋮        ]   [ ⋮           ]            [ ⋮ ]         [ ⋮ ]
  [ ỹ²mnm   ]   [ σ²u0 + σ²e0 ]            [ 1 ]         [ 1 ]

  where R is a residual vector.

• θ̂ = (Z∗ᵀ V∗⁻¹ Z∗)⁻¹ Z∗ᵀ V∗⁻¹ Y∗∗, with

  Z∗ = [ 1 1 ]        Y∗∗ = [ ỹ²11    ]
       [ 1 0 ]              [ ỹ11 ỹ12 ]
       [ ⋮ ⋮ ]              [ ⋮        ]
       [ 1 1 ]              [ ỹ²mnm   ]
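A moment-based sketch of this variance step (with β treated as known and the GLS weight matrix V∗ replaced by the identity, a simplification) recovers the two components from squared and cross-product residuals; the true variance components below are hypothetical.

```python
import numpy as np

# Moment sketch of the IGLS variance step for the toy 2-level model:
# E[y~_ij y~_ik] = sigma_u0^2 + sigma_e0^2 if j = k, and sigma_u0^2 otherwise.
# Simulated residuals with hypothetical variance components.
rng = np.random.default_rng(2)
m, n = 200, 5                                    # clusters, cluster size
su2, se2 = 1.0, 0.5                              # true sigma_u0^2, sigma_e0^2
u = rng.normal(0.0, np.sqrt(su2), size=(m, 1))   # level-2 residuals
resid = u + rng.normal(0.0, np.sqrt(se2), size=(m, n))   # y~_ij

diag_mean = np.mean(resid ** 2)                  # estimates su2 + se2
row = resid.sum(axis=1)
# Average of the n*(n-1) off-diagonal products within each cluster:
off_mean = np.mean((row ** 2 - (resid ** 2).sum(axis=1)) / (n * (n - 1)))
su2_hat = off_mean                               # estimates sigma_u0^2
se2_hat = diag_mean - off_mean                   # estimates sigma_e0^2
```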

19.5 Example

• GROWTH data set (2-level model)
• Model fit with MLwiN (see the MLwiN homepage at http://www.ioe.ac.uk/mlwin)
• Data:

• Covariates:
– SEX: xi = 1 for boys and 0 for girls
– AGE: 8, 10, 12, and 14

• Model:

  Yij = β0 + β01 xi + β10 tj (1 − xi) + β11 tj xi + b0i + b1i tj + εij ,

  Number   Effect        Fixed   Level 2   Level 1
  0        Intercept     β0      b0i       εij
  1        Male          β01
  2        Female∗Age    β10
  3        Male∗Age      β11
  4        Age                   b1i

• Predicted means:

• Predicted (E-B) individual profiles:



19.6 Multilevel Models for Discrete Response Data

• Model formulation (binary data):
  – logit(πij ) = Xijᵀ β + Zijᵀ ui
  – yij = πij + eij vij ,   vij = [πij (1 − πij )/nij ]^{1/2}
  – Take eij such that yij ∼ Bin(πij , nij ), with var(eij ) = 1.
• Parameter estimation:
– ML will be hardly feasible in general
⇒ approximate methods have been proposed to
avoid numerical integration
– Some references:
∗ MQL: Goldstein (1991)
∗ PQL: Breslow & Clayton (1993), Wolfinger & O’Connell (1993)
∗ PQL2: Goldstein & Rasbash (1996)

19.7 MQL/PQL Procedure

• Use a first-order Taylor expansion of the fixed and random parts of the mean function about Ht :
  – MQL: Ht = Xijᵀ βt (fixed part predictor)
    µij = f (Ht ) + Xijᵀ (β − βt ) f ′(Ht ) + Zijᵀ ui f ′(Ht )
  – PQL: Ht = Xijᵀ βt + Zijᵀ ûi (current predicted value)
    µij = f (Ht ) + Xijᵀ (β − βt ) f ′(Ht ) + Zijᵀ (ui − ûi ) f ′(Ht )
• Rewrite as a linear model:

  Yij − f (Ht ) + f ′(Ht ) Xijᵀ βt [ + f ′(Ht ) Zijᵀ ûi ] = f ′(Ht ) Xijᵀ β + f ′(Ht ) Zijᵀ ui + eij vij .

• Update the fixed and random parameters as in the


IGLS algorithm.
• Iterate until convergence.
• MQL2/PQL2 procedures: further add second-order
terms in the above Taylor expansion.

• Pros:
– The algorithm is quick and efficient (compared to
ML).
– Allows one to estimate an overdispersion parameter (since the algorithm iteratively fits linear models). Just write

  yij = πij + eij vij ,   vij = [πij (1 − πij )/nij ]^{1/2}

  with var(eij ) = σ²e.
• Cons:
– MQL/PQL gives biased estimates (downward),
mostly for variance parameters !!
– The bias is worst for binary data
– Bias increases with increasing variance components
– Bias increases with decreasing cluster size
– PQL2 less biased
– Convergence problems are common (especially
with PQL2 procedure).

Note

• PQL/MQL can be fitted by the SAS macro GLIMMIX.
• MQL, PQL, MQL2 and PQL2 can be fitted in the
MLwiN package.
Chapter 20

The Use of SPlus

20.1 Fitting Mixed Models Using SPlus


SPlus provides various ways to estimate mixed models. On the one hand, the built-in function lme() for linear mixed-effects models can be used. Note that there is a companion function for nonlinear mixed-effects models, nlme(). These functions are based on work by Lindstrom and Bates (1988), Laird and Ware (1982), Box, Jenkins, and Reinsel (1994), and Davidian and Giltinan (1995).

Figure 20.1: Growth Data. Predicted individual profiles.

We will use the growth data to illustrate the built-in function lme(). SPlus Version 4.5 is used.
Apart from the references mentioned earlier which give the theoretical underpinning, there is

CHAPTER 20. THE USE OF SPLUS 391

ample documentation within SPlus. The on-line manual provides a 53-page discussion of linear
and nonlinear mixed-effects models. The function lme() is generic. The on-line help system of
SPlus provides a brief account of the syntax of this generic function. Methods functions are
being developed for specific classes of objects. The methods function lme.formula() comes
with ample documentation.

Let us discuss the main arguments:

Fixed effects. The structure is specified by means of the fixed argument, using standard
formulas.

Random effects. The random-effects structure is specified through random. Additional arguments to tune the random-effects model are re.block (describing the blocking structure), re.structure (specifying the form of the D matrix), and re.paramtr (specifying how the D matrix is internally parameterized). The latter argument is included to improve numerical stability and to ensure that the resulting D matrix is positive definite. Values of this argument refer to the Cholesky decomposition, the matrix logarithm, and several others.

Serial correlation. This structure is defined by means of the argument serial.structure. In the case that a serial correlation structure depending on time is assumed, the arguments serial.covariate and serial.covariate.transformation can be used to specify this aspect of the serial process.

Residual variance. The residual variance function is defined by means of var.function. Fine-tuning can be done using var.covariate and var.estimate (indicating whether the variance parameters are to be estimated or to be kept fixed at their initial values).

Clusters. The clusters (subjects, units, etc.) are defined using cluster.

Method of estimation. Both maximum likelihood and REML are provided. The user’s
preference can be specified by means of the argument est.method.

Other tools include subsetting, specifying the action to be undertaken on missing data, and
control over the estimation algorithm.

Let us apply the function lme.formula() to fit Model 6 to the growth data.

The following program can be used.

my.lme <- lme.formula(
          fixed = MEASURE ~ 1 + MALE + MALEAGE + FEMAGE,
          random = ~ 1 + AGE,
          cluster = ~ IDNR,
          data = growth5.df,
          re.structure = "unstructured",
          na.action = "na.omit",
          est.method = "ML")

Printing the object my.lme produces

Call:
Fixed: MEASURE ~ 1 + MALE + MALEAGE + FEMAGE
Random: ~ 1 + AGE
Cluster: ~ (IDNR)
Data: growth5.df

Variance/Covariance Components Estimate(s):

Structure: unstructured
Parametrization: matrixlog
Standard Deviation(s) of Random Effect(s)
(Intercept) AGE
2.134752 0.1541473
Correlation of Random Effects
(Intercept)
AGE -0.6025632

Cluster Residual Variance: 1.716206

Fixed Effects Estimate(s):


(Intercept) MALE MALEAGE FEMAGE
17.37273 -1.032102 0.784375 0.4795455

Number of Observations: 108


Number of Clusters: 27

Although the above output is rather brief, one can obtain a more extensive summary:

> my.lme.2 <- summary(my.lme)
> my.lme.2

Call:
Fixed: MEASURE ~ 1 + MALE + MALEAGE + FEMAGE

Random: ~ 1 + AGE
Cluster: ~ (IDNR)
Data: growth5.df

Estimation Method: ML
Convergence at iteration: 6
Log-likelihood: -213.903
AIC: 443.806
BIC: 465.263

Variance/Covariance Components Estimate(s):


Structure: unstructured
Parametrization: matrixlog
Standard Deviation(s) of Random Effect(s)
(Intercept) AGE
2.134752 0.1541473
Correlation of Random Effects
(Intercept)
AGE -0.6025632

Cluster Residual Variance: 1.716206

Fixed Effects Estimate(s):


Value Approx. Std.Error z ratio(C)
(Intercept) 17.3727273 1.18203467 14.6973077
MALE -1.0321023 1.53550808 -0.6721568
MALEAGE 0.7843750 0.08275405 9.4783886
FEMAGE 0.4795455 0.09980513 4.8048175

Conditional Correlation(s) of Fixed Effects Estimates


(Intercept) MALE MALEAGE
MALE -7.698004e-001
MALEAGE 6.198039e-016 -5.617972e-001
FEMAGE -8.801671e-001 6.775530e-001 -1.691642e-016

Random Effects (Conditional Modes):


(Intercept) AGE
1 -0.68278894 -0.039972872
2 -0.45926352 0.071886460
3 -0.03109489 0.093020178
4 1.61182535 0.030832363
5 0.43850471 -0.043000835
.....
25 0.50935427 -0.055453935
26 -0.10573027 0.083999487
27 -0.89462307 -0.076992100

Standardized Population-Average Residuals:


Min Q1 Med Q3 Max
-3.335979 -0.4153858 0.01039114 0.4916851 3.858188

Number of Observations: 108


Number of Clusters: 27

The estimates and standard errors coincide with those obtained with, for example, MLwiN. This
is immediately clear for the fixed-effects estimates, their standard errors, and the residual
variance. The components of the D matrix have to be derived from the standard deviations and
correlation of the random effects:

d11 = 2.134752² = 4.557,
d12 = (−0.6025632)(2.134752)(0.1541473) = −0.198,
d22 = 0.1541473² = 0.024.
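These back-calculations take only a few lines to verify (numbers taken from the lme() output above):

```python
# Recovering the elements of D from the reported standard deviations and
# correlation of the random effects in the lme() output.
sd_int, sd_age, corr = 2.134752, 0.1541473, -0.6025632

d11 = sd_int ** 2                  # variance of the random intercept
d12 = corr * sd_int * sd_age       # intercept-slope covariance
d22 = sd_age ** 2                  # variance of the random slope
```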

As is the case with MLwiN, SPlus in general, and lme() in particular, have extensive graphical
capabilities.
