arXiv:2401.00649v1 [stat.ME] 1 Jan 2024

Peng Ding

Linear Model and Extensions


To students and readers
who are interested in linear models
Contents

Acronyms xiii

Symbols xv

Useful R packages xvii

Preface xix

I Introduction 1
1 Motivations for Statistical Models 3
1.1 Data and statistical models . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Why linear models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Ordinary Least Squares (OLS) with a Univariate Covariate 7


2.1 Univariate ordinary least squares . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Final comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

II OLS and Statistical Inference 11


3 OLS with Multiple Covariates 13
3.1 The OLS formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 The geometry of OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 The projection matrix from OLS . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 The Gauss–Markov Model and Theorem 21


4.1 Gauss–Markov model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Properties of the OLS estimator . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Variance estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.4 Gauss–Markov Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Normal Linear Model: Inference and Prediction 29


5.1 Joint distribution of the OLS coefficient and variance estimator . . . . . . 29
5.2 Pivotal quantities and statistical inference . . . . . . . . . . . . . . . . . . 30
5.2.1 Scalar parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2.2 Vector parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Prediction based on pivotal quantities . . . . . . . . . . . . . . . . . . . . . 34
5.4 Examples and R implementation . . . . . . . . . . . . . . . . . . . . . . . . 35
5.4.1 Univariate regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.4.2 Multivariate regression . . . . . . . . . . . . . . . . . . . . . . . . . . 35


5.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6 Asymptotic Inference in OLS: the Eicker–Huber–White (EHW) robust standard error 41
6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1.1 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1.2 Goal of this chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.2 Consistency of OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.3 Asymptotic Normality of the OLS estimator . . . . . . . . . . . . . . . . . 44
6.4 Eicker–Huber–White standard error . . . . . . . . . . . . . . . . . . . . . . 45
6.4.1 Sandwich variance estimator . . . . . . . . . . . . . . . . . . . . . . 45
6.4.2 Other “HC” standard errors . . . . . . . . . . . . . . . . . . . . . . . 46
6.4.3 Special case with homoskedasticity . . . . . . . . . . . . . . . . . . . 47
6.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.5.1 LaLonde experimental data . . . . . . . . . . . . . . . . . . . . . . . 48
6.5.2 Data from King and Roberts (2015) . . . . . . . . . . . . . . . . . . 49
6.5.3 Boston housing data . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.6 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

III Interpretation of OLS Based on Partial Regressions 57


7 The Frisch–Waugh–Lovell Theorem 59
7.1 Long and short regressions . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 FWL theorem for the regression coefficients . . . . . . . . . . . . . . . . . . 59
7.3 FWL theorem for the standard errors . . . . . . . . . . . . . . . . . . . . . 62
7.4 Gram–Schmidt orthogonalization, QR decomposition, and the computation
of OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

8 Applications of the Frisch–Waugh–Lovell Theorem 69


8.1 Centering regressors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.2 Partial correlation coefficient and Simpson’s paradox . . . . . . . . . . . . 71
8.3 Hypothesis testing and analysis of variance . . . . . . . . . . . . . . . . . . 74
8.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

9 Cochran’s Formula and Omitted-Variable Bias 81


9.1 Cochran’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.2 Omitted-variable bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.3 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

IV Model Fitting, Checking, and Misspecification 87


10 Multiple Correlation Coefficient 89
10.1 Equivalent definitions of R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.2 R2 and the F statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.3 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
10.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

11 Leverage Scores and Leave-One-Out Formulas 95


11.1 Leverage scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
11.2 Leave-one-out formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.3 Applications of the leave-one-out formulas . . . . . . . . . . . . . . . . . . 99
11.3.1 Gauss updating formula . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.3.2 Outlier detection based on residuals . . . . . . . . . . . . . . . . . . 100
11.3.3 Jackknife . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

12 Population Ordinary Least Squares and Inference with a Misspecified Linear Model 107
12.1 Population OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
12.2 Population FWL theorem and Cochran’s formula . . . . . . . . . . . . . . 109
12.3 Population R2 and partial correlation coefficient . . . . . . . . . . . . . . . 110
12.4 Inference for the population OLS . . . . . . . . . . . . . . . . . . . . . . . . 112
12.4.1 Inference with the EHW standard errors . . . . . . . . . . . . . . . . 112
12.5 To model or not to model? . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
12.5.1 Population OLS and the restricted mean model . . . . . . . . . . . . 113
12.5.2 Anscombe’s Quartet: the importance of graphical diagnostics . . . . 114
12.5.3 More on residual plots . . . . . . . . . . . . . . . . . . . . . . . . . . 117
12.6 Conformal prediction based on exchangeability . . . . . . . . . . . . . . . . 118
12.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

V Overfitting, Regularization, and Model Selection 127


13 Perils of Overfitting 129
13.1 David Freedman’s simulation . . . . . . . . . . . . . . . . . . . . . . . . . . 129
13.2 Variance inflation factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
13.3 Bias-variance trade-off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
13.4 Model selection criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
13.4.1 RSS, R2 and adjusted R2 . . . . . . . . . . . . . . . . . . . . . . . . 133
13.4.2 Information criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
13.4.3 Cross-validation (CV) . . . . . . . . . . . . . . . . . . . . . . . . . . 134
13.5 Best subset and forward/backward selection . . . . . . . . . . . . . . . . . 135
13.6 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

14 Ridge Regression 141


14.1 Introduction to ridge regression . . . . . . . . . . . . . . . . . . . . . . . . 141
14.2 Ridge regression via the SVD of X . . . . . . . . . . . . . . . . . . . . . . . 143
14.3 Statistical properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
14.4 Selection of the tuning parameter . . . . . . . . . . . . . . . . . . . . . . . 145
14.4.1 Based on parameter estimation . . . . . . . . . . . . . . . . . . . . . 145
14.4.2 Based on prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
14.5 Computation of ridge regression . . . . . . . . . . . . . . . . . . . . . . . . 147
14.6 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
14.6.1 Uncorrelated covariates . . . . . . . . . . . . . . . . . . . . . . . . . 147
14.6.2 Correlated covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.7 Further comments on OLS, ridge, and PCA . . . . . . . . . . . . . . . . 149
14.8 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

15 Lasso 155
15.1 Introduction to the lasso . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
15.2 Comparing the lasso and ridge: a geometric perspective . . . . . . . . . . . 155
15.3 Computing the lasso via coordinate descent . . . . . . . . . . . . . . . . . . 158
15.3.1 The soft-thresholding lemma . . . . . . . . . . . . . . . . . . . . . . 158
15.3.2 Coordinate descent for the lasso . . . . . . . . . . . . . . . . . . . . 158
15.4 Example: comparing OLS, ridge and lasso . . . . . . . . . . . . . . . . . . . 159
15.5 Other shrinkage estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
15.6 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

VI Transformation and Weighting 167


16 Transformations in OLS 169
16.1 Transformation of the outcome . . . . . . . . . . . . . . . . . . . . . . . . . 169
16.1.1 Log transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
16.1.2 Box–Cox transformation . . . . . . . . . . . . . . . . . . . . . . . . . 170
16.2 Transformation of the covariates . . . . . . . . . . . . . . . . . . . . . . . . 172
16.2.1 Polynomial, basis expansion, and generalized additive model . . . . 172
16.2.2 Regression discontinuity and regression kink . . . . . . . . . . . . . . 174
16.3 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

17 Interaction 179
17.1 Two binary covariates interact . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.2 A binary covariate interacts with a general covariate . . . . . . . . . . . . . 180
17.2.1 Treatment effect heterogeneity . . . . . . . . . . . . . . . . . . . . . 180
17.2.2 Johnson–Neyman technique . . . . . . . . . . . . . . . . . . . . . . . 180
17.2.3 Blinder–Oaxaca decomposition . . . . . . . . . . . . . . . . . . . . . 180
17.2.4 Chow test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
17.3 Difficulties of interaction . . . . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.1 Removable interaction . . . . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.2 Main effect in the presence of interaction . . . . . . . . . . . . . . . 182
17.3.3 Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
17.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

18 Restricted OLS 187


18.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.2 Algebraic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
18.3 Statistical inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
18.4 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
18.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

19 Weighted Least Squares 193


19.1 Generalized least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
19.2 Weighted least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
19.3 WLS motivated by heteroskedasticity . . . . . . . . . . . . . . . . . . . . . 196
19.3.1 Feasible generalized least squares . . . . . . . . . . . . . . . . . . . . 196
19.3.2 Aggregated data and ecological regression . . . . . . . . . . . . . . . 197
19.4 WLS with other motivations . . . . . . . . . . . . . . . . . . . . . . . . . . 199
19.4.1 Local linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . 199
19.4.2 Regression with survey data . . . . . . . . . . . . . . . . . . . . . . . 200
19.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

VII Generalized Linear Models 207


20 Logistic Regression for Binary Outcomes 209
20.1 Regression with binary outcomes . . . . . . . . . . . . . . . . . . . . . . . . 209
20.1.1 Linear probability model . . . . . . . . . . . . . . . . . . . . . . . . . 209
20.1.2 General link functions . . . . . . . . . . . . . . . . . . . . . . . . . . 209
20.2 Maximum likelihood estimator of the logistic model . . . . . . . . . . . . . 211
20.3 Statistics with the logit model . . . . . . . . . . . . . . . . . . . . . . . . . 214
20.3.1 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
20.3.2 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
20.4 More on interpretations of the coefficients . . . . . . . . . . . . . . . . . . . 216
20.5 Does the link function matter? . . . . . . . . . . . . . . . . . . . . . . . . . 218
20.6 Extensions of the logistic regression . . . . . . . . . . . . . . . . . . . . . . 219
20.6.1 Penalized logistic regression . . . . . . . . . . . . . . . . . . . . . . . 219
20.6.2 Case-control study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
20.7 Other model formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
20.7.1 Latent linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
20.7.2 Inverse model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
20.8 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

21 Logistic Regressions for Categorical Outcomes 227


21.1 Multinomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
21.2 Multinomial logistic model for nominal outcomes . . . . . . . . . . . . . . . 228
21.2.1 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
21.2.2 MLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
21.3 A latent variable representation for the multinomial logistic regression . . . 230
21.4 Proportional odds model for ordinal outcomes . . . . . . . . . . . . . . . . 232
21.5 A case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
21.5.1 Binary logistic for the treatment . . . . . . . . . . . . . . . . . . . . 234
21.5.2 Binary logistic for the outcome . . . . . . . . . . . . . . . . . . . . . 235
21.5.3 Multinomial logistic for the outcome . . . . . . . . . . . . . . . . . . 236
21.5.4 Proportional odds logistic for the outcome . . . . . . . . . . . . . . . 237
21.6 Discrete choice models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
21.6.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
21.6.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
21.6.3 More comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
21.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

22 Regression Models for Count Outcomes 245


22.1 Some random variables for counts . . . . . . . . . . . . . . . . . . . . . . . 245
22.1.1 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
22.1.2 Negative-Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
22.1.3 Zero-inflated count distributions . . . . . . . . . . . . . . . . . . . . 247
22.2 Regression models for counts . . . . . . . . . . . . . . . . . . . . . . . . . . 248
22.2.1 Poisson regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
22.2.2 Negative-Binomial regression . . . . . . . . . . . . . . . . . . . . . . 250
22.2.3 Zero-inflated regressions . . . . . . . . . . . . . . . . . . . . . . . . . 251
22.3 A case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
22.3.1 Linear, Poisson, and Negative-Binomial regressions . . . . . . . . . . 252
22.3.2 Zero-inflated regressions . . . . . . . . . . . . . . . . . . . . . . . . . 253
22.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

23 Generalized Linear Models: A Unification 259


23.1 Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
23.1.1 Exponential family . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
23.1.2 Generalized linear model . . . . . . . . . . . . . . . . . . . . . . . . . 262
23.2 MLE for GLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
23.3 Other GLMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
23.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

24 From Generalized Linear Models to Restricted Mean Models: the Sandwich Covariance Matrix 267
24.1 Restricted mean model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
24.2 Sandwich covariance matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 268
24.3 Applications of the sandwich standard errors . . . . . . . . . . . . . . . . . 270
24.3.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
24.3.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
24.3.2.1 An application . . . . . . . . . . . . . . . . . . . . . . . . . 271
24.3.2.2 A misspecified logistic regression . . . . . . . . . . . . . . . 271
24.3.3 Poisson regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
24.3.3.1 A correctly specified Poisson regression . . . . . . . . . . . 272
24.3.3.2 A Negative-Binomial regression model . . . . . . . . . . . . 272
24.3.3.3 Misspecification of the conditional mean . . . . . . . . . . . 273
24.3.4 Poisson regression for binary outcomes . . . . . . . . . . . . . . . . . 273
24.3.5 How robust are the robust standard errors? . . . . . . . . . . . . . . 274
24.4 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

25 Generalized Estimating Equation for Correlated Multivariate Data 275


25.1 Examples of correlated data . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.1.1 Longitudinal data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
25.1.2 Clustered data: a neuroscience experiment . . . . . . . . . . . . . . . 275
25.1.3 Clustered data: a public health intervention . . . . . . . . . . . . . . 276
25.2 Marginal model and the generalized estimating equation . . . . . . . . . . 277
25.3 Statistical inference with GEE . . . . . . . . . . . . . . . . . . . . . . . . . 279
25.3.1 Computation using the Gauss–Newton method . . . . . . . . . . . . 279
25.3.2 Asymptotic inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
25.3.3 Implementation: choice of the working covariance matrix . . . . . . . 280
25.4 A special case: cluster-robust standard error . . . . . . . . . . . . . . . . . 281
25.4.1 OLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
25.4.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
25.5 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
25.5.1 Clustered data: a neuroscience experiment . . . . . . . . . . . . . . . 283
25.5.2 Clustered data: a public health intervention . . . . . . . . . . . . . . 284
25.5.3 Longitudinal data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
25.6 Critiques on the key assumptions . . . . . . . . . . . . . . . . . . . . . . . 286
25.6.1 Assumption (25.4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
25.6.2 Assumption (25.5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
25.6.3 Explanation and prediction . . . . . . . . . . . . . . . . . . . . . . . 288
25.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

VIII Beyond Modeling the Conditional Mean 291



26 Quantile Regression 293


26.1 From the mean to the quantile . . . . . . . . . . . . . . . . . . . . . . . . . 293
26.2 From the conditional mean to conditional quantile . . . . . . . . . . . . . . 296
26.3 Sample regression quantiles . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.3.1 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.3.2 Asymptotic inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
26.4 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.4.1 Sample quantiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
26.4.2 OLS versus LAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
26.5 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
26.5.1 Parents’ and children’s heights . . . . . . . . . . . . . . . . . . . . . 301
26.5.2 U.S. wage structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
26.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
26.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

27 Modeling Time-to-Event Outcomes 307


27.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
27.1.1 Survival analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
27.1.2 Duration analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
27.2 Time-to-event data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
27.3 Kaplan–Meier survival curve . . . . . . . . . . . . . . . . . . . . . . . . . . 312
27.4 Cox model for time-to-event outcome . . . . . . . . . . . . . . . . . . . . . 315
27.4.1 Cox model and its interpretation . . . . . . . . . . . . . . . . . . . . 315
27.4.2 Partial likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
27.4.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
27.4.4 Log-rank test as a score test from Cox model . . . . . . . . . . . . . 320
27.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
27.5.1 Stratified Cox model . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
27.5.2 Clustered Cox model . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
27.5.3 Penalized Cox model . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
27.6 Critiques on survival analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 325
27.7 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

IX Appendices 327
A Linear Algebra 329
A.1 Basics of vectors and matrices . . . . . . . . . . . . . . . . . . . . . . . . . 329
A.2 Vector calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
A.3 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

B Random Variables 341


B.1 Some important univariate random variables . . . . . . . . . . . . . . . . . 341
B.1.1 Normal, χ2 , t and F . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
B.1.2 Beta–Gamma duality . . . . . . . . . . . . . . . . . . . . . . . . . . 342
B.1.3 Exponential, Laplace, and Gumbel distributions . . . . . . . . . . . 343
B.2 Multivariate distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
B.3 Multivariate Normal and its properties . . . . . . . . . . . . . . . . . . . . 347
B.4 Quadratic forms of random vectors . . . . . . . . . . . . . . . . . . . . . . 348
B.5 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

C Limiting Theorems and Basic Asymptotics 353


C.1 Convergence in probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
C.2 Convergence in distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
C.3 Tools for proving convergence in probability and distribution . . . . . . . . 356

D M-Estimation and MLE 359


D.1 M-estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
D.2 Maximum likelihood estimator . . . . . . . . . . . . . . . . . . . . . . . . . 362
D.3 Homework problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

Bibliography 367
Acronyms

I try hard to avoid using acronyms to reduce the unnecessary burden for reading. The
following are standard and will be used repeatedly.
ANOVA (Fisher’s) analysis of variance
CLT central limit theorem
CV cross-validation
EHW Eicker–Huber–White (robust covariance matrix or standard error)
FWL Frisch–Waugh–Lovell (theorem)
GEE generalized estimating equation
GLM generalized linear model
HC heteroskedasticity-consistent (covariance matrix or standard error)
IID independent and identically distributed
LAD least absolute deviations
lasso least absolute shrinkage and selection operator
MLE maximum likelihood estimate
OLS ordinary least squares
RSS residual sum of squares
WLS weighted least squares

Symbols

All vectors are column vectors as in R unless stated otherwise. Let the superscript “t ” denote
the transpose of a vector or matrix.
∼ᵃ approximation in distribution
R the set of all real numbers
β regression coefficient
ε error term
H hat matrix H = X(XᵗX)⁻¹Xᵗ
hii leverage score: the (i, i)th element of the hat matrix H
In identity matrix of dimension n × n
xi covariate vector for unit i
X covariate matrix
Y outcome vector
yi outcome for unit i
⊥⊥ independence and conditional independence

Useful R packages

This book uses the following R packages and functions.


package    function or data    use
car        hccm                Eicker–Huber–White robust standard error
           linearHypothesis    testing linear hypotheses in linear models
foreign    read.dta            read Stata data
gee        gee                 generalized estimating equation
HistData   GaltonFamilies      Galton's data on parents' and children's heights
MASS       lm.ridge            ridge regression
           glm.nb              Negative-Binomial regression
           polr                proportional odds logistic regression
glmnet     cv.glmnet           lasso with cross-validation
mlbench    BostonHousing       Boston housing data
Matching   lalonde             LaLonde data
nnet       multinom            multinomial logistic regression
quantreg   rq                  quantile regression
survival   coxph               Cox proportional hazards regression
           survdiff            log-rank test
           survfit             Kaplan–Meier curve

Preface

The importance of studying the linear model


A central task in statistics is to use data to build models to make inferences about the
underlying data-generating processes or make predictions of future observations. Although
real problems are very complex, the linear model can often serve as a good approximation
to the true data-generating process. Sometimes, although the true data-generating process
is nonlinear, the linear model can be a useful approximation if we properly transform the
data based on prior knowledge. Even in highly nonlinear problems, the linear model can
still be a useful first attempt in the data analysis process.
Moreover, the linear model has many elegant algebraic and geometric properties. Under
the linear model, we can derive many explicit formulas to gain insights about various aspects
of statistical modeling. In more complicated models, deriving explicit formulas may be
impossible. Nevertheless, we can use the linear model to build intuition and make conjectures
about more complicated models.
Pedagogically, the linear model serves as a building block in the whole statistical train-
ing. This book builds on my lecture notes for a master’s level “Linear Model” course at
UC Berkeley, taught over the past eight years. Most students are master’s students in
statistics. Some are undergraduate students with strong technical preparations. Some are
Ph.D. students in statistics. Some are master’s or Ph.D. students in other departments.
This book requires the readers to have basic training in linear algebra, probability theory,
and statistical inference.

Recommendations for instructors


This book has twenty-seven chapters in the main text and four chapters as the appendices.
As I mentioned before, this book grows out of my teaching of “Linear Model” at UC Berke-
ley. In different years, I taught the course in different ways, and this book is a union of my
lecture notes over the past eight years. Below I make some recommendations for instruc-
tors based on my own teaching experience. Since UC Berkeley is on the semester system,
instructors on the quarter system should make some adjustments to my recommendations
below.

Version 1: a basic linear model course assuming minimal technical preparations


If you want to teach a basic linear model course without assuming strong technical prepara-
tions from the students, you can start with the appendices by reviewing basic linear algebra,
probability theory, and statistical inference. Then you can cover Chapters 2–17. If time per-
mits, you can consider covering Chapter 20 due to the importance of the logistic model for
binary data.

Version 2: an advanced linear model course assuming strong technical preparations


If you want to teach an advanced linear model course assuming strong technical preparations
from the students, you can start with the main text directly. When I did this, I asked


my teaching assistants to review the appendices in the first two lab sessions and assigned
homework problems from the appendices to remind the students to review the background
materials. Then you can cover Chapters 2–24. You can omit Chapter 18 and some sections
in other chapters due to their technical complications. If time permits, you can consider
covering Chapter 25 due to the importance of the generalized estimating equation as well
as its byproduct called the “cluster-robust standard error”, which is important for many
social science applications. Furthermore, you can consider covering Chapter 27 due to the
importance of the Cox proportional hazards model.

Version 3: an advanced generalized linear models course


If you want to teach a course on generalized linear models, you can use Chapters 20–27.

Additional recommendations for readers and students


Readers and students can first read my recommendations for instructors above. In addition,
I have three other recommendations.

More simulation studies


This book contains some basic simulation studies. I encourage the readers to conduct more
intensive simulation studies to deepen their understanding of the theory and methods.

Practical data analysis


Box wrote wisely that “all models are wrong but some are useful.” The usefulness of models
strongly depends on the applications. When teaching “Linear Model”, I sometimes replaced
the final exam with the final project to encourage students to practice data analysis and
make connections between the theory and applications.

Homework problems
This book contains many homework problems. It is important to try some homework prob-
lems. Moreover, some homework problems contain useful theoretical results. Even if you do
not have time to figure out the details for those problems, it is helpful to at least read the
statements of the problems.

Omitted topics
Although “Linear Model” is a standard course offered by most statistics departments, it
is not entirely clear what we should teach as the field of statistics is evolving. Although
I made some suggestions to the instructors above, you may still feel that this book has
omitted some important topics related to the linear model.

Advanced econometric models


After the linear model, many econometric textbooks cover the instrumental variable models
and panel data models. For these more specialized topics, Wooldridge (2010) is a canonical
textbook.

Advanced biostatistics models


This book covers the generalized estimating equation in Chapter 25. For analyzing longitu-
dinal data, linear and generalized linear mixed effects models are powerful tools. Fitzmaurice

et al. (2012) is a canonical textbook on applied longitudinal data analysis. This book also
covers the Cox proportional hazards model in Chapter 27. For more advanced methods for
survival analysis, Kalbfleisch and Prentice (2011) is a canonical textbook.

Causal inference
I do not cover causal inference in this book intentionally. To minimize the overlap of the
materials, I wrote another textbook on causal inference (Ding, 2023). However, I did teach a
version of “Linear Model” with a causal inference unit after introducing the basics of linear
model and logistic model. Students seemed to like it because of the connections between
statistical models and causal inference.

Features of the book


The linear model is an old topic in statistics. There are already many excellent textbooks
on the linear model. This book has the following features.

• This book provides an intermediate-level introduction to the linear model. It balances rigorous proofs and heuristic arguments.
• This book provides not only theory but also simulation studies and case studies.
• This book provides the R code to replicate all simulation studies and case studies.

• This book covers the theory of the linear model related to not only social sciences but
also biomedical studies.
• This book provides homework problems with different technical difficulties. The solu-
tions to the problems are available to instructors upon request.

Other textbooks may also have one or two of the above features. This book has the above
features simultaneously. I hope that instructors and readers find these features attractive.

Acknowledgments
Many students at UC Berkeley made critical and constructive comments on early versions of
my lecture notes. As teaching assistants for my “Linear Model” course, Sizhu Lu, Chaoran
Yu, and Jason Wu read early versions of my book carefully and helped me to improve the
book a lot.
Professors Hongyuan Cao and Zhichao Jiang taught related courses based on an early
version of the book. They made very valuable suggestions.
I am also very grateful for the suggestions from Nianqiao Ju.
When I was a student, I took a linear model course based on Weisberg (2005). In my
early years of teaching, I used Christensen (2002) and Agresti (2015) as reference books.
I also sat in Professor Jim Powell’s econometrics courses and got access to his wonderful
lecture notes. They all heavily impacted my understanding and formulation of the linear
model.
If you identify any errors, please feel free to email me.
Part I

Introduction
1
Motivations for Statistical Models

1.1 Data and statistical models


A wide range of problems in statistics and machine learning have the data structure as
below:
Unit   outcome/response   covariates/features/predictors
i      Y                  X1    X2    · · ·   Xp
1      y1                 x11   x12   · · ·   x1p
2      y2                 x21   x22   · · ·   x2p
⋮      ⋮                  ⋮     ⋮             ⋮
n      yn                 xn1   xn2   · · ·   xnp
For each unit i, we observe the outcome/response of interest, yi, as well as p covariates/features/predictors. We often use
\[
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]
to denote the n-dimensional outcome/response vector, and
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
\]
to denote the n × p covariate/feature/predictor matrix, also called the design matrix. In most cases, the first column of X contains constants 1s.
Based on the data (X, Y ) , we can ask the following questions:
(Q1) Describe the relationship between X and Y , i.e., their association or correlation. For
example, how is the parents’ average height related to the children’s average height?
How is one’s height related to one’s weight? How are one’s education and working
experience related to one’s income?
(Q2) Predict Y ∗ based on new data X ∗ . In particular, we want to use the current data
(X, Y ) to train a predictor, and then use it to predict future Y ∗ based on future X ∗ .
This is called supervised learning in the field of machine learning. For example, how
do we predict whether an email is spam or not based on the frequencies of the most
commonly occurring words and punctuation marks in the email? How do we predict
cancer patients’ survival time based on their clinical measures?


(Q3) Estimate the causal effect of some components in X on Y . What if we change some
components of X? How do we measure the impact of the hypothetical intervention of
some components of X on Y ? This is a much harder question because most statistical
tools are designed to infer association, not causation. For example, the U.S. Food and
Drug Administration (FDA) approves drugs based on randomized controlled trials
(RCT) because RCTs are most credible to infer causal effects of drugs on health
outcomes. Economists are interested in evaluating the effect of a job training program
on employment and wages. However, this is a notoriously difficult problem with only
observational data.
The above descriptions are about generic X and Y , which can be many different types.
We often use different statistical models to capture the features of different types of data.
I give a brief overview of models that will appear in later parts of this book.
(T1) X and Y are univariate and continuous. In Francis Galton’s1 classic example, X is the
parents’ average height and Y is the children’s average height (Galton, 1886). Galton
derived the following formula:
\[
y = \bar{y} + \hat{\rho}\, \frac{\hat{\sigma}_y}{\hat{\sigma}_x}\, (x - \bar{x}),
\]
which is equivalent to
\[
\frac{y - \bar{y}}{\hat{\sigma}_y} = \hat{\rho}\, \frac{x - \bar{x}}{\hat{\sigma}_x}, \qquad (1.1)
\]
where
\[
\bar{x} = n^{-1} \sum_{i=1}^n x_i, \qquad \bar{y} = n^{-1} \sum_{i=1}^n y_i
\]
are the sample means,
\[
\hat{\sigma}_x^2 = (n-1)^{-1} \sum_{i=1}^n (x_i - \bar{x})^2, \qquad \hat{\sigma}_y^2 = (n-1)^{-1} \sum_{i=1}^n (y_i - \bar{y})^2
\]
are the sample variances, and \(\hat{\rho} = \hat{\sigma}_{xy}/(\hat{\sigma}_x \hat{\sigma}_y)\) is the sample Pearson correlation coefficient with the sample covariance
\[
\hat{\sigma}_{xy} = (n-1)^{-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).
\]

This is the famous formula of “regression towards mediocrity” or “regression towards the mean”. Galton first introduced the terminology “regression.” He called it regression because the relative deviation of the children’s average height is smaller than that of the parents’ average height if |ρ̂| < 1. We will derive the above Galtonian formula
in Chapter 2. The name “regression” is widely used in statistics now. For instance,
we sometimes use “linear regression” interchangeably with “linear model”; we also
extend the name to “logistic regression” or “Cox regression” which will be discussed
in later chapters of this book.
(T2) Y univariate and continuous, and X multivariate of mixed types. In the R package
ElemStatLearn, the dataset prostate has an outcome of interest as the log of the prostate-
specific antigen lpsa and some potential predictors including the log cancer volume
lcavol, the log prostate weight lweight, age age, etc.

1 Who was Francis Galton? He was Charles Darwin’s half-cousin and was famous for his pioneer work in statistics and for devising a method for classifying fingerprints that proved useful in forensic science. He also invented the term eugenics, a field that causes a lot of controversies nowadays.

(T3) Y binary or indicator of two classes, and X multivariate of mixed types. For example,
in the R package wooldridge, the dataset mroz contains an outcome of interest being the
binary indicator for whether a woman was in the labor force in 1975, and some useful
covariates are

covariate name covariate meaning


kidslt6 number of kids younger than six years old
kidsge6 number of kids between six and eighteen years old
age age
educ years of education
husage husband’s age
huseduc husband’s years of education

(T4) Y categorical without ordering. For example, the choice of housing type, single-family
house, townhouse, or condominium, is a categorical variable.
(T5) Y categorical and ordered. For example, the final course evaluation at UC Berkeley
can take value in {1, 2, 3, 4, 5, 6, 7}. These numbers have clear ordering but they are
not the usual real numbers.
(T6) Y counts. For example, the number of times one went to the gym last week is a
non-negative integer representing counts.

(T7) Y time-to-event outcome. For example, in medical trials, a major outcome of interest
is the survival time; in labor economics, a major outcome of interest is the time to
find the next job. The former is called survival analysis in biostatistics and the latter
is called duration analysis in econometrics.
(T8) Y multivariate and correlated. In medical trials, the data are often longitudinal, mean-
ing that the patient’s outcomes are measured repeatedly over time. So each patient
has a multivariate outcome. In field experiments of public health and development
economics, the randomized interventions are often at the village level but the out-
come data are collected at the household level. So within villages, the outcomes are
correlated.

1.2 Why linear models?


Why do we study linear models if most real problems may have nonlinear structures? There
are important reasons.

(R1) Linear models are simple but non-trivial starting points for learning.
(R2) Linear models can provide insights because we can derive explicit formulas based on
elegant algebra and geometry.
(R3) Linear models can handle nonlinearity by incorporating nonlinear terms, for example,
X can contain the polynomials or nonlinear transformations of the original covariates.
In statistics, “linear” often means linear in parameters, not necessarily in covariates.

(R4) Linear models can be good approximations to nonlinear data-generating processes.



(R5) Linear models are simpler than nonlinear models, but they do not necessarily perform
worse than more complicated nonlinear models. We have finite data so we cannot fit
arbitrarily complicated models.
If you are interested in nonlinear models, you can take another machine learning course.
2
Ordinary Least Squares (OLS) with a Univariate
Covariate

2.1 Univariate ordinary least squares


Figure 2.1 shows the scatterplot of Galton’s dataset which can be found in the R package
HistData as GaltonFamilies. In this dataset, father denotes the height of the father and mother
denotes the height of the mother. The x-axis denotes the mid-parent height, calculated as
(father + 1.08*mother)/2, and the y-axis denotes the height of a child.

FIGURE 2.1: Galton’s dataset (scatterplot of childHeight against midparentHeight, with the fitted line y = 22.64 + 0.64x)

With n data points \((x_i, y_i)_{i=1}^n\), our goal is to find the best linear fit of the data \((x_i, \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i)_{i=1}^n\).

What do we mean by the “best” fit? Gauss proposed to use the following criterion, called


the ordinary least squares (OLS)¹:
\[
(\hat{\alpha}, \hat{\beta}) = \arg\min_{a, b} \; n^{-1} \sum_{i=1}^n (y_i - a - b x_i)^2.
\]

The OLS criterion is based on the squared “misfits” yi − a − bxi . Another intuitive
criterion is based on the absolute values of those misfits, which is called the least absolute
deviation (LAD). However, OLS is simpler because the objective function is smooth in (a, b).
We will discuss LAD in Chapter 26.
How to solve the OLS minimization problem? The objective function is quadratic, and as a and b diverge, it diverges to infinity. So it must have a unique minimizer (α̂, β̂) which satisfies the first-order condition:
\[
\begin{cases}
-\frac{2}{n} \sum_{i=1}^n (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0, \\[4pt]
-\frac{2}{n} \sum_{i=1}^n x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0.
\end{cases}
\]

These two equations are called the Normal Equations of OLS. The first equation implies
\[
\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}, \qquad (2.1)
\]
that is, the OLS line must go through the sample mean of the data (x̄, ȳ). The second equation implies
\[
\overline{xy} = \hat{\alpha}\bar{x} + \hat{\beta}\,\overline{x^2}, \qquad (2.2)
\]
where \(\overline{xy}\) is the sample mean of the \(x_i y_i\)'s, and \(\overline{x^2}\) is the sample mean of the \(x_i^2\)'s. Subtracting (2.1) \(\times\, \bar{x}\) from (2.2), we have
\[
\overline{xy} - \bar{x}\bar{y} = \hat{\beta}\,(\overline{x^2} - \bar{x}^2)
\;\Longrightarrow\; \hat{\sigma}_{xy} = \hat{\beta}\,\hat{\sigma}_x^2
\;\Longrightarrow\; \hat{\beta} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x^2}.
\]

So the OLS coefficient of x equals the sample covariance between x and y divided by the
sample variance of x. From (2.1), we obtain that

α̂ = ȳ − β̂ x̄.

Finally, the fitted line is

\[
\begin{aligned}
y = \hat{\alpha} + \hat{\beta} x &= \bar{y} - \hat{\beta}\bar{x} + \hat{\beta} x \\
\Longrightarrow\; y - \bar{y} &= \hat{\beta}(x - \bar{x}) \\
\Longrightarrow\; y - \bar{y} &= \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x^2}(x - \bar{x}) = \frac{\hat{\rho}_{xy}\hat{\sigma}_x\hat{\sigma}_y}{\hat{\sigma}_x^2}(x - \bar{x}) \\
\Longrightarrow\; \frac{y - \bar{y}}{\hat{\sigma}_y} &= \hat{\rho}_{xy}\, \frac{x - \bar{x}}{\hat{\sigma}_x},
\end{aligned}
\]

which is the Galtonian formula mentioned in Chapter 1.


We can obtain the fitted line based on Galton’s data using the R code below.
1 The idea of OLS is often attributed to Gauss and Legendre. Gauss used it in the process of discovering

Ceres, and his work was published in 1809. Legendre’s work appeared in 1805 but Gauss claimed that he
had been using it since 1794 or 1795. Stigler (1981) reviews the history of OLS.

> library("HistData")
> xx = GaltonFamilies$midparentHeight
> yy = GaltonFamilies$childHeight
>
> center_x = mean(xx)
> center_y = mean(yy)
> sd_x = sd(xx)
> sd_y = sd(yy)
> rho_xy = cor(xx, yy)
>
> beta_fit = rho_xy * sd_y / sd_x
> alpha_fit = center_y - beta_fit * center_x
> alpha_fit
[1] 22.63624
> beta_fit
[1] 0.6373609

This generates Figure 2.1.
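As a quick check (this snippet is not part of the original text), R's built-in lm() function should reproduce the same intercept and slope, since the moment formulas above are exactly what OLS with a single covariate computes:

# A minimal sketch, assuming the HistData package is installed.
library("HistData")
fit = lm(childHeight ~ midparentHeight, data = GaltonFamilies)
coef(fit)   # should match alpha_fit = 22.63624 and beta_fit = 0.6373609 above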

2.2 Final comments


We can write the sample mean as the solution to the OLS with only the intercept:
\[
\bar{y} = \arg\min_{\mu} \; n^{-1} \sum_{i=1}^n (y_i - \mu)^2.
\]
It is rare to fit OLS of yi on xi without the intercept:
\[
\hat{\beta} = \arg\min_{b} \; n^{-1} \sum_{i=1}^n (y_i - b x_i)^2,
\]
which equals
\[
\hat{\beta} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \frac{\langle x, y \rangle}{\langle x, x \rangle},
\]
where x and y are the n-dimensional vectors containing all observations, and \(\langle x, y \rangle = \sum_{i=1}^n x_i y_i\) denotes the inner product. Although not directly useful, this formula will be the building block for many discussions later.
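To make the no-intercept formula concrete, here is a small sketch (not from the original text; the simulated data are arbitrary) comparing the inner-product formula with lm() fitted without an intercept:

# A minimal sketch: OLS without an intercept equals <x, y>/<x, x>.
set.seed(42)
x = rnorm(50)
y = 2 * x + rnorm(50)
sum(x * y) / sum(x * x)    # inner-product formula
coef(lm(y ~ x - 1))        # lm() without an intercept; should agree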

2.3 Homework problems


2.1 Pairwise slopes
Given \((x_i, y_i)_{i=1}^n\) with univariate \(x_i\) and \(y_i\), show that Galton's slope equals
\[
\hat{\beta} = \sum_{(i,j)} w_{ij} b_{ij},
\]
where the summation is over all pairs of observations (i, j),
\[
b_{ij} = (y_i - y_j)/(x_i - x_j)
\]
is the slope determined by two points \((x_i, y_i)\) and \((x_j, y_j)\), and
\[
w_{ij} = (x_i - x_j)^2 \Big/ \sum_{(i',j')} (x_{i'} - x_{j'})^2
\]
is the weight proportional to the squared distance between \(x_i\) and \(x_j\). In the above formulas, we define \(b_{ij} = 0\) if \(x_i = x_j\).
Remark: Wu (1986) and Gelman and Park (2009) used this formula. Problem 3.9 gives
a more general result.
Part II

OLS and Statistical Inference


3
OLS with Multiple Covariates

3.1 The OLS formula


Recall that we have the outcome vector
\[
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]
and covariate matrix
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
= \begin{pmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_n^t \end{pmatrix}
= (X_1, \ldots, X_p),
\]
where \(x_i^t = (x_{i1}, \ldots, x_{ip})\) is the row vector consisting of the covariates of unit i, and \(X_j = (x_{1j}, \ldots, x_{nj})^t\) is the column vector of the j-th covariate for all units.
We want to find the best linear fit of the data \((x_i, \hat{y}_i)_{i=1}^n\) with
\[
\hat{y}_i = x_i^t \hat{\beta} = \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}
\]
in the sense that
\[
\hat{\beta} = \arg\min_{b} \; n^{-1} \sum_{i=1}^n (y_i - x_i^t b)^2
            = \arg\min_{b} \; n^{-1} \lVert Y - Xb \rVert^2,
\]

where β̂ is called the OLS coefficient, the ŷi ’s are called the fitted values, and the yi − ŷi ’s
are called the residuals.
The objective function is quadratic in b and diverges to infinity when b diverges to infinity. So it must have a unique minimizer β̂ satisfying the first-order condition
\[
-\frac{2}{n} \sum_{i=1}^n x_i (y_i - x_i^t \hat{\beta}) = 0,
\]
which simplifies to
\[
\sum_{i=1}^n x_i (y_i - x_i^t \hat{\beta}) = 0
\;\Longleftrightarrow\;
X^t (Y - X\hat{\beta}) = 0. \qquad (3.1)
\]

The above equation (3.1) is called the Normal equation of the OLS, which implies the main
theorem:


Theorem 3.1 The OLS coefficient equals
\[
\hat{\beta} = \left( \sum_{i=1}^n x_i x_i^t \right)^{-1} \left( \sum_{i=1}^n x_i y_i \right)
            = (X^t X)^{-1} X^t Y
\]
if \(X^t X = \sum_{i=1}^n x_i x_i^t\) is non-degenerate.
The equivalence of the two forms of the OLS coefficient follows from
\[
X^t X = (x_1, \ldots, x_n) \begin{pmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_n^t \end{pmatrix}
      = \sum_{i=1}^n x_i x_i^t
\]
and
\[
X^t Y = (x_1, \ldots, x_n) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
      = \sum_{i=1}^n x_i y_i.
\]
For different purposes, both forms can be useful.
The non-degeneracy of XᵗX in Theorem 3.1 requires that for any non-zero vector α ∈ Rᵖ, we must have
\[
\alpha^t X^t X \alpha = \lVert X\alpha \rVert^2 \neq 0,
\]
which is equivalent to
\[
X\alpha \neq 0,
\]
i.e., the columns of X are linearly independent¹. This effectively rules out redundant columns in the design matrix X. If X1 can be represented by the other columns, X1 = c2X2 + · · · + cpXp for some (c2, . . . , cp), then XᵗX is degenerate.
Throughout the book, we invoke the following condition unless stated otherwise.
Condition 3.1 The column vectors of X are linearly independent.
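As a small illustration of Theorem 3.1 (this snippet is not part of the book; the simulated data are arbitrary), the closed-form coefficient (XᵗX)⁻¹XᵗY can be checked against lm():

# A minimal sketch: the closed-form OLS coefficient versus lm().
set.seed(1)
n = 100
x1 = rnorm(n); x2 = rnorm(n)
y = 1 + 2 * x1 - x2 + rnorm(n)
X = cbind(1, x1, x2)              # design matrix with an intercept column
solve(t(X) %*% X, t(X) %*% y)     # (X^t X)^{-1} X^t Y
coef(lm(y ~ x1 + x2))             # should agree up to numerical error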

3.2 The geometry of OLS


The OLS has very clear geometric interpretations. Figure 3.1 illustrates its geometry with
n = 3 and p = 2. For any b = (b1 , . . . , bp )t ∈ Rp and X = (X1 , . . . , Xp ) ∈ Rn×p ,

Xb = b1 X1 + · · · + bp Xp

represents a linear combination of the column vectors of the design matrix X. So the OLS
problem is to find the best linear combination of the column vectors of X to approximate the
response vector Y . Recall that all linear combinations of the column vectors of X constitute
1 This book uses different notions of “independence” which can be confusing sometimes. In linear algebra,

a set of vectors is linearly independent if any nonzero linear combination of them is not zero; see Chapter A.
In probability theory, two random variables are independent if their joint density factorizes into the product
of the marginal distributions; see Chapter B.

FIGURE 3.1: The geometry of OLS

the column space of X, denoted by C(X) 2 . So the OLS problem is to find the vector in C(X)
that is the closest to Y . Geometrically, the vector must be the projection of Y onto C(X).
By projection, the residual vector ε̂ = Y −X β̂ must be orthogonal to C(X), or, equivalently,
the residual vector is orthogonal to X1 , . . . , Xp . This geometric intuition implies that

\[
X_1^t \hat{\varepsilon} = 0, \; \ldots, \; X_p^t \hat{\varepsilon} = 0
\;\Longleftrightarrow\;
X^t \hat{\varepsilon} = \begin{pmatrix} X_1^t \hat{\varepsilon} \\ \vdots \\ X_p^t \hat{\varepsilon} \end{pmatrix} = 0
\;\Longleftrightarrow\;
X^t (Y - X\hat{\beta}) = 0,
\]

which is essentially the Normal equation (3.1). The above argument gives a geometric deriva-
tion of the OLS formula in Theorem 3.1.
In Figure 3.1, since the triangle ABC is a right triangle, the fitted vector Ŷ = X β̂ is orthogonal to the residual vector ε̂, and moreover, the Pythagorean Theorem implies that
\[
\lVert Y \rVert^2 = \lVert X\hat{\beta} \rVert^2 + \lVert \hat{\varepsilon} \rVert^2.
\]

In most applications, X contains a column of intercepts 1n = (1, . . . , 1)ᵗ. In those cases, we have
\[
1_n^t \hat{\varepsilon} = 0 \;\Longrightarrow\; n^{-1} \sum_{i=1}^n \hat{\varepsilon}_i = 0,
\]
so the residuals are automatically centered.

2 Please review Chapter A for some basic linear algebra background.


The following theorem states an algebraic fact that gives an alternative proof of the
OLS formula. It is essentially the Pythagorean Theorem for the right triangle BCD
in Figure 3.1.

Theorem 3.2 For any b ∈ Rᵖ, we have the following decomposition
\[
\lVert Y - Xb \rVert^2 = \lVert Y - X\hat{\beta} \rVert^2 + \lVert X(\hat{\beta} - b) \rVert^2,
\]
which implies that \(\lVert Y - Xb \rVert^2 \geq \lVert Y - X\hat{\beta} \rVert^2\) with equality holding if and only if b = β̂.

Proof of Theorem 3.2: We have the following decomposition:

\[
\begin{aligned}
\lVert Y - Xb \rVert^2 &= (Y - Xb)^t (Y - Xb) \\
&= (Y - X\hat{\beta} + X\hat{\beta} - Xb)^t (Y - X\hat{\beta} + X\hat{\beta} - Xb) \\
&= (Y - X\hat{\beta})^t (Y - X\hat{\beta}) + (X\hat{\beta} - Xb)^t (X\hat{\beta} - Xb) \\
&\quad + (Y - X\hat{\beta})^t (X\hat{\beta} - Xb) + (X\hat{\beta} - Xb)^t (Y - X\hat{\beta}).
\end{aligned}
\]

The first term equals ∥Y − X β̂∥2 and the second term equals ∥X(β̂ − b)∥2 . We need to show
the last two terms are zero. By symmetry of these two terms, we only need to show that
the last term is zero. This is true by the Normal equation (3.1) of the OLS:

(X β̂ − Xb)t (Y − X β̂) = (β̂ − b)t X t (Y − X β̂) = 0.
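A short numerical illustration of Theorem 3.2 in R (simulated data, illustrative names): the decomposition holds exactly for an arbitrary competing coefficient b.

set.seed(2)
n <- 20; p <- 3
X <- matrix(rnorm(n * p), n, p); Y <- rnorm(n)
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)
b <- rnorm(p)                                        # an arbitrary competing coefficient
lhs <- sum((Y - X %*% b)^2)
rhs <- sum((Y - X %*% beta_hat)^2) + sum((X %*% (beta_hat - b))^2)
all.equal(lhs, rhs)                                  # TRUE: the decomposition in Theorem 3.2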

3.3 The projection matrix from OLS


The geometry in Section 3.2 also shows that Ŷ = X β̂ is the solution to the following problem

Ŷ = arg min_{v∈C(X)} ∥Y − v∥2 .

Using Theorem 3.1, we have Ŷ = X β̂ = HY , where

H = X(X t X)−1 X t

is an n × n matrix. It is called the hat matrix because it puts a hat on Y when multiplying
Y . Algebraically, we can show that H is a projection matrix because

H2 = X(X t X)−1 X t X(X t X)−1 X t
   = X(X t X)−1 X t
   = H,

and

H t = {X(X t X)−1 X t }t
    = X(X t X)−1 X t
    = H.

Its rank equals its trace, so

rank(H) = trace(H) = trace{X(X t X)−1 X t }
                   = trace{(X t X)−1 X t X}
                   = trace(Ip ) = p.
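The following R sketch (simulated design, illustrative values of n and p) confirms the three algebraic facts just derived: H is idempotent, symmetric, and has trace equal to p.

set.seed(3)
n <- 12; p <- 4
X <- matrix(rnorm(n * p), n, p)
H <- X %*% solve(t(X) %*% X) %*% t(X)                # the hat matrix
max(abs(H %*% H - H))                                # ~ 0: H is idempotent
max(abs(t(H) - H))                                   # ~ 0: H is symmetric
sum(diag(H))                                         # equals p = 4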

The projection matrix H has the following geometric interpretations.

Proposition 3.1 The projection matrix H = X(X t X)−1 X t satisfies


(G1) Hv = v ⇐⇒ v ∈ C(X);
(G2) Hw = 0 ⇐⇒ w ⊥ C(X).

Recall that C(X) is the column space of X. (G1) states that projecting any vector in
C(X) onto C(X) does not change the vector, and (G2) states that projecting any vector
orthogonal to C(X) onto C(X) results in a zero vector.
Proof of Proposition 3.1: I first prove (G1). If v ∈ C(X), then v = Xb for some b,
which implies that Hv = X(X t X)−1 X t Xb = Xb = v. Conversely, if v = Hv, then v =
X(X t X)−1 X t v = Xu with u = (X t X)−1 X t v, which ensures that v ∈ C(X).
I then prove (G2). If w ⊥ C(X), then w is orthogonal to all column vectors of X. So

Xjt w = 0 (j = 1, . . . , p)
=⇒ X t w = 0
=⇒ Hw = X(X t X)−1 X t w = 0.

Conversely, if Hw = X(X t X)−1 X t w = 0, then wt X(X t X)−1 X t w = 0. Because (X t X)−1


is positive definite, we have X t w = 0 ensuring that w ⊥ C(X). □
Writing H = (hij )1≤i,j≤n and Ŷ = (ŷ1 , . . . , ŷn )t , we have another basic identity

ŷi = Σ_{j=1}^{n} hij yj = hii yi + Σ_{j≠i} hij yj .

It shows that the predicted value ŷi is a linear combination of all the outcomes. Moreover,
if X contains a column of intercepts 1n = (1, . . . , 1)t , then
H1n = 1n =⇒ Σ_{j=1}^{n} hij = 1 (i = 1, . . . , n),

which implies that ŷi is a weighted average of all the outcomes. Although the sum of the
weights is one, some of them can be negative.
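A hedged R check of this claim with simulated data: when X contains an intercept column, every row of H sums to one, yet some entries of H are typically negative.

set.seed(4)
n <- 15
X <- cbind(1, rnorm(n), rnorm(n))                    # includes the intercept column 1_n
H <- X %*% solve(t(X) %*% X) %*% t(X)
range(rowSums(H))                                    # both endpoints are 1, up to rounding
min(H)                                               # typically negative: some weights are below 0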
In general, the hat matrix has complex forms, but when the covariates are dummy
variables, it has more explicit forms. I give two examples below.

Example 3.1 In a treatment-control experiment with m treated and n control units, the
matrix X contains a column of ones and a dummy variable for the treatment:

      [ 1m   1m ]
X  =  [ 1n   0n ] .

We can show that

H = diag{m−1 1m 1tm , n−1 1n 1tn }.
18 Linear Model and Extensions

Example 3.2 In an experiment with nj units receiving treatment level j (j = 1, . . . , J),


the covariate matrix X contains J dummy variables for the treatment levels:
X = diag{1n1 , . . . , 1nJ }.
We can show that
H = diag{n1−1 1n1 1tn1 , . . . , nJ−1 1nJ 1tnJ }.
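The block-diagonal forms in Examples 3.1 and 3.2 can be verified numerically; the R sketch below uses illustrative group sizes (m = 3 treated units and 4 controls; J = 3 levels) that are not from the text.

m <- 3; n0 <- 4                                      # Example 3.1: m treated, n0 controls
X1 <- cbind(1, c(rep(1, m), rep(0, n0)))
H1 <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
H1_claim <- rbind(cbind(matrix(1/m, m, m), matrix(0, m, n0)),
                  cbind(matrix(0, n0, m),  matrix(1/n0, n0, n0)))
max(abs(H1 - H1_claim))                              # ~ 0

nj <- c(2, 3, 4)                                     # Example 3.2: J = 3 treatment levels
X2 <- model.matrix(~ 0 + factor(rep(1:3, nj)))       # J dummy columns
H2 <- X2 %*% solve(t(X2) %*% X2) %*% t(X2)
H2_claim <- matrix(0, sum(nj), sum(nj))
for (g in split(seq_len(sum(nj)), rep(1:3, nj))) H2_claim[g, g] <- 1 / length(g)
max(abs(H2 - H2_claim))                              # ~ 0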

3.4 Homework problems


3.1 Univariate and multivariate OLS
Derive the univariate OLS based on the multivariate OLS formula with

      [ 1   x1 ]
X  =  [ ⋮    ⋮  ]
      [ 1   xn ]

where the xi ’s are scalars.

3.2 OLS via vector and matrix calculus


Using vector and matrix calculus, show that the OLS estimator minimizes (Y − Xb)t (Y −
Xb).

3.3 OLS based on pseudo inverse


Show that β̂ = X+ Y , where X+ is the pseudo inverse of X.
Remark: Recall the definition of the pseudo inverse in Chapter A.

3.4 Invariance of OLS


Assume that X t X is non-degenerate and Γ is a p×p non-degenerate matrix. Define X̃ = XΓ.
From the OLS fit of Y on X, we obtain the coefficient β̂, the fitted value Ŷ , and the residual
ε̂; from the OLS fit of Y on X̃, we obtain the coefficient β̃, the fitted value Ỹ , and the residual
ε̃.
Prove that
β̂ = Γβ̃, Ŷ = Ỹ , ε̂ = ε̃.
Remark: From a linear algebra perspective, X and XΓ have the same column space if
Γ is a non-degenerate matrix:
{Xb : b ∈ Rp } = {XΓc : c ∈ Rp }.
Consequently, there must be a unique projection of Y onto the common column space.

3.5 Invariance of the hat matrix


Show that H does not change if we change X to XΓ where Γ ∈ Rp×p is a non-degenerate
matrix.

3.6 Special hat matrices


Verify the formulas of the hat matrices in Examples 3.1 and 3.2.

3.7 OLS with multiple responses


For each unit i = 1, . . . , n, we have multiple responses yi = (yi1 , . . . , yiq )t ∈ Rq and multiple
covariates xi = (xi1 , . . . , xip )t ∈ Rp . Define

      [ y11 · · · y1q ]   [ y1t ]
Y  =  [  ⋮         ⋮  ] = [  ⋮  ] = (Y1 , . . . , Yq ) ∈ Rn×q
      [ yn1 · · · ynq ]   [ ynt ]

and

      [ x11 · · · x1p ]   [ x1t ]
X  =  [  ⋮         ⋮  ] = [  ⋮  ] = (X1 , . . . , Xp ) ∈ Rn×p
      [ xn1 · · · xnp ]   [ xnt ]
as the response and covariate matrices, respectively. Define the multiple OLS coefficient
matrix as
B̂ = arg min_{B∈Rp×q} Σ_{i=1}^{n} ∥yi − B t xi ∥2

Show that B̂ = (B̂1 , . . . , B̂q ) has column vectors

B̂1 = (X t X)−1 X t Y1 ,
⋮
B̂q = (X t X)−1 X t Yq .

Remark: This result tells us that the OLS fit with a vector outcome reduces to multiple
separate OLS fits, or, the OLS fit of a matrix Y on a matrix X reduces to the column-wise
OLS fits of Y on X.
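This reduction is easy to see numerically; in R, lm() accepts a matrix response, so the joint fit and the column-wise fits below give the same coefficients. The simulated data and variable names are illustrative.

set.seed(5)
n <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
Y <- cbind(y1 = 1 + x1 + rnorm(n), y2 = 2 - x2 + rnorm(n))
fit_joint <- lm(Y ~ x1 + x2)                         # one fit with a two-column response
fit_sep <- cbind(coef(lm(Y[, 1] ~ x1 + x2)), coef(lm(Y[, 2] ~ x1 + x2)))
max(abs(coef(fit_joint) - fit_sep))                  # ~ 0: column-wise OLS fits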

3.8 Full sample and subsample OLS coefficients


Partition the full sample into K subsamples:

      [ X(1) ]        [ Y(1) ]
X  =  [  ⋮   ] , Y =  [  ⋮   ] ,
      [ X(K) ]        [ Y(K) ]

where the kth sample consists of (X(k) , Y(k) ), with X(k) ∈ Rnk ×p and Y(k) ∈ Rnk being the
covariate matrix and outcome vector. Note that n = Σ_{k=1}^{K} nk . Let β̂ be the OLS coefficient
based on the full sample, and β̂(k) be the OLS coefficient based on the kth sample. Show
that

β̂ = Σ_{k=1}^{K} W(k) β̂(k) ,

where the weight matrix equals

W(k) = (X t X)−1 X(k)t X(k) .
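A numerical sketch of this decomposition in R, with K = 3 equally sized subsamples chosen purely for illustration (not part of the text).

set.seed(6)
n <- 30; p <- 3; K <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1)); Y <- rnorm(n)
groups <- split(seq_len(n), rep(1:K, each = n / K))
XtXinv <- solve(t(X) %*% X)
beta_full <- XtXinv %*% t(X) %*% Y
beta_sum <- Reduce(`+`, lapply(groups, function(g) {
  Xg <- X[g, , drop = FALSE]
  Wg <- XtXinv %*% t(Xg) %*% Xg                      # the weight matrix W_(k)
  Wg %*% solve(t(Xg) %*% Xg, t(Xg) %*% Y[g])         # W_(k) times the subsample coefficient
}))
max(abs(beta_full - beta_sum))                       # ~ 0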

3.9 Jacobi’s theorem


The set {1, . . . , n} has (n choose p) size-p subsets. Each subset S defines a linear equation for b ∈ Rp :

YS = XS b

where YS ∈ Rp is the subvector of Y and XS ∈ Rp×p is the submatrix of X, corresponding


to the units in S. Define the subset coefficient

β̂S = XS−1 YS

if XS is invertible and β̂S = 0 otherwise. Show that the OLS coefficient equals a weighted
average of these subset coefficients:

β̂ = Σ_{S} wS β̂S

where the summation is over all subsets and

wS = | det(XS )|2 / Σ_{S′} | det(XS′ )|2 .

Remark: To prove this result, we can use Cramer’s rule to express the OLS coefficient
and use the Cauchy–Binet formula to expand the determinant of X t X. This result extends
Problem 2.1. Berman (1988) attributed it to Jacobi. Wu (1986) used it in analyzing the
statistical properties of OLS.
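This result can also be checked numerically for a small example; the R sketch below enumerates all size-p subsets with combn() and uses illustrative values of n and p. With continuous covariates every XS is invertible almost surely, so the zero-coefficient convention for singular subsets is not needed here.

set.seed(7)
n <- 6; p <- 2
X <- cbind(1, rnorm(n)); Y <- rnorm(n)
subsets <- combn(n, p)                               # all size-p subsets of {1, ..., n}
coefs <- apply(subsets, 2, function(S) solve(X[S, , drop = FALSE], Y[S]))
wts <- apply(subsets, 2, function(S) det(X[S, , drop = FALSE])^2)
wts <- wts / sum(wts)
coefs %*% wts                                        # weighted average of subset coefficients
solve(t(X) %*% X, t(X) %*% Y)                        # equals the OLS coefficient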
4 The Gauss–Markov Model and Theorem

4.1 Gauss–Markov model


Without any stochastic assumptions, the OLS in Chapter 3 is purely algebraic. From now
on, we want to discuss the statistical properties of β̂ and associated quantities, so we need
to invoke some statistical modeling assumptions. A simple starting point is the following
Gauss–Markov model with a fixed design matrix X and unknown parameters (β, σ 2 ).
Assumption 4.1 (Gauss–Markov model) We have
Y = Xβ + ε
where the design matrix X is fixed with linearly independent column vectors, and the random
error term ε has the first two moments
E(ε) = 0,
cov(ε) = σ 2 In .
The unknown parameters are (β, σ 2 ).
The Gauss–Markov model assumes that Y has mean Xβ and covariance matrix σ 2 In .
At the individual level, we can also write it as
yi = xti β + εi , (i = 1, . . . , n)
where the error terms are uncorrelated with mean 0 and variance σ 2 .
The assumption that X is fixed is not essential, because we can condition on X even
if we think X is random. The mean of each yi is linear in xi with the same β coefficient,
which is a rather strong assumption. So is the homoskedasticity1 assumption that the error
terms have the same variance σ 2 . These critiques of the assumptions aside, I will derive the
properties of β̂ under the Gauss–Markov model.

4.2 Properties of the OLS estimator


I first derive the mean and covariance of β̂ = (X t X)−1 X t Y .
Theorem 4.1 Under Assumption 4.1, we have
E(β̂) = β,
cov(β̂) = σ 2 (X t X)−1 .
1 In this book, I do not spell it as homoscedasticity since “k” better indicates the meaning of variance.

McCulloch (1985) gave a convincing argument. See also Paloyo (2014).


Proof of Theorem 4.1: Because E(Y ) = Xβ, we have

E(β̂) = E{(X t X)−1 X t Y }
      = (X t X)−1 X t E(Y )
      = (X t X)−1 X t Xβ
      = β.

Because cov(Y ) = σ 2 In , we have

cov(β̂) = cov{(X t X)−1 X t Y }
        = (X t X)−1 X t cov(Y )X(X t X)−1
        = σ 2 (X t X)−1 X t X(X t X)−1
        = σ 2 (X t X)−1 . □
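A small Monte Carlo sketch of Theorem 4.1 in R; the design X is held fixed across replications, and the values of beta and sigma below are assumed for illustration only.

set.seed(8)
n <- 50; p <- 3; sigma <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # fixed design
beta <- c(1, -1, 0.5)
XtXinv <- solve(t(X) %*% X)
sims <- t(replicate(10000, {
  Y <- X %*% beta + rnorm(n, sd = sigma)
  as.vector(XtXinv %*% t(X) %*% Y)
}))
round(colMeans(sims) - beta, 3)                      # near 0: E(beta_hat) = beta
max(abs(cov(sims) - sigma^2 * XtXinv))               # close to 0, up to Monte Carlo error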


We can decompose the response vector as

Y = Ŷ + ε̂,

where the fitted vector is Ŷ = X β̂ = HY and the residual vector is ε̂ = Y − Ŷ = (In − H)Y.
The two matrices H and In − H are the keys, which have the following properties.

Lemma 4.1 Both H and In − H are projection matrices. In particular,

HX = X, (In − H)X = 0,

and they are orthogonal:


H(In − H) = (In − H)H = 0.

These follow from simple linear algebra, and I leave the proof as Problem 4.1. It states
that H and In − H are projection matrices onto the column space of X and its complement.
Algebraically, Ŷ and ε̂ are orthogonal by the OLS projection because Lemma 4.1 implies

Ŷ t ε̂ = Y t H t (In − H)Y
= Y t H(In − H)Y
= 0.

This is also coherent with the geometry in Figure 3.1.


Moreover, we can derive the mean and covariance matrix of Ŷ and ε̂.
Theorem 4.2 Under Assumption 4.1, we have

  ( Ŷ )   ( Xβ )
E (    ) = (    )
  ( ε̂ )   (  0 )

and

    ( Ŷ )        ( H      0     )
cov (    ) = σ 2 (               ) .
    ( ε̂ )        ( 0    In − H  )

So Ŷ and ε̂ are uncorrelated.



Please do not confuse the two statements above. First, Ŷ and ε̂ are orthogonal.
Second, Ŷ and ε̂ are uncorrelated. They have different meanings. The first statement is an
algebraic fact of the OLS procedure. It is about a relationship between two vectors Ŷ
and ε̂ which holds without assuming the Gauss–Markov model. The second statement is
stochastic. It is about a relationship between two random vectors Ŷ and ε̂ which requires
the Gauss–Markov model assumption.
Proof of Theorem 4.2: The conclusion follows from the simple fact that

( Ŷ )   (    HY     )   (    H   )
(    ) = (           ) = (        ) Y
( ε̂ )   ( (In − H)Y )   ( In − H )

is a linear transformation of Y . It has mean

  ( Ŷ )   (    H   )          (    H   )       (    HXβ     )   ( Xβ )
E (    ) = (        ) E(Y ) =  (        ) Xβ =  (            ) = (    ) ,
  ( ε̂ )   ( In − H )          ( In − H )       ( (In − H)Xβ )   (  0 )

and covariance matrix

    ( Ŷ )   (    H   )
cov (    ) = (        ) cov(Y ) ( H t  (In − H)t )
    ( ε̂ )   ( In − H )

             (    H   )
     = σ 2   (        ) ( H  In − H )
             ( In − H )

             (    H2         H(In − H) )
     = σ 2   (                          )
             ( (In − H)H    (In − H)2  )

             ( H      0     )
     = σ 2   (               ) ,
             ( 0   In − H   )
where the last step follows from Lemma 4.1. □
Assume the Gauss–Markov model. Although the original responses and error terms are
uncorrelated between units with cov(εi , εj ) = 0 for i ̸= j, the fitted values and the residuals
are correlated with
cov(ŷi , ŷj ) = σ 2 hij , cov(ε̂i , ε̂j ) = −σ 2 hij
for i ̸= j based on Theorem 4.2.

4.3 Variance estimation


Theorem 4.1 quantifies the uncertainty of β̂ by its covariance matrix. However, it is not
directly useful because σ 2 is still unknown. Our next task is to estimate σ 2 based on the
observed data. It is the variance of each εi , but the εi ’s are not observable either. Their
empirical analogues are the residuals ε̂i = yi − xti β̂. It seems intuitive to estimate σ 2 by
σ̃ 2 = rss/n

where

rss = Σ_{i=1}^{n} ε̂i2

is the residual sum of squares. However, Theorem 4.2 shows that ε̂i has mean zero and
variance σ 2 (1 − hii ), which is not the same as the variance of the original εi . Consequently, rss
has mean

E(rss) = Σ_{i=1}^{n} σ 2 (1 − hii )
       = σ 2 {n − trace(H)}
       = σ 2 (n − p),

which implies the following theorem.

Theorem 4.3 Define

σ̂ 2 = rss/(n − p) = Σ_{i=1}^{n} ε̂i2 /(n − p).

Then E(σ̂ 2 ) = σ 2 under Assumption 4.1.

Theorem 4.3 implies that σ̃ 2 is a biased estimator for σ 2 because E(σ̃ 2 ) = σ 2 (n − p)/n.
It underestimates σ 2 but with a large sample size n, the bias is small.
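A hedged R check of Theorem 4.3: dividing rss by n − p reproduces the residual variance reported by summary.lm(), while dividing by n gives the downward-biased σ̃2. The simulated data and coefficient values are illustrative.

set.seed(9)
n <- 40; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
Y <- drop(X %*% c(1, 2, -1) + rnorm(n))
fit <- lm(Y ~ X - 1)                                 # X already carries the intercept column
rss <- sum(resid(fit)^2)
c(unbiased = rss / (n - p), lm_sigma2 = summary(fit)$sigma^2)  # the two agree
rss / n                                              # sigma-tilde^2: slightly smaller, biased down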

4.4 Gauss–Markov Theorem


So far, we have focused on the OLS estimator. It is intuitive, but we have not answered
the fundamental question yet. Why should we focus on it? Are there any other better
estimators? Under the Gauss–Markov model, the answer is definite: we focus on the OLS
estimator because it is optimal in the sense of having the smallest covariance matrix among
all linear unbiased estimators. The following famous Gauss–Markov theorem quantifies this
claim, which was named after Carl Friedrich Gauss and Andrey Markov2 . It is for this reason
that I call the corresponding model the Gauss–Markov model. The textbook by Monahan
(2008) also uses this name.

Theorem 4.4 Under Assumption 4.1, the OLS estimator β̂ for β is the best linear unbiased
estimator (BLUE) in the sense that3

cov(β̃) ⪰ cov(β̂)

for any estimator β̃ satisfying


(C1) β̃ = AY for some A ∈ Rp×n not depending on Y ;

(C2) E(β̃) = β for any β.


2 David and Neyman (1938) used the name Markoff theorem. Lehmann (1951) appeared to first use the

name Gauss–Markov theorem.


3 We write M1 ⪰ M2 if M1 − M2 is positive semi-definite. See Chapter A for a review.

Before proving Theorem 4.4, we need to understand its meaning and immediate impli-
cations. We do not compare the OLS estimator with any arbitrary estimators. In fact, we
restrict to the estimators that are linear and unbiased. Condition (C1) requires that β̃ is
a linear estimator. More precisely, it is a linear transformation of the response vector Y ,
where A can be any complex and possibly nonlinear function of X. Condition (C2) requires
that β̃ is an unbiased estimator for β, no matter what true value β takes.
Why do we restrict the estimator to be linear? The class of linear estimator is actually
quite large because A can be any nonlinear function of X, and the only requirement is that
the estimator is linear in Y . The unbiasedness is a natural requirement for many problems.
However, in many modern applications with many covariates, some biased estimators can
perform better than unbiased estimators if they have smaller variances. We will discuss
these estimators in Part V of this book.
We compare the estimators based on their covariances, which are natural extensions of
variances for scalar random variables. The conclusion cov(β̃) ⪰ cov(β̂) implies that for any
vector c ∈ Rp , we have
ct cov(β̃)c ≥ ct cov(β̂)c,
which is equivalent to
var(ct β̃) ≥ var(ct β̂).
So any linear transformation of the OLS estimator has a variance smaller than or equal to
the variance of the same linear transformation of any other linear unbiased estimator. In particular, if c = (0, . . . , 1, . . . , 0)t
with only the jth coordinate being 1, then the above inequality implies that
var(β̃j ) ≥ var(β̂j ), (j = 1, . . . , p).
So each coordinate of the OLS estimator has a variance no larger than the corresponding coordinate of any other linear unbiased estimator.
Now we prove the theorem.
Proof of Theorem 4.4: We must verify that the OLS estimator itself satisfies (C1) and
(C2). We have β̂ = ÂY with Â = (X t X)−1 X t , and it is unbiased by Theorem 4.1.
First, the unbiasedness requirement implies that
E(β̃) = β =⇒ E(AY ) = AE(Y ) = AXβ = β
=⇒ AXβ = β
for any value of β. So
AX = Ip (4.1)
must hold. In particular, the OLS estimator satisfies ÂX = (X t X)−1 X t X = Ip .
Second, we can decompose the covariance of β̃ as
cov(β̃) = cov(β̂ + β̃ − β̂)
= cov(β̂) + cov(β̃ − β̂) + cov(β̂, β̃ − β̂) + cov(β̃ − β̂, β̂).
The last two terms are in fact zero. By symmetry, we only need to show that the third term
is zero:
cov(β̂, β̃ − β̂) = cov{ÂY, (A − Â)Y }
               = Âcov(Y )(A − Â)t
               = σ 2 Â(A − Â)t
               = σ 2 (ÂAt − ÂÂt )
               = σ 2 {(X t X)−1 X t At − (X t X)−1 X t X(X t X)−1 }
               = σ 2 {(X t X)−1 Ip − (X t X)−1 }       (by (4.1))
               = 0.

The above covariance decomposition simplifies to


cov(β̃) = cov(β̂) + cov(β̃ − β̂),
which implies
cov(β̃) − cov(β̂) = cov(β̃ − β̂) ⪰ 0. □

In the process of the proof, we have shown two stronger results
cov(β̃ − β̂, β̂) = 0
and
cov(β̃ − β̂) = cov(β̃) − cov(β̂).
They hold only when β̂ is BLUE. They do not hold when comparing two general estimators.
Theorem 4.4 is elegant but abstract. It says that in some sense, we can just focus on
the OLS estimator because it is the best one in terms of the covariance among all linear
unbiased estimators. Then we do not need to consider other estimators. However, we have
not mentioned any other estimators for β yet, which makes Theorem 4.4 not concrete
enough. From the proof above, a linear unbiased estimator β̃ = AY only needs to satisfy
AX = Ip , which imposes p2 constraints on the p × n matrix A. Therefore, we have p(n − p)
free parameters to choose from and have infinitely many linear unbiased estimators in
general. A class of linear unbiased estimators, discussed more thoroughly in Chapter 19, is
the class of weighted least squares estimators

β̃ = (X t Σ−1 X)−1 X t Σ−1 Y,

where Σ is a positive definite matrix not depending on Y such that Σ and X t Σ−1 X are
invertible. It is linear, and we can show that it is unbiased for β:

E(β̃) = E{(X t Σ−1 X)−1 X t Σ−1 Y }
      = (X t Σ−1 X)−1 X t Σ−1 Xβ
      = β.
Different choices of Σ give different β̃, but Theorem 4.4 states that the OLS estimator with
Σ = In has the smallest covariance matrix under the Gauss–Markov model.
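A Monte Carlo sketch in R comparing OLS with one member of this weighted least squares class under homoskedastic errors; the particular Σ below is an arbitrary illustrative choice, and the simulation only illustrates (it does not prove) Theorem 4.4.

set.seed(10)
n <- 50
X <- cbind(1, rnorm(n)); beta <- c(1, 2)
Sigma_inv <- solve(diag(1 + (1:n) / n))              # inverse of an arbitrary p.d. Sigma
sims <- t(replicate(10000, {
  Y <- X %*% beta + rnorm(n)
  c(ols = solve(t(X) %*% X, t(X) %*% Y)[2],
    wls = solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% Y)[2])
}))
apply(sims, 2, var)                                  # the OLS slope has the (weakly) smaller variance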
I will give an extension and some applications of the Gauss–Markov Theorem as home-
work problems.

4.5 Homework problems


4.1 Projection matrices
Prove Lemma 4.1.

4.2 Univariate OLS and the optimal design


Assume the Gauss–Markov model yi = α + βxi + εi (i = 1, . . . , n) with a scalar xi . Show
that the variance of the OLS coefficient for xi equals
var(β̂) = σ 2 / Σ_{i=1}^{n} (xi − x̄)2 .

Assume xi must be in the interval [0, 1]. We want to choose their values to minimize
var(β̂). Assume that n is an even number. Find the minimizers xi ’s.
Hint: You may find the following probability result useful. For a random variable ξ in
the interval [0, 1], we have the following inequality
var(ξ) = E(ξ 2 ) − {E(ξ)}2
≤ E(ξ) − {E(ξ)}2
= E(ξ){1 − E(ξ)}
≤ 1/4.
The first inequality becomes an equality if and only if ξ = 0 or 1; the second inequality
becomes an equality if and only if E(ξ) = 1/2.

4.3 BLUE estimator for the mean


Assume that yi has mean µ and variance σ 2 , and yi (i = 1, . . . , n) are uncorrelated. A
linear estimator of the mean µ has the form µ̂ = Σ_{i=1}^{n} ai yi , which is unbiased as long as
Σ_{i=1}^{n} ai = 1. So there are infinitely many linear unbiased estimators for µ.
Find the BLUE for µ and prove why it is BLUE.

4.4 Consequence of useless regressors


Partition the covariate matrix and parameter into

X = (X1 , X2 ),   β = (β1t , β2t )t ,

where X1 ∈ Rn×k , X2 ∈ Rn×l , β1 ∈ Rk and β2 ∈ Rl with k + l = p. Assume the Gauss–


Markov model with β2 = 0. Let β̂1 be the first k coordinates of β̂ = (X t X)−1 X t Y and
β̃1 = (X1t X1 )−1 X1t Y be the coefficient based on the partial OLS fit of Y on X1 only. Show
that
cov(β̂1 ) ⪰ cov(β̃1 ).

4.5 Simple average of subsample OLS coefficients


Inherit the setting of Problem 3.8. Define the simple average of the subsample OLS coeffi-
cients as β̄ = K −1 Σ_{k=1}^{K} β̂(k) . Assume the Gauss–Markov model. Show that

cov(β̄) ⪰ cov(β̂).

4.6 Gauss–Markov theorem for prediction


Under Assumption 4.1, the OLS predictor Ŷ = X β̂ for the mean Xβ is the best linear
unbiased predictor in the sense that cov(Ỹ ) ⪰ cov(Ŷ ) for any predictor Ỹ satisfying
(C1) Ỹ = H̃Y for some H̃ ∈ Rn×n not depending on Y ;
(C2) E(Ỹ ) = Xβ for any β.
Prove this theorem.

4.7 Nonlinear unbiased estimator under the Gauss–Markov model


Under Assumption 4.1, prove that if
X t Qj X = 0, trace(Qj ) = 0, (j = 1, . . . , p)

then

β̃ = β̂ + (Y t Q1 Y, . . . , Y t Qp Y )t

is unbiased for β.
Remark: The above estimator β̃ is a quadratic function of Y . It is a nonlinear unbiased
estimator for β. It is not difficult to show the unbiasedness. More remarkably, Koopmann
(1982, Theorem 4.3) showed that under Assumption 4.1, any unbiased estimator for β must
have the form of β̃.