Lecture BDS 3 23 24 Print
14 February 2024
Constructing a test
Let us now look at how to test the hypothesis in the Remark (testing significance), part
(ii).
■ The first thing to note is that the regression model in H0 is nested in the model in
H1 .
■ Nested means here: Any regression function belonging to H0 also belongs to H1 if
we choose β1 , . . . , βd appropriately.
■ For H0 and H1 in the Remark (testing significance), part (ii) this property holds
because a given function β1 xi1 + . . . + βr xir belonging to H0 can be written as
β1 xi1 + . . . + βr xir + 0 · xi,r+1 + . . . + 0 · xid , which belongs to H1 .
8
Constructing a test (cont’d)
A simple yet powerful idea for constructing a test statistic is as follows:
■ Under H0 the unknown vector is $\beta^r = (\beta_1, \ldots, \beta_r)$.
■ Estimating the unknown vector $\beta^r = (\beta_1, \ldots, \beta_r)$ by least squares we have
$$\hat\beta^r_n = \big((X_n^r)^T X_n^r\big)^{-1}(X_n^r)^T Y_n,$$
where $X_n^r$ denotes the design matrix containing the first r covariates.
■ With this estimator we can calculate the sum of squared errors (or sum of squared
residuals) SSEH0 under H0 which is given by
$$SSE_{H_0} = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{r}\hat\beta^r_j x_{ij}\Big)^2 .$$
9
Constructing a test (cont’d)
■ Under the alternative the unknown vector is β = (β1 , . . . , βr , βr+1 , . . . , βd ).
■ This is just our good old friend from Lecture 1, which can be estimated by
$$\hat\beta_n = (X_n^T X_n)^{-1} X_n^T Y_n .$$
■ With this estimator we can calculate the sum of squared errors SSEH1 under H1
which is given by
$$SSE_{H_1} = \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{d}\hat\beta_j x_{ij}\Big)^2 .$$
■ Intuitively, if including the covariates Xr+1 , . . . , Xd adds little value, we
should have
SSEH0 ≈ SSEH1 .
10
Constructing a test (cont’d)
■ The test statistic indeed builds on the difference between SSEH0 and SSEH1 ,
which is
◆ large if the alternative is true;
◆ small if the null hypothesis is true.
■ Our test statistic is defined by
$$F = \frac{n-d}{d-r}\cdot\frac{SSE_{H_0} - SSE_{H_1}}{SSE_{H_1}} .$$
11
Constructing a test (cont’d)
■ For the test statistic to be useful we need its distribution under H0 .
■ Thinking back to Lecture 1 we conjecture that this distribution will depend on
whether we have
◆ the classical linear model (either with fixed or random design); or
◆ the linear model (either with fixed or random design).
■ This conjecture is indeed true and we have that
◆ under H0 our test statistic F has exactly an F -distribution with (d − r) and
n − d degrees of freedom for the classical linear model (either with fixed or
random design);
◆ under H0 our test statistic F has approximately an F -distribution with
(d − r) and n − d degrees of freedom for the linear model (either with fixed
or random design).
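To make the construction concrete, here is a minimal numerical sketch of the F-test for nested linear models (my own illustration, not part of the slides); the simulated data, the sizes n = 100, r = 2, d = 4, and the use of numpy/scipy are assumptions made purely for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, r, d = 100, 2, 4                      # illustrative sizes: the r-covariate model is nested in the d-covariate one

X = rng.normal(size=(n, d))              # full design matrix X_n (d covariates)
beta_true = np.array([1.0, -0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)   # H0 is true here: the last d - r coefficients are zero

def sse(X_sub, y):
    """Least-squares fit and sum of squared errors for a given design matrix."""
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    resid = y - X_sub @ beta_hat
    return resid @ resid

sse_h0 = sse(X[:, :r], y)                # model under H0 (first r covariates)
sse_h1 = sse(X, y)                       # model under H1 (all d covariates)

F = (n - d) / (d - r) * (sse_h0 - sse_h1) / sse_h1
p_value = stats.f.sf(F, d - r, n - d)    # upper tail of the F(d - r, n - d) distribution
print(F, p_value)
```

A large value of F, i.e. a small p-value, speaks against H0.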
12
Motivating test statistic
■ Our above test for testing the hypothesis on slide 6 for linear models builds on
◆ The squared residuals (or squared errors) under H0 ; and
◆ The squared residuals (or squared errors) under H1 .
■ If we want to use a similar test construction for testing H0 on slide 14 for GLMs
we need to define residuals for GLMs.
■ Let us recall the comparison made by residuals for linear models: they compare
yi to its expected value $\sum_{j=1}^{d}\beta_j x_{ij}$ !
■ Because GLMs also explain the expected value of the responses it is meaningful to
define residuals (or errors) for GLMs analogously to linear models. For GLMs one
defines residuals and standardized residuals by
$$y_i - h\Big(\sum_{j=1}^{d}\hat\beta_j x_{ij}\Big) \qquad\text{and}\qquad \frac{y_i - h\big(\sum_{j=1}^{d}\hat\beta_j x_{ij}\big)}{\hat\sigma_i},$$
respectively, where $\hat\sigma_i^2$ is the estimated variance; see Lecture 2, slide 38 and
Exercise sheet 2.
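As an illustration of these definitions (my own sketch, not from the slides), the following computes raw and standardized residuals for a hypothetical Poisson GLM with log link, where the estimated variance of Yi equals the fitted mean:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
beta_hat = np.array([0.3, -0.2, 0.1])   # assume these estimates were already obtained

mu_hat = np.exp(X @ beta_hat)           # fitted means h(sum_j beta_j x_ij), here h = exp
y = rng.poisson(mu_hat)                 # toy responses

resid = y - mu_hat                      # residual: y_i minus its estimated expected value
resid_std = resid / np.sqrt(mu_hat)     # standardized residual: divide by sigma_hat_i (Poisson: variance = mean)
print(resid_std[:5])
```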
15
The test statistic
Let’s make it more concrete what we do.
■ Suppose we have n independent Yi each with pmf or pdf according to the first
element of GLMs, i.e. of the form
$$f^{Y_i}_{\theta_i}(y_i) = \exp\Big(\frac{y_i\theta_i - b(\theta_i)}{\psi} - c(\psi, y_i)\Big), \quad y_i \in D. \qquad (3)$$
■ We have two different models for their expected values (r < d as before)
$$\text{Model 0: } E[Y_i] = h\Big(\sum_{j=1}^{r}\beta_j x_{ij}\Big) \qquad\text{and}\qquad \text{Model 1: } E[Y_i] = h\Big(\sum_{j=1}^{d}\beta_j x_{ij}\Big).$$
■ From slide 36 of Lecture 2 we know that the θi s in (3) are determined by the
choice of h and the linear regression function. Therefore, two models for E[Yi ]
imply two models for the θi s
$$\text{Model 0: } \theta_i = (b')^{-1}\Big(h\Big(\sum_{j=1}^{r}\beta_j x_{ij}\Big)\Big) \quad\text{and}\quad \text{Model 1: } \theta_i = (b')^{-1}\Big(h\Big(\sum_{j=1}^{d}\beta_j x_{ij}\Big)\Big). \qquad (4)$$
19
The test statistic (cont’d)
■ What we want to know is whether we can work with the more parsimonious
Model 0, i.e. whether we do not need the regressors r + 1, . . . , d, or whether they
really contribute to explaining the response.
■ More formally we want to test
$$H_0: \beta_{r+1} = \ldots = \beta_d = 0 \quad\text{versus}\quad H_1: \beta_j \neq 0 \text{ for at least one } j \in \{r+1, \ldots, d\}. \qquad (5)$$
■ For a concrete example think of the CPS data and the response "how often
employee i is laid off" (see slide 14).
■ We proceed as follows: We use the likelihood (see slide 36 of Lecture 2) to get
MLEs β̆1 , . . . , β̆r for β1 , . . . , βr , i.e. we fit Model 0 to the data.
■ This gives us estimated values θ̆i under Model 0 for the θi s by plugging them into
the relation (4), i.e.
$$\breve\theta_i = (b')^{-1}\Big(h\Big(\sum_{j=1}^{r}\breve\beta_j x_{ij}\Big)\Big).$$
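As a quick illustration (my own example, not on the slides): for a Poisson GLM with log link we have $b(\theta) = e^\theta$, so $b'(\theta) = e^\theta$ and $(b')^{-1} = \log$, and $h = \exp$, hence the estimated natural parameters are simply the fitted linear predictors:
$$\breve\theta_i = (b')^{-1}\Big(h\Big(\sum_{j=1}^{r}\breve\beta_j x_{ij}\Big)\Big) = \log\Big(\exp\Big(\sum_{j=1}^{r}\breve\beta_j x_{ij}\Big)\Big) = \sum_{j=1}^{r}\breve\beta_j x_{ij}.$$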
20
The test statistic (cont’d)
■ What’s next? Right, we do the same for Model 1.
■ When fitting Model 1 to the data using the likelihood we estimate d parameters.
■ Denote the MLEs for β1 , . . . , βd under Model 1 by β̂1 , . . . , β̂d .
■ The corresponding estimators for the θi s are
$$\hat\theta_i = (b')^{-1}\Big(h\Big(\sum_{j=1}^{d}\hat\beta_j x_{ij}\Big)\Big).$$
■ Remark: Note that we estimate more parameters under Model 1 than under
Model 0 and that typically also the MLEs β̂1 , . . . , β̂r for β1 , . . . , βr are different
from β̆1 , . . . , β̆r .
■ Let’s see what we have
◆ We fitted two models; and
◆ Above we spent quite some time finding ways to transfer the idea of the SSE
based test to GLMs.
■ Well, if we have these two ingredients let us combine them.
21
The test statistic (cont’d)
■ Combining means calculating
$$L_n = -2\log\frac{\prod_{i=1}^{n}\exp\Big(\frac{y_i\breve\theta_i - b(\breve\theta_i)}{\psi} - c(\psi, y_i)\Big)}{\prod_{i=1}^{n}\exp\Big(\frac{y_i\hat\theta_i - b(\hat\theta_i)}{\psi} - c(\psi, y_i)\Big)}.$$
■ This is the log-likelihood ratio test statistic for (5), or, stated in terms of regression
functions, for testing (1).
■ Note the idea: We compare
◆ The maximum value of the likelihood function under Model 0 (our numerator)
to
◆ The maximum value of the likelihood function under Model 1 (our
denominator).
■ Under the null hypothesis Ln has asymptotically a χ2 distribution with d − r
degrees of freedom.
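A minimal sketch of this likelihood ratio comparison for a hypothetical Poisson GLM, assuming the statsmodels and scipy packages are available; the simulated data and the split r = 2, d = 4 are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n, r, d = 200, 2, 4
X = rng.normal(size=(n, d))
y = rng.poisson(np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1]))   # only the first r covariates matter

# Fit Model 0 (first r covariates) and Model 1 (all d covariates) by maximum likelihood.
fit0 = sm.GLM(y, X[:, :r], family=sm.families.Poisson()).fit()
fit1 = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# L_n = -2 * (log-likelihood under Model 0 - log-likelihood under Model 1)
L_n = -2 * (fit0.llf - fit1.llf)
p_value = stats.chi2.sf(L_n, df=d - r)    # asymptotic chi-square with d - r degrees of freedom
print(L_n, p_value)
```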
22
Null model and full model
■ If you are familiar with GLMs (or in case you are not but open a book on GLMs)
you know (will learn) that GLM researchers have a soft spot for two particular
models:
◆ the full model; and
◆ the null model.
■ Definition (full model): We call the model with linear predictor
$\sum_{j=1}^{n}\beta_j x_{ij} = \beta_i$ for all i = 1, . . . , n the full model (cf. Week 2, Quiz 3, Question 2);
this means E[Yi ] = h(βi ).
■ Definition (null model): We call the model with linear predictor
$\sum_{j=1}^{n}\beta_j x_{ij} = \beta_1$ for all i = 1, . . . , n the null model;
this means E[Yi ] = h(β1 ).
■ We see that these two models are the most extreme ones we can think of: the
full model has n parameters for n observations, while the null model has only one parameter.
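As a small illustration (my own, with made-up dimension n = 5), one way to realize the two linear predictors as design matrices: the null model uses a single column of ones, the full model one indicator column per observation:

```python
import numpy as np

n = 5
X_null = np.ones((n, 1))   # null model: one parameter beta_1, linear predictor beta_1 for every i
X_full = np.eye(n)         # full model: n parameters, linear predictor beta_i for observation i
print(X_null.shape, X_full.shape)
```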
23
Prediction error
■ Above we discussed hypothesis tests to choose between two nested models.
■ In this section we look at other criteria to make such a choice.
■ Our setting will be as follows
◆ For each response Yi we also have measured d covariates xi1 , . . . , xid .
◆ We assume a linear relation between the expected values of the Yi and the
covariates.
◆ In our linear regression model we include M covariates. Here M can be any
number between 1 and d. I use here M instead of r to make clear that we
follow a different approach in this section.
◆ We are interested in predicting k future responses denoted by
$Y^F = (Y_1^F, \ldots, Y_k^F)$.
◆ We denote the expectations of these future responses by
$\mu_{Y^F} = (\mu_{Y_1^F}, \ldots, \mu_{Y_k^F})$.
◆ The covariates associated with $Y^F$ that will be used in our model are denoted
by $x_{ij}^F$, 1 ≤ i ≤ k, 1 ≤ j ≤ M.
26
Prediction error (cont’d)
Our setting (cont’d)
■ If we knew the regression coefficients, our prediction function for $Y_i^F$ would be
$\sum_{j=1}^{M}\beta_j x_{ij}^F$.
■ !!! Note that here we may have
$$\mu_{Y_i^F} \neq \sum_{j=1}^{M}\beta_j x_{ij}^F,$$
because the model with only M covariates need not contain the true regression function.
■ Estimating the coefficients of the model with M covariates by least squares gives
$\breve\beta_1, \ldots, \breve\beta_M$. Here as before we do not use a hat because we have estimates for a model that
does not use all covariates.
27
Prediction error (cont’d)
■ Having described our setting, we now need a performance measure.
■ As a measure for the performance of our model we look here at the squared
prediction error (SPE), i.e.
$$\sum_{i=1}^{k} E\Big[\Big(Y_i^F - \sum_{j=1}^{M}\breve\beta_j x_{ij}^F\Big)^2\Big].$$
■ We will look at the special case k = n and xFij = xij . In this case the derivations
simplify a bit without affecting the main message.
■ As a first step for calculating the SPE we determine the squared error between
$\mu_{Y_i^F}$ and our prediction $\sum_{j=1}^{M}\breve\beta_j x_{ij}$. We find
$$\sum_{i=1}^{n} E\Big[\Big(\mu_{Y_i^F} - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big]
= \sum_{i=1}^{n} E\Big[\Big(\mu_{Y_i^F} - E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big] + E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big] - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big]$$
28
Prediction error (cont’d)
■ The expectation on the previous slide can be re-written as
$$\sum_{i=1}^{n} E\Big[\Big(\mu_{Y_i^F} - E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big]\Big)^2\Big]$$
$$+\ 2\sum_{i=1}^{n} E\Big[\Big(\mu_{Y_i^F} - E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big]\Big)\Big(E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big] - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)\Big]$$
$$+\ \sum_{i=1}^{n} E\Big[\Big(E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big] - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big]$$
■ The term inside the first expectation is constant. Hence, we can drop the
expectation.
■ The second expectation is zero.
■ The third expectation is actually a variance which can be re-written as on the next
slide.
29
Prediction error (cont’d)
We have
$$\sum_{i=1}^{n} E\Big[\Big(E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big] - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big] = \sigma^2\,\mathrm{tr}\Big(X_n^M\big((X_n^M)^T X_n^M\big)^{-1}(X_n^M)^T\Big) = M\sigma^2,$$
where
■ tr denotes the trace of a matrix;
■ the matrix XnM is the design matrix as on slide 9 but with M in place of r;
■ σ 2 = Var[Yi ] with Yi as on slide 27; and
■ the last equality used that the trace of a projection matrix equals its rank (here M ).
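A tiny numerical check (my own sketch, with an arbitrary design matrix) that the trace of the projection matrix indeed equals M:

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 30, 4
X_M = rng.normal(size=(n, M))                   # design matrix with M covariates
H = X_M @ np.linalg.inv(X_M.T @ X_M) @ X_M.T    # projection ("hat") matrix
print(np.trace(H))                              # equals M (here 4), up to rounding
```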
30
Prediction error (cont’d)
■ Let us now calculate the SPE.
■ We have
$$\sum_{i=1}^{n} E\Big[\Big(Y_i^F - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big] = \sum_{i=1}^{n} E\Big[\Big(Y_i^F - \mu_{Y_i^F} + \mu_{Y_i^F} - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big]$$
$$= \sum_{i=1}^{n} E\Big[\big(Y_i^F - \mu_{Y_i^F}\big)^2\Big] + \sum_{i=1}^{n} E\Big[\Big(\mu_{Y_i^F} - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2\Big]$$
$$= n\sigma^2 + M\sigma^2 + \sum_{i=1}^{n}\Big(\mu_{Y_i^F} - E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}\Big]\Big)^2,$$
where
◆ The second equality used $E\big[(Y_i^F - \mu_{Y_i^F})(\mu_{Y_i^F} - \sum_{j=1}^{M}\breve\beta_j x_{ij})\big] = 0$; and
◆ The third equality that we already calculated $E\big[(\mu_{Y_i^F} - \sum_{j=1}^{M}\breve\beta_j x_{ij})^2\big]$ on
the previous slides.
31
Prediction error (cont’d)
The formula for the SPE just derived tells us several interesting things:
■ The first term, i.e. $n\sigma^2$, is unavoidable because it comes from the fact that $Y_i^F$ is
random and fluctuates around its expected value $\mu_{Y_i^F}$;
■ The second term, i.e. $M\sigma^2$, stems from estimating the M regression coefficients.
It is an estimation error;
■ The third term, i.e. $\sum_{i=1}^{n}\big(\mu_{Y_i^F} - E[\sum_{j=1}^{M}\breve\beta_j x_{ij}]\big)^2$, can be seen as an
approximation error because $\mu_{Y_i^F}$ is the true expectation of $Y_i$ that we approximate
with our M regressors;
■ We can decrease the approximation error by increasing the number of covariates used in our
regression function. Yet, this will increase the estimation error;
■ We can decrease the estimation error by decreasing the number of covariates used
in our regression function. Yet, this will increase the approximation error.
The last two bullets are known as the variance-bias trade-off. It is a fundamental property
of prediction in all statistical models.
32
Prediction error (cont’d)
■ Given the original sample y1 , . . . , yn and x11 , . . . , xnd (and no future values) we
can estimate the sum of the second and third part of the SPE, i.e.,
$$M\sigma^2 + \sum_{i=1}^{n}\Big(\mu_{Y_i^F} - E\Big[\sum_{j=1}^{M}\breve\beta_j x_{ij}^F\Big]\Big)^2,$$
by
$$M\hat\sigma^2 + \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{M}\breve\beta_j x_{ij}\Big)^2, \qquad (6)$$
where $\hat\sigma^2$ is an estimator for the variance that uses all covariates, i.e.
$$\hat\sigma^2 = \frac{1}{n-d}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{d}\hat\beta_j x_{ij}\Big)^2.$$
■ We can determine (6) for any subset of regressors we are interested in and then
choose the subset for which it is smallest.
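To close, a minimal sketch (my own, with made-up data and arbitrarily chosen candidate subsets) of how criterion (6) can be used: fit each candidate subset by least squares, add M σ̂² with σ̂² computed from the fit using all d covariates, and pick the subset with the smallest value.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, d = 80, 4
X = rng.normal(size=(n, d))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)   # only the first two covariates matter

def rss(X_sub, y):
    """Residual sum of squares of the least-squares fit on X_sub."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    resid = y - X_sub @ beta
    return resid @ resid

sigma2_hat = rss(X, y) / (n - d)                   # variance estimate using all d covariates

scores = {}
for M in range(1, d + 1):
    for subset in combinations(range(d), M):       # every subset of M covariates
        scores[subset] = M * sigma2_hat + rss(X[:, list(subset)], y)   # criterion (6)

best = min(scores, key=scores.get)
print(best, scores[best])
```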
34